Lest I Forget: International Conference on Web Engineering - Day 1 Research Tracks

This past week I took part in the International Conference on Web Engineering, or ICWE for short. I’d like to round up the talks I attended, the demos I watched and the posters I saw. While they are still fresh, I want to write down my impressions of each of them, even if only briefly. Of course I won’t be able to cover everything. For obvious reasons I couldn’t attend every session, and I didn’t find everything equally interesting. I’ll do my best nonetheless.

Day 1

Throughout the three conference days I mostly attended the Research Tracks. They contain a whole lot of knowledge, new ideas and ingenious tools. Here’s what caught my attention on the first day.


The keynote - Quo vadis Google Knowledge Graph

The conference started with a talk by Xin Luna Dong about the Google Knowledge Graph. It was very interesting to see how Google employs a data model similar in concept to RDF to build its massive knowledge base and how Freebase is currently being replaced by Google Knowledge Vault, and to learn about (Lightweight) Verticals, which are one of the ways Google collects its data. However, as one listener pointed out:

CTAT: Tilt-and-Tap Across Devices

This interesting work by Linda Di Geronimo, Maria Husmann, Abhimanyu Patel, Can Tuerk and Moira C. Norrie received the Best Paper Award, and for good reason. During the talk we watched the impressive interactions that can be achieved by connecting multiple mobile devices equipped with accelerometers. That has some proper sci-fi potential. And just think about all the games! Unfortunately, we heard that latency would be too high for dynamic real-time interaction, but what about less demanding activities? After all, we all carry a universal Wii-like controller in our pockets. It just so happens that it can also take calls.
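I don’t know how CTAT is implemented internally, so here is only a generic sketch of the first step any tilt-based interaction needs: turning a raw accelerometer reading into a tilt direction. It assumes the device is roughly at rest, so the reading is dominated by gravity. The function names, the 25° threshold and the mapping of signs to “left/right/forward/back” are my own assumptions, since axis conventions differ between platforms.

```python
import math


def tilt_degrees(ax: float, ay: float, az: float) -> tuple[float, float]:
    """Sideways and front/back tilt in degrees, estimated from one accelerometer sample.

    Assumes the device is roughly at rest (gravity dominates the reading) and that
    z points out of the screen, so a device lying flat reads about (0, 0, 9.81).
    """
    sideways = math.degrees(math.atan2(ax, az))
    front_back = math.degrees(math.atan2(ay, az))
    return sideways, front_back


def classify_tilt(ax: float, ay: float, az: float, threshold: float = 25.0) -> str:
    """Map a sample to a hypothetical tilt gesture; which sign means 'left' is platform-specific."""
    sideways, front_back = tilt_degrees(ax, ay, az)
    # The dominant axis wins, but only once it passes the (assumed) threshold.
    if abs(sideways) >= abs(front_back) and abs(sideways) >= threshold:
        return "right" if sideways > 0 else "left"
    if abs(front_back) >= threshold:
        return "forward" if front_back > 0 else "back"
    return "flat"
```

For example, a device lying flat classifies as "flat", while one tilted 45° to the side classifies as "right" under this convention.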

Revisiting Web Data Extraction using In-Browser Structural Analysis and Visual Cues in Modern Web Designs

In his presentation, Alfonso Murolo showcased his Chrome extension, DeepDesign, which uses state-of-the-art techniques to help extract (scrape) data from websites. The extension takes advantage of structural and visual cues to automate the creation of wrappers for extracting data from web pages. I only wish the extension were public and included a crawler, so that entire websites could be scraped. Currently it requires manual action.

Unfortunately, DeepDesign is not currently available to try.
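Since DeepDesign itself isn’t public, here is a rough sketch of the general idea behind one kind of structural cue: the records on a listing page tend to be sibling elements sharing the same tag and class. Using only Python’s standard html.parser, it finds the element whose direct children repeat the most. The names (RepeatedStructureFinder, candidate_records, min_repeats) are hypothetical, and this ignores the visual cues DeepDesign also relies on.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class RepeatedStructureFinder(HTMLParser):
    """Counts each element's direct children by a (tag, class) signature."""

    def __init__(self):
        super().__init__()
        self.stack = [(0, "#root")]            # (node id, tag) of currently open elements
        self.paths = {0: "#root"}              # node id -> readable path from the root
        self.children = defaultdict(Counter)   # node id -> Counter of child signatures
        self.next_id = 1

    def handle_starttag(self, tag, attrs):
        parent_id, _ = self.stack[-1]
        signature = (tag, dict(attrs).get("class") or "")
        self.children[parent_id][signature] += 1
        if tag in VOID_TAGS:
            return
        node_id = self.next_id
        self.next_id += 1
        self.paths[node_id] = f"{self.paths[parent_id]} > {tag}"
        self.stack.append((node_id, tag))

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][1] == tag:
                del self.stack[i:]
                return


def candidate_records(html, min_repeats=3):
    """Return (parent path, child signature, count) for the most repeated child signature, or None."""
    finder = RepeatedStructureFinder()
    finder.feed(html)
    finder.close()
    best = None
    for node_id, counts in finder.children.items():
        signature, count = counts.most_common(1)[0]
        if count >= min_repeats and (best is None or count > best[2]):
            best = (finder.paths[node_id], signature, count)
    return best


SAMPLE_HTML = """
<html><body>
<h1>Shop</h1>
<ul id="products">
  <li class="item"><a href="/a">A</a><span class="price">1</span></li>
  <li class="item"><a href="/b">B</a><span class="price">2</span></li>
  <li class="item"><a href="/c">C</a><span class="price">3</span></li>
</ul>
</body></html>
"""
```

On the sample page, the three `li.item` elements under the `ul` are picked out as the likely records; a wrapper would then map their inner `a` and `span.price` elements to fields.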

Clustering-Aided Page Object Generation for Web Testing

First day, and yet another award, this time for Best Student Paper. In this presentation the audience was introduced to APOGEN, the Automatic Page Objects Generator. It is a Java tool that crawls a website to create Page Objects for its pages. It does so by grouping multiple instances of the same page into clusters, which can then be corrected through a simple UI. That UI even shows small renderings of the crawled pages. This tool is tremendously useful, and it is already available as open source (though why SVN?).

How cool is that? If I ever hear anyone say that scientific conferences have nothing practical to offer, I will slap that person across the face :wink:.
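To make the clustering step a bit more concrete: APOGEN’s own clustering algorithms and page features are described in the paper, so this is a swapped-in, much simpler technique. It represents each crawled page as a set of structural features (here, hypothetical tag paths) and groups pages with single-pass “leader” clustering on Jaccard similarity. The names, the 0.6 threshold and the sample pages are all made up for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two feature sets (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_pages(pages: dict[str, set[str]], threshold: float = 0.6) -> list[list[str]]:
    """Leader clustering: each page joins the first cluster whose leader is similar enough."""
    clusters: list[tuple[set[str], list[str]]] = []  # (leader features, member URLs)
    for url, features in pages.items():
        for leader, members in clusters:
            if jaccard(leader, features) >= threshold:
                members.append(url)
                break
        else:
            # No existing cluster is close enough, so this page starts a new one.
            clusters.append((features, [url]))
    return [members for _, members in clusters]


# Hypothetical crawl result: URL -> set of tag paths found on the page.
PAGES = {
    "/product/1": {"body>div.product", "body>div.product>h1",
                   "body>div.product>span.price", "body>form.cart"},
    "/product/2": {"body>div.product", "body>div.product>h1",
                   "body>div.product>span.price", "body>form.cart", "body>div.reviews"},
    "/login": {"body>form.login", "body>form.login>input", "body>footer"},
    "/product/3": {"body>div.product", "body>div.product>h1",
                   "body>div.product>span.price"},
}
```

With these inputs the three product pages end up in one cluster and the login page in another, so a generator would emit one Page Object per cluster rather than one per URL.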

QwwwQ: Querying Wikipedia without writing queries

This is another cool Chrome extension I’m very excited about, which unfortunately is not publicly available just yet. QwwwQ (pronounced “quick”) is an ingenious tool for querying DBpedia in a way I would describe as a mix of query-by-example and faceted search. It would let non-technical users explore the wealth of data stored in Wikipedia and also help developers build SPARQL queries for DBpedia with a nice GUI instead of a text editor. In their paper, the authors (Massimiliano Battan and Marco Ronchetti) mention plans to support JOIN operations for traversing relations. I would add to that list the ability to retrieve the underlying query for further customization.

What I found most interesting is that QwwwQ cites a 1975 paper, “A psychological study of query by example” by John C. Thomas and John D. Gould. Isn’t that cool?
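QwwwQ’s internals aren’t public, but the “give me the underlying query” wish is easy to picture. Here is a hypothetical helper that turns an example, a class plus property/value pairs a user might pick in a GUI, into a SPARQL query for DBpedia. The function name and parameters are my own invention; it handles only exact-match constraints (no JOINs) and inserts the prefixed names as-is, so it assumes trusted input.

```python
def build_sparql(class_name: str, constraints: dict[str, str], limit: int = 50) -> str:
    """Build a SELECT query for resources of a class that match every property/value example."""
    lines = [
        "PREFIX dbo: <http://dbpedia.org/ontology/>",
        "PREFIX dbr: <http://dbpedia.org/resource/>",
        "SELECT DISTINCT ?item WHERE {",
        f"  ?item a {class_name} .",
    ]
    # Each example value the user picked becomes one triple pattern.
    for prop, value in constraints.items():
        lines.append(f"  ?item {prop} {value} .")
    lines.append("}")
    lines.append(f"LIMIT {limit}")
    return "\n".join(lines)
```

For instance, `build_sparql("dbo:City", {"dbo:country": "dbr:Italy"})` yields a query for cities whose `dbo:country` is `dbr:Italy`, which a developer could then paste into a SPARQL endpoint and refine by hand.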

Aspect-Based Sentiment Analysis on the Web using Rhetorical Structure Theory

Another award that day: the Distinguished Paper Award. Although the topic is quite advanced, the presenters succinctly explained how they apply Rhetorical Structure Theory (RST) to derive sentiment from product reviews written in English. What does that mean? By deconstructing a review and classifying its parts, several algorithms devised by the authors can determine which positive and negative sentiments the reviewer expressed. Interestingly, the results are slightly worse for negative reviews (reportedly because positive words are often used to express negative sentiment) and better for reviews consisting of multiple sentences.
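The authors’ actual algorithms are described in their paper; this is only a toy illustration of the core intuition that where a phrase sits in the discourse structure should affect how much it counts. It assumes an RST parser has already split the review into segments labelled nucleus or satellite, and that each segment has been tagged with an aspect. The lexicon, the role weights and all names are hypothetical.

```python
from collections import defaultdict

# Toy polarity lexicon (hypothetical); a real system would use a proper sentiment lexicon.
LEXICON = {"great": 1.0, "good": 0.5, "love": 1.0, "poor": -0.5, "slow": -0.5, "terrible": -1.0}

# Hypothetical weights: the nucleus carries the writer's main point, so it counts
# more than the supporting satellite.
ROLE_WEIGHTS = {"nucleus": 1.0, "satellite": 0.5}


def segment_polarity(text: str) -> float:
    """Sum the lexicon scores of the words in one discourse segment."""
    words = (word.strip(".,!?;:").lower() for word in text.split())
    return sum(LEXICON.get(word, 0.0) for word in words)


def aspect_sentiment(segments: list[tuple[str, str, str]]) -> dict[str, float]:
    """Aggregate role-weighted polarity per aspect from (aspect, rst_role, text) segments."""
    scores: dict[str, float] = defaultdict(float)
    for aspect, role, text in segments:
        scores[aspect] += ROLE_WEIGHTS[role] * segment_polarity(text)
    return dict(scores)
```

For the review “The battery life is terrible, although the screen is great”, split into a nucleus about the battery and a satellite about the screen, the battery scores -1.0 and the screen only 0.5, because the satellite is down-weighted.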

Diversity in Social Media Urban Analytics

A fairly interesting paper, which shows how social media activity data (from Twitter and Instagram in this case) can be used to identify the activity patterns of their users in cities. The authors analyzed the usage of these social networks in four European cities: Amsterdam, London, Paris and Rome. Among other things, they found that

  • Instagram users concentrate more in the vicinity of tourist locations
  • Tourists, as opposed to residents, also concentrate more around tourist locations
  • Instagram shows more uniform activity over time in various cities than Twitter (for example, Twitter activity in Amsterdam dies out much more quickly in the evening)
  • Both social networks’ activity is proportional to city size
  • The activity of people aged 45+ is very low

As unsurprising as they may seem, these results are not meant to reflect reality with complete accuracy. The authors are aware of the challenges and acknowledge the shortcomings of their techniques. Nevertheless, the paper is a good exploration of the possibilities awaiting a data scientist interested in analyzing social behaviour on the web.
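One simple way to put a number on “more uniform activity over time” (not necessarily the measure the authors used) is the normalized Shannon entropy of an hourly activity histogram. The function name is hypothetical; its input would be, say, post counts per hour of the day for one network in one city.

```python
import math


def normalized_entropy(counts: list[int]) -> float:
    """Shannon entropy of an activity histogram divided by its maximum, log(len(counts)).

    1.0 means activity is spread perfectly evenly over the bins (e.g. the 24 hours
    of a day); values near 0.0 mean it is concentrated in just a few bins.
    """
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return entropy / math.log(len(counts))
```

A perfectly flat 24-hour profile scores 1.0, and a synthetic profile whose volume drops to a tenth for the last six hours of the day scores lower, which is the kind of difference the Amsterdam evening example describes.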

Design of CQA Systems for Flexible and Scalable Deployment and Evaluation

Another work with so much potential for practical application. A while ago I was looking for a good open-source alternative to Stack Exchange. It turns out that one such project is being developed at the Slovak University of Technology in Bratislava. It focuses on educational use, has unique features for teachers and students, and integrates with the edX MOOC platform. It is also used at the University of Lugano.

Oh, and did I mention that it is open source and hosted on GitHub? Go give it a try.