Staś Małolepszy

Clustory — reinventing the tabs in the browser

This post was migrated from my old blog. Some links might not work.

My answer to this summer’s Design Challenge is: forget about tabs completely.

Clustory: reinventing the tabs in the browser (Mozilla Concept Series) from Staś Małolepszy on Vimeo.

In the video above (slightly too lengthy, I admit), I explore a different approach to managing visited pages. I suggest mining the loads of data stored in the browsing history for patterns and presenting it in a way that exposes two new dimensions: time and tasks.

The clustory screen (a portmanteau of “cluster” and “history”) would present a timeline of the visited pages, organized by themes, topics, or actions. Each horizontal line represents a cluster, and the pages belonging to each cluster are shown as screenshot thumbnails or favicons. At the top, a timeline indicates the time of visits and gives some other contextual information, like the weather or even (not illustrated in the mock-up) Wi-Fi hotspot names (SSIDs), which you could identify as your own, your company’s or your school’s.

Picture 1

The sidebar on the right gives you some details about each cluster, such as:

  • a user-given name,
  • most important keyword as identified by the clustering algorithm,
  • last activity time,
  • total time spent on the pages belonging to the cluster,
  • action buttons: forget, bookmark, share (all pages in the cluster).

The page that you were viewing most recently is always the biggest thumbnail on the screen, to make it easy to click on it and leave the clustory screen. Moreover, it never disappears from the screen, even if you pan to see older visits or clusters that are below the page fold. Its meta-information is always visible, too. To get more information about other clusters, just hover over them to expand them and show bigger thumbnails, as well as some additional metadata.

Picture 2

You can drag and drop pages across clusters, for example when the algorithm isn’t accurate enough and has made a mistake. Dropping a page into a different cluster would make the algorithm recalculate, for each word, how strongly that word indicates that a new page belongs to the cluster. Pages currently in the cluster would stay in it.
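To make the drag-and-drop correction more concrete, here is a rough sketch in Python of what the recalculation could look like. It assumes that each cluster keeps only a page count and, per word, the number of its pages containing that word. The names `move_page` and `word_probability`, the dictionary layout, and the add-one smoothing are my own illustration, not part of the mock-up.

```python
from collections import Counter


def move_page(clusters, page_words, source, target):
    """Move one page between clusters by adjusting per-cluster counts.

    `clusters` maps a cluster name to {"pages": int, "words": Counter},
    where the Counter holds, per word, how many pages contain it.
    `page_words` is the set of distinct words on the moved page.
    """
    src = clusters[source]
    dst = clusters.setdefault(target, {"pages": 0, "words": Counter()})

    src["pages"] -= 1
    src["words"].subtract(page_words)
    src["words"] += Counter()  # in-place add drops zero and negative counts

    dst["pages"] += 1
    dst["words"].update(page_words)


def word_probability(clusters, name, word):
    """Estimate P(word is present | cluster) with add-one smoothing (an assumption)."""
    cluster = clusters[name]
    return (cluster["words"][word] + 1) / (cluster["pages"] + 2)


clusters = {
    "news": {"pages": 2, "words": Counter({"election": 1, "recipe": 1, "soup": 1})},
    "cooking": {"pages": 1, "words": Counter({"recipe": 1, "pasta": 1})},
}

# The user drags a recipe page that was wrongly filed under "news".
move_page(clusters, {"recipe", "soup"}, "news", "cooking")
print(word_probability(clusters, "cooking", "recipe"))  # 0.75
```

Only the two affected clusters are touched, and no other page has to be re-examined, which is what makes the correction cheap.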

Which brings me to data mining. I believe that it would be sufficient to represent web documents as unordered collections of the words that occur in them (the bag-of-words model). You could even go as far as checking only whether a word is present in the document, without counting its occurrences. This would lead to a very simple binary representation that could then be used by a Naive Bayes classifier. The advantage of such an approach is that we would be storing a minimal amount of data per visited page and we wouldn’t need to recalculate the weights for each word and each document after opening a new page (as using an idf-based weight would require).
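A minimal sketch of this idea in Python might look like the following. It is only an illustration under a few assumptions of mine: pages are split into words on whitespace, the classifier is the Bernoulli flavour of Naive Bayes (word present or absent), and probabilities are smoothed by adding one. The names `binary_bag` and `ClusterClassifier` are made up for the example.

```python
import math
from collections import defaultdict


def binary_bag(text):
    """Reduce a page to the set of distinct lowercase words it contains."""
    return set(text.lower().split())


class ClusterClassifier:
    """Bernoulli Naive Bayes over word presence, with add-one smoothing."""

    def __init__(self):
        self.page_counts = defaultdict(int)  # cluster -> number of pages
        # cluster -> word -> number of pages in the cluster containing the word
        self.word_counts = defaultdict(lambda: defaultdict(int))
        self.vocabulary = set()

    def add_page(self, cluster, words):
        """Record a page; only counters change, no stored page is revisited."""
        self.page_counts[cluster] += 1
        for word in words:
            self.word_counts[cluster][word] += 1
        self.vocabulary.update(words)

    def log_score(self, cluster, words):
        """Log of P(cluster) times P(presence/absence of each known word | cluster)."""
        pages = self.page_counts[cluster]
        total = sum(self.page_counts.values())
        score = math.log(pages / total)
        for word in self.vocabulary:
            p = (self.word_counts[cluster].get(word, 0) + 1) / (pages + 2)
            score += math.log(p if word in words else 1.0 - p)
        return score

    def classify(self, words):
        """Return the best-scoring non-empty cluster, or None if there is none."""
        candidates = [c for c, n in self.page_counts.items() if n > 0]
        if not candidates:
            return None
        return max(candidates, key=lambda c: self.log_score(c, words))


clf = ClusterClassifier()
clf.add_page("travel", binary_bag("cheap flights to lisbon"))
clf.add_page("travel", binary_bag("hotel booking lisbon"))
clf.add_page("code", binary_bag("python list comprehension"))
clf.add_page("code", binary_bag("python dict tutorial"))

print(clf.classify(binary_bag("lisbon hotel deals")))  # travel
```

Note that opening a new page only increments a handful of counters in `add_page`. With an idf-based weighting, the weight of a word depends on the whole collection, so every new page would shift the weights of already stored documents.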

Please watch the video, download the Keynote presentation and share your thoughts in the comments. Thanks!

Published on 21.06.2009
