Monday, April 5, 2010

Using salience to segment desktop activity into projects

Daniel Lowd - University of Washington, Seattle, WA, USA
Nicholas Kushmerick - Decho Corporation, Seattle, WA, USA

paper link:

This paper outlines research that is part of Smart Desktop, an application for information management. The research is concerned with providing functions and algorithms for "predicting the project associated with each action a user performs on a desktop." The main goal of these methods is to incorporate salience, the idea that more recent information is more informative.
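
To make the salience idea concrete, here is a minimal sketch of a recency score. The exponential decay, the function name `salience`, and the ten-minute `half_life` are my own assumptions for illustration; the post does not say how the paper actually computes salience.

```python
def salience(seconds_since_last_use: float, half_life: float = 600.0) -> float:
    """Return a recency score in (0, 1] that halves every `half_life` seconds.

    Hypothetical formulation: exponential decay is assumed here, it is not
    taken from the paper.
    """
    if seconds_since_last_use < 0:
        raise ValueError("elapsed time must be non-negative")
    return 0.5 ** (seconds_since_last_use / half_life)


if __name__ == "__main__":
    # Something touched just now scores higher than something touched an hour ago.
    print(salience(0.0), salience(3600.0))
```

Any decreasing function of elapsed time would serve the same purpose; the point is only that recent activity counts for more.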

The algorithm captures actions performed within the Smart Desktop application and records the resources and information involved in each operation, including timestamps, which actions were performed, and which project the actions and resources belong to.
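
A captured action could be represented roughly like this. This is only a sketch: the class name `ActionRecord` and its field names are hypothetical, chosen to mirror the items listed above (timestamp, action, resource, project), and are not Smart Desktop's real schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ActionRecord:
    """One captured desktop action (hypothetical structure)."""
    timestamp: datetime
    action: str                    # e.g. "open", "save", "send"
    resource: str                  # e.g. a file path, URL, or email id
    project: Optional[str] = None  # None until a project is assigned or predicted


if __name__ == "__main__":
    record = ActionRecord(datetime(2010, 4, 5, 9, 30), "open", "report.doc")
    print(record)
```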

Capturing and mining these resources for information-management knowledge gives users quicker access to useful data, making them more efficient.

Resource Features: (R)
Resources mined from the Smart Desktop application, including web browsers, email clients, and office applications.

Past Project Features: (P)
Resources mined from the previous project that the user was working on. These features help to predict the kinds of actions that the user plans to perform.

Salience Features: (S)
Information mined from current actions and how they relate to resource features. Salience features define a current relationship between actions, programs, and resources.

Shared Salience Features:
The above features are used to construct a full feature vector with associated weights for each project. However, that creates a large overhead and leads to "overfitting," which prevents the model from generalizing to new projects or different users.
So the algorithms developed looked at the salience features shared between projects.
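
As I understand it, the benefit of shared features is that one weight vector scores every candidate project, so a brand-new project needs no weights of its own. Below is a rough sketch of that idea. The two features (recency of the project, and whether the resource was seen in it before), the helper names, and the weights are all assumptions made for illustration, not the paper's feature set.

```python
def project_features(project, resource, now, last_active, resource_projects,
                     half_life=600.0):
    """Describe one (project, resource) pair with project-independent features."""
    elapsed = now - last_active[project]
    recency = 0.5 ** (elapsed / half_life)  # assumed exponential decay
    seen = 1.0 if project in resource_projects.get(resource, set()) else 0.0
    return [recency, seen]


def predict_project(resource, now, last_active, resource_projects, weights):
    """Score every known project with the same shared weights; return the best."""
    def score(project):
        feats = project_features(project, resource, now, last_active,
                                 resource_projects)
        return sum(w * f for w, f in zip(weights, feats))
    return max(last_active, key=score)


if __name__ == "__main__":
    last_active = {"budget": 0.0, "hiring": 540.0}  # seconds, hypothetical
    resource_projects = {"report.doc": {"budget"}}  # resource -> projects seen in
    weights = [1.0, 2.0]                            # shared across all projects
    print(predict_project("report.doc", 600.0, last_active, resource_projects, weights))
    print(predict_project("notes.txt", 600.0, last_active, resource_projects, weights))
```

Here a resource with history goes to the project it was used in before, while an unseen resource falls back to the most recently active project.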

The algorithms tested with the salience metrics were:
Naive Bayes (NB)
Passive Aggressive (PA)
Logistic regression (LR)
Support Vector Machines (SVM)
Expert System (Expert)
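
Of these, Passive Aggressive is probably the least familiar, so here is a minimal sketch of its basic binary update rule (the standard hinge-loss form of the algorithm). The paper's task is multi-project, so its actual variant will differ; the function name `pa_update` is mine.

```python
def pa_update(w, x, y):
    """Return updated weights after one example (x, y), with y in {-1, +1}.

    Passive: if the example is already classified with margin >= 1, do nothing.
    Aggressive: otherwise move w just far enough to reach margin 1.
    """
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - margin)            # hinge loss
    norm_sq = sum(xi * xi for xi in x)
    if loss == 0.0 or norm_sq == 0.0:
        return list(w)                        # nothing to correct
    tau = loss / norm_sq                      # smallest step that fixes the margin
    return [wi + tau * y * xi for wi, xi in zip(w, x)]


if __name__ == "__main__":
    w = pa_update([0.0, 0.0], [3.0, 4.0], -1)
    print(w)
```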

The system was tested on data from several users at several companies. The data mined can be very personal, so it was obfuscated. Each algorithm was evaluated on the user data with different feature combinations.
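
An evaluation loop over feature combinations might be organized along these lines. This is a sketch of the bookkeeping only: the group labels reuse the (R), (P), (S) tags from above, while the helper names and the sample predictions are hypothetical and are not the paper's data.

```python
from itertools import combinations

FEATURE_GROUPS = ("R", "P", "S")  # resource, past project, salience


def feature_combinations(groups=FEATURE_GROUPS):
    """List every non-empty subset of the feature groups."""
    return [combo
            for size in range(1, len(groups) + 1)
            for combo in combinations(groups, size)]


def error_rate(predicted, actual):
    """Fraction of actions whose predicted project is wrong."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one labelled action")
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)


if __name__ == "__main__":
    for combo in feature_combinations():
        print("+".join(combo))
    # Made-up labels, only to show how the error is computed.
    print(error_rate(["budget", "hiring", "budget"], ["budget", "budget", "budget"]))
```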

Results for the errors of each algorithm are shown in the table below:

The results of their study showed that the logistic regression and support vector machine algorithms performed best, with SVMs slightly ahead. Since these algorithms used salience, their good performance indicates that salience is an important metric to implement in smart systems.

The passive-aggressive algorithm was more accurate than the Naive Bayes algorithm on the salience-based input metrics, even though the salience features seemed to distract PA from providing good information.

My spiel:

It was difficult to tell what exactly the paper was aiming to produce within the Smart Desktop application. However, it was clear that providing efficient prediction methods to support information workers is important and that providing salience metrics improves the performance of most algorithms.

The clear next step in developing better algorithms for these metrics would be to train the SVM or logistic regression algorithm using an expert-like system for each user.

From the data, it seems that adding combinations of feature data to the algorithms doesn't help their accuracy.

I very much like the idea of having (smart) predictive office applications that lessen the tedium of computer-based office work and enhance decision making.
