Wednesday, April 14, 2010

CHI '08: From meiwaku to tokushita!: lessons for digital money design from japan

Scott Mainwaring Intel Research, Portland, OR, USA
Wendy March Intel Research, Portland, OR, USA
Bill Maurer UC Irvine, Irvine, CA, USA

Paper Link:

Mainwaring et al. discuss the findings of an ethnographic study on the effects of e-money in Japan, particularly in Tokyo and Okinawa (the Hawaii of Japan).
The ethnography focused mainly on e-money based on Near-Field Communication (NFC) and embedded in cards, passes, and mobile devices. The team chose Japan for the ethnography because it already has a high adoption rate for various forms of e-money. Mainwaring et al. studied three different brands of e-money, including Suica and Edy.

The main finding of the study was that the high rate of e-money adoption stems from the deeply ingrained Japanese wish to reduce "meiwaku" 迷惑, which means "nuisance" or "bother." This plays heavily into Japanese society, where the needs of the community often trump individual concerns and people do not wish to stand out or bother others. This sense of meiwaku also helps explain why only 1/10th of transactions in Japan are done via credit card, whereas in the U.S. it's 1/4th of all transactions.

With Suica, people can simply walk through a turnstile and have their card charged automatically, without holding up the flow of traffic.

While Edy also offers automatic charging with NFC technology, it can also increase meiwaku by holding up lines when the e-money runs out. Furthermore, adding money to the account requires finding a charging station of the same brand, and on top of that, the card can be used only in stores that support the brand. Finally, by law, money converted into e-money cannot be converted back into regular cash.
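To make those constraints concrete, here is a toy Python model of a prepaid e-money card. The class and method names are hypothetical, and this is not the real Edy or Suica interface (it ignores NFC entirely). It just encodes the rules described above: top-ups only at same-brand stations, payment only at stores that accept the brand, a failed tap when the balance runs out (the moment that holds up the line), and no way to turn the balance back into cash.

```python
class InsufficientBalance(Exception):
    """Raised when a tap fails because the card has run out of e-money."""


class StoredValueCard:
    """Toy prepaid e-money card (hypothetical model, not a real card API)."""

    def __init__(self, brand: str, balance_yen: int = 0) -> None:
        self.brand = brand
        self.balance_yen = balance_yen

    def top_up(self, amount_yen: int, station_brand: str) -> None:
        # Money can only be added at a charging station of the same brand.
        if station_brand != self.brand:
            raise ValueError(f"a {station_brand} station cannot charge a {self.brand} card")
        if amount_yen <= 0:
            raise ValueError("top-up amount must be positive")
        self.balance_yen += amount_yen

    def pay(self, amount_yen: int, accepted_brands: set[str]) -> int:
        """Deduct a payment and return the remaining balance."""
        # The card only works in stores that support its brand.
        if self.brand not in accepted_brands:
            raise ValueError(f"this store does not accept {self.brand}")
        # Running out mid-line is the meiwaku case: the tap fails and the queue waits.
        if amount_yen > self.balance_yen:
            raise InsufficientBalance(
                f"need {amount_yen} yen, only {self.balance_yen} yen left"
            )
        self.balance_yen -= amount_yen
        return self.balance_yen

    # Deliberately no cash_out(): by law the balance cannot go back to cash.
```

For example, `StoredValueCard("Edy", 500).pay(300, {"Edy"})` leaves 200 yen on the card.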

The other main theme found in the use of e-money is "tokushita" 得した, or "well done/advantage gained." This refers to the rewards gained by using e-money through rewards programs and getting "something for nothing" out of using the card. Suica, for example, allowed customers to earn travel miles for a certain amount spent. The study found that people would go out of their way to use their cards out of a sense of tokushita and to gain rewards.

The ethnography suggests that e-cash systems should:
1) Result in a net decrease in commotion before, during, and after the point of sale.
2) Be designed for public use and take into account the environment of the transaction.
3) Support people in managing their money without either introducing new burdens or decreasing friction to the point of invisible spending.
4) Subtly engage multiple senses, for both practical and aesthetic reasons.
5) Leave room for dreams, irrationality, and tokushita! Money is not just about exactness and frugality; it's also about fun. If e-money brightens your day, then it might also fit into your life.

My spill:

I was interested in this study primarily because I'm currently studying Japanese and wanted to hear about some of the cultural implications of spending. I can't say that I learned a whole lot, but it was interesting. I think we can all appreciate not wanting to burden others or be a nuisance, and keeping that in mind when designing any technology is important.

I would like to see future work address the design considerations listed at the end, particularly how NFC can be employed so that people aren't charged accidentally, and how a transaction can be reversed should that happen. Also, I'd like to know whether it's possible to convert e-cash into, say, credit on a credit card in the U.S., or if the restriction is just a Japanese law.

Incorporating a rewards system for these kinds of transactions is a smart business move, I think. It keeps people motivated to use your card.

The authors also mentioned that the Japanese really focus on delivering aesthetic satisfaction in using their products. I think we should do that more in the States.

Tuesday, April 13, 2010

IUI '08: Designing and assessing an intelligent e-tool for deaf children

Rosella Gennari Free University of Bozen-Bolzano, Bolzano, Italy
Ornella Mich Free University of Bozen-Bolzano, Bolzano, Italy

Paper Link:

In this paper, Gennari and Mich designed an intelligent web-based e-tool called LODE (LOgic-based e-tool for DEaf children), aimed at cultivating the reading and reasoning skills of deaf children.

The aim of their system is best understood in light of the difficulties that deaf people experience with language. Deaf children have difficulty developing reading and reasoning skills because they are largely deprived of constant exposure to language. Deaf people encode information differently from hearing people and organize and access knowledge in different ways: they focus on details and images rather than on relations among concepts.

Specifically, their system focuses on "stimulating global deductive reasoning" on entire narratives. LODE does this by extracting temporally sensitive words and applying a logic system with automated temporal reasoning. The system can logically arrange the events of the input text (Italian in this case) and generate global deductive reasoning questions about the story.
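LODE does this reasoning with the ECLiPSe constraint system. As a much simpler stand-in (not the authors' method), the sketch below assumes the temporal words have already been turned into (earlier, later) event pairs, computes every ordering those pairs imply (their transitive closure), and turns the implied-but-unstated orderings into yes/no questions, since answering them requires reasoning across the whole story rather than one sentence. The function names and the example story are made up.

```python
from itertools import product


def implied_orderings(before: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """All (earlier, later) pairs that follow from the story: the transitive closure."""
    closure = set(before)
    changed = True
    while changed:
        changed = False
        # Snapshot the pairs so new ones can be added to closure while looping.
        for (a, b), (c, d) in list(product(closure, repeat=2)):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    if any(a == b for a, b in closure):
        raise ValueError("inconsistent story: some event would happen before itself")
    return closure


def deductive_questions(before: set[tuple[str, str]]) -> list[str]:
    """Yes/no questions about orderings the story implies but never states."""
    unstated = implied_orderings(before) - before
    return sorted(f"Did '{a}' happen before '{b}'?" for a, b in unstated)


# Hypothetical pairs for: "Luca wakes up, then feeds the cat before breakfast.
# After breakfast he goes to school."
story = {
    ("wakes up", "feeds the cat"),
    ("feeds the cat", "eats breakfast"),
    ("eats breakfast", "goes to school"),
}
for question in deductive_questions(story):
    print(question)
```

None of the three generated questions can be answered from a single sentence; each needs two or more of the stated relations chained together.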

The architecture of the system is based on a web client-server model composed of several modules:
1) e-stories database
2) Automated reasoner made up of:
a) ECLiPSe - constraint-based programming system
b) a knowledge base for ECLiPSe
c) domain knowledge of constraint problems formalizing the temporal information of the e-stories

The GUI consists of a simple page framed in yellow (for concentration), with a picture and a sentence from the story on a blue background (for calmness), along with buttons to go to the next and previous pages and a dictionary to look up difficult words. Temporal words are highlighted in orange to draw attention to temporal concepts that the user should remember.

They tested their system by bringing together LIS (Italian Sign Language) interpreters, a logopaedist, a linguist expert in deaf studies, a cognitive psychologist expert in deaf studies, and two deaf children.

One child, 13 years old, completed the stories easily, while the other, 8 years old, had trouble navigating the interface. Feedback from the experts was positive.

My Spill:

It's great that they're constructing advanced educational tools for deaf children. It seems like a great system for very young children, but I would think that older children using the system would need stories and questions crafted by a human.

Their testing of the system was terribly insufficient; they needed to test it with more children. Of course the experts in deaf studies will approve of the system. After all, they are interested in promoting work in their own fields.

IUI '08 (assignment): Temporal semantic compression for video browsing

Brett Adams Curtin University of Technology, Perth, W. Australia
Stewart Greenhill Curtin University of Technology, Perth, W. Australia
Svetha Venkatesh Curtin University of Technology, Perth, W. Australia

Paper Link:

Adams et al. present a video browsing approach known as Temporal Semantic Compression (TSC) that allows for unique ways of browsing and playing video based on tempo and interest algorithms.

With interest algorithms, which can be added to the browser as customizable plug-ins, a video can be filtered according to what the user is looking for. An interesting application highlighted in the paper is applying different interest algorithms based on genre.

For example, we could use:
excitement algorithms for sports
anxiety for home surveillance and news story change
attention for home videos

The temporal-compression-based video browser uses a 2D spatial control on the display screen, where the horizontal axis controls the position in the video and the vertical axis controls the compression. (Compression is the fraction of the original video that is removed; e.g., 20% compression leaves 80% of the shots from the original video, and 100% compression leaves only the "most interesting" frame of the video.)

The main measure of interest used to select frames during compression is tempo. Tempo is shaped by the director, who uses action, music, and dialogue to affect the audience's sense of time in the film. This browser measures tempo using camera pan, tilt, and audio volume.

The calculation operates on 3 timescales:
1) Frame level: features are on the timescale of the original movie; this level adjusts the playback point.
2) Shot level: features are on a timescale that weights every shot as having equal duration.
3) Compression level: where the compression functions are applied and can be changed.
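The post doesn't reproduce the paper's tempo formula, so here is only a guess at its general shape: a per-shot tempo score built from pan, tilt, and volume, each min-max normalized across the shots and then averaged with equal weights. The normalization, the weights, and the `Shot` fields are all assumptions. Scoring per shot mirrors the shot-level timescale, where every shot counts once regardless of its length.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    pan: float     # average camera pan speed over the shot (units are an assumption)
    tilt: float    # average camera tilt speed over the shot
    volume: float  # average audio loudness over the shot


def _min_max(values: list[float]) -> list[float]:
    """Rescale values to [0, 1]; a constant feature contributes 0 everywhere."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def shot_tempo(shots: list[Shot], weights=(1 / 3, 1 / 3, 1 / 3)) -> list[float]:
    """One tempo score per shot; shot length is ignored, as on the shot-level timescale."""
    pan = _min_max([s.pan for s in shots])
    tilt = _min_max([s.tilt for s in shots])
    volume = _min_max([s.volume for s in shots])
    w_pan, w_tilt, w_vol = weights
    return [w_pan * p + w_tilt * t + w_vol * v for p, t, v in zip(pan, tilt, volume)]


shots = [Shot(0.0, 0.0, 0.1), Shot(2.0, 1.0, 0.9), Shot(1.0, 0.5, 0.5)]
print(shot_tempo(shots))  # quiet static shot lowest, busy loud shot highest
```

Min-max scaling keeps pan speed and loudness from dominating each other just because they are measured in different units; a real system would likely weight them differently per genre.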

Example compression functions:

Default (linear) - playback proceeds at a linear pace, much like the regular playback and fast forward functions.

Midshot - takes a constant amount from each shot (section) chosen by the pacing algorithm

Pace Proportional - uses the pacing tempo to continuously vary the playback speed. When the tempo is low, playback speeds up, so more of the playback comes from higher-tempo sections (i.e., the more important sections are favored for playback).

Interesting shots - applies speed-up and compression, and entire shots with lower tempos are left out.
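Here is a rough sketch of the 2D control plus two of the compression functions above (Default/linear and Interesting shots), assuming each shot already has a tempo score in [0, 1]. In this sketch, compression means the fraction of running time removed, so 0 is the full video and 1 keeps only the single highest-tempo shot (the post says frame; whole shots are used here for simplicity). The function names and the numbers are hypothetical.

```python
def control_to_view(x: float, y: float, video_length_s: float) -> tuple[float, float]:
    """Map a point on the 2D control (axes in [0, 1]) to (playback position in s, compression)."""
    x = min(max(x, 0.0), 1.0)
    y = min(max(y, 0.0), 1.0)
    return x * video_length_s, y


def linear(durations: list[float], compression: float) -> list[float]:
    """Default (linear): speed everything up evenly, so each shot keeps the same share of time."""
    keep = 1.0 - compression
    return [d * keep for d in durations]


def interesting_shots(durations: list[float], tempos: list[float], compression: float) -> list[int]:
    """Interesting shots: keep whole shots in order of tempo until the time budget runs out.

    compression is the fraction of running time removed: 0 keeps everything,
    1 keeps only the single highest-tempo shot.
    """
    budget = (1.0 - compression) * sum(durations)
    by_tempo = sorted(range(len(durations)), key=lambda i: tempos[i], reverse=True)
    kept, used = [], 0.0
    for i in by_tempo:
        # Always keep the top shot; after that, stop at the first shot that doesn't fit.
        if kept and used + durations[i] > budget:
            break
        kept.append(i)
        used += durations[i]
    return sorted(kept)  # play the surviving shots in their original order


durations = [10.0, 5.0, 20.0, 5.0]  # seconds per shot (made-up values)
tempos = [0.2, 0.9, 0.5, 0.7]
position_s, compression = control_to_view(0.5, 0.5, sum(durations))
print(position_s, interesting_shots(durations, tempos, compression))  # 20.0 [1, 3]
```

Dragging the pointer straight up the control only changes `compression`, so the same position in the video is shown with progressively fewer, higher-tempo shots around it.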

Adams et al. tested their system on several movies, news shows, commercials, cartoons, and talk shows and found that their compression algorithm could successfully pull out meaningful and interesting chunks of shots from the clips.

Video: (should make it easier to understand)

My Spill:

The Temporal Semantic Compression scheme is a great idea from my perspective. Most media players only support regular playback, fast forward, and scene selection, but I've never seen a browsing tool for choosing the interesting parts of a video.
That's really cool.

The pluggable functions could let users search for different points of interest (maybe I just want to find the action scenes in a movie).

The real improvement to their interface would be to reduce the number of metrics shown so that screen space is maximized.

IUI '08: Multimodal Chinese text entry with speech and keypad on mobile devices

Yingying Jiang Chinese Academy of Sciences, Beijing, China
Xugang Wang Chinese Academy of Sciences, Beijing, China and Ministry of Information Industry Software and Integrated Circuit Promotion Center
Feng Tian Chinese Academy of Sciences, Beijing, China
Xiang Ao Ministry of Information Industry Software and Integrated Circuit Promotion Center
Guozhong Dai Chinese Academy of Sciences, Beijing, China
Hongan Wang Chinese Academy of Sciences, Beijing, China

Paper Link:

In this paper, Jiang et al. created a multimodal text entry system that uses both keypad and speech input to reduce the number of key presses, the time needed to enter characters, and the number of candidate characters to choose from when using a mobile device.

Jiang et al. identify Chinese text entry on mobile keypads as slow and arduous and set out to improve the input method for these characters. The current method is called T9: Roman phonetic characters (pinyin) corresponding to the sound of the Chinese characters are entered, and the desired characters are then selected from a list of homophones. Because this is slow and arduous, Jiang et al. proposed a method called "Jianpin," in which the user enters the initial sound of each Chinese character via the keypad while simultaneously saying the word they wish to enter.

For example, to enter "wang luo" 网络 (network) on a mobile phone using Jianpin, the user presses "95", which corresponds to "w.l", while saying "wang luo", and then selects 网络 from among several homophones.
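A minimal sketch of the idea: map the first letter of each pinyin syllable to its key on the standard phone keypad (so "wang luo" becomes "95"), filter a lexicon by those keys, then narrow the result with what the speech recognizer heard. The tiny lexicon is invented, tones are ignored, and the recognizer is stubbed out as a single toneless pinyin string; the paper's actual recognizer and ranking are not described here. Reducing multi-letter initials like zh/ch/sh to their first letter is also an assumption.

```python
# Standard phone keypad letter layout (ITU-T E.161).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {letter: key for key, letters in KEYPAD.items() for letter in letters}


def jianpin_keys(pinyin: str) -> str:
    """One key press per syllable: the key holding the syllable's first letter."""
    return "".join(LETTER_TO_KEY[syllable[0]] for syllable in pinyin.split())


def match_keys(keys: str, lexicon: dict[str, str]) -> list[str]:
    """Keypad-only filtering: every word whose syllable initials produce these keys."""
    return [word for word, pinyin in lexicon.items() if jianpin_keys(pinyin) == keys]


def candidates(keys: str, heard: str, lexicon: dict[str, str]) -> list[str]:
    """Keep only the key matches whose (toneless) pinyin matches what was heard."""
    return [word for word in match_keys(keys, lexicon) if lexicon[word] == heard]


# Tiny invented lexicon: word -> toneless pinyin.
lexicon = {
    "网络": "wang luo",  # network
    "网罗": "wang luo",  # to collect, to enlist
    "往来": "wang lai",  # contact, dealings
    "问路": "wen lu",    # to ask the way
    "希望": "xi wang",   # hope
}
print(match_keys("95", lexicon))             # ['网络', '网罗', '往来', '问路']
print(candidates("95", "wang luo", lexicon))  # ['网络', '网罗']
```

With keys alone, "95" leaves four candidates in this lexicon; adding the spoken "wang luo" leaves only the homophones 网络 and 网罗, which is where the final selection step comes in.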

Here is an overview of the input method:

A user study was run with 4 college students, who entered 50 words using both the T9 method and the Jianpin method. The researchers measured the number of key presses needed to complete the 50 words with each method. The results are as follows:

My spill:
The Jianpin input system sounds like a great way to reduce ambiguity in the selection set as well as speed up input.
My only bone to pick is that the input scheme requires voice input. I can imagine being on a crowded street in China with hundreds of people speaking into their cell phones just so they can text.
It's just more noise pollution that way.
If they can make a faster system without voice input, I'll be impressed.

Knowing Japanese, I was really interested in how Chinese speakers enter text, since Chinese doesn't have a phonetic script like Japanese kana. In the end, it really isn't all that different.

Monday, April 12, 2010

CHI '08 (assignment): Reality-based interaction: a framework for post-WIMP interfaces

(Comment left on Brandon Jarratt's blog)

Robert J.K. Jacob Tufts University, Medford, MA, USA
Audrey Girouard Tufts University, Medford, MA, USA
Leanne M. Hirshfield Tufts University, Medford, MA, USA
Michael S. Horn Tufts University, Medford, MA, USA
Orit Shaer Tufts University, Medford, MA, USA
Erin Treacy Solovey Tufts University, Medford, MA, USA
Jamie Zigelbaum MIT Media Lab, Cambridge, MA, USA

Paper Link:

In this paper, Jacob et al. discuss emerging methods of human-computer interaction, broadly referred to as reality-based interaction (RBI), and identify the unifying themes and concepts of these methods.

The research team first notes that human-computer interaction was initially done via command-line instructions typed on a keyboard. This method of interaction was cumbersome and relied on knowing which commands the computer would accept. It was difficult to use in part because users could not draw on preconceived notions of interaction.

Next, they identify the current generation of HCI as direct manipulation of 2D widgets, commonly known as window, icon, menu, pointing device (WIMP) interfaces.

Finally, the emerging methods of interaction fall under reality-based interaction (RBI), which they define as drawing on four overarching themes:
1) Naive Physics
2) Body Awareness & Skills
3) Environment Awareness & Skills
4) Social Awareness & Skills

The team notes that using RBI themes may enhance or inhibit:
Expressive Power

The team uses Superman as an analogy: a strictly reality-based representation of Superman would only let him walk and see like a regular man, but instead some realism is traded off for the extra functionality of flight and X-ray vision.

The team demonstrates the four themes of RBI and the resulting tradeoffs in several case studies:
1) URP (a tangible user interface for urban planning)
2) Apple iPhone
3) Electronic Tourist Guide
4) Visual-Cliff Virtual Environment

The research team hopes this paper provides a framework that unites divergent user interfaces, that interface designers will adopt it to create better systems in the future, and that it also provides a method for analyzing future interfaces.

My Spill:

While their work is an interesting summary of reality-based interfaces, I feel like this research didn't tell us anything we didn't already know.
Reality is an ever-emerging theme in CHI, and using reality-based interfaces introduces several considerations and tradeoffs.

That's essentially all this paper was.
I'd like to see them present a set of ideal interfaces for a system or something.

The Superman analogy was nice.

Sunday, April 11, 2010

Rich interfaces for reading news on the web

Earl J. Wagner Northwestern University, Evanston, IL, USA
Jiahui Liu Northwestern University, Evanston, IL, USA
Larry Birnbaum Northwestern University, Evanston, IL, USA
Kenneth D. Forbus Northwestern University, Evanston, IL, USA

paper link:

In this paper, Wagner et al. created the "Brussell" system, an interface that compiles summary information about a news article.
Brussell also gathers background information on the article from related articles and links, and it can construct a kind of summary of the information leading up to the main article. It does this by searching for related links and cross-referencing the information against other articles to remove extraneous and possibly erroneous information.

Because Brussell summarizes an article, users can quickly assimilate the news, background information, and current information on certain events at a glance.
Even more important, the system can work off a knowledge base to construct a net of references to older material when looking at a current article.

To test the system, the team created templates for several kinds of articles and defined a set of information that the system tries to fill in for each template.
The team also used a database of older articles as the knowledge base for Brussell. The system was then run over 100 different news stories to measure the number of references it found, averaging 4.1 references per article.
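Here is a toy sketch of the template-filling and cross-referencing idea, assuming the hard part (extracting slot values from each related article) has already happened. A slot keeps the value that most articles agree on, but only if at least `min_sources` of them agree, which is one simple way to drop single-source, possibly erroneous claims. The function name, the threshold, and the example data are hypothetical; Brussell's actual matching is likely more involved.

```python
from collections import Counter


def fill_template(slots: list[str], related: list[dict[str, str]], min_sources: int = 2):
    """Fill template slots by cross-referencing related articles.

    related holds one dict per article: slot name -> value extracted from that article
    (the extraction itself is assumed, not shown). Returns the filled slots and the
    number of articles that back at least one kept value, i.e. the references found.
    """
    filled: dict[str, str] = {}
    for slot in slots:
        votes = Counter(article[slot] for article in related if slot in article)
        if not votes:
            continue
        value, count = votes.most_common(1)[0]
        if count >= min_sources:  # a value only one article claims is treated as unreliable
            filled[slot] = value
    references = sum(
        1 for article in related
        if any(article.get(slot) == value for slot, value in filled.items())
    )
    return filled, references


# Made-up extractions from four related articles.
related = [
    {"event": "flood", "location": "Riverton", "deaths": "12"},
    {"event": "flood", "location": "Riverton"},
    {"event": "flood", "location": "Lakeside", "deaths": "21"},
    {"topic": "sports"},
]
print(fill_template(["event", "location", "deaths"], related))
# ({'event': 'flood', 'location': 'Riverton'}, 3)
```

The conflicting death counts never reach two agreeing sources, so that slot stays empty, and the off-topic sports article is not counted as a reference.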

My Spill:

The Brussell system is an interesting addition to the data mining community, allowing casual users to weave a web of references and background information for news articles. I think the idea for the system is great. Giving people a summarized view of current events could make the general populace more informed on current issues if the system is strong enough.

But that makes me think that the average user might not be motivated enough to use the system to become more informed on current issues, even if the implementation is easy to use. If the system could reward users for taking advantage of it and reviewing material, then I think this kind of thing could be revolutionary.

I really wonder how "smart" the system really is...

Saturday, April 10, 2010

Obedience to Authority

Book Title:
Obedience to Authority: The Experiment that Challenged Human Nature

Stanley Milgram

This book presents the methodology, results, and conclusions of Stanley Milgram's famous shock experiments, along with the thoughts of the man who created them.

Milgram begins by discussing the nature of obedience as it relates to everyday social life, as well as the unforgettable implications of obedience displayed in the historical catastrophe of WWII and the Holocaust.

Milgram then sets up a method of inquiry into the nature of obedience with his shock experiments, in which a subject (the "teacher"), at the command of an experimenter, administers shocks to a "learner" who must answer word-pair questions. The experiment is set up so that the learner who gets shocked is a paid actor who fakes pain and eventual death. The experimenter replies to the subject's complaints only by telling him that the "experiment must continue" and that "there is no permanent tissue damage."

The shocks start small and increase in intensity until they reach fatal levels.
The results of the baseline experiment showed that 65% of the subjects tested were obedient (they remained in the experiment until fatal shocks were delivered).
Most subjects displayed extreme tension and anxiety during the experiment.

Several variations of the experiment were conducted, ranging from variations where the subject had to hold the learner's hand onto an electrified plate to variations where the only signal from the learner was a light indicating an answer. The variations showed that increased proximity to the victim increased disobedience, while increased proximity to the experimenter increased obedience.

Variations also included changes of personnel and personality types in the different roles, and even multiple teachers. If the experimenter was not authoritative or not a professional, disobedience increased. If both learner and experimenter were professional authorities, the experiment would halt immediately.

In examining the results, Milgram found that people transition between two states of operation:
1) Autonomous State
2) Agentic State

In the autonomous state, the person acts as an individual whose motivation and responsibility for his or her actions are derived from the self. Here the overriding determinant of morality is the self, which generally means that harm to others is avoided at all costs.

However, in the agentic state, the individual relinquishes responsibility for his or her actions to the authority, who issues commands presumably for a justifiable cause that benefits society in some way, whether that society is immediate or some nebulous whole. Since responsibility lies with the authority, judging the morality of one's actions is bypassed and entrusted to the authority. Milgram argues that we are predisposed to obey authority to preserve the structure of society. Morality is now viewed in terms of obedience, loyalty, duty, discipline, and self-sacrifice.

Immediate Antecedent Conditions: (to entering authority)
Perception of Authority
Entry into Authority System
Coordination of Command with the Function of Authority
Overarching Ideology

Binding Factors:
Sequential Nature of the Action
Situational Obligations

Resolution of Strain:
Physical Conversion

Milgram notes that the steps toward disobeying an authority are psychically painful and are taken only as a last resort.
The steps toward disobedience are:
Inner doubt
Externalization of doubt

Milgram also mentions several strain-resolving mechanisms that help an individual remain obedient.

My Spill:
Despite being several decades old, Obedience to Authority and its associated research have not lost any of their potency to shock (no pun intended) the reader with the apparent hardheartedness of humanity.

Milgram deconstructs and carefully examines each component of his experiment and comes up with a thorough theory of obedience that I think does much to explain the nature of authority.

With this view of authority, we begin to see man in a different light, one in which sources of authority should be held in great distrust, since they carry with them the actions of every man under that authority.

The idea that really strikes me is how, when one switches to the agentic state, one's interpretation of morality changes from one's own morals to the 'virtues' (perhaps principles is a better word) of obedience, loyalty, duty, discipline, and self-sacrifice.

However, the agentic virtues are only virtues when the authority aims toward benevolent ends by benevolent means.
Malicious ends and malicious means should be rebelled against!

Those in places of authority and power MUST act with morality.
In the end, we come to the quote (of debated origins),
"With great power comes great responsibility."
or as Jesus Christ says in the Gospel of Luke, chapter 12, verse 48: "For unto whomsoever much is given, of him shall be much required: and to whom men have committed much, of him they will ask the more."

Here's some more food for thought:
"...those of us who heedlessly accept the commands of authority cannot yet claim to be civilized men." -Harold J. Laski (Not that I'm a proponent of the Labour Party)

“Unthinking respect for authority is the greatest enemy of truth.” -Albert Einstein

Monday, April 5, 2010

Using salience to segment desktop activity into projects

Daniel Lowd - University of Washington, Seattle, WA, USA
Nicholas Kushmerick - Decho Corporation, Seattle, WA, USA

paper link:

This paper outlines research that is part of Smart Desktop, an application for information management. The research itself is concerned with providing functions and algorithms for "predicting the project associated with each action a user performs on a desktop." The main goal of these methods is to incorporate salience, the idea that more recent information is more informative.

Actions performed within the Smart Desktop application are captured by the algorithm, which records the resources and information involved in each operation, including timestamps, what actions were performed, and which project the actions and resources belong to.

By capturing and mining these resources for information management related knowledge, users can have access to useful data more quickly, making the users more efficient.

Resource Features: (R)
Resources mined from the Smart Desktop application, including web browsers, email clients, and office applications.

Past Project Features: (P)
Resources mined from the previous project the user was working on. These features help predict the kinds of actions the user plans to perform.

Salience Features: (S)
Information mined from current actions and how they relate to resource features. Salience features define the current relationship between actions, programs, and resources.

Shared Salience Features:
The above features are used to construct a full feature vector with weights associated with each project. However, that creates a large overhead and "overfitting," which prevents the model from generalizing to new projects or different users.
So the algorithms developed look at the salience features shared between projects.
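A sketch of what "shared" salience features could look like: each (action, candidate project) pair gets a few project-independent features (how recently the project was used, whether it was the project of the previous action, and whether this resource was used with it before), and one weight vector is shared by every project. Because no weight belongs to a particular project, a new project can be scored right away. The features, the half-life, and the weights are made up for illustration; a learner such as logistic regression or an SVM would fit such weights rather than having them set by hand.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Action:
    time_s: float
    resource: str                  # e.g. a file, URL, or email thread
    project: Optional[str] = None  # known project label for past actions


def salience_features(now: float, resource: str, project: str,
                      history: list[Action], half_life_s: float = 600.0) -> list[float]:
    """Project-independent features saying how salient `project` is right now.

    history must be in time order.
    """
    past = [a for a in history if a.project == project]
    last_use = max((a.time_s for a in past), default=None)
    # Recency halves every half_life_s seconds since the project was last used.
    recency = 0.0 if last_use is None else 0.5 ** ((now - last_use) / half_life_s)
    was_previous = 1.0 if history and history[-1].project == project else 0.0
    seen_resource = 1.0 if any(a.resource == resource for a in past) else 0.0
    return [recency, was_previous, seen_resource]


# One weight per feature type, shared by every project (hand-picked, hypothetical values).
SHARED_WEIGHTS = [1.0, 2.0, 3.0]


def predict_project(now: float, resource: str, projects: list[str],
                    history: list[Action]) -> str:
    def score(project: str) -> float:
        features = salience_features(now, resource, project, history)
        return sum(w * f for w, f in zip(SHARED_WEIGHTS, features))
    return max(projects, key=score)


history = [
    Action(0.0, "budget.xls", "Budget"),
    Action(100.0, "paper.tex", "Paper"),
    Action(200.0, "refs.bib", "Paper"),
]
print(predict_project(300.0, "budget.xls", ["Budget", "Paper"], history))  # Budget
print(predict_project(300.0, "notes.txt", ["Budget", "Paper"], history))   # Paper
```

Reopening a budget file pulls the prediction back to "Budget" even though "Paper" was more recent, while an unfamiliar file falls back to whatever project was salient a moment ago.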

The algorithms testing the salience metrics were:
Naive Bayes (NB)
Passive Aggressive (PA)
Logistic regression (LR)
Support Vector Machines (SVM)
Expert System (Expert)

The testing methodology involved several users at several companies. Because the mined data can be very personal, it was obfuscated. Each algorithm was evaluated on the user data with different feature combinations.

Error rates for each algorithm are shown in the table below:

The results of their study showed that logistic regression and support vector machines performed best, with SVMs slightly ahead. Since these algorithms supported salience, their good performance indicates that salience is an important metric to implement in smart systems.

The passive-aggressive algorithm was more accurate than the Naive Bayes algorithm with the salience-based input features, even though those features seemed to distract PA from providing good information.

My spill:

It was difficult to tell exactly what the paper was aiming to produce within the Smart Desktop application. However, it was clear that efficient prediction methods that empower information workers are important and that salience metrics improve most algorithms' performance.

The future work for developing better algorithms for these metrics would clearly be to train the SVM or logistic regression algorithm using an expert-like system for each user.

It seems from the data that adding combinations of features to the algorithms doesn't help their accuracy.

I very much like the idea of (smart) predictive office applications that lessen the tedium of computer-based office work and enhance decision making.

MediaGLOW: organizing photos in a graph-based workspace

Andreas Girgensohn FX Palo Alto Laboratory, Palo Alto, CA, USA
Frank Shipman Texas A&M University, College Station, TX, USA
Lynn Wilcox FX Palo Alto Laboratory, Palo Alto, CA, USA
Thea Turner FX Palo Alto Laboratory, Palo Alto, CA, USA
Matthew Cooper FX Palo Alto Laboratory, Palo Alto, CA, USA

paper link:

MediaGLOW is a graph-based interactive workspace for organizing photos.
GLOW stands for Graph Layout Organization Workspace. Photos can be organized into stacks based on relatedness and distance, producing glowing areas of relatedness called "neighborhoods." A neighborhood is indicated by a colored halo around the area where its photos are located. A photo can belong to only a single neighborhood; however, its relatedness to other neighborhoods can be shown by overlapping neighborhood halos.

The relatedness of the nodes (photos) in the graph is determined by manually entered tag data associated with each photo. When an area is moved, related photos and neighborhoods move with it to maintain the visual relatedness. Relatedness can also be determined by the time the photos were taken (temporal) or by the geographical location where they were taken (geographical). Photos anchored to a node based on relatedness are indicated by blue lines called "springs."
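The post doesn't say how MediaGLOW computes relatedness or runs its layout, so this sketch assumes tag-set Jaccard similarity as relatedness, adds a spring between photos whose similarity clears a threshold, and moves photos one Hooke's-law step at a time so that more similar photos settle closer together. It also includes a tiny tag-only version of the "get all" button. All names and parameters are hypothetical.

```python
import math


def jaccard(a: set[str], b: set[str]) -> float:
    """Tag overlap in [0, 1]: shared tags divided by all tags of the two photos."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def springs(photos: dict[str, set[str]], threshold: float = 0.2) -> list[tuple[str, str, float]]:
    """A spring (edge) between every pair of photos whose tag similarity clears the threshold."""
    names = sorted(photos)
    edges = []
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            sim = jaccard(photos[p], photos[q])
            if sim >= threshold:
                edges.append((p, q, sim))
    return edges


def spring_step(pos: dict[str, tuple[float, float]], edges, rate: float = 0.1,
                scale: float = 2.0) -> dict[str, tuple[float, float]]:
    """One Hooke's-law relaxation step; more similar photos get shorter, stiffer springs."""
    moves = {p: [0.0, 0.0] for p in pos}
    for p, q, sim in edges:
        (x1, y1), (x2, y2) = pos[p], pos[q]
        dx, dy = x2 - x1, y2 - y1
        dist = math.hypot(dx, dy) or 1e-9
        rest = scale * (1.0 - sim)
        # Positive when stretched (pull together), negative when compressed (push apart).
        f = sim * (dist - rest) / dist
        moves[p][0] += rate * f * dx
        moves[p][1] += rate * f * dy
        moves[q][0] -= rate * f * dx
        moves[q][1] -= rate * f * dy
    return {p: (x + moves[p][0], y + moves[p][1]) for p, (x, y) in pos.items()}


def get_all(photos: dict[str, set[str]], tag: str) -> list[str]:
    """The "get all" button, restricted here to tags."""
    return sorted(p for p, tags in photos.items() if tag in tags)


photos = {
    "a.jpg": {"beach", "sunset"},
    "b.jpg": {"beach", "sunset", "dog"},
    "c.jpg": {"city"},
}
edges = springs(photos)
pos = {"a.jpg": (0.0, 0.0), "b.jpg": (3.0, 0.0), "c.jpg": (5.0, 5.0)}
for _ in range(50):
    pos = spring_step(pos, edges)
print(get_all(photos, "beach"))  # ['a.jpg', 'b.jpg']
```

After 50 steps the two beach photos sit about 2/3 of a unit apart (their spring's rest length), while the unrelated city photo has no spring and stays where it was.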

The user can zoom in and out of the workspace and use standard selection gestures, as well as a "get all" button to select all photos by tag, place, or date.

A user study was conducted using both a traditional photo organization program and MediaGLOW, in which users had to organize 450 photos into categories of:
Then they organized 3 photos from each category into a travel brochure.

The study showed that while the MediaGLOW interface was not as efficient as the traditional program, users said MediaGLOW was more fun.

My spill:

MediaGLOW is a visually interesting program that makes use of some good metrics for organization, but the fact that they're placing photos on a blank interface makes the program LESS organized than other photo organizers that use a grid.

I like the idea of making interfaces more fun, but I think that makes the interface useful only for novice photo organizers, and novices won't appreciate the more advanced metrics as much.

I like the idea of overlapping halos and geographic metrics for photos and having a clear interface probably keeps the workspace from being too obfuscating or overwhelming.

For future MediaGLOW work, I'd like to see photos related by their inherent content, somewhat like what Google Image Search does.