HCI and Evaluation
Information Visualization

What We Are Going to Learn

  • Human Computer Interaction
  • Techniques for evaluation
    • Controlled experiments
    • Interviews
    • Surveys
    • Case studies
  • Usability studies
    • Running the usability study
    • Choosing tasks
    • Prioritization
    • Likert scales

Evaluation

  • Controlled experiments
  • Natural settings
  • Any setting not involving users (expert reviews)

Expert Reviews

  • Design experts
  • Visualization experts
  • Usability experts
  • Domain experts

Types of Expert Reviews

  • Heuristic evaluation (golden rules)
  • Guidelines review
  • Consistency inspection
  • Cognitive walkthrough
  • Metaphors of human thinking
  • Formal usability inspection (courtroom style)
  • Accessibility inspection

Eight Golden Rules of Interface Design

  • Strive for consistency
  • Cater for universal usability
  • Offer informative feedback
  • Design dialogs to yield closure
  • Prevent errors
  • Permit easy reversal of actions
  • Support internal locus of control
  • Reduce short-term memory load

Controlled Experiments

  • Experiments in the lab
  • Control for confounding variables
  • Measure one or more quantitative variables
    • Usability testing
    • Living labs

What to Measure?

  • Time to learn
  • Speed of performance
  • Rate of errors
  • Retention over time
  • Subjective satisfaction

Variables

  • Independent variables (the causes you manipulate)
  • Dependent variables (the effects you measure)
  • Extraneous variables (uncontrolled factors that can affect the result)

Controlled Experiment Example

Effects of data distribution on the effectiveness of data visualization

In addition to the choice of visual encodings, the effectiveness of a data visualization may vary with the analytical task being performed and the distribution of data values.

Are marks and channels the most important factor in effectiveness?

Experiment

To better assess these effects and create refined rankings of visual encodings, we conduct an experiment measuring subject performance across task types (e.g., comparing individual versus aggregate values) and data distributions (e.g., with varied cardinalities and entropies).

Measure performance across task type × data cardinality × data entropy (a crossed design)
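
A rough Python sketch of how such a crossed design can be enumerated; the factor levels below are illustrative, not the ones used by Kim and Heer.

    # Illustrative sketch (not the authors' code): enumerate the crossed
    # conditions of a factorial experiment with itertools.product.
    from itertools import product

    encodings = ["x", "y", "size", "color"]            # example visual encodings
    tasks = ["read value", "find maximum", "compare averages"]
    cardinalities = [3, 10, 30]                          # example data cardinalities
    entropies = ["low", "high"]                          # example entropy levels

    conditions = list(product(encodings, tasks, cardinalities, entropies))
    print(len(conditions), "conditions")                 # 4 * 3 * 3 * 2 = 72
    for encoding, task, cardinality, entropy in conditions[:3]:
        print(encoding, task, cardinality, entropy)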

Tasks and conditions

Kim, Y. and Heer, J. (2018), Assessing Effects of Task and Data Distribution on the Effectiveness of Visual Encodings. Computer Graphics Forum, 37: 157-167. doi:10.1111/cgf.13409

Controlling extraneous variables

(figure from Kim & Heer, 2018)

Tasks

(figure from Kim & Heer, 2018)

Results

(figure from Kim & Heer, 2018)
Effects of tasks
  • Position (x, y) conveys the primary quantities well.
  • Size encoding performs well for summary tasks.
  • Color encoding performs well for Compare Averages.
  • Size & color exhibit asymmetric effects for Q1 vs. Q2.
  • Faceted charts exhibit asymmetric performance.

Usability Testing

Brighton Uni Usability Lab

Natural Settings Involving Users

  • Observation
  • Interviews
  • Logging

Triangulation

Converging on the same finding through multiple methods, data sources, or researchers observing the same effect.

Interviews

  • Unstructured
  • Structured
  • Semi-structured
  • Focus group
  • Telephone/online interviews

Questionnaire

Like interviews but without the researcher present

Likert Scales

Likert Scale

What do you think?

  • Strongly disagree
  • Disagree
  • Neither agree nor disagree
  • Agree
  • Strongly agree

More About Likert Scales

  • Can have 3, 5, 7, or more response options
  • Continuous or discrete
  • The middle response is the neutral point
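
As a small, self-contained sketch (with made-up responses on a 5-point scale), the snippet below tallies Likert answers into per-question percentages, the kind of summary the Likert chart examples that follow would plot.

    # Minimal sketch: tally 5-point Likert responses (hypothetical data)
    # into percentages per question, e.g. to feed a diverging bar chart.
    from collections import Counter

    SCALE = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]

    responses = {
        "The tool was easy to learn": [5, 4, 4, 3, 5, 2, 4],
        "I could complete my tasks quickly": [3, 2, 4, 4, 1, 3, 3],
    }

    for question, answers in responses.items():
        counts = Counter(answers)
        total = len(answers)
        percentages = {SCALE[i - 1]: round(100 * counts.get(i, 0) / total, 1)
                       for i in range(1, 6)}
        print(question, percentages)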

Likert Scales d3

Likert Scales Vega-Lite

https://vega.github.io/vega-lite/examples/layer_likert.html

Likert Scales Vega-Lite (cont.)

https://vega.github.io/vega-lite/examples/concat_layer_voyager_result.html

Other Methods

Observation

  • User's setting
  • Can be direct or indirect

Direct Observation in the Field

Ethnography

Direct Observation in Controlled Environments

  • Think aloud techniques

Direct Observation: Tracking Users

  • Diaries
  • Interaction logs and web analytics
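
Interaction logs can be post-processed into simple measures. The sketch below assumes a hypothetical log format of (timestamp, participant, event) records and derives time-on-task and event counts from it.

    # Sketch with a hypothetical log format: derive time-on-task and
    # event counts from timestamped interaction records.
    from collections import Counter
    from datetime import datetime

    log = [
        ("2024-05-01T10:00:03", "P1", "task_start"),
        ("2024-05-01T10:00:41", "P1", "filter_changed"),
        ("2024-05-01T10:01:12", "P1", "tooltip_opened"),
        ("2024-05-01T10:02:30", "P1", "task_end"),
    ]

    event_counts = Counter(event for _, _, event in log)
    start = datetime.fromisoformat(log[0][0])
    end = datetime.fromisoformat(log[-1][0])
    print("time on task (s):", (end - start).total_seconds())
    print("event counts:", event_counts)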

MILCs

  • Multi-dimensional
  • In-depth
  • Long-term
  • Case studies

TreeVersity MILCs

Focus groups

One researcher (the moderator), many participants

Prototyping

  • Low vs. high fidelity?
  • Use real data
  • Build scenarios, tell a story

Quantitative Evaluation

Analysis of variance (ANOVA) for comparing means across experimental conditions, as sketched below.
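
A minimal sketch (with made-up completion times, assuming SciPy is available) of a one-way ANOVA comparing three encoding conditions:

    # Minimal sketch: one-way ANOVA on made-up completion times (seconds)
    # for three encoding conditions.
    from scipy import stats

    position = [12.1, 10.4, 11.8, 9.9, 12.6]
    size = [14.0, 15.2, 13.1, 16.4, 14.8]
    color = [13.5, 12.9, 14.1, 15.0, 13.2]

    f_stat, p_value = stats.f_oneway(position, size, color)
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
    # A small p-value suggests at least one condition mean differs;
    # follow up with post-hoc pairwise comparisons to find which.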

Running a Usability Study

Validity Checks

  • Earlier stages:
    • Observe and interview target users (needs assessment)
    • Design data abstraction/operation (data types, transformation, operations)
    • Justify encoding/interaction design (design heuristics, perception research)
    • Informal analysis/qualitative analysis of prototypes (task-based)
    • Algorithm complexity analysis/evaluation
  • Mid- and later stages:
    • Qualitative analysis of system (task-based)
    • Algorithm performance analysis
    • Lab or crowdsourced user study
    • Field study of the deployed system

Formal Usability Study

Goal: Does the visualization allow the user/analyst to perform key tasks?

Task-Oriented Visual Insights

  • Basic insights:
    • Read a value
    • Identify extrema
    • Characterize distribution
    • Describe correlation
  • Comparative insights:
    • Compare values
    • Compare extrema
    • Compare distribution
    • Compare correlation
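
One way to operationalize this taxonomy for a study (a sketch; the prompts and expected answers below are invented) is to encode each task with its insight category and a scoring key:

    # Sketch: turn the insight taxonomy into concrete study tasks with
    # expected answers for scoring (all prompts and answers are invented).
    tasks = [
        {"insight": "read a value",
         "prompt": "What was the unemployment rate in March?", "expected": "4.2%"},
        {"insight": "identify extrema",
         "prompt": "Which month had the highest rate?", "expected": "January"},
        {"insight": "compare values",
         "prompt": "Was the rate higher in March or in June?", "expected": "March"},
    ]

    for number, task in enumerate(tasks, start=1):
        print(f"Task {number} [{task['insight']}]: {task['prompt']}")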

Usability Study: Logistics

  • You will need:
    • Visualization with test data loaded
    • Consent form (if required)
    • Task list
    • Protocol (study procedures and debrief questions)
    • Surveys/interviews and any additional data-collection instruments
    • Audio or video recorder, notepad

How Many People Do You Need?

"Lab" Doesn’t Need to Mean a Formal Lab

Software for Collecting Audio/Video

  • Video of user
  • Screen capture of user actions
  • Audio of entire session

Online Tools

  • Surveys
  • Mouse tracking/navigation tracking

Prioritization

You’ve Collected Data

What is the Analyst’s Information Scent?

Information scent is a term from Peter Pirolli and Stuart Card's information-foraging work at PARC. When analyzing what you collected, account not only for whether a task was completed but also for what the person was trying to do; think-aloud data helps situate this. What actually caused a difficulty is not always obvious, so work like a detective before translating the findings into design changes.

MoSCoW Prioritization

  • Must
  • Should
  • Could
  • Won't

Severity Ratings

  1. Not a real problem
  2. Cosmetic
  3. Minor usability issue
  4. Major usability issue
  5. Critical issue
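
A small sketch of how the ratings can drive prioritization (the issue descriptions below are made up): record each observed issue with its severity, then address them in descending order.

    # Sketch: prioritize observed issues by severity (1 = not a real
    # problem, 5 = critical, per the scale above); issues are made up.
    issues = [
        {"issue": "Legend labels truncated on small screens", "severity": 2},
        {"issue": "Filter resets when switching tabs", "severity": 4},
        {"issue": "No undo after deleting an annotation", "severity": 5},
    ]

    for item in sorted(issues, key=lambda entry: entry["severity"], reverse=True):
        print(item["severity"], item["issue"])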

Limitations

  1. Ecological validity
  2. Are performance-oriented tasks the complete story?

Usability Study Demo

References

What We Learned

  • Human Computer Interaction
  • Techniques for evaluation
    • Controlled experiments
    • Interviews
    • Surveys
    • Case studies
  • Usability studies
    • Running the usability study
    • Choosing tasks
    • Prioritization
    • Likert scales