HCI and Evaluation
Information Visualization

What We Are Going to Learn

  • Human Computer Interaction
  • Techniques for evaluation
    • Controlled experiments
    • Interviews
    • Surveys
    • Case studies
  • Usability studies
    • Running the usability study
    • Choosing tasks
    • Prioritization
    • Likert scales

Evaluation

  • Controlled experiments
  • Natural settings
  • Any setting not involving users (expert reviews)

Expert Reviews

  • Design experts
  • Visualization experts
  • Usability experts
  • Domain experts

Types of Expert Reviews

  • Heuristic evaluation (golden rules)
  • Guidelines review
  • Consistency inspection
  • Cognitive walkthrough
  • Metaphors of human thinking
  • Formal usability inspection (courtroom style)
  • Accessibility inspection

Eight Golden Rules of Interface Design

  • Strive for consistency
  • Cater for universal usability
  • Offer informative feedback
  • Design dialogs to yield closure
  • Prevent errors
  • Permit easy reversal of actions
  • Support internal locus of control
  • Reduce short-term memory load

Controlled Experiments

  • Experiments in the lab
  • Control for confounding variables
  • Measure one or more quantitative variables
    • Usability testing
    • Living labs

What to Measure?

  • Time to learn
  • Speed of performance
  • Rate of errors
  • Retention over time
  • Subjective satisfaction

Variables

  • Independent variables (the causes you manipulate)
  • Dependent variables (the effects you measure)
  • Extraneous variables (uncontrolled factors that can affect the result)

Controlled Experiment Example

Effects of data distribution on the effectiveness of data visualization

In addition to the choice of visual encodings, the effectiveness of a data visualization may vary with the analytical task being performed and the distribution of data values.

Are marks and channels the most important factor in effectiveness?

Experiment

To better assess these effects and create refined rankings of visual encodings, we conduct an experiment measuring subject performance across task types (e.g., comparing individual versus aggregate values) and data distributions (e.g., with varied cardinalities and entropies).

Measure performance across task type × data cardinality × data entropy (a crossed design)
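
A rough Python sketch of how such a crossed design can be enumerated; the factor levels below are illustrative, not the ones used by Kim and Heer.

    # Illustrative sketch (not the authors' code): enumerate the crossed
    # conditions of a factorial experiment with itertools.product.
    from itertools import product

    encodings = ["x", "y", "size", "color"]            # example visual encodings
    tasks = ["read value", "find maximum", "compare averages"]
    cardinalities = [3, 10, 30]                          # example data cardinalities
    entropies = ["low", "high"]                          # example entropy levels

    conditions = list(product(encodings, tasks, cardinalities, entropies))
    print(len(conditions), "conditions")                 # 4 * 3 * 3 * 2 = 72
    for encoding, task, cardinality, entropy in conditions[:3]:
        print(encoding, task, cardinality, entropy)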

Tasks and conditions

Kim, Y. and Heer, J. (2018), Assessing Effects of Task and Data Distribution on the Effectiveness of Visual Encodings. Computer Graphics Forum, 37: 157-167. doi:10.1111/cgf.13409

Controlling extraneous variables

(figure from Kim & Heer, 2018)

Tasks

(figure from Kim & Heer, 2018)

Results

(figure from Kim & Heer, 2018)
Effects of tasks
  • Position (x, y) conveys the primary quantities well.
  • Size encoding performs well for summary tasks.
  • Color encoding performs well for Compare Averages.
  • Size & color exhibit asymmetric effects for Q1 vs. Q2.
  • Faceted charts exhibit asymmetric performance.

Usability Testing

Brighton Uni Usability Lab

Natural Settings Involving Users

  • Observation
  • Interviews
  • Logging

Triangulation

Converging on the same finding through multiple methods, data sources, or researchers observing the same effect.

Interviews

  • Unstructured
  • Structured
  • Semi-structured
  • Focus group
  • Telephone/online interviews

Questionnaire

Like interviews but without the researcher present

Likert Scales

Likert Scale

What do you think?

  • Strongly disagree
  • Disagree
  • Neither agree nor disagree
  • Agree
  • Strongly agree

More About Likert Scales

  • Can have 3, 5, 7, or more response options
  • Continuous or discrete
  • The middle response is the neutral point
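
As a small, self-contained sketch (with made-up responses on a 5-point scale), the snippet below tallies Likert answers into per-question percentages, the kind of summary the Likert chart examples that follow would plot.

    # Minimal sketch: tally 5-point Likert responses (hypothetical data)
    # into percentages per question, e.g. to feed a diverging bar chart.
    from collections import Counter

    SCALE = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]

    responses = {
        "The tool was easy to learn": [5, 4, 4, 3, 5, 2, 4],
        "I could complete my tasks quickly": [3, 2, 4, 4, 1, 3, 3],
    }

    for question, answers in responses.items():
        counts = Counter(answers)
        total = len(answers)
        percentages = {SCALE[i - 1]: round(100 * counts.get(i, 0) / total, 1)
                       for i in range(1, 6)}
        print(question, percentages)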

Likert Scales d3

Likert Scales Vega-Lite

https://vega.github.io/vega-lite/examples/layer_likert.html

Likert Scales Vega-Lite (cont.)

https://vega.github.io/vega-lite/examples/concat_layer_voyager_result.html

Other Methods

Observation

  • User's setting
  • Can be direct or indirect

Direct Observation in the Field

Ethnography

Direct Observation in Controlled Environments

  • Think aloud techniques

Direct Observation: Tracking Users

  • Diaries
  • Interaction logs and web analytics
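
Interaction logs can be post-processed into simple measures. The sketch below assumes a hypothetical log format of (timestamp, participant, event) records and derives time-on-task and event counts from it.

    # Sketch with a hypothetical log format: derive time-on-task and
    # event counts from timestamped interaction records.
    from collections import Counter
    from datetime import datetime

    log = [
        ("2024-05-01T10:00:03", "P1", "task_start"),
        ("2024-05-01T10:00:41", "P1", "filter_changed"),
        ("2024-05-01T10:01:12", "P1", "tooltip_opened"),
        ("2024-05-01T10:02:30", "P1", "task_end"),
    ]

    event_counts = Counter(event for _, _, event in log)
    start = datetime.fromisoformat(log[0][0])
    end = datetime.fromisoformat(log[-1][0])
    print("time on task (s):", (end - start).total_seconds())
    print("event counts:", event_counts)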

MILCs

  • Multi-dimensional
  • In-depth
  • Long-term
  • Case studies

TreeVersity MILCs

Focus groups

One researcher (the moderator), many participants

Prototyping

  • Low vs. high fidelity?
  • Use real data
  • Build scenarios, tell a story

Quantitative Evaluation

Analysis of variance (ANOVA) for comparing means across experimental conditions, as sketched below.
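
A minimal sketch (with made-up completion times, assuming SciPy is available) of a one-way ANOVA comparing three encoding conditions:

    # Minimal sketch: one-way ANOVA on made-up completion times (seconds)
    # for three encoding conditions.
    from scipy import stats

    position = [12.1, 10.4, 11.8, 9.9, 12.6]
    size = [14.0, 15.2, 13.1, 16.4, 14.8]
    color = [13.5, 12.9, 14.1, 15.0, 13.2]

    f_stat, p_value = stats.f_oneway(position, size, color)
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
    # A small p-value suggests at least one condition mean differs;
    # follow up with post-hoc pairwise comparisons to find which.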

Running a Usability Study

Validity Checks

  • Earlier stages:
    • Observe and interview target users (needs assessment)
    • Design data abstraction/operation (data types, transformation, operations)
    • Justify encoding/interaction design (design heuristics, perception research)
    • Informal analysis/qualitative analysis of prototypes (task-based)
    • Algorithm complexity analysis/evaluation
  • Mid- and later stages:
    • Qualitative analysis of system (task-based)
    • Algorithm performance analysis
    • Lab or crowdsourced user study
    • Field study of the deployed system

Formal Usability Study

Goal: Does the visualization allow the user/analyst to perform key tasks?

Task-Oriented Visual Insights

  • Basic insights:
    • Read a value
    • Identify extrema
    • Characterize distribution
    • Describe correlation
  • Comparative insights:
    • Compare values
    • Compare extrema
    • Compare distribution
    • Compare correlation
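
One way to operationalize this taxonomy for a study (a sketch; the prompts and expected answers below are invented) is to encode each task with its insight category and a scoring key:

    # Sketch: turn the insight taxonomy into concrete study tasks with
    # expected answers for scoring (all prompts and answers are invented).
    tasks = [
        {"insight": "read a value",
         "prompt": "What was the unemployment rate in March?", "expected": "4.2%"},
        {"insight": "identify extrema",
         "prompt": "Which month had the highest rate?", "expected": "January"},
        {"insight": "compare values",
         "prompt": "Was the rate higher in March or in June?", "expected": "March"},
    ]

    for number, task in enumerate(tasks, start=1):
        print(f"Task {number} [{task['insight']}]: {task['prompt']}")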

Usability Study: Logistics

  • You will need:
    • Visualization with test data loaded
    • Consent form (if required)
    • Task list
    • Protocol (study procedures and debrief questions)
    • Surveys/interviews and any additional data-collection instruments
    • Audio or video recorder, notepad

How Many People Do You Need?

"Lab" Doesn’t Need to Mean a Formal Lab

Software for Collecting Audio/Video

  • Video of user
  • Screen capture of user actions
  • Audio of entire session

Online Tools

  • Surveys
  • Mouse tracking/navigation tracking

Prioritization

You’ve Collected Data

What is the Analyst’s Information Scent?

Information scent is a term from Peter Pirolli and Stuart Card's information-foraging work at PARC. When analyzing what you collected, account not only for whether a task was completed but also for what the person was trying to do; think-aloud data helps situate this. What actually caused a difficulty is not always obvious, so work like a detective before translating the findings into design changes.

MoSCoW Prioritization

  • Must
  • Should
  • Could
  • Won't

Severity Ratings

  1. Not a real problem
  2. Cosmetic
  3. Minor usability issue
  4. Major usability issue
  5. Critical issue
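
A small sketch of how the ratings can drive prioritization (the issue descriptions below are made up): record each observed issue with its severity, then address them in descending order.

    # Sketch: prioritize observed issues by severity (1 = not a real
    # problem, 5 = critical, per the scale above); issues are made up.
    issues = [
        {"issue": "Legend labels truncated on small screens", "severity": 2},
        {"issue": "Filter resets when switching tabs", "severity": 4},
        {"issue": "No undo after deleting an annotation", "severity": 5},
    ]

    for item in sorted(issues, key=lambda entry: entry["severity"], reverse=True):
        print(item["severity"], item["issue"])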

Limitations

  1. Ecological validity
  2. Are performance-oriented tasks the complete story?

Usability Study Demo

References

What We Learned

  • Human Computer Interaction
  • Techniques for evaluation
    • Controlled experiments
    • Interviews
    • Surveys
    • Case studies
  • Usability studies
    • Running the usability study
    • Choosing tasks
    • Prioritization
    • Likert scales