ISIS 4822: Visual Analytics


John Alexis Guerra Gómez | ja.guerrag[at]uniandes.edu.co | @duto_guerra
Jose Tiberio Hernandez | jhernand[at]uniandes.edu.co
Universidad de los Andes


http://johnguerra.co/lectures/visualAnalytics_fall2019/01_Introduction/

Based on slides from Tamara Munzner

Syllabus

Book

Visualization Analysis and Design, Tamara Munzner

Projects & Homework

  • Homework assignments (5)
  • Individual mid-course project
  • Group final project (3 people), 30% of the grade

Readings and participation

  • Each class has assigned readings
  • We will play the lottery at the beginning of class
  • I will ask about the readings and what we have learned

Lottery (Grading)

  • -2 Absent when called on in class
  • -1 Answered wrong
  • 0 Regular answer
  • 1 Kind of good answer
  • 2 Great answer

Introduction

Definitions

Defining Visualization (vis)

Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively.

Why?

Have the human in the loop

Visualization is suitable when there is a need to augment human capabilities rather than replace people with computational decision-making methods.

When not to use vis?

Vis is not needed when a fully automatic solution exists and is trusted

But

  • Many analysis problems are ill-specified
    • Don’t know exactly what questions to ask in advance

Vis allows for

  • Long-term use for end users (e.g. exploratory analysis of scientific data)
  • Presentation of known results
  • Stepping stone to better understanding of requirements before developing models
  • Help developers of automatic solutions refine, debug, and determine parameters
  • Help end users of automatic solutions verify, build trust

Why use an external representation?

External representation: replace cognition with perception

[Cerebral: Visualizing Multiple Experimental Conditions on a Graph with Biological Context. Barsky, Munzner, Gardy, and Kincaid. IEEE TVCG (Proc. InfoVis) 14(6):1253-1260, 2008.]

Why have a computer in the loop?

  • Beyond human patience
  • Scale to large datasets
  • Support interactivity

Why depend on vision?

  • Human visual system is high-bandwidth channel to brain
    • Overview possible due to background processing
    • Subjective experience of seeing everything simultaneously
    • Significant processing occurs in parallel and pre-attentively
  • Sound: lower bandwidth and different semantics
    • Overview not supported
    • Subjective experience of sequential stream
  • Touch/haptics: impoverished record/replay capacity
    • Only very low-bandwidth communication thus far
  • Taste, smell: no viable record/replay devices

Why show data in detail?

  • Summaries lose information
    • Confirm expected and find unexpected patterns
    • Assess validity of statistical model

Anscombe's quartet

     I             II             III            IV
   x      y       x      y       x      y       x      y
10.0   8.04    10.0   9.14    10.0   7.46     8.0   6.58
 8.0   6.95     8.0   8.14     8.0   6.77     8.0   5.76
13.0   7.58    13.0   8.74    13.0  12.74     8.0   7.71
 9.0   8.81     9.0   8.77     9.0   7.11     8.0   8.84
11.0   8.33    11.0   9.26    11.0   7.81     8.0   8.47
14.0   9.96    14.0   8.10    14.0   8.84     8.0   7.04
 6.0   7.24     6.0   6.13     6.0   6.08     8.0   5.25
 4.0   4.26     4.0   3.10     4.0   5.39    19.0  12.50
12.0  10.84    12.0   9.13    12.0   8.15     8.0   5.56
 7.0   4.82     7.0   7.26     7.0   6.42     8.0   7.91
 5.0   5.68     5.0   4.74     5.0   5.73     8.0   6.89

Property                                                 Value
Mean of x                                                9
Variance of x                                            11
Mean of y                                                7.50
Variance of y                                            4.125
Correlation between x and y                              0.816
Linear regression                                        y = 3.00 + 0.500x
Coefficient of determination of the linear regression    0.67
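
The summary statistics of the quartet can be recomputed with a few lines of Python. This is a minimal sketch using only the standard library; the names X_SHARED, QUARTET, and summarize are illustrative and not part of the slides, and "variance" is assumed to mean the sample variance (dividing by n - 1).

```python
from statistics import mean, variance

# Anscombe's quartet. Datasets I, II, and III share the same x values.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the summary statistics listed in the table for one dataset."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)            # sum of squares of x
    syy = sum((y - my) ** 2 for y in ys)            # sum of squares of y
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx                               # least-squares slope
    r = sxy / (sxx * syy) ** 0.5                    # Pearson correlation
    return {
        "mean_x": mx,
        "var_x": variance(xs),                      # sample variance (n - 1)
        "mean_y": my,
        "var_y": variance(ys),
        "r": r,
        "slope": slope,
        "intercept": my - slope * mx,
        "r2": r * r,                                # coefficient of determination
    }


for name, (xs, ys) in QUARTET.items():
    s = summarize(xs, ys)
    print(f"{name:>3}: mean_y={s['mean_y']:.2f} var_y={s['var_y']:.3f} "
          f"r={s['r']:.3f} fit: y = {s['intercept']:.2f} + {s['slope']:.3f}x")
```

The four datasets agree on every statistic to roughly two decimal places, which is why a summary alone cannot tell them apart and a plot is needed.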

Anscombe's visualized

More examples with the same stats

https://dabblingwithdata.wordpress.com/2017/05/03/the-datasaurus-a-monstrous-anscombe-for-the-21st-century/

Datasaurus!

Idioms

Distinct approach to creating or manipulating visual representations

Exercise

Let's find out how many ways we can visualize two numbers: 13 and 23
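
As a starting point for the exercise, here is a small Python sketch of two of the many possible encodings, rendered as plain text: length (a bar) and position along a common axis. The function names as_bar and as_position and the axis maximum of 25 are assumptions made for illustration only.

```python
def as_bar(value: int, mark: str = "#") -> str:
    """Length encoding: one mark per unit, so a longer bar means a larger value."""
    return mark * value


def as_position(value: int, axis_max: int = 25) -> str:
    """Position encoding: a single mark placed along a shared axis from 0 to axis_max."""
    return "".join("o" if i == value else "-" for i in range(axis_max + 1))


for v in (13, 23):
    print(f"{v:>2} {as_bar(v)}")

for v in (13, 23):
    print(f"{v:>2} {as_position(v)}")
```

Other encodings worth trying include area, color, angle, and simply printing the digits.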

Idiom design space

The design space of possible vis idioms is huge, and includes the considerations of both how to create and how to interact with visual representations

Idioms

  • How to draw it: visual encoding idiom
    • Many possibilities for how to create
  • How to manipulate it: interaction idiom
    • Even more possibilities
    • Make single idiom dynamic
    • Link multiple idioms together through interaction

Why focus on tasks and effectiveness?

  • Tasks serve as constraint on design (as does data)
    • Idioms do not serve all tasks equally!
    • Challenge: recast tasks from domain-specific vocabulary to abstract forms
  • Most possibilities ineffective
    • Validation is necessary, but tricky
    • Increases chance of finding good solutions if you understand full space of possibilities

What counts as effective?

  • Novel: enable entirely new kinds of analysis
  • Faster: speed up existing workflows

Resource limitations

Vis designers must take into account three very different kinds of resource limitations: those of computers, of humans, and of displays.

Computational limits

  • Processing time
  • System memory

Human limits

  • Human attention
  • Memory
  • Retention

Display limits

  • Pixels are a precious resource, the most constrained one
  • Information density: ratio of space used to encode info vs unused whitespace
    • Tradeoff between clutter and wasted space: find the sweet spot between dense and sparse
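
One possible way to put a number on information density is sketched below in Python. It treats a display as a grid of character cells and reports the fraction of cells that carry a mark. Measuring used cells against total cells (rather than against unused cells) is an assumption, as are the function name information_density and the toy canvases.

```python
def information_density(canvas: list[str], blank: str = " ") -> float:
    """Fraction of cells in the canvas that hold a mark (anything but `blank`)."""
    total = sum(len(row) for row in canvas)
    if total == 0:
        return 0.0  # an empty canvas encodes nothing
    used = sum(1 for row in canvas for cell in row if cell != blank)
    return used / total


# Two toy 3x4 canvases: one nearly empty, one nearly full.
sparse = ["    ",
          " #  ",
          "    "]
dense = ["####",
         "## #",
         "####"]

print(f"sparse: {information_density(sparse):.2f}")
print(f"dense:  {information_density(dense):.2f}")
```

A value near 0 suggests wasted space and a value near 1 suggests clutter; where the sweet spot lies depends on the data and the task.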

Visual Analytics

How to do data analytics?

  • Statistical Analysis
  • Machine Learning and Artificial Intelligence
  • Visual Analytics (and data analytics)

Data Mining/Machine Learning

Information Visualization

Visual Analytics

Traditional

  • Query for known patterns
  • Display results using traditional techniques

Pros:
  • Many solutions
  • Easier to implement

Cons:
  • Can’t search for the unexpected

Data Mining/ML

  • Based on statistics
  • Black box approach
  • Output outliers and correlations
  • Human out of the loop

Pros:
  • Scalable

Cons:
  • Analysts have to make sense of the results
  • Makes assumptions on the data

InfoVis

  • Visual Interactive Interfaces
  • Human in the loop

Pros:
  • Visual bandwidth is enormous
  • Experts decide what to search for
  • Identify unknown patterns and errors in the data

Cons:
  • Scalability can be an issue

In InfoVis we look for insights

  • Deep understanding
  • Meaningful
  • Non-obvious
  • Actionable
  • Based on data

An insight is:

  • something that the user can learn from the data using the dataviz
  • which she didn't know/expect,
  • also, is useful/needed for her,
  • moreover, she didn't know of it,
  • and that she can leverage

Insights

FDA

Task: Change in drug's adverse effects reports

User: FDA Analysts

State of the art

https://treeversity.cattlab.umd.edu/

Health insurance claims

Task: Detect fraud networks

User: Undisclosed Analysts

Clustering

Force in a box

Overview

Ego distance

Tweetometro

Task: Twitter behavior during Presidential Elections

User: Me

http://tweetometro.co

Normal tweets

Weird tweets?

Creation dates

Number of followers

What car to buy?

Task: What's the best car to buy?

User: Me

Normal procedure

Ask friends and family

Problem

That's inferring statistics from a sample of n=1

Better approach

Data-based decisions

http://tucarro.com

Take home message

  • Visualization: Computer + Visuals + Data + Human + Tasks