What: abstracting the data
Information Visualization

What we are going to learn

  • Abstracting the data
  • Types of datasets
  • Types of attributes
  • Sequential, diverging, cyclical
  • Data aggregation (temporal and geo)
  • Deriving new data

Abstracting the data

Why abstract the data?

  • Different attribute types 👉 different representations
  • Different dataset types 👉 different idioms available

What do you need to abstract?

  • Dataset type: (e.g., table, network, temporal ...)
  • Attribute types: (e.g., categorical, ordinal, quantitative)
  • Ordering direction: (e.g., sequential, diverging, cyclical)
  • Data availability: (e.g. dynamic, static)

Types of Datasets

Dataset types: tables, networks/trees, fields, geometry
+ Temporal!

Tables

  • The vast majority of datasets
  • Attributes (columns) + items (rows)
  • Other dataset types are usually stored as a combination of tables
price  brand      model    title_status       mileage  color   lot
21500  ford       f-150    clean vehicle      76876    gray    167783132
7500   nissan     door     clean vehicle      58126    black   167598085
10700  chevrolet  trax     clean vehicle      44280    red     167792666
23100  dodge      charger  clean vehicle      12265    silver  167735423
33800  dodge      durango  clean vehicle      15003    black   167734879
4140   ford       door     clean vehicle      40747    white   167656519
13900  nissan     rogue    clean vehicle      38760    white   167762102
15700  ford       cutaway  clean vehicle      75862    white   167780452
1150   ford       door     salvage insurance  123349   red     167652717
26100  ford       f-150    clean vehicle      32149    white   167741409
14000  ford       fusion   clean vehicle      50513    white   167749355
22500  ford       door     clean vehicle      43646    black   167780692

File formats

Comma-Separated Values (CSV)

price,brand,model,title_status,mileage,color,lot
21500,ford,f-150,clean vehicle,76876,gray,167783132
7500,nissan,door,clean vehicle,58126,black,167598085
10700,chevrolet,trax,clean vehicle,44280,red,167792666
23100,dodge,charger,clean vehicle,12265,silver,167735423
33800,dodge,durango,clean vehicle,15003,black,167734879
4140,ford,door,clean vehicle,40747,white,167656519
13900,nissan,rogue,clean vehicle,38760,white,167762102
15700,ford,cutaway,clean vehicle,75862,white,167780452
1150,ford,door,salvage insurance,123349,red,167652717
26100,ford,f-150,clean vehicle,32149,white,167741409
14000,ford,fusion,clean vehicle,50513,white,167749355
22500,ford,door,clean vehicle,43646,black,167780692

JSON

[
  {
    "price": 21500,
    "brand": "ford",
    "model": "f-150",
    "title_status": "clean vehicle",
    "mileage": 76876,
    "color": "gray",
    "lot": 167783132
  },
  {
    "price": 7500,
    "brand": "nissan",
    "model": "door",
    "title_status": "clean vehicle",
    "mileage": 58126,
    "color": "black",
    "lot": 167598085
  },
  {
    "price": 10700,
    "brand": "chevrolet",
    "model": "trax",
    "title_status": "clean vehicle",
    "mileage": 44280,
    "color": "red",
    "lot": 167792666
  },
  {
    "price": 23100,
    "brand": "dodge",
    "model": "charger",
    "title_status": "clean vehicle",
    "mileage": 12265,
    "color": "silver",
    "lot": 167735423
  },
  {
    "price": 33800,
    "brand": "dodge",
    "model": "durango",
    "title_status": "clean vehicle",
    "mileage": 15003,
    "color": "black",
    "lot": 167734879
  },
  {
    "price": 4140,
    "brand": "ford",
    "model": "door",
    "title_status": "clean vehicle",
    "mileage": 40747,
    "color": "white",
    "lot": 167656519
  },
  {
    "price": 13900,
    "brand": "nissan",
    "model": "rogue",
    "title_status": "clean vehicle",
    "mileage": 38760,
    "color": "white",
    "lot": 167762102
  },
  {
    "price": 15700,
    "brand": "ford",
    "model": "cutaway",
    "title_status": "clean vehicle",
    "mileage": 75862,
    "color": "white",
    "lot": 167780452
  },
  {
    "price": 1150,
    "brand": "ford",
    "model": "door",
    "title_status": "salvage insurance",
    "mileage": 123349,
    "color": "red",
    "lot": 167652717
  },
  {
    "price": 26100,
    "brand": "ford",
    "model": "f-150",
    "title_status": "clean vehicle",
    "mileage": 32149,
    "color": "white",
    "lot": 167741409
  },
  {
    "price": 14000,
    "brand": "ford",
    "model": "fusion",
    "title_status": "clean vehicle",
    "mileage": 50513,
    "color": "white",
    "lot": 167749355
  },
  {
    "price": 22500,
    "brand": "ford",
    "model": "door",
    "title_status": "clean vehicle",
    "mileage": 43646,
    "color": "black",
    "lot": 167780692
  }
]
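
The CSV and JSON above carry the same table: each CSV row becomes one object, keyed by the header line. A minimal sketch of that conversion, assuming simple CSV with no quoted fields or embedded commas (a real parser such as d3.csvParse handles those cases). `parseSimpleCsv` is a hypothetical helper written for this example.

```javascript
// Header plus the first two rows of the CSV example above.
const csvText = `price,brand,model,title_status,mileage,color,lot
21500,ford,f-150,clean vehicle,76876,gray,167783132
7500,nissan,door,clean vehicle,58126,black,167598085`;

// Hypothetical helper: handles only simple CSV (no quotes, no embedded commas).
function parseSimpleCsv(text) {
  const [headerLine, ...rowLines] = text.trim().split("\n");
  const columns = headerLine.split(",");
  return rowLines.map((line) => {
    const cells = line.split(",");
    const item = {};
    columns.forEach((column, i) => {
      const cell = cells[i];
      // CSV stores everything as text; convert numeric-looking cells to numbers.
      item[column] = cell !== "" && !Number.isNaN(Number(cell)) ? Number(cell) : cell;
    });
    return item;
  });
}

const cars = parseSimpleCsv(csvText);
console.log(JSON.stringify(cars, null, 2));
```

Note that the CSV file itself has no types: deciding that price is a number and model is text is part of abstracting the data.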

Temporal

  • Tabular + time-related attribute
  • Aggregations
  • Cyclical
  • Events or intervals
  • Seasonality patterns

Networks

  • Nodes + links
  • Attributes on both
  • Network idioms
  • Network analytics

How are networks stored?

  • Two files
  • JSON file
  • Adjacency matrix
  • Generated from any data
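
Two of the storage options above can be sketched together: a JSON-style object with nodes and links, and an adjacency matrix derived from it. The network, the attribute names (`group`, `weight`) and the helper `toAdjacencyMatrix` are all made up for illustration.

```javascript
// Hypothetical network: three nodes and two directed, weighted links.
const network = {
  nodes: [
    { id: "ana", group: "staff" },
    { id: "ben", group: "student" },
    { id: "cho", group: "student" },
  ],
  links: [
    { source: "ana", target: "ben", weight: 2 },
    { source: "ben", target: "cho", weight: 1 },
  ],
};

// Build an adjacency matrix: cell [i][j] holds the weight of the link i -> j.
function toAdjacencyMatrix({ nodes, links }) {
  const index = new Map(nodes.map((node, i) => [node.id, i]));
  const matrix = nodes.map(() => nodes.map(() => 0));
  for (const { source, target, weight } of links) {
    matrix[index.get(source)][index.get(target)] = weight;
  }
  return matrix;
}

const matrix = toAdjacencyMatrix(network);
console.log(matrix); // [ [ 0, 2, 0 ], [ 0, 0, 1 ], [ 0, 0, 0 ] ]
```

The "two files" option is the same idea with the nodes array and the links array saved as separate tables.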

Trees

How are trees stored?

  • Table with path attribute
  • JSON file
  • Generated from any data
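
A sketch of the first option, turning a table with a path attribute into a nested structure. The rows, the "/" separator and the helper `buildTree` are assumptions for this example; the nested { name, children } shape is one common convention, not the only one.

```javascript
// Hypothetical table: each row stores its position in the tree as a path.
const rows = [
  { path: "vehicles/ford/f-150", count: 2 },
  { path: "vehicles/ford/fusion", count: 1 },
  { path: "vehicles/nissan/rogue", count: 1 },
];

// Turn the path table into a nested { name, children } structure.
function buildTree(table, separator = "/") {
  const root = { name: "root", children: [] };
  for (const row of table) {
    let current = root;
    for (const name of row.path.split(separator)) {
      let child = current.children.find((c) => c.name === name);
      if (!child) {
        child = { name, children: [] };
        current.children.push(child);
      }
      current = child;
    }
    current.count = row.count; // attach the row's attribute to the leaf
  }
  return root;
}

const tree = buildTree(rows);
console.log(JSON.stringify(tree, null, 2));
```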

Geometry

  • Location data (usually latitude/longitude)
  • 2D or 3D
  • Inherent shape
  • Points, routes, shapes

Fields

  • Continuous
  • Sampled
  • Example: brain scan

Data availability

  • Static 👉 data don't change
  • Dynamic 👉 data are constantly updating (e.g., stock prices)
  • For dynamic data, you will need a data endpoint

Types of Attributes

Data attribute types: categorical, ordinal and quantitative

Categorical

  • No order
  • E.g., names, countries, types
  • Must be represented with visual channels that don't convey order

Ordinal

  • Has implicit order
  • But, you can't do arithmetic
  • Can even be numerical
  • E.g., t-shirt sizes, school years, rankings
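
Because the order of an ordinal attribute is implicit, code usually has to be told what it is. A small sketch, assuming the size order S < M < L < XL; the data are made up.

```javascript
// Assumed ordering for the example. It must be supplied explicitly,
// because alphabetical sorting would give L, M, S, XL.
const sizeOrder = ["S", "M", "L", "XL"];
const shirts = ["XL", "S", "L", "M", "S"];

// Rank = position in the agreed order; only comparisons are meaningful,
// not arithmetic (M is not "S plus one").
const rank = (size) => sizeOrder.indexOf(size);
const sorted = [...shirts].sort((a, b) => rank(a) - rank(b));

console.log(sorted); // [ 'S', 'S', 'M', 'L', 'XL' ]
```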

Quantitative

  • Also ordered
  • You can perform arithmetic
  • Can be diverging or sequential
  • E.g., age, temperature, earnings

Ordering direction

Data ordering types: sequential, diverging and cyclical

Sequential

  • There is a full range with a clear minimum
  • You can perform arithmetic
  • E.g., age, goals ⚽️, price 💰

Diverging

  • There is a middle point
  • And two opposite directions
  • Sometimes the middle point is not zero
  • E.g., temperature, earnings, political affiliation index
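
One way to make the middle point explicit is to rescale each value to the range [-1, 1] around it, so the two directions can be encoded separately (for example, with two hues). This is a sketch, not a standard function; `divergingPosition` and the chosen min, mid and max are assumptions.

```javascript
// Map a value to [-1, 1] around a chosen midpoint.
// min, mid and max are picked by the designer for the data at hand.
function divergingPosition(value, min, mid, max) {
  if (value >= mid) return (value - mid) / (max - mid);
  return (value - mid) / (mid - min);
}

// Temperatures in Fahrenheit, with freezing (32) as the midpoint instead of zero.
console.log(divergingPosition(32, -8, 32, 112)); // 0
console.log(divergingPosition(72, -8, 32, 112)); // 0.5
console.log(divergingPosition(12, -8, 32, 112)); // -0.5
```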

Cyclical

  • There is a cycle in the data
  • No starting point (?)
  • Can be represented with cyclical channels
  • E.g., Days of the week, hours in the day
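
A consequence of the cycle is that ordinary subtraction gives misleading distances: 11 PM and 1 AM are two hours apart, not 22. A minimal sketch with a hypothetical helper, `cyclicDistance`:

```javascript
// Shortest distance between two positions on a cycle of a given length.
function cyclicDistance(a, b, period) {
  const diff = Math.abs(a - b) % period;
  return Math.min(diff, period - diff);
}

console.log(cyclicDistance(23, 1, 24)); // 2  (hours in the day)
console.log(cyclicDistance(6, 0, 7));   // 1  (days of the week)
```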

Data Aggregation

Aggregating Spatial data

E.g., count votes by:
  • County
  • City
  • State
  • Country
  • ZIP code? Can span multiple counties 🤷
  • Custom areas
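
Spatial aggregation is a count (or sum) grouped by whichever area attribute is chosen. A sketch assuming each record already carries its county and state; the records and the helper `countBy` are invented for this example.

```javascript
// Hypothetical vote records, each tagged with the areas it belongs to.
const votes = [
  { county: "Alameda", state: "CA" },
  { county: "Alameda", state: "CA" },
  { county: "Marin", state: "CA" },
  { county: "King", state: "WA" },
];

// Count items per value of the chosen key.
function countBy(items, key) {
  const counts = new Map();
  for (const item of items) {
    counts.set(item[key], (counts.get(item[key]) ?? 0) + 1);
  }
  return counts;
}

const byCounty = countBy(votes, "county");
const byState = countBy(votes, "state");
console.log(Object.fromEntries(byCounty)); // { Alameda: 2, Marin: 1, King: 1 }
console.log(Object.fromEntries(byState));  // { CA: 3, WA: 1 }
```

The same records give different pictures at different levels, which is why the choice of area is a design decision.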

Aggregating Temporal data

By truncating the date
  • 2020 June 27 5:25PM
  • 2020 June 27 5:00PM
  • 2020 June 27
  • 2020 June
  • 2020 Q2
  • 2020
Take a look at the JavaScript Date object
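
A sketch of truncation with the built-in JavaScript Date, using the UTC methods so the result does not depend on the machine's time zone. The helper `truncate` and its unit names are assumptions for this example.

```javascript
// 2020 June 27 5:25PM, built in UTC. Months are 0-based: 5 = June.
const timestamp = new Date(Date.UTC(2020, 5, 27, 17, 25));

// Truncate a date to a coarser unit by resetting the finer parts.
function truncate(date, unit) {
  const y = date.getUTCFullYear();
  const m = date.getUTCMonth();
  const d = date.getUTCDate();
  const h = date.getUTCHours();
  switch (unit) {
    case "hour":    return new Date(Date.UTC(y, m, d, h));
    case "day":     return new Date(Date.UTC(y, m, d));
    case "month":   return new Date(Date.UTC(y, m, 1));
    case "quarter": return new Date(Date.UTC(y, Math.floor(m / 3) * 3, 1));
    case "year":    return new Date(Date.UTC(y, 0, 1));
    default: throw new Error(`Unknown unit: ${unit}`);
  }
}

for (const unit of ["hour", "day", "month", "quarter", "year"]) {
  console.log(unit, truncate(timestamp, unit).toISOString());
}
```

Every timestamp in the same hour, day, month, quarter or year truncates to the same value, so the truncated date can be used directly as a grouping key.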

Aggregating Temporal data

By extracting parts of the date
  • 2020 June 27 5:25PM
  • Week 26 of the year
  • 5:00 - 6:00 PM
  • Saturday
  • June
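
Extracting parts keeps one component of the date and drops the rest, which is what exposes cyclical patterns (all Saturdays together, all 5 PM hours together). A sketch with the built-in Date; the week number follows the ISO 8601 convention, which is an assumption since the slide does not say which week numbering it uses. `isoWeek` is a hypothetical helper.

```javascript
// 2020 June 27 5:25PM, built in UTC. Months are 0-based: 5 = June.
const sample = new Date(Date.UTC(2020, 5, 27, 17, 25));

const dayNames = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"];
const monthNames = ["January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December"];

// ISO 8601 week number: weeks start on Monday, week 1 contains the first Thursday.
function isoWeek(date) {
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const weekday = d.getUTCDay() || 7;          // Monday = 1 ... Sunday = 7
  d.setUTCDate(d.getUTCDate() + 4 - weekday);  // move to the Thursday of this week
  const yearStart = Date.UTC(d.getUTCFullYear(), 0, 1);
  return Math.ceil(((d - yearStart) / 86400000 + 1) / 7);
}

const parts = {
  week: isoWeek(sample),                  // 26
  hour: sample.getUTCHours(),             // 17, i.e. the 5:00 - 6:00 PM bin
  weekday: dayNames[sample.getUTCDay()],  // "Saturday"
  month: monthNames[sample.getUTCMonth()] // "June"
};
console.log(parts);
```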

What we learned

  • Abstracting the data
  • Types of datasets
  • Types of attributes
  • Sequential, diverging, cyclical
  • Data aggregation (temporal and geo)

Practice

Introduction to JS

https://observablehq.com/@berkeleyvis/learn-js-data

JSON vs. CSV

https://observablehq.com/@berkeleyvis/reading-in-data-learn-js-data

Array functions (map, filter, sort)

https://observablehq.com/@berkeleyvis/iterating-over-and-reducing-data-learn-js-data
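
A short sketch of the three array functions on a few rows of the car table from earlier in the deck (only some attributes are kept). The derived attribute `pricePerMile` is made up to illustrate deriving new data with map.

```javascript
const sampleCars = [
  { price: 21500, brand: "ford", model: "f-150", mileage: 76876 },
  { price: 7500, brand: "nissan", model: "door", mileage: 58126 },
  { price: 10700, brand: "chevrolet", model: "trax", mileage: 44280 },
  { price: 23100, brand: "dodge", model: "charger", mileage: 12265 },
];

// filter: keep only the items that pass a test.
const affordable = sampleCars.filter((car) => car.price < 20000);

// map: derive a new value from every item (here, a new attribute).
const withPricePerMile = sampleCars.map((car) => ({
  ...car,
  pricePerMile: car.price / car.mileage,
}));

// sort: order items; copy first because sort() mutates the array in place.
const byPrice = [...sampleCars].sort((a, b) => a.price - b.price);

console.log(affordable.map((car) => car.model)); // [ 'door', 'trax' ]
console.log(byPrice.map((car) => car.price));    // [ 7500, 10700, 21500, 23100 ]
console.log(withPricePerMile[0]);
```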

Nesting, Folding, Pivoting, etc. in JS

https://observablehq.com/@berkeleyvis/grouping-data-learn-js-data
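
Nesting (grouping) followed by a summary per group can be sketched in plain JavaScript as below. The helper `groupBy` is written here for illustration; libraries such as d3-array provide ready-made grouping functions. The rows are taken from the car table earlier in the deck.

```javascript
const listings = [
  { brand: "ford", price: 21500 },
  { brand: "nissan", price: 7500 },
  { brand: "dodge", price: 23100 },
  { brand: "dodge", price: 33800 },
  { brand: "ford", price: 4140 },
];

// Nest: collect the items that share a key value.
function groupBy(items, keyFn) {
  const groups = new Map();
  for (const item of items) {
    const key = keyFn(item);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(item);
  }
  return groups;
}

const byBrand = groupBy(listings, (d) => d.brand);

// Summarise each group into one row (count and mean price per brand).
const summary = [...byBrand].map(([brand, items]) => ({
  brand,
  count: items.length,
  meanPrice: items.reduce((sum, d) => sum + d.price, 0) / items.length,
}));

console.log(summary);
```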