Reducing Attributes and Rows
Information Visualization

What We Are Going to Learn

  • Reduce
    • Items
    • Attributes
  • Aggregation
    • Item
    • Spatial
    • Time
  • Dimensionality Reduction
  • Embed, Focus and Context
  • Exploratory Data Analysis

Reduce Items and Attributes

Reduce Items and Attributes

  • Reduce/increase: inverses
  • Filter
    • Pro: straightforward and intuitive
      • To understand and compute
    • Con: out of sight, out of mind
  • Aggregation
    • Pro: inform about whole set
    • Con: difficult to avoid losing signal
  • Not mutually exclusive
    • Combine filter, aggregate
    • Combine reduce, change, facet
Reduce branches into filter and aggregate, each applied to items or to attributes

Item Filtering

Crossfiltering

  • Item filtering
  • Coordinated views/controls combined
  • All scented histogram bisliders update when any ranges change
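A minimal sketch of the crossfiltering logic. It assumes rows are plain dicts and each brushed range is a half-open interval [lo, hi). It follows the convention of the crossfilter.js library: each histogram counts the rows that pass every filter except the one on its own attribute, so a brushed histogram still shows the bars outside its own brush. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, float]
Ranges = Dict[str, Tuple[float, float]]  # attribute -> brushed interval [lo, hi)


def passes(row: Row, ranges: Ranges, skip: Optional[str] = None) -> bool:
    """True if the row lies inside every brushed range, ignoring the `skip` attribute."""
    return all(lo <= row[attr] < hi for attr, (lo, hi) in ranges.items() if attr != skip)


def crossfilter_counts(rows: List[Row], ranges: Ranges, attr: str, edges: List[float]) -> List[int]:
    """Histogram of `attr` over the rows that pass all filters on the *other* attributes."""
    counts = [0] * (len(edges) - 1)
    for row in rows:
        if not passes(row, ranges, skip=attr):
            continue
        for i in range(len(edges) - 1):
            if edges[i] <= row[attr] < edges[i + 1]:
                counts[i] += 1
                break
    return counts


rows = [
    {"price": 10, "rating": 4},
    {"price": 25, "rating": 2},
    {"price": 40, "rating": 5},
    {"price": 15, "rating": 5},
]
brush = {"price": (0, 30)}  # user drags a range on the price histogram
print(crossfilter_counts(rows, brush, "rating", [0, 3, 6]))  # [1, 2]
print(crossfilter_counts(rows, brush, "price", [0, 20, 50]))  # [2, 2]
```

Whenever any brush moves, every histogram is recomputed this way, and that recomputation is what keeps the coordinated views in sync. Libraries such as crossfilter.js avoid this linear scan by keeping sorted indexes per dimension.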

Faceted Search

Idiom: Scented Widgets

Scented Widgets Paper

http://vis.berkeley.edu/papers/scented_widgets/

Navio

Navio Demo
https://navio.dev

Attribute Filtering

DOSFA Paper

http://www.cs.ubc.ca/~tmm/courses/cpsc533c-04-spr/readings/dimorder.pdf

Navio Load Notebook

UMAP Playground

Dimensionality Reduction

Aggregation: Hierarchical Cluster Explorer

Item Aggregation

Idiom: Histogram

  • Static item aggregation
  • Task: find distribution
  • Data: table
  • Derived data
    • New table: keys are bins, values are counts
  • Bin size crucial
  • Pattern can change dramatically depending on discretization
  • Opportunity for interaction: control bin size on the fly
Histogram
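To see how much bin size matters, the sketch below derives the histogram's new table (bin index → count) for the same made-up values at three bin widths. It assumes non-negative values and half-open bins that start at 0.

```python
import math
from typing import List


def histogram(values: List[float], bin_width: float, start: float = 0.0) -> List[int]:
    """Derived table as a list: position i holds the count for bin [start + i*w, start + (i+1)*w)."""
    n_bins = math.floor((max(values) - start) / bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[math.floor((v - start) / bin_width)] += 1
    return counts


values = [1, 2, 2, 3, 7, 8, 8, 9]  # toy data with two clumps
for width in (10, 5, 2):
    print(width, histogram(values, width))
# 10 [8]
# 5 [4, 4]
# 2 [1, 3, 0, 1, 3]
```

At width 10 a single bar hides everything, and at width 5 the distribution looks flat. Only at width 2 does the gap between the two clumps appear. Binding `bin_width` to a slider is one way to offer bin-size control on the fly.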

Idiom: Boxplot

  • Static item aggregation
  • Task: find distribution
  • Data: table
  • Derived data
    • Five quantitative attributes
      • Median: central line
      • Lower and upper quartile: boxes
      • Lower and upper fences: whiskers
        • Values beyond which items are outliers
    • Outliers beyond fence cutoffs explicitly shown
Boxplot
[40 years of boxplots. Wickham and Stryjewski. 2012. had.co.nz]
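The derived attributes can be computed as shown below. This sketch assumes Tukey's common convention: fences sit 1.5 × IQR beyond the quartiles, and whiskers are drawn to the most extreme data points that still fall inside the fences. Other quartile and fence conventions exist and shift the numbers slightly.

```python
import statistics
from typing import Dict, List


def boxplot_stats(values: List[float], k: float = 1.5) -> Dict[str, object]:
    """Median, quartiles, fences, whisker ends, and outliers for one boxplot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "lo_fence": lo_fence,
        "hi_fence": hi_fence,
        "whisker_lo": min(inside),  # most extreme points still inside the fences
        "whisker_hi": max(inside),
        "outliers": sorted(v for v in values if v < lo_fence or v > hi_fence),
    }


stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats["median"], stats["q1"], stats["q3"], stats["outliers"])  # 5.5 3.25 7.75 [30]
```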

Box Plot

http://blockbuilder.org/mbostock/4061502 by mbostock

Violin Plot

http://blockbuilder.org/asielen/92929960988a8935d907e39e60ea8417 by asielen

Idiom: 2D Density Plots

  • Scatterplot meets heatmap
    • Derived data:
      • Tessellate space into areas
      • Count number of elements falling in each area
    • Mark: dots (boxes)
    • Channels:
      • Position: location of areas
      • Color (brightness): number of elements
      • Marks (re-)ordered by cluster hierarchy traversal
    • Tasks: summarize distribution
    • Scalability:
      • Millions of rows (might require preprocessing)
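A minimal sketch of the derived data. It assumes square cells (hexagonal bins are another common tessellation) and a linear mapping from count to brightness. Only occupied cells are stored, so the output size depends on the number of non-empty cells rather than on the number of rows.

```python
from collections import Counter
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def density_grid(points: List[Tuple[float, float]], cell: float) -> Dict[Cell, int]:
    """Tessellate the plane into square cells of side `cell`; count points per cell."""
    return dict(Counter((int(x // cell), int(y // cell)) for x, y in points))


def brightness(counts: Dict[Cell, int]) -> Dict[Cell, float]:
    """Map each count to [0, 1] so the densest cell gets full brightness."""
    peak = max(counts.values())
    return {c: n / peak for c, n in counts.items()}


pts = [(0.1, 0.2), (0.4, 0.9), (0.5, 0.5), (1.2, 0.3), (1.9, 1.9)]
grid = density_grid(pts, cell=1.0)
print(grid)  # {(0, 0): 3, (1, 0): 1, (1, 1): 1}
print(brightness(grid))
```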

Interactive Density Plot

Idiom: Hierarchical Parallel Coordinates

  • Dynamic item aggregation
  • Derived data: hierarchical clustering
  • Encoding:
    • Cluster band with variable transparency, line at mean, width by min/max values
    • Color by proximity in hierarchy
[Hierarchical Parallel Coordinates for Exploration of Large Datasets. Fua, Ward, and Rundensteiner. Proc. IEEE Visualization Conference (Vis ’99), pp. 43–50, 1999.]
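The band encoding needs the minimum, mean, and maximum of each cluster on each axis. This sketch assumes the cluster labels for one level of the hierarchy are already known. The paper derives them by hierarchical clustering, and that step is omitted here. Rows are assumed to be numeric lists in axis order.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Band = Tuple[float, float, float]  # (min, mean, max) along one axis


def cluster_bands(rows: Sequence[Sequence[float]], labels: Sequence[int]) -> Dict[int, List[Band]]:
    """For each cluster, one (min, mean, max) band per parallel-coordinates axis."""
    groups: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    bands: Dict[int, List[Band]] = {}
    for label, members in groups.items():
        bands[label] = [
            (min(col), sum(col) / len(col), max(col))
            for col in zip(*members)  # walk the members attribute by attribute
        ]
    return bands


rows = [[1, 10], [3, 14], [8, 2], [10, 4]]
labels = [0, 0, 1, 1]  # hypothetical cut through the cluster hierarchy
print(cluster_bands(rows, labels))
# {0: [(1, 2.0, 3), (10, 12.0, 14)], 1: [(8, 9.0, 10), (2, 3.0, 4)]}
```

Each band is drawn from min to max with a line at the mean. Moving the cut up or down the hierarchy changes `labels`, which is where the dynamic aggregation comes from.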

Spatial Aggregation

Geo Level

  • Country
  • State
  • City
  • Neighborhood

Aggregation Problems

  • MAUP: Modifiable Areal Unit Problem
  • Gerrymandering (manipulating voting district boundaries) is only one example!
  • Zone effects
  • Scale effects
Gerrymandering
[http://www.e-education.psu.edu/geog486/l4_p7.html, Fig 4.cg.6]
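The zone effect can be reproduced with a toy, made-up example. Nine equal-population blocks are each won by party A or B and grouped into three districts of three blocks. Only the district boundaries change between the two runs.

```python
from collections import Counter
from typing import List, Sequence


def seats(blocks: Sequence[str], districts: List[List[int]]) -> Counter:
    """Majority winner of each district, tallied by party (odd-sized districts avoid ties)."""
    tally: Counter = Counter()
    for district in districts:
        votes = Counter(blocks[i] for i in district)
        tally[votes.most_common(1)[0][0]] += 1
    return tally


blocks = ["A", "A", "A", "B", "B", "A", "B", "B", "A"]  # A holds 5 of 9 blocks
contiguous = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
redrawn = [[0, 3, 8], [1, 4, 5], [2, 6, 7]]
print(seats(blocks, contiguous))  # Counter({'B': 2, 'A': 1})
print(seats(blocks, redrawn))     # Counter({'A': 2, 'B': 1})
```

The index lists ignore geography. A real redistricting would also have to keep each district spatially contiguous.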

Overlapping

  • ZIP codes
  • Disputed borders

Regions

  • Aggregate by commonalities
    • e.g. Agricultural vs. industrial regions
    • e.g. Historically right- vs. left-wing
  • Aggregate by the data attributes

Geo patterns vs. political patterns

Time Aggregation

Date Part vs. Truncate

  • Date part: extract a part of the date
  • Date truncate: cut the date at a certain level
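The difference can be sketched with Python's standard `datetime` module. The helper names echo PostgreSQL's `date_trunc` and `date_part` functions but are hypothetical here, and the timestamps are made up. Truncation keeps events on the timeline, while extracting a part folds different months onto the same cyclic value.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def date_trunc_month(ts: datetime) -> datetime:
    """Truncate: keep year and month, reset every finer field."""
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def date_part_weekday(ts: datetime) -> str:
    """Part: extract only the day of week, discarding year, month, and day."""
    return WEEKDAYS[ts.weekday()]


events = [datetime(2020, 7, 9, 14, 30), datetime(2020, 7, 16, 9, 0), datetime(2020, 8, 6, 22, 15)]
print(Counter(date_trunc_month(e) for e in events))   # two buckets on the timeline: July and August 2020
print(Counter(date_part_weekday(e) for e in events))  # Counter({'Thursday': 3})
```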

Date Truncate

  • Different levels can hide seasonality.
  • Sometimes, too much detail is unnecessary.

Truncate dates

Date Part

  • Useful for highlighting human patterns
    • Weekends
    • Night time
    • Holidays
    • Summer vs. winter

Aggregate by date parts

Window Average/Median

COVID-19 moving average by state
[How Coronavirus Cases Have Risen Since States Reopened. The New York Times, July 9, 2020.]
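A trailing-window average or median can be sketched as follows. The window length of 3 keeps the toy example short. For daily counts, a 7-day window is a common choice because it spans one full weekly reporting cycle. The data, including the one-day spike, is made up.

```python
import statistics
from typing import Callable, List, Optional, Sequence


def rolling(values: Sequence[float], window: int,
            stat: Callable[[Sequence[float]], float] = statistics.mean) -> List[Optional[float]]:
    """Trailing-window statistic; None until a full window of values is available."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(stat(values[i + 1 - window:i + 1]))
    return out


daily = [10, 12, 11, 50, 13, 12, 14]  # made-up counts with one reporting spike
print(rolling(daily, 3, statistics.median))  # [None, None, 11, 12, 13, 13, 13]
print(rolling(daily, 3))                     # the mean is pulled up by the spike for three days
```

The median ignores the single spike, while the mean smears it across every window that contains it. That trade-off is the practical difference between the two window statistics.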

Dimensionality Reduction

Dimensionality Reduction

  • Attribute aggregation
    • Derive low-dimensional target space from high-dimensional measured space
      • Capture most of variance with minimal error
    • Use when you can’t directly measure what you care about
    • True dimensionality of dataset conjectured to be smaller than dimensionality of measurements
    • Latent factors, hidden variables
Taking tumor measurement data in 9D measured space and running dimensionality reduction derives that data in a 2D target space where it is easier to see groupings of benign and malignant tumors

Dimensionality Reduction for Documents

Dimensionality vs. Attribute Reduction

  • Vocab use in field not consistent
    • Dimension/attribute
  • Attribute reduction: reduce set with filtering
    • Includes orthographic projection
  • Dimensionality reduction (DR): create smaller set of new dimensions/attributes
    • Typically implies dimensional aggregation, not just filtering
    • Vocabulary: projection/mapping

Estimating True Dimensionality

  • How do you know when you would benefit from DR?
    • Consider error for low-dim projection vs. high-dim projection
  • No single correct answer; many metrics proposed
    • Cumulative variance that is not accounted for
    • Strain: match variations in distance (vs. actual distance values)
    • Stress: difference between interpoint distances in high and low dimensions
Stress Function
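Stress has several normalizations. The sketch below uses one common form, stress = sqrt(Σ(δij - dij)² / Σ δij²). Here δij is the distance between points i and j in the original space and dij is their distance in the projection. A value of zero means every interpoint distance is preserved.

```python
import math
from itertools import combinations
from typing import Sequence


def stress(high: Sequence[Sequence[float]], low: Sequence[Sequence[float]]) -> float:
    """Normalized stress between pairwise distances in the original and projected spaces."""
    num = den = 0.0
    for i, j in combinations(range(len(high)), 2):
        d_high = math.dist(high[i], high[j])
        d_low = math.dist(low[i], low[j])
        num += (d_high - d_low) ** 2
        den += d_high ** 2
    return math.sqrt(num / den)


flat = [(0, 0, 0), (3, 4, 0), (6, 8, 0)]
print(stress(flat, [(0, 0), (3, 4), (6, 8)]))  # 0.0: points already lie in a plane
bumpy = [(0, 0, 0), (3, 4, 0), (0, 0, 5)]
print(stress(bumpy, [(0, 0), (3, 4), (0, 0)]))  # > 0: dropping z collapses two points
```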

Estimating True Dimensionality

  • Scree plots as simple way: error against number of attributes
    • Original dataset: 294 dimensions
    • Estimate: almost all variance preserved with fewer than 20 dimensions
Scree Plots
[Fig 2. DimStiller: Workflows for dimensional analysis and reduction. Ingram et al. Proc. VAST 2010, pp. 3–10]
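A scree plot shows the variance (or remaining error) for each successive component. One simple reading is to find where the cumulative variance first reaches a chosen threshold. The eigenvalues below are made up, and the 92% threshold is an arbitrary choice for illustration.

```python
from itertools import accumulate
from typing import List


def explained_ratios(eigenvalues: List[float]) -> List[float]:
    """Cumulative fraction of variance captured by the top-k components, k = 1..n."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    return [running / total for running in accumulate(ordered)]


def dims_for_variance(eigenvalues: List[float], target: float) -> int:
    """Smallest k whose top-k components reach the target fraction of variance."""
    for k, ratio in enumerate(explained_ratios(eigenvalues), start=1):
        if ratio >= target:
            return k
    return len(eigenvalues)


eigs = [50, 25, 15, 5, 3, 2]  # made-up per-component variances, total 100
print(explained_ratios(eigs))         # [0.5, 0.75, 0.9, 0.95, 0.98, 1.0]
print(dims_for_variance(eigs, 0.92))  # 4
```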

Dimensionality Reduction and Visualization

  • Why do people do DR?
    • Improve performance of downstream algorithm
      • Avoid curse of dimensionality
    • Data analysis
      • If looking at the output: visual data analysis
  • Abstract tasks when visualizing DR data
    • Dimension-oriented tasks
      • Naming synthesized dimensions, mapping synthesized dimensions to original dimensions
    • Cluster-oriented tasks
      • Verifying clusters, naming clusters, matching clusters and classes
[Visualizing Dimensionally-Reduced Data: Interviews with Analysts and a Characterization of Task Sequences. Brehmer, Sedlmair, Ingram, and Munzner. Proc. BELIV 2014.]

Linear Dimensionality Reduction

  • Principal components analysis (PCA)
    • Finding axes: first with most variance, second with next most, etc.
    • Describe location of each point as linear combination of weights for each axis
      • Mapping synthesized dimensions to original dimensions
Linear Dimensionality Reduction
[http://en.wikipedia.org/wiki/File:GaussianScatterPCA.png]
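The sketch below computes PCA with NumPy's SVD of the centered data, which is one standard method. The synthetic data has a single underlying factor `t`, so the first axis should align with the direction (1, 2)/√5. The rows of `components` are the weights that map the synthesized dimensions back to the original attributes.

```python
import numpy as np


def pca(X: np.ndarray, k: int):
    """Top-k principal axes via SVD of the centered data matrix."""
    Xc = X - X.mean(axis=0)                       # center every attribute
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                           # each row: weights on the original attributes
    scores = Xc @ components.T                    # each point's coordinates on the new axes
    variance = S[:k] ** 2 / (len(X) - 1)          # variance along each axis, largest first
    return scores, components, variance


rng = np.random.default_rng(0)
t = rng.normal(size=500)
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=500)])  # two strongly correlated attributes
scores, components, variance = pca(X, k=2)
print(components[0])  # close to ±[0.447, 0.894], i.e. the direction (1, 2)/sqrt(5)
print(variance)       # the first axis carries almost all the variance
```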

Nonlinear Dimensionality Reduction

  • Pro: can handle curved rather than linear structure
  • Con: lose all ties to original dimensions/attributes
    • New dimensions often cannot be easily related to originals
      • Mapping synthesized dims to original dims task is difficult
  • Many techniques proposed
  • Many literatures: visualization, machine learning, optimization, psychology, etc.
  • Techniques: t-SNE, MDS (multidimensional scaling), charting, isomap, LLE, etc.
  • t-SNE: excellent for clusters
    • But some trickiness remains: How to Use t-SNE Effectively (http://distill.pub/2016/misread-tsne/)
  • MDS: confusingly, entire family of techniques, both linear and nonlinear
    • Minimize stress or strain metrics
    • Early formulations equivalent to PCA
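Classical (Torgerson) MDS is the early formulation mentioned above. With Euclidean input distances, it recovers the same configuration as the PCA scores, up to the sign of each axis. The NumPy sketch below uses made-up test points that lie on a plane, so a 2-D embedding can preserve every distance. Nonlinear MDS variants instead minimize stress iteratively and are not shown.

```python
import numpy as np


def pairwise(X: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix between the rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)


def classical_mds(D: np.ndarray, k: int = 2) -> np.ndarray:
    """Classical MDS: embed points in k dimensions from their distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(B)              # eigh returns ascending eigenvalues
    top = np.argsort(evals)[::-1][:k]             # indices of the k largest
    return evecs[:, top] * np.sqrt(np.clip(evals[top], 0, None))


X = np.array([[0, 0, 0], [3, 4, 0], [6, 0, 0], [1, 7, 0]], dtype=float)  # 3-D points on a plane
Y = classical_mds(pairwise(X), k=2)
print(np.allclose(pairwise(X), pairwise(Y)))  # True: all distances preserved in 2-D
```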

t-SNE Explorations

http://distill.pub/2016/misread-tsne/

Interactive T-SNE

Project by Fabián Peña
MLExplore.js: Exploring High-Dimensional Data by Interacting and Interpreting t-SNE and K-Means

Embed, Focus+Context

Embed: Focus+Context

  • Combine information within single view
  • Elide
    • Selectively filter and aggregate
  • Superimpose layer
    • Local lens
  • Distortion design choices
    • Region shape: radial, rectilinear, complex
    • How many regions: one, many
    • Region extent: local, global
    • Interaction metaphor
elide data, superimpose data, distort geometry

Idiom: DOITrees Revisited

  • Elide
    • Some items dynamically filtered out
    • Some items dynamically aggregated together
    • Some items shown in detail
[DOITrees Revisited: Scalable, Space-Constrained Visualization of Hierarchical Data. Heer and Card. Proc. Advanced Visual Interfaces (AVI), pp. 421–424, 2004.]
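DOITrees build on Furnas's degree-of-interest (DOI) idea. A node's interest combines its a-priori importance with its distance from the current focus. The sketch below uses the simplest form, DOI(x) = -depth(x) - distance(x, focus). Nodes below a threshold are filtered out, and each parent's hidden children are aggregated into a count. The weighting, the threshold, and the per-parent count are assumptions for illustration, not the paper's implementation.

```python
from collections import deque
from typing import Dict, List, Tuple


def bfs_distances(adj: Dict[str, List[str]], start: str) -> Dict[str, int]:
    """Hop count from `start` to every node of an undirected graph."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def degree_of_interest(children: Dict[str, List[str]], root: str, focus: str) -> Dict[str, int]:
    """Furnas-style DOI(x) = API(x) - D(x, focus), with API(x) = -depth(x) (assumed weighting)."""
    adj: Dict[str, List[str]] = {}
    for parent, kids in children.items():
        for kid in kids:
            adj.setdefault(parent, []).append(kid)
            adj.setdefault(kid, []).append(parent)
    depth = bfs_distances(adj, root)
    dist = bfs_distances(adj, focus)
    return {node: -depth[node] - dist[node] for node in adj}


def elide(children: Dict[str, List[str]], root: str, focus: str,
          threshold: int) -> Tuple[List[str], Dict[str, int]]:
    """Show nodes with DOI >= threshold; collapse hidden children into a count per parent."""
    doi = degree_of_interest(children, root, focus)
    shown = sorted(n for n, s in doi.items() if s >= threshold)
    hidden = {n: sum(doi[c] < threshold for c in children.get(n, [])) for n in shown}
    return shown, {n: k for n, k in hidden.items() if k}


tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2", "b3"], "a1": ["a1x"]}
print(elide(tree, "root", focus="a1", threshold=-4))
# (['a', 'a1', 'a1x', 'a2', 'b', 'root'], {'b': 3})
```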

Idiom: Fisheye Lens

  • Distort geometry
    • Shape: radial
    • Foci: single
    • Extent: local
    • Metaphor: draggable lens
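A radial fisheye can be sketched with Sarkar and Brown's graphical fisheye function g(x) = (d + 1)x / (dx + 1). Here x is a point's distance from the focus, normalized by the lens radius, and d is the distortion factor. This may not be the formula used in the linked demo. The lens radius and the default d = 3 are arbitrary choices.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def fisheye(p: Point, focus: Point, radius: float, d: float = 3.0) -> Point:
    """Push points inside the lens outward from the focus; leave points outside unchanged."""
    dx, dy = p[0] - focus[0], p[1] - focus[1]
    r = math.hypot(dx, dy)
    if r == 0 or r >= radius:
        return p  # the focus itself and everything outside the lens stay put
    x = r / radius                     # normalized distance in (0, 1)
    g = (d + 1) * x / (d * x + 1)      # magnification curve with g(0) = 0 and g(1) = 1
    scale = g / x
    return (focus[0] + dx * scale, focus[1] + dy * scale)


for x in (2, 8, 12):
    print(x, fisheye((x, 0), focus=(0, 0), radius=10))
```

The point at distance 2 moves out to about 5, the point at 8 is packed toward the rim at about 9.4, and the point at 12 lies outside the lens and is untouched. Because g(1) = 1, the lens edge joins the undistorted context without a seam.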

Fisheye

https://bost.ocks.org/mike/fisheye/ by mbostock

Idiom: Stretch and Squish Navigation

System: TreeJuxtaposer
  • Distort geometry
    • Shape: rectilinear
    • Foci: multiple
    • Impact: global
    • Metaphor: stretch and squish, borders fixed
[https://youtu.be/GdaPj8a9QEo]
[TreeJuxtaposer: Scalable Tree Comparison Using Focus+Context With Guaranteed Visibility. Munzner, Guimbretiere, Tasiran, Zhang, and Zhou. ACM Transactions on Graphics (Proc. SIGGRAPH) 22:3 (2003), 453–462.]
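A one-dimensional sketch of rectilinear stretch and squish. It assumes the layout is a grid whose cell boundaries are remapped by a monotone piecewise-linear function with both borders pinned. Applying the map separately to x and y gives a rectilinear distortion, and stretching several intervals gives multiple foci. This is a simplification for illustration, not TreeJuxtaposer's actual layout algorithm.

```python
import bisect
from typing import List


def piecewise_map(x: float, old: List[float], new: List[float]) -> float:
    """Monotone piecewise-linear map sending old[i] -> new[i] on [0, 1].

    old[0] = new[0] = 0 and old[-1] = new[-1] = 1 keep the borders fixed.
    """
    i = min(max(bisect.bisect_right(old, x) - 1, 0), len(old) - 2)
    t = (x - old[i]) / (old[i + 1] - old[i])
    return new[i] + t * (new[i + 1] - new[i])


old = [0.0, 0.25, 0.5, 0.75, 1.0]  # four equal grid cells along one axis
new = [0.0, 0.1, 0.7, 0.85, 1.0]   # user stretched cell 2; the others are squished
print([round(piecewise_map(x, old, new), 3) for x in old])  # [0.0, 0.1, 0.7, 0.85, 1.0]
```

The stretched cell grows from 0.25 to 0.6 of the axis, and the other three shrink to share the remaining 0.4. Every cell stays visible, which echoes TreeJuxtaposer's guaranteed-visibility goal at the level of this toy.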

Distortion Costs and Benefits

  • Benefits
    • Combine focus and context information in single view
  • Costs
    • Length comparisons impaired
      • Network/tree topology comparisons unaffected: connection, containment
    • Effects of distortion unclear if original structure unfamiliar
    • Object constancy/tracking may be impaired
[https://www.youtube.com/watch?v=hm2oFBqVM9o]
[Living Flows: Enhanced Exploration of Edge-Bundled Graphs Based on GPU-Intensive Edge Rendering. Lambert, Auber, and Melançon. Proc. Intl. Conf. Information Visualisation (IV), pp. 523–530, 2010.]

What We Learned

  • Reduce
    • Items
    • Attributes
  • Aggregation
    • Item
    • Spatial
    • Time
  • Dimensionality Reduction
  • Embed, Focus and Context
  • Exploratory Data Analysis