the true objective of big data

John Alexis Guerra Gómez

@duto_guerra

http://johnguerra.co/viz/bigDataQuestSU


- Volume
- Velocity
- Variety
- and Veracity and Value

Too ambiguous!! 🤦🏽‍♂️ Let's go beyond that

Can you fit it in one computer?

Yes? 👉🏼 Then it's not really big 🤷🏽‍♂️

Big data 👉🏼 Big overhead

- One photo 👉🏼 10MB
- 1k photos in a 📱 👉🏼 10MB * 1k = 10,000MB = 10GB
- 50k photos in your 💻 👉🏼 10MB * 50k = 500GB
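The arithmetic above can be checked with a short sketch. It assumes the slide's figure of 10MB per photo and decimal units (1GB = 1000MB); the function name `collection_size_gb` is made up for illustration.

```python
# Assumption taken from the slide: every photo is about 10 MB.
PHOTO_MB = 10
MB_PER_GB = 1000  # decimal units, matching 10,000MB = 10GB above


def collection_size_gb(num_photos: int, photo_mb: int = PHOTO_MB) -> float:
    """Return the storage needed for num_photos photos, in GB."""
    return num_photos * photo_mb / MB_PER_GB


for label, count in [("phone", 1_000), ("laptop", 50_000)]:
    print(f"{count} photos on a {label}: {collection_size_gb(count):g} GB")
```

Both results still fit on a single device, which is the point of the slide.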

Big Data? 🙅🏽‍♂️

How do you compute this?

- Put all your photos in one 💻
- Go through all the collection and count the blue ones

80+ trillion photos (80,000,000,000,000)

That's big data

How do you compute this?

- Distribute the data among 100s of 💻💻💻s. (a cluster)
- Compute subtotals on each data part. (Map)
- Aggregate the subtotals into one big total. (Reduce)
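The three steps above can be sketched on one machine. This is only a toy illustration, not how a real cluster framework is driven: the `photos` list of color labels is hypothetical data, and the helpers `split` and `count_blue` are names invented here. In a real cluster each chunk would live on a different computer.

```python
from functools import reduce
from operator import add

# Toy stand-in for a photo collection: one dominant-color label per photo.
# (Hypothetical data; a real pipeline would have to inspect the pixels.)
photos = ["blue", "red", "blue", "green", "blue",
          "red", "blue", "green", "red", "blue"]


def split(items, num_parts):
    """Distribute items among num_parts chunks, one per (pretend) computer."""
    return [items[i::num_parts] for i in range(num_parts)]


def count_blue(chunk):
    """Map step: each computer counts the blue photos in its own chunk."""
    return sum(1 for color in chunk if color == "blue")


chunks = split(photos, num_parts=3)
subtotals = [count_blue(chunk) for chunk in chunks]  # Map
total = reduce(add, subtotals, 0)                    # Reduce

# Same answer as the single-computer approach of scanning everything once.
assert total == count_blue(photos)
print(subtotals, total)
```

The extra splitting and aggregating is the overhead mentioned earlier: it only pays off when the collection cannot be scanned on one computer.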

Big Data? 👉🏼 Only if it doesn't fit on one 💻

⚠️ Use it only if you must ⚠️

My wife tells it to me all the time!

Making Sense of Data

- Statistical Analysis
- Machine Learning and Artificial Intelligence
- Visual Analytics (and data analytics)

I x | I y | II x | II y | III x | III y | IV x | IV y |
---|---|---|---|---|---|---|---|
10.0 | 8.04 | 10.0 | 9.14 | 10.0 | 7.46 | 8.0 | 6.58 |
8.0 | 6.95 | 8.0 | 8.14 | 8.0 | 6.77 | 8.0 | 5.76 |
13.0 | 7.58 | 13.0 | 8.74 | 13.0 | 12.74 | 8.0 | 7.71 |
9.0 | 8.81 | 9.0 | 8.77 | 9.0 | 7.11 | 8.0 | 8.84 |
11.0 | 8.33 | 11.0 | 9.26 | 11.0 | 7.81 | 8.0 | 8.47 |
14.0 | 9.96 | 14.0 | 8.10 | 14.0 | 8.84 | 8.0 | 7.04 |
6.0 | 7.24 | 6.0 | 6.13 | 6.0 | 6.08 | 8.0 | 5.25 |
4.0 | 4.26 | 4.0 | 3.10 | 4.0 | 5.39 | 19.0 | 12.50 |
12.0 | 10.84 | 12.0 | 9.13 | 12.0 | 8.15 | 8.0 | 5.56 |
7.0 | 4.82 | 7.0 | 7.26 | 7.0 | 6.42 | 8.0 | 7.91 |
5.0 | 5.68 | 5.0 | 4.74 | 5.0 | 5.73 | 8.0 | 6.89 |

Property | Value |
---|---|
Mean of x | 9 |
Variance of x | 11 |
Mean of y | 7.50 |
Variance of y | 4.125 |
Correlation between x and y | 0.816 |
Linear regression | y = 3.00 + 0.500x |
Coefficient of determination of the linear regression | 0.67 |
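The summary values above can be recomputed from the four datasets. The sketch below uses only the Python standard library and assumes sample variance (dividing by n - 1) and an ordinary least-squares fit; the helper name `summarize` is made up here. Expect small differences in the last digits between datasets, since the table reports rounded values.

```python
from statistics import mean, variance

# The four datasets from the table above; I, II and III share the same x.
x_shared = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
quartet = {
    "I": (x_shared,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_shared,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_shared,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 19.0, 8.0, 8.0, 8.0],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the summary statistics listed in the table for one dataset."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx                # least-squares slope
    intercept = my - slope * mx      # least-squares intercept
    r = sxy / (sxx * syy) ** 0.5     # Pearson correlation
    return {
        "mean_x": mx, "var_x": variance(xs),
        "mean_y": my, "var_y": variance(ys),
        "r": r, "slope": slope, "intercept": intercept, "r2": r ** 2,
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, stats in summaries.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

The four summaries come out nearly identical, yet the datasets look completely different when plotted, which is why the numbers alone are not enough.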

https://dabblingwithdata.wordpress.com/2017/05/03/the-datasaurus-a-monstrous-anscombe-for-the-21st-century/


- Deep understanding
- Meaningful
- Non-obvious
- Actionable
- Based on data

Ask friends and family

That's inferring statistics from a sample of n=1

Data based decisions

http://tucarro.com

Sure, if it doesn't fit on a computer

Size doesn't matter

http://johnguerra.co/viz/saber11/

No need to wait for Stanford, MIT or Berkeley to help you

- Size doesn't matter
- 👉🏼 Insights! 👈🏼
- Open data and share
- Ask for infovis

Task: Change in drug's adverse effects reports

User: FDA Analysts

Task: Detect fraud networks

User: Undisclosed Analysts

- Infographics
- Scientific Visualization (sciviz)
- Information Visualization (infovis, datavis)

- Inherently spatial
- 2D and 3D

- Overview first
- Zoom and Filter
- Details on Demand
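As a rough, non-visual sketch of those three steps, the snippet below walks dataset I of the quartet table shown earlier in the deck. Real infovis tools do this interactively on screen; the function names `overview`, `zoom_and_filter` and `details_on_demand` are invented here purely to mirror the list.

```python
# Dataset I from the quartet table earlier in the deck, as (x, y) points.
points = [(10.0, 8.04), (8.0, 6.95), (13.0, 7.58), (9.0, 8.81), (11.0, 8.33),
          (14.0, 9.96), (6.0, 7.24), (4.0, 4.26), (12.0, 10.84), (7.0, 4.82),
          (5.0, 5.68)]


def overview(data):
    """Overview first: summarize the whole collection at a glance."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    return {"count": len(data),
            "x_range": (min(xs), max(xs)),
            "y_range": (min(ys), max(ys))}


def zoom_and_filter(data, x_min, x_max):
    """Zoom and filter: keep only the points inside the region of interest."""
    return [(x, y) for x, y in data if x_min <= x <= x_max]


def details_on_demand(data, index):
    """Details on demand: show one item in full only when asked for it."""
    x, y = data[index]
    return f"point #{index}: x={x}, y={y}"


print(overview(points))
subset = zoom_and_filter(points, x_min=10, x_max=14)
print(subset)
print(details_on_demand(subset, 0))
```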

Data type | Example tools |
---|---|
1-D Linear | Document Lens, SeeSoft, Info Mural |
2-D Map | GIS, ArcView, PageMaker, Medical imagery |
3-D World | CAD, Medical, Molecules, Architecture |
Multi-Var | Spotfire, Tableau, GGobi, TableLens, ParCoords |
Temporal | LifeLines, TimeSearcher, Palantir, DataMontage, LifeFlow |
Tree | Cone/Cam/Hyperbolic, SpaceTree, Treemap, Treeversity |
Network | Gephi, NodeXL, Sigmajs |