Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.
... to work effectively with heterogeneous, real-world data and to extract insights from the data using the latest tools and analytical methods.
     I            II            III           IV
   x      y      x      y      x      y      x      y
10.0   8.04   10.0   9.14   10.0   7.46    8.0   6.58
 8.0   6.95    8.0   8.14    8.0   6.77    8.0   5.76
13.0   7.58   13.0   8.74   13.0  12.74    8.0   7.71
 9.0   8.81    9.0   8.77    9.0   7.11    8.0   8.84
11.0   8.33   11.0   9.26   11.0   7.81    8.0   8.47
14.0   9.96   14.0   8.10   14.0   8.84    8.0   7.04
 6.0   7.24    6.0   6.13    6.0   6.08    8.0   5.25
 4.0   4.26    4.0   3.10    4.0   5.39   19.0  12.50
12.0  10.84   12.0   9.13   12.0   8.15    8.0   5.56
 7.0   4.82    7.0   7.26    7.0   6.42    8.0   7.91
 5.0   5.68    5.0   4.74    5.0   5.73    8.0   6.89
Property  Value 

Mean of x  9 
Variance of x  11 
Mean of y  7.50 
Variance of y  4.125 
Correlation between x and y  0.816 
Linear regression  y = 3.00 + 0.500x 
Coefficient of determination of the linear regression  0.67 
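The summary values in the property table can be checked directly from the four datasets. The following is a minimal sketch using only the Python standard library; the names `quartet` and `summarize` are made up for this illustration, and the variances are sample variances (dividing by n - 1), which is an assumption about how the table was computed.

```python
from statistics import mean, variance

# Datasets I, II and III share the same x values.
X_SHARED = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]

# The four datasets, copied from the table (hypothetical variable name).
quartet = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8.0] * 7 + [19.0] + [8.0] * 3,
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the summary statistics listed in the property table."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx                 # least-squares slope
    intercept = my - slope * mx       # least-squares intercept
    r = sxy / (sxx * syy) ** 0.5      # Pearson correlation
    return {
        "mean_x": mx,
        "var_x": variance(x),         # sample variance (n - 1)
        "mean_y": my,
        "var_y": variance(y),
        "r": r,
        "slope": slope,
        "intercept": intercept,
        "r2": r * r,                  # coefficient of determination
    }


for name, (x, y) in quartet.items():
    s = summarize(x, y)
    print(f"{name:>3}: mean_y={s['mean_y']:.2f} var_y={s['var_y']:.3f} "
          f"r={s['r']:.3f} y = {s['intercept']:.2f} + {s['slope']:.3f}x")
```

All four datasets give nearly the same numbers, even though their scatter plots look very different. That is the point of the example: summary statistics alone can hide the shape of the data.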
Ask friends and family
That's inferring statistics from a sample of n=1
Data-based decisions
1D Linear  Document Lens, SeeSoft, Info Mural 
2D Map  GIS, ArcView, PageMaker, Medical imagery 
3D World  CAD, Medical, Molecules, Architecture 
MultiVar  Spotfire, Tableau, GGobi, TableLens, ParCoords 
Temporal  LifeLines, TimeSearcher, Palantir, DataMontage, LifeFlow 
Tree  Cone/Cam/Hyperbolic, SpaceTree, Treemap, Treeversity 
Network  Gephi, NodeXL, Sigma.js 
Too ambiguous!! 🤦🏽 Let's go beyond that
Can you fit it in one computer?
Yes? Then it's not really big 🤷🏽
Big data, big overhead
Big Data?
How do you compute this?
80+ trillion photos (80,000,000,000,000)
That's big data
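A back-of-the-envelope check makes the "does it fit on one computer?" question concrete for the photo count above. This is only a sketch: the average photo size and the single-machine capacity below are assumptions chosen for illustration, not figures from the source, and the function names are hypothetical.

```python
def total_bytes(n_items, bytes_per_item):
    """Raw storage needed, ignoring replication, indexes and metadata."""
    return n_items * bytes_per_item


def fits_on_one_machine(n_items, bytes_per_item, capacity_bytes):
    """True if the raw data would fit within a single machine's capacity."""
    return total_bytes(n_items, bytes_per_item) <= capacity_bytes


N_PHOTOS = 80 * 10**12          # 80 trillion photos, from the slide
BYTES_PER_PHOTO = 1 * 10**6     # ASSUMPTION: about 1 MB per photo
MACHINE_CAPACITY = 16 * 10**12  # ASSUMPTION: one machine with 16 TB of disk

needed = total_bytes(N_PHOTOS, BYTES_PER_PHOTO)
print(f"Storage needed: {needed / 10**18:.0f} exabytes")
print("Fits on one machine:",
      fits_on_one_machine(N_PHOTOS, BYTES_PER_PHOTO, MACHINE_CAPACITY))
```

Under these assumptions the data is millions of times larger than one machine can hold, so it has to be distributed. A dataset of a few million small records would pass the same check easily and would not need big-data tooling.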
How do you compute this?
Big Data? Only if it doesn't fit on one 💻
⚠️ Use it only if you must ⚠️
My wife tells it to me all the time!
Machine Learning?
Develop locally
Task: Change in drug's adverse effects reports
User: FDA Analysts
Task: Detect fraud networks
User: Undisclosed Analysts
A distributed computing alternative to MapReduce.
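For context, the MapReduce model that this alternative is compared against can be sketched in plain Python as three steps: map, shuffle, reduce. This is a single-process illustration only, not a distributed implementation and not the API of any particular framework; all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


print(word_count(["big data", "big overhead"]))
```

In a real cluster the map and reduce calls run on different machines and the shuffle moves data over the network, which is where much of the overhead comes from.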
Traditional
Pros:
Cons:
Data Mining/ML
Pros:
Cons:
InfoVis
Pros:
Cons:
