Visual Analytics Architecture for large table-based datasets

Master's thesis project by Juan Camilo Ortiz Román, supervised by John Alexis Guerra Gómez

DUTO Project


Design and test a technique that allows visual analytics tools to sample, summarize and explore big datasets in a web context.

Problem Statement

Visual analytics gives users intuitive tools for processing data. One of its current challenges is representing large amounts of information in a way that the user can explore. Such volumes of data cannot be handled by conventional machines and must be partitioned or reduced to a smaller representation. This thesis project presents a representative sampling technique for large table-based datasets.
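To illustrate what sampling a table too large for memory can look like, here is a minimal sketch of reservoir sampling (Algorithm R), which draws a uniform random sample of k rows in a single pass. It is a generic, well-known technique shown for illustration only, not necessarily the method developed in this thesis; the function name `reservoir_sample` and its parameters are assumptions made for the example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

Row = TypeVar("Row")


def reservoir_sample(rows: Iterable[Row], k: int, seed: Optional[int] = None) -> List[Row]:
    """Return a uniform random sample of up to k rows from a stream (Algorithm R).

    Hypothetical helper for illustration; only k rows are held in memory.
    """
    rng = random.Random(seed)
    reservoir: List[Row] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Pick a slot in [0, i]; the new row is kept with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir


# Usage: sample 100 rows from a stream of 10,000 without loading it all at once.
sample = reservoir_sample(range(10_000), k=100, seed=1)
```

Because every row ends up in the sample with equal probability, the sample's column distributions approximate those of the full table, which is the property an exploration tool needs from a representative sample.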


Implement sampling algorithms in ElasticSearch that allow the retrieval of a representative sample from datasets between 400 MB and 4 GB.
Benchmark the sampling methods, measuring the discrepancy of their results and their execution times in a web context of large data selection.
Build a backend architecture that connects with Navio.
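As a rough sketch of how the first two items above could be approached, the code below builds an ElasticSearch request body that asks for a random sample using the query DSL's `function_score` with `random_score`, and defines one simple discrepancy measure for benchmarking. This is an assumption-laden illustration, not the project's actual implementation: the helper names `random_sample_query` and `relative_mean_discrepancy`, the choice of `_seq_no` as the seed field, and the use of the relative difference of means as the discrepancy metric are all chosen for the example.

```python
from statistics import mean
from typing import Any, Dict, Sequence


def random_sample_query(sample_size: int, seed: int) -> Dict[str, Any]:
    """Build a search body that returns `sample_size` randomly ordered documents.

    Hypothetical helper: the dict would be sent as the body of a search request.
    """
    return {
        "size": sample_size,
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                # A fixed seed plus a field makes the random ordering reproducible.
                "random_score": {"seed": seed, "field": "_seq_no"},
                # Use only the random score, ignoring the match_all score.
                "boost_mode": "replace",
            }
        },
    }


def relative_mean_discrepancy(full: Sequence[float], sample: Sequence[float]) -> float:
    """Relative difference between the sample mean and the full-column mean.

    One assumed discrepancy metric among many possible ones.
    """
    full_mean = mean(full)
    if full_mean == 0:
        raise ValueError("relative discrepancy is undefined when the full mean is 0")
    return abs(mean(sample) - full_mean) / abs(full_mean)


# Usage: request 500 sampled rows, then compare a numeric column against the full data.
body = random_sample_query(sample_size=500, seed=42)
error = relative_mean_discrepancy([10.0, 20.0, 30.0], [10.0, 20.0])
```

By default a single search returns at most 10,000 hits, so larger samples would need paging or a different retrieval strategy. Execution time could be benchmarked by timing each request alongside the discrepancy it yields.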

GitHub, Poster, Thesis Document