Consequences of sampling before analyzing data with DataSketches

Sergio Castro Wed, 18 Nov 2020 10:55:32 -0800

Hi, I am new to DataSketches.

I know DataSketches provides *approximate* calculations of statistics
with *mathematically proven error bounds*.


My question is:
Say that I am constrained to take a sample of the original data set
before handing it to DataSketches (for example, I cannot take more than
10,000 random rows from a table).
What would be the consequence of this prior sampling for the
"mathematically proven error bounds" of the DataSketches statistics,
relative to the original data set?
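
To make the scenario concrete, here is a minimal Python illustration of the
setup I have in mind. It uses only the standard library and synthetic data;
the helper name `sample_rows`, the table contents, and the choice of the
median as the statistic are assumptions made for illustration. The exact
median of the sample stands in for whatever estimate a sketch would return
when fed the sample; no DataSketches API is used here.

```python
import random
import statistics


def sample_rows(rows, k, seed=0):
    """Return a uniform random sample of at most k rows, without replacement.

    Hypothetical helper: it stands in for whatever mechanism limits the
    number of rows that can be read from the table.
    """
    rng = random.Random(seed)
    if len(rows) <= k:
        return list(rows)
    return rng.sample(rows, k)


# Synthetic "table": a single numeric column with 200,000 rows (made-up data).
data_rng = random.Random(42)
table = [data_rng.gauss(100.0, 15.0) for _ in range(200_000)]

# The constraint from the question: no more than 10,000 random rows.
sample = sample_rows(table, 10_000)

# Exact medians of the full table and of the sample. A sketch fed only with
# `sample` would be approximating `sample_median`, not `full_median`.
full_median = statistics.median(table)
sample_median = statistics.median(sample)

# Gap introduced by the sampling step alone, before any sketch is involved.
sampling_gap = abs(sample_median - full_median)

print(len(sample), round(full_median, 2), round(sample_median, 2),
      round(sampling_gap, 3))
```

The quantity `sampling_gap` is the part I am asking about: how it relates to
the error bounds the sketch reports for the data it actually sees.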

Best,

Sergio
