Sorry, if you presample your data, all bets are off in terms of accuracy.

On Wed, Nov 18, 2020 at 10:55 AM Sergio Castro <sergio...@gmail.com> wrote:
> Hi, I am new to DataSketches.
>
> I know DataSketches provides an *approximate* calculation of statistics
> with *mathematically proven error bounds*.
>
> My question is:
> Say that I am constrained to take a sample of the original data set
> before handing it to DataSketches (for example, I cannot take more than
> 10,000 random rows from a table).
> What would be the consequence of this prior sampling for the
> "mathematically proven error bounds" of the DataSketches statistics,
> relative to the original data set?
>
> Best,
>
> Sergio
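
To illustrate the point about presampling, here is a minimal sketch in plain Python. It does not use the DataSketches library at all: it uses an exact distinct count, so any error it shows comes from the presampling step alone. The table size, key range, and the `distinct_count` helper are assumptions made up for this example. A sketch's error bounds are stated relative to the input it actually sees, so when that input is a 10,000-row sample, the bounds describe the sample and not the original table.

```python
import random


def distinct_count(rows):
    """Exact number of distinct values in rows (no sketch involved)."""
    return len(set(rows))


random.seed(42)

# Hypothetical table: 1,000,000 rows drawn from 200,000 possible keys.
table = [random.randrange(200_000) for _ in range(1_000_000)]

# The constraint from the question: at most 10,000 random rows.
sample = random.sample(table, 10_000)

true_distinct = distinct_count(table)
sample_distinct = distinct_count(sample)

print("distinct keys in full table:", true_distinct)
print("distinct keys in the sample:", sample_distinct)
```

The sample can never contain more than 10,000 distinct keys, no matter how many the table holds, and there is no generally valid way to scale that number back up without knowing how the keys are distributed. Feeding the sample to a sketch would only add the sketch's own (bounded) error on top of this unbounded one.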