Re: Consequences of sampling before analyzing data with DataSketches

2020-11-19 Thread Justin Thaler
…nal server error). It might be down, or out-of-service. You might also check to make sure it is the correct URL. Thanks! Lee. On Thu, Nov 19, 2020 at 6:05 AM Justin Thaler wrote: I think the way to think about this is the following. If you downs…

Re: Consequences of sampling before analyzing data with DataSketches

2020-11-19 Thread Justin Thaler
…ketches configuration with a more balanced trade-off between accuracy and memory requirements? Would you say this is a good best-effort strategy? Or would you recommend that I use the same configuration in both cases? Thanks for your time and feedback, Sergio…

Re: Consequences of sampling before analyzing data with DataSketches

2020-11-18 Thread Justin Thaler
Lee's response is correct, but I'll elaborate slightly (hopefully this is helpful rather than confusing). There are some queries for which the following is true: if the data sample is drawn uniformly at random from the original (unsampled) data, then accurate answers with respect to the sample are also accurate with…
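To make the claim above concrete for one query type: rank and quantile queries are a standard case where it holds (the excerpt is truncated, so which queries Justin goes on to list is not shown here). The minimal Python sketch below assumes a synthetic exponential data set and a uniform sample drawn without replacement with `random.sample`; it computes a quantile from the sample alone and then checks that value's true normalized rank in the full data. The function names and sizes are hypothetical, chosen only for illustration.

```python
import bisect
import random


def normalized_rank(sorted_data, value):
    """Fraction of items in sorted_data that are strictly less than value."""
    return bisect.bisect_left(sorted_data, value) / len(sorted_data)


def sample_quantile_rank_error(n=200_000, sample_size=5_000, q=0.5, seed=42):
    """Estimate the q-quantile from a uniform sample, then score it against the full data."""
    rng = random.Random(seed)
    # Hypothetical skewed data set standing in for the original (unsampled) data.
    data = [rng.expovariate(1.0) for _ in range(n)]
    # Uniform random sample without replacement.
    sample = sorted(rng.sample(data, sample_size))
    # q-quantile computed from the sample alone.
    estimate = sample[int(q * (sample_size - 1))]
    # True normalized rank of that estimate in the original data.
    true_rank = normalized_rank(sorted(data), estimate)
    return estimate, true_rank, abs(true_rank - q)


if __name__ == "__main__":
    est, rank, err = sample_quantile_rank_error()
    print(f"sample median = {est:.4f}, true rank = {rank:.4f}, rank error = {err:.4f}")
```

The rank error of a sample quantile shrinks roughly like 1/sqrt(sample_size) and does not grow with the size of the original data, which is why sampling first is harmless for this kind of query.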

Re: Regarding error bounds and confidence of apache KLL implementation

2020-06-23 Thread Justin Thaler
…you change this strategy to, let's say, picking a level which overshoots its capacity by the largest amount? In my opinion, this strategy would free up more space and should lead to fewer compactions and hence a decrease in empirical erro…
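The question above can be explored empirically with a toy model. The Python sketch below is a simplified KLL-style compactor, not the Apache DataSketches implementation: the class `ToyKLL`, the capacity schedule `ceil(k * c**depth)` with `c = 2/3` and a minimum capacity of 2, and the policy names are all assumptions for illustration. The `"lowest"` policy compacts the lowest level at or over capacity (as far as I know, this mirrors what the DataSketches KLL sketch does), while `"largest_overshoot"` compacts the level exceeding its capacity by the most, as proposed in the question. It counts compactions and measures the maximum normalized rank error on uniform random data.

```python
import bisect
import math
import random


class ToyKLL:
    """Simplified KLL-style quantile sketch (illustrative only, not DataSketches)."""

    def __init__(self, k=200, c=2 / 3, min_cap=2, policy="lowest", seed=0):
        if policy not in ("lowest", "largest_overshoot"):
            raise ValueError(f"unknown policy: {policy}")
        self.k, self.c, self.min_cap, self.policy = k, c, min_cap, policy
        self.levels = [[]]      # levels[h] holds items of weight 2**h
        self.n = 0              # number of items inserted
        self.compactions = 0    # number of compaction operations performed
        self.rng = random.Random(seed)

    def _capacity(self, h):
        # Top level gets capacity k; each level below shrinks by a factor c.
        depth = len(self.levels) - 1 - h
        return max(self.min_cap, math.ceil(self.k * self.c ** depth))

    def _pick_level(self):
        # Candidate levels: those at or over capacity, as (overshoot, level) pairs.
        over = [(len(lv) - self._capacity(h), h)
                for h, lv in enumerate(self.levels) if len(lv) >= self._capacity(h)]
        if not over:
            return None
        if self.policy == "lowest":
            return min(h for _, h in over)
        # Largest overshoot; ties go to the lowest level.
        return max(over, key=lambda t: (t[0], -t[1]))[1]

    def _compact(self, h):
        level = sorted(self.levels[h])
        leftover = [level.pop()] if len(level) % 2 else []  # keep one item if odd
        offset = self.rng.randint(0, 1)                      # random even/odd choice
        if h + 1 == len(self.levels):
            self.levels.append([])
        # Every other item moves up with doubled weight, so total weight is preserved.
        self.levels[h + 1].extend(level[offset::2])
        self.levels[h] = leftover
        self.compactions += 1

    def update(self, x):
        self.n += 1
        self.levels[0].append(x)
        # Compact until no level is at or over its capacity.
        while True:
            h = self._pick_level()
            if h is None:
                break
            self._compact(h)

    def rank(self, x):
        """Estimated fraction of inserted items strictly less than x."""
        weighted = sum((1 << h) * sum(1 for y in lv if y < x)
                       for h, lv in enumerate(self.levels))
        return weighted / self.n


def max_rank_error(policy, n=20_000, k=200, seed=7):
    """Max absolute normalized rank error over 50 probe points, plus compaction count."""
    rng = random.Random(seed)
    data = [rng.random() for _ in range(n)]
    sketch = ToyKLL(k=k, policy=policy, seed=seed)
    for x in data:
        sketch.update(x)
    data.sort()
    probes = data[:: n // 50]
    err = max(abs(sketch.rank(p) - bisect.bisect_left(data, p) / n) for p in probes)
    return err, sketch.compactions


if __name__ == "__main__":
    for policy in ("lowest", "largest_overshoot"):
        err, comps = max_rank_error(policy)
        print(f"{policy:>18}: compactions = {comps}, max rank error = {err:.4f}")
```

Running `max_rank_error` for both policies (and several seeds) lets you compare compaction counts and error directly rather than reasoning about them in the abstract. The outcome depends on the toy's capacity schedule, so it is suggestive rather than conclusive for the real sketch.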