I don't know anything about statistics, but in your case, duplicating the
splits (x100?) with a custom InputFormat might be much simpler.
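
To make the idea concrete, here is a rough sketch in plain Python, not Hive or Hadoop code. It assumes that each duplicated copy of the input would be resampled with replacement before the query runs on it, which is my reading of how the bootstrap works. The names bootstrap_estimates and bootstrap_error, the default of 100 replicas, and the use of the mean as the example query are all made up for illustration.

```python
import random
import statistics


def bootstrap_estimates(rows, query, replicas=100, seed=0):
    """Run `query` on `replicas` resampled copies of `rows`.

    Each copy is drawn with replacement and has the same size as the
    original input (assumption: this stands in for one duplicated split).
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(replicas):
        resample = rng.choices(rows, k=len(rows))  # sample with replacement
        estimates.append(query(resample))
    return estimates


def bootstrap_error(rows, query, replicas=100, seed=0):
    """Spread of the per-replica answers, used as a rough error estimate."""
    return statistics.stdev(bootstrap_estimates(rows, query, replicas, seed))


if __name__ == "__main__":
    data = [3, 7, 1, 9, 4, 6, 2, 8]
    print(bootstrap_error(data, statistics.mean))
```

In a real job the resampling would have to happen inside each duplicated split, so this only shows the shape of the computation, not how an InputFormat would do it.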
2013/9/6 Sameer Agarwal
Hi All,
In order to support approximate queries in Hive and BlinkDB (
http://blinkdb.org/), I am working towards implementing the bootstrap
primitive (http://en.wikipedia.org/wiki/Bootstrapping_(statistics)) in Hive
that can help us quantify the "error" incurred by a query Q when it
operates on a