You may use percent-based sampling (block sampling) for non-bucketed
tables, though there are some restrictions.
https://cwiki.apache.org/Hive/languagemanual-sampling.html
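
For illustration, a minimal sketch of block sampling, assuming a
non-bucketed table with the hypothetical name page_views. The percentage
applies to input size rather than row count, and the sampling granularity
is the storage block, so the result is approximate:

```sql
-- page_views is a hypothetical, non-bucketed table.
-- Reads roughly 0.1 percent of the input data; whole blocks are taken,
-- so small tables may return more than the requested fraction.
SELECT *
FROM page_views TABLESAMPLE(0.1 PERCENT) s;
```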
Regards,
Ramki.
On Wed, Mar 20, 2013 at 12:27 PM, Mark Grover
wrote:
Hey Dean,
I am not a power user of the sampling feature but my understanding was that
sampling in Hive only works on bucketed tables. I am happy to be corrected
though.
Mark
On Wed, Mar 20, 2013 at 12:20 PM, Dean Wampler <
dean.wamp...@thinkbiganalytics.com> wrote:
Mark,
Aside from what might be wrong here, isn't it true that sampling with the
bucket clause still works on non-bucketed tables; it's just inefficient
because it still scans the whole table? Or am I an idiot? ;)
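
A sketch of what I mean, assuming a non-bucketed table with the
hypothetical name page_views. Sampling on rand() assigns each row to one of
32 buckets at query time, so every row still has to be read:

```sql
-- page_views is a hypothetical table with no CLUSTERED BY clause.
-- Returns roughly 1/32 of the rows, but scans the whole table to do it.
SELECT *
FROM page_views TABLESAMPLE(BUCKET 3 OUT OF 32 ON rand()) s;
```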
dean
On Wed, Mar 20, 2013 at 2:17 PM, Mark Grover wrote:
Hi Robert,
Sampling in Hive is based on buckets. Therefore, your table needs to be
appropriately bucketed.
I would recommend storing the results of your inner query in a bucketed
table. See how to populate a bucketed table at
https://cwiki.apache.org/Hive/languagemanual-ddl-bucketedtables.html
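
Roughly, and only as a sketch: the table names, column names, and bucket
count below are hypothetical, and the plain SELECT stands in for your inner
query. The hive.enforce.bucketing setting is assumed to be needed on the
Hive version in use.

```sql
-- Hypothetical bucketed table to hold the inner query's results.
CREATE TABLE results_bucketed (id BIGINT, val STRING)
CLUSTERED BY (id) INTO 32 BUCKETS;

-- Ask Hive to write one file per bucket when populating the table.
SET hive.enforce.bucketing = true;

-- Replace this SELECT with the inner query.
INSERT OVERWRITE TABLE results_bucketed
SELECT id, val FROM source_table;

-- Sample one bucket out of 32, using the clustering column.
SELECT *
FROM results_bucketed TABLESAMPLE(BUCKET 1 OUT OF 32 ON id) s;
```

Because the sample is taken on the same column the table is clustered by,
Hive can read just the matching bucket instead of the full table.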
The