Re: Using TABLESAMPLE on inner queries

2013-03-20 Thread Ramki Palle
You may use percent-based sampling (block sampling) for non-bucketed tables, though there are some restrictions. https://cwiki.apache.org/Hive/languagemanual-sampling.html Regards, Ramki. On Wed, Mar 20, 2013 at 12:27 PM, Mark Grover wrote: > Hey Dean, > I am not a power user of the sampling f
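
For reference, a minimal sketch of the block sampling Ramki mentions. The table name page_views and the alias pv are hypothetical, not from the thread. As far as the Hive sampling manual describes it, the percentage refers to input data size rather than row count, and sampling happens at HDFS block granularity, so small tables may come back whole.

```sql
-- Hypothetical table name; block sampling needs no bucketing.
-- Asks Hive to read roughly 0.1 percent of the input data size
-- (not 0.1 percent of the rows).
SELECT *
FROM page_views TABLESAMPLE(0.1 PERCENT) pv;
```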

Re: Using TABLESAMPLE on inner queries

2013-03-20 Thread Mark Grover
Hey Dean, I am not a power user of the sampling feature but my understanding was that sampling in Hive only works on bucketed tables. I am happy to be corrected though. Mark On Wed, Mar 20, 2013 at 12:20 PM, Dean Wampler < dean.wamp...@thinkbiganalytics.com> wrote: > Mark, > > Aside from what mi

Re: Using TABLESAMPLE on inner queries

2013-03-20 Thread Dean Wampler
Mark, Aside from what might be wrong here, isn't it true that sampling with the bucket clause still works on non-bucketed tables; it's just inefficient because it still scans the whole table? Or am I an idiot? ;) dean On Wed, Mar 20, 2013 at 2:17 PM, Mark Grover wrote: > Hi Robert, > Sampling i
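
A sketch of the case Dean describes, assuming a hypothetical non-bucketed table page_views. According to the Hive sampling manual, when the ON expression does not match the table's CLUSTERED BY columns (or the table is not bucketed at all), the query still runs but scans the entire table and filters rows afterwards.

```sql
-- Hypothetical table name; page_views is assumed NOT to be bucketed.
-- Rows are assigned to one of 32 buckets by rand() at query time,
-- so the whole table is read and about 1/32 of the rows are kept.
SELECT *
FROM page_views TABLESAMPLE(BUCKET 1 OUT OF 32 ON rand()) pv;
```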

Re: Using TABLESAMPLE on inner queries

2013-03-20 Thread Mark Grover
Hi Robert, Sampling in Hive is based on buckets. Therefore, your table needs to be appropriately bucketed. I would recommend storing the results of your inner query in a bucketed table. See how to populate a bucketed table at https://cwiki.apache.org/Hive/languagemanual-ddl-bucketedtables.html The
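
A rough sketch of the approach Mark recommends: materialize the inner query into a bucketed table, then sample that table. All table and column names here (orders, inner_results_bucketed, user_id, amount) are hypothetical, and the WHERE clause merely stands in for whatever the inner query does. The hive.enforce.bucketing setting applies to Hive releases of that era; later releases enforce bucketing without it.

```sql
-- Hypothetical target table, bucketed on the column we intend to sample on.
CREATE TABLE inner_results_bucketed (user_id BIGINT, amount DOUBLE)
CLUSTERED BY (user_id) INTO 32 BUCKETS;

-- Make Hive write one file per bucket when populating the table.
SET hive.enforce.bucketing = true;

-- Store the inner query's results (placeholder query shown here).
INSERT OVERWRITE TABLE inner_results_bucketed
SELECT user_id, amount
FROM orders
WHERE amount > 0;

-- Sampling ON the bucketing column lets Hive read only the matching bucket.
SELECT *
FROM inner_results_bucketed TABLESAMPLE(BUCKET 1 OUT OF 32 ON user_id) s;
```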