Re: Block Sampling

Carl Steinbach Fri, 15 Jun 2012 12:20:03 -0700

Hi Anand,

This feature was implemented in HIVE-2121 and appeared in Hive 0.8.0.


Ref: https://issues.apache.org/jira/browse/HIVE-2121

Thanks.

Carl

On Fri, Jun 15, 2012 at 11:59 AM, Ladda, Anand <lan...@microstrategy.com> wrote:

> Has the block sampling feature been added to one of the latest (Hive 0.8
> or Hive 0.9) releases? The wiki has the blurb below on block sampling:
>
> *Block Sampling*
>
> It is a feature that is still on trunk and is not yet in any release
> version.
>
> block_sample: TABLESAMPLE (n PERCENT)
>
> This will allow Hive to pick up at least n% of the data size (note that this
> does not necessarily mean the number of rows) as inputs. Only
> CombineHiveInputFormat is supported, and some special compression formats
> are not handled. If sampling fails, the input of the MapReduce job will be
> the whole table/partition. Sampling is done at the HDFS block level, so the
> sampling granularity is the block size. For example, if the block size is
> 256MB, even if n% of the input size is only 100MB, you get 256MB of data.
>
> In the following example, 0.1% or more of the input size will be used for
> the query.
>
> SELECT *
> FROM source TABLESAMPLE(0.1 PERCENT) s;
>
> Sometimes you want to sample the same data with different blocks. To do
> so, you can change this seed number:
>
> set hive.sample.seednumber=<INTEGER>;
>
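
To make the block-granularity point in the quoted wiki text concrete, here is a rough sketch of the arithmetic in Python. It is only a simplified model (the requested percentage rounded up to whole blocks, capped at the table size), not Hive's actual implementation. The function name and parameters are made up for illustration.

```python
import math

MB = 1024 * 1024


def sampled_bytes(total_bytes: int, percent: float, block_bytes: int) -> int:
    """Estimate how much input is read when sampling `percent` of a table.

    Hypothetical helper: assumes sampling picks whole blocks until at least
    `percent` of the data size is covered, and never more than the table.
    """
    target = total_bytes * percent / 100.0
    # At least one block is read, even if the target is smaller than a block.
    blocks = max(1, math.ceil(target / block_bytes))
    return min(total_bytes, blocks * block_bytes)


if __name__ == "__main__":
    # 10% of a 1000MB table is 100MB, but with 256MB blocks a full block is read.
    print(sampled_bytes(1000 * MB, 10, 256 * MB) // MB, "MB")
```

Under this model, asking for 100MB worth of a table stored in 256MB blocks still reads one full 256MB block, which matches the example in the wiki text.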
