Hi Anand,

This feature was implemented in HIVE-2121 and first appeared in Hive 0.8.0.
Ref: https://issues.apache.org/jira/browse/HIVE-2121

Thanks.
Carl

On Fri, Jun 15, 2012 at 11:59 AM, Ladda, Anand <lan...@microstrategy.com> wrote:

> Has the block sampling feature been added to one of the latest (Hive 0.8
> or Hive 0.9) releases? The wiki has the blurb below on block sampling:
>
> *Block Sampling*
>
> It is a feature that is still on trunk and is not yet in any release
> version.
>
> block_sample: TABLESAMPLE (n PERCENT)
>
> This will allow Hive to pick up at least n% of the data size (note that
> this does not necessarily mean n% of the rows) as input. Only
> CombineHiveInputFormat is supported, and some special compression formats
> are not handled. If sampling fails, the input of the MapReduce job will be
> the whole table/partition. Sampling is done at the HDFS block level, so the
> sampling granularity is the block size. For example, if the block size is
> 256MB, you get 256MB of data even if n% of the input size is only 100MB.
>
> In the following example, 0.1% or more of the input size will be used for
> the query:
>
> SELECT *
> FROM source TABLESAMPLE(0.1 PERCENT) s;
>
> If you want to sample the same data with different blocks, you can change
> the seed number:
>
> set hive.sample.seednumber=<INTEGER>;
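To make the block-granularity rule from the quoted wiki text concrete, here is a rough Python model of how much data TABLESAMPLE(n PERCENT) ends up reading. This is not Hive code, and `sampled_input_bytes` is a hypothetical helper. It assumes whole blocks of one fixed size are picked until at least n% of the input is covered, and it ignores per-file splits, short last blocks, and the effect of hive.sample.seednumber on which blocks are chosen.

```python
import math

MB = 1024 * 1024


def sampled_input_bytes(total_bytes: int, percent: float, block_bytes: int) -> int:
    """Approximate bytes read by TABLESAMPLE(n PERCENT) in a simplified model.

    Assumption (not Hive's actual code): whole blocks are taken until at
    least `percent` of the input size is covered, so the result is rounded
    up to a multiple of the block size and capped at the full input size.
    """
    if total_bytes <= 0 or percent <= 0:
        return 0
    target = total_bytes * percent / 100.0           # n% of the input size
    blocks_needed = math.ceil(target / block_bytes)  # round up to whole blocks
    return min(total_bytes, blocks_needed * block_bytes)


if __name__ == "__main__":
    # 1% of 10,000 MB is 100 MB, but a whole 256 MB block is read.
    print(sampled_input_bytes(10_000 * MB, 1, 256 * MB) // MB)
```

With a 256MB block size, a 100MB target still costs one full block, which matches the wiki's example; a 300MB target would round up to two blocks (512MB).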