Re: Sampling from a single column

Lefty Leverenz Fri, 14 Feb 2014 02:55:14 -0800

>
> ... saw a simple sample() function while browsing the documentation ...



I grepped an export of the Hive wiki for 'sample(' and 'sample (' but only
found tablesample in these three docs:

   - https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-Sampling
   - https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Sampling
   - https://cwiki.apache.org/confluence/display/Hive/HBaseBulkLoad#HBaseBulkLoad-PrepareRangePartitioning
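
For reference, here is a rough sketch of the bucket-sampling form of TABLESAMPLE that those pages describe. The table `src` and column `key` are borrowed from the query quoted below, and the 1-in-10 bucket ratio and the limit of 10 are arbitrary choices for illustration, not values anyone in this thread suggested.

```sql
-- Sketch only: `src` and `key` are placeholder names taken from the quoted
-- query; BUCKET 1 OUT OF 10 and LIMIT 10 are arbitrary illustrative values.
-- TABLESAMPLE attaches to a table in the FROM clause, so here the rows are
-- sampled first and the distinct values are taken from that sample.
SELECT DISTINCT s.key
FROM src TABLESAMPLE (BUCKET 1 OUT OF 10 ON rand()) s
LIMIT 10;
```

Because the rows are sampled before DISTINCT is applied, values that occur often in the table are more likely to show up, so this is not a uniform sample of the distinct values. Getting that still seems to need the subquery form.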


-- Lefty


On Wed, Feb 12, 2014 at 8:19 PM, Navis류승우 <navis....@nexr.com> wrote:

> If the sample has to be drawn from the distinct values, a subquery is
> probably inevitable. Something like:
>
> select x from (select distinct key as x from src)a where rand() > 0.9
> limit 10;
>
>
>
> 2014-02-12 6:07 GMT+09:00 Oliver Keyes <oke...@wikimedia.org>:
>
>> Hey all
>>
>> So, what I'm looking to do is get N randomly-sampled distinct values from
>> a column in a table. I'm kind of flummoxed by how to do this without using
>> TABLESAMPLE, which would require me to add Yet Another Subquery (it'd be
>> 'select these values, from this sample, from these distinct values'). I
>> could swear I saw a simple sample() function while browsing the
>> documentation just last week, but I'll be damned if I can find it again.
>> Can anyone help me out, or is Yet Another Subquery the way to go?
>>
>> Thanks!
>>
>
>
