Re: Sampling from a single column

2014-02-14 Thread Lefty Leverenz
> > ... saw a simple sample() function while browsing the documentation ... I grepped an export of the Hive wiki for 'sample(' and 'sample (' but only found tablesample in these three docs: - https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-Sampling - https://cwiki.

Re: Sampling from a single column

2014-02-12 Thread Navis류승우
If the distinct values have to be sampled, a subquery is unavoidable. Something like: select x from (select distinct key as x from src) a where rand() > 0.9 limit 10; 2014-02-12 6:07 GMT+09:00 Oliver Keyes: > Hey all > > So, what I'm looking to do is get N randomly-sampled distinct values from > a colum
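
The subquery shape in this reply (deduplicate in the inner query, sample in the outer one) can be tried locally. The following is only a minimal sketch under stated assumptions: SQLite (through Python's sqlite3 module) stands in for Hive, the `src` table is filled with toy data, and `sample_distinct` is a hypothetical helper name. It also swaps the `rand() > 0.9` filter for a shuffle (`ORDER BY random()`), because a filter keeps each distinct value with only about 10% probability and so can return fewer than N rows when there are few distinct values.

```python
import sqlite3


def sample_distinct(conn, n):
    """Return up to n randomly chosen distinct values of src.key."""
    # Inner query collapses duplicates; outer query shuffles and truncates.
    # SQLite's random() is used here in place of Hive's rand() (assumption).
    query = (
        "SELECT x FROM (SELECT DISTINCT key AS x FROM src) a "
        "ORDER BY random() LIMIT ?"
    )
    return [row[0] for row in conn.execute(query, (n,))]


# Toy stand-in for the `src` table: 50 distinct keys, each stored 3 times.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (key INTEGER)")
conn.executemany(
    "INSERT INTO src VALUES (?)",
    [(k,) for k in range(50) for _ in range(3)],
)

picked = sample_distinct(conn, 10)
print(picked)  # 10 distinct keys; the order and values vary from run to run
```

Whether the same ordering trick is acceptable in Hive depends on table size, since a global sort over all distinct values may be expensive there; treat that part as something to verify against your own Hive setup.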

Sampling from a single column

2014-02-11 Thread Oliver Keyes
Hey all. So, what I'm looking to do is get N randomly-sampled distinct values from a column in a table. I'm kind of flummoxed by how to do this without using TABLESAMPLE, which would require me to add Yet Another Subquery (it'd be 'select these values, from this sample, from these distinct values')