> > ... saw a simple sample() function while browsing the documentation ...
I grepped an export of the Hive wiki for 'sample(' and 'sample (' but only found tablesample in these three docs:

- https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-Sampling
- https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Sampling
- https://cwiki.apache.org/confluence/display/Hive/HBaseBulkLoad#HBaseBulkLoad-PrepareRangePartitioning

-- Lefty

On Wed, Feb 12, 2014 at 8:19 PM, Navis류승우 <navis....@nexr.com> wrote:

> If it should be sampled using subquery would be inevitable, something like,
>
> select x from (select distinct key as x from src)a where rand() > 0.9
> limit 10;
>
> 2014-02-12 6:07 GMT+09:00 Oliver Keyes <oke...@wikimedia.org>:
>
>> Hey all
>>
>> So, what I'm looking to do is get N randomly-sampled distinct values from
>> a column in a table. I'm kind of flummoxed by how to do this without using
>> TABLESAMPLE, which would require me to add Yet Another Subquery (it'd be
>> 'select these values, from this sample, from these distinct values'). I
>> could swear I saw a simple sample() function while browsing the
>> documentation just last week, but I'll be damned if I can find it again.
>> Can anyone help me out, or is Yet Another Subquery the way to go?
>>
>> Thanks!