Re: Tablesample doubling

2013-07-29 Thread Stephen Sprague
+1 for documentation. sometimes it surprises you. :)

On Mon, Jul 29, 2013 at 7:11 PM, j.barrett Strausser <j.barrett.straus...@gmail.com> wrote:
> Nevermind I see in the docs, it is rows PER SPLIT.
>
> -b
>
> On Mon, Jul 29, 2013 at 9:52 PM, j.barrett Strausser <j.barrett.straus...@gmail.

Re: Tablesample doubling

2013-07-29 Thread j.barrett Strausser
Nevermind I see in the docs, it is rows PER SPLIT.

-b

On Mon, Jul 29, 2013 at 9:52 PM, j.barrett Strausser <j.barrett.straus...@gmail.com> wrote:
> SELECT COUNT(*) FROM sparse_features_small;
>
> And I receive back :
>
> Total MapReduce CPU Time Spent: 3 seconds 330 msec
> OK
> 10
>
> Rath
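
For anyone else hitting this: the Hive sampling docs say the row count in TABLESAMPLE(N ROWS) is applied to each input split, so the total returned depends on how many splits the source table has. Below is a toy Python model of that arithmetic, not Hive code. The function name and the two-split layout are assumptions for illustration only; the thread does not say how many splits the source table actually had, though two splits would be consistent with getting 10 rows for N = 5.

```python
# Toy model (not Hive code): the row limit is applied per input split,
# so the total row count scales with the number of splits.

def sample_rows_per_split(splits, n):
    """Take up to n rows from every split and concatenate the results."""
    sampled = []
    for split in splits:
        sampled.extend(split[:n])
    return sampled

# Assumed layout for illustration: a source table stored as two splits.
splits = [list(range(0, 100)), list(range(100, 200))]

rows = sample_rows_per_split(splits, 5)
print(len(rows))  # 10: 5 rows from each of the 2 splits
```

With a single split the same call would return 5 rows, which is why the result can look correct on small inputs and then double on larger ones.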

Re: Tablesample doubling

2013-07-29 Thread j.barrett Strausser
SELECT COUNT(*) FROM sparse_features_small;

And I receive back :

Total MapReduce CPU Time Spent: 3 seconds 330 msec
OK
10

Rather than the expected 5

I am running hive 11.2

On Mon, Jul 29, 2013 at 9:51 PM, j.barrett Strausser <j.barrett.straus...@gmail.com> wrote:
> Hello All,
>
>

Tablesample doubling

2013-07-29 Thread j.barrett Strausser
Hello All,

Why does TABLESAMPLE(N rows) produce output with 2*N rows?

I have the following script:

DROP TABLE IF EXISTS sparse_features_small;

CREATE TABLE sparse_features_small
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
as SELECT * FROM sparse_feat