Great analysis. Couldn't agree more.
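
For anyone following along, here is a rough Python model of the explanation quoted below. It is only a sketch of the idea, not how Pig is actually implemented: the function names group_all and foreach_count are made up for illustration, and the "shuffle" is just a dictionary. The assumption is that GROUP ... ALL tags every tuple with a synthetic key 'all' and that FOREACH runs once per key that actually shows up.

```python
from collections import defaultdict


def group_all(relation):
    """Hypothetical model of GROUP <relation> ALL.

    Every input tuple is tagged with the synthetic key 'all'. A key only
    exists in the output if at least one tuple carried it.
    """
    shuffled = defaultdict(list)
    for row in relation:
        shuffled["all"].append(row)
    # One (key, bag) tuple per key that was actually seen.
    return list(shuffled.items())


def foreach_count(grouped):
    """Hypothetical model of FOREACH ... GENERATE COUNT(bag)."""
    # Runs once per grouped tuple; with no grouped tuples it never runs.
    return [(len(bag),) for _key, bag in grouped]


records = [(1, False), (2, False)]
valid = [r for r in records if r[1]]        # FILTER records BY valid == true
invalid = [r for r in records if not r[1]]  # FILTER records BY valid == false

print(foreach_count(group_all(invalid)))  # [(2,)]
print(foreach_count(group_all(valid)))    # []
```

In this model the empty result is not a count of zero; there is simply no 'all' group for the count to be evaluated on, which matches the empty DUMP described in the original question.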

2013/5/29 mehmet <[email protected]>

> I tried your code on 0.10 and it gives the same result. I can logically
> explain why it gives you this result, although I am not convinced that
> would be the desired outcome.
>
> If you think of it as there is a synthetic implicit key 'all' in each
> tuple, and you are grouping over that, you can see why there is no output:
> no tuples, nothing to group over (no reducer sees the key 'all', because
> it doesn't exist). Although, I would contend that when there are no
> tuples, it might be ideal to output (all,{}) as the output of the group
> all.
>
>
> ________________________________
> From: Marco Brinkmann <[email protected]>
> To: [email protected]
> Sent: Wednesday, May 29, 2013 8:43 AM
> Subject: Re: Count empty relation after filtering
>
>
> I tried to explain why, in my basic understanding, an operation in a foreach
> (count, count_star or anything else) will not lead to any success. And I
> would still appreciate any hints or tricks to achieve the above.
>
>
> 2013/5/29 Shahab Yunus <[email protected]>
>
> > So basically this means that we were trying to look at this from an
> > RDBMS' SQL perspective, where 'SELECT COUNT(*) FROM TABLE' returns 0
> > even if there is nothing in the result set, and that is why we ignored
> > the possibility that FOREACH might not be executed at all (which could
> > be by design)?
> >
> > -Shahab
> >
> >
> > On Wed, May 29, 2013 at 10:13 AM, Marco Brinkmann
> > <[email protected]> wrote:
> >
> > > Thanks, but this does not change anything. My personal guess (and I
> > > have only been working with Pig for a few days) is that FOREACH will
> > > never be executed, because the relation 'test' is empty.
> > >
> > >
> > > 2013/5/29 Shahab Yunus <[email protected]>
> > >
> > > > Try COUNT_STAR.
> > > >
> > > > -Shahab
> > > >
> > > >
> > > > On Wed, May 29, 2013 at 9:55 AM, Marco Brinkmann
> > > > <[email protected]> wrote:
> > > >
> > > > > Hi everybody,
> > > > >
> > > > > I have a rather simple question and scenario, but still I could
> > > > > not find an answer in the documentation or in other resources:
> > > > >
> > > > > id, valid
> > > > > (1, false)
> > > > > (2, false)
> > > > >
> > > > > records = LOAD 'test.csv' USING PigStorage(',') AS (id:long, valid:boolean);
> > > > >
> > > > > test = FILTER records BY valid == true;
> > > > > test_count = FOREACH (GROUP test ALL) GENERATE COUNT(test);
> > > > >
> > > > > DUMP test_count;
> > > > >
> > > > >
> > > > > I would expect that 'test_count' now contains '0'. But the dump
> > > > > is completely empty (with 'valid == false' I get '(2)' as
> > > > > expected). I use pig 0.11.1.
> > > > >
> > > > > Could someone point me in the right direction?
> > > > >
> > > > > Cheers, Marco
