Hi,

Can anyone please help me find the root cause of this issue?

Thanks,
Joel

On Wed, Nov 18, 2015 at 1:04 AM, Sam Joe <[email protected]> wrote:

> Hi Andrew, I tried that too. Every field has the correct data.
>
> Thanks,
> Joel
>
> On Wed, Nov 18, 2015 at 12:55 AM, Andrew Oliver <[email protected]> wrote:
>
>> Project just screen_name. If it is blank or empty, you have your answer.
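>>
>> Something like this (a rough sketch; final_by_lsn is the relation name
>> from your dumps below, swap in your own):
>>
>> grunt> sn_only = FOREACH final_by_lsn GENERATE screen_name;  -- project just the key
>> grunt> dump sn_only;
>> grunt> blanks = FILTER final_by_lsn BY screen_name IS NULL OR screen_name == '';
>> grunt> dump blanks;  -- any output here means bad keys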
>> On Nov 17, 2015 23:47, "Sam Joe" <[email protected]> wrote:
>>
>> > Debug is on; verbose I still have to try.
>> >
>> > Thx.
>> >
>> > On Tue, Nov 17, 2015 at 11:45 PM, Arvind S <[email protected]> wrote:
>> >
>> > > have you tried
>> > > grunt> set debug on;
>> > > grunt> set verbose on;
>> > >
>> > > this gives some counters which might help ..
>> > >
>> > >
>> > > *Cheers !!*
>> > > Arvind
>> > >
>> > > On Wed, Nov 18, 2015 at 9:51 AM, Sam Joe <[email protected]> wrote:
>> > >
>> > > > Hi Arvind,
>> > > >
>> > > > Thanks, but I ensured that each element is populated into its
>> > > > respective field. I also ensured that the data is clean, since the
>> > > > record which is getting eliminated is processed fine when it is the
>> > > > only record being processed.
>> > > >
>> > > > How do I find the root cause? I am not getting anything from the
>> > > > server logs or from the application logs. Is there any other place I
>> > > > should look?
>> > > >
>> > > >
>> > > > Thanks,
>> > > > Joel
>> > > >
>> > > > On Tue, Nov 17, 2015 at 11:06 PM, Arvind S <[email protected]> wrote:
>> > > >
>> > > > > Hi ..
>> > > > > If you are reading JSON, then ensure that the file content is
>> > > > > parsed correctly by Pig before you do the grouping.
>> > > > > A simple dump sometimes does not show whether the JSON was parsed
>> > > > > into multiple columns or the entire line was read as one string
>> > > > > into the first column only.
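>> > > > >
>> > > > > A quick way to check (a minimal sketch; the relation and field
>> > > > > names are taken from the earlier messages in this thread):
>> > > > >
>> > > > > grunt> describe final_by_lsn;  -- shows the schema Pig inferred
>> > > > > grunt> sn_only = FOREACH final_by_lsn GENERATE screen_name;
>> > > > > grunt> dump sn_only;
>> > > > >
>> > > > > If describe reports a single chararray field, or the projection
>> > > > > comes back empty, the whole line was read into the first column.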
>> > > > >
>> > > > >
>> > > > >
>> > > > > *Cheers !!*
>> > > > > Arvind
>> > > > >
>> > > > > On Wed, Nov 18, 2015 at 4:59 AM, Sam Joe <[email protected]> wrote:
>> > > > >
>> > > > > > Hi Arvind,
>> > > > > >
>> > > > > > You are right. It works fine in local mode. No records eliminated.
>> > > > > >
>> > > > > > I now need to find out why some records are getting eliminated
>> > > > > > when using mapreduce mode.
>> > > > > >
>> > > > > > Any suggestions on troubleshooting steps for finding the root
>> > > > > > cause in mapreduce mode? Which logs should be checked, etc.?
>> > > > > >
>> > > > > > Appreciate any help!
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Joel
>> > > > > >
>> > > > > > On Mon, Nov 16, 2015 at 11:32 PM, Arvind S <[email protected]> wrote:
>> > > > > >
>> > > > > > > tested on Pig 0.15 using your data and in local mode .. could
>> > > > > > > not reproduce the issue ..
>> > > > > > > ==================================================
>> > > > > > > final_by_lsn_g = GROUP final_by_lsn BY screen_name;
>> > > > > > >
>> > > > > > > (Ian_hoch,{(en,Ian_hoch)})
>> > > > > > > (gwenshap,{(en,gwenshap)})
>> > > > > > > (p2people,{(en,p2people)})
>> > > > > > > (DoThisBest,{(en,DoThisBest)})
>> > > > > > > (wesleyyuhn1,{(en,wesleyyuhn1)})
>> > > > > > > (GuitartJosep,{(en,GuitartJosep)})
>> > > > > > > (Komalmittal91,{(en,Komalmittal91)})
>> > > > > > > (LornaGreenNWC,{(en,LornaGreenNWC)})
>> > > > > > > (W4_Jobs_in_ARZ,{(en,W4_Jobs_in_ARZ)})
>> > > > > > > (innovatesocialm,{(en,innovatesocialm)})
>> > > > > > > ==================================================
>> > > > > > > final_by_lsn_g = GROUP final_by_lsn BY language;
>> > > > > > >
>> > > > > > > (en,{(en,DoThisBest),(en,wesleyyuhn1),(en,W4_Jobs_in_ARZ),(en,p2people),(en,Ian_hoch),(en,Komalmittal91),(en,innovatesocialm),(en,gwenshap),(en,GuitartJosep),(en,LornaGreenNWC)})
>> > > > > > > ==================================================
>> > > > > > >
>> > > > > > > suggestions ..
>> > > > > > > > try in local mode to reproduce the issue .. (if you have not
>> > > > > > > already done so)
>> > > > > > > > close all old sessions and open a new one ... (i know it's
>> > > > > > > dumb .. but it has helped me sometimes)
>> > > > > > >
>> > > > > > >
>> > > > > > > *Cheers !!*
>> > > > > > > Arvind
>> > > > > > >
>> > > > > > > On Tue, Nov 17, 2015 at 8:09 AM, Sam Joe <[email protected]> wrote:
>> > > > > > >
>> > > > > > > > Hi,
>> > > > > > > >
>> > > > > > > > I reproduced the issue with fewer columns as well.
>> > > > > > > >
>> > > > > > > > dump final_by_lsn;
>> > > > > > > >
>> > > > > > > > (en,LornaGreenNWC)
>> > > > > > > > (en,GuitartJosep)
>> > > > > > > > (en,gwenshap)
>> > > > > > > > (en,innovatesocialm)
>> > > > > > > > (en,Komalmittal91)
>> > > > > > > > (en,Ian_hoch)
>> > > > > > > > (en,p2people)
>> > > > > > > > (en,W4_Jobs_in_ARZ)
>> > > > > > > > (en,wesleyyuhn1)
>> > > > > > > > (en,DoThisBest)
>> > > > > > > >
>> > > > > > > > grunt> final_by_lsn_g = GROUP final_by_lsn BY screen_name;
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > grunt> dump final_by_lsn_g;
>> > > > > > > >
>> > > > > > > > (gwenshap,{(en,gwenshap)})
>> > > > > > > > (p2people,{(en,p2people),(en,p2people),(en,p2people)})
>> > > > > > > > (GuitartJosep,{(en,GuitartJosep),(en,GuitartJosep),(en,GuitartJosep)})
>> > > > > > > > (W4_Jobs_in_ARZ,{(en,W4_Jobs_in_ARZ),(en,W4_Jobs_in_ARZ),(en,W4_Jobs_in_ARZ)})
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > Steps I tried to find the root cause:
>> > > > > > > > - Removing special characters from the data
>> > > > > > > > - Setting the log level to 'Debug'
>> > > > > > > > However, neither gave me a clue about the problem.
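>> > > > > > > >
>> > > > > > > > One more way to see the discrepancy in numbers (a sketch
>> > > > > > > > using the relation names from above) is to compare record
>> > > > > > > > counts before and after the group:
>> > > > > > > >
>> > > > > > > > grunt> total_cnt = FOREACH (GROUP final_by_lsn ALL) GENERATE COUNT(final_by_lsn);  -- input rows
>> > > > > > > > grunt> dump total_cnt;
>> > > > > > > > grunt> grp_cnt = FOREACH final_by_lsn_g GENERATE group, COUNT(final_by_lsn);  -- rows per group
>> > > > > > > > grunt> dump grp_cnt;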
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > Can someone please help me troubleshoot the issue?
>> > > > > > > >
>> > > > > > > > Thanks,
>> > > > > > > > Joel
>> > > > > > > >
>> > > > > > > > On Fri, Nov 13, 2015 at 12:18 PM, Steve Terrell <[email protected]> wrote:
>> > > > > > > >
>> > > > > > > > > Please try reproducing the problem with the smallest amount
>> > > > > > > > > of data possible.  Use as few rows and the smallest strings
>> > > > > > > > > possible that still demonstrate the discrepancy, and then
>> > > > > > > > > repost your problem.  Doing so will make your request easier
>> > > > > > > > > for the readers of the group to digest, and you might even
>> > > > > > > > > discover a problem in your original data if you cannot
>> > > > > > > > > reproduce it on a smaller scale.
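>> > > > > > > > >
>> > > > > > > > > For example, something like this would be enough to post
>> > > > > > > > > (a sketch; the file name and schema here are invented):
>> > > > > > > > >
>> > > > > > > > > grunt> final = LOAD 'mini.tsv' AS (tweet:chararray, language:chararray, screen_name:chararray);  -- two or three hand-made rows
>> > > > > > > > > grunt> final_by_sn = GROUP final BY screen_name;
>> > > > > > > > > grunt> dump final_by_sn;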
>> > > > > > > > >
>> > > > > > > > > Thanks,
>> > > > > > > > >     Steve
>> > > > > > > > >
>> > > > > > > > > On Fri, Nov 13, 2015 at 10:28 AM, Sam Joe <[email protected]> wrote:
>> > > > > > > > >
>> > > > > > > > > > Hi,
>> > > > > > > > > >
>> > > > > > > > > > I am trying to group a table (final) containing 10
>> > > > > > > > > > records by a column screen_name, using the following
>> > > > > > > > > > command:
>> > > > > > > > > >
>> > > > > > > > > > final_by_sn = GROUP final BY screen_name;
>> > > > > > > > > >
>> > > > > > > > > > When I dump the final_by_sn table, only 4 records are
>> > > > > > > > > > returned, as shown below:
>> > > > > > > > > >
>> > > > > > > > > > grunt> dump final_by_sn;
>> > > > > > > > > >
>> > > > > > > > > > (gwenshap,{(.@bigdata used this photo in his blog post and made me realize how much I miss Japan: https://t.co/XdglxbLBhN,en,gwenshap,,4992,1887,2943)})
>> > > > > > > > > > (p2people,{(6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,2437),(6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,2437),(6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,2437)})
>> > > > > > > > > > (GuitartJosep,{(#BigData: What it can and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,140),(#BigData: What it can and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,140),(#BigData: What it can and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,140)})
>> > > > > > > > > > (W4_Jobs_in_ARZ,{(Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,433),(Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,433),(Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,433)})
>> > > > > > > > > >
>> > > > > > > > > > dump final;
>> > > > > > > > > >
>> > > > > > > > > > (RT @lordlancaster: Absolutely blown away by @SciTecDaresbury! 'Proper' Big Data, Smart Cities, Internet of Things &amp; more! #TechNorth http:/…,en,LornaGreenNWC,8,166,188,Mon May 12 10:19:39 +0000 2014,654395184428515332)
>> > > > > > > > > > (#BigData: What it can and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,Thu Jun 18 10:20:02 +0000 2015,654395189595869184)
>> > > > > > > > > > (.@bigdata used this photo in his blog post and made me realize how much I miss Japan: https://t.co/XdglxbLBhN,en,gwenshap,,4992,1887,Mon Oct 15 20:49:39 +0000 2007,654395195581009920)
>> > > > > > > > > > ("Global Release [Big Data Book] Profit From Science" on @LinkedIn http://t.co/WnJ2HwthYF Congrats to George Danner!,en,innovatesocialm,,1517,1712,Wed Sep 12 13:46:43 +0000 2012,654395207065034752)
>> > > > > > > > > > (Hi, BesPardon Don't Forget to follow --&gt;&gt; http://t.co/Dahu964w5U Thanks.. http://t.co/9kKXJ0GQcT,en,Komalmittal91,,51,0,Thu Feb 12 16:44:50 +0000 2015,654395216208752641)
>> > > > > > > > > > (On Google Books, language, and the possible limits of big data https://t.co/OEebZSK952,en,Ian_hoch,,63,107,Fri Aug 31 16:25:09 +0000 2012,654395216057659392)
>> > > > > > > > > > (6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,Wed Mar 04 06:17:09 +0000 2009,654395220373729280)
>> > > > > > > > > > (Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,Fri Aug 29 09:32:31 +0000 2014,654395236718911488)
>> > > > > > > > > > (#Appboy expands suite of #mobile #analytics @venturebeat @wesleyyuhn1 http://t.co/85P6vEJg08 #MarTech #automation http://t.co/rWqzNNt1vW,en,wesleyyuhn1,,1531,1927,Mon Jul 21 12:35:12 +0000 2014,654395243975065600)
>> > > > > > > > > > (Best Cloud Hosting and CDN services for Web Developers http://t.co/9uf6IaUIlM #cdn #cloudcomputing #cloudhosting #webmasters #websites,en,DoThisBest,,816,1092,Mon Nov 26 18:34:20 +0000 2012,654395246025904128)
>> > > > > > > > > > grunt>
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > Could you please help me understand why 6 records are
>> > > > > > > > > > eliminated when doing a GROUP BY?
>> > > > > > > > > >
>> > > > > > > > > > Thanks,
>> > > > > > > > > > Joel
>> > > > > > > > > >