Hi, can anyone please help me find the root cause of this issue?

Thanks,
Joel

On Wed, Nov 18, 2015 at 1:04 AM, Sam Joe <[email protected]> wrote:

Hi Andrew, I tried that too. Every field has correct data.

Thanks,
Joel

On Wed, Nov 18, 2015 at 12:55 AM, Andrew Oliver <[email protected]> wrote:

Project just screen_name. If it is blank or empty you have your answer.

On Nov 17, 2015 23:47, "Sam Joe" <[email protected]> wrote:

Debug is on; verbose I have yet to try.

Thx.

On Tue, Nov 17, 2015 at 11:45 PM, Arvind S <[email protected]> wrote:

Have you tried

grunt> set debug on;
grunt> set verbose on;

This gives some counters which might help.

*Cheers !!*
Arvind

On Wed, Nov 18, 2015 at 9:51 AM, Sam Joe <[email protected]> wrote:

Hi Arvind,

Thanks, but I ensured that each element is populated into its respective field. I also ensured that the data is clean, since the record which is getting eliminated is processed fine if it is the only record processed.

How do I find the root cause? I am not getting anything from the server logs or from the application logs. Is there any place I should look?

Thanks,
Joel

On Tue, Nov 17, 2015 at 11:06 PM, Arvind S <[email protected]> wrote:

Hi,

If you are reading JSON, then ensure that the file content is parsed correctly by Pig before you do the grouping. A simple dump sometimes does not show whether the JSON was parsed into multiple columns or the entire line was read as one string into the first column only.
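The two checks suggested above (verify the JSON parse, then project just the grouping key) can be sketched in Pig Latin. This is only an illustration: the thread never says which loader or field names the poster used, so JsonLoader, the file name, and the schema below are all assumptions.

```pig
-- Load with an explicit schema; if a line fails to parse, its fields come
-- back as nulls rather than silently landing in the first column.
-- (JsonLoader, 'tweets.json', and these field names are assumptions.)
final = LOAD 'tweets.json'
        USING JsonLoader('text:chararray, language:chararray, screen_name:chararray');

-- Per Andrew's suggestion: project only the grouping key and look for
-- blank or null values before grouping.
keys = FOREACH final GENERATE screen_name;
bad_keys = FILTER keys BY (screen_name IS NULL) OR (screen_name == '');
dump bad_keys;
```

If bad_keys is non-empty, the parse (not the GROUP) is the likely culprit.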
*Cheers !!*
Arvind

On Wed, Nov 18, 2015 at 4:59 AM, Sam Joe <[email protected]> wrote:

Hi Arvind,

You are right. It works fine in local mode; no records are eliminated.

I now need to find out why some records are getting eliminated in mapreduce mode.

Any suggestions on troubleshooting steps for finding the root cause in mapreduce mode? Which logs should be checked, etc.?

Appreciate any help!

Thanks,
Joel

On Mon, Nov 16, 2015 at 11:32 PM, Arvind S <[email protected]> wrote:

Tested on Pig 0.15 using your data, in local mode; could not reproduce the issue.

==================================================
final_by_lsn_g = GROUP final_by_lsn BY screen_name;

(Ian_hoch,{(en,Ian_hoch)})
(gwenshap,{(en,gwenshap)})
(p2people,{(en,p2people)})
(DoThisBest,{(en,DoThisBest)})
(wesleyyuhn1,{(en,wesleyyuhn1)})
(GuitartJosep,{(en,GuitartJosep)})
(Komalmittal91,{(en,Komalmittal91)})
(LornaGreenNWC,{(en,LornaGreenNWC)})
(W4_Jobs_in_ARZ,{(en,W4_Jobs_in_ARZ)})
(innovatesocialm,{(en,innovatesocialm)})
==================================================
final_by_lsn_g = GROUP final_by_lsn BY language;

(en,{(en,DoThisBest),(en,wesleyyuhn1),(en,W4_Jobs_in_ARZ),(en,p2people),(en,Ian_hoch),(en,Komalmittal91),(en,innovatesocialm),(en,gwenshap),(en,GuitartJosep),(en,LornaGreenNWC)})
==================================================

Suggestions:
- Try local mode to reproduce the issue (if you have not already done so).
- Close all old sessions and open a new one (I know it's dumb, but it has helped me sometimes).

*Cheers !!*
Arvind

On Tue, Nov 17, 2015 at 8:09 AM, Sam Joe <[email protected]> wrote:

Hi,

I reproduced the issue with fewer columns as well.

dump final_by_lsn;

(en,LornaGreenNWC)
(en,GuitartJosep)
(en,gwenshap)
(en,innovatesocialm)
(en,Komalmittal91)
(en,Ian_hoch)
(en,p2people)
(en,W4_Jobs_in_ARZ)
(en,wesleyyuhn1)
(en,DoThisBest)

grunt> final_by_lsn_g = GROUP final_by_lsn BY screen_name;
grunt> dump final_by_lsn_g;

(gwenshap,{(en,gwenshap)})
(p2people,{(en,p2people),(en,p2people),(en,p2people)})
(GuitartJosep,{(en,GuitartJosep),(en,GuitartJosep),(en,GuitartJosep)})
(W4_Jobs_in_ARZ,{(en,W4_Jobs_in_ARZ),(en,W4_Jobs_in_ARZ),(en,W4_Jobs_in_ARZ)})

Steps I tried to find the root cause:
- Removing special characters from the data
- Setting the log level to 'Debug'

However, I couldn't find a clue about the problem.

Can someone please help me troubleshoot the issue?

Thanks,
Joel

On Fri, Nov 13, 2015 at 12:18 PM, Steve Terrell <[email protected]> wrote:

Please try reproducing the problem with the smallest amount of data possible. Use as few rows and the smallest strings possible that still demonstrate the discrepancy, and then repost your problem. Doing so will make your request easier to digest for the readers of the group, and you might even discover a problem in your original data if you cannot reproduce it at a smaller scale.

Thanks,
Steve

On Fri, Nov 13, 2015 at 10:28 AM, Sam Joe <[email protected]> wrote:

Hi,

I am trying to group a table (final) containing 10 records by a column screen_name using the following command.
final_by_sn = GROUP final BY screen_name;

When I dump the final_by_sn table, only 4 records are returned, as shown below:

grunt> dump final_by_sn;

(gwenshap,{(.@bigdata used this photo in his blog post and made me realize how much I miss Japan: https://t.co/XdglxbLBhN,en,gwenshap,,4992,1887,2943)})
(p2people,{(6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,2437),(6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,2437),(6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,2437)})
(GuitartJosep,{(#BigData: What it can and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,140),(#BigData: What it can and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,140),(#BigData: What it can and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,140)})
(W4_Jobs_in_ARZ,{(Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,433),(Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,433),(Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,433)})

dump final;

(RT @lordlancaster: Absolutely blown away by @SciTecDaresbury! 'Proper' Big Data, Smart Cities, Internet of Things & more! #TechNorth http:/…,en,LornaGreenNWC,8,166,188,Mon May 12 10:19:39 +0000 2014,654395184428515332)
(#BigData: What it can and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,Thu Jun 18 10:20:02 +0000 2015,654395189595869184)
(.@bigdata used this photo in his blog post and made me realize how much I miss Japan: https://t.co/XdglxbLBhN,en,gwenshap,,4992,1887,Mon Oct 15 20:49:39 +0000 2007,654395195581009920)
("Global Release [Big Data Book] Profit From Science" on @LinkedIn http://t.co/WnJ2HwthYF Congrats to George Danner!,en,innovatesocialm,,1517,1712,Wed Sep 12 13:46:43 +0000 2012,654395207065034752)
(Hi, BesPardon Don't Forget to follow -->> http://t.co/Dahu964w5U Thanks.. http://t.co/9kKXJ0GQcT,en,Komalmittal91,,51,0,Thu Feb 12 16:44:50 +0000 2015,654395216208752641)
(On Google Books, language, and the possible limits of big data https://t.co/OEebZSK952,en,Ian_hoch,,63,107,Fri Aug 31 16:25:09 +0000 2012,654395216057659392)
(6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,Wed Mar 04 06:17:09 +0000 2009,654395220373729280)
(Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,Fri Aug 29 09:32:31 +0000 2014,654395236718911488)
(#Appboy expands suite of #mobile #analytics @venturebeat @wesleyyuhn1 http://t.co/85P6vEJg08 #MarTech #automation http://t.co/rWqzNNt1vW,en,wesleyyuhn1,,1531,1927,Mon Jul 21 12:35:12 +0000 2014,654395243975065600)
(Best Cloud Hosting and CDN services for Web Developers http://t.co/9uf6IaUIlM #cdn #cloudcomputing #cloudhosting #webmasters #websites,en,DoThisBest,,816,1092,Mon Nov 26 18:34:20 +0000 2012,654395246025904128)
grunt>

Could you please help me understand why 6 records are eliminated while doing a group by?

Thanks,
Joel
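The mapreduce-mode debugging steps discussed in this thread can be combined into one grunt session. A sketch, reusing the thread's aliases; the explain step is an addition not mentioned by the posters:

```pig
-- Counters suggested by Arvind; in mapreduce mode these are printed with
-- the job statistics after each dump/store.
set debug on;
set verbose on;

-- Inspect the compiled map/reduce plan before running it: a combiner or
-- extra stage here is one place where behaviour can differ from local mode.
final_by_sn = GROUP final BY screen_name;
explain final_by_sn;
dump final_by_sn;
```

Pig also prints a job id (job_...) after each run; the task logs for that id in the JobTracker/ResourceManager UI are usually the next place to look when nothing shows up in the Pig client log.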
