Re: Equi Join is taking for ever. 1 Task is Running while other 199 are complete

2015-04-20 Thread ๏̯͡๏
> not much larger than the fast finished tasks, is that normal? I am also interested in this case, as from statistics on the UI, how it indicates the task could have skew data? Yong

Re: Equi Join is taking for ever. 1 Task is Running while other 199 are complete

2015-04-20 Thread ๏̯͡๏
> of this task is not much larger than the fast finished tasks, is that normal? I am also interested in this case, as from statistics on the UI, how it indicates the task could have skew data? Yong

Re: Equi Join is taking for ever. 1 Task is Running while other 199 are complete

2015-04-14 Thread Imran Rashid
> skew data?
>
> Yong
>
> --------------
> Date: Mon, 13 Apr 2015 12:58:12 -0400
> Subject: Re: Equi Join is taking for ever. 1 Task is Running while other 199 are complete
> From: jcove...@gmail.com
> To: deepuj...@gmail.

Re: Equi Join is taking for ever. 1 Task is Running while other 199 are complete

2015-04-13 Thread Jonathan Coveney
> as from statistics on the UI, how it indicates the task could have skew data?
>
> Yong
>
> --
> Date: Mon, 13 Apr 2015 12:58:12 -0400
> Subject: Re: Equi Join is taking for ever. 1 Task is Running while other 199 are complete
> From: jcove...

RE: Equi Join is taking for ever. 1 Task is Running while other 199 are complete

2015-04-13 Thread java8964
from statistics on the UI, how it indicates the task could have skew data?
Yong
Date: Mon, 13 Apr 2015 12:58:12 -0400
Subject: Re: Equi Join is taking for ever. 1 Task is Running while other 199 are complete
From: jcove...@gmail.com
To: deepuj...@gmail.com
CC: user@spark.apache.org
I can promise

Re: Equi Join is taking for ever. 1 Task is Running while other 199 are complete

2015-04-13 Thread Jonathan Coveney
I can promise you that this is also a problem in the Pig world :) Not sure why it's not a problem for this data set, though... Are you sure that the two are running the exact same code? You should inspect your source data. Make a histogram for each and see what the data distribution looks like. If t
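A minimal sketch of the histogram check suggested above, using plain Scala collections as a stand-in for the two RDDs so that it runs without a cluster. The object name `KeyHistogram`, the helper names, and the sample data are hypothetical; on a real pair RDD, `countByKey()` would be the analogous call.

```scala
object KeyHistogram {
  // Count how many records share each join key.
  def keyCounts[K, V](records: Seq[(K, V)]): Map[K, Int] =
    records.groupBy(_._1).map { case (k, vs) => (k, vs.size) }

  // Return the n most frequent keys, largest count first.
  def topKeys[K, V](records: Seq[(K, V)], n: Int): Seq[(K, Int)] =
    keyCounts(records).toSeq.sortBy { case (_, count) => -count }.take(n)

  def main(args: Array[String]): Unit = {
    // Hypothetical sample in which itemId 0 acts as a catch-all value.
    val listings: Seq[(Long, String)] =
      Seq((0L, "a"), (0L, "b"), (0L, "c"), (7L, "d"), (9L, "e"))
    topKeys(listings, 3).foreach { case (k, c) => println(s"itemId=$k count=$c") }
  }
}
```

If one key accounts for a large share of the records on either side of the join, all of its rows land in the same task, which would be consistent with a single straggler.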

Re: Equi Join is taking for ever. 1 Task is Running while other 199 are complete

2015-04-13 Thread ๏̯͡๏
You mean there is a tuple in either RDD that has itemID = 0 or null? And what is a catch-all? Does that imply it is a good idea to run a filter on each RDD first? We do not do this using Pig on M/R. Is it required in the Spark world? On Mon, Apr 13, 2015 at 9:58 PM, Jonathan Coveney wrote: > My gue
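As a rough illustration of the "filter each side first" idea raised here, the sketch below drops records whose key is missing or 0 before joining. It uses plain Scala collections and a naive in-memory join as a stand-in for `RDD.join`; the object name `FilterBeforeJoin`, modelling a missing id as `Option[Long]`, and treating 0 as the catch-all are assumptions for illustration, not details confirmed by the thread.

```scala
object FilterBeforeJoin {
  // Keep only records whose key is present and non-zero.
  // Assumption: a missing id is modelled as None and 0 is the catch-all value.
  def dropCatchAll[V](records: Seq[(Option[Long], V)]): Seq[(Long, V)] =
    records.collect { case (Some(id), v) if id != 0L => (id, v) }

  // Naive in-memory equi-join, standing in for a join on pair RDDs.
  def join[A, B](left: Seq[(Long, A)], right: Seq[(Long, B)]): Seq[(Long, (A, B))] = {
    val rightByKey = right.groupBy(_._1)
    for {
      (k, a) <- left
      (_, b) <- rightByKey.getOrElse(k, Seq.empty)
    } yield (k, (a, b))
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data.
    val listings = dropCatchAll(Seq((Some(0L), "x"), (None, "y"), (Some(5L), "z")))
    val events = dropCatchAll(Seq((Some(0L), 10), (Some(5L), 20)))
    join(listings, events).foreach(println)
  }
}
```

Whether dropping those rows is acceptable depends on whether they carry meaning for the job; if they do, they would need to be handled separately rather than discarded.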

Re: Equi Join is taking for ever. 1 Task is Running while other 199 are complete

2015-04-13 Thread Jonathan Coveney
My guess would be data skew. Do you know if there is some item id that is a catch-all? Can it be null? Item id 0? Lots of data sets have this sort of value, and it always kills joins.
2015-04-13 11:32 GMT-04:00 ÐΞ€ρ@Ҝ (๏̯͡๏) :
> Code:
>
> val viEventsWithListings: RDD[(Long, (DetailInputRecord, VIS

Equi Join is taking for ever. 1 Task is Running while other 199 are complete

2015-04-13 Thread ๏̯͡๏
Code:
val viEventsWithListings: RDD[(Long, (DetailInputRecord, VISummary, Long))] =
  lstgItem.join(viEvents).map {
    case (itemId, (listing, viDetail)) =>
      val viSummary = new VISummary
      viSummary.leafCategoryId = listing.getLeafCategId().toInt
      viSummary.itemSiteId = listi