-1 from me -- same FetchFailed issue as the one Hector saw.

I am running the Netflix dataset and dumping out recommendations for all users.
The job shuffles around 100 GB of data on disk to run a reduceByKey per user on
utils.BoundedPriorityQueue. The same code runs fine with the MovieLens 1M dataset.
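
To give a rough idea of the access pattern (a sketch only -- names and types are
illustrative, and the real job uses the BoundedPriorityQueue mentioned above
rather than the sort-and-take shown here):

    import org.apache.spark.SparkContext._   // pair-RDD operations (needed on 1.1)
    import org.apache.spark.rdd.RDD

    // preds holds (userId, (itemId, score)) predictions; keep the n highest-scoring items per user
    def topNPerUser(preds: RDD[(Int, (Int, Double))], n: Int): RDD[(Int, Seq[(Int, Double)])] =
      preds
        .mapValues(p => Seq(p))                                  // wrap each prediction in a one-element list
        .reduceByKey((a, b) => (a ++ b).sortBy(-_._2).take(n))   // merge per-user lists, keep only the top n by score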

I gave Spark 10 nodes, 8 cores, and 160 GB of memory.

It fails with FetchFailed errors like the following:

14/11/23 11:51:22 WARN TaskSetManager: Lost task 28.0 in stage 188.0 (TID
2818, tblpmidn08adv-hdp.tdc.vzwcorp.com): FetchFailed(BlockManagerId(1,
tblpmidn03adv-hdp.tdc.vzwcorp.com, 52528, 0), shuffleId=35, mapId=28,
reduceId=28)

The behavior is consistent on master as well.

I tested it both on YARN and Standalone. I compiled the spark-1.1 branch
(assuming it has all the fixes from the RC2 tag).

I am now compiling the spark-1.0 branch to see if this issue shows up there as
well. If it is related to the hash/sort-based shuffle, it most likely won't show
up on 1.0.

Thanks.

Deb

On Thu, Nov 20, 2014 at 12:16 PM, Hector Yee <hector....@gmail.com> wrote:

> Whoops, I must have used the 1.2 preview and mixed them up.
>
> spark-shell -version shows version 1.2.0
>
> Will update the bug https://issues.apache.org/jira/browse/SPARK-4516 to
> 1.2
>
> On Thu, Nov 20, 2014 at 11:59 AM, Matei Zaharia <matei.zaha...@gmail.com>
> wrote:
>
> > Ah, I see. But the spark.shuffle.blockTransferService property doesn't
> > exist in 1.1 (AFAIK) -- what exactly are you doing to get this problem?
> >
> > Matei
> >
> > On Nov 20, 2014, at 11:50 AM, Hector Yee <hector....@gmail.com> wrote:
> >
> > This is whatever was in http://people.apache.org/~andrewor14/spark-1.1.1-rc2/
> >
> > On Thu, Nov 20, 2014 at 11:48 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> >
> >> Hector, is this a comment on 1.1.1 or on the 1.2 preview?
> >>
> >> Matei
> >>
> >> > On Nov 20, 2014, at 11:39 AM, Hector Yee <hector....@gmail.com> wrote:
> >> >
> >> > I think it is a race condition caused by netty deactivating a channel
> >> > while it is active.
> >> > Switched to nio and it works fine:
> >> > --conf spark.shuffle.blockTransferService=nio
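> >> >
> >> > In case it helps others reproduce the workaround, the same setting can also
> >> > be applied programmatically via SparkConf (a rough sketch, not my exact code):
> >> >
> >> >   import org.apache.spark.{SparkConf, SparkContext}
> >> >
> >> >   // fall back from netty to the old NIO block transfer service
> >> >   val conf = new SparkConf().set("spark.shuffle.blockTransferService", "nio")
> >> >   val sc = new SparkContext(conf)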
> >> >
> >> > On Thu, Nov 20, 2014 at 10:44 AM, Hector Yee <hector....@gmail.com> wrote:
> >> >
> >> >> I'm still seeing the fetch failed error and updated
> >> >> https://issues.apache.org/jira/browse/SPARK-3633
> >> >>
> >> >> On Thu, Nov 20, 2014 at 10:21 AM, Marcelo Vanzin <van...@cloudera.com> wrote:
> >> >>
> >> >>> +1 (non-binding)
> >> >>>
> >> >>> . ran simple things on spark-shell
> >> >>> . ran jobs in yarn client & cluster modes, and standalone cluster mode
> >> >>>
> >> >>> On Wed, Nov 19, 2014 at 2:51 PM, Andrew Or <and...@databricks.com> wrote:
> >> >>>> Please vote on releasing the following candidate as Apache Spark version 1.1.1.
> >> >>>>
> >> >>>> This release fixes a number of bugs in Spark 1.1.0. Some of the notable ones are:
> >> >>>> - [SPARK-3426] Sort-based shuffle compression settings are incompatible
> >> >>>> - [SPARK-3948] Stream corruption issues in sort-based shuffle
> >> >>>> - [SPARK-4107] Incorrect handling of Channel.read() led to data truncation
> >> >>>> The full list is at http://s.apache.org/z9h and in the CHANGES.txt attached.
> >> >>>>
> >> >>>> Additionally, this candidate fixes two blockers from the previous RC:
> >> >>>> - [SPARK-4434] Cluster mode jar URLs are broken
> >> >>>> - [SPARK-4480][SPARK-4467] Too many open files exception from shuffle spills
> >> >>>>
> >> >>>> The tag to be voted on is v1.1.1-rc2 (commit 3693ae5d):
> >> >>>> http://s.apache.org/p8
> >> >>>>
> >> >>>> The release files, including signatures, digests, etc. can be found at:
> >> >>>> http://people.apache.org/~andrewor14/spark-1.1.1-rc2/
> >> >>>>
> >> >>>> Release artifacts are signed with the following key:
> >> >>>> https://people.apache.org/keys/committer/andrewor14.asc
> >> >>>>
> >> >>>> The staging repository for this release can be found at:
> >> >>>> https://repository.apache.org/content/repositories/orgapachespark-1043/
> >> >>>>
> >> >>>> The documentation corresponding to this release can be found at:
> >> >>>> http://people.apache.org/~andrewor14/spark-1.1.1-rc2-docs/
> >> >>>>
> >> >>>> Please vote on releasing this package as Apache Spark 1.1.1!
> >> >>>>
> >> >>>> The vote is open until Saturday, November 22, at 23:00 UTC and passes if
> >> >>>> a majority of at least 3 +1 PMC votes are cast.
> >> >>>> [ ] +1 Release this package as Apache Spark 1.1.1
> >> >>>> [ ] -1 Do not release this package because ...
> >> >>>>
> >> >>>> To learn more about Apache Spark, please see
> >> >>>> http://spark.apache.org/
> >> >>>>
> >> >>>> Cheers,
> >> >>>> Andrew
> >> >>>>
> >> >>>>
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Marcelo
> >> >>>
> >> >>>
> >> >>>
> >> >>
> >> >>
> >> >> --
> >> >> Yee Yang Li Hector <http://google.com/+HectorYee>
> >> >> *google.com/+HectorYee <http://google.com/+HectorYee>*
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Yee Yang Li Hector <http://google.com/+HectorYee>
> >> > *google.com/+HectorYee <http://google.com/+HectorYee>*
> >>
> >>
> >
> >
> > --
> > Yee Yang Li Hector <http://google.com/+HectorYee>
> > *google.com/+HectorYee <http://google.com/+HectorYee>*
> >
> >
> >
>
>
> --
> Yee Yang Li Hector <http://google.com/+HectorYee>
> *google.com/+HectorYee <http://google.com/+HectorYee>*
>
