I can tell you what the environment and rough processes are like:

CDH5 YARN
15 executors (16GB for the driver, 8GB per executor)
Total cached data: about 10GB
Shuffled data size per iteration: ~1GB (map followed by groupby followed by map followed by collect)

I'd imagine that every time map/groupby is [...]
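To make the shape of that loop concrete, here is a minimal sketch of the pattern described above. The input path, record shape, and per-key aggregation are invented for illustration; only the cached-input plus map/groupByKey/map/collect structure follows the description.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    val conf = new SparkConf().setAppName("iterative-map-groupby-sketch")
    val sc = new SparkContext(conf)

    // Roughly 10GB of input, cached and reused across iterations.
    val cached = sc.textFile("hdfs:///path/to/input")   // placeholder path
      .persist(StorageLevel.MEMORY_ONLY)

    var results = Array.empty[(String, Double)]
    for (iter <- 1 to 20) {
      results = cached
        .map(line => (line.take(8), line.length.toDouble)) // map: derive (key, value)
        .groupByKey()                                       // groupby: ~1GB shuffled per iteration
        .map { case (k, vs) => (k, vs.sum / vs.size) }      // map: per-key aggregate
        .collect()                                          // collect results to the driver
    }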
Maybe, TorrentBroadcast is more complicated than HttpBroadcast, could
you tell us
how to reproduce this issue? That will help us a lot to improve
TorrentBroadcast.
Thanks!
On Fri, Oct 10, 2014 at 8:46 AM, Sung Hwan Chung
wrote:
> I haven't seen this at all since switching to HttpBroadcast. It seems
> TorrentBroadcast might have some issues?
I haven't seen this at all since switching to HttpBroadcast. It seems
TorrentBroadcast might have some issues?
On Thu, Oct 9, 2014 at 4:28 PM, Sung Hwan Chung
wrote:
> I don't think that I saw any other error message. This is all I saw.
>
> I'm currently experimenting to see if this can be alleviated by using
> HttpBroadcastFactory instead of TorrentBroadcast. [...]
I don't think that I saw any other error message. This is all I saw.
I'm currently experimenting to see if this can be alleviated by using
HttpBroadcastFactory instead of TorrentBroadcast. So far, with
HttpBroadcast, I haven't seen it recur. I'll keep you
posted.
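For anyone else wanting to try the same workaround: in Spark 1.x the broadcast implementation can be switched through the spark.broadcast.factory setting (TorrentBroadcastFactory is the default in 1.1; the HTTP implementation was removed in later releases). A minimal sketch:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("http-broadcast-workaround")
      // Fall back to the HTTP-based broadcast instead of the default TorrentBroadcast.
      .set("spark.broadcast.factory",
           "org.apache.spark.broadcast.HttpBroadcastFactory")

    val sc = new SparkContext(conf)

The same setting can also be passed at submit time, e.g.
spark-submit --conf spark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory ...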
On Thu, Oct 9, 2014, [...] wrote:
Could you provide a script to reproduce this problem?
Thanks!
On Wed, Oct 8, 2014 at 9:13 PM, Sung Hwan Chung
wrote:
> This is also happening to me on a regular basis, when the job is large with
> relatively large serialized objects used in each RDD lineage. A bad thing
> about this is that this exception always stops the whole job.
This exception should be caused by another one, could you paste all of
them here?
Also, it would be great if you could provide a script to reproduce this problem.
Thanks!
On Fri, Sep 26, 2014 at 6:11 AM, Arun Ahuja wrote:
> Has anyone else seen this error in task deserialization? The task is
> processing a small amount of data and doesn't seem to have much data
> hanging off the closure? I've only seen this with Spark 1.1. [...]
This is also happening to me on a regular basis, when the job is large with
relatively large serialized objects used in each RDD lineage. A bad thing
about this is that this exception always stops the whole job.
On Fri, Sep 26, 2014 at 11:17 AM, Brad Miller
wrote:
> FWIW I suspect that each count operation is an opportunity for you to
> trigger the bug, and each filter operation increases the likelihood of
> setting up the bug. [...]
FWIW I suspect that each count operation is an opportunity for you to
trigger the bug, and each filter operation increases the likelihood of
setting up the bug. I normally don't come across this error until my job
has been running for an hour or two and had a chance to build up longer
lineages for [...]
No, for me as well it is non-deterministic. It happens in a piece of code
that does many filters and counts on a small set of records (~1k-10k). The
original set is persisted in memory and we have a Kryo serializer set for
it. The task itself takes in just a few filtering parameters. This, with [...]
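To picture that workload, here is a small sketch of the filter/count pattern just described. The Record type, its fields, and the filter parameters are invented; only the small persisted set, the Kryo serializer setting, and the repeated filter + count structure follow the description.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    // Hypothetical record type standing in for the real one.
    case class Record(id: Long, label: String, value: Double)

    val conf = new SparkConf()
      .setAppName("filter-count-sketch")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(conf)

    // Small set (~1k-10k records), persisted in memory.
    val records = sc.parallelize(1L to 10000L)
      .map(i => Record(i, if (i % 2 == 0) "even" else "odd", i * 0.5))
      .persist(StorageLevel.MEMORY_ONLY_SER)

    // Many filter + count passes; each task closure only carries the filter parameters.
    for (t <- Seq(100.0, 500.0, 2500.0); label <- Seq("even", "odd")) {
      val n = records.filter(r => r.label == label && r.value < t).count()
      println(s"label=$label threshold=$t count=$n")
    }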
I've had multiple jobs crash due to "java.io.IOException: unexpected
exception type"; I've been running the 1.1 branch for some time and am now
running the 1.1 release binaries. Note that I only use PySpark. I haven't
kept detailed notes or the tracebacks around since there are other problems
that [...]
Has anyone else seen this error in task deserialization? The task is
processing a small amount of data and doesn't seem to have much data
hanging off the closure? I've only seen this with Spark 1.1.

Job aborted due to stage failure: Task 975 in stage 8.0 failed 4
times, most recent failure: Lost task [...]