Hi Sebastian,
Do you have any updates on the issue? I ran into much the same problem,
and disabling Kryo plus raising spark.network.timeout to 600s helped.
For my job it takes about 5 minutes to broadcast the variable (~5 GB in
my case), but after that it's fast. I mean much faster than shuffling.
I'm using Spark 1.6.0.
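For reference, the two changes described above could be expressed as a spark-defaults.conf fragment like this (property names as in Spark 1.6; the 600s value is the one that worked for me):

```properties
# Raise the network timeout so the large broadcast transfer isn't killed mid-flight
spark.network.timeout    600s
# Fall back to Java serialization (the Spark default) instead of Kryo
spark.serializer         org.apache.spark.serializer.JavaSerializer
```

The same settings can also be passed per job via spark-submit --conf flags.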
I tried removing Kryo and reverting back to Java serialisation, and got a
different error which may point in the right direction...
java.lang.AssertionError: assertion failed: No plan for BroadcastHint
+- InMemoryRelation
[tradeId#30,tradeVersion#31,agreement#49,counterP
Were you using Kryo serialization?
If you switch to Java serialization, your job should run fine.
Which Spark release are you using?
Thanks
On Thu, Jan 21, 2016 at 6:59 AM, sebastian.piu wrote:
> Hi all,
>
> I'm trying to work out a problem when using Spark Streaming, currently I
> have the
Hi all,
I'm trying to work out a problem when using Spark Streaming, currently I
have the following piece of code inside a foreachRDD call:
DataFrame results = ... // some DataFrame created from the incoming RDD -
moderately big, I don't want this to be shuffled
DataFrame t = sqlContext.table("a_t
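For what it's worth, the usual way to ask Spark 1.6 for a map-side join is the broadcast() hint from org.apache.spark.sql.functions. A minimal sketch of what that could look like inside the foreachRDD callback follows; the table name "a_table" and the join key "tradeId" are stand-ins for illustration (the real table name is truncated above), not the poster's actual schema:

```java
import static org.apache.spark.sql.functions.broadcast;

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class BroadcastJoinSketch {

    // 'results' is the moderately big per-batch DataFrame we don't want shuffled;
    // the registered table (here called "a_table") is the smaller side.
    static DataFrame joinWithoutShuffle(SQLContext sqlContext, DataFrame results) {
        // "a_table" is a hypothetical name standing in for the truncated one above.
        DataFrame t = sqlContext.table("a_table");

        // broadcast() hints that 't' should be replicated to every executor,
        // so the bigger 'results' side is joined in place rather than shuffled.
        return results.join(broadcast(t), "tradeId");
    }
}
```

Whether the hint is honoured also depends on spark.sql.autoBroadcastJoinThreshold and on the physical plan of the small side, which may be relevant to the BroadcastHint assertion error reported earlier in the thread.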