In PySpark Streaming, when checkpointing is enabled and a stream.transform
operator is used to join with another RDD, a “PicklingError: Could not
serialize object” is thrown. I have asked the same question on Stack Overflow:
https://stackoverflow.com/questions/56267591/pyspark-streaming-picklingerror-cou
OK, I am sure this is a bug in Spark. I found the offending code, but it was
removed in 2.2.3, so I simply upgraded Spark to fix the problem.
--
We hit a broadcast issue in some of our applications, but not on every run;
it usually goes away when we rerun the application. In the exception log, I
see the following two types of exception:
Exception 1:
10:09:20.295 [shuffle-server-6-2] ERROR
org.apache.spark.network.server.TransportRequestHandler - Erro
Hey,
We use a custom receiver to receive data from our MQ. We used to call def
store(dataItem: T) to store data, but I found that the block size varies
widely, from 0.5K to 5M. As a result, the processing time per partition
varies widely too. A shuffle is an option, but I want to avoid it.
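One possible way to even out block sizes without a shuffle is to group
incoming items by payload size before storing them. The sketch below is only
an illustration in plain Python, not Spark code: batch_by_size and
target_bytes are hypothetical names, and it assumes each item is a bytes
payload. In a Scala receiver, each batch could then be passed to
store(dataBuffer: ArrayBuffer[T]), which stores the whole buffer as a single
block.

```python
from typing import Iterable, Iterator, List


def batch_by_size(items: Iterable[bytes], target_bytes: int) -> Iterator[List[bytes]]:
    """Group items into batches whose total payload stays near target_bytes.

    Hypothetical helper for illustration; not part of any Spark API.
    """
    batch: List[bytes] = []
    batch_size = 0
    for item in items:
        # Flush the current batch if adding this item would exceed the target.
        if batch and batch_size + len(item) > target_bytes:
            yield batch
            batch, batch_size = [], 0
        batch.append(item)
        batch_size += len(item)
    # Emit whatever is left once the input is exhausted.
    if batch:
        yield batch
```

A single item larger than target_bytes still ends up in a batch of its own,
so the target is a soft limit. A real receiver would probably also need a
time-based flush so that a slow queue does not hold data back indefinitely.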
I no