This is interesting. I would really appreciate it if you could share what
exactly you changed in core-site.xml and yarn-site.xml.
On Wed, May 22, 2019 at 9:14 AM Gourav Sengupta wrote:
> Just wondering, what is the advantage of doing this?
>
> Regards
> Gourav Sengupta
>
> On Wed, May 22, 20
One possible cause of this is the optimizer being a little overzealous when
deciding whether a table can be broadcast. Have you checked the UI or the query
plan to see if any steps include a BroadcastHashJoin? It's possible that the
optimizer thinks that it should be able to fit the t
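A minimal sketch of the query-plan check in PySpark, assuming `df.explain()` writes the plan with Python's `print()` (as PySpark 2.x and 3.x do), so it can be captured with `contextlib.redirect_stdout`. The helper names `capture_explain`, `find_broadcast_joins` and `disable_auto_broadcast` are hypothetical. `spark.sql.autoBroadcastJoinThreshold` is a real Spark SQL setting; a value of -1 disables automatic broadcast joins.

```python
import io
from contextlib import redirect_stdout

# Physical-plan operator names Spark uses for broadcast joins.
BROADCAST_OPERATORS = ("BroadcastHashJoin", "BroadcastNestedLoopJoin")


def capture_explain(df, extended=False):
    """Return the text that df.explain() prints.

    Hypothetical helper: assumes explain() emits the plan via Python's print().
    `df` is any object with such an explain(extended) method.
    """
    buf = io.StringIO()
    with redirect_stdout(buf):
        df.explain(extended)
    return buf.getvalue()


def find_broadcast_joins(plan_text):
    """Return the stripped plan lines that mention a broadcast join operator."""
    return [
        line.strip()
        for line in plan_text.splitlines()
        if any(op in line for op in BROADCAST_OPERATORS)
    ]


def disable_auto_broadcast(spark):
    """Hypothetical helper: turn off size-based automatic broadcast joins.

    Setting spark.sql.autoBroadcastJoinThreshold to -1 disables them
    for the current session.
    """
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
```

For example, `find_broadcast_joins(capture_explain(joined_df))`, where `joined_df` stands for the DataFrame under investigation, lists any broadcast joins in its physical plan. If one of them involves a table that is actually large, re-running after `disable_auto_broadcast(spark)` is one way to test whether the broadcast is what slows the job down.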
Hi,
We have inherited a rather long-winded Spark application with many stages.
When we run it on our Spark cluster, things start off well enough: workers are
busy, lots of progress is made, and so on. However, 30 minutes into processing,
we see the CPU usage of the workers drop drastically. At this time,
In PySpark Streaming, if checkpointing is enabled and a stream.transform
operator is used to join with another RDD, “PicklingError: Could not serialize
object” is thrown. I have asked the same question on Stack Overflow:
https://stackoverflow.com/questions/56267591/pyspark-streaming-picklingerror-cou