Backpressure is the suggested way out here and is the correct approach: it
rate-limits at the source itself for safety. Imagine a service with
throttling enabled; it can outright reject your calls.
Even if you split your df, that alone won't achieve your purpose. You can
combine that with backpressure.
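If this is a Spark Streaming job, a minimal sketch of turning that on (the
app name and the rate value are illustrative):

import org.apache.spark.SparkConf

// Enable Spark Streaming's built-in backpressure, which adapts the
// ingestion rate at the source based on observed processing delays.
val conf = new SparkConf()
  .setAppName("throttled-ingest")
  .set("spark.streaming.backpressure.enabled", "true")
  // Optional cap on the first batch while the rate estimator warms up:
  .set("spark.streaming.backpressure.initialRate", "1000")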
I think most Spark technical support people would really recommend
upgrading to Spark 2.0+ for starters. However, I understand that's not
always possible. In that case, I would double-check that you don't have a
join key with many records associated with it.
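A quick way to check for that kind of key skew before joining (df and the
column name "key" are placeholders for your own):

import org.apache.spark.sql.functions.desc

// Count records per join key and inspect the heaviest ones;
// one enormous key is a classic cause of executor OOM during a join.
df.groupBy("key")
  .count()
  .orderBy(desc("count"))
  .show(20)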
Thank you for the tips. We are running Spark 1.6 (Scala), and the OOM
happens with Spark SQL trying to join a few large datasets together for
processing/transformation...
On Wed, Jan 9, 2019 at 3:42 PM Ramandeep Singh wrote:
> Hi,
>
> Here are a few suggestions that you can try.
>
> OOM issues that I
Hi,
Here are a few suggestions that you can try.
OOM issues that I have faced with Spark:
*Not enough shuffle partitions*: increase them.
*Too little memory overhead*: boost it to around 12 percent of executor
memory. You usually get this as an error message in your executors.
*Large Executor Configs*: They can
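A sketch of the first two settings (the values are illustrative, and the
overhead key below is the Spark 1.6-on-YARN name):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  // More shuffle partitions for Spark SQL (the default is 200):
  .set("spark.sql.shuffle.partitions", "800")
  // More off-heap overhead per executor, in MB (roughly 12 percent
  // of executor memory):
  .set("spark.yarn.executor.memoryOverhead", "1536")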
Hi William,
Just to get started, can you describe the Spark version you are using and
the language? It doesn't sound like you are using PySpark; however, the
problems arising from it can be different, so I just want to be sure. As
well, can you talk through the scenario in which you are dealing with the
OOM?
Hi there,
We've encountered executor Java OOM issues in our Spark application.
Any tips on how to troubleshoot and identify what objects are occupying the
heap? In the past, dealing with JVM OOMs, we've analyzed heap dumps, but we
are having a hard time locating Spark heap dumps.
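One way to get executor heap dumps at OOM time (the dump path is
illustrative; the file lands on the worker node's local disk, so it has to
be copied off that node before the container is cleaned up):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Have each executor JVM write a heap dump when it runs out of memory:
  .set("spark.executor.extraJavaOptions",
    "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/executor_dumps")

The resulting .hprof file can then be opened with Eclipse MAT or similar.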
Have you tried controlling the number of partitions of the dataframe? Say
you have 5 partitions; that means you are making 5 concurrent calls to the
web service. The throughput of the web service would be your bottleneck and
Spark workers would be waiting for tasks, but if you can't control the REST
service
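A minimal sketch of the partition-capping idea (df and callService are
placeholders): with 5 partitions, at most 5 tasks, and hence at most 5
calls, run at once.

// Your REST client goes here; this stub just marks it as hypothetical.
def callService(payload: String): String = ???

// Cap concurrency by capping partitions; each task works one partition.
val results = df
  .repartition(5)
  .rdd
  .mapPartitions(_.map(row => callService(row.getString(0))))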
Hi,
I write a stream of (String, String) tuples to HDFS partitioned by the
first ("_1") member of the pair.
Everything looks great when I list the directory via "hadoop fs -ls ...".
However, when I try to read all the data as a single dataframe, I get
unexpected results (see below).
I notice th
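For reference, a sketch of the round-trip being described (the path and
sqlContext are assumptions). One common surprise with this pattern: the
partition column's values are encoded in the directory names, so on read
"_1" is rebuilt from the paths, appended at the end of the schema, and may
come back with a different inferred type.

import sqlContext.implicits._

// Write, partitioned by the first element of each pair:
pairs.toDF("_1", "_2")
  .write
  .partitionBy("_1")
  .parquet("hdfs:///data/pairs")

// Read everything back as a single dataframe:
val readBack = sqlContext.read.parquet("hdfs:///data/pairs")
readBack.printSchema() // "_1" now appears last, recovered from the paths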
Dear all,
when fitting a logistic regression model, for some data no p-values are
computed. I cannot really tell under what circumstances this happens,
though. Is there an explanation of why and when this might be the case?
Thank you,
Simon
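If this is Spark ML's GeneralizedLinearRegression (an assumption; the post
doesn't say), one known limitation is that the summary's p-values are only
computed for an unregularized fit, so a non-zero regParam is one reason
they can go missing. A sketch, with training as a hypothetical DataFrame
holding "features" and "label" columns:

import org.apache.spark.ml.regression.GeneralizedLinearRegression

// Logistic regression expressed as a binomial GLM with a logit link.
val glr = new GeneralizedLinearRegression()
  .setFamily("binomial")
  .setLink("logit")
  .setRegParam(0.0) // p-values are not available when the fit is regularized

val model = glr.fit(training)
println(model.summary.pValues.mkString(", "))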
---