I have pretty much the same "symptoms": the computation itself is pretty
fast, but most of my computation time is spent in JavaToPython steps (~15 min).
I'm using Spark 1.5.0-rc1 with DataFrames and ML Pipelines.
Any insights into what exactly these steps are?
2015-06-02 9:18 GMT+02:00 Karlson:
>
Hi, the code is a few hundred lines of Python. I can try to put together a
minimal example as soon as I find the time, though. Any ideas until
then?
Would you mind posting the code?
On 2 Jun 2015 00:53, "Karlson" wrote:
> Hi,
>
> In all (PySpark) Spark jobs that become somewhat more involved, I am
> experiencing the issue that some stages take a very long time to complete
> and sometimes don't complete at all. This clearly correlates with the size of