each step.

Thanks,
Piero

From: DB Tsai [mailto:dbt...@dbtsai.com]
Sent: Wednesday, June 03, 2015 10:33 PM
To: Piero Cinquegrana
Cc: user@spark.apache.org
Subject: Re: Standard Scaler taking 1.5hrs
Can you do count() before fit to force the RDD to materialize? I think something before fit is slow.
On Wednesday, June 3, 2015, Piero Cinquegrana wrote:
The fit part is very slow; the transform is not slow at all.
The number of partitions was 210 vs. 80 executors.
Spark 1.4 sounds great but as my company is using Qubole we are dependent upon
them to upgrade from version 1.3.1. Until that happens, can you think of any
other reasons as to why it cou
Which part of StandardScaler is slow, fit or transform? Fit has a shuffle, but a very small one, and transform doesn't shuffle at all. My guess is that you don't have enough partitions, so please repartition your input dataset to a number larger than the # of executors you have.
In Spark 1.4's new ML pipeline a