Hi all,
I ran the following LogisticRegression code (the ml classification class)
with 14 and 28 workers respectively (2 cores/worker, 12 GB/worker), and the
fitting times are almost the same: 11.25 minutes with 14 workers vs 10.39
with 28.
Shouldn't doubling the workers cut the fitting time roughly in half?
DataFrame 'train' has 3,654,390 rows.
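Could it simply be that 'train' does not have enough partitions for all 56
cores (28 workers x 2 cores) to stay busy during each optimizer iteration?
A rough sketch of the check I have in mind follows; the 56-partition target
and the LogisticRegression parameters are placeholders, not my exact values:

import org.apache.spark.ml.classification.LogisticRegression

// 'train' is the DataFrame mentioned above; a running SparkSession is assumed.
// The number of partitions bounds how many tasks run in parallel per iteration.
println(s"train partitions: ${train.rdd.getNumPartitions}")

// 56 = 28 workers x 2 cores (an assumed target).
val trainRepart = train.repartition(56).cache()
trainRepart.count()  // materialize the cache so the fit timing excludes the shuffle

val lr = new LogisticRegression().setMaxIter(100).setRegParam(0.01)
val model = lr.fit(trainRepart)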
The fit() documentation says: "The default implementation uses a for loop
on each parameter map. Subclasses could override this to optimize
multi-model training."
Is it possible to parallelize CrossValidator over nFolds and numModels so
that it runs faster?
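For context, the setup in question is roughly the following (a simplified
sketch; the grid values and numFolds are placeholders rather than my exact
settings):

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

val lr = new LogisticRegression()

val grid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.01, 0.1))
  .addGrid(lr.elasticNetParam, Array(0.0, 0.5, 1.0))
  .build()

val cv = new CrossValidator()
  .setEstimator(lr)
  .setEvaluator(new BinaryClassificationEvaluator())
  .setEstimatorParamMaps(grid)
  .setNumFolds(5)

// In 2.2, fit() trains the nFolds x numModels combinations one after
// another on the driver: each individual fit is distributed across the
// cluster, but the candidate models are not trained concurrently.
val cvModel = cv.fit(train)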
The times are not competitive with R's glmnet, at least for DataFrames
under 3.5 million rows…
Thanks!
Julia.
Hi Experts,
A question on what could potentially happen with Spark Streaming 2.2.0 +
Kafka. The LocationStrategies doc says that the "new Kafka consumer API
will pre-fetch messages into buffers."
If we store offsets in Kafka, we can currently only use async commits.
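(Concretely, I mean the usual spark-streaming-kafka-0-10 pattern, sketched
below with placeholder broker/topic/group values and assuming an existing
StreamingContext 'ssc'.)

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges, KafkaUtils}

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker:9092",            // placeholder
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group",                   // placeholder
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

// 'ssc' is an existing StreamingContext (assumed).
val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("mytopic"), kafkaParams))

stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // ... process rdd and write the results out ...
  // commitAsync is fire-and-forget: the offsets are committed to Kafka some
  // time after this batch, so a crash in between can re-deliver records.
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}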
So,
1 - Could it happen that we commit
messages in
> the event of failure in any case, so the only difference sync commit
> would make would be (possibly) slower run time.
>
> On Sat, Aug 26, 2017 at 1:07 AM, Julia Wistance wrote:
> > Hi Experts,
> >
> > A question on what could potentially happen with Spa