Why is ml LogisticRegression fitting time not cut in half when I double the number of workers?

2015-09-16 Thread julia
Hi all, I ran the following LogisticRegression code (ml classification class) with 14 and 28 workers respectively (2 cores/worker, 12G/worker), and the fitting times are almost the same: 11.25 vs 10.39 minutes for 14 and 28 workers. Shouldn't doubling the workers cut the fitting time in half? DataFrame 'train' has 3,654,390
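One general reason that doubling workers need not halve run time is Amdahl's law: only the parallelizable share of a job speeds up. The following is a minimal sketch of that arithmetic, not a diagnosis of the job in this thread. The function names and the `parallel_fraction` parameter are hypothetical, and the fraction would have to be measured for a real workload.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup over one worker under Amdahl's law."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def relative_time(parallel_fraction: float,
                  workers_before: int,
                  workers_after: int) -> float:
    """Run time after scaling, as a fraction of the run time before scaling."""
    before = amdahl_speedup(parallel_fraction, workers_before)
    after = amdahl_speedup(parallel_fraction, workers_after)
    return before / after


if __name__ == "__main__":
    # Assumed fractions for illustration only; none come from the thread.
    for p in (1.0, 0.95, 0.8):
        print(p, relative_time(p, 14, 28))
```

Under this model the time halves only when the work is fully parallel (`parallel_fraction == 1.0`). Any serial portion, such as driver-side aggregation or scheduling overhead, pushes the ratio toward 1.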

CrossValidator speed - for loop on each parameter map?

2015-09-23 Thread julia
ch parameter map. * Subclasses could override this to optimize multi-model training. Is it possible to parallelize CrossValidator over nFolds and numModels so that it is faster? The times are not competitive with R glmnet, at least for DataFrames under 3.5 million rows… Thanks! Julia. -- V
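The idea of parallelizing over folds and parameter maps can be sketched in plain Python. This is not Spark's CrossValidator implementation: `k_fold_indices`, `parallel_cross_validate`, and the caller-supplied `fit_and_score` callback are all hypothetical names used for illustration, and the contiguous fold split is a simplifying assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from statistics import mean


def k_fold_indices(n_rows, n_folds):
    """Yield (train_idx, test_idx) for contiguous k-fold splits."""
    sizes = [n_rows // n_folds + (1 if i < n_rows % n_folds else 0)
             for i in range(n_folds)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, test
        start += size


def parallel_cross_validate(fit_and_score, param_maps, n_rows, n_folds,
                            max_workers=4):
    """Score every (param_map, fold) pair concurrently.

    fit_and_score(param_map, train_idx, test_idx) is a hypothetical
    callback that trains one model and returns a numeric score.
    Returns the mean score per parameter map, in input order.
    """
    folds = list(k_fold_indices(n_rows, n_folds))
    # One independent task per (parameter map, fold) combination.
    tasks = list(product(range(len(param_maps)), range(n_folds)))

    def run(task):
        p_idx, f_idx = task
        train_idx, test_idx = folds[f_idx]
        return fit_and_score(param_maps[p_idx], train_idx, test_idx)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(run, tasks))  # results keep task order

    per_param = [[] for _ in param_maps]
    for (p_idx, _), score in zip(tasks, scores):
        per_param[p_idx].append(score)
    return [mean(s) for s in per_param]
```

Threads help here only when each `fit_and_score` call spends its time outside the Python interpreter, for example waiting on a job submitted to a cluster. For CPU-bound pure-Python fitting a process pool would be the usual substitute.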

Kafka Consumer Pre Fetch Messages + Async commits

2017-08-25 Thread Julia Wistance
Hi Experts, A question on what could potentially happen with Spark Streaming 2.2.0 + Kafka. LocationStrategies says that the "new Kafka consumer API will pre-fetch messages into buffers." If we store offsets in Kafka, currently we can only use async commits. So, 1 - Could it happen that we commit

Re: Kafka Consumer Pre Fetch Messages + Async commits

2017-08-29 Thread Julia Wistance
> messages in the event of failure in any case, so the only difference sync commit would make would be (possibly) slower run time.
>
> On Sat, Aug 26, 2017 at 1:07 AM, Julia Wistance wrote:
> > Hi Experts,
> >
> > A question on what could potentially happen with Spa