Yes, that makes sense, but it doesn't make the jobs CPU-bound. What is
the bottleneck? The model building, or other stages? I would think you
can get the model building to be CPU-bound, unless you have chopped it
up into really small partitions. I think it's best to look further
into what stages are

Hi Sean,
I'm trying to increase the CPU usage by running logistic regression on
different datasets in parallel. They shouldn't depend on each other.
I train several logistic regression models from different column
combinations of a main dataset. I processed the combinations in a ParArray
in an att
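
The setup described above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: the function and variable names are assumptions, and it uses the Spark 1.x MLlib `LogisticRegressionWithLBFGS` API mentioned later in the thread. The key idea is that `.par` makes the driver submit the independent training jobs from separate threads, so the scheduler can overlap them.

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Hypothetical helper: train one LR model per column subset, concurrently.
// `data` is (label, full feature row); `columnSubsets` lists the column
// indices for each model. Names and shapes here are illustrative only.
def trainPerSubset(data: RDD[(Double, Array[Double])],
                   columnSubsets: Seq[Array[Int]]) = {
  // .par turns the Seq into a parallel collection, so each run() below is
  // submitted from its own thread and Spark sees N concurrent jobs.
  // Consider a fair scheduler pool so one job does not starve the rest.
  columnSubsets.par.map { cols =>
    // Project the cached dataset down to this model's columns.
    val projected = data.map { case (label, features) =>
      LabeledPoint(label, Vectors.dense(cols.map(i => features(i))))
    }.cache()
    val model = new LogisticRegressionWithLBFGS().run(projected)
    (cols, model)
  }.seq
}
```

Note that concurrent jobs only raise CPU usage if each job's stages are themselves CPU-bound; if a single job is already saturating the cores, overlapping jobs mostly adds scheduling contention.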

It sounds like your computation just isn't CPU-bound, right? Or maybe
only some stages are. It's not clear what work you are doing
beyond the core LR.
Stages don't wait on each other unless one depends on the other. You'd
have to clarify what you mean by running stages in parallel, like what

Hi all,
I'm running Spark 1.2.0, in standalone mode, on different cluster and
server sizes. All of my data is cached in memory.
Basically I have a mass of data, about 8 GB, with about 37k columns, and
I'm running different configurations of a BinaryLogisticRegressionBFGS.
When I put Spark to run on 9