I would actually think about this the other way around: move the functions you
are passing to the streaming jobs out to their own object if possible. Spark's
closure capture rules are necessarily far-reaching, so Spark serializes the
object that contains these methods, which is a common cause of the pr
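
The capture behaviour described above can be illustrated outside Spark. The following is a minimal plain-Python sketch, not Spark's closure serializer: `StreamingJob`, `parse_line`, and `captured_objects` are hypothetical names introduced only for illustration. It shows that a function defined on an object keeps a reference to the whole object, while a standalone function does not.

```python
class StreamingJob:
    """Hypothetical job object holding state we do not want shipped to workers."""

    def __init__(self):
        self.large_state = list(range(100_000))

    def parse(self, line):
        return line.split(",")

    def make_handler(self):
        # The lambda refers to self, so the whole job object is captured.
        return lambda line: self.parse(line)


def parse_line(line):
    """Standalone function: it references no enclosing object."""
    return line.split(",")


def captured_objects(fn):
    """Return the objects a function keeps alive via binding or closure cells."""
    captured = []
    if getattr(fn, "__self__", None) is not None:
        captured.append(fn.__self__)
    for cell in getattr(fn, "__closure__", None) or ():
        captured.append(cell.cell_contents)
    return captured


job = StreamingJob()
handler = job.make_handler()

print(captured_objects(handler)[0] is job)   # the lambda drags the job along
print(captured_objects(job.parse)[0] is job) # so does the bound method
print(captured_objects(parse_line))          # the standalone function captures nothing
```

Anything that has to serialize `handler` or `job.parse` must also deal with `large_state`, whereas `parse_line` can be shipped on its own.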
Yes, remember that your bandwidth is the maximum number of bytes per second
that can be shipped to the driver. So if you've got 5 blocks that size, then it
looks like you're basically saturating the network.
Aggregation trees help when there are many partitions/nodes, and butterfly mixing can help
use all
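
To make the aggregation-tree point concrete, here is a rough sketch in plain Python, not Spark's implementation. The function name `tree_aggregate` and the `fan_in` parameter are assumptions made for this example. Partial results are combined in groups level by level, so the final node (the driver, in this analogy) receives at most `fan_in` blocks instead of one per partition. Butterfly mixing is not shown.

```python
from functools import reduce
from operator import add


def tree_aggregate(partials, combine, fan_in=2):
    """Combine partial results level by level.

    Returns (result, received), where received is the number of blocks
    that reach the final combining node.
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(partials)
    if not level:
        raise ValueError("need at least one partial result")
    # Keep merging groups of fan_in until few enough blocks remain.
    while len(level) > fan_in:
        level = [
            reduce(combine, level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    return reduce(combine, level), len(level)


partials = list(range(16))  # stand-ins for 16 per-partition results
result, received = tree_aggregate(partials, add, fan_in=4)
print(result, received)  # 120 4
```

With a flat aggregation the final node would receive all 16 blocks; here it receives 4, at the cost of one extra round of communication.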
I was under the impression that we were using the usual
sort-by-average-response-value heuristic when storing histogram bins (and
searching for optimal splits) in the tree code.
Maybe Manish or Joseph can clarify?
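
For readers unfamiliar with the heuristic, a minimal sketch follows. It is an illustration in plain Python, not the MLlib tree code, and the function names are hypothetical. Categories of a categorical feature are ordered by their mean response, and only the splits that cut that ordering at one point are considered.

```python
from collections import defaultdict


def order_categories_by_mean_response(categories, responses):
    """Order the distinct categories by their average response value."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, response in zip(categories, responses):
        sums[category] += response
        counts[category] += 1
    return sorted(sums, key=lambda c: sums[c] / counts[c])


def candidate_splits(ordered):
    """Splits that cut the ordered category list at a single point."""
    return [
        (set(ordered[:i]), set(ordered[i:]))
        for i in range(1, len(ordered))
    ]


categories = ["a", "b", "c", "a", "b", "c"]
responses = [1.0, 0.0, 0.5, 1.0, 0.0, 0.5]

ordered = order_categories_by_mean_response(categories, responses)
print(ordered)                    # ['b', 'c', 'a']
print(candidate_splits(ordered))  # 2 candidate splits for 3 categories
```

With k categories this examines k - 1 candidate splits rather than all 2^(k-1) - 1 two-way partitions.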
> On Oct 12, 2014, at 2:50 PM, Sean Owen wrote:
>
> I'm having trouble getting
I recommend using the data generators provided with MLlib to generate synthetic
data for your scalability tests, provided they're well suited to your
algorithms. They let you control things like the number of examples and the
dimensionality of your dataset, as well as the number of partitions.
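
To show the kind of knobs such a generator exposes, here is a small stand-alone sketch in plain Python. It is not MLlib's API (the real generators ship with MLlib, for example `LinearDataGenerator` in `org.apache.spark.mllib.util`); the function name and parameters below are assumptions chosen for illustration.

```python
import random


def generate_linear_data(num_examples, num_features, num_partitions,
                         noise=0.1, seed=42):
    """Generate (label, features) pairs from a random linear model.

    Examples are spread round-robin over num_partitions lists, which stand
    in for the partitions of a distributed dataset.
    """
    rng = random.Random(seed)
    weights = [rng.uniform(-1.0, 1.0) for _ in range(num_features)]
    partitions = [[] for _ in range(num_partitions)]
    for i in range(num_examples):
        features = [rng.gauss(0.0, 1.0) for _ in range(num_features)]
        label = sum(w * x for w, x in zip(weights, features))
        label += rng.gauss(0.0, noise)  # additive Gaussian noise
        partitions[i % num_partitions].append((label, features))
    return weights, partitions


weights, partitions = generate_linear_data(
    num_examples=1000, num_features=10, num_partitions=4)
print(len(partitions), sum(len(p) for p in partitions))  # 4 1000
```

Scaling `num_examples`, `num_features`, and `num_partitions` independently is what makes this style of generator convenient for scalability tests.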
As far as
Hey,
We're actually working on similar ideas in the AMPLab with Spark. For example,
we've got some image classification pipelines built on this idea:
http://www.eecs.berkeley.edu/~brecht/papers/07.rah.rec.nips.pdf
It approximates kernel methods via random projections passed through a nonlinearity.
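
As a rough illustration of the idea, the sketch below builds random Fourier features for an RBF kernel in plain Python. It assumes this is the construction meant by the linked paper and is not the pipeline code mentioned above. The kernel width `gamma`, the feature count, and all function names are choices made for this example. Inputs are projected onto random Gaussian directions, shifted by a random phase, and passed through a cosine, after which a plain dot product approximates the kernel value.

```python
import math
import random


def make_random_features(input_dim, num_features, gamma, seed=0):
    """Return a feature map z with z(x).z(y) ~= exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    scale = math.sqrt(2.0 * gamma)  # std dev of the random projections
    projections = [
        [rng.gauss(0.0, scale) for _ in range(input_dim)]
        for _ in range(num_features)
    ]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    norm = math.sqrt(2.0 / num_features)

    def transform(x):
        return [
            norm * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(projections, phases)
        ]

    return transform


def rbf_kernel(x, y, gamma):
    """Exact RBF kernel, used only to check the approximation."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


gamma = 0.5
x = [0.5, -0.2, 0.1]
y = [0.3, 0.4, -0.6]

transform = make_random_features(input_dim=3, num_features=5000, gamma=gamma)
approx = sum(a * b for a, b in zip(transform(x), transform(y)))
exact = rbf_kernel(x, y, gamma)
print(approx, exact)  # the two values should be close
```

Once the data is mapped through `transform`, a linear model on the new features behaves approximately like a kernel method, which is what makes the approach attractive at scale.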
Add
Sorry, just saw the 11% number. That is around the point where dense data is
usually faster (blocking, cache coherence, etc.). Is there any chance you have a
1% (or so) sparse dataset to experiment with?
> On Apr 23, 2014, at 9:21 PM, DB Tsai wrote:
>
> Hi all,
>
> I'm benchmarking Logistic Reg
What is the number of non-zeros per row (and the number of features) in the sparse
case? We've hit some issues with Breeze's sparse support in the past, but for
sufficiently sparse data it's still pretty good.
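
The two quantities asked about above determine the density of the data. A small sketch of the arithmetic follows, in plain Python with a hypothetical dict-per-row representation rather than Breeze or MLlib vectors.

```python
def sparsity_stats(rows, num_features):
    """Summarize sparsity for rows given as {feature_index: value} dicts
    that hold only the non-zero entries."""
    if not rows or num_features <= 0:
        raise ValueError("need at least one row and one feature")
    total_nonzeros = sum(len(row) for row in rows)
    return {
        "mean_nonzeros_per_row": total_nonzeros / len(rows),
        "density": total_nonzeros / (len(rows) * num_features),
    }


rows = [{0: 1.0, 3: 2.0}, {1: 1.0}, {}, {2: 4.0}]
stats = sparsity_stats(rows, num_features=10)
print(stats)  # mean of 1 non-zero per row, density 0.1 (10%)
```

A density near 10% is in the range where a dense representation is often competitive, while a density around 1% is where a sparse representation is more likely to pay off.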
> On Apr 23, 2014, at 9:21 PM, DB Tsai wrote:
>
> Hi all,
>
> I'm benchmarking Logistic