Re: Python vs Scala performance

2014-10-22 Thread Eustache DIEMERT
Wild guess maybe, but do you decode the JSON records in Python? That could be much slower, as the default lib is quite slow. If so, try ujson [1], a C implementation that is at least an order of magnitude faster.

HTH

[1] https://pypi.python.org/pypi/ujson

2014-10-22 16:51 GMT+02:00 Marius Soutier
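A minimal sketch of that suggestion, assuming one JSON record per line: use ujson when it is installed and fall back to the standard library `json` module otherwise. The `decode_records` helper and the synthetic benchmark records are illustrative only. The actual speedup depends on the Python version and the shape of the records, so it is worth timing on your own data.

```python
import json
import timeit

try:
    import ujson  # optional third-party C decoder: pip install ujson
    fast_loads = ujson.loads
except ImportError:
    fast_loads = json.loads  # fall back to the standard library decoder


def decode_records(lines, loads=fast_loads):
    """Decode one JSON document per input line (e.g. the lines of a text file)."""
    return [loads(line) for line in lines]


if __name__ == "__main__":
    # Synthetic records, used only to compare the two decoders.
    lines = [json.dumps({"id": i, "score": i * 0.5, "tags": ["a", "b"]}) for i in range(10000)]
    for name, loads in [("json", json.loads), ("preferred", fast_loads)]:
        seconds = timeit.timeit(lambda: decode_records(lines, loads), number=5)
        print(f"{name}: {seconds:.3f}s for 5 x {len(lines)} records")
```

In PySpark the chosen decoder would be applied per record, with something like `rdd.map(fast_loads)` on an RDD of text lines, so the decoding runs in the Python workers.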

Re: [mllib] strange/buggy results with RidgeRegressionWithSGD

2014-07-07 Thread Eustache DIEMERT
d in adding the ones column. Has anyone here had success with this code on real-world datasets?

[1] https://github.com/oddskool/mllib-samples/tree/ridge (in the ridge branch)

2014-07-07 9:08 GMT+02:00 Eustache DIEMERT:
> Well, why not, but IMHO MLLib Logistic Regression is unusabl
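The ones-column workaround can be sketched in plain Python. This is not MLlib code: `add_ones_column` and `ridge_sgd` are hypothetical helpers, and the SGD uses a constant step size, which is an assumption (MLlib's own step schedule may differ). Without the extra column the model can only learn y = w·x, so every prediction at x = 0 is 0; with it, the weight on the constant feature plays the role of the intercept.

```python
import random


def add_ones_column(X):
    """Append a constant 1.0 feature; its learned weight plays the role of the intercept."""
    return [list(row) + [1.0] for row in X]


def ridge_sgd(X, y, step=0.05, iterations=5000, reg=0.0, seed=0):
    """SGD on 0.5*(w.x - y)^2 + 0.5*reg*||w||^2, one random example per update.

    With the ones-column trick and reg > 0, the intercept weight is penalized too.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for _ in range(iterations):
        i = rng.randrange(len(X))
        err = sum(wj * xj for wj, xj in zip(w, X[i])) - y[i]
        # gradient of the per-example loss: err * x + reg * w
        w = [wj - step * (err * xj + reg * wj) for wj, xj in zip(w, X[i])]
    return w


if __name__ == "__main__":
    X = [[i / 20] for i in range(21)]
    y = [2 * row[0] + 5 for row in X]  # synthetic data: slope 2, intercept 5
    print("without ones column:", ridge_sgd(X, y))  # one slope only, predicts 0 at x = 0
    print("with ones column:   ", ridge_sgd(add_ones_column(X), y))  # roughly [2, 5]
```

One caveat of this trick: when the L2 penalty `reg` is non-zero, the intercept weight is shrunk along with the other weights, which textbook ridge regression usually avoids.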

Re: [mllib] strange/buggy results with RidgeRegressionWithSGD

2014-07-07 Thread Eustache DIEMERT
>> I'm using those right...
>>
>> Thanks,
>>
>> --
>>
>> *Thomas ROBERT*
>> www.creativedata.fr
>>
>>
>> 2014-07-03 16:16 GMT+02:00 Eustache DIEMERT:
>>
>>> Printing the model shows that the intercept is always 0 :(
>

Re: [mllib] strange/buggy results with RidgeRegressionWithSGD

2014-07-07 Thread Eustache DIEMERT
> vises concerning the use of
> these regression algorithms, for example how to choose a good step and
> number of iterations? I wonder if I'm using those right...
>
> Thanks,
>
> --
>
> *Thomas ROBERT*
> www.creativedata.fr
>
>
> 2014-07-03 16:16 GMT+02:00 Eustache D
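A common, if brute-force, answer to the step size / number of iterations question is a small grid search scored on held-out data. This is generic Python rather than an MLlib API: `sgd_least_squares`, `pick_step_and_iterations`, the constant step size and the grid values are all assumptions for illustration. Grid points whose validation loss becomes non-finite are treated as diverged and skipped.

```python
import itertools
import math
import random


def sgd_least_squares(X, y, step, iterations, seed=0):
    """Plain SGD on 0.5*(w.x - y)^2 with a constant step size."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for _ in range(iterations):
        i = rng.randrange(len(X))
        err = sum(a * b for a, b in zip(w, X[i])) - y[i]
        w = [a - step * err * b for a, b in zip(w, X[i])]
    return w


def mse(w, X, y):
    """Mean squared error of the linear model w on (X, y)."""
    total = 0.0
    for x, target in zip(X, y):
        d = sum(a * b for a, b in zip(w, x)) - target
        total += d * d
    return total / len(y)


def pick_step_and_iterations(X_train, y_train, X_val, y_val, steps, iteration_counts):
    """Return (validation MSE, step, iterations) for the best grid point, or None."""
    best = None
    for step, iterations in itertools.product(steps, iteration_counts):
        w = sgd_least_squares(X_train, y_train, step, iterations)
        score = mse(w, X_val, y_val)
        if not math.isfinite(score):
            continue  # this step size diverged
        if best is None or score < best[0]:
            best = (score, step, iterations)
    return best


if __name__ == "__main__":
    X = [[i / 10] for i in range(1, 11)]
    y = [3 * row[0] for row in X]  # synthetic, noise-free: y = 3x
    X_val = [[0.25], [0.55], [0.95]]
    y_val = [3 * row[0] for row in X_val]
    print(pick_step_and_iterations(X, y, X_val, y_val, [0.001, 0.01, 0.1], [100, 1000]))
```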

Re: [mllib] strange/buggy results with RidgeRegressionWithSGD

2014-07-03 Thread Eustache DIEMERT
Printing the model shows that the intercept is always 0 :( Should I open a bug for that?

2014-07-02 16:11 GMT+02:00 Eustache DIEMERT:
> Hi list,
>
> I'm benchmarking MLlib for a regression task [1] and get strange results.
>
> Namely, using RidgeRegressionWithSGD it seems the p

[mllib] strange/buggy results with RidgeRegressionWithSGD

2014-07-02 Thread Eustache DIEMERT
Hi list,

I'm benchmarking MLlib for a regression task [1] and getting strange results.

Namely, when using RidgeRegressionWithSGD, it seems the predicted points miss the intercept:

{code}
val trainedModel = RidgeRegressionWithSGD.train(trainingData, 1000)
...
valuesAndPreds.take(10).foreach(println)
{code}

Re: How to use K-fold validation in spark-1.0?

2014-06-24 Thread Eustache DIEMERT
I'm interested in this topic too :) Are the MLLib core devs on this list?

E/

2014-06-24 14:19 GMT+02:00 holdingonrobin:
> Anyone knows anything about it? Or should I actually move this topic to a
> MLlib specific mailing list? Any information is appreciated! Thanks!
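For reference, the mechanics of k-fold validation fit in a few lines of Python. This is a generic sketch, not a Spark 1.0 API: `k_fold_indices` and `cross_validate` are hypothetical names, and `fit`/`score` stand for whatever training and evaluation functions you use. On an RDD, the same idea could be expressed by tagging each record with a random fold number and filtering on it.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists so that each index is in exactly one test fold."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]  # round-robin: fold sizes differ by at most one
    for f in range(k):
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        yield train, sorted(folds[f])


def cross_validate(fit, score, data, k, seed=0):
    """Average score over k folds; fit(train_rows) -> model, score(model, test_rows) -> float."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(data), k, seed):
        model = fit([data[i] for i in train_idx])
        scores.append(score(model, [data[i] for i in test_idx]))
    return sum(scores) / k
```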

Re: MLLib inside Storm : silly or not ?

2014-06-20 Thread Eustache DIEMERT
> e/batch learning.
>
>
> On Thu, Jun 19, 2014 at 12:26 AM, Eustache DIEMERT wrote:
>
>> Hi Sparkers,
>>
>> We have a Storm cluster and are looking for a decent execution engine for
>> machine-learned models. What I've seen from MLLib is extremely positive,

Re: MLLib inside Storm : silly or not ?

2014-06-19 Thread Eustache DIEMERT
> rie.cs.understanding.edu,
> which at least provides an online LDA.
> C
>
>
> On Thursday, June 19, 2014, Eustache DIEMERT wrote:
>
>> Hi Sparkers,
>>
>> We have a Storm cluster and are looking for a decent execution engine for
>> machine-learned models. What I

MLLib inside Storm : silly or not ?

2014-06-19 Thread Eustache DIEMERT
Hi Sparkers,

We have a Storm cluster and are looking for a decent execution engine for machine-learned models. What I've seen from MLLib is extremely positive, but we can't just throw away our Storm-based stack.

So my question is: is it feasible/recommended to train models in Spark/MLLib and execute
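One way to split training from serving, sketched under assumptions: train in Spark, export only the fitted parameters, and score with a small dependency-free component inside the streaming job. MLlib's linear models expose `weights` and `intercept`; the JSON layout, `export_linear_model` and `LinearScorer` below are hypothetical, and wiring the scorer into an actual Storm bolt (for instance through Storm's multi-lang support for Python) is left out.

```python
import json


def export_linear_model(weights, intercept):
    """Serialize fitted parameters; this JSON layout is a hypothetical example."""
    return json.dumps({"weights": [float(w) for w in weights], "intercept": float(intercept)})


class LinearScorer:
    """Dependency-free scorer that a streaming worker (e.g. a Storm bolt) could load once."""

    def __init__(self, blob):
        model = json.loads(blob)
        self.weights = model["weights"]
        self.intercept = model["intercept"]

    def predict(self, features):
        """Linear prediction: intercept + sum_j weights[j] * features[j]."""
        if len(features) != len(self.weights):
            raise ValueError("expected %d features, got %d" % (len(self.weights), len(features)))
        return self.intercept + sum(w * x for w, x in zip(self.weights, features))


if __name__ == "__main__":
    blob = export_linear_model([1.0, 2.0], 0.5)  # e.g. values copied from a trained model
    scorer = LinearScorer(blob)
    print(scorer.predict([3.0, 4.0]))  # 0.5 + 1*3 + 2*4 = 11.5
```

Keeping the exchanged artifact down to plain numbers means the Storm side needs no Spark dependency at all; the trade-off is that any feature preprocessing done in Spark has to be reimplemented identically at scoring time.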

Re: Random Forest on Spark

2014-04-18 Thread Eustache DIEMERT
Sorry, I mismatched the link: it should be https://gist.github.com/wpm/6454814, and the algorithm is not ExtraTrees but a basic ensemble of boosted trees.

2014-04-18 10:31 GMT+02:00 Eustache DIEMERT:
> Another option is to use ExtraTrees as provided by scikit-learn with
> pyspark:
>

Re: Random Forest on Spark

2014-04-18 Thread Eustache DIEMERT
Is there a PR or issue where GBT/RF progress in MLLib is tracked?

2014-04-17 21:11 GMT+02:00 Evan R. Sparks:
> Sorry - I meant to say that "Multiclass classification, Gradient
> Boosting, and Random Forest support based on the recent Decision Tree
> implementation in MLlib is planned and com

Re: Random Forest on Spark

2014-04-18 Thread Eustache DIEMERT
Another option is to use ExtraTrees as provided by scikit-learn with pyspark:

https://github.com/pydata/pyrallel/blob/master/pyrallel/ensemble.py#L27-L59

This is a proof of concept right now and should be adapted to what you need, but the core decision tree implementation is highly optimized and c
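The general idea of training one model per data partition and averaging their predictions can be sketched without any dependencies. This is not pyrallel's code: pyrallel relies on scikit-learn trees, whereas here one-split regression stumps are swapped in to keep the example self-contained, and `fit_stump`, `fit_partitioned_ensemble` and `predict_ensemble` are hypothetical helpers. In PySpark, the per-partition fit would correspond to something like `rdd.mapPartitions(lambda it: [fit_stump(list(it))]).collect()`.

```python
import random
from statistics import mean


def fit_stump(rows):
    """Fit a one-split regression tree to (x, y) pairs with a single numeric feature."""
    rows = sorted(rows)
    if len(rows) < 2:
        y = rows[0][1]
        return float("inf"), y, y  # a single point: predict its label everywhere
    best = None
    for i in range(1, len(rows)):
        left, right = rows[:i], rows[i:]
        left_mean = mean(y for _, y in left)
        right_mean = mean(y for _, y in right)
        sse = sum((y - left_mean) ** 2 for _, y in left) + sum((y - right_mean) ** 2 for _, y in right)
        if best is None or sse < best[0]:
            best = (sse, (rows[i - 1][0] + rows[i][0]) / 2, left_mean, right_mean)
    return best[1:]  # (threshold, left_value, right_value)


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_partitioned_ensemble(rows, n_partitions, seed=0):
    """Shuffle, split into partitions and fit one stump per non-empty partition."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    partitions = [rows[p::n_partitions] for p in range(n_partitions)]
    return [fit_stump(part) for part in partitions if part]


def predict_ensemble(stumps, x):
    """Average the members' predictions, as in bagging-style ensembles."""
    return mean(predict_stump(s, x) for s in stumps)
```

Because each member only ever sees its own partition, training needs no communication between workers; only the small fitted models travel back to the driver.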

Re: Spark enables us to process Big Data on an ARM cluster !!

2014-03-20 Thread Eustache DIEMERT
Hey, do you have a blog post or URL I can share? This is quite a cool experiment!

E/

2014-03-20 15:01 GMT+01:00 Chanwit Kaewkasi:
> Hi Chester,
>
> It is on our todo-list but it doesn't work at the moment. The
> Parallela cores can not be utilized by the JVM. So, Spark will just
> use its A