RE: [MLlib] Performance issues when building GBM models

2015-02-09 Thread Christopher Thom
angrui Meng [mailto:men...@gmail.com] Sent: Tuesday, 10 February 2015 7:07 AM To: Christopher Thom Cc: user@spark.apache.org Subject: Re: [MLlib] Performance issues when building GBM models Could you check the Spark UI and see whether there are RDDs being kicked out during the computation? We cache the

[MLlib] Performance issues when building GBM models

2015-02-08 Thread Christopher Thom
nt at DecisionTreeMetadata.scala:111, took 5.495166 s Any thoughts or advice, or even suggestions on where to dig for more info would be welcome. thanks chris Christopher Thom QUANTIUM Level 25, 8 Chifley, 8-12 Chifley Square Sydney NSW 2000 T: +61 2 8222 3577 F: +61 2 9292 6444 W: quantium.c

RE: Does DecisionTree model in MLlib deal with missing values?

2015-01-11 Thread Christopher Thom

[MLlib] Scoring GBTs with a variable number of trees

2015-01-07 Thread Christopher Thom
imal" when the MSE is minimum, i.e. in a plot of MSE vs. number of trees, the error will decrease (as the model improves), hit a minimum (the optimal point), and then increase (as the model starts to overfit the data). cheers chris
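The selection rule in the snippet above (score the ensemble at every tree count and keep the count where MSE is lowest) could be sketched roughly as follows. This is plain Python, not the MLlib API: the names `mse_by_tree_count` and `best_tree_count`, and the idea of passing per-tree predictions and per-tree weights as lists, are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def mse_by_tree_count(
    tree_preds: Sequence[Sequence[float]],  # hypothetical: tree_preds[t][i] = tree t's output for row i
    weights: Sequence[float],               # hypothetical: weight applied to each tree's output
    labels: Sequence[float],                # held-out validation labels
) -> List[float]:
    """Return the validation MSE of the ensemble truncated to its first k trees, for k = 1..n."""
    n_rows = len(labels)
    running = [0.0] * n_rows  # ensemble prediction built up one tree at a time
    curve = []
    for preds, w in zip(tree_preds, weights):
        for i, p in enumerate(preds):
            running[i] += w * p
        mse = sum((r - y) ** 2 for r, y in zip(running, labels)) / n_rows
        curve.append(mse)
    return curve


def best_tree_count(curve: Sequence[float]) -> Tuple[int, float]:
    """Return (number of trees, MSE) at the minimum of the MSE-vs-trees curve."""
    best = min(range(len(curve)), key=curve.__getitem__)
    return best + 1, curve[best]


# Toy data: the third tree overshoots, so the minimum falls at two trees.
labels = [1.0, 2.0]
tree_preds = [[0.5, 1.0], [0.5, 1.0], [0.5, 0.5]]
weights = [1.0, 1.0, 1.0]
curve = mse_by_tree_count(tree_preds, weights, labels)
print(curve, best_tree_count(curve))
```

Because the running prediction is accumulated tree by tree, the whole curve costs one pass over the ensemble rather than one full scoring run per candidate tree count.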

RE: python API for gradient boosting?

2015-01-05 Thread Christopher Thom
ay, 6 January 2015 8:43 AM To: Christopher Thom Cc: user@spark.apache.org Subject: Re: python API for gradient boosting? I created a JIRA for it: https://issues.apache.org/jira/browse/SPARK-5094. Hopefully someone will work on it and make it available in the 1.3 release. -Xiangrui On Sun, Jan 4,

python API for gradient boosting?

2015-01-05 Thread Christopher Thom
compelling. As an alternative, if it'll be a while before this API is implemented, does anyone have suggestions for Scala replacements for the above Python libraries? cheers chris

python API for gradient boosting?

2015-01-04 Thread Christopher Thom
Hi, I wonder if anyone knows when a Python API will be added for Gradient Boosted Trees? I see that Java and Scala APIs were added for the 1.2 release, and I would love to be able to build GBMs in PySpark too. cheers chris