Re: Random Forest hangs without trace of error

2017-05-30 Thread Morten Hornbech
Hi Sumona, I’m afraid I never really resolved the issue. Actually, I have just had to roll back an upgrade from 2.1.0 to 2.1.1 because it (for reasons unknown) reintroduced the issue in our nightly integration tests (see http://apache-spark-user-list.1001560.n3.nabble.com/Issue-upgrading-to-Spark-

Re: Random Forest hangs without trace of error

2017-05-30 Thread Sumona Routh
Hi Morten, were you able to resolve your issue with RandomForest? I am having similar issues with a newly trained model (one that has a larger number of trees and a smaller minInstancesPerNode, which is by design to produce the best-performing model). I wanted to get some feedback on how you solved you

Re: Random Forest hangs without trace of error

2016-12-11 Thread Marco Mistroni
OK. Did you change the Spark version, or the Java/Scala/Python version? Have you tried different versions of any of the above? Hope this helps. Kr On 10 Dec 2016 10:37 pm, "Morten Hornbech" wrote: > I haven’t actually experienced any non-determinism. We have nightly > integration tests comparing output fro

Re: Random Forest hangs without trace of error

2016-12-10 Thread Morten Hornbech
I haven’t actually experienced any non-determinism. We have nightly integration tests comparing output from random forests with no variations. The workaround we will probably try is to split the dataset, either randomly or on one of the variables, and then train a forest on each partition, which
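
The splitting step of that workaround could be sketched roughly as below. This is only an illustration in plain Python, not code from the thread: the function names `split_randomly` and `split_on_variable`, the dict-shaped rows, and the `region` column are all hypothetical, and the per-partition training call is left out.

```python
import random
from collections import defaultdict


def split_randomly(rows, num_parts, seed=42):
    """Assign each row to one of num_parts partitions at random.

    A fixed seed keeps the split reproducible between runs.
    """
    rng = random.Random(seed)
    parts = [[] for _ in range(num_parts)]
    for row in rows:
        parts[rng.randrange(num_parts)].append(row)
    return parts


def split_on_variable(rows, key):
    """Group rows by the value of one variable (hypothetical column name)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


# Hypothetical sample data: ten rows with a two-valued "region" column.
rows = [{"region": "a" if i % 2 == 0 else "b", "x": i} for i in range(10)]
random_parts = split_randomly(rows, num_parts=3)
region_parts = split_on_variable(rows, "region")
```

Each resulting partition would then get its own forest, trained with whatever API is already in use.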

Re: Random Forest hangs without trace of error

2016-12-10 Thread Marco Mistroni
Hello Morten, OK. AFAIK there is a tiny bit of randomness in these ML algorithms (please, anyone, correct me if I'm wrong). In fact, if you run your RDF code multiple times, it will not give you EXACTLY the same results (though accuracy and errors should be more or less similar). At least this is what I f

Re: Random Forest hangs without trace of error

2016-12-10 Thread Marco Mistroni
Hi, bring the samples back to the 1k range to debug, or, as suggested, reduce trees and bins. I had rdd running on same-size data with no issues. Or send me some sample code and data and I'll try it out on my EC2 instance ... Kr On 10 Dec 2016 3:16 am, "Md. Rezaul Karim" wrote: > I had similar experienc

Re: Random Forest hangs without trace of error

2016-12-09 Thread Md. Rezaul Karim
I had a similar experience last week. I could not find any error trace either. Later on, I did the following to get rid of the problem: i) downgraded to Spark 2.0.0; ii) decreased the values of maxBins and maxDepth. Additionally, make sure that you set featureSubsetStrategy to "auto" to let the al
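
For reference, the settings mentioned above might look like this in PySpark. It is a sketch under assumptions: it uses the DataFrame-based `pyspark.ml` API (the thread does not say which API was in use), the numeric values are only illustrative and not taken from the thread, and the helper names and the `label`/`features` column names are hypothetical.

```python
def conservative_forest_params():
    """Illustrative (not thread-provided) values for a smaller forest."""
    return {
        "numTrees": 20,                   # fewer trees means less work per run
        "maxDepth": 5,                    # shallower trees
        "maxBins": 32,                    # fewer split candidates per feature
        "featureSubsetStrategy": "auto",  # let the algorithm pick the subset size
    }


def build_classifier(params):
    """Create a RandomForestClassifier; requires pyspark to be installed."""
    # Imported lazily so the parameter dict can be inspected without Spark.
    from pyspark.ml.classification import RandomForestClassifier

    return RandomForestClassifier(
        labelCol="label",        # hypothetical column names
        featuresCol="features",
        **params,
    )


params = conservative_forest_params()
```

Calling `build_classifier(params).fit(train_df)` on a DataFrame with those two columns would then train the model; reducing maxDepth and maxBins further is the first thing to try if training still stalls.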