Hi Yanbo,

As long as two models fit into the memory of a single machine, there should be no problems, so even 16GB machines can handle large models. (The master should have more memory because it runs LBFGS.) In my experiments, I have trained models with 12M and 32M parameters without issues.

Best regards,
Alexander
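For a rough memory budget before training, the following back-of-the-envelope sketch (not Spark code; the topology and the object name MlpSizeEstimate are made up for illustration) counts the parameters of a MultilayerPerceptronClassifier layer specification. If I read the implementation correctly, each pair of consecutive layers contributes a weight matrix plus a bias vector, i.e. roughly (in + 1) * out values.

object MlpSizeEstimate {
  // Sum (in + 1) * out over consecutive layer pairs: weights plus biases.
  def numParams(layers: Seq[Int]): Long =
    layers.sliding(2).map(p => (p(0).toLong + 1) * p(1)).sum

  def main(args: Array[String]): Unit = {
    val layers = Seq(780, 4000, 2000, 10) // hypothetical topology, ~11M parameters
    val params = numParams(layers)
    // Parameters are Doubles (8 bytes each); the driver additionally holds
    // LBFGS state (gradient plus a history of correction vectors), which is
    // why the master needs noticeably more memory than one model copy.
    println(f"$params%,d parameters, ~${params * 8.0 / (1 << 20)}%.0f MB per copy of the model")
  }
}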
From: Yanbo Liang [mailto:yblia...@gmail.com]
Sent: Sunday, December 27, 2015 2:23 AM
To: Joseph Bradley
Cc: Eugene Morozov; user; dev@spark.apache.org
Subject: Re: SparkML algos limitations question.

Hi Eugene,

AFAIK, the current implementation of MultilayerPerceptronClassifier has some scalability problems when the model is very large (such as >10M parameters), although I think the current limits already cover many use cases.

Yanbo

2015-12-16 6:00 GMT+08:00 Joseph Bradley <jos...@databricks.com>:

Hi Eugene,

The maxDepth parameter exists because the implementation uses Integer node IDs which correspond to positions in the binary tree. This simplified the implementation. I'd like to eventually modify it to avoid depending on tree node IDs, but that is not yet on the roadmap.

There is no analogous limit for the GLMs you listed, but I'm not very familiar with the perceptron implementation.

Joseph

On Mon, Dec 14, 2015 at 10:52 AM, Eugene Morozov <evgeny.a.moro...@gmail.com> wrote:

Hello!

I'm currently working on a POC and trying to use Random Forest (classification and regression). I also have to check SVM and multiclass perceptron (other algorithms are less important at the moment). So far I've discovered that Random Forest has a maxDepth limitation on its trees, and just out of curiosity I wonder why such a limitation was introduced.

My actual question: I'm going to use Spark ML in production next year and would like to know whether there are other limitations like maxDepth in Random Forest for the other algorithms: Logistic Regression, Perceptron, SVM, etc.

Thanks in advance for your time.
--
Be well!
Jean Morozov
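A minimal sketch of the Integer node-ID scheme Joseph describes above (this is an illustration, not the Spark source; it assumes 1-based indexing of a complete binary tree with children at 2*id and 2*id + 1). A node at depth d then has an ID up to 2^(d+1) - 1, which must fit in a signed Int, so only depths up to about 30 are representable; in current Spark versions maxDepth is in fact capped at 30.

object NodeIdDepthLimit {
  // Root has ID 1; children of node `id` sit at 2*id and 2*id + 1,
  // so the IDs at depth d occupy the range [2^d, 2^(d+1) - 1].
  def leftChild(id: Int): Int  = 2 * id
  def rightChild(id: Int): Int = 2 * id + 1

  def main(args: Array[String]): Unit = {
    // The largest ID at depth d is 2^(d+1) - 1; it must not exceed
    // Int.MaxValue = 2^31 - 1, so depth 30 is the deepest addressable level.
    val deepest = Iterator.from(0)
      .takeWhile(d => BigInt(2).pow(d + 1) - 1 <= Int.MaxValue)
      .max
    println(s"Deepest level addressable with Int node IDs: $deepest") // prints 30
  }
}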