Re: Size exceeds Integer.MAX_VALUE issue with RandomForest

2017-09-18 Thread Pulluru Ranjith
Hi,

Here are the commands that were used:

> spark.default.parallelism=1000
> sparkR.session()
Java ref type org.apache.spark.sql.SparkSession id 1
> sql("use test")
SparkDataFrame[]
> mydata <- sql("select c1, p1, rt1, c2, p2, rt2, avt, avn from test_temp2 where vdr = 'TEST31X'")
> nrow(myda
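As a hedged sketch of the repartitioning advice given later in the thread: explicitly repartitioning the SparkDataFrame before training keeps any single cached block under the 2 GB limit from SPARK-1476. The table and column names below are taken from the session above; using avt as the label column and the specific parameter values are assumptions for illustration only.

```r
# Sketch only: assumes a running Spark cluster and the thread's table/columns.
library(SparkR)
sparkR.session()
sql("use test")
mydata <- sql("select c1, p1, rt1, c2, p2, rt2, avt, avn
               from test_temp2 where vdr = 'TEST31X'")

# Repartition so no single block exceeds 2 GB (SPARK-1476).
# 1000 is illustrative; size it to your data volume.
mydata <- repartition(mydata, numPartitions = 1000L)

# spark.randomForest is the SparkR random forest API (Spark >= 2.1).
# Treating avt as the label is an assumption for this sketch.
model <- spark.randomForest(mydata, avt ~ ., type = "regression",
                            numTrees = 20, maxDepth = 5)
```

Because repartition() acts on the SparkDataFrame itself, it takes effect regardless of how the session-level parallelism settings are configured.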

Re: Size exceeds Integer.MAX_VALUE issue with RandomForest

2017-09-16 Thread Akhil Das
What parameters did you pass to the classifier, and what is the size of your training data? You are hitting that issue because one of the blocks is over 2 GB; repartitioning the data will help.

On Fri, Sep 15, 2017 at 7:55 PM, rpulluru wrote:
> Hi,
>
> I am using the sparkR randomForest function

Size exceeds Integer.MAX_VALUE issue with RandomForest

2017-09-15 Thread rpulluru
Hi,

I am using the sparkR randomForest function and running into java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE. It looks like I am hitting this issue: https://issues.apache.org/jira/browse/SPARK-1476. I set spark.default.parallelism=1000 but am still facing the same problem.
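One likely reason the setting had no effect: spark.default.parallelism mainly governs shuffles of plain RDDs, while a SparkDataFrame produced by sql() is governed by spark.sql.shuffle.partitions, and an explicit repartition() is often still needed for data read straight from a table. A hedged sketch of setting both at session start (values illustrative, not a definitive fix):

```r
# Sketch: session-level partitioning settings in SparkR.
library(SparkR)
sparkR.session(sparkConfig = list(
  spark.default.parallelism    = "1000",  # RDD-level shuffle parallelism
  spark.sql.shuffle.partitions = "1000"   # SQL/DataFrame shuffle parallelism
))
```

Neither setting repartitions data that is simply scanned from a table without a shuffle, which is why the replies in this thread recommend calling repartition() on the SparkDataFrame directly.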