Hi,
Here are the commands I used:
> spark.default.parallelism=1000
> sparkR.session()
Java ref type org.apache.spark.sql.SparkSession id 1
> sql("use test")
SparkDataFrame[]
> mydata <- sql("select c1, p1, rt1, c2, p2, rt2, avt, avn from test_temp2
where vdr = 'TEST31X' ")
>
> nrow(mydata)
What parameters did you pass to the classifier, and what is the size of
your training data? You are hitting that issue because one of the blocks
is over 2 GB; repartitioning the data will help.
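For example, something like the following should work (a minimal sketch: the table, column names, and filter are taken from your transcript, and the partition count of 1000 is just an illustration to match the spark.default.parallelism value you tried):

```r
library(SparkR)

sparkR.session()
sql("use test")
mydata <- sql("select c1, p1, rt1, c2, p2, rt2, avt, avn from test_temp2
               where vdr = 'TEST31X'")

# spark.default.parallelism only affects RDD operations; an existing
# SparkDataFrame keeps the partitioning it was read with. Repartition it
# explicitly so no single cached block exceeds the 2 GB limit:
mydata <- repartition(mydata, numPartitions = 1000L)
```

Increasing numPartitions shrinks each block; pick a value large enough that no partition approaches 2 GB.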
On Fri, Sep 15, 2017 at 7:55 PM, rpulluru wrote:
Hi,
I am using the SparkR randomForest function and running into a
java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE.
It looks like I am hitting
https://issues.apache.org/jira/browse/SPARK-1476. I set
spark.default.parallelism=1000 but am still facing the same issue.