I can’t reproduce your issue with len = 10000 in local mode.
Could you share more details about your environment?
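For example, the output of a quick diagnostic like the one below, run in
the same R session that hits the error, would help (base R only):

  # Versions and environment of the failing session:
  R.version.string          # R version and platform
  Sys.getenv("SPARK_HOME")  # which Spark installation SparkR is bound to
  packageVersion("SparkR")  # SparkR package version
  sessionInfo()             # OS details plus loaded packages (magrittr, plyr, ...)

Also useful: the master you run against (local[*], standalone, YARN) and
the executor memory settings.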
> On Aug 9, 2016, at 11:35, Shane Lee <shane_y_...@yahoo.com.INVALID> wrote:
>
> Hi All,
>
> I am trying out SparkR 2.0 and have run into an issue with repartition.
>
> Here is the R code (essentially a port of the pi-calculating Scala example in
> the Spark package) that reproduces the behavior:
>
> schema <- structType(structField("input", "integer"),
>                      structField("output", "integer"))
>
> library(magrittr)
>
> len = 3000
> data.frame(n = 1:len) %>%
>   as.DataFrame %>%
>   SparkR:::repartition(10L) %>%
>   dapply(., function(df) {
>     library(plyr)
>     ddply(df, .(n), function(y) {
>       data.frame(z = {
>         x1 = runif(1) * 2 - 1
>         y1 = runif(1) * 2 - 1
>         z = x1 * x1 + y1 * y1
>         if (z < 1) 1L else 0L
>       })
>     })
>   }, schema) %>%
>   SparkR:::summarize(total = sum(.$output)) %>%
>   collect * 4 / len
>
> For me it runs fine as long as len is less than 5000; otherwise it errors out
> with the following message:
>
> Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
>   org.apache.spark.SparkException: Job aborted due to stage failure:
>   Task 6 in stage 56.0 failed 4 times, most recent failure: Lost task 6.3
>   in stage 56.0 (TID 899, LARBGDV-VM02): org.apache.spark.SparkException:
>   R computation failed with
>     Error in readBin(con, raw(), stringLen, endian = "big") :
>       invalid 'n' argument
>     Calls: <Anonymous> -> readBin
>     Execution halted
>   at org.apache.spark.api.r.RRunner.compute(RRunner.scala:108)
>   at org.apache.spark.sql.execution.r.MapPartitionsRWrapper.apply(MapPartitionsRWrapper.scala:59)
>   at org.apache.spark.sql.execution.r.MapPartitionsRWrapper.apply(MapPartitionsRWrapper.scala:29)
>   at org.apache.spark.sql.execution.MapPartitionsExec$$anonfun$6.apply(objects.scala:178)
>   at org.apache.spark.sql.execution.MapPartitionsExec$$anonfun$6.apply(objects.scala:175)
>   at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:784)
>   at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$
>
> If the repartition call is removed, it runs fine again, even with very large
> len.
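> Concretely, the working variant is the same pipeline with just the
> repartition step dropped, e.g. (same schema and dapply function as above):
>
>   len = 10000
>   data.frame(n = 1:len) %>%
>     as.DataFrame %>%
>     dapply(., function(df) {
>       library(plyr)
>       ddply(df, .(n), function(y) {
>         data.frame(z = {
>           x1 = runif(1) * 2 - 1
>           y1 = runif(1) * 2 - 1
>           z = x1 * x1 + y1 * y1
>           if (z < 1) 1L else 0L
>         })
>       })
>     }, schema) %>%
>     SparkR:::summarize(total = sum(.$output)) %>%
>     collect * 4 / len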
>
> After looking through the documentation and searching the web, I can't seem
> to find any clues on how to fix this. Has anybody seen a similar problem?
>
> Thanks in advance for your help.
>
> Shane
>