For YARN mode, you can set --executor-cores 1.
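For example, something like this (the executor count, memory and application jar
name are just placeholders):

    spark-submit --master yarn \
      --executor-cores 1 \
      --num-executors 8 \
      --executor-memory 4g \
      your-app.jar

Each executor then has a single task slot, so calls into the embedded R engine are
never made from more than one task thread.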

-----Original Message-----
From: Sun, Rui [mailto:[email protected]] 
Sent: Monday, February 15, 2016 11:35 AM
To: Simon Hafner <[email protected]>; user <[email protected]>
Subject: RE: Running synchronized JRI code

Yes, JRI loads an R dynamic library into the executor JVM, which runs into 
thread-safety issues when there are multiple task threads within the executor.

If you are running Spark in Standalone mode, it is possible to run multiple 
workers per node and, at the same time, limit each worker to a single core.
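For example, in conf/spark-env.sh on each worker node (the instance count is only 
an example):

    export SPARK_WORKER_INSTANCES=4   # several worker JVMs per node
    export SPARK_WORKER_CORES=1       # each worker offers a single core

More worker JVMs means more overhead per node, but each executor then stays 
effectively single-threaded for tasks.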

You could use RDD.pipe(), but you may need to handle binary-to-text conversion, as 
the input to and output from the R process are string-based.
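Roughly, the pipe-based approach would look like the sketch below (the script name 
"score.R" and the base64 encoding are just one possible convention; the script 
would have to read lines from stdin and write one line per record to stdout):

    import org.apache.spark.rdd.RDD
    import java.util.Base64

    // Sketch: stream binary records through an external R process as text.
    def scoreWithR(input: RDD[Array[Byte]]): RDD[Double] = {
      // binary -> text: encode each record so it survives the line-based pipe
      val asText = input.map(bytes => Base64.getEncoder.encodeToString(bytes))
      // one external "Rscript score.R" process per partition, fed via stdin
      val piped = asText.pipe(Seq("Rscript", "score.R"))
      // text -> typed result: parse whatever the script printed per line
      piped.map(_.trim.toDouble)
    }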

I am thinking that if demand like yours (calling R code in RDD transformations) is 
common, we may consider refactoring RRDD for this purpose, although it is 
currently intended for internal use by SparkR and is not a public API.

-----Original Message-----
From: Simon Hafner [mailto:[email protected]] 
Sent: Monday, February 15, 2016 5:09 AM
To: user <[email protected]>
Subject: Running synchronized JRI code

Hello

I'm currently running R code in an executor via JRI. Because R is 
single-threaded, every call into R needs to be wrapped in a `synchronized` block, 
so each executor can effectively use only a bit more than one core, which is 
undesirable. Is there a way to tell Spark that this specific application (or even 
a specific UDF) needs multiple JVMs? Or should I switch from JRI to a (slower) 
pipe-based setup?
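For reference, the wrapping I mean looks roughly like this (simplified; object and 
method names are just illustrative):

    import org.rosuda.JRI.{REXP, Rengine}

    object RGateway {
      // One R engine per executor JVM (R itself is single-threaded).
      lazy val engine: Rengine = new Rengine(Array("--vanilla"), false, null)

      // Every task thread has to take the same lock, so at most one core
      // per executor is ever doing R work.
      def eval(expr: String): REXP = engine.synchronized {
        engine.eval(expr)
      }
    }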

Cheers,
Simon
