For YARN mode, you can set --executor-cores 1.
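For example, something like this (the executor count, memory and application jar
name are just placeholders):

    spark-submit --master yarn \
      --executor-cores 1 \
      --num-executors 8 \
      --executor-memory 4g \
      your-app.jar

Each executor then has a single task slot, so calls into the embedded R engine are
never made from more than one task thread.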

-----Original Message-----
From: Sun, Rui [mailto:[email protected]] 
Sent: Monday, February 15, 2016 11:35 AM
To: Simon Hafner <[email protected]>; user <[email protected]>
Subject: RE: Running synchronized JRI code

Yes, JRI loads an R dynamic library into the executor JVM, which runs into 
thread-safety issues when there are multiple task threads within the executor.

If you are running Spark in Standalone mode, it is possible to run multiple 
workers per node and, at the same time, limit each worker to a single core.
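For example, in conf/spark-env.sh on each worker node (the instance count is only 
an example):

    export SPARK_WORKER_INSTANCES=4   # several worker JVMs per node
    export SPARK_WORKER_CORES=1       # each worker offers a single core

More worker JVMs means more overhead per node, but each executor then stays 
effectively single-threaded for tasks.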

You could use RDD.pipe(), but you may need to handle binary-to-text conversion, as 
the input to and output from the R process are string-based.
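Roughly, the pipe-based approach would look like the sketch below (the script name 
"score.R" and the base64 encoding are just one possible convention; the script 
would have to read lines from stdin and write one line per record to stdout):

    import org.apache.spark.rdd.RDD
    import java.util.Base64

    // Sketch: stream binary records through an external R process as text.
    def scoreWithR(input: RDD[Array[Byte]]): RDD[Double] = {
      // binary -> text: encode each record so it survives the line-based pipe
      val asText = input.map(bytes => Base64.getEncoder.encodeToString(bytes))
      // one external "Rscript score.R" process per partition, fed via stdin
      val piped = asText.pipe(Seq("Rscript", "score.R"))
      // text -> typed result: parse whatever the script printed per line
      piped.map(_.trim.toDouble)
    }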

I am thinking that if demand like yours (calling R code in RDD transformations) is 
common, we may consider refactoring RRDD for this purpose, although it is 
currently intended for internal use by SparkR and is not a public API.

-----Original Message-----
From: Simon Hafner [mailto:[email protected]] 
Sent: Monday, February 15, 2016 5:09 AM
To: user <[email protected]>
Subject: Running synchronized JRI code

Hello

I'm currently running R code in an executor via JRI. Because R is 
single-threaded, every call into R needs to be wrapped in a `synchronized` block, 
so each executor can effectively use only a bit more than one core, which is 
undesirable. Is there a way to tell Spark that this specific application (or even 
a specific UDF) needs multiple JVMs? Or should I switch from JRI to a (slower) 
pipe-based setup?
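For reference, the wrapping I mean looks roughly like this (simplified; object and 
method names are just illustrative):

    import org.rosuda.JRI.{REXP, Rengine}

    object RGateway {
      // One R engine per executor JVM (R itself is single-threaded).
      lazy val engine: Rengine = new Rengine(Array("--vanilla"), false, null)

      // Every task thread has to take the same lock, so at most one core
      // per executor is ever doing R work.
      def eval(expr: String): REXP = engine.synchronized {
        engine.eval(expr)
      }
    }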

Cheers,
Simon
