Using Spark, SparkR and Ranger, please help.

2016-01-20 Thread Julien Carme
Hello, I have been able to use Spark with Apache Ranger: I added the right configuration files to the Spark conf directory, added the Ranger jars to the classpath, and it works; Spark complies with Ranger rules when I access Hive tables. However, with SparkR it does not work, which is rather surprising considering Spark
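
For reference, a minimal sketch of the Hive access pattern being described (Spark 1.x Scala API with Hive support; the database and table names are hypothetical, and whether Ranger rules are actually enforced depends on the Ranger plugin jars and configuration being on the Spark classpath, as described above):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object RangerHiveAccess {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("RangerHiveAccess"))
        // HiveContext reads hive-site.xml (and, if the Ranger plugin jars
        // are on the classpath, the Ranger configuration) from the Spark
        // conf directory.
        val hiveCtx = new HiveContext(sc)

        // Ranger policies would apply to table access like this query.
        hiveCtx.sql("SELECT * FROM some_db.some_table LIMIT 10")
          .collect()
          .foreach(println)
      }
    }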

Re: Saving RDD with array of strings

2014-09-21 Thread Julien Carme
Just use flatMap; it does exactly what you need: newLines.flatMap { lines => lines }.saveAsTextFile(...) 2014-09-21 11:26 GMT+02:00 Sarath Chandra < sarathchandra.jos...@algofusiontech.com>: > Hi All, > > If my RDD holds an array/sequence of strings, how can I save them as an > HDFS file with e
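
A self-contained sketch of that answer (Spark 1.x Scala API; the sample data and output path are hypothetical):

    import org.apache.spark.{SparkConf, SparkContext}

    object FlattenAndSave {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("FlattenAndSave"))

        // An RDD whose elements are arrays of strings.
        val newLines = sc.parallelize(Seq(
          Array("line 1", "line 2"),
          Array("line 3")
        ))

        // flatMap flattens each array into individual records, so the
        // saved text file ends up with one string per line.
        newLines.flatMap(lines => lines)
          .saveAsTextFile("hdfs:///tmp/flattened-output")
      }
    }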

Issues with partitionBy: FetchFailed

2014-09-21 Thread Julien Carme
Hello, I am facing an issue with partitionBy; it is not clear whether it is a problem with my code or with my Spark setup. I am using Spark 1.1, standalone, and my other Spark projects work fine. I have to repartition a relatively large file (about 70 million lines). Here is a minimal version
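
A minimal sketch of the kind of job being described (Spark 1.x Scala API; the input path, key extraction, and partition count are hypothetical). FetchFailed errors typically surface during the shuffle that partitionBy triggers:

    import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
    import org.apache.spark.SparkContext._ // pair-RDD implicits, needed before Spark 1.3

    object RepartitionLargeFile {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("RepartitionLargeFile"))

        // Key each line (here, by its first comma-separated field).
        val keyed = sc.textFile("hdfs:///tmp/big-input")
          .map(line => (line.split(',')(0), line))

        // partitionBy shuffles the ~70M records into a fixed number of
        // partitions; this shuffle is where FetchFailed shows up when an
        // executor or its shuffle output is lost.
        keyed.partitionBy(new HashPartitioner(200))
          .saveAsTextFile("hdfs:///tmp/partitioned-output")
      }
    }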

Strange exception while accessing hdfs from spark.

2014-09-18 Thread Julien Carme
Hello, I have been using Spark for quite some time, and I now get this error (please see stderr output below) when accessing HDFS. It seems to come from Hadoop; however, I can access HDFS from the command line without any problem. The WARN on the first line seems to be key, because it never appeared previ

Re: ReduceByKey performance optimisation

2014-09-13 Thread Julien Carme
in memory. > I bet you can make it faster than this example too. > > > On Sat, Sep 13, 2014 at 1:15 PM, Gary Malouf > wrote: > > You need something like: > > > > val x: RDD[MyAwesomeObject] > > > > x.map(obj => obj.fieldtobekey -> obj).reduceByK
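
A runnable sketch of the suggestion quoted above (Spark 1.x Scala API; Record is a hypothetical stand-in for the MyAwesomeObject type in the quote):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._ // pair-RDD implicits, needed before Spark 1.3

    // Hypothetical stand-in for MyAwesomeObject.
    case class Record(fieldToBeKey: String, payload: String)

    object DedupeByKey {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("DedupeByKey"))

        val x = sc.parallelize(Seq(
          Record("a", "first"),
          Record("a", "second"), // duplicate key, dropped by the reduce
          Record("b", "third")
        ))

        // Key by the dedup field, then keep one representative per key;
        // (a, b) => a keeps whichever value the reduce sees first.
        val deduped = x.map(obj => obj.fieldToBeKey -> obj)
          .reduceByKey((a, b) => a)
          .values

        deduped.collect().foreach(println)
      }
    }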

Re: ReduceByKey performance optimisation

2014-09-13 Thread Julien Carme
nct() should be > much better. > > On Sat, Sep 13, 2014 at 10:46 AM, Julien Carme > wrote: > > Hello, > > > > I am facing performance issues with reduceByKey. I know that this topic > has > > already been covered but I did not really find answers to my q
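
The truncated suggestion is presumably distinct(); a minimal sketch of that alternative, assuming whole records (rather than a chosen key field) should be unique:

    import org.apache.spark.{SparkConf, SparkContext}

    object DedupeWithDistinct {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("DedupeWithDistinct"))

        val lines = sc.parallelize(Seq("a", "a", "b"))

        // distinct() removes exact duplicates; it still shuffles (it is
        // implemented on top of reduceByKey), but it avoids building the
        // (key, value) pairs by hand.
        lines.distinct().collect().foreach(println)
      }
    }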

ReduceByKey performance optimisation

2014-09-13 Thread Julien Carme
Hello, I am facing performance issues with reduceByKey. I know that this topic has already been covered, but I did not really find answers to my question. I am using reduceByKey to remove entries with identical keys, using (a, b) => a as the reduce function. It seems to be a relatively straightforwa

Re: Using an external jar in the driver, in yarn-standalone mode.

2014-03-26 Thread Julien Carme
just using ordinary, everyday java/scala - so it just has > to be on the normal java classpath. > > Could that be your issue? > > -Nathan > > > > On Tue, Mar 25, 2014 at 2:18 PM, Sandy Ryza wrote: > > Hi Julien, > > Have you called SparkContext#addJars?

Re: Using an external jar in the driver, in yarn-standalone mode.

2014-03-25 Thread Julien Carme
2014 at 2:18 PM, Sandy Ryza wrote: > >> Hi Julien, >> >> Have you called SparkContext#addJars? >> >> -Sandy >> >> >> On Tue, Mar 25, 2014 at 10:05 AM, Julien Carme wrote: >> >>> Hello, >>> >>> I have been struggling

Using an external jar in the driver, in yarn-standalone mode.

2014-03-25 Thread Julien Carme
Hello, I have been struggling for ages to use an external jar in my Spark driver program, in yarn-standalone mode. I just want to use, in my main program, outside the calls to Spark functions, objects that are defined in another jar. I tried setting SPARK_CLASSPATH and ADD_JAR; I tried to use --addJar
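
A sketch of the two halves of the fix discussed in this thread (Spark 0.9/1.x-era Scala API; the jar path and helper class are hypothetical): SparkContext#addJar ships a jar to the executors, but classes used directly in the driver must already be on the driver's own JVM classpath.

    import org.apache.spark.{SparkConf, SparkContext}

    object ExternalJarDriver {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("ExternalJarDriver"))

        // Ships the jar to the executors so classes from it can be used
        // inside closures that run on the cluster.
        sc.addJar("hdfs:///libs/my-external-lib.jar")

        // The driver itself is ordinary JVM code: anything used here,
        // outside the Spark calls, must be on the driver's classpath
        // already (e.g. via SPARK_CLASSPATH or an assembly jar).
        // val helper = new com.example.ExternalHelper() // hypothetical class
      }
    }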