Access several s3 buckets, with credentials containing "/"

2015-06-05 Thread Pierre B
Hi list! My problem is quite simple. I need to access several S3 buckets, using different credentials: ``` val c1 = sc.textFile("s3n://[ACCESS_KEY_ID1:SECRET_ACCESS_KEY1]@bucket1/file.csv").count val c2 = sc.textFile("s3n://[ACCESS_KEY_ID2:SECRET_ACCESS_KEY2]@bucket2/file.csv").count val c3 = sc.
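One workaround commonly suggested when a secret key contains "/" is to set the credentials on the Hadoop configuration instead of embedding them in the URL. A minimal sketch, assuming the s3n filesystem and a Spark 1.x SparkContext; the bucket names and environment-variable names are placeholders, and since the Hadoop configuration is shared by the whole context, each read is forced with count before the credentials are switched:

```
import org.apache.spark.{SparkConf, SparkContext}

object MultiBucketS3 {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("multi-bucket-s3"))

    // Set the s3n credentials on the shared Hadoop configuration before
    // reading, instead of embedding them in the URL (which breaks when
    // the secret key contains "/").
    def readWithCredentials(path: String, accessKey: String, secretKey: String) = {
      sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", accessKey)
      sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secretKey)
      sc.textFile(path)
    }

    // The configuration is context-wide, so materialize each read (count)
    // before switching to the next set of credentials.
    val c1 = readWithCredentials("s3n://bucket1/file.csv", sys.env("KEY_ID_1"), sys.env("SECRET_1")).count()
    val c2 = readWithCredentials("s3n://bucket2/file.csv", sys.env("KEY_ID_2"), sys.env("SECRET_2")).count()

    println(s"bucket1: $c1 lines, bucket2: $c2 lines")
    sc.stop()
  }
}
```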

[SQL] Self join with ArrayType columns problems

2015-01-26 Thread Pierre B
Using Spark 1.2.0, we are facing some weird behaviour when performing a self join on a table with an ArrayType field (potential bug?). I have set up a minimal non-working example here: https://gist.github.com/pierre-borckmans/4853cd6d0b2f2388bf4f
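For readers without access to the gist, a hedged sketch of the shape of the problem, assuming the Spark 1.2.x API (the table and data here are made up; this is not the referenced gist):

```
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// A table whose rows carry an ArrayType column, self-joined on a scalar key.
case class Record(id: Int, tags: Seq[String])

object ArrayTypeSelfJoin {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("arraytype-self-join").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD  // Spark 1.2.x implicit RDD -> SchemaRDD conversion

    val records = sc.parallelize(Seq(Record(1, Seq("a", "b")), Record(2, Seq("c"))))
    records.registerTempTable("records")

    // Self join on id; the ArrayType column appears on both sides of the join.
    val joined = sqlContext.sql(
      "SELECT a.id, a.tags, b.tags FROM records a JOIN records b ON a.id = b.id")
    joined.collect().foreach(println)

    sc.stop()
  }
}
```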

[Spark SQL]: Convert SchemaRDD back to RDD

2014-07-08 Thread Pierre B
Hi there! 1/ Is there a way to convert a SchemaRDD (for instance loaded from a Parquet file) back to an RDD of a given case class? 2/ Even better, is there a way to get the schema information from a SchemaRDD? I am trying to figure out how to properly get the various fields of the Rows of a Schem
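A minimal sketch of one way to do the first part, assuming the column order of the SchemaRDD matches the case class; the Parquet path is a placeholder, and the exact schema accessors differ between early 1.x releases (printSchema is used here as the lowest common denominator):

```
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class Person(name: String, age: Int)

object SchemaRDDBackToCaseClass {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("schemardd-to-rdd").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)

    // A SchemaRDD is also an RDD[Row], so it can be mapped back into a case
    // class by pulling the fields out of each Row by position.
    val people = sqlContext.parquetFile("people.parquet")  // placeholder path
    val asCaseClass = people.map(row => Person(row.getString(0), row.getInt(1)))

    asCaseClass.take(5).foreach(println)

    // The schema can at least be inspected; printSchema() writes it to stdout.
    people.printSchema()

    sc.stop()
  }
}
```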

Re: [Spark SQL]: Convert SchemaRDD back to RDD

2014-07-08 Thread Pierre B
Cool, thanks Michael! Message sent from a mobile device - excuse typos and abbreviations > On July 8, 2014 at 22:17, Michael Armbrust [via Apache Spark User List] > wrote: > >> On Tue, Jul 8, 2014 at 12:43 PM, Pierre B <[hidden email]> wrote: >> 1/ Is there a way

[SQL] Set Parquet block size?

2014-10-09 Thread Pierre B
Hi there! Is there a way to modify the default Parquet block size? I didn't see any reference to ParquetOutputFormat.setBlockSize in the Spark code, so I was wondering if there was a way to provide this option? I'm asking because we are facing Out of Memory issues when writing Parquet files. The rdd we a
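One approach sometimes suggested is to set Parquet's Hadoop key on the shared configuration before writing; a sketch under that assumption (the paths are placeholders, and whether the key is honoured depends on the Spark/Parquet versions in use):

```
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ParquetBlockSize {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("parquet-block-size"))
    val sqlContext = new SQLContext(sc)

    // Parquet's output format reads its row-group ("block") size from the
    // Hadoop configuration, so setting the key before writing should be
    // picked up by saveAsParquetFile. 64 MB here instead of the larger default.
    sc.hadoopConfiguration.setInt("parquet.block.size", 64 * 1024 * 1024)

    val data = sqlContext.parquetFile("input.parquet")   // placeholder paths
    data.saveAsParquetFile("output.parquet")

    sc.stop()
  }
}
```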

Re: Is there a way to look at RDD's lineage? Or debug a fault-tolerance error?

2014-10-09 Thread Pierre B
To add a bit on this one: if you look at RDD.scala in the Spark code, you'll see that both the "parent" and "firstParent" methods are protected[spark]. I guess that, for good reasons which I must admit I don't completely understand, you are not supposed to explore an RDD's lineage programmatically... I had a u
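For reference, the public way to look at a lineage is RDD.toDebugString; a small sketch:

```
import org.apache.spark.{SparkConf, SparkContext}

object LineageDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("lineage").setMaster("local[2]"))

    val rdd = sc.parallelize(1 to 100)
      .map(_ * 2)
      .filter(_ % 3 == 0)
      .groupBy(_ % 10)

    // toDebugString prints the chain of dependencies (and, in later
    // versions, marks shuffle boundaries) without touching the
    // protected[spark] parent/firstParent methods.
    println(rdd.toDebugString)

    sc.stop()
  }
}
```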

Re: Spark SQL - custom aggregation function (UDAF)

2014-10-13 Thread Pierre B
Is it planned for the "near" future?

[SQL] Is RANK function supposed to work in SparkSQL 1.1.0?

2014-10-21 Thread Pierre B
Hi! The RANK function is available in Hive since version 0.11. When trying to use it in SparkSQL, I'm getting the following exception (full stacktrace below): java.lang.ClassCastException: org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRank$RankBuffer cannot be cast to org.apache.hadoop.hive.ql.
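For context, the kind of Hive windowing query involved looks roughly like the sketch below (table and column names are placeholders); window functions such as RANK only became usable in later Spark releases, which is why this fails on 1.1.0:

```
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object RankQuery {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rank-query"))
    val hiveContext = new HiveContext(sc)

    // The shape of query that triggered the ClassCastException on 1.1.0;
    // the table and columns here are made up for illustration.
    val ranked = hiveContext.sql(
      """SELECT name, score,
        |       RANK() OVER (PARTITION BY department ORDER BY score DESC) AS rnk
        |FROM employees""".stripMargin)

    ranked.collect().foreach(println)
    sc.stop()
  }
}
```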

Re: [SQL] Is RANK function supposed to work in SparkSQL 1.1.0?

2014-10-21 Thread Pierre B
Ok, thanks Michael. In general, what's the easy way to figure out what's already implemented? The exception I was getting was not really helpful here. Also, is there a roadmap document somewhere? Thanks! P.

MissingRequirementError with spark

2015-01-15 Thread Pierre B
After upgrading our project to Spark 1.2.0, we get this error when doing a "sbt test": scala.reflect.internal.MissingRequirementError: class org.apache.spark.sql.catalyst.ScalaReflection The strange thing is that when running our test suites from IntelliJ, everything runs smoothly... Any idea w

Re: ScalaReflectionException when using saveAsParquetFile in sbt

2015-01-15 Thread Pierre B
Same problem here... Did you find a solution for this? P.

Re: MissingRequirementError with spark

2015-01-15 Thread Pierre B
I found this, which might be useful: https://github.com/deanwampler/spark-workshop/blob/master/project/Build.scala It seems that forking is needed.
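A minimal sketch of that forking workaround in build.sbt (the memory options are illustrative, not prescriptive):

```
// build.sbt — run tests in a forked JVM with its own classpath, which avoids
// the MissingRequirementError that Scala reflection can hit under sbt's
// default in-process test runner.
fork in Test := true

// Giving the forked JVM more memory is often needed for Spark test suites.
javaOptions in Test ++= Seq("-Xmx2g", "-XX:MaxPermSize=256m")
```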

Spark 0.9.0 - local mode - sc.addJar problem (bug?)

2014-03-02 Thread Pierre B
Hi all! In Spark 0.9.0, local mode, whenever I try to add jar(s), using either SparkConf.setJars or SparkContext.addJar, in the shell or in standalone mode, I observe a strange behaviour. I investigated this because my standalone app works perfectly on my cluster but is getting stuck in l
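For context, the two usual ways of registering extra jars look roughly like this sketch (the paths are placeholders; this does not reproduce the reported local-mode behaviour, it only shows the calls involved):

```
import org.apache.spark.{SparkConf, SparkContext}

object AddJarLocal {
  def main(args: Array[String]): Unit = {
    // Jars can be declared up front on the SparkConf...
    val conf = new SparkConf()
      .setAppName("addjar-local")
      .setMaster("local[2]")
      .setJars(Seq("/path/to/my-app-assembly.jar"))   // placeholder path

    val sc = new SparkContext(conf)

    // ...or added after the context is created.
    sc.addJar("/path/to/extra-dependency.jar")        // placeholder path

    println(sc.parallelize(1 to 10).count())
    sc.stop()
  }
}
```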

Re: Spark 0.9.0 - local mode - sc.addJar problem (bug?)

2014-03-02 Thread Pierre B
I'm still puzzled why trying wget with my IP is not working properly, whereas it's working if I use 127.0.0.1 or localhost... ?

Nested method in a class: Task not serializable?

2014-05-16 Thread Pierre B
Hi! I understand the usual "Task not serializable" issue that arises when accessing a field or a method that is out of scope of a closure. To fix it, I usually define a local copy of these fields/methods, which avoids the need to serialize the whole class: class MyClass(val myField: Any) { def
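The usual workaround the post refers to, sketched here with an Int field for concreteness: copy the needed field into a local val inside the enclosing method, so the closure captures only that value rather than the whole (non-serializable) class.

```
import org.apache.spark.rdd.RDD

class MyClass(val myField: Int) {

  def addField(rdd: RDD[Int]): RDD[Int] = {
    val localField = myField        // local copy: only this Int is serialized
    rdd.map(_ + localField)         // the closure no longer references `this`
  }
}
```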

Use SparkListener to get overall progress of an action

2014-05-22 Thread Pierre B
Is there a simple way to monitor the overall progress of an action using SparkListener or anything else? I see that one can name an RDD... Could that be used to determine which action triggered a stage, ... ? Thanks Pierre
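A rough sketch of one way to do this with a SparkListener, assuming the 1.x listener API (event fields vary somewhat between versions, and the running totals are only an approximation of per-action progress):

```
import java.util.concurrent.atomic.AtomicInteger

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageSubmitted, SparkListenerTaskEnd}

// Count tasks as they finish and compare against the number of tasks
// announced when each stage is submitted.
class ProgressListener extends SparkListener {
  val totalTasks = new AtomicInteger(0)
  val finishedTasks = new AtomicInteger(0)

  override def onStageSubmitted(stageSubmitted: SparkListenerStageSubmitted): Unit =
    totalTasks.addAndGet(stageSubmitted.stageInfo.numTasks)

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val done = finishedTasks.incrementAndGet()
    println(s"progress: $done / ${totalTasks.get()} tasks")
  }
}

object ProgressDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("progress").setMaster("local[2]"))
    sc.addSparkListener(new ProgressListener)
    sc.parallelize(1 to 1000, 8).map(_ * 2).count()
    sc.stop()
  }
}
```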

Re: Use SparkListener to get overall progress of an action

2014-05-22 Thread Pierre B
eers, > > aℕdy ℙetrella > about.me/noootsab > > > > > On Thu, May 22, 2014 at 4:51 PM, Pierre B <[hidden email]> wrote: > Is there a simple way to monitor the overall progress of an action using > SparkListener or anything else? > > I see that one can na

Re: Use SparkListener to get overall progress of an action

2014-05-23 Thread Pierre B
at 10:57 AM, Chester <[hidden email]> wrote: >> This is something we are interested as well. We are planning to investigate >> more on this. If someone has suggestions, we would love to hear. >> >> Chester >> >> Sent from my iPad >> >> On M

Re: Use SparkListener to get overall progress of an action

2014-05-23 Thread Pierre B
> information in a somewhat arbitrary format and will be deprecated soon. If > you find this feature useful, you can test it out by building the master > branch of Spark yourself, following the instructions in > https://github.com/apache/spark/pull/42. > > > > On 05/22/2014 08

Re: Spark Summit 2014 (Hotel suggestions)

2014-05-27 Thread Pierre B
Hi everyone! Any recommendations, anyone? Pierre

Re: SparkContext startup time out

2014-05-30 Thread Pierre B
I was annoyed by this as well. It appears that just permuting the order of dependency inclusion solves this problem: first Spark, then your CDH Hadoop distro. HTH, Pierre
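A sketch of what that ordering looks like in build.sbt (the versions and the CDH artifact are placeholders for whatever the cluster actually runs):

```
// List Spark before the CDH Hadoop client so Spark's transitive
// dependencies win on the classpath.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.0.0",
  "org.apache.hadoop" % "hadoop-client" % "2.3.0-cdh5.0.0"
)

resolvers += "Cloudera Repository" at "https://repository.cloudera.com/artifactory/cloudera-repos/"
```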

Using sbt-pack with Spark 1.0.0

2014-06-01 Thread Pierre B
Hi all! We've been using the sbt-pack sbt plugin (https://github.com/xerial/sbt-pack) for building our standalone Spark application for a while now. Until version 1.0.0, that worked nicely. For those who don't know the sbt-pack plugin, it basically copies all the dependency JARs from your local
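For readers unfamiliar with the plugin, roughly how sbt-pack is wired in; the plugin version is a placeholder, and the setting names follow the plugin's older, pre-AutoPlugin style and may differ between releases:

```
// project/plugins.sbt — sketch only.
addSbtPlugin("org.xerial.sbt" % "sbt-pack" % "0.6.12")
```

```
// build.sbt — enable the pack task and declare the launcher script(s);
// "my-spark-app" and com.example.Main are made-up names.
import xerial.sbt.Pack._

packSettings

packMain := Map("my-spark-app" -> "com.example.Main")
```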

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-02 Thread Pierre B
Hi Michaël, Thanks for this. We could indeed do that. But I guess the question is more about the change of behaviour from 0.9.1 to 1.0.0. We never had to care about that in previous versions. Does that mean we have to manually remove existing files, or is there a way to "automatically" overwrite wh
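Two commonly mentioned workarounds, sketched together below: relaxing Spark 1.0's output check via spark.hadoop.validateOutputSpecs (added in a 1.0.x maintenance release, if memory serves), or deleting the existing directory through the Hadoop FileSystem API before saving. The output path is a placeholder:

```
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}

object OverwriteOutput {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("overwrite-output")
      // Option 1: disable the output-directory existence check (use with care).
      .set("spark.hadoop.validateOutputSpecs", "false")
    val sc = new SparkContext(conf)

    val out = "hdfs:///tmp/results"   // placeholder output path

    // Option 2: remove the existing directory explicitly before saving,
    // mimicking the pre-1.0 behaviour.
    val fs = FileSystem.get(sc.hadoopConfiguration)
    fs.delete(new Path(out), true)

    sc.parallelize(1 to 100).saveAsTextFile(out)
    sc.stop()
  }
}
```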