Spark on EMR

2015-06-16 Thread kamatsuoka
Spark is now officially supported on Amazon Elastic MapReduce (EMR): http://aws.amazon.com/elasticmapreduce/details/spark/ -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-on-EMR-tp23343.html Sent from the Apache Spark User List mailing list archive at Nabbl

Spark SQL filter DataFrame by date?

2015-03-19 Thread kamatsuoka
I'm trying to filter a DataFrame by a date column, with no luck so far. Here's what I'm doing: When I run reqs_day.count() I get zero, apparently because my date parameter gets translated to 16509. Is this a bug, or am I doing it wrong?
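The value 16509 in the message above is consistent with a date encoded as a count of days since the Unix epoch (1970-01-01), which is how Spark SQL represents DateType values internally. The following is a minimal sketch in plain Python, not Spark code; the helper names are hypothetical and chosen only for illustration. It shows how such a number maps back to a calendar date. It does not diagnose the original filter, whose code is not shown in the preview.

```python
from datetime import date, timedelta

# Reference point for the day count (Unix epoch).
EPOCH = date(1970, 1, 1)


def days_since_epoch(d: date) -> int:
    """Encode a calendar date as whole days since 1970-01-01 (hypothetical helper)."""
    return (d - EPOCH).days


def date_from_days(n: int) -> date:
    """Decode a day count back into a calendar date (hypothetical helper)."""
    return EPOCH + timedelta(days=n)


if __name__ == "__main__":
    print(date_from_days(16509))                # 2015-03-15
    print(days_since_epoch(date(2015, 3, 15)))  # 16509
```

Under this reading, 16509 corresponds to 2015-03-15, so the date parameter appears to have been converted to its integer day count rather than discarded.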

Re: How to read a multipart s3 file?

2014-05-15 Thread kamatsuoka
Whereas with s3://, the write takes 32 seconds and the rename takes 33 seconds:
14/05/06 20:23:08 INFO DAGScheduler: Stage 0 (saveAsTextFile at FileCopy.scala:17) finished in 32.208 s
14/05/06 20:23:08 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/05/06

Re: How to read a multipart s3 file?

2014-05-13 Thread kamatsuoka
Thanks Nicholas! I looked at those docs several times without noticing that critical part you highlighted.

Re: How to read a multipart s3 file?

2014-05-10 Thread kamatsuoka
For example, this app just reads a 4 GB file and writes a copy of it. It takes 41 seconds to write the file, then 3 more minutes to move all the temporary files. I guess this is an issue with the Hadoop / jets3t code layer, not Spark.
14/05/06 20:11:41 INFO TaskSetManager: Finished TID 63 in 8688

Re: How to read a multipart s3 file?

2014-05-07 Thread kamatsuoka
Yes, I'm using s3:// for both. I was using s3n:// but I got frustrated by how slow it is at writing files. In particular, the phase where it moves the temporary files to their permanent location takes as long as writing the file itself. I can't believe anyone uses this.

How to read a multipart s3 file?

2014-05-06 Thread kamatsuoka
I have a Spark app that writes out a file, s3://mybucket/mydir/myfile.txt. Behind the scenes, the S3 driver creates a bunch of files like s3://mybucket//mydir/myfile.txt/part-, as well as the block files like s3://mybucket/block_3574186879395643429. How do I construct a URL to use this file

Re: Spark 0.9.1 -- assembly fails?

2014-04-28 Thread kamatsuoka
Um. When I updated the spark dependency, I unintentionally deleted the "provided" attribute. Oops. Nothing to see here . . .

Re: Spark 0.9.1 -- assembly fails?

2014-04-28 Thread kamatsuoka
So, good news and bad news. I have a customized Build.scala that allows me to use the 'run' and 'assembly' commands in sbt without toggling the 'pro

Spark 0.9.1 -- assembly fails?

2014-04-28 Thread kamatsuoka
After upgrading to Spark 0.9.1, sbt assembly is failing. I'm trying to fix it with merge strategy, etc., but is anyone else seeing this? For example,

Re: Anyone using value classes in RDDs?

2014-04-19 Thread kamatsuoka
wrote:
> isn't valueclasses for primitives (AnyVal) only? that doesn't apply to
> string, which is an object (AnyRef)
>
> On Fri, Apr 18, 2014 at 2:51 PM, kamatsuoka <[hidden email]> wrote:

Anyone using value classes in RDDs?

2014-04-18 Thread kamatsuoka
I'm wondering if anyone has tried using value classes in RDDs? My use case is that I have a number of RDDs containing strings, e.g.
val r1: RDD[(String, (String, Int))] = ...
val r2: RDD[(String, (String, Int))] = ...
and it might be clearer if I wrote
case class ID(val id: String) extends AnyVa

Spark with SSL?

2014-04-08 Thread kamatsuoka
Can Spark be configured to use SSL for all its network communication?