Wrong temp directory when compressing before sending text file to S3

2014-11-06 Thread Gary Malouf
We have some data that we are exporting from our HDFS cluster to S3 with some help from Spark. The final RDD command we run is: csvData.saveAsTextFile("s3n://data/mess/2014/11/dump-oct-30-to-nov-5-gzip", classOf[GzipCodec]) We have our 'spark.local.dir' set to our large ephemeral partition on e
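The setup described above can be sketched roughly as follows. This is illustrative only: the input path, mount point, and app name are hypothetical stand-ins, not the poster's actual configuration, and the snippet needs a running Spark cluster with S3 credentials configured.

```scala
// Sketch, assuming hypothetical paths; mirrors the saveAsTextFile call quoted above.
import org.apache.hadoop.io.compress.GzipCodec
import org.apache.spark.{SparkConf, SparkContext}

// Point Spark's scratch space at a large local disk *before* creating the
// context; temporary files written while compressing output land here.
val conf = new SparkConf()
  .setAppName("csv-export") // hypothetical
  .set("spark.local.dir", "/mnt/ephemeral/spark-tmp") // hypothetical mount

val sc = new SparkContext(conf)
val csvData = sc.textFile("hdfs:///data/mess/csv") // hypothetical input

// Gzip each output part file before it is uploaded to S3.
csvData.saveAsTextFile("s3n://data/mess/2014/11/dump-oct-30-to-nov-5-gzip",
  classOf[GzipCodec])
```

The thread's point is that if `spark.local.dir` is not picked up, the compression temp files go to the default temp directory instead of the large ephemeral partition.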

Re: Error reading HDFS file using spark 0.9.0 / hadoop 2.2.0 - incompatible protobuf 2.5 and 2.4.1

2014-03-25 Thread Gary Malouf
Can anyone verify the claims from Aureliano regarding the Akka dependency protobuf collision? Our team has a major need to upgrade to protobuf 2.5.0 up the pipe and Spark seems to be the blocker here. On Fri, Mar 21, 2014 at 6:49 PM, Aureliano Buendia wrote: > > > > On Tue, Mar 18, 2014 at 12:5

Re: Akka problem when using scala command to launch Spark applications in the current 0.9.0-SNAPSHOT

2014-04-14 Thread Gary Malouf
Sorry to dig up an old issue. We build an assembly against spark-0.9.0-RC3 to run on our Spark cluster on top of Mesos. When we upgraded to 0.9.0-RC3 from an earlier master cut from November, we ran into Akka issues described above. Is it supported to be able to deploy this jar using the Spark c

Re: Spark on Scala 2.11

2014-05-10 Thread Gary Malouf
Considering the team just bumped to 2.10 in 0.9, I would be surprised if this is a near term priority. On Thu, May 8, 2014 at 9:33 PM, Anand Avati wrote: > Is there an ongoing effort (or intent) to support Spark on Scala 2.11? > Approximate timeline? > > Thanks >

Re: [VOTE] Release Apache Spark 1.0.1 (RC2)

2014-07-10 Thread Gary Malouf
-1 I honestly do not know the voting rules for the Spark community, so please excuse me if I am out of line or if Mesos compatibility is not a concern at this point. We just tried to run this version built against 2.3.0-cdh5.0.2 on mesos 0.18.2. All of our jobs with data above a few gigabytes hun

Re: [VOTE] Release Apache Spark 1.0.1 (RC2)

2014-07-10 Thread Gary Malouf
Just realized the deadline was Monday, my apologies. The issue nevertheless stands. On Thu, Jul 10, 2014 at 9:28 PM, Gary Malouf wrote: > -1 I honestly do not know the voting rules for the Spark community, so > please excuse me if I am out of line or if Mesos compatibility is not a >

Re: [VOTE] Release Apache Spark 1.0.1 (RC2)

2014-07-11 Thread Gary Malouf
> but it really requires narrowing down the issue to get more > > information about the scope and severity. Could you fork another > > thread for this? > > > > - Patrick > > > > On Thu, Jul 10, 2014 at 6:28 PM, Gary Malouf > wrote: > >> -1 I hones

Re: Reproducible deadlock in 1.0.1, possibly related to Spark-1097

2014-07-14 Thread Gary Malouf
We use the Hadoop configuration inside of our code executing on Spark as we need to list out files in the path. Maybe that is why it is exposed for us. On Mon, Jul 14, 2014 at 6:57 PM, Patrick Wendell wrote: > Hey Nishkam, > > Aaron's fix should prevent two concurrent accesses to getJobConf (a

Re: Reproducible deadlock in 1.0.1, possibly related to Spark-1097

2014-07-14 Thread Gary Malouf
We'll try to run a build tomorrow AM. On Mon, Jul 14, 2014 at 7:22 PM, Patrick Wendell wrote: > Andrew and Gary, > > Would you guys be able to test > https://github.com/apache/spark/pull/1409/files and see if it solves > your problem? > > - Patrick > > On Mon, Jul 14, 2014 at 4:18 PM, Andrew As

Kryo Issue on Spark 1.0.1, Mesos 0.18.2

2014-07-25 Thread Gary Malouf
After upgrading to Spark 1.0.1 from 0.9.1 everything seemed to be going well. Looking at the Mesos slave logs, I noticed: ERROR KryoSerializer: Failed to run spark.kryo.registrator java.lang.ClassNotFoundException: com/mediacrossing/verrazano/kryo/MxDataRegistrator My spark-env.sh has the follow
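For context, a Kryo registrator of the kind referenced in the error is a small class like the sketch below; the class and registered type here are illustrative stand-ins for the poster's `MxDataRegistrator`, not its actual contents. The `ClassNotFoundException` typically means the jar containing this class is not on the executor classpath.

```scala
// Minimal sketch of a Kryo registrator (illustrative names and types).
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

class MyDataRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    // Register the job's serialized classes here.
    kryo.register(classOf[Array[Byte]])
  }
}
```

It is then referenced by name via `spark.kryo.registrator`, and on Mesos the assembly jar containing it must be reachable by the slaves (e.g. via `spark.executor.uri`) for the executors to load it.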

Re: Kryo Issue on Spark 1.0.1, Mesos 0.18.2

2014-07-25 Thread Gary Malouf
g loaded. On Fri, Jul 25, 2014 at 2:27 PM, Gary Malouf wrote: > After upgrading to Spark 1.0.1 from 0.9.1 everything seemed to be going > well. Looking at the Mesos slave logs, I noticed: > > ERROR KryoSerializer: Failed to run spark.kryo.registrator > java.lang.Cla

Re: replacement for SPARK_JAVA_OPTS

2014-08-07 Thread Gary Malouf
Can this be cherry-picked for 1.1 if everything works out? In my opinion, it could be qualified as a bug fix. On Thu, Aug 7, 2014 at 5:47 PM, Marcelo Vanzin wrote: > Andrew has been working on a fix: > https://github.com/apache/spark/pull/1770 > > On Thu, Aug 7, 2014 at 2:35 PM, Cody Koeninger
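The fix under discussion concerns the Spark 1.0+ replacements for the deprecated `SPARK_JAVA_OPTS` environment variable, which look roughly like this in `spark-defaults.conf` (JVM flag values are illustrative):

```properties
# Sketch: per-role JVM options replacing SPARK_JAVA_OPTS (values illustrative)
spark.driver.extraJavaOptions    -XX:+UseConcMarkSweepGC
spark.executor.extraJavaOptions  -XX:+UseConcMarkSweepGC
```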

Re: [SPARK-3050] Spark program running with 1.0.2 jar cannot run against a 1.0.1 cluster

2014-08-14 Thread Gary Malouf
To be clear, is it 'compiled' against 1.0.2 or is it packaged with it? On Thu, Aug 14, 2014 at 6:39 PM, Mingyu Kim wrote: > I ran a really simple code that runs with Spark 1.0.2 jar and connects to > a Spark 1.0.1 cluster, but it fails with java.io.InvalidClassException. I > filed the bug at https

Spark 1.1.0 Progress

2014-08-18 Thread Gary Malouf
I understand there must still be work being done that is preventing the cutting of an RC; are the specific remaining items tracked just through Jira?

Mesos/Spark Deadlock

2014-08-23 Thread Gary Malouf
I just wanted to bring up a significant Mesos/Spark issue that makes the combo difficult to use for teams larger than 4-5 people. It's covered in https://issues.apache.org/jira/browse/MESOS-1688. My understanding is that Spark's use of executors in fine-grained mode is a very different behavior t

Re: Mesos/Spark Deadlock

2014-08-23 Thread Gary Malouf
You can use Mesos in > coarse-grained mode by setting spark.mesos.coarse=true. Then it will hold > onto CPUs for the duration of the job. > > Matei > > On August 23, 2014 at 7:57:30 AM, Gary Malouf (malouf.g...@gmail.com) > wrote: > > I just wanted to bring up a significant Mesos
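The workaround suggested in the reply amounts to a configuration change along these lines (the core cap is illustrative):

```properties
# Coarse-grained Mesos mode: Spark holds its CPUs for the lifetime of the
# job instead of launching short-lived fine-grained tasks per Spark task.
spark.mesos.coarse   true
# Optionally cap the cores the job takes cluster-wide (illustrative value),
# so one job cannot starve the rest of the team.
spark.cores.max      24
```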

Re: Mesos/Spark Deadlock

2014-08-25 Thread Gary Malouf
> https://github.com/apache/spark/pull/1860 into Spark 1.1. Incidentally > have you tried that? > > > > Matei > > > > On August 23, 2014 at 4:30:27 PM, Gary Malouf (malouf.g...@gmail.com) > wrote: > > > > Hi Matei, > > > > We have an analytics

CoHadoop Papers

2014-08-26 Thread Gary Malouf
One of my colleagues has been questioning me as to why Spark/HDFS makes no attempts to try to co-locate related data blocks. He pointed to this paper: http://www.vldb.org/pvldb/vol4/p575-eltabakh.pdf from 2011 on the CoHadoop research and the performance improvements it yielded for Map/Reduce jobs

Re: CoHadoop Papers

2014-08-26 Thread Gary Malouf
It appears support for this type of control over block placement is going out in the next version of HDFS: https://issues.apache.org/jira/browse/HDFS-2576 On Tue, Aug 26, 2014 at 7:43 AM, Gary Malouf wrote: > One of my colleagues has been questioning me as to why Spark/HDFS makes no > at

Re: CoHadoop Papers

2014-08-26 Thread Gary Malouf
that's outside of Spark. On that note, > Hadoop does also make attempts to collocate data, e.g., rack awareness. I'm > sure the paper makes useful contributions for its set of use cases. > > Sent while mobile. Pls excuse typos etc. > On Aug 26, 2014 5:21 AM, "Gary Malouf"

Re: CoHadoop Papers

2014-08-26 Thread Gary Malouf
reasoning about partitioning and the need to > shuffle in the Spark SQL planner. > > > On Tue, Aug 26, 2014 at 8:37 AM, Gary Malouf > wrote: > >> Christopher, can you expand on the co-partitioning support? >> >> We have a number of spark SQL tables (saved in parquet for

Re: parquet predicate / projection pushdown into unionAll

2014-09-09 Thread Gary Malouf
I'm kind of surprised this was not run into before. Do people not segregate their data by day/week in the HDFS directory structure? On Tue, Sep 9, 2014 at 2:08 PM, Michael Armbrust wrote: > Thanks! > > On Tue, Sep 9, 2014 at 11:07 AM, Cody Koeninger > wrote: > > > Opened > > > > https://issue
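The day-segregated layout being asked about looks roughly like the sketch below, against the Spark SQL API of that era; the paths, column names, and table name are hypothetical, and the thread's question is whether Parquet predicate/projection pushdown survives the `unionAll` of the per-day tables.

```scala
// Sketch, assuming hypothetical HDFS paths and a pre-existing SparkContext `sc`.
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// Data segregated by day in the directory structure, one Parquet dir per day.
val days = Seq("2014-09-07", "2014-09-08", "2014-09-09")
val perDay = days.map(d => sqlContext.parquetFile(s"hdfs:///events/day=$d"))

// Union the per-day tables into one logical table.
val all = perDay.reduce(_ unionAll _)
all.registerTempTable("events") // hypothetical table and column names
sqlContext.sql("SELECT user_id FROM events WHERE user_id = 42")
```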

Re: guava version conflicts

2014-09-22 Thread Gary Malouf
Hi Marcelo, Interested to hear the approach to be taken. Shading guava itself seems extreme, but that might make sense. Gary On Sat, Sep 20, 2014 at 9:38 PM, Marcelo Vanzin wrote: > Hmm, looks like the hack to maintain backwards compatibility in the > Java API didn't work that well. I'll take

Re: Parquet schema migrations

2014-10-24 Thread Gary Malouf
Hi Michael, Does this affect people who use Hive for their metadata store as well? I'm wondering if the issue is as bad as I think it is - namely that if you build up a year's worth of data, adding a field forces you to have to migrate that entire year's data. Gary On Wed, Oct 8, 2014 at 5:08 P

Parquet Migrations

2014-10-31 Thread Gary Malouf
Outside of what is discussed here as a future solution, is there any path for being able to modify a Parquet schema once some data has been written? This seems like the kind of thing that should make people pause when considering whether or not to