Re: Introduce a sbt plugin to deploy and submit jobs to a spark cluster on ec2

2015-08-26 Thread rake
This looks promising. I'm trying to use spark-ec2 to launch a cluster with Spark 1.5.0-SNAPSHOT and failing. Where should we ask questions and report problems? A couple of questions I already have after looking through the project: - Where does the configuration file /spark-deployer.conf/ go (w

Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

2015-08-26 Thread rake
rxin wrote
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc2-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc

I was

Re: [DataFrame] partitionBy issues

2015-06-30 Thread rake
I ran into a similar problem: reading a CSV file into a DataFrame, saving it to Parquet with 'partitionBy', and getting an OutOfMemory error even though it's not a large data file. I discovered that by default Spark appears to be allocating a block of 128MB in memory for each output Parquet partition
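
A rough back-of-the-envelope sketch of why this can run out of memory on a small file. It assumes, per the observation above, that roughly one 128MB block is held for each output partition that is open at the same time. The function names and the heap figures are hypothetical, for illustration only, and this is not a description of Spark's actual memory accounting.

```python
# Rough estimate only: assumes one in-memory block per concurrently open
# output Parquet partition (128MB by default, per the observation above).
MB = 1024 * 1024
ASSUMED_BLOCK_BYTES = 128 * MB  # assumed default block size per partition


def estimated_write_buffer_bytes(open_partitions, block_bytes=ASSUMED_BLOCK_BYTES):
    """Estimate the memory held by write buffers for the open partitions."""
    if open_partitions < 0:
        raise ValueError("open_partitions must be non-negative")
    return open_partitions * block_bytes


def fits_in_heap(open_partitions, heap_bytes, block_bytes=ASSUMED_BLOCK_BYTES):
    """Return True if the estimated buffers fit in the given heap size."""
    return estimated_write_buffer_bytes(open_partitions, block_bytes) <= heap_bytes


if __name__ == "__main__":
    # Hypothetical case: 100 distinct partition values, 4GB of heap.
    needed = estimated_write_buffer_bytes(100)
    print("estimated buffers: %d MB" % (needed // MB))
    print("fits in 4GB heap:", fits_in_heap(100, 4 * 1024 * MB))
```

Under this assumption the estimate depends on how many distinct partition values are written at once, not on the size of the input, which would be consistent with a small file still triggering the error.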