This looks promising. I'm trying to use spark-ec2 to launch a cluster with
Spark 1.5.0-SNAPSHOT and failing.
Where should we ask questions and report problems?
A couple of questions I already have after looking through the project:
- Where does the configuration file /spark-deployer.conf/ go (w
rxin wrote
>
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc2-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
>
I ran into a similar problem: reading a CSV file into a DataFrame and saving
it to Parquet with 'partitionBy', I get an OutOfMemory error even though
it's not a large data file.
I discovered that by default Spark appears to be allocating a block of 128 MB
in memory for each output Parquet partition.
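A rough back-of-the-envelope sketch of why this runs out of memory even on
small inputs (plain Python, not Spark code; the 128 MB figure is Parquet's
default row-group size, parquet.block.size, and the writer/task counts below
are made-up illustrative numbers, not taken from the post above):

```python
# Rough estimate of write-side buffer memory when saving with partitionBy.
# A Parquet writer buffers roughly one row-group block (parquet.block.size,
# 128 MB by default) per open output file, and with partitionBy a task can
# hold one open writer per distinct partition value it sees.

PARQUET_BLOCK_SIZE = 128 * 1024 * 1024  # bytes, Parquet's default block size

def estimated_writer_memory(open_writers_per_task: int,
                            concurrent_tasks: int,
                            block_size: int = PARQUET_BLOCK_SIZE) -> int:
    """Worst-case bytes buffered by Parquet writers on one executor."""
    return open_writers_per_task * concurrent_tasks * block_size

# Illustrative assumption: 50 distinct partition values per task,
# 4 concurrent tasks on one executor.
mem = estimated_writer_memory(open_writers_per_task=50, concurrent_tasks=4)
print(f"{mem / 1024**3:.0f} GiB")  # 50 * 4 * 128 MB = 25 GiB of buffers
```

So even a small input file can demand tens of GiB of heap purely for output
buffers. Shrinking the row-group size (Parquet's parquet.block.size setting,
which Spark should pass through as a Hadoop configuration property) or
reducing the number of simultaneously open partition writers shrinks this
footprint proportionally.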