Re: enum-like types in Spark

2015-03-05 Thread Mridul Muralidharan
I have a strong dislike for Java enums due to the fact that they are not stable across JVMs - if one undergoes serde, you end up with unpredictable results at times [1]. One of the reasons why we prevent enums from being keys: though it is highly possible users might depend on it internally and
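The instability Mridul alludes to can be seen without any Spark code: `java.lang.Enum` declares `hashCode` final and inherits Object's identity-based implementation, so an enum constant's hash differs from one JVM run to the next, which is exactly what makes enum-keyed hash structures unreliable across a cluster. A small illustration (not from the thread) using the stock `TimeUnit` enum:

```scala
import java.util.concurrent.TimeUnit

// Enum.hashCode is final and identity-based, so its value is
// per-JVM and cannot be relied on across machines in a cluster.
val h = TimeUnit.SECONDS.hashCode
assert(h == System.identityHashCode(TimeUnit.SECONDS))

// name() is the stable identifier; Java serialization round-trips
// enum constants by name, not by ordinal or hash.
assert(TimeUnit.SECONDS.name == "SECONDS")
```

Two JVMs will always agree on `name()` but generally not on `hashCode`, which is the hazard with enum keys in distributed hash-partitioned data.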

Re: enum-like types in Spark

2015-03-05 Thread Xiangrui Meng
For #4, my previous proposal may confuse IDEs with the additional types generated by the case objects, and their toString contains the underscore. The following works better: sealed abstract class StorageLevel object StorageLevel { final val MemoryOnly: StorageLevel = { case object MemoryOnl
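Expanded from the truncated snippet, Xiangrui's pattern looks roughly like the following (the value names are assumed for illustration, not the exact committed code). Each value is a case object hidden inside the initializer block, so IDEs only see the public `final val`, and `toString` yields the clean camel-case name rather than an underscored one:

```scala
sealed abstract class StorageLevel

object StorageLevel {
  // The case object is local to the block, so it adds no
  // companion-visible type; toString still yields "MemoryOnly".
  final val MemoryOnly: StorageLevel = {
    case object MemoryOnly extends StorageLevel
    MemoryOnly
  }

  final val DiskOnly: StorageLevel = {
    case object DiskOnly extends StorageLevel
    DiskOnly
  }
}

// Values are singletons, so reference equality works:
assert(StorageLevel.MemoryOnly.toString == "MemoryOnly")
assert(StorageLevel.MemoryOnly ne StorageLevel.DiskOnly)
```

Because the class is sealed and the constructors are hidden, the set of values is closed, which is the enum-like property the thread is after.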

[RESULT] [VOTE] Release Apache Spark 1.3.0 (RC2)

2015-03-05 Thread Patrick Wendell
This vote is cancelled in favor of RC3. On Wed, Mar 4, 2015 at 3:22 PM, Sean Owen wrote: > I think we will have to fix > https://issues.apache.org/jira/browse/SPARK-5143 as well before the > final 1.3.x. > > But yes everything else checks out for me, including sigs and hashes > and building the s

[VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-05 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.3.0! The tag to be voted on is v1.3.0-rc3 (commit 4aaf48d4): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4aaf48d46d13129f0f9bdafd771dd80fe568a7dc The release files, including signatures, digests, etc. ca

Re: over 10000 commits!

2015-03-05 Thread shane knapp
WOOT! On Thu, Mar 5, 2015 at 1:26 PM, Reynold Xin wrote: > We reached a new milestone today. > > https://github.com/apache/spark > > > 10,001 commits now. Congratulations to Xiangrui for making the 10,000th > commit! >

over 10000 commits!

2015-03-05 Thread Reynold Xin
We reached a new milestone today. https://github.com/apache/spark 10,001 commits now. Congratulations to Xiangrui for making the 10,000th commit!

Re: enum-like types in Spark

2015-03-05 Thread Imran Rashid
I have a very strong dislike for #1 (Scala Enumerations). I'm OK with #4 (with Xiangrui's final suggestion, especially making it sealed & available in Java), but I really think #2, Java enums, is the best option. Java enums actually have some very real advantages over the other approaches -- yo

Re: Fwd: Unable to Read/Write Avro RDD on cluster.

2015-03-05 Thread M. Dale
There was an avro-mapred version conflict, described in https://issues.apache.org/jira/browse/SPARK-3039. It was fixed by https://github.com/apache/spark/pull/4315 for Spark 1.3. Here is a link that describes how to fix Spark 1.2.1 for avro-mapred hadoop2: https://github.com/medale/spark-mail/blob/mast

Re: short jenkins 7am downtime tomorrow morning (3-5-15)

2015-03-05 Thread shane knapp
we're all back up and building now... looks like the package/kernel updates went off w/o a hitch! On Thu, Mar 5, 2015 at 6:57 AM, shane knapp wrote: > this is happening now. i'm waiting for the pull request builders to > finish (~16 mins) before i start. > > On Wed, Mar 4, 2015 at 1:06 PM, sha

Re: short jenkins 7am downtime tomorrow morning (3-5-15)

2015-03-05 Thread shane knapp
this is happening now. i'm waiting for the pull request builders to finish (~16 mins) before i start. On Wed, Mar 4, 2015 at 1:06 PM, shane knapp wrote: > the master and workers need some system and package updates, and i'll also > be rebooting the machines as well. > > this shouldn't take very

RE: Unable to Read/Write Avro RDD on cluster.

2015-03-05 Thread java8964
You can give Spark-Avro a try. It works great for our project. https://github.com/databricks/spark-avro > From: deepuj...@gmail.com > Date: Thu, 5 Mar 2015 10:27:04 +0530 > Subject: Fwd: Unable to Read/Write Avro RDD on cluster. > To: dev@spark.apache.org > > I am trying to read RDD avro, transfo
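For context, the spark-avro package of that era exposed a small implicit API on SQLContext. A usage sketch, assuming the 1.x-era README (method names like `avroFile`/`saveAsAvroFile` and the paths here are illustrative; check the repo before relying on them):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import com.databricks.spark.avro._ // adds avroFile / saveAsAvroFile

val sc = new SparkContext(new SparkConf().setAppName("avro-example"))
val sqlContext = new SQLContext(sc)

// Read Avro records into a SchemaRDD and write them back out;
// the library handles the avro-mapred integration that caused SPARK-3039.
val records = sqlContext.avroFile("hdfs:///path/to/input.avro")
records.saveAsAvroFile("hdfs:///path/to/output")
```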

Re: Unable to Read/Write Avro RDD on cluster.

2015-03-05 Thread Akhil Das
Here's a workaround: - Download and put this jar in the SPARK_CLASSPATH on all workers - Make sure the jar is present at the same path on all workers. Thanks Best Regards On Thu, Mar 5, 2015 at 10:2

Re: Which OutputCommitter to use for S3?

2015-03-05 Thread Aaron Davidson
Yes, unfortunately that direct dependency makes this injection much more difficult for saveAsParquetFile. On Thu, Mar 5, 2015 at 12:28 AM, Pei-Lun Lee wrote: > Thanks for the DirectOutputCommitter example. > However I found it only works for saveAsHadoopFile. What about > saveAsParquetFile? > It

Re: Which OutputCommitter to use for S3?

2015-03-05 Thread Pei-Lun Lee
Thanks for the DirectOutputCommitter example. However, I found it only works for saveAsHadoopFile. What about saveAsParquetFile? It looks like Spark SQL is using ParquetOutputCommitter, which is a subclass of FileOutputCommitter. On Fri, Feb 27, 2015 at 1:52 AM, Thomas Demoor wrote: > FYI. We're cur
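For reference, the DirectOutputCommitter idea discussed in this thread is usually a no-op subclass of the old-API Hadoop OutputCommitter: tasks write straight to the final S3 location, so there is no rename step at commit time. A minimal sketch (illustrative only; safe only with speculative execution disabled, since two attempts could otherwise write the same file concurrently):

```scala
import org.apache.hadoop.mapred.{JobContext, OutputCommitter, TaskAttemptContext}

// Illustrative no-op committer: nothing to set up, nothing to move.
class DirectOutputCommitter extends OutputCommitter {
  override def setupJob(jobContext: JobContext): Unit = ()
  override def setupTask(taskContext: TaskAttemptContext): Unit = ()
  override def needsTaskCommit(taskContext: TaskAttemptContext): Boolean = false
  override def commitTask(taskContext: TaskAttemptContext): Unit = ()
  override def abortTask(taskContext: TaskAttemptContext): Unit = ()
}
```

It is wired in through the Hadoop job conf before calling saveAsHadoopFile, e.g. `jobConf.setOutputCommitter(classOf[DirectOutputCommitter])`; as Pei-Lun observes, this does not reach saveAsParquetFile, which uses ParquetOutputCommitter directly.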

Re: enum-like types in Spark

2015-03-05 Thread Patrick Wendell
Yes - only new or internal APIs. I doubt we'd break any exposed APIs for the purpose of cleanup. Patrick On Mar 5, 2015 12:16 AM, "Mridul Muralidharan" wrote: > While I dont have any strong opinions about how we handle enum's > either way in spark, I assume the discussion is targetted at (new)

Re: enum-like types in Spark

2015-03-05 Thread Mridul Muralidharan
While I don't have any strong opinions about how we handle enums either way in Spark, I assume the discussion is targeted at (new) APIs being designed in Spark. Rewiring what we already have exposed will lead to an incompatible API change (StorageLevel, for example, is in 1.0). Regards, Mridul On Wed