Does this matter for our own internal types in Spark? I don't think any of these types are designed to be used in RDD records, for instance.
On Mon, Mar 9, 2015 at 6:25 PM, Aaron Davidson <ilike...@gmail.com> wrote: > Perhaps the problem with Java enums that was brought up was actually that > their hashCode is not stable across JVMs, as it depends on the memory > location of the enum itself. > > On Mon, Mar 9, 2015 at 6:15 PM, Imran Rashid <iras...@cloudera.com> wrote: > >> Can you expand on the serde issues w/ java enum's at all? I haven't heard >> of any problems specific to enums. The java object serialization rules >> seem very clear and it doesn't seem like different jvms should have a >> choice on what they do: >> >> >> http://docs.oracle.com/javase/6/docs/platform/serialization/spec/serial-arch.html#6469 >> >> (in a nutshell, serialization must use enum.name()) >> >> of course there are plenty of ways the user could screw this up(eg. rename >> the enums, or change their meaning, or remove them). But then again, all >> of java serialization has issues w/ serialization the user has to be aware >> of. Eg., if we go with case objects, than java serialization blows up if >> you add another helper method, even if that helper method is completely >> compatible. >> >> Some prior debate in the scala community: >> >> https://groups.google.com/d/msg/scala-internals/8RWkccSRBxQ/AN5F_ZbdKIsJ >> >> SO post on which version to use in scala: >> >> >> http://stackoverflow.com/questions/1321745/how-to-model-type-safe-enum-types >> >> SO post about the macro-craziness people try to add to scala to make them >> almost as good as a simple java enum: >> (NB: the accepted answer doesn't actually work in all cases ...) >> >> >> http://stackoverflow.com/questions/20089920/custom-scala-enum-most-elegant-version-searched >> >> Another proposal to add better enums built into scala ... but seems to be >> dormant: >> >> https://groups.google.com/forum/#!topic/scala-sips/Bf82LxK02Kk >> >> >> >> On Thu, Mar 5, 2015 at 10:49 PM, Mridul Muralidharan <mri...@gmail.com> >> wrote: >> >> > I have a strong dislike for java enum's due to the fact that they >> > are not stable across JVM's - if it undergoes serde, you end up with >> > unpredictable results at times [1]. >> > One of the reasons why we prevent enum's from being key : though it is >> > highly possible users might depend on it internally and shoot >> > themselves in the foot. >> > >> > Would be better to keep away from them in general and use something more >> > stable. >> > >> > Regards, >> > Mridul >> > >> > [1] Having had to debug this issue for 2 weeks - I really really hate it. >> > >> > >> > On Thu, Mar 5, 2015 at 1:08 PM, Imran Rashid <iras...@cloudera.com> >> wrote: >> > > I have a very strong dislike for #1 (scala enumerations). I'm ok with >> > #4 >> > > (with Xiangrui's final suggestion, especially making it sealed & >> > available >> > > in Java), but I really think #2, java enums, are the best option. >> > > >> > > Java enums actually have some very real advantages over the other >> > > approaches -- you get values(), valueOf(), EnumSet, and EnumMap. There >> > has >> > > been endless debate in the Scala community about the problems with the >> > > approaches in Scala. Very smart, level-headed Scala gurus have >> > complained >> > > about their short-comings (Rex Kerr's name is coming to mind, though >> I'm >> > > not positive about that); there have been numerous well-thought out >> > > proposals to give Scala a better enum. But the powers-that-be in Scala >> > > always reject them. IIRC the explanation for rejecting is basically >> that >> > > (a) enums aren't important enough for introducing some new special >> > feature, >> > > scala's got bigger things to work on and (b) if you really need a good >> > > enum, just use java's enum. >> > > >> > > I doubt it really matters that much for Spark internals, which is why I >> > > think #4 is fine. But I figured I'd give my spiel, because every >> > developer >> > > loves language wars :) >> > > >> > > Imran >> > > >> > > >> > > >> > > On Thu, Mar 5, 2015 at 1:35 AM, Xiangrui Meng <men...@gmail.com> >> wrote: >> > > >> > >> `case object` inside an `object` doesn't show up in Java. This is the >> > >> minimal code I found to make everything show up correctly in both >> > >> Scala and Java: >> > >> >> > >> sealed abstract class StorageLevel // cannot be a trait >> > >> >> > >> object StorageLevel { >> > >> private[this] case object _MemoryOnly extends StorageLevel >> > >> final val MemoryOnly: StorageLevel = _MemoryOnly >> > >> >> > >> private[this] case object _DiskOnly extends StorageLevel >> > >> final val DiskOnly: StorageLevel = _DiskOnly >> > >> } >> > >> >> > >> On Wed, Mar 4, 2015 at 8:10 PM, Patrick Wendell <pwend...@gmail.com> >> > >> wrote: >> > >> > I like #4 as well and agree with Aaron's suggestion. >> > >> > >> > >> > - Patrick >> > >> > >> > >> > On Wed, Mar 4, 2015 at 6:07 PM, Aaron Davidson <ilike...@gmail.com> >> > >> wrote: >> > >> >> I'm cool with #4 as well, but make sure we dictate that the values >> > >> should >> > >> >> be defined within an object with the same name as the enumeration >> > (like >> > >> we >> > >> >> do for StorageLevel). Otherwise we may pollute a higher namespace. >> > >> >> >> > >> >> e.g. we SHOULD do: >> > >> >> >> > >> >> trait StorageLevel >> > >> >> object StorageLevel { >> > >> >> case object MemoryOnly extends StorageLevel >> > >> >> case object DiskOnly extends StorageLevel >> > >> >> } >> > >> >> >> > >> >> On Wed, Mar 4, 2015 at 5:37 PM, Michael Armbrust < >> > >> mich...@databricks.com> >> > >> >> wrote: >> > >> >> >> > >> >>> #4 with a preference for CamelCaseEnums >> > >> >>> >> > >> >>> On Wed, Mar 4, 2015 at 5:29 PM, Joseph Bradley < >> > jos...@databricks.com> >> > >> >>> wrote: >> > >> >>> >> > >> >>> > another vote for #4 >> > >> >>> > People are already used to adding "()" in Java. >> > >> >>> > >> > >> >>> > >> > >> >>> > On Wed, Mar 4, 2015 at 5:14 PM, Stephen Boesch < >> java...@gmail.com >> > > >> > >> >>> wrote: >> > >> >>> > >> > >> >>> > > #4 but with MemoryOnly (more scala-like) >> > >> >>> > > >> > >> >>> > > http://docs.scala-lang.org/style/naming-conventions.html >> > >> >>> > > >> > >> >>> > > Constants, Values, Variable and Methods >> > >> >>> > > >> > >> >>> > > Constant names should be in upper camel case. That is, if the >> > >> member is >> > >> >>> > > final, immutable and it belongs to a package object or an >> > object, >> > >> it >> > >> >>> may >> > >> >>> > be >> > >> >>> > > considered a constant (similar to Java'sstatic final members): >> > >> >>> > > >> > >> >>> > > >> > >> >>> > > 1. object Container { >> > >> >>> > > 2. val MyConstant = ... >> > >> >>> > > 3. } >> > >> >>> > > >> > >> >>> > > >> > >> >>> > > 2015-03-04 17:11 GMT-08:00 Xiangrui Meng <men...@gmail.com>: >> > >> >>> > > >> > >> >>> > > > Hi all, >> > >> >>> > > > >> > >> >>> > > > There are many places where we use enum-like types in Spark, >> > but >> > >> in >> > >> >>> > > > different ways. Every approach has both pros and cons. I >> > wonder >> > >> >>> > > > whether there should be an "official" approach for enum-like >> > >> types in >> > >> >>> > > > Spark. >> > >> >>> > > > >> > >> >>> > > > 1. Scala's Enumeration (e.g., SchedulingMode, WorkerState, >> > etc) >> > >> >>> > > > >> > >> >>> > > > * All types show up as Enumeration.Value in Java. >> > >> >>> > > > >> > >> >>> > > > >> > >> >>> > > >> > >> >>> > >> > >> >>> >> > >> >> > >> http://spark.apache.org/docs/latest/api/java/org/apache/spark/scheduler/SchedulingMode.html >> > >> >>> > > > >> > >> >>> > > > 2. Java's Enum (e.g., SaveMode, IOMode) >> > >> >>> > > > >> > >> >>> > > > * Implementation must be in a Java file. >> > >> >>> > > > * Values doesn't show up in the ScalaDoc: >> > >> >>> > > > >> > >> >>> > > > >> > >> >>> > > >> > >> >>> > >> > >> >>> >> > >> >> > >> http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.network.util.IOMode >> > >> >>> > > > >> > >> >>> > > > 3. Static fields in Java (e.g., TripletFields) >> > >> >>> > > > >> > >> >>> > > > * Implementation must be in a Java file. >> > >> >>> > > > * Doesn't need "()" in Java code. >> > >> >>> > > > * Values don't show up in the ScalaDoc: >> > >> >>> > > > >> > >> >>> > > > >> > >> >>> > > >> > >> >>> > >> > >> >>> >> > >> >> > >> http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.graphx.TripletFields >> > >> >>> > > > >> > >> >>> > > > 4. Objects in Scala. (e.g., StorageLevel) >> > >> >>> > > > >> > >> >>> > > > * Needs "()" in Java code. >> > >> >>> > > > * Values show up in both ScalaDoc and JavaDoc: >> > >> >>> > > > >> > >> >>> > > > >> > >> >>> > > >> > >> >>> > >> > >> >>> >> > >> >> > >> http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.storage.StorageLevel$ >> > >> >>> > > > >> > >> >>> > > > >> > >> >>> > > >> > >> >>> > >> > >> >>> >> > >> >> > >> http://spark.apache.org/docs/latest/api/java/org/apache/spark/storage/StorageLevel.html >> > >> >>> > > > >> > >> >>> > > > It would be great if we have an "official" approach for this >> > as >> > >> well >> > >> >>> > > > as the naming convention for enum-like values ("MEMORY_ONLY" >> > or >> > >> >>> > > > "MemoryOnly"). Personally, I like 4) with "MEMORY_ONLY". Any >> > >> >>> thoughts? >> > >> >>> > > > >> > >> >>> > > > Best, >> > >> >>> > > > Xiangrui >> > >> >>> > > > >> > >> >>> > > > >> > >> --------------------------------------------------------------------- >> > >> >>> > > > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org >> > >> >>> > > > For additional commands, e-mail: dev-h...@spark.apache.org >> > >> >>> > > > >> > >> >>> > > > >> > >> >>> > > >> > >> >>> > >> > >> >>> >> > >> >> > >> --------------------------------------------------------------------- >> > >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org >> > >> For additional commands, e-mail: dev-h...@spark.apache.org >> > >> >> > >> >> > >> --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org