Can you expand on the serde issues w/ java enums at all?  I haven't heard
of any problems specific to enums.  The Java object serialization rules
seem very clear, and it doesn't seem like different JVMs should have any
choice in what they do:

http://docs.oracle.com/javase/6/docs/platform/serialization/spec/serial-arch.html#6469

(in a nutshell, serialization must use enum.name())
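
FWIW, the by-name rule is easy to check against any existing Java enum,
e.g. java.util.concurrent.TimeUnit.  This is just a throwaway sketch (the
`EnumSerdeSketch` name is made up, not Spark code):

```scala
import java.io._
import java.util.concurrent.TimeUnit

object EnumSerdeSketch {
  // Round-trip a value through plain Java serialization.
  def roundTrip[T <: Serializable](value: T): T = {
    val buf = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buf)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
    in.readObject().asInstanceOf[T]
  }

  def main(args: Array[String]): Unit = {
    // Per the serialization spec, only name() is written for an enum,
    // and deserialization resolves it back to the canonical constant,
    // so reference equality (eq) holds, not just ==.
    assert(roundTrip(TimeUnit.SECONDS) eq TimeUnit.SECONDS)
  }
}
```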

of course there are plenty of ways the user could screw this up (e.g.
rename the enums, change their meaning, or remove them).  But then again,
all of Java serialization has pitfalls the user has to be aware of.  E.g.,
if we go with case objects, then java serialization blows up if you add
another helper method, even if that helper method is completely
compatible.
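
To be concrete about the case-object side (a hypothetical sketch -- the
`StorageMode` / `CaseObjectSerdeSketch` names are made up): a case object
round-trips fine and even comes back as the same singleton, because the
compiler generates readResolve.  The fragility is that its
serialVersionUID is computed from the class shape, so recompiling after
adding a method can break old serialized data unless you pin the UID:

```scala
import java.io._

sealed trait StorageMode extends Serializable
object StorageMode {
  // Pinning serialVersionUID guards against the "added a helper method
  // and old serialized data stopped deserializing" failure mode.
  @SerialVersionUID(1L)
  case object MemoryOnly extends StorageMode
  @SerialVersionUID(1L)
  case object DiskOnly extends StorageMode
}

object CaseObjectSerdeSketch {
  def roundTrip(mode: StorageMode): StorageMode = {
    val buf = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buf)
    out.writeObject(mode)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
    in.readObject().asInstanceOf[StorageMode]
  }

  def main(args: Array[String]): Unit = {
    // The compiler-generated readResolve restores the singleton instance.
    assert(roundTrip(StorageMode.MemoryOnly) eq StorageMode.MemoryOnly)
  }
}
```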

Some prior debate in the scala community:

https://groups.google.com/d/msg/scala-internals/8RWkccSRBxQ/AN5F_ZbdKIsJ

SO post on which version to use in scala:

http://stackoverflow.com/questions/1321745/how-to-model-type-safe-enum-types

SO post about the macro-craziness people try to add to scala to make them
almost as good as a simple java enum:
(NB: the accepted answer doesn't actually work in all cases ...)

http://stackoverflow.com/questions/20089920/custom-scala-enum-most-elegant-version-searched

Another proposal to build better enums into Scala ... but it seems to be
dormant:

https://groups.google.com/forum/#!topic/scala-sips/Bf82LxK02Kk



On Thu, Mar 5, 2015 at 10:49 PM, Mridul Muralidharan <mri...@gmail.com>
wrote:

>   I have a strong dislike for java enums due to the fact that they
> are not stable across JVMs - if they undergo serde, you end up with
> unpredictable results at times [1].
> This is one of the reasons why we prevent enums from being keys: even
> so, users may well depend on them internally and shoot
> themselves in the foot.
>
> Would be better to keep away from them in general and use something more
> stable.
>
> Regards,
> Mridul
>
> [1] Having had to debug this issue for 2 weeks - I really really hate it.
>
>
> On Thu, Mar 5, 2015 at 1:08 PM, Imran Rashid <iras...@cloudera.com> wrote:
> > I have a very strong dislike for #1 (scala enumerations).   I'm ok with
> #4
> > (with Xiangrui's final suggestion, especially making it sealed &
> available
> > in Java), but I really think #2, java enums, are the best option.
> >
> > Java enums actually have some very real advantages over the other
> > approaches -- you get values(), valueOf(), EnumSet, and EnumMap.  There
> has
> > been endless debate in the Scala community about the problems with the
> > approaches in Scala.  Very smart, level-headed Scala gurus have
> complained
> > about their shortcomings (Rex Kerr's name is coming to mind, though I'm
> > not positive about that); there have been numerous well-thought out
> > proposals to give Scala a better enum.  But the powers-that-be in Scala
> > always reject them.  IIRC the explanation for rejecting is basically that
> > (a) enums aren't important enough for introducing some new special
> feature,
> > scala's got bigger things to work on and (b) if you really need a good
> > enum, just use java's enum.
> >
> > I doubt it really matters that much for Spark internals, which is why I
> > think #4 is fine.  But I figured I'd give my spiel, because every
> developer
> > loves language wars :)
> >
> > Imran
> >
> >
> >
> > On Thu, Mar 5, 2015 at 1:35 AM, Xiangrui Meng <men...@gmail.com> wrote:
> >
> >> `case object` inside an `object` doesn't show up in Java. This is the
> >> minimal code I found to make everything show up correctly in both
> >> Scala and Java:
> >>
> >> sealed abstract class StorageLevel // cannot be a trait
> >>
> >> object StorageLevel {
> >>   private[this] case object _MemoryOnly extends StorageLevel
> >>   final val MemoryOnly: StorageLevel = _MemoryOnly
> >>
> >>   private[this] case object _DiskOnly extends StorageLevel
> >>   final val DiskOnly: StorageLevel = _DiskOnly
> >> }
> >>
> >> On Wed, Mar 4, 2015 at 8:10 PM, Patrick Wendell <pwend...@gmail.com>
> >> wrote:
> >> > I like #4 as well and agree with Aaron's suggestion.
> >> >
> >> > - Patrick
> >> >
> >> > On Wed, Mar 4, 2015 at 6:07 PM, Aaron Davidson <ilike...@gmail.com>
> >> wrote:
> >> >> I'm cool with #4 as well, but make sure we dictate that the values
> >> should
> >> >> be defined within an object with the same name as the enumeration
> (like
> >> we
> >> >> do for StorageLevel). Otherwise we may pollute a higher namespace.
> >> >>
> >> >> e.g. we SHOULD do:
> >> >>
> >> >> trait StorageLevel
> >> >> object StorageLevel {
> >> >>   case object MemoryOnly extends StorageLevel
> >> >>   case object DiskOnly extends StorageLevel
> >> >> }
> >> >>
> >> >> On Wed, Mar 4, 2015 at 5:37 PM, Michael Armbrust <
> >> mich...@databricks.com>
> >> >> wrote:
> >> >>
> >> >>> #4 with a preference for CamelCaseEnums
> >> >>>
> >> >>> On Wed, Mar 4, 2015 at 5:29 PM, Joseph Bradley <
> jos...@databricks.com>
> >> >>> wrote:
> >> >>>
> >> >>> > another vote for #4
> >> >>> > People are already used to adding "()" in Java.
> >> >>> >
> >> >>> >
> >> >>> > On Wed, Mar 4, 2015 at 5:14 PM, Stephen Boesch <java...@gmail.com
> >
> >> >>> wrote:
> >> >>> >
> >> >>> > > #4 but with MemoryOnly (more scala-like)
> >> >>> > >
> >> >>> > > http://docs.scala-lang.org/style/naming-conventions.html
> >> >>> > >
> >> >>> > > Constants, Values, Variable and Methods
> >> >>> > >
> >> >>> > > Constant names should be in upper camel case. That is, if the
> >> member is
> >> >>> > > final, immutable and it belongs to a package object or an
> object,
> >> it
> >> >>> may
> >> >>> > be
> >> >>> > > considered a constant (similar to Java's static final members):
> >> >>> > >
> >> >>> > >
> >> >>> > >    object Container {
> >> >>> > >      val MyConstant = ...
> >> >>> > >    }
> >> >>> > >
> >> >>> > >
> >> >>> > > 2015-03-04 17:11 GMT-08:00 Xiangrui Meng <men...@gmail.com>:
> >> >>> > >
> >> >>> > > > Hi all,
> >> >>> > > >
> >> >>> > > > There are many places where we use enum-like types in Spark,
> but
> >> in
> >> >>> > > > different ways. Every approach has both pros and cons. I
> wonder
> >> >>> > > > whether there should be an "official" approach for enum-like
> >> types in
> >> >>> > > > Spark.
> >> >>> > > >
> >> >>> > > > 1. Scala's Enumeration (e.g., SchedulingMode, WorkerState,
> etc)
> >> >>> > > >
> >> >>> > > > * All types show up as Enumeration.Value in Java.
> >> >>> > > >
> >> >>> > > >
> >> >>> > >
> >> >>> >
> >> >>>
> >>
> http://spark.apache.org/docs/latest/api/java/org/apache/spark/scheduler/SchedulingMode.html
> >> >>> > > >
> >> >>> > > > 2. Java's Enum (e.g., SaveMode, IOMode)
> >> >>> > > >
> >> >>> > > > * Implementation must be in a Java file.
> >> >>> > > > * Values don't show up in the ScalaDoc:
> >> >>> > > >
> >> >>> > > >
> >> >>> > >
> >> >>> >
> >> >>>
> >>
> http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.network.util.IOMode
> >> >>> > > >
> >> >>> > > > 3. Static fields in Java (e.g., TripletFields)
> >> >>> > > >
> >> >>> > > > * Implementation must be in a Java file.
> >> >>> > > > * Doesn't need "()" in Java code.
> >> >>> > > > * Values don't show up in the ScalaDoc:
> >> >>> > > >
> >> >>> > > >
> >> >>> > >
> >> >>> >
> >> >>>
> >>
> http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.graphx.TripletFields
> >> >>> > > >
> >> >>> > > > 4. Objects in Scala. (e.g., StorageLevel)
> >> >>> > > >
> >> >>> > > > * Needs "()" in Java code.
> >> >>> > > > * Values show up in both ScalaDoc and JavaDoc:
> >> >>> > > >
> >> >>> > > >
> >> >>> > >
> >> >>> >
> >> >>>
> >>
> http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.storage.StorageLevel$
> >> >>> > > >
> >> >>> > > >
> >> >>> > >
> >> >>> >
> >> >>>
> >>
> http://spark.apache.org/docs/latest/api/java/org/apache/spark/storage/StorageLevel.html
> >> >>> > > >
> >> >>> > > > It would be great if we have an "official" approach for this
> as
> >> well
> >> >>> > > > as the naming convention for enum-like values ("MEMORY_ONLY"
> or
> >> >>> > > > "MemoryOnly"). Personally, I like 4) with "MEMORY_ONLY". Any
> >> >>> thoughts?
> >> >>> > > >
> >> >>> > > > Best,
> >> >>> > > > Xiangrui
> >> >>> > > >
> >> >>> > > >
> >> ---------------------------------------------------------------------
> >> >>> > > > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> >> >>> > > > For additional commands, e-mail: dev-h...@spark.apache.org
> >> >>> > > >
> >> >>> > > >
> >> >>> > >
> >> >>> >
> >> >>>
> >>
> >>
> >>
>
