There was a long thread about enums, initiated by Xiangrui several months back, in which the final consensus was to use Java enums. Is that discussion (and decision) applicable here?
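
For reference, here is a minimal sketch of what a Java enum could look like for a parameter with mutually exclusive values. The names SolverType, LBFGS, SGD and fromString are made up for illustration and are not part of Spark, and this is not necessarily the exact shape the earlier thread settled on.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical enum for illustration only; not an existing Spark type.
enum SolverType {
    LBFGS, SGD;

    // Case-insensitive lookup, so a String-valued setting can still be
    // mapped to a constant. Unsupported names fail immediately with a
    // message that lists the allowed values.
    static SolverType fromString(String name) {
        try {
            return valueOf(name.toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                "Unsupported value '" + name + "'; expected one of "
                    + Arrays.toString(values()), e);
        }
    }
}
```

Java enums are Serializable by default, and SolverType.values() returns all supported values. Scala code can reference the constants directly (SolverType.LBFGS), so Java users would not need the CompanionObject$ syntax.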

2015-09-16 17:43 GMT-07:00 Ulanov, Alexander <alexander.ula...@hpe.com>:

> Hi Joseph,
>
> Strings sound reasonable. However, there is no StringParam (only
> StringArrayParam). Should I create a new param type? Also, how can the user
> get all possible values of a String parameter?
>
> Best regards, Alexander
>
> *From:* Joseph Bradley [mailto:jos...@databricks.com]
> *Sent:* Wednesday, September 16, 2015 5:35 PM
> *To:* Feynman Liang
> *Cc:* Ulanov, Alexander; dev@spark.apache.org
> *Subject:* Re: Enum parameter in ML
>
> I've tended to use Strings. Params can be created with a validator
> (isValid) which can ensure users get an immediate error if they try to pass
> an unsupported String. Not as nice as compile-time errors, but easier on
> the APIs.
>
> On Mon, Sep 14, 2015 at 6:07 PM, Feynman Liang <fli...@databricks.com>
> wrote:
>
> We usually write a Java test suite which exercises the public API (e.g.
> DCT
> <https://github.com/apache/spark/blob/master/mllib/src/test/java/org/apache/spark/ml/feature/JavaDCTSuite.java#L71>
> ).
>
> It may be possible to create a sealed trait with singleton concrete
> instances inside of a serializable companion object, then just introduce a
> Param[SealedTrait] to the model (e.g. StreamingDecay PR
> <https://github.com/apache/spark/pull/8022/files#diff-cea0bec4853b1b2748ec006682218894R99>).
> However, this would require Java users to use
> CompanionObject$.ConcreteInstanceName to access enum values, which isn't the
> prettiest syntax.
>
> Another option would just be to use Strings, which, although not type
> safe, does simplify implementation.
>
> On Mon, Sep 14, 2015 at 5:43 PM, Ulanov, Alexander <
> alexander.ula...@hpe.com> wrote:
>
> Hi Feynman,
>
> Thank you for the suggestion. How can I ensure that there will be no problems
> for Java users? (I only use the Scala API)
>
> Best regards, Alexander
>
> *From:* Feynman Liang [mailto:fli...@databricks.com]
> *Sent:* Monday, September 14, 2015 5:27 PM
> *To:* Ulanov, Alexander
> *Cc:* dev@spark.apache.org
> *Subject:* Re: Enum parameter in ML
>
> Since PipelineStages are serializable, the params must also be
> serializable. We also have to keep the Java API in mind. Introducing a new
> enum Param type may work, but we will have to ensure that Java users can
> use it without dealing with ClassTags (I believe Scala will create new
> types for each possible value in the Enum) and that it can be serialized.
>
> On Mon, Sep 14, 2015 at 4:31 PM, Ulanov, Alexander <
> alexander.ula...@hpe.com> wrote:
>
> Dear Spark developers,
>
> I am currently implementing an Estimator in ML that has a parameter that
> can take several different values that are mutually exclusive. The most
> appropriate type seems to be Scala Enum (
> http://www.scala-lang.org/api/current/index.html#scala.Enumeration).
> However, the current ML API has the following parameter types:
>
> BooleanParam, DoubleArrayParam, DoubleParam, FloatParam, IntArrayParam,
> IntParam, LongParam, StringArrayParam
>
> Should I introduce a new parameter type in ML API that is based on Scala
> Enum?
>
> Best regards, Alexander