@Alexander  It's worked for us to use Param[String] directly.  (I think
that's because String is exactly java.lang.String, rather than a Scala
version of it, so it's still Java-friendly.)  In other classes, I've added a
static list of supported values (e.g., NaiveBayes.supportedModelTypes),
though we don't have consistent coverage of that yet.
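Here is a minimal sketch of the static-list approach. It assumes a hypothetical `MyEstimator` with a `solver` setting; neither name comes from Spark. The companion object holds the canonical strings and the list of supported values, so Scala and Java users can both discover the valid values, and the param documentation can be generated from that same list:

```scala
// Hypothetical example: MyEstimator and its solver values are illustrative, not Spark APIs.
object MyEstimator {
  // Canonical spelling of each supported value, defined in one place.
  val Lbfgs: String = "l-bfgs"
  val Gd: String = "gd"

  // Static list users can inspect to discover the valid values.
  val supportedSolvers: Array[String] = Array(Lbfgs, Gd)

  // Param documentation built from the same list, so the two cannot drift apart.
  val solverDoc: String =
    s"The solver algorithm. Supported options: ${supportedSolvers.mkString(", ")}."
}
```

The list is an `Array[String]` rather than a Scala collection, so Java callers should see a plain `String[]`, reachable through the static forwarder Scala generates for a top-level object (`MyEstimator.supportedSolvers()`).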

@Stephen  It could be used, but I prefer String for spark.ml since it's
easier to maintain consistent APIs across languages.  That's what we've
used so far, at least.

On Wed, Sep 16, 2015 at 6:00 PM, Stephen Boesch <java...@gmail.com> wrote:

> There was a long thread about enums initiated by Xiangrui several months
> back, in which the final consensus was to use Java enums.  Is that
> discussion (and decision) applicable here?
>
> 2015-09-16 17:43 GMT-07:00 Ulanov, Alexander <alexander.ula...@hpe.com>:
>
>> Hi Joseph,
>>
>> Strings sound reasonable. However, there is no StringParam (only
>> StringArrayParam). Should I create a new param type? Also, how can the user
>> get all possible values of a String parameter?
>>
>> Best regards, Alexander
>>
>> *From:* Joseph Bradley [mailto:jos...@databricks.com]
>> *Sent:* Wednesday, September 16, 2015 5:35 PM
>> *To:* Feynman Liang
>> *Cc:* Ulanov, Alexander; dev@spark.apache.org
>> *Subject:* Re: Enum parameter in ML
>>
>> I've tended to use Strings.  Params can be created with a validator
>> (isValid), which can ensure users get an immediate error if they try to pass
>> an unsupported String.  Not as nice as compile-time errors, but easier on
>> the APIs.
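To show the fail-fast behavior without pulling in Spark, the sketch below uses a hypothetical `ValidatedStringParam` class as a stand-in for `Param[String]` with an `isValid` function. In Spark itself, `ParamValidators.inArray` can build such a check from an array of allowed values. An unsupported String is rejected when it is set, rather than later when the estimator runs:

```scala
// Hypothetical stand-in for Param[String] with an isValid function; not a Spark class.
final class ValidatedStringParam(val name: String, isValid: String => Boolean, default: String) {
  require(isValid(default), s"Default value '$default' for $name is invalid.")
  private var current: String = default

  def get: String = current

  // Rejects invalid values at set time with an IllegalArgumentException.
  def set(value: String): this.type = {
    require(isValid(value), s"$name given invalid value '$value'.")
    current = value
    this
  }
}

object ValidatorDemo {
  // Hypothetical set of supported values.
  val supported: Set[String] = Set("l-bfgs", "gd")
  val solver = new ValidatedStringParam("solver", supported.contains, "l-bfgs")
}
```

The error message can list the supported values, which partly makes up for losing compile-time checking.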
>>
>> On Mon, Sep 14, 2015 at 6:07 PM, Feynman Liang <fli...@databricks.com>
>> wrote:
>>
>> We usually write a Java test suite that exercises the public API (e.g.,
>> the DCT suite
>> <https://github.com/apache/spark/blob/master/mllib/src/test/java/org/apache/spark/ml/feature/JavaDCTSuite.java#L71>).
>>
>> It may be possible to create a sealed trait with singleton concrete
>> instances inside a serializable companion object, then just introduce a
>> Param[SealedTrait] to the model (e.g., StreamingDecay PR
>> <https://github.com/apache/spark/pull/8022/files#diff-cea0bec4853b1b2748ec006682218894R99>).
>> However, this would require Java users to use
>> CompanionObject$.ConcreteInstanceName to access enum values, which isn't the
>> prettiest syntax.
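Here is a sketch of the sealed-trait alternative, using a hypothetical `DecayUnit` type. The names are illustrative and are not taken from the StreamingDecay PR. Because the trait is sealed, the compiler can check matches over the case objects for exhaustiveness, which is the compile-time safety Strings give up. The trade-off is the Java access syntax described above:

```scala
// Hypothetical enum-like type: a sealed trait whose only instances are case objects.
sealed trait DecayUnit extends Serializable {
  def name: String
}

object DecayUnit extends Serializable {
  case object Batches extends DecayUnit { val name = "batches" }
  case object Points extends DecayUnit { val name = "points" }

  // All instances, e.g. for documentation or for parsing user input.
  val values: Seq[DecayUnit] = Seq(Batches, Points)

  // Parses the String form; unknown names fail immediately.
  def fromName(s: String): DecayUnit =
    values.find(_.name == s).getOrElse(
      throw new IllegalArgumentException(s"Unknown decay unit: $s"))

  // DecayUnit is sealed, so the compiler warns if a case is missing here.
  def describe(unit: DecayUnit): String = unit match {
    case Batches => "decay per batch"
    case Points  => "decay per data point"
  }
}
```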
>>
>> Another option would be to just use Strings, which, although not type-safe,
>> do simplify the implementation.
>>
>> On Mon, Sep 14, 2015 at 5:43 PM, Ulanov, Alexander <
>> alexander.ula...@hpe.com> wrote:
>>
>> Hi Feynman,
>>
>> Thank you for the suggestion. How can I ensure that there will be no
>> problems for Java users? (I only use the Scala API.)
>>
>> Best regards, Alexander
>>
>> *From:* Feynman Liang [mailto:fli...@databricks.com]
>> *Sent:* Monday, September 14, 2015 5:27 PM
>> *To:* Ulanov, Alexander
>> *Cc:* dev@spark.apache.org
>> *Subject:* Re: Enum parameter in ML
>>
>> Since PipelineStages are serializable, the params must also be
>> serializable. We also have to keep the Java API in mind. Introducing a new
>> enum Param type may work, but we will have to ensure that Java users can
>> use it without dealing with ClassTags (I believe Scala will create new
>> types for each possible value in the Enum) and that it can be serialized.
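A quick way to check the serialization requirement for an enum-like param value is a Java serialization round trip. This sketch assumes a hypothetical `Mode` type built from case objects. Scala arranges for serializable case objects to deserialize back to the existing singleton rather than to a copy, so `eq` comparisons and pattern matches still work after a round trip:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical enum-like param value type.
sealed trait Mode extends Serializable
object Mode {
  case object Fast extends Mode
  case object Exact extends Mode
}

object SerializationCheck {
  // Writes a value with Java serialization and reads it back.
  def roundTrip[T](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

A Java test suite, like the DCT one mentioned above, could run the same kind of check from the Java side.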
>>
>> On Mon, Sep 14, 2015 at 4:31 PM, Ulanov, Alexander <
>> alexander.ula...@hpe.com> wrote:
>>
>> Dear Spark developers,
>>
>> I am currently implementing an Estimator in ML that has a parameter which
>> can take one of several mutually exclusive values. The most appropriate
>> type seems to be Scala Enumeration
>> (http://www.scala-lang.org/api/current/index.html#scala.Enumeration).
>> However, the current ML API has the following parameter types:
>>
>> BooleanParam, DoubleArrayParam, DoubleParam, FloatParam, IntArrayParam,
>> IntParam, LongParam, StringArrayParam
>>
>> Should I introduce a new parameter type in the ML API that is based on
>> Scala Enumeration?
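For comparison, here is a sketch of how a `scala.Enumeration` (a hypothetical `Solver`) could still sit behind a plain String param. `values` supplies the list of allowed names, and `withName` converts a validated String back to the enumeration value inside the Scala implementation. The member type is the path-dependent `Solver.Value`, which may be part of why a dedicated Enumeration-based Param type is harder to expose cleanly to Java:

```scala
// Hypothetical solver choices modeled with scala.Enumeration.
object Solver extends Enumeration {
  type Solver = Value
  val LBFGS = Value("l-bfgs")
  val GD = Value("gd")
}

object EnumerationDemo {
  // The Enumeration lists its members, so the allowed String values come for free.
  val allowedNames: Seq[String] = Solver.values.toSeq.map(_.toString)

  // withName throws NoSuchElementException for an unknown name.
  def parse(name: String): Solver.Value = Solver.withName(name)
}
```

With this approach the public param stays a String, and `allowedNames` can feed its validator.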
>>
>> Best regards, Alexander
