Github user manishamde commented on the pull request:

    https://github.com/apache/spark/pull/3099#issuecomment-61916011
  
    I have a few comments on the API:
    
    1. Like @jkbradley, I prefer ```lr.setMaxIter(50)``` over 
```lr.set(lr.maxIter, 50)```. I would also prefer to avoid passing parameters 
to fit, as in ```lr.fit(dataset, lr.maxIter -> 50)```.
    
    2. Constructors with getters and setters, as @shivaram pointed out, would 
be great. The LOC reduction is significant and should not be discounted.
    
    3. Do we plan to provide syntactic sugar such as a ```predict``` method 
when we use a ```model``` to transform a dataset? To me, ```transform``` fits 
the feature-engineering stage well, while ```predict``` fits better once model 
training has been performed.
    
    4. It would be great to see the corresponding examples in Python. The 
getters/setters would map well to Python properties. Also, it would be nice to 
do an apples-to-apples comparison with the scikit-learn pipeline.
    
    5. Finally, how do we plan to programmatically answer (developer/user) 
queries about algorithm properties such as multiclass classification support, 
internal storage format, etc.?
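    To make the preference in point 1 concrete, here is a minimal Python 
sketch contrasting the two setter styles. All names here are hypothetical 
stand-ins, not the actual Spark implementation.

```python
# Hypothetical sketch of the two setter styles from point 1:
# the generic lr.set(lr.maxIter, 50) versus the dedicated lr.setMaxIter(50).
class Param:
    def __init__(self, name, default):
        self.name = name
        self.default = default

class LogisticRegression:
    max_iter = Param("maxIter", 100)

    def __init__(self):
        self._values = {p.name: p.default
                        for p in [LogisticRegression.max_iter]}

    # Generic style: lr.set(lr.max_iter, 50)
    def set(self, param, value):
        self._values[param.name] = value
        return self  # returning self enables chaining in both styles

    # Dedicated-setter style: lr.set_max_iter(50) -- one extra method per
    # param, but discoverable and self-documenting at the call site.
    def set_max_iter(self, value):
        return self.set(LogisticRegression.max_iter, value)

    def get_max_iter(self):
        return self._values["maxIter"]

lr = LogisticRegression().set_max_iter(50)
print(lr.get_max_iter())  # -> 50
```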
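    The ```predict``` sugar in point 3 could be as thin as an alias over 
```transform``` on fitted models. A hypothetical sketch (the class names and 
scoring logic are invented for illustration):

```python
# Hypothetical sketch for point 3: transform() on a feature-engineering
# stage, with predict() as syntactic sugar on a trained model.
class FeatureHasherModel:
    """A feature-engineering stage: transform() reads naturally here."""
    def transform(self, dataset):
        return [hash(row) % 1000 for row in dataset]

class LogisticRegressionModel:
    """A trained model: predict() reads naturally after training."""
    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def transform(self, dataset):
        # For illustration, each row is already a score in [0, 1].
        return [1 if score > self.threshold else 0 for score in dataset]

    # predict() is just an alias for transform() -- same behavior, a name
    # that matches the model-application stage of the pipeline.
    predict = transform

model = LogisticRegressionModel()
print(model.predict([0.2, 0.9]))  # -> [0, 1]
```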
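    On point 4, the getter/setter pairs could map to Python properties along 
these lines. This is a scikit-learn-flavoured sketch, not the actual PySpark 
API:

```python
# Hypothetical sketch for point 4: a Scala-style getMaxIter/setMaxIter pair
# exposed as a single Python property with validation in the setter.
class LogisticRegression:
    def __init__(self, max_iter=100):
        self._max_iter = max_iter

    @property
    def max_iter(self):
        return self._max_iter

    @max_iter.setter
    def max_iter(self, value):
        if value <= 0:
            raise ValueError("max_iter must be positive")
        self._max_iter = value

lr = LogisticRegression()
lr.max_iter = 50          # reads like plain attribute access
print(lr.max_iter)        # -> 50
```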
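    One possible answer to point 5 is class-level metadata that both 
developers and users can query. The attribute names below are invented purely 
to illustrate the idea:

```python
# Hypothetical sketch for point 5: algorithm properties (multiclass support,
# storage format) exposed as queryable class-level metadata.
class LogisticRegression:
    supports_multiclass = False
    storage_format = "dense"

class RandomForestClassifier:
    supports_multiclass = True
    storage_format = "dense"

def multiclass_algorithms(algos):
    """Answer a developer query: which algorithms handle multiclass?"""
    return [a.__name__ for a in algos if a.supports_multiclass]

print(multiclass_algorithms([LogisticRegression, RandomForestClassifier]))
# -> ['RandomForestClassifier']
```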

