Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/2607#issuecomment-59589703
@manishamde Sorry for the delay; the code is looking good! I made some
small comments inline. My main overall comment is about specifying parameters.
How would it be if we started mimicking the coming API update (as much as
possible)? Parameter specification would work as follows:
- In DecisionTree, add a static "defaultParams" method so users can
  construct a tree.Strategy instance without having to worry about importing
  Strategy (and remembering its name). Likewise for GradientBoosting.
- Change GradientBoostingStrategy to store tree params in a field
  weakLearnerParams: tree.Strategy
Here's the use pattern I envision:
val treeParams = DecisionTree.defaultParams()
treeParams.maxDepth = ...
val boostingParams = GradientBoosting.defaultParams()
boostingParams.weakLearnerParams = treeParams
val model = GradientBoosting.train(myData, boostingParams)
This API should work for Scala and Python right away. (Though a Python API
can be another PR.)
For Java, this API should almost work; I believe the only issue will be
setting fields which take special types (e.g., quantileCalculationStrategy and
categoricalFeaturesInfo). For those, there is a nice annotation you can use
which will automatically add getParamName and setParamName methods for Java
users to call, and you can override them as needed. For the special params
like categoricalFeaturesInfo, you can overload them with versions which take
Java-friendly types (such as a Java map for categoricalFeaturesInfo and a
string for quantileCalculationStrategy). Here's the BeanProperty doc:
[http://www.scala-lang.org/api/current/scala/beans/BeanProperty.html]
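A minimal sketch of how this could fit together, written against stand-in classes rather than the real MLlib ones. The class bodies, the default value of maxDepth, and the ParamsSketch helper are assumptions for illustration only; just the names defaultParams, weakLearnerParams, maxDepth and categoricalFeaturesInfo come from the proposal above.

```scala
import scala.beans.BeanProperty

// Hypothetical stand-in for tree.Strategy; the default values are placeholders.
class Strategy(
    @BeanProperty var maxDepth: Int = 5,
    @BeanProperty var categoricalFeaturesInfo: Map[Int, Int] = Map.empty[Int, Int]) {

  // Java-friendly overload next to the generated setter: takes a java.util.Map
  // and copies it into an immutable Scala Map.
  def setCategoricalFeaturesInfo(
      info: java.util.Map[java.lang.Integer, java.lang.Integer]): Unit = {
    var converted = Map.empty[Int, Int]
    val it = info.entrySet().iterator()
    while (it.hasNext) {
      val entry = it.next()
      converted += (entry.getKey.intValue -> entry.getValue.intValue)
    }
    this.categoricalFeaturesInfo = converted
  }
}

// Hypothetical stand-in for GradientBoostingStrategy holding the tree params.
class GradientBoostingStrategy(
    @BeanProperty var weakLearnerParams: Strategy = new Strategy())

object DecisionTree {
  // Lets callers get a Strategy without importing or naming the class.
  def defaultParams(): Strategy = new Strategy()
}

object GradientBoosting {
  def defaultParams(): GradientBoostingStrategy = new GradientBoostingStrategy()
}

// Hypothetical helper that walks through the use pattern, including the
// Java-style overload a Java caller would use.
object ParamsSketch {
  def build(): GradientBoostingStrategy = {
    val treeParams = DecisionTree.defaultParams()
    treeParams.maxDepth = 3

    val javaInfo = new java.util.HashMap[java.lang.Integer, java.lang.Integer]()
    javaInfo.put(Integer.valueOf(0), Integer.valueOf(2)) // feature 0 has 2 categories
    treeParams.setCategoricalFeaturesInfo(javaInfo)

    val boostingParams = GradientBoosting.defaultParams()
    boostingParams.weakLearnerParams = treeParams
    boostingParams
  }
}
```

With the annotation in place, a Java caller would see getMaxDepth/setMaxDepth and getWeakLearnerParams/setWeakLearnerParams in addition to the hand-written overload, while Scala callers keep plain field assignment.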
Does that sound reasonable?
Let me know when it's ready for another pass and for testing.