Hi Flink Developers,
    I am sending this email to let you know about XGBoost4J, a package that
we are planning to announce next week. Here is the draft version of the
post: https://github.com/dmlc/xgboost/blob/master/doc/jvm/xgboost4j-intro.md

    In short, XGBoost is a machine learning package that is used by more
than half of the winning solutions in machine learning challenges and is
already widely used in industry. The distributed version scales to billions
of examples (10x faster than spark.mllib in the experiment) with fewer
resources (see http://arxiv.org/abs/1603.02754).

    We are interested in bringing distributed XGBoost to all dataflow
platforms, including Flink. This does not mean re-implementing it on Flink.
Instead, we are building a portable API, backed by a communication library,
that can run on different dataflow platforms.

    We hope this can benefit Flink users by giving them access to one of
the state-of-the-art machine learning algorithms. I am sending this email
to the mailing list to let you know about it, and in the hope of finding
some contributors to help make the XGBoost Flink API more compatible with
the current FlinkML stack. We also hope to get some support from the system
side for an abstraction that XGBoost needs: using multiple threads within a
single slot for maximum performance.


Let us know your thoughts.

Cheers

Tianqi
