Hi Flink Developers,

I am sending this email to let you know about XGBoost4J, a package that we are planning to announce next week. Here is the draft version of the post: https://github.com/dmlc/xgboost/blob/master/doc/jvm/xgboost4j-intro.md
In short, XGBoost is a machine learning package that is used by more than half of the winning solutions in machine learning challenges and is already widely used in industry. The distributed version scales to billions of examples (10x faster than spark.mllib in the experiment) with fewer resources (see http://arxiv.org/abs/1603.02754).

We are interested in bringing distributed XGBoost to all dataflow platforms, including Flink. This does not mean that we re-implement it on Flink. Instead, we build a portable API that includes a communication library and can run on different dataflow platforms. We hope this can benefit Flink users by giving them access to one of the state-of-the-art machine learning algorithms.

I am sending this email to the mailing list to let you know about it, and in the hope of finding some contributors to help improve the XGBoost Flink API so that it is more compatible with the current FlinkML stack. We also hope to get some support from the system side, to enable the abstractions XGBoost needs in order to use multiple threads even within a single slot for maximum performance.

Let us know your thoughts.

Cheers,
Tianqi