Hello Tianqi,

Yes, that definitely sounds interesting to us, and we are looking forward to helping out with the implementation.
Regards,
Theodore

-- Sent from a mobile device. May contain autocorrect errors.

On Mar 12, 2016 11:29 AM, "Simone Robutti" <simone.robu...@radicalbit.io> wrote:
> This is a really interesting approach. The idea of an ML library over
> DataFlow is probably a winning move, and I hope it will stop the
> proliferation of worthless reimplementations taking place in the big
> data world. Did DataFlow pose specific problems for your work? Is it
> missing something that you had to fill in with your own work?
>
> Here at RadicalBit we are interested both in DataFlow/Apache Beam and in
> distributed ML, and your approach looks like the best one to us. I hope
> more and more teams follow your example, perhaps integrating existing
> libraries like H2O with DataFlow.
>
> Keep us updated if you plan to develop other algorithms.
>
> 2016-03-11 21:32 GMT+01:00 Tianqi Chen <tqc...@cs.washington.edu>:
>
> > Hi Flink Developers,
> >
> > I am sending this email to let you know about XGBoost4J, a package that
> > we are planning to announce next week. Here is the draft version of the
> > post:
> > https://github.com/dmlc/xgboost/blob/master/doc/jvm/xgboost4j-intro.md
> >
> > In short, XGBoost is a machine learning package that is used by more
> > than half of the machine learning challenge winning solutions and is
> > already widely used in industry. The distributed version scales to
> > billions of examples (10x faster than spark.mllib in our experiments)
> > with fewer resources (see http://arxiv.org/abs/1603.02754).
> >
> > We are interested in bringing distributed XGBoost to all DataFlow
> > platforms, including Flink. This does not mean we re-implement it on
> > Flink. Instead, we build a portable API backed by a communication
> > library, so that it is able to run on different DataFlow programs.
> >
> > We hope this can benefit Flink users by giving them access to one of
> > the state-of-the-art machine learning algorithms. I am sending this
> > email to the mailing list to let you know about it, hoping to find some
> > contributors to help make the XGBoost Flink API more compatible with
> > the current FlinkML stack. We also hope to get some support from the
> > system side, to enable the abstraction XGBoost needs for using multiple
> > threads within even one slot for maximum performance.
> >
> > Let us know your thoughts.
> >
> > Cheers,
> >
> > Tianqi
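[Editor's note: to make the "portable API on top of Flink" idea concrete, here is a minimal sketch of how distributed XGBoost training might be driven from a Flink batch job via the XGBoost4J Flink bindings mentioned in the announcement. The package, class, and method names (XGBoost.train/predict, MLUtils.readLibSVM) and the file paths are assumptions for illustration, not a confirmed API.]

import org.apache.flink.api.scala._
import org.apache.flink.ml.MLUtils                 // FlinkML LibSVM reader (assumed available)
import ml.dmlc.xgboost4j.scala.flink.XGBoost       // hypothetical XGBoost4J-Flink entry point

object XGBoostFlinkSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Load LibSVM-formatted training data as a DataSet[LabeledVector].
    val trainData = MLUtils.readLibSVM(env, "/path/to/train.libsvm")

    // Ordinary XGBoost parameters, passed through to the native booster
    // running inside each Flink task slot.
    val params = Map(
      "eta" -> 0.1,
      "max_depth" -> 6,
      "objective" -> "binary:logistic"
    )

    // Train a distributed booster for 100 rounds; the workers synchronize
    // gradient statistics through XGBoost's own communication layer rather
    // than re-implementing the algorithm on Flink operators.
    val model = XGBoost.train(trainData, params, 100)

    // Score held-out data and write the predictions out.
    val testData = MLUtils.readLibSVM(env, "/path/to/test.libsvm").map(_.vector)
    model.predict(testData).writeAsText("/path/to/predictions")

    env.execute("xgboost4j-flink sketch")
  }
}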