Hello Tianqi,

Yes, that definitely sounds interesting to us, and we are looking forward to helping out with the implementation.
Regards,
Theodore

-- Sent from a mobile device. May contain autocorrect errors.

On Mar 12, 2016 11:29 AM, "Simone Robutti" <simone.robu...@radicalbit.io> wrote:
> This is a really interesting approach. The idea of an ML library over
> DataFlow is probably a winning move, and I hope it will stop the
> proliferation of worthless reimplementations taking place in the big
> data world. Did DataFlow pose specific problems for your work? Is it
> missing something that you had to fill in with your own work?
>
> Here at RadicalBit we are interested both in DataFlow/Apache Beam and in
> distributed ML, and your approach looks like the best one to us. I hope
> more and more teams follow your example, perhaps integrating existing
> libraries like H2O with DataFlow.
>
> Keep us updated if you plan to develop other algorithms.
>
> 2016-03-11 21:32 GMT+01:00 Tianqi Chen <tqc...@cs.washington.edu>:
>
> > Hi Flink Developers,
> >
> > I am sending this email to let you know about XGBoost4J, a package that
> > we are planning to announce next week. Here is the draft version of the
> > post:
> > https://github.com/dmlc/xgboost/blob/master/doc/jvm/xgboost4j-intro.md
> >
> > In short, XGBoost is a machine learning package that is used by more
> > than half of the machine learning challenge winning solutions and is
> > already widely used in industry. The distributed version scales to
> > billions of examples (10x faster than spark.mllib in our experiments)
> > with fewer resources (see http://arxiv.org/abs/1603.02754).
> >
> > We are interested in bringing distributed XGBoost to all DataFlow
> > platforms, including Flink. This does not mean we re-implement it on
> > Flink. Instead, we build a portable API backed by a communication
> > library, so that it is able to run on different DataFlow programs.
> >
> > We hope this can benefit Flink users by giving them access to one of
> > the state-of-the-art machine learning algorithms. I am sending this
> > email to the mailing list to let you know about it, hoping to find some
> > contributors to help make the XGBoost Flink API more compatible with
> > the current FlinkML stack. We also hope to get some support from the
> > system side, to enable the abstraction XGBoost needs for using multiple
> > threads within even one slot for maximum performance.
> >
> > Let us know your thoughts.
> >
> > Cheers,
> >
> > Tianqi
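[Editor's note: to make the "portable API on top of Flink" idea concrete, here is a minimal sketch of how distributed XGBoost training might be driven from a Flink batch job via the XGBoost4J Flink bindings mentioned in the announcement. The package, class, and method names (XGBoost.train/predict, MLUtils.readLibSVM) and the file paths are assumptions for illustration, not a confirmed API.]

import org.apache.flink.api.scala._
import org.apache.flink.ml.MLUtils                 // FlinkML LibSVM reader (assumed available)
import ml.dmlc.xgboost4j.scala.flink.XGBoost       // hypothetical XGBoost4J-Flink entry point

object XGBoostFlinkSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Load LibSVM-formatted training data as a DataSet[LabeledVector].
    val trainData = MLUtils.readLibSVM(env, "/path/to/train.libsvm")

    // Ordinary XGBoost parameters, passed through to the native booster
    // running inside each Flink task slot.
    val params = Map(
      "eta" -> 0.1,
      "max_depth" -> 6,
      "objective" -> "binary:logistic"
    )

    // Train a distributed booster for 100 rounds; the workers synchronize
    // gradient statistics through XGBoost's own communication layer rather
    // than re-implementing the algorithm on Flink operators.
    val model = XGBoost.train(trainData, params, 100)

    // Score held-out data and write the predictions out.
    val testData = MLUtils.readLibSVM(env, "/path/to/test.libsvm").map(_.vector)
    model.predict(testData).writeAsText("/path/to/predictions")

    env.execute("xgboost4j-flink sketch")
  }
}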