Dear Spark Users and Developers, 

We (the Distributed (Deep) Machine Learning Community, http://dmlc.ml/) are happy 
to announce the release of XGBoost4J 
(http://dmlc.ml/2016/03/14/xgboost4j-portable-distributed-xgboost-in-spark-flink-and-dataflow.html), 
a portable, distributed XGBoost for Spark, Flink and Dataflow. 

XGBoost is an optimized distributed gradient boosting library designed to be 
highly efficient, flexible and portable. XGBoost provides parallel tree 
boosting (also known as GBDT or GBM) that solves many data science problems in a 
fast and accurate way. It has been the winning solution for many machine 
learning scenarios, ranging from Machine Learning Challenges 
(https://github.com/dmlc/xgboost/tree/master/demo#machine-learning-challenge-winning-solutions)
 to Industrial Use Cases 
(https://github.com/dmlc/xgboost/tree/master/demo#usecases). 

XGBoost4J is a new package in XGBoost aiming to provide clean Scala/Java 
APIs and seamless integration with mainstream data processing platforms 
such as Apache Spark. With XGBoost4J, users can run XGBoost as a stage of a Spark 
job and build a unified pipeline from ETL to model training to data product 
serving within Spark, instead of jumping across two different systems, i.e. 
XGBoost and Spark. (Example: 
https://github.com/dmlc/xgboost/blob/master/jvm-packages/xgboost4j-example/src/main/scala/ml/dmlc/xgboost4j/scala/example/spark/DistTrainWithSpark.scala)
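
To give a rough idea of what this looks like, below is a minimal Scala sketch of 
distributed training inside a Spark job. The exact entry point and argument names 
may differ between versions, so please treat the linked DistTrainWithSpark.scala 
example as the authoritative reference; the input path and parameter values here 
are placeholders.

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.mllib.util.MLUtils
  import ml.dmlc.xgboost4j.scala.spark.XGBoost

  object DistTrainSketch {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("XGBoost4J-Spark-Sketch"))

      // ETL stage: load LIBSVM-formatted training data as an RDD[LabeledPoint].
      val trainingData = MLUtils.loadLibSVMFile(sc, args(0))

      // Booster parameters: the usual XGBoost keys ("eta", "max_depth", "objective", ...).
      val params = Map(
        "eta" -> 0.1,
        "max_depth" -> 6,
        "objective" -> "binary:logistic")

      // Model-training stage: training runs directly on the Spark executors.
      // Arguments here are (training RDD, parameters, number of boosting rounds,
      // number of workers); check DistTrainWithSpark.scala for the exact signature
      // in your version of xgboost4j-spark.
      val model = XGBoost.train(trainingData, params, 10, 4)

      // The trained booster can then feed the rest of the pipeline
      // (prediction, evaluation, or export) without leaving Spark.
      sc.stop()
    }
  }
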

Today, we release the first version of XGBoost4J to bring more choices to 
Spark users who are seeking solutions to build highly efficient data 
analytics platforms, and to enrich the Spark ecosystem. We will keep moving forward 
to integrate with more features of Spark. Of course, you are more than welcome 
to join us and contribute to the project!

For more details on distributed XGBoost, please refer to the recently 
published paper: http://arxiv.org/abs/1603.02754

Best, 

-- 
Nan Zhu
http://codingcat.me
