Hi Eron,

Please register your Spark Package on http://spark-packages.org, which helps users find your work. Do you have any performance benchmarks to share? Thanks!
Best,
Xiangrui

On Wed, Jun 10, 2015 at 10:48 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:
> Looks very interesting, thanks for sharing this.
>
> I haven't had much chance to do more than a quick glance over the code.
> Quick question - are the Word2Vec and GloVe implementations fully parallel
> on Spark?
>
> On Mon, Jun 8, 2015 at 6:20 PM, Eron Wright <ewri...@live.com> wrote:
>>
>> The deeplearning4j framework provides a variety of distributed, neural
>> network-based learning algorithms, including convolutional nets, deep
>> auto-encoders, deep-belief nets, and recurrent nets. We're working on
>> integration with the Spark ML pipeline, leveraging the developer API. This
>> announcement is to share some code and get feedback from the Spark
>> community.
>>
>> The integration code is located in the dl4j-spark-ml module in the
>> deeplearning4j repository.
>>
>> Major aspects of the integration work:
>>
>> - ML algorithms. To bind the dl4j algorithms to the ML pipeline, we
>>   developed a new classifier and a new unsupervised learning estimator.
>> - ML attributes. We strove to interoperate well with other pipeline
>>   components. ML attributes are column-level metadata that enable
>>   information sharing between pipeline components. See here how the
>>   classifier reads label metadata from a column provided by the new
>>   StringIndexer.
>> - Large binary data. It is challenging to work with large binary data in
>>   Spark. An effective approach is to leverage PrunedScan and to carefully
>>   control partition sizes. Here we explored this with a custom data
>>   source based on the new relation API.
>> - Column-based record readers. Here we explored how to construct rows
>>   from a Hadoop input split by composing a number of column-level
>>   readers, with pruning support.
>> - UDTs. With Spark SQL it is possible to introduce new data types. We
>>   prototyped an experimental Tensor type, here.
>> - Spark Package.
>>   We developed a Spark Package to make it easy to use the dl4j framework
>>   in spark-shell and with spark-submit. See the
>>   deeplearning4j/dl4j-spark-ml repository for useful snippets involving
>>   the sbt-spark-package plugin.
>> - Example code. The examples demonstrate how the standardized ML API
>>   simplifies interoperability, such as with label preprocessing and
>>   feature scaling. See the deeplearning4j/dl4j-spark-ml-examples
>>   repository for an expanding set of example pipelines.
>>
>> Hope this proves useful to the community as we transition to exciting new
>> concepts in Spark SQL and Spark ML. Meanwhile, we have Spark working with
>> multiple GPUs on AWS, and we're looking forward to optimizations that
>> will speed up neural net training even more.
>>
>> Eron Wright
>> Contributor | deeplearning4j.org

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
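[Editor's note] The ML-attributes point above — a classifier recovering label metadata from a StringIndexer-produced column instead of rescanning the data — can be sketched against the Spark 1.4-era `ml.attribute` API. This is an illustrative sketch, not dl4j's actual code; it assumes an existing `SQLContext` and a DataFrame `df` with a string column "category".

```scala
import org.apache.spark.ml.attribute.{Attribute, NominalAttribute}
import org.apache.spark.ml.feature.StringIndexer

// Index a string column; StringIndexer writes nominal attribute metadata
// (the set of label values) into the output column's schema.
val indexed = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("label")
  .fit(df)
  .transform(df)

// A downstream classifier can read the number of classes straight from
// the column metadata, without touching the data itself.
val numClasses = Attribute.fromStructField(indexed.schema("label")) match {
  case nominal: NominalAttribute => nominal.getNumValues
  case _                         => None
}
```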
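[Editor's note] The large-binary-data point can likewise be sketched with Spark SQL's `PrunedScan` relation trait: the data source is told which columns the query actually needs, so a heavy binary column can be skipped entirely when it is not requested. The relation and column names below (`ImageRelation`, "path", "bytes", `loadBytes`) are hypothetical, chosen only to illustrate the technique.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, PrunedScan}
import org.apache.spark.sql.types._

class ImageRelation(val sqlContext: SQLContext)
  extends BaseRelation with PrunedScan {

  override def schema: StructType = StructType(Seq(
    StructField("path", StringType, nullable = false),
    StructField("bytes", BinaryType, nullable = true)))

  // Spark passes in only the columns the query references; the heavy
  // "bytes" payload is loaded only when it appears in requiredColumns.
  override def buildScan(requiredColumns: Array[String]): RDD[Row] = {
    val paths = sqlContext.sparkContext.parallelize(Seq("a.png", "b.png"))
    paths.map { p =>
      Row.fromSeq(requiredColumns.map {
        case "path"  => p
        case "bytes" => loadBytes(p)
      })
    }
  }

  private def loadBytes(path: String): Array[Byte] =
    Array.emptyByteArray // placeholder for the real I/O
}
```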
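[Editor's note] The UDT point refers to Spark SQL's `UserDefinedType` extension point, which is what an experimental Tensor type would plug into. A minimal sketch of the mechanism, modeled on the simple point type Spark's own tests use (the real dl4j Tensor type will differ):

```scala
import org.apache.spark.sql.types._

// A user-level class we want to store in a DataFrame column.
class Point(val x: Double, val y: Double) extends Serializable

// The UDT maps Point to and from a built-in SQL type (here an array of
// doubles), so Spark SQL can persist and shuffle it.
class PointUDT extends UserDefinedType[Point] {
  override def sqlType: DataType = ArrayType(DoubleType, containsNull = false)

  override def serialize(obj: Any): Any = obj match {
    case p: Point => Seq(p.x, p.y)
  }

  override def deserialize(datum: Any): Point = datum match {
    case values: Seq[_] =>
      val Seq(x: Double, y: Double) = values
      new Point(x, y)
  }

  override def userClass: Class[Point] = classOf[Point]
}
```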
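[Editor's note] Finally, the example-code point — label preprocessing and feature scaling composed with a classifier through the standardized ML API — amounts to a pipeline along these lines. `LogisticRegression` stands in for the dl4j classifier, whose class name the post does not give; `trainingDF`/`testDF` are assumed DataFrames with "species" and "features" columns.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{StandardScaler, StringIndexer}

// Label preprocessing, feature scaling, and a classifier chained through
// the uniform Estimator/Transformer interface.
val pipeline = new Pipeline().setStages(Array(
  new StringIndexer().setInputCol("species").setOutputCol("label"),
  new StandardScaler().setInputCol("features").setOutputCol("scaledFeatures"),
  new LogisticRegression().setFeaturesCol("scaledFeatures")))

val model  = pipeline.fit(trainingDF)
val scored = model.transform(testDF)
```

Because every stage reads and writes DataFrame columns with attached ML-attribute metadata, swapping the stand-in classifier for a dl4j one should leave the rest of the pipeline unchanged.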