Re: Artificial Neural Network in Spark?

2014-06-30 Thread Debasish Das
I will let Xiangrui comment on the PR process for adding the code to MLlib, but I would love to look at your initial version if you push it to GitHub... As far as I remember, Quoc got his best ANN results using the back-propagation algorithm, solved using CG... do you have those features or you are u
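Quoc's exact setup isn't described in this thread. The sketch below makes its own assumptions: a one-hidden-layer network, sigmoid hidden units, a linear output, and squared-error loss. It shows what back-propagation provides, namely the gradient of the loss with respect to every weight in one backward pass over a flat parameter vector. A gradient-based optimizer such as CG would consume that gradient, but the optimizer itself is not shown. The function names are hypothetical and are not MLlib API. The finite-difference check only confirms that the backward pass is correct.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_grad(params, x, y, n_in, n_hidden):
    """Squared-error loss of a 1-hidden-layer net and its gradient via back-propagation.

    params is a flat list laid out as: W1 (n_hidden rows of n_in), b1 (n_hidden),
    w2 (n_hidden), b2 (1). Hidden units are sigmoid; the output unit is linear.
    """
    # Unpack the flat parameter vector (the form a CG-style optimizer works with).
    W1 = [params[h * n_in:(h + 1) * n_in] for h in range(n_hidden)]
    off = n_hidden * n_in
    b1 = params[off:off + n_hidden]
    w2 = params[off + n_hidden:off + 2 * n_hidden]
    b2 = params[off + 2 * n_hidden]

    # Forward pass.
    a = [sigmoid(sum(W1[h][k] * x[k] for k in range(n_in)) + b1[h]) for h in range(n_hidden)]
    out = sum(w2[h] * a[h] for h in range(n_hidden)) + b2
    err = out - y
    loss = 0.5 * err * err

    # Backward pass: propagate dLoss/dout = err through the output and hidden layers.
    d_hidden = [err * w2[h] * a[h] * (1.0 - a[h]) for h in range(n_hidden)]
    g_W1 = [d_hidden[h] * x[k] for h in range(n_hidden) for k in range(n_in)]
    g_w2 = [err * a[h] for h in range(n_hidden)]
    # Gradient in the same flat layout: W1, b1 (= d_hidden), w2, b2 (= err).
    return loss, g_W1 + d_hidden + g_w2 + [err]

def numeric_grad(params, x, y, n_in, n_hidden, eps=1e-6):
    """Central finite differences, used only to check the back-propagated gradient."""
    grads = []
    for j in range(len(params)):
        plus, minus = list(params), list(params)
        plus[j] += eps
        minus[j] -= eps
        lp, _ = loss_and_grad(plus, x, y, n_in, n_hidden)
        lm, _ = loss_and_grad(minus, x, y, n_in, n_hidden)
        grads.append((lp - lm) / (2 * eps))
    return grads

random.seed(0)
n_in, n_hidden = 3, 4
params = [random.uniform(-1, 1) for _ in range(n_hidden * n_in + 2 * n_hidden + 1)]
loss, grad = loss_and_grad(params, [0.5, -1.0, 2.0], 0.7, n_in, n_hidden)
check = numeric_grad(params, [0.5, -1.0, 2.0], 0.7, n_in, n_hidden)
# Largest disagreement between the two gradients; it should be a tiny number.
print(max(abs(g - c) for g, c in zip(grad, check)))
```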

Re: Contributing to MLlib on GLM

2014-06-30 Thread Gang Bai
Thanks Xiaokai, I’ve created a pull request to merge the features from my PR into your repo. Please review it here: https://github.com/xwei-datageek/spark/pull/2 . As for GLMs, here at Sina we are solving the problem of predicting the number of visitors who read a particular news article or watch an o
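The snippet is cut off before it says which GLM family is used. Visitor numbers are counts, and counts are commonly modeled with Poisson regression (a GLM with a log link). The code below is a hypothetical illustration under that assumption, not Sina's model. It fits `log E[y] = a + b*x` for a single feature by Newton's method (IRLS) on the Poisson log-likelihood. The function name and the data are our own.

```python
import math

def fit_poisson(xs, ys, iters=50):
    """Fit log(E[y]) = a + b*x by Newton's method (IRLS) on the Poisson log-likelihood."""
    a = math.log(sum(ys) / len(ys))  # start from the intercept-only fit
    b = 0.0
    for _ in range(iters):
        mus = [math.exp(a + b * x) for x in xs]
        # Score (gradient of the log-likelihood): X^T (y - mu).
        g0 = sum(y - mu for y, mu in zip(ys, mus))
        g1 = sum(x * (y - mu) for x, y, mu in zip(xs, ys, mus))
        # Fisher information X^T W X with W = diag(mu).
        h00 = sum(mus)
        h01 = sum(x * mu for x, mu in zip(xs, mus))
        h11 = sum(x * x * mu for x, mu in zip(xs, mus))
        det = h00 * h11 - h01 * h01
        # Newton step: solve [[h00, h01], [h01, h11]] d = [g0, g1] in closed form.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b
```

If the means are exactly `exp(0.5 + 0.3*x)`, the fit recovers `a = 0.5` and `b = 0.3`. A distributed version (as MLlib would need) would compute the same sums as a reduction over partitions.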

RE: Artificial Neural Network in Spark?

2014-06-30 Thread Bert Greevenbosch
Hi Debasish, Alexander, all, Indeed I found the OpenDL project through the Powered by Spark page. I'll need some time to look into the code, but at first sight it looks quite well developed. I'll contact the author about this too. My own implementation (in Scala) works for multiple inputs a

Re: Eliminate copy while sending data : any Akka experts here ?

2014-06-30 Thread Aaron Davidson
I don't know of any way to avoid Akka doing a copy, but I would like to mention that it's on the priority list to piggy-back only the map statuses relevant to a particular map task on the task itself, thus reducing the total amount of data sent over the wire by a factor of N for N physical machines
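To make the "factor of N" concrete, assume (our simplification) that the full serialized status table has size S and splits evenly, so each of the N machines needs only S/N of it. Sending the full table to every machine costs N·S bytes in total. Sending each machine only its relevant slice costs S. The function name is hypothetical.

```python
def bytes_sent(status_bytes, n_machines, per_machine_slice):
    """Total bytes on the wire for one round of map-status distribution.

    status_bytes: size S of the full serialized map-status table.
    per_machine_slice: if True, each machine receives only its S/N share
    (idealized even split); otherwise every machine receives the full table.
    """
    if per_machine_slice:
        return n_machines * (status_bytes / n_machines)
    return n_machines * status_bytes

full = bytes_sent(1_000_000, 50, per_machine_slice=False)
sliced = bytes_sent(1_000_000, 50, per_machine_slice=True)
print(full / sliced)  # the reduction factor equals the number of machines
```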

Re: Eliminate copy while sending data : any Akka experts here ?

2014-06-30 Thread Mridul Muralidharan
Our current hack is to use Broadcast variables when the serialized statuses are above some (configurable) size, and have the workers pull them directly from the master. This is a workaround, so it would be great if there were a better, principled solution. Please note that the responses are going to differen
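The size-based switch described above can be sketched as follows. This is a hypothetical illustration, not Spark's MapOutputTracker code. The function name and `threshold_bytes` parameter are assumptions; the post only says the size limit is configurable.

```python
def choose_delivery(serialized_statuses: bytes, threshold_bytes: int) -> str:
    """Hypothetical sketch: decide how a worker should receive map statuses.

    Payloads at or below the threshold are sent inline in the reply message;
    larger ones are published as a broadcast that workers pull from the master.
    """
    if len(serialized_statuses) > threshold_bytes:
        return "broadcast"
    return "inline"

print(choose_delivery(b"x" * 512, threshold_bytes=1024))   # inline
print(choose_delivery(b"x" * 4096, threshold_bytes=1024))  # broadcast
```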

Eliminate copy while sending data : any Akka experts here ?

2014-06-30 Thread Mridul Muralidharan
Hi, While sending the map output tracker result, the same serialized byte array is sent multiple times, but the Akka implementation copies it into a private byte array within ByteString for each send. Caching a ByteString instead of Array[Byte] did not help, since Akka does not support special casing
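The cost being described is a defensive copy per send. The following is only a Python analogy and does not use Akka's API. Copying a mutable buffer once per send duplicates the whole payload N times. Handing out read-only views shares one buffer with no copying. The function names are hypothetical.

```python
def send_with_copy(payload: bytearray, n_sends: int) -> list:
    """Each send gets its own private copy: N sends cost N full copies."""
    return [bytes(payload) for _ in range(n_sends)]

def send_shared(payload: bytearray, n_sends: int) -> list:
    """Each send gets a read-only view of the same buffer: nothing is copied."""
    view = memoryview(payload).toreadonly()  # Python 3.8+
    return [view for _ in range(n_sends)]

payload = bytearray(b"serialized map statuses")
copies = send_with_copy(payload, 3)
views = send_shared(payload, 3)
print(sum(len(c) for c in copies))  # bytes duplicated by the copying approach
```

A shared buffer is only safe if nobody mutates it after it is handed out. That immutability guarantee is exactly what a library's defensive copy is buying.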

Re: Application level progress monitoring and communication

2014-06-30 Thread Chester Chen
Reynold, thanks for the reply. It's true, this is more about YARN communication than Spark. But this is a general enough problem for all YARN_CLUSTER mode applications, so I thought I would just reach out to the community. If we choose to use the Akka solution, then this is related to Spark, as there i

Contributing to MLlib

2014-06-30 Thread salexln
Hi guys, I'm new to Spark & MLlib and this may be a dumb question, but still: as part of my M.Sc. project, I'm working on an implementation of the Fuzzy C-Means (FCM) algorithm in MLlib. FCM has many things in common with the K-Means algorithm, which is already implemented, and I wanted to know whether
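To show what FCM shares with K-Means and where it differs, here is a minimal single-machine sketch of one standard FCM iteration (Bezdek's update rules). It is not MLlib code, and the function name is hypothetical. K-Means gives each point to exactly one cluster. FCM gives each point a membership u_ij in every cluster: u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m-1)), where d_ij is the distance from point i to center j and m > 1 is the fuzzifier. Each center then becomes the mean of all points weighted by u_ij^m. Like K-Means, this alternates an assignment step and a center-update step.

```python
import math

def fcm_step(points, centers, m=2.0):
    """One Fuzzy C-Means iteration: update memberships, then centers."""
    eps = 1e-12  # avoid division by zero when a point coincides with a center
    memberships = []
    for p in points:
        d = [max(math.dist(p, c), eps) for c in centers]
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)); each row sums to 1.
        row = [1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0)) for k in range(len(centers)))
               for j in range(len(centers))]
        memberships.append(row)

    dim = len(points[0])
    new_centers = []
    for j in range(len(centers)):
        # Center j is the mean of all points weighted by u_ij^m.
        w = [u[j] ** m for u in memberships]
        total = sum(w)
        new_centers.append(tuple(sum(wi * p[t] for wi, p in zip(w, points)) / total
                                 for t in range(dim)))
    return memberships, new_centers

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = [(0, 0), (10, 10)]
for _ in range(20):
    memberships, centers = fcm_step(points, centers)
print(centers)  # close to the two group means, about (0.33, 0.33) and (10.33, 10.33)
```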