I will let Xiangrui comment on the PR process for adding the code to MLlib,
but I would love to look at your initial version if you push it to
GitHub...
As far as I remember, Quoc got his best ANN results using the back-propagation
algorithm, solved using CG... do you have those features or you are u
Thanks Xiaokai,
I’ve created a pull request to merge the features from my PR into your repo.
Please review it here: https://github.com/xwei-datageek/spark/pull/2
As for GLMs, here at Sina, we are solving the problem of predicting the number
of visitors who read a particular news article or watch an o
Hi Debasish, Alexander, all,
Indeed, I found the OpenDL project through the Powered by Spark page. I'll need
some time to look into the code, but at first sight it looks quite
well developed. I'll contact the author about this too.
My own implementation (in Scala) works for multiple inputs a
I don't know of any way to avoid Akka doing a copy, but I would like to
mention that it's on the priority list to piggy-back only the map statuses
relevant to a particular map task on the task itself, thus reducing the
total amount of data sent over the wire by a factor of N for N physical
machines
Our current hack is to use Broadcast variables when the serialized
statuses exceed some (configurable) size, and have the workers
pull them directly from the master.
This is a workaround, so it would be great if there were a
better, more principled solution.
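
For illustration only, the size-threshold idea described above might be sketched roughly as follows. This is not the actual Spark code: `FakeBroadcastStore`, `build_reply`, `read_reply`, and the 1024-byte threshold are all hypothetical names and values, and a plain dictionary stands in for the broadcast mechanism.

```python
import pickle

# Hypothetical threshold; the message only says the size is configurable.
BROADCAST_THRESHOLD_BYTES = 1024


class FakeBroadcastStore:
    """Stand-in for a broadcast mechanism (not a Spark API).

    The master registers a blob once; workers fetch it by id.
    """

    def __init__(self):
        self._blobs = {}
        self._next_id = 0

    def register(self, blob):
        blob_id = self._next_id
        self._blobs[blob_id] = blob
        self._next_id += 1
        return blob_id

    def fetch(self, blob_id):
        return self._blobs[blob_id]


def build_reply(statuses, store, threshold=BROADCAST_THRESHOLD_BYTES):
    """Master side: send small payloads inline, large ones by reference."""
    payload = pickle.dumps(statuses)
    if len(payload) > threshold:
        # Too large for a direct message: register once, reply with the id.
        return ("broadcast", store.register(payload))
    return ("direct", payload)


def read_reply(reply, store):
    """Worker side: pull the payload from the store if only an id was sent."""
    kind, body = reply
    payload = store.fetch(body) if kind == "broadcast" else body
    return pickle.loads(payload)
```

With this shape, a large reply is serialized and registered once, and each worker pulls it on demand instead of receiving its own copy in a message.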
Please note that the responses are going to differen
Hi,
While sending the map output tracker result, the same serialized byte
array is sent multiple times, but the Akka implementation copies it
to a private byte array within ByteString for each send.
Caching a ByteString instead of Array[Byte] did not help, since Akka
does not support special casing
Reynold,
thanks for the reply. It's true, this is more about YARN communication
than Spark.
But this is a general enough problem for all YARN_CLUSTER mode
applications that I thought
I would reach out to the community.
If we choose to use the Akka solution, then this is related to Spark, as
there i
Hi guys,
I'm new to Spark & MLlib and this may be a dumb question, but still
As part of my M.Sc. project, I'm working on an implementation of the Fuzzy
C-means (FCM) algorithm in MLlib.
FCM has many things in common with the K-Means algorithm, which is already
implemented, and I wanted to know whether