Re: a typical ML algorithm flow

2016-03-29 Thread Dmitriy Lyubimov
"butterfly mixing" communication patterns for power law algorithms? and hopefully without excessive spilling? Thank you very much. -d On Tue, Mar 29, 2016 at 5:31 PM, Dmitriy Lyubimov wrote: > Thanks. > > Regardless of the rationale, i wanted to confirm if the iteration is >

Re: a typical ML algorithm flow

2016-03-29 Thread Dmitriy Lyubimov
ery n iteration. > Otherwise you will over and over re-trigger the execution of previous > operators. > > Cheers, > Till > ​ > > On Tue, Mar 29, 2016 at 1:26 AM, Dmitriy Lyubimov > wrote: > > > Thanks Chiwan. > > > > I think this example still creates

Re: a typical ML algorithm flow

2016-03-28 Thread Dmitriy Lyubimov
et > [3]: > https://ci.apache.org/projects/flink/flink-docs-release-1.0/api/java/org/apache/flink/api/java/operators/IterativeDataSet.html#registerAggregationConvergenceCriterion%28java.lang.String,%20org.apache.flink.api.common.aggregators.Aggregator,%20org.apache.flink.api.common.aggregato

Re: a typical ML algorithm flow

2016-03-25 Thread Dmitriy Lyubimov
our current > > SGD implementation does a pass over the whole dataset at each iteration, > > since we cannot take a sample from the dataset > > and iterate only over that (so it's not really stochastic). > > > > The relevant JIRA is here: > > https://issues.apache.org/jir

a typical ML algorithm flow

2016-03-22 Thread Dmitriy Lyubimov
Hi, probably more of a question for Till: Imagine a common ML algorithm flow that runs until convergence. typical distributed flow would be something like that (e.g. GMM EM would be exactly like that):

A: input
do {
  stat1 = A.map.reduce
  A = A.update-map(stat1)
  conv = A.map.reduce
} u
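The loop above can be made concrete with a minimal single-machine sketch. This is an illustration only: the function names and the toy computation (repeatedly re-centering a list of numbers on its running mean) are stand-ins for an actual distributed GMM EM job, but the structure — compute a statistic over the data, rewrite the data with it, compute a convergence measure, repeat — is the same.

```python
# Minimal single-machine analogue of the loop above:
#   stat1 = A.map.reduce        -> full pass: compute a statistic over A
#   A     = A.update-map(stat1) -> full pass: rewrite A using that statistic
#   conv  = A.map.reduce        -> full pass: compute a convergence measure
# repeated until convergence. The concrete computation here (shifting
# points toward zero mean) is an illustrative stand-in, not GMM EM.

def mean(xs):                       # "map.reduce": one pass over A
    return sum(xs) / len(xs)

def run_until_convergence(A, tol=1e-9, max_iters=100):
    for _ in range(max_iters):
        stat1 = mean(A)                      # stat1 = A.map.reduce
        A = [x - 0.5 * stat1 for x in A]     # A = A.update-map(stat1)
        conv = abs(mean(A))                  # conv = A.map.reduce
        if conv < tol:
            break
    return A

A = run_until_convergence([1.0, 2.0, 3.0, 10.0])
# after convergence the data are centered: mean(A) is ~0
```

Note that each of the three steps is a separate pass over the full data set, which is exactly what motivates the question in the thread: whether the execution engine can avoid re-triggering previous operators on every iteration.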

Re: Machine Learning on Apache Fink

2016-03-07 Thread Dmitriy Lyubimov
still in the works (for mahout). but soon. On Sat, Jan 9, 2016 at 3:46 AM, Ashutosh Kumar wrote: > I see lot of study materials and even book available for ml on spark. Spark > seems to be more mature for analytics related work as of now. Please > correct me if I am wrong. As I have already buil

Re: [ANNOUNCE] Apache Mahout 0.10.1 Released

2015-06-01 Thread Dmitriy Lyubimov
we need to add published links to javadoc/scaladoc stuff. Nice job btw sorting this out. http://apache.github.io/mahout/0.10.1/docs/mahout-math/ http://apache.github.io/mahout/0.10.1/docs/mahout-math-scala

Re: Drafting a roadmap for Flink

2015-02-09 Thread Dmitriy Lyubimov
fwiw re: shell, this is just scala being incredibly useful. If anything, spark is following scala. So is for example BIDMat/BIDMach (and, sigh* mahout). I don't think differentiation means throwing away common baseline tools, there's gotta be more than that. (I'm of course advocating using shell in

Re: Kicking off the Machine Learning Library

2015-02-03 Thread Dmitriy Lyubimov
those in scala (or, more concretely, MatrixWritable and VectorWritable), and I have added native kryo support for those as well (not in public version). Glossary: DRM = distributed row matrix (row-wise partitioned matrix representation). TWA = tree-walking automaton On Tue, Feb 3, 2015 at 5:3
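As a rough illustration of the DRM layout from the glossary above (a toy sketch, not Mahout's actual data structures): the matrix is kept as keyed row vectors, partitioned across workers by row key.

```python
# Toy sketch of a distributed row matrix (DRM): the matrix lives as
# (row-key, row-vector) pairs, partitioned across workers by row key.
# Illustration only; not Mahout's actual implementation.

def partition_rows(matrix, num_partitions):
    """Assign each (row_key, row) pair to a partition by its row key."""
    parts = [[] for _ in range(num_partitions)]
    for key, row in enumerate(matrix):
        parts[key % num_partitions].append((key, row))
    return parts

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
parts = partition_rows(A, 2)
# partition 0 holds rows 0 and 2; partition 1 holds rows 1 and 3
```

Row-wise partitioning is what makes operators like A'B expressible as per-partition outer-product contributions followed by a reduce.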

Re: Kicking off the Machine Learning Library

2015-02-03 Thread Dmitriy Lyubimov
I may be able to help. The official link on mahout talks page points to slide share, which mangles slides in a weird way, but if it helps, here's the (hidden) link to pptx source of those in case it helps: http://mahout.apache.org/users/sparkbindings/MahoutScalaAndSparkBindings.pptx On Mon, Feb

Re: Kicking off the Machine Learning Library

2015-01-13 Thread Dmitriy Lyubimov
In terms of Mahout DSL it means implementing a bunch of physical operators such as transpose, A'B or B'A on large row or column partitioned matrices. Mahout optimizer takes care of simplifying algebraic expressions such as 1 + exp(drm) => drm.apply-unary(1 + exp(x)) and tracking things like identical
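The payoff of that rewrite can be shown with a toy in-memory stand-in (illustration only, not Mahout's optimizer): fusing the scalar expression into one element-wise pass halves the number of passes over the data compared with materializing exp(A) first and adding 1 second.

```python
# Sketch of the optimizer rewrite described above:
#   1 + exp(A)  =>  A.apply_unary(x => 1 + exp(x))
# A scalar expression over a distributed matrix is fused into a single
# element-wise pass instead of two. Toy stand-in; not Mahout's optimizer.
import math

PASSES = 0                          # counts element-wise passes over the data

def apply_unary(m, f):
    global PASSES
    PASSES += 1
    return [[f(x) for x in row] for row in m]

def naive_1_plus_exp(m):            # unoptimized: two full passes
    return apply_unary(apply_unary(m, math.exp), lambda x: 1 + x)

def fused_1_plus_exp(m):            # after the rewrite: one pass
    return apply_unary(m, lambda x: 1 + math.exp(x))

A = [[0.0, 1.0], [2.0, 3.0]]
assert naive_1_plus_exp(A) == fused_1_plus_exp(A)   # same result, fewer passes
```

On a genuinely distributed matrix the saved pass means one fewer scan of the partitioned data, which is why the optimizer tracks and rewrites such expressions rather than evaluating them operator by operator.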