> can't just magically ignore some time range of rdds, because they may
> contain events you care about.
>
> On Wed, Jan 6, 2016 at 10:55 AM, Julien Naour wrote:
>
>> The following lines are my understanding of Spark Streaming AFAIK, I
>> could be wrong:
>>
>>
Then you can do foreachPartition with a local map to store just a single
> event per user, e.g.
>
> foreachPartition { p =>
>   val m = new HashMap[UserId, Event]()
>   p.foreach { event =>
>     m.put(event.user, event)
>   }
>   m.foreach { case (user, event) =>
>     ... do your computation
>   }
> }
>
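A self-contained sketch of this partition-local pattern, using a plain Scala iterator in place of an RDD partition; the `Event` class and its field names are assumptions for illustration, not from the thread:

```scala
import scala.collection.mutable

// Hypothetical event type; the field names are assumptions.
case class Event(user: String, payload: String)

object LastEventPerUser {
  // Keep only the last event seen for each user within one partition's
  // iterator, mirroring the foreachPartition + local HashMap pattern above.
  def lastPerUser(partition: Iterator[Event]): Map[String, Event] = {
    val m = mutable.HashMap.empty[String, Event]
    partition.foreach(event => m.put(event.user, event)) // later events overwrite earlier ones
    m.toMap
  }
}
```

Because the map is local to the partition, no shuffle or shared state is needed; each partition ends up with at most one event per user id it saw.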
keys corresponding to some kind of user id. I want to process only the last
event per user id, i.e. skip the intermediate events for each user id.
I have only one Kafka topic with all these events.
Regards,
Julien Naour
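On a static collection, the "skip intermediate events per user id" idea can be sketched as a groupBy followed by maxBy on a timestamp; the `UserEvent` type and its fields are hypothetical names for illustration:

```scala
// Hypothetical event type with a timestamp; names are assumptions.
case class UserEvent(user: String, ts: Long, value: String)

object LatestByUser {
  // For each user id, keep only the most recent event; intermediate
  // events for the same user are dropped.
  def latest(events: Seq[UserEvent]): Map[String, UserEvent] =
    events.groupBy(_.user).map { case (u, evs) => u -> evs.maxBy(_.ts) }
}
```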
On Wed, Jan 6, 2016 at 4:13 PM, Cody Koeninger wrote:
> Have you read
>
StreamingContext and process the
same DStream at different speeds (low vs. high processing rate)?
Is it easily possible to share values (a map, for example) between pipelines
without using an external database? I think accumulators/broadcast variables
could work, but between two pipelines I'm not sure.
Regards,
Julien Naour
> And the current k-means implementation in MLlib benefits from sparse
> vector computing:
> http://spark-summit.org/2014/talk/sparse-data-support-in-mllib-2
>
>
>
> 2014-08-21 15:40 GMT+08:00 Julien Naour :
>
My arrays are in fact Array[Array[Long]] and around 17x150 000 (17 centers
with 150 000 modalities; I'm working on qualitative variables), so they are
pretty large. I'm working on making them smaller; it's mostly a sparse
matrix.
Good things to know nevertheless.
Thanks,
Julien
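To make the sparsity point concrete: a dense 17x150 000 Array[Array[Long]] stores every zero explicitly, while a sparse map keeps only the non-zero modalities. A rough sketch of one center in sparse form (the names here are illustrative, not MLlib's API):

```scala
object SparseCenter {
  // Sparse representation of one center: modality index -> non-zero count.
  type Sparse = Map[Int, Long]

  // Drop explicit zeros, keeping only index -> value pairs.
  def toSparse(dense: Array[Long]): Sparse =
    dense.zipWithIndex.collect { case (v, i) if v != 0L => i -> v }.toMap

  // A dot product over sparse centers touches only the stored entries,
  // instead of iterating all 150 000 modalities.
  def dot(a: Sparse, b: Sparse): Long =
    a.foldLeft(0L) { case (acc, (i, v)) => acc + v * b.getOrElse(i, 0L) }
}
```

When most modalities are zero, both memory and per-distance work scale with the number of non-zeros rather than the full dimension.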
broadcast instead of a
simple variable?
Cheers,
Julien Naour
You can find in the following presentation a simple example of a clustering
model used to classify new incoming tweets:
https://www.youtube.com/watch?v=sPhyePwo7FA
Regards,
Julien
2014-08-05 7:08 GMT+02:00 Xiangrui Meng :
> Some extra work is needed to close the loop. One related example is
> st
Hi,
My question is simple: could there be a performance issue in using
Accumulable/Accumulator instead of methods like map(), reduce(), etc.?
My use case: the implementation of a clustering algorithm like k-means.
At the beginning I used two steps, one to assign data to clusters and another
to calculate the new centers.
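The two steps described above can be sketched with plain map/groupBy over 1-D points, with no accumulators involved; this is a toy illustration of the structure, not MLlib's implementation:

```scala
object KMeansStep {
  // Assignment step: group each point under the index of its nearest center.
  def assign(points: Seq[Double], centers: Seq[Double]): Map[Int, Seq[Double]] =
    points.groupBy(p => centers.indices.minBy(i => math.abs(p - centers(i))))

  // Update step: each center becomes the mean of its assigned points;
  // a center with no points keeps its old position.
  def update(assigned: Map[Int, Seq[Double]], centers: Seq[Double]): Seq[Double] =
    centers.indices.map(i =>
      assigned.get(i).map(ps => ps.sum / ps.size).getOrElse(centers(i)))
}
```

Expressed this way, both steps are ordinary transformations whose results flow into the next iteration, rather than side effects collected through an accumulator.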