Re: Regarding Broadcast of datasets in streaming context

2016-05-15 Thread Biplob Biswas
Hi, I read that article already, but it is very simplistic, so based on that article and other examples I was trying to understand how my centroids can be sent to all the partitions and updated accordingly. I also understood that the order of the input and the feedback stream can't be determin

Re: Regarding Broadcast of datasets in streaming context

2016-05-15 Thread Gyula Fóra
Hi, if you haven't done so already, please read the respective part of the streaming docs: https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/index.html#iterations Iterations just allow you to define cyclic flows; there is nothing magic about them. If your original input stream is

Re: Regarding Broadcast of datasets in streaming context

2016-05-15 Thread Biplob Biswas
Hi Gyula, even after trying different things, I can't seem to get the hang of things. Also, I asked another question on the working of iteration and streaming here Be

Re: Regarding Broadcast of datasets in streaming context

2016-05-11 Thread Biplob Biswas
Hi Gyula, I tried doing something like the following in the 2 flatmaps, but I am not getting the desired results and am still confused about how the concept you put forward would work: public static final class MyCoFlatmap implements CoFlatMapFunction{ Centroid[] centroids;

Re: Regarding Broadcast of datasets in streaming context

2016-05-05 Thread Biplob Biswas
This is exactly what I am confused about. If I understand it correctly, each of the map functions in the co-flatmap would receive one tuple at a time, so that would mean that if I have a datastream of centroids, they would arrive one at a time on the partitions, and that would defeat the purpose. Ar

Re: Regarding Broadcast of datasets in streaming context

2016-05-04 Thread Gyula Fóra
Hi, Iterating after every incoming point/centroid update means that you basically defeat the purpose of having parallelism in your Flink job. If you only "sync" the centroids periodically by the broadcast you can make your program run efficiently in parallel. This should be fine for machine learn

Re: Regarding Broadcast of datasets in streaming context

2016-05-02 Thread Biplob Biswas
Hi Gyula, could you explain a bit why I wouldn't want the centroids to be collected after every point? I mean, once I get a streamed point via the map1 function, I would want to compare the distance of the point with a centroid which arrives via the map2 function, and I keep on comparing for every cent

Re: Regarding Broadcast of datasets in streaming context

2016-05-02 Thread Gyula Fóra
Hey, I think you have the right idea :) So your coflatmap will get all the centroids that you have sent to the stream in the closeWith call. This means that whenever you collect a new set of centroids they will be iterated back. This means you don't always want to send the centroids out on the coll

Re: Regarding Broadcast of datasets in streaming context

2016-05-02 Thread Biplob Biswas
Hi Gyula, I understand more now how this thing might work, and it's fascinating. Although I still have one question about the coflatmap function. First, let me explain what I understand and whether it's correct or not: 1. The connected iterative stream ensures that the coflatmap function receives the

Re: Regarding Broadcast of datasets in streaming context

2016-04-30 Thread Biplob Biswas
Hi Gyula, I read your workaround and started reading about Flink iterations, coflatmap operators and other things. Now, I do understand a few things, but the solution you provided is not completely clear to me. I understand the following things from your post. 1. You initially have a datastream of

Re: Regarding Broadcast of datasets in streaming context

2016-04-28 Thread Gyula Fóra
Hi Biplob, I have implemented an algorithm similar to the one Aljoscha mentioned. The first things to clarify are the following: There is currently no abstraction for keeping objects (in your case centroids) in a centralized way that can be updated/read by all operators. This would probably be very costly and

Re: Regarding Broadcast of datasets in streaming context

2016-04-28 Thread Biplob Biswas
That would really be great, any example would help me proceed with my work. Thanks a lot. Aljoscha Krettek wrote > Hi Biplob, > one of our developers had a stream clustering example a while back. It was > using a broadcast feedback edge with a co-operator to update the > centroids. > I'll directl

Re: Regarding Broadcast of datasets in streaming context

2016-04-28 Thread Aljoscha Krettek
Hi Biplob, one of our developers had a stream clustering example a while back. It was using a broadcast feedback edge with a co-operator to update the centroids. I'll directly include him in the email so that he will notice and can send you the example. Cheers, Aljoscha On Thu, 28 Apr 2016 at 13:

Re: Regarding Broadcast of datasets in streaming context

2016-04-28 Thread Biplob Biswas
I am pretty new to Flink, so can anyone at least give me an example of how the datastream.broadcast() method works? From the documentation I get the following: broadcast() Sets the partitioning of the DataStream so that the output elements are broadcasted to every parallel instance of the nex
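
To illustrate the partitioning behaviour that the quoted documentation describes, here is a minimal sketch in plain Java. It is only a simulation and does not use any Flink classes: the class name BroadcastSketch and the methods broadcast, roundRobin and emptyInstances are hypothetical names chosen for this example, and each "parallel instance" is modelled as a simple list. It contrasts broadcasting (every element reaches every instance) with a round-robin distribution (every element reaches exactly one instance).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation of partitioning semantics; not the Flink API.
public class BroadcastSketch {

    // Creates one empty input buffer per simulated parallel instance.
    static <T> List<List<T>> emptyInstances(int parallelism) {
        List<List<T>> instances = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            instances.add(new ArrayList<>());
        }
        return instances;
    }

    // Broadcast: every element is delivered to every downstream instance.
    static <T> List<List<T>> broadcast(List<T> elements, int parallelism) {
        List<List<T>> instances = emptyInstances(parallelism);
        for (T element : elements) {
            for (List<T> instance : instances) {
                instance.add(element);
            }
        }
        return instances;
    }

    // Round-robin: each element is delivered to exactly one instance.
    static <T> List<List<T>> roundRobin(List<T> elements, int parallelism) {
        List<List<T>> instances = emptyInstances(parallelism);
        int next = 0;
        for (T element : elements) {
            instances.get(next).add(element);
            next = (next + 1) % parallelism;
        }
        return instances;
    }

    public static void main(String[] args) {
        List<String> centroids = Arrays.asList("c1", "c2", "c3");
        System.out.println("broadcast:   " + broadcast(centroids, 2));
        System.out.println("round-robin: " + roundRobin(centroids, 2));
    }
}
```

With a simulated parallelism of 2, the broadcast case leaves both instances holding c1, c2 and c3, while the round-robin case splits them so that no instance sees the full set. This is why a broadcast feedback edge is suggested in this thread for the centroids: every parallel instance needs all of them.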