Re: Having a single copy of an object read in a RichMapFunction

2016-08-08 Thread Theodore Vasiloudis
Thank you for the help Robert! Regarding the static field alternative you provided, I'm a bit confused about the difference between slots and instances. When you say that by using a static field it will be shared by all instances of the Map on the slot, does that mean that if the TM has multiple
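
A minimal sketch of the static-field idea discussed in this message, in plain Java with no Flink dependency. It assumes that the map instances in question run in the same JVM and that the holder class is loaded by the same class loader. SharedMatrixHolder, getOrLoad, loadCount and the loader argument are hypothetical names chosen for illustration, not Flink API. In a RichMapFunction one would presumably call getOrLoad from open() and pass a loader that reads the matrix from HDFS.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical holder: keeps one copy of the matrix per JVM (per class loader).
final class SharedMatrixHolder {
    // volatile so that a fully initialised matrix is visible to all threads
    private static volatile double[][] matrix;
    // counts how often the loader actually ran (for illustration only)
    private static final AtomicInteger loadCount = new AtomicInteger();

    private SharedMatrixHolder() {}

    // Returns the shared matrix, running the loader only on the first call.
    static double[][] getOrLoad(Supplier<double[][]> loader) {
        double[][] local = matrix;
        if (local == null) {
            synchronized (SharedMatrixHolder.class) {
                local = matrix;
                if (local == null) {
                    local = loader.get();
                    loadCount.incrementAndGet();
                    matrix = local;
                }
            }
        }
        return local;
    }

    static int loadCount() {
        return loadCount.get();
    }

    // Plain matrix-vector product, one result entry per matrix row.
    static double[] multiply(double[][] m, double[] v) {
        double[] result = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            if (m[i].length != v.length) {
                throw new IllegalArgumentException("dimension mismatch in row " + i);
            }
            double sum = 0.0;
            for (int j = 0; j < v.length; j++) {
                sum += m[i][j] * v[j];
            }
            result[i] = sum;
        }
        return result;
    }
}
```

Every caller in the same JVM gets the same array reference, so the matrix is held once no matter how many function instances ask for it. The shared array must be treated as read-only, since nothing here protects it from concurrent writes.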

Re: Having a single copy of an object read in a RichMapFunction

2016-08-08 Thread Robert Metzger
Hi Theo, I think there are some variants you can try out for the problem. I think it depends a bit on the performance characteristics you expect: - The simplest variant is to run one TM per machine with one slot only. This is probably not feasible because you can't use all the CPU cores - ... to s

Re: Having a single copy of an object read in a RichMapFunction

2016-08-05 Thread Sameer Wadkar
You mean "Connected Streams"? I use that for the same requirement. The way it works, it looks like it creates multiple copies per co-map operation. I use the keyed version to match side inputs with the data. Sent from my iPhone > On Aug 5, 2016, at 12:36 PM, Theodore Vasiloudis > wrote: > > Ye

Re: Having a single copy of an object read in a RichMapFunction

2016-08-05 Thread Theodore Vasiloudis
Yes, this is a streaming use case, so broadcast is not an option. If I get it correctly, with connected streams I would emulate side input by "streaming" the matrix with a key that all incoming vector records match on? Wouldn't that create multiple copies of the matrix in memory? On Thu, Aug 4, 20

Re: Having a single copy of an object read in a RichMapFunction

2016-08-04 Thread Sameer W
Theodore, Broadcast variables do that when using the DataSet API - http://data-artisans.com/how-to-factorize-a-700-gb-matrix-with-apache-flink/ See the following lines in the article: To support the above presented algorithm efficiently we had to improve Flink’s broadcasting mechanism since it ea

Having a single copy of an object read in a RichMapFunction

2016-08-04 Thread Theodore Vasiloudis
Hello all, for a prototype we are looking into we would like to read a big matrix from HDFS, and for every element that comes in a stream of vectors do one multiplication with the matrix. The matrix should fit in the memory of one machine. We can read in the matrix using a RichMapFunction, but tha