Thank you for the help Robert!
Regarding the static field alternative you provided, I'm a bit confused
about the difference between slots and instances.
When you say that by using a static field it will be shared by all
instances of the Map on the slot, does that mean that if the TM has
multiple
Hi Theo,
I think there are a few variants you can try for this problem. Which one fits
depends a bit on the performance characteristics you expect:
- The simplest variant is to run one TM per machine with one slot only.
This is probably not feasible because you can't use all the CPU cores
- ... to s
You mean "Connected Streams"? I use that for the same requirement. The way it
works, it looks like it creates multiple copies per co-map operation. I use the
keyed version to match side inputs with the data.
> On Aug 5, 2016, at 12:36 PM, Theodore Vasiloudis
> wrote:
>
> Ye
Yes this is a streaming use case, so broadcast is not an option.
If I get it correctly, with connected streams I would emulate side input by
"streaming" the matrix with a key that all incoming vector records match on?
Wouldn't that create multiple copies of the matrix in memory?
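
For comparison, the static-field alternative mentioned elsewhere in this thread can be sketched in plain Java. This is only a minimal sketch, not Flink code: MultiplyMapper, loadMatrix and the 2x2 matrix are hypothetical stand-ins (in a real job the class would presumably be a RichMapFunction and the load would read from HDFS). The point it illustrates is that a static field exists once per JVM (more precisely, once per classloader), so several instances of the function share one copy of the matrix instead of each holding its own.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SharedMatrixSketch {
    // Counts how often the (hypothetical) load actually runs.
    static final AtomicInteger LOADS = new AtomicInteger();

    // Hypothetical stand-in for reading the big matrix from HDFS.
    static double[][] loadMatrix() {
        LOADS.incrementAndGet();
        return new double[][] {{1.0, 2.0}, {3.0, 4.0}};
    }

    // Hypothetical stand-in for the map function applied to each vector.
    static class MultiplyMapper {
        // Static: one copy per JVM/classloader, shared by all instances.
        private static volatile double[][] matrix;

        // Called once per instance; only the first caller loads the matrix.
        void open() {
            if (matrix == null) {
                synchronized (MultiplyMapper.class) {
                    if (matrix == null) {
                        matrix = loadMatrix();
                    }
                }
            }
        }

        // Matrix-vector product using the shared matrix.
        double[] map(double[] v) {
            double[][] m = matrix;
            double[] out = new double[m.length];
            for (int i = 0; i < m.length; i++) {
                for (int j = 0; j < v.length; j++) {
                    out[i] += m[i][j] * v[j];
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        MultiplyMapper a = new MultiplyMapper();
        MultiplyMapper b = new MultiplyMapper();
        a.open();
        b.open(); // does not load again; reuses the static field
        double[] r = b.map(new double[] {1.0, 1.0});
        System.out.println(r[0] + " " + r[1] + " loads=" + LOADS.get());
    }
}
```

Running main prints "3.0 7.0 loads=1": two mapper instances, but the matrix is loaded only once. Whether this holds across slots depends on them living in the same JVM, which is an assumption of the sketch.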
On Thu, Aug 4, 20
Theodore,
Broadcast variables do that when using the DataSet API -
http://data-artisans.com/how-to-factorize-a-700-gb-matrix-with-apache-flink/
See the following lines in the article:
To support the above presented algorithm efficiently we had to improve
Flink's broadcasting mechanism since it ea
Hello all,
for a prototype we are looking into we would like to read a big matrix from
HDFS, and for every element that comes in a stream of vectors do one
multiplication with the matrix. The matrix should fit in the memory of one
machine.
We can read in the matrix using a RichMapFunction, but tha