Hello all, for a prototype we are looking into, we would like to read a big matrix from HDFS and, for every element that comes in on a stream of vectors, do one multiplication with the matrix. The matrix should fit in the memory of one machine.
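To make the per-element operation concrete, here is a minimal sketch in plain Java of the multiplication we have in mind. It assumes the matrix is dense and held as a row-major double[][]; the class and method names are made up for illustration, and loading the matrix from HDFS is left out.

```java
import java.util.Arrays;

// Hypothetical helper, not part of any Flink API: multiplies a dense matrix
// (assumed to be already loaded into memory) with one incoming vector.
class MatrixVectorSketch {

    // Returns matrix * vector; every row of the matrix must have as many
    // columns as the vector has entries.
    static double[] multiply(double[][] matrix, double[] vector) {
        double[] result = new double[matrix.length];
        for (int row = 0; row < matrix.length; row++) {
            if (matrix[row].length != vector.length) {
                throw new IllegalArgumentException(
                        "row " + row + " has " + matrix[row].length
                        + " columns, but the vector has " + vector.length + " entries");
            }
            double sum = 0.0;
            for (int col = 0; col < vector.length; col++) {
                sum += matrix[row][col] * vector[col];
            }
            result[row] = sum;
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] matrix = {{1.0, 2.0}, {3.0, 4.0}};
        double[] vector = {1.0, 1.0};
        System.out.println(Arrays.toString(multiply(matrix, vector)));
    }
}
```

In the actual job, something like multiply would be called once per incoming vector, so only the matrix needs to stay resident in memory.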
We can read in the matrix using a RichMapFunction, but AFAIK that would mean a copy of the matrix is made for each Task Slot, if the RichMapFunction is instantiated once per Task Slot. So I'm wondering how we should try to address this problem: is it possible to have just one copy of the object in memory per TM? As a follow-up, if we have more than one TM per node, is it possible to share memory between them? My guess is that we have to look at some external store for that.

Cheers,
Theo