Hi Reza

I see that ((int, int), double) pairs are generated for every combination
that meets the criteria controlled by the threshold. But assuming a simple
1 x 10K matrix, that means I would need at least 12 GB of memory per
executor for the flatMap, just for these pairs and excluding any other
overhead. Is that correct? How can we make this scale to even larger n
(when m stays small), say 100 x 5 million? One option is to use a higher
threshold. Another is for me to use a SparseVector to begin with. Are there
any other optimizations I can take advantage of?
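For reference, here is the back-of-envelope arithmetic behind my 12 GB figure. The ~240 bytes per entry is my assumption for a boxed ((int, int), double) tuple plus collection overhead on the JVM, not a measured number:

```python
# Rough memory estimate for the ((int, int), double) similarity pairs.
# bytes_per_entry is an assumed JVM cost (boxed nested tuples plus
# collection overhead), not a measured value.

def pair_memory_gb(n_cols, bytes_per_entry=240):
    # Worst case: every upper-triangular column pair survives the threshold.
    n_pairs = n_cols * (n_cols - 1) // 2
    return n_pairs * bytes_per_entry / 1e9

# 10K columns -> ~5e7 pairs -> on the order of 12 GB
print(round(pair_memory_gb(10_000), 1))  # 12.0
```

With 5 million columns the pair count grows quadratically (~1.25e13 pairs), which is why I don't see how the flatMap output can fit without a much more aggressive threshold or sparsity.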

Thanks
Sab
