Re: Join Stream with big ref table

2015-11-17 Thread Stephan Ewen
I think this pattern may be common, so some tools that share such a table across multiple tasks may make sense. It would be nice to add a handler to which you give an "initializer" that reads the data and builds the shared lookup map. The first to acquire the handler actually initializes the data set (re
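
A minimal sketch of the handler idea described above, assuming plain Java and no Flink API: the class name `SharedLookup` and its `acquire()` method are hypothetical, not an existing Flink utility. The first caller of `acquire()` runs the initializer; later callers in the same JVM get the same read-only map. The preview above is cut off, so anything beyond "first acquirer initializes" is left out.

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper (not part of Flink): one lazily built lookup map per handler. */
public final class SharedLookup<K, V> {
    private final Supplier<Map<K, V>> initializer;
    private Map<K, V> table; // guarded by "this"

    public SharedLookup(Supplier<Map<K, V>> initializer) {
        this.initializer = initializer;
    }

    /** The first caller runs the initializer; every later caller reuses its result. */
    public synchronized Map<K, V> acquire() {
        if (table == null) {
            // Wrap the map so that tasks sharing it cannot modify it.
            table = Collections.unmodifiableMap(initializer.get());
        }
        return table;
    }
}
```

One way to use it would be to keep a single `SharedLookup` in a static field, so that all tasks running in the same JVM see the same instance, and to call `acquire()` from each task's setup code. Whether that fits a given deployment depends on how tasks are distributed across JVMs.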

Re: Join Stream with big ref table

2015-11-13 Thread Robert Metzger
Hi Arnaud, I'm happy that you were able to resolve the issue. If you are still interested in the first approach, you could try some things, for example using only one slot per task manager (the slots share the heap of the TM). Regards, Robert On Fri, Nov 13, 2015 at 9:18 AM, LINZ, Arnaud wrote:

RE: Join Stream with big ref table

2015-11-13 Thread LINZ, Arnaud
Hello, I’ve worked around my problem by not using the HiveServer2 JDBC driver to read the ref table. Apparently, despite all the good options passed to the Statement object, it handles RAM poorly, since converting the table to text format and reading it directly from HDFS works without any problem