Hello, I wrote an EvalFunc implementation that:

1) Parses a SQL query.
2) Scans a folder for resource files and builds an index of these files.
3) Depending on certain properties of the SQL query, accesses the corresponding file and creates a Java object holding the relevant information from the file (for reuse).
4) Does some computation with the SQL query and the information found in the file.
5) Outputs a transformed SQL query.

Currently I'm doing local tests without Hadoop and the code works fine. The problem I see is that right now I initialize my parser in the EvalFunc, so every time it gets instantiated a new instance of the parser is created. Ideally only one instance per machine would be created. Even worse, right now I create the index and parse the corresponding resource file once per call to exec in the EvalFunc, and therefore do a lot of redundant computation, just because I don't know where and how to put this shared computation. Does anybody have a solution for that?

Best,
Will
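To make the problem concrete, here is a minimal sketch (not my actual UDF) of the kind of sharing I'm after: the expensive work (building the parser and the file index) should run at most once per JVM, no matter how many UDF instances are created or how many times exec is called. The class and method names here are hypothetical stand-ins; the lazy-holder idiom is just standard Java, not anything Pig-specific.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the EvalFunc. The expensive parser/index
// setup is done lazily in a static holder, so it happens at most once
// per JVM regardless of how many instances exist or how often exec runs.
public class QueryRewriter {
    // Counts how many times the expensive setup actually ran (for the demo).
    static final AtomicInteger initCount = new AtomicInteger();

    // Lazy-holder idiom: the JVM initializes INSTANCE exactly once,
    // thread-safely, on first access of ParserHolder.
    private static class ParserHolder {
        static final String INSTANCE = buildParserAndIndex();
    }

    private static String buildParserAndIndex() {
        // Expensive work goes here: create the parser, scan the resource
        // folder, build the index. Runs once per JVM.
        initCount.incrementAndGet();
        return "shared-parser";
    }

    // Stand-in for EvalFunc.exec(Tuple): reuses the shared parser/index.
    public String exec(String query) {
        String parser = ParserHolder.INSTANCE;
        // The real transformation of the SQL query would happen here.
        return parser + ":" + query;
    }
}
```

With this pattern, creating several QueryRewriter instances and calling exec repeatedly still triggers buildParserAndIndex only once per JVM.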
