So I still get the redundant work whenever the same clusternode/vm creates 
multiple instances of my EvalFunc?
And is it usual to have several instances of the EvalFunc on the same 
clusternode/vm?

Will

-----Original Message-----
From: Alan Gates [mailto:[email protected]] 
Sent: Wednesday, March 02, 2011 4:49 PM
To: [email protected]
Subject: Re: Shared resources

There is no method in the eval func that gets called on the backend before any 
exec calls.  You can keep a flag that tracks whether you have done the 
initialization so that you only do it the first time.
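The flag pattern Alan describes can be sketched as below. `TransformQuery` is a hypothetical stand-in for the real EvalFunc (the Pig base class and `exec(Tuple)` signature are omitted so the sketch stays self-contained), and the static fields are an assumption: they make the expensive state shared by every instance loaded in the same JVM, which also addresses the case where several instances of the func exist on one node.

```java
// Sketch of one-time lazy initialization guarded by a flag.
// Static fields are shared by all instances in the same JVM/classloader,
// so the expensive setup runs once per JVM rather than once per exec call.
public class TransformQuery {
    private static Object parser;            // stand-in for the SQL parser
    private static boolean initialized = false;
    static int initCount = 0;                // exposed only to illustrate the effect

    public String exec(String query) {
        if (!initialized) {
            // Expensive one-time work: build the parser, scan the resource
            // folder, create the index on those files.
            parser = new Object();
            initCount++;
            initialized = true;
        }
        // ... use parser and the index to transform the query ...
        return query;
    }
}
```

Note the sketch is not synchronized; if exec could be called from several threads at once, the check-then-set on `initialized` would need a lock or a `static { ... }` initializer instead.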

Alan.

On Mar 2, 2011, at 5:29 AM, Lai Will wrote:

> Hello,
>
> I wrote an EvalFunc implementation that
>
>
> 1)      Parses a SQL Query
>
> 2)      Scans a folder for resource files and creates an index on  
> these files
>
> 3)      Depending on certain properties of the SQL query, accesses  
> the corresponding file and creates a Java object holding the relevant 
> information of the file (for reuse).
>
> 4)      Does some computation with the SQL Query and the information  
> found in the file
>
> 5)      Outputs a transformed SQL Query
>
> Currently I'm doing local tests without Hadoop and the code works 
> fine.
>
> The problem I see is that right now I initialize my parser in the 
> EvalFunc, so that every time it gets instantiated a new instance of 
> the parser is generated. Ideally only one instance per machine would be 
> created.
> Even worse, right now I create the index and parse the corresponding 
> resource file once per exec call in the EvalFunc, and therefore do a lot 
> of redundant computation.
>
> This is simply because I don't know where and how to put this shared 
> computation.
> Does anybody have a solution for that?
>
> Best,
> Will
