Re: Pig action, REGISTER and additional jars

Eduardo Afonso Ferreira Wed, 07 Nov 2012 14:19:46 -0800

Hi there,

Just adding to the option you described: if you put your extra jars in the lib 
directory at the same level as your workflow.xml, you don't need REGISTER in 
your Pig script, since Oozie adds all of those jars to the classpath it uses 
to run Pig.


You can also put the jars in a separate directory on HDFS and point to it 
with the property "oozie.libpath". For example, your job.properties can have 
something like the following (besides NameNode, JobTracker and other 
properties you may need):

......
oozie.wf.application.path=hdfs://localhost:8020/user/${user.name}/your_path/your_app
oozie.libpath=/user/${user.name}/your_path/common_libs
......

You can then write your Pig script without REGISTER statements for any jars 
you have added to the common_libs directory.

If you submit your workflow as a user named "awesome", you should have your 
whole directory structure pushed to HDFS under /user/awesome/ and you're good 
to go.
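
As a rough sketch of the workflow side (the workflow name, action name, 
script name and transitions below are placeholders, not taken from this 
thread), the Pig action needs nothing jar-specific when the jars come from 
lib/ or oozie.libpath:

```xml
<!-- Sketch only: names such as "pig-wf", "pig-node" and your_script.pig are
     hypothetical. ${jobTracker} and ${nameNode} are assumed to be defined
     in job.properties. -->
<workflow-app xmlns="uri:oozie:workflow:0.2" name="pig-wf">
    <start to="pig-node"/>
    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- No <file> or <argument> entries for jars: they are expected
                 to come from lib/ next to workflow.xml or from oozie.libpath -->
            <script>your_script.pig</script>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Pig failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```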


Eduardo.



________________________________
 From: Harsh J <[email protected]>
To: [email protected] 
Sent: Wednesday, November 7, 2012 2:52 PM
Subject: Re: Pig action, REGISTER and additional jars
 
Grant,

Globbing is supported by Pig (for pig.additional.jars) only for LocalFileSystem.

The part of a <file> entry before the # can be any arbitrary HDFS path,
though. The argument to pig.additional.jars cannot (its entries are picked
up only from resources such as uber-jars or local file systems).
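
To illustrate, here is a sketch of a Pig action that combines the two pieces
discussed in this thread. The HDFS directory /user/awesome/shared_jars and
the script name are made-up placeholders; the <argument> and the symlink
names after the # are the ones from the earlier suggestion quoted below.

```xml
<!-- Sketch only: the HDFS paths and your_script.pig are hypothetical. -->
<pig>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <script>your_script.pig</script>
    <!-- Plain local names, no glob: these must match the symlink names below -->
    <argument>-Dpig.additional.jars=jar1.jar:jar2.jar</argument>
    <!-- Before the #: where the jar lives on HDFS.
         After the #: the symlink name in the task working directory. -->
    <file>/user/awesome/shared_jars/jar1.jar#jar1.jar</file>
    <file>/user/awesome/shared_jars/jar2.jar#jar2.jar</file>
</pig>
```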

On Thu, Nov 8, 2012 at 1:12 AM, Grant Ingersoll <[email protected]> wrote:
>
> On Nov 7, 2012, at 12:51 PM, Harsh J wrote:
>
>> Hi Grant,
>>
>> You can leverage the <argument> feature of the Pig action, in tandem
>> with the distributed-cache-using <file> element to do this I think
>> (over pig action schema 0.2).
>>
>> If you add after your <script>, the following:
>>
>> <argument>-Dpig.additional.jars=jar1.jar:jar2.jar</argument>
>>
>> And then in the outer section, add:
>>
>> <file>lib/jar1.jar#jar1.jar</file>
>> <file>lib/jar2.jar#jar2.jar</file>
>>
>> (Assuming your WF has a lib/ directory with jar1.jar and jar2.jar in it)
>>
>> Then Oozie will load these jars onto distributed cache, and symlink
>> them (during runtime) to the task working directory (sorta like a pwd
>> for the task). Hence, your Pig will "see" these files locally and
>> utilize them properly for the "pig.additional.jars" feature.
>>
>> Does this work for you?
>
> I'll give it a try.
>
> Is an HDFS path and glob OK?
>
>
>>
>> On Wed, Nov 7, 2012 at 10:54 PM, Grant Ingersoll <[email protected]> wrote:
>>> Hi,
>>>
>>> I was wondering how Oozie deals with additional JARs one needs for Pig 
>>> files.  Currently, I have a REGISTER statement in Pig that points at the 
>>> location of the libs, but I'd like to get away from that and use Pig's 
>>> additional.jars mechanism, but I don't see support for that in the Oozie 
>>> spec for the Pig action.
>>>
>>> Is this possible?  I'm on 3.2-SNAPSHOT.
>>>
>>> Thanks,
>>> Grant
>>>
>>> --------------------------------------------
>>> Grant Ingersoll
>>> http://www.lucidworks.com
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> Harsh J
>
> --------------------------------------------
> Grant Ingersoll
> http://www.lucidworks.com
>
>
>
>



-- 
Harsh J
