+User group

Hi Bhooshan,
By default you should be running in MapReduce mode unless specified otherwise. Are you creating a PigServer object to run your jobs? Can you provide your code here?

Sent from my iPhone

On Apr 12, 2013, at 6:23 PM, Bhooshan Mogal <[email protected]> wrote:

Apologies for the premature send. I have some more information. After I applied the patch and set "pig.use.overriden.hadoop.configs=true", I saw an NPE (stack trace below) and a message saying Pig was running with exectype local:

2013-04-13 07:37:13,758 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: local
2013-04-13 07:37:13,760 [main] WARN  org.apache.hadoop.conf.Configuration - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
2013-04-13 07:37:14,162 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse: <file test.pig, line 1, column 4> pig script failed to validate: java.lang.NullPointerException

Here is the stack trace:

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing.
Pig script failed to parse: <file test.pig, line 1, column 4> pig script failed to validate: java.lang.NullPointerException
	at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1606)
	at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1549)
	at org.apache.pig.PigServer.registerQuery(PigServer.java:549)
	at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:971)
	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:190)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
	at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
	at org.apache.pig.Main.run(Main.java:555)
	at org.apache.pig.Main.main(Main.java:111)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: Failed to parse: Pig script failed to parse: <file test.pig, line 1, column 4> pig script failed to validate: java.lang.NullPointerException
	at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:184)
	at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1598)
	... 14 more
Caused by: <file test.pig, line 1, column 4> pig script failed to validate: java.lang.NullPointerException
	at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:438)
	at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3168)
	at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1291)
	at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:789)
	at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:507)
	at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:382)
	at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:177)
	... 15 more

On Fri, Apr 12, 2013 at 6:16 PM, Bhooshan Mogal <[email protected]> wrote:

> Yes; however, I did not add core-site.xml, hdfs-site.xml, or yarn-site.xml,
> only my-filesystem-site.xml, using both Configuration.addDefaultResource and
> Configuration.addResource.
>
> I see what you are saying, though. Might the patch require users to take
> care of adding the default config resources themselves, apart from their own
> resources?
>
> On Fri, Apr 12, 2013 at 6:06 PM, Prashant Kommireddi <[email protected]> wrote:
>
>> Did you set "pig.use.overriden.hadoop.configs=true" and then add your
>> configuration resources?
>>
>> On Fri, Apr 12, 2013 at 5:32 PM, Bhooshan Mogal <[email protected]> wrote:
>>
>>> Hi Prashant,
>>>
>>> Thanks for your response to my question, and sorry for the delayed
>>> reply. I was not subscribed to the dev mailing list and hence did not get a
>>> notification about your reply. I have copied our thread below so you can
>>> get some context.
>>>
>>> I tried the patch you pointed to; however, with that patch it looks
>>> like Pig is unable to find core-site.xml. It indicates that it is running
>>> the script in local mode despite fs.default.name being defined as
>>> the location of the HDFS namenode.
>>>
>>> Here is what I am trying to do: I have developed my own
>>> org.apache.hadoop.fs.FileSystem implementation and am trying to use it in
>>> my Pig script. This implementation requires its own *-default.xml and
>>> *-site.xml files. I have added the path to these files to PIG_CLASSPATH as
>>> well as HADOOP_CLASSPATH, and I can confirm that Hadoop finds them, since
>>> I am able to read these configurations in my own code. The Pig code,
>>> however, cannot find these configuration parameters. After some debugging
>>> in the Pig code, it seems to me that Pig does not use all the resources
>>> added to the Configuration object, but only certain specific ones such as
>>> hadoop-site.xml, core-site.xml, pig-cluster-hadoop-site.xml,
>>> yarn-site.xml, and hdfs-site.xml (I am looking at HExecutionEngine.java).
>>> Is it possible to have Pig load user-defined resources, say
>>> foo-default.xml and foo-site.xml, while creating the JobConf object? I am
>>> narrowing in on this as the problem because Pig can find my config
>>> parameters if I define them in core-site.xml instead of
>>> my-filesystem-site.xml.
>>>
>>> Let me know if you need more details about the issue.
>>>
>>> Here is our previous conversation:
>>>
>>> Hi Bhooshan,
>>>
>>> There is a patch that addresses what you need; it is part of 0.12
>>> (unreleased). Take a look and see if you can apply the patch to the
>>> version you are using: https://issues.apache.org/jira/browse/PIG-3135
>>>
>>> With this patch, the following property will allow you to override the
>>> defaults and pass in your own configuration:
>>> pig.use.overriden.hadoop.configs=true
>>>
>>> On Thu, Mar 28, 2013 at 6:10 PM, Bhooshan Mogal <[email protected]> wrote:
>>>
>>> > Hi Folks,
>>> >
>>> > I have implemented the Hadoop FileSystem abstract class for a storage
>>> > system at work. This implementation uses some config files that are
>>> > similar in structure to Hadoop config files.
>>> > They have a *-default.xml and a *-site.xml for users to override
>>> > default properties. In the class that implements the Hadoop FileSystem,
>>> > I added these configuration files as default resources in a static
>>> > block, using Configuration.addDefaultResource("my-default.xml") and
>>> > Configuration.addDefaultResource("my-site.xml"). This was working fine,
>>> > and we were able to run the Hadoop FileSystem CLI and map-reduce jobs
>>> > for our storage system. However, when we tried using this storage
>>> > system in Pig scripts, we saw errors indicating that our configuration
>>> > parameters were not available. Upon further debugging, we saw that the
>>> > config files were added to the Configuration object as resources, but
>>> > as part of defaultResources. However, in Main.java in the Pig source,
>>> > the Configuration object is created as "Configuration conf = new
>>> > Configuration(false);", which sets loadDefaults to false in the conf
>>> > object. As a result, properties from the default resources (including
>>> > our config files) were not loaded and hence were unavailable.
>>> >
>>> > We solved the problem by using Configuration.addResource instead of
>>> > Configuration.addDefaultResource, but we still could not figure out why
>>> > Pig does not use default resources.
>>> >
>>> > Could someone on the list explain why this is the case?
>>> >
>>> > Thanks,
>>> > --
>>> > Bhooshan
>>>
>>> --
>>> Bhooshan
>>
>
> --
> Bhooshan

--
Bhooshan
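For readers following the thread, the loadDefaults behavior Bhooshan describes can be sketched as follows. This is a sketch only, assuming hadoop-common on the classpath; the class name MyFileSystemConfig and the my-*.xml file names are hypothetical, taken from the thread rather than from any real project.

```java
import org.apache.hadoop.conf.Configuration;

public class MyFileSystemConfig {

    static {
        // Registers the files as *default* resources. Default resources
        // are only loaded by Configuration instances created with
        // loadDefaults == true, i.e. new Configuration() or
        // new Configuration(true).
        Configuration.addDefaultResource("my-default.xml");
        Configuration.addDefaultResource("my-site.xml");
    }

    public static Configuration pigStyleConf() {
        // Pig's Main.java creates its Configuration with loadDefaults
        // == false, so the default resources registered above are
        // skipped -- the symptom described in the thread.
        return new Configuration(false);
    }

    public static Configuration workaroundConf() {
        // The workaround from the thread: add the files as ordinary
        // per-instance resources, which are loaded regardless of the
        // loadDefaults flag.
        Configuration conf = new Configuration(false);
        conf.addResource("my-default.xml");
        conf.addResource("my-site.xml");
        return conf;
    }
}
```

The design distinction is that addDefaultResource registers a resource globally for all default-loading Configuration instances, while addResource attaches it to one instance; only the latter survives a Configuration(false) construction.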

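For reference, a site file like the my-filesystem-site.xml mentioned in the thread might look like the fragment below. The values and the fs.myfs.impl property name are illustrative assumptions (fs.<scheme>.impl is how Hadoop 1.x maps a URI scheme to a FileSystem implementation); fs.default.name is the property the thread refers to.

```xml
<?xml version="1.0"?>
<!-- Hypothetical my-filesystem-site.xml; values are placeholders. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
  <property>
    <name>fs.myfs.impl</name>
    <value>com.example.MyFileSystem</value>
  </property>
</configuration>
```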