Hey Prashant, Yup, I can take a stab at it. This is the first time I am looking at Pig code, so I might take some time to get started. Will get back to you if I have questions in the meantime. And yes, I will write it so it reads a pig property.
- Bhooshan.

On Mon, Apr 15, 2013 at 11:58 AM, Prashant Kommireddi <[email protected]> wrote:

> Hi Bhooshan,
>
> This makes more sense now. I think overriding fs implementation should go
> into core-site.xml, but it would be useful to be able to add resources if
> you have a bunch of other properties.
>
> Would you like to submit a patch? It should be based on a pig property
> that suggests the additional resource names (myfs-site.xml in your case).
>
> -Prashant
>
>
> On Mon, Apr 15, 2013 at 10:35 AM, Bhooshan Mogal <[email protected]> wrote:
>
>> Hi Prashant,
>>
>> Yes, I am running in MapReduce mode. Let me give you the steps in the
>> scenario that I am trying to test -
>>
>> 1. I have my own implementation of org.apache.hadoop.fs.FileSystem for a
>> filesystem I am trying to implement - Let's call it MyFileSystem.class.
>> This filesystem uses the scheme myfs:// for its URIs.
>> 2. I have set fs.myfs.impl to MyFileSystem.class in core-site.xml and
>> made the class available through a jar file that is part of
>> HADOOP_CLASSPATH (or PIG_CLASSPATH).
>> 3. In MyFileSystem.class, I have a static block as -
>>     static {
>>         Configuration.addDefaultResource("myfs-default.xml");
>>         Configuration.addDefaultResource("myfs-site.xml");
>>     }
>> Both these files are in the classpath. To be safe, I have also added
>> myfs-site.xml in the constructor of MyFileSystem as
>> conf.addResource("myfs-site.xml"), so that it is part of both the default
>> resources as well as the non-default resources in the Configuration object.
>> 4. I am trying to access the filesystem in my pig script as -
>>     A = LOAD 'myfs://myhost.com:8999/testdata' USING PigStorage(':') AS
>>         (name:chararray, age:int); -- loading data
>>     B = FOREACH A GENERATE name;
>>     store B into 'myfs://myhost.com:8999/testoutput';
>> 5. The execution seems to start correctly, and MyFileSystem.class is
>> invoked correctly.
>> In MyFileSystem.class, I can also see that myfs-site.xml
>> is loaded and the properties defined in it are available.
>> 6. However, when Pig tries to submit the job, it cannot find these
>> properties and the job fails to submit.
>> 7. If I move all the properties defined in myfs-site.xml to
>> core-site.xml, the job gets submitted successfully, and it even succeeds.
>> However, this is not ideal as I do not want to clutter core-site.xml
>> with all of the properties for a separate filesystem.
>> 8. As I said earlier, upon taking a closer look at the pig code, I saw
>> that while creating the JobConf object for a job, pig adds very specific
>> resources to the job object, and ignores the resources that may have been
>> added already (e.g. myfs-site.xml) in the Configuration object.
>> 9. I have tested this with native map-reduce code as well as hive, and
>> this approach of having a separate config file for MyFileSystem works fine
>> in both those cases.
>>
>> So, to summarize, I am looking for a way to ask Pig to load parameters
>> from my own config file before submitting a job.
>>
>> Thanks,
>> -
>> Bhooshan.
>>
>>
>> On Fri, Apr 12, 2013 at 9:57 PM, Prashant Kommireddi <[email protected]> wrote:
>>
>>> +User group
>>>
>>> Hi Bhooshan,
>>>
>>> By default you should be running in MapReduce mode unless specified
>>> otherwise. Are you creating a PigServer object to run your jobs? Can you
>>> provide your code here?
>>>
>>> Sent from my iPhone
>>>
>>> On Apr 12, 2013, at 6:23 PM, Bhooshan Mogal <[email protected]> wrote:
>>>
>>> Apologies for the premature send. I may have some more information.
>>> After I applied the patch and set "pig.use.overriden.hadoop.configs=true",
>>> I saw an NPE (stacktrace below) and a message saying pig was running in
>>> exectype local -
>>>
>>> 2013-04-13 07:37:13,758 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: local
>>> 2013-04-13 07:37:13,760 [main] WARN org.apache.hadoop.conf.Configuration - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
>>> 2013-04-13 07:37:14,162 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
>>> <file test.pig, line 1, column 4> pig script failed to validate: java.lang.NullPointerException
>>>
>>>
>>> Here is the stacktrace =
>>>
>>> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Pig script failed to parse:
>>> <file test.pig, line 1, column 4> pig script failed to validate: java.lang.NullPointerException
>>>     at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1606)
>>>     at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1549)
>>>     at org.apache.pig.PigServer.registerQuery(PigServer.java:549)
>>>     at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:971)
>>>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
>>>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:190)
>>>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
>>>     at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
>>>     at org.apache.pig.Main.run(Main.java:555)
>>>     at org.apache.pig.Main.main(Main.java:111)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>     at java.lang.reflect.Method.invoke(Method.java:616)
>>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
>>> Caused by: Failed to parse: Pig script failed to parse:
>>> <file test.pig, line 1, column 4> pig script failed to validate: java.lang.NullPointerException
>>>     at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:184)
>>>     at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1598)
>>>     ... 14 more
>>> Caused by:
>>> <file test.pig, line 1, column 4> pig script failed to validate: java.lang.NullPointerException
>>>     at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:438)
>>>     at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3168)
>>>     at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1291)
>>>     at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:789)
>>>     at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:507)
>>>     at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:382)
>>>     at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:177)
>>>     ... 15 more
>>>
>>>
>>> On Fri, Apr 12, 2013 at 6:16 PM, Bhooshan Mogal <[email protected]> wrote:
>>>
>>>> Yes, however I did not add core-site.xml, hdfs-site.xml, yarn-site.xml.
>>>> Only my-filesystem-site.xml using both Configuration.addDefaultResource and
>>>> Configuration.addResource.
>>>>
>>>> I see what you are saying though. The patch might require users to take
>>>> care of adding the default config resources as well apart from their own
>>>> resources?
>>>>
>>>>
>>>> On Fri, Apr 12, 2013 at 6:06 PM, Prashant Kommireddi <[email protected]> wrote:
>>>>
>>>>> Did you set "pig.use.overriden.hadoop.configs=true" and then add your
>>>>> configuration resources?
>>>>>
>>>>>
>>>>> On Fri, Apr 12, 2013 at 5:32 PM, Bhooshan Mogal <[email protected]> wrote:
>>>>>
>>>>>> Hi Prashant,
>>>>>>
>>>>>> Thanks for your response to my question, and sorry for the delayed
>>>>>> reply. I was not subscribed to the dev mailing list and hence did not
>>>>>> get a notification about your reply. I have copied our thread below so
>>>>>> you can get some context.
>>>>>>
>>>>>> I tried the patch that you pointed to; however, with that patch it looks
>>>>>> like pig is unable to find core-site.xml. It indicates that it is running
>>>>>> the script in local mode in spite of having fs.default.name defined
>>>>>> as the location of the HDFS namenode.
>>>>>>
>>>>>> Here is what I am trying to do - I have developed my own
>>>>>> org.apache.hadoop.fs.FileSystem implementation and am trying to use it in
>>>>>> my pig script. This implementation requires its own *-default and
>>>>>> *-site.xml files. I have added the path to these files in PIG_CLASSPATH
>>>>>> as well as HADOOP_CLASSPATH and can confirm that hadoop can find these
>>>>>> files, as I am able to read these configurations in my code. However,
>>>>>> pig code cannot find these configuration parameters. Upon doing some
>>>>>> debugging in the pig code, it seems to me that pig does not use all the
>>>>>> resources added in the Configuration object, but only seems to use
>>>>>> certain specific ones like hadoop-site, core-site,
>>>>>> pig-cluster-hadoop-site.xml, yarn-site.xml, hdfs-site.xml (I am looking
>>>>>> at HExecutionEngine.java). Is it possible to have pig load user-defined
>>>>>> resources like say foo-default.xml and foo-site.xml while creating the
>>>>>> JobConf object? I am narrowing in on this as the problem, because pig can
>>>>>> find my config parameters if I define them in core-site.xml instead of
>>>>>> my-filesystem-site.xml.
>>>>>>
>>>>>> Let me know if you need more details about the issue.
>>>>>>
>>>>>>
>>>>>> Here is our previous conversation -
>>>>>>
>>>>>> Hi Bhooshan,
>>>>>>
>>>>>> There is a patch that addresses what you need, and is part of 0.12
>>>>>> (unreleased). Take a look and see if you can apply the patch to the
>>>>>> version you are using: https://issues.apache.org/jira/browse/PIG-3135
>>>>>>
>>>>>> With this patch, the following property will allow you to override the
>>>>>> default and pass in your own configuration.
>>>>>> pig.use.overriden.hadoop.configs=true
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 28, 2013 at 6:10 PM, Bhooshan Mogal <[email protected]> wrote:
>>>>>>
>>>>>> > Hi Folks,
>>>>>> >
>>>>>> > I had implemented the Hadoop FileSystem abstract class for a storage
>>>>>> > system at work. This implementation uses some config files that are
>>>>>> > similar in structure to hadoop config files. They have a *-default.xml
>>>>>> > and a *-site.xml for users to override default properties. In the class
>>>>>> > that implemented the Hadoop FileSystem, I had added these configuration
>>>>>> > files as default resources in a static block using
>>>>>> > Configuration.addDefaultResource("my-default.xml") and
>>>>>> > Configuration.addDefaultResource("my-site.xml"). This was working fine
>>>>>> > and we were able to run the Hadoop Filesystem CLI and map-reduce jobs
>>>>>> > just fine for our storage system. However, when we tried using this
>>>>>> > storage system in pig scripts, we saw errors indicating that our
>>>>>> > configuration parameters were not available. Upon further debugging, we
>>>>>> > saw that the config files were added to the Configuration object as
>>>>>> > resources, but were part of defaultResources. However, in Main.java in
>>>>>> > the pig source, we saw that the Configuration object was created as
>>>>>> > Configuration conf = new Configuration(false);, thereby setting
>>>>>> > loadDefaults to false in the conf object.
>>>>>> > As a result, properties from the default resources (including my
>>>>>> > config files) were not loaded and hence, unavailable.
>>>>>> >
>>>>>> > We solved the problem by using Configuration.addResource instead of
>>>>>> > Configuration.addDefaultResource, but still could not figure out why
>>>>>> > Pig does not use default resources?
>>>>>> >
>>>>>> > Could someone on the list explain why this is the case?
>>>>>> >
>>>>>> > Thanks,
>>>>>> > --
>>>>>> > Bhooshan
>>>>>> >
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Bhooshan
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Bhooshan
>>>>
>>>
>>>
>>> --
>>> Bhooshan
>>>
>>
>>
>> --
>> Bhooshan
>

--
Bhooshan
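
For anyone following the thread, the loadDefaults behaviour described in the oldest message can be sketched with a small stand-alone model. This is only a toy stand-in written for illustration, not the Hadoop Configuration class: MiniConf, its in-memory CLASSPATH map, and the property name myfs.example.key are all made up here, and real resources are XML files rather than maps. It shows why a resource registered only as a default resource is invisible to an object created with loadDefaults set to false, and why adding it as a regular resource works.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the default-resource behaviour; NOT org.apache.hadoop.conf.Configuration. */
class MiniConf {
    // Hypothetical stand-in for config files on the classpath: resource name -> properties.
    static final Map<String, Map<String, String>> CLASSPATH = new HashMap<>();
    // Resources registered for every instance (models a static default-resource list).
    static final List<String> DEFAULT_RESOURCES = new ArrayList<>();

    private final boolean loadDefaults;
    // Resources added to this instance only.
    private final List<String> resources = new ArrayList<>();

    MiniConf(boolean loadDefaults) {
        this.loadDefaults = loadDefaults;
    }

    static void addDefaultResource(String name) {
        DEFAULT_RESOURCES.add(name);
    }

    void addResource(String name) {
        resources.add(name);
    }

    /** Looks the key up in the defaults (only if loadDefaults) and then in instance resources. */
    String get(String key) {
        List<String> order = new ArrayList<>();
        if (loadDefaults) {
            order.addAll(DEFAULT_RESOURCES);
        }
        order.addAll(resources);
        String value = null;
        for (String name : order) {
            Map<String, String> props = CLASSPATH.get(name);
            if (props != null && props.containsKey(key)) {
                value = props.get(key); // later resources win in this model
            }
        }
        return value;
    }

    public static void main(String[] args) {
        // "myfs.example.key" is a made-up property used only for this demonstration.
        CLASSPATH.put("myfs-site.xml", Collections.singletonMap("myfs.example.key", "value"));
        addDefaultResource("myfs-site.xml");

        MiniConf withDefaults = new MiniConf(true);
        MiniConf withoutDefaults = new MiniConf(false);

        System.out.println(withDefaults.get("myfs.example.key"));    // value
        System.out.println(withoutDefaults.get("myfs.example.key")); // null

        // The workaround from the thread: add the file as a regular resource.
        withoutDefaults.addResource("myfs-site.xml");
        System.out.println(withoutDefaults.get("myfs.example.key")); // value
    }
}
```

In this model the second lookup returns null for the same reason given in the thread: the object was built with loadDefaults set to false, so the default resource list is never consulted.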

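
The patch discussed at the top of the thread (a pig property that names additional resources) could take roughly the following shape. This is a sketch under assumptions, not the actual patch: the thread never names the property, so pig.example.additional.resources is hypothetical, as are the ExtraResources class and its resourceNames helper. The sketch only covers parsing the property value into resource names; how Pig would then hand those names to the job configuration is left as a comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class ExtraResources {
    // Hypothetical property name; the thread does not say what the property is called.
    static final String EXTRA_RESOURCES_KEY = "pig.example.additional.resources";

    /** Splits the comma-separated property value into trimmed, non-empty resource names. */
    static List<String> resourceNames(Properties pigProperties) {
        List<String> names = new ArrayList<>();
        String raw = pigProperties.getProperty(EXTRA_RESOURCES_KEY, "");
        for (String part : raw.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(EXTRA_RESOURCES_KEY, "myfs-default.xml, myfs-site.xml");
        // Assumption: Pig would add each name to the configuration it builds for the
        // job (for example via Configuration.addResource(name)); here we only print them.
        for (String name : resourceNames(props)) {
            System.out.println(name);
        }
    }
}
```

Keeping the parsing separate from the configuration handling means an unset or empty property simply yields no extra resources, so existing setups would behave as before.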