Hi Deepak,
I just built the Pig snapshot on my PC and then deployed the distribution to
the server. I also dropped the required jars into the $PIG_HOME/lib directory.
After that, it seems to work fine.
Hope this helps.
- Youngwoo
*My env for Hadoop:*
$ env | grep HADOOP
HADOOP_HOME=/usr/lib/hadoop-0.20
*My pig script for testing:*
$ cat test_embedded.py
#!/usr/bin/python
# need to explicitly import the Pig class
from org.apache.pig.scripting import Pig

output = 'outfile'

p = Pig.compile("""
records = LOAD '/user/hanadmin/DUAL.TXT' USING PigStorage() AS
(input_line:chararray);
r1 = FOREACH records GENERATE LOWER(records.input_line);
STORE r1 INTO '$out';
""")

for i in range(0, 2):
    print 'Iteration: ' + str(i)
    q = p.bind({'out' : output + str(i)})
    r = q.runSingle()
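For reference, the '$out' substitution that `bind()` performs before each run can be illustrated in plain Python. This is a sketch for illustration only; the real substitution happens inside org.apache.pig.scripting.Pig when the script runs under Jython, and `bind_params` below is a hypothetical stand-in, not part of the Pig API:

```python
# Plain-Python sketch of the '$out' substitution that p.bind({...}) performs.
QUERY_TEMPLATE = """records = LOAD '/user/hanadmin/DUAL.TXT' USING PigStorage() AS
(input_line:chararray);
r1 = FOREACH records GENERATE LOWER(records.input_line);
STORE r1 INTO '$out';"""

def bind_params(template, params):
    """Replace each $name placeholder with its bound value."""
    bound = template
    for name, value in params.items():
        bound = bound.replace('$' + name, value)
    return bound

for i in range(0, 2):
    # Mirrors p.bind({'out': output + str(i)}) in the embedded script.
    query = bind_params(QUERY_TEMPLATE, {'out': 'outfile' + str(i)})
    print('Iteration %d:\n%s\n' % (i, query))
```

The two queries this prints match the "Query to run" blocks in the log below.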
*Run the script:*
$ bin/pig test_embedded.py
2011-01-13 11:32:02,502 [main] INFO org.apache.pig.Main - Logging error
messages to: /hanmail/pig-0.9.0-SNAPSHOT/pig_1294885922500.log
2011-01-13 11:32:02,516 [main] INFO org.apache.pig.Main - Run embedded
script: jython
2011-01-13 11:32:02,745 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting
to hadoop file system at: hdfs://hadoopdev:8020
2011-01-13 11:32:03,056 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting
to map-reduce job tracker at: hadoopdev:8021
Iteration: 0
2011-01-13 11:32:04,586 [main] INFO org.apache.pig.scripting.BoundScript -
Query to run:
records = LOAD '/user/hanadmin/DUAL.TXT' USING PigStorage() AS
(input_line:chararray);
r1 = FOREACH records GENERATE LOWER(records.input_line);
STORE r1 INTO 'outfile0';
2011-01-13 11:32:04,872 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
script: UNKNOWN
2011-01-13 11:32:04,873 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
pig.usenewlogicalplan is set to true. New logical plan will be used.
2011-01-13 11:32:05,096 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name:
records:
Store(hdfs://hadoopdev/tmp/temp644267750/tmp-1639994869:org.apache.pig.impl.io.InterStorage)
- scope-10 Operator Key: scope-10)
2011-01-13 11:32:05,097 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: r1:
Store(hdfs://hadoopdev/user/hanadmin/outfile0:org.apache.pig.builtin.PigStorage)
- scope-16 Operator Key: scope-16)
2011-01-13 11:32:05,113 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
File concatenation threshold: 100 optimistic? false
2011-01-13 11:32:05,153 [main] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths to process : 1
2011-01-13 11:32:05,177 [main] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths (combined) to process : 1
2011-01-13 11:32:05,177 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
number of input files: 1
2011-01-13 11:32:05,177 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
number of input files: 0
2011-01-13 11:32:05,203 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 3
2011-01-13 11:32:05,204 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- Merged 1 map-only splittees.
2011-01-13 11:32:05,204 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- Merged 1 out of total 3 MR operators.
2011-01-13 11:32:05,204 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 2
2011-01-13 11:32:05,284 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added
to the job
2011-01-13 11:32:05,306 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-01-13 11:32:09,859 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up multi store job
2011-01-13 11:32:09,916 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.
2011-01-13 11:32:10,421 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2011-01-13 11:32:10,602 [Thread-4] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths to process : 1
2011-01-13 11:32:10,605 [Thread-4] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths (combined) to process : 1
2011-01-13 11:32:11,533 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_201101121634_0007
2011-01-13 11:32:11,533 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- More information at:
http://hadoopdev:50030/jobdetails.jsp?jobid=job_201101121634_0007
2011-01-13 11:32:25,704 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 25% complete
2011-01-13 11:32:28,739 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 50% complete
2011-01-13 11:32:31,337 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added
to the job
2011-01-13 11:32:31,339 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-01-13 11:32:36,097 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job
2011-01-13 11:32:36,110 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.
2011-01-13 11:32:36,362 [Thread-15] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths to process : 1
2011-01-13 11:32:36,364 [Thread-15] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths (combined) to process : 1
2011-01-13 11:32:36,613 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_201101121634_0008
2011-01-13 11:32:36,613 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- More information at:
http://hadoopdev:50030/jobdetails.jsp?jobid=job_201101121634_0008
2011-01-13 11:32:49,763 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 75% complete
2011-01-13 11:32:56,857 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2011-01-13 11:32:56,862 [main] INFO
org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt
Features
0.20.2+737 0.9.0-SNAPSHOT hanadmin 2011-01-13 11:32:05
2011-01-13 11:32:56 UNKNOWN
Success!
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime
MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs
job_201101121634_0007 1 0 6 6 6 0 0
0 records MULTI_QUERY,MAP_ONLY
job_201101121634_0008 1 0 6 6 6 0 0
0 r1 MAP_ONLY hdfs://hadoopdev/user/hanadmin/outfile0,
Input(s):
Successfully read 1 records (379 bytes) from: "/user/hanadmin/DUAL.TXT"
Output(s):
Successfully stored 1 records (2 bytes) in:
"hdfs://hadoopdev/user/hanadmin/outfile0"
Counters:
Total records written : 1
Total bytes written : 2
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201101121634_0007 -> job_201101121634_0008,
job_201101121634_0008
2011-01-13 11:32:56,901 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Success!
Iteration: 1
2011-01-13 11:32:56,920 [main] INFO org.apache.pig.scripting.BoundScript -
Query to run:
records = LOAD '/user/hanadmin/DUAL.TXT' USING PigStorage() AS
(input_line:chararray);
r1 = FOREACH records GENERATE LOWER(records.input_line);
STORE r1 INTO 'outfile1';
2011-01-13 11:32:56,975 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
script: UNKNOWN
2011-01-13 11:32:56,975 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
pig.usenewlogicalplan is set to true. New logical plan will be used.
2011-01-13 11:32:57,015 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: r1:
Store(hdfs://hadoopdev/user/hanadmin/outfile1:org.apache.pig.builtin.PigStorage)
- scope-40 Operator Key: scope-40)
2011-01-13 11:32:57,016 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name:
records:
Store(hdfs://hadoopdev/tmp/temp644267750/tmp-1348301493:org.apache.pig.impl.io.InterStorage)
- scope-34 Operator Key: scope-34)
2011-01-13 11:32:57,016 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
File concatenation threshold: 100 optimistic? false
2011-01-13 11:32:57,036 [main] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths to process : 1
2011-01-13 11:32:57,039 [main] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths (combined) to process : 1
2011-01-13 11:32:57,039 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
number of input files: 1
2011-01-13 11:32:57,039 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
number of input files: 0
2011-01-13 11:32:57,043 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 3
2011-01-13 11:32:57,044 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- Merged 1 map-only splittees.
2011-01-13 11:32:57,044 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- Merged 1 out of total 3 MR operators.
2011-01-13 11:32:57,044 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 2
2011-01-13 11:32:57,051 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added
to the job
2011-01-13 11:32:57,054 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-01-13 11:33:01,723 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up multi store job
2011-01-13 11:33:01,734 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.
2011-01-13 11:33:02,027 [Thread-25] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths to process : 1
2011-01-13 11:33:02,030 [Thread-25] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths (combined) to process : 1
2011-01-13 11:33:02,236 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_201101121634_0009
2011-01-13 11:33:02,236 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- More information at:
http://hadoopdev:50030/jobdetails.jsp?jobid=job_201101121634_0009
2011-01-13 11:33:02,238 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2011-01-13 11:33:16,893 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 25% complete
2011-01-13 11:33:19,924 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 50% complete
2011-01-13 11:33:22,471 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added
to the job
2011-01-13 11:33:22,473 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-01-13 11:33:27,303 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job
2011-01-13 11:33:27,312 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.
2011-01-13 11:33:27,567 [Thread-35] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths to process : 1
2011-01-13 11:33:27,569 [Thread-35] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths (combined) to process : 1
2011-01-13 11:33:27,815 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_201101121634_0010
2011-01-13 11:33:27,815 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- More information at:
http://hadoopdev:50030/jobdetails.jsp?jobid=job_201101121634_0010
2011-01-13 11:33:43,976 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 75% complete
2011-01-13 11:33:48,039 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2011-01-13 11:33:48,041 [main] INFO
org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt
Features
0.20.2+737 0.9.0-SNAPSHOT hanadmin 2011-01-13 11:32:57
2011-01-13 11:33:48 UNKNOWN
Success!
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime
MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs
job_201101121634_0009 1 0 6 6 6 0 0
0 records MULTI_QUERY,MAP_ONLY
job_201101121634_0010 1 0 9 9 9 0 0
0 r1 MAP_ONLY hdfs://hadoopdev/user/hanadmin/outfile1,
Input(s):
Successfully read 1 records (379 bytes) from: "/user/hanadmin/DUAL.TXT"
Output(s):
Successfully stored 1 records (2 bytes) in:
"hdfs://hadoopdev/user/hanadmin/outfile1"
Counters:
Total records written : 1
Total bytes written : 2
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201101121634_0009 -> job_201101121634_0010,
job_201101121634_0010
2011-01-13 11:33:48,062 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Success!
*Output files:*
$ hadoop fs -ls /user/hanadmin/out*
drwxr-xr-x - hanadmin supergroup 0 2011-01-13 11:32
/user/hanadmin/outfile0/_logs
-rw-r--r-- 1 hanadmin supergroup 2 2011-01-13 11:32
/user/hanadmin/outfile0/part-m-00000
drwxr-xr-x - hanadmin supergroup 0 2011-01-13 11:33
/user/hanadmin/outfile1/_logs
-rw-r--r-- 1 hanadmin supergroup 2 2011-01-13 11:33
/user/hanadmin/outfile1/part-m-00000
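Since the original question was about iteration, note that the same bind-per-iteration loop can chain runs so that each iteration reads the previous iteration's output. A plain-Python sketch of the path bookkeeping, where the 'in'/'out' names are hypothetical bind-parameter names (the Pig query would LOAD from '$in' and STORE into '$out'), not part of the Pig API:

```python
# Sketch: compute one bind dict per iteration so that iteration i+1
# LOADs what iteration i STOREd.
def iteration_paths(seed_input, output_prefix, iterations):
    """Return one {'in': ..., 'out': ...} bind dict per iteration."""
    pairs = []
    current_input = seed_input
    for i in range(iterations):
        current_output = output_prefix + str(i)
        pairs.append({'in': current_input, 'out': current_output})
        current_input = current_output  # next iteration reads this output
    return pairs
```

In the embedded script, each dict would then be passed to p.bind(...) followed by runSingle().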
2011/1/12 <[email protected]>
> Hi Youngwoo,
>
> Yes, I downloaded Pig Snapshot from Hudson. Is there some other Pig-0.9.0
> that comes bundled with Jython.jar? Please point me to it.
>
> With the snapshot version, I tried your advice.
>
> Putting jython.jar in $PIG_HOME/lib did not help. I'm getting the same
> error.
>
> The Java command doesn't seem to recognize the --embedded option.
>
> -----Original Message-----
> From: 김영우 [mailto:[email protected]]
> Sent: Wednesday, January 12, 2011 4:27 PM
> To: [email protected]
> Subject: Re: Iterative MapReduce with PIG
>
> Hi Deepak,
>
> Did you download pig distribution from Apache Hudson?
>
> It seems that the snapshot build does not include jython.jar.
>
> Drop the jython.jar into $PIG_HOME/lib directory and then try it again.
> Also, you can specify the classpath on the java command line. E.g., java -cp
> pig.jar:/path/jython.jar --embedded jython embedded_pig.py
>
> For me, it works fine when I specify local mode, but in MapReduce mode it
> does not. I don't know exactly why, but I guess it's because I'm using CDH
> beta3.
>
> - Youngwoo
>
> 2011/1/12 <[email protected]>
>
> > Hi,
> >
> > I am not able to import Pig.
> >
> > The following is throwing up import errors
> >
> > >>> from org.apache.pig.scripting import Pig
> > Traceback (most recent call last):
> > File "<stdin>", line 1, in <module>
> > ImportError: No module named apache
> >
> > Any ideas? I checked my classpath, and things look alright.
> >
> > -----Original Message-----
> > From: Richard Ding [mailto:[email protected]]
> > Sent: Tuesday, January 11, 2011 11:48 PM
> > To: pig-user-list; Deepak Choudhary N (WT01 - Product Engineering
> > Services)
> > Subject: Re: Iterative MapReduce with PIG
> >
> > The following script works with the latest 0.9 snapshot:
> >
> > #!/usr/bin/python
> > # Name - embed_pig.py
> >
> > # need to explicitly import the Pig class
> > from org.apache.pig.scripting import Pig
> >
> > p = Pig.compile("""
> > records = LOAD 'path/to/data' AS (input_line:chararray);
> > DESCRIBE records;
> > """)
> >
> > for i in range(0,2):
> >     r = p.bind()
> >     results = r.runSingle()
> >
> > If you just want to use the command DESCRIBE, this script works better:
> >
> > #!/usr/bin/python
> > # Name - embed_pig.py
> >
> > # need to explicitly import the Pig class
> > from org.apache.pig.scripting import Pig
> >
> > p = Pig.compile("""
> > records = LOAD 'path/to/data' AS (input_line:chararray);
> > """)
> >
> > for i in range(0,2):
> >     r = p.bind()
> >     r.describe('records')
> >
> > On 1/11/11 1:03 AM, "[email protected]" <[email protected]> wrote:
> >
> > Hi,
> >
> > I downloaded Pig-0.9.0-SNAPSHOT.tar.gz and set it up.
> >
> > I am trying to run this:
> >
> > --------
> > #!/usr/bin/python
> > # Name - embed_pig.py
> >
> > p = Pig.compile("""
> > records = LOAD 'path/to/data' AS (input_line:chararray);
> > DESCRIBE records;
> > """)
> >
> > for i in range(1,2):
> >     r = p.bind()
> >     results = r.run()
> >     if results.getStatus("records") == "FAILED":
> >         raise "Pig job failed"
> >
> > -------------
> >
> > Command to Run:
> > $pig -x local embed_pig.py
> >
> > Error I got:
> > Error 1000 - Parsing Error.
> >
> > The purpose of this script is to call the Pig script twice, iteratively.
> >
> > What is the correct way to run a code like this? Any other special
> > environment variables that I need to set?
> >
> > Thanks,
> > Deepak
> >
> > -----Original Message-----
> > From: Olga Natkovich [mailto:[email protected]]
> > Sent: Monday, January 10, 2011 11:24 PM
> > To: [email protected]
> > Subject: RE: Iterative MapReduce with PIG
> >
> > The initial implementation was checked into the trunk last Friday. If you
> > feel adventurous, you can give it a try :).
> >
> > Olga
> >
> > -----Original Message-----
> > From: Alan Gates [mailto:[email protected]]
> > Sent: Monday, January 10, 2011 8:19 AM
> > To: [email protected]
> > Subject: Re: Iterative MapReduce with PIG
> >
> > This is one of our major initiatives for 0.9. See
> > http://wiki.apache.org/pig/TuringCompletePig
> > and https://issues.apache.org/jira/browse/PIG-1479. But until that's
> > ready you'll have to use Java or piglet as recommended by Dmitriy.
> >
> > Alan.
> >
> > On Jan 10, 2011, at 3:09 AM, [email protected] wrote:
> >
> > > Hi,
> > >
> > > I need to implement an application that is iterative in nature. At
> > > the end of each iteration, I need to take the result and provide it
> > > as an input for the next iteration.
> > >
> > > Embedding PIG statements in a Java Program looks like one way to do
> > > it.
> > >
> > > But I prefer using Python for programming. How can I do this?
> > >
> > > Thanks,
> > > Deepak
> >
> >