Re: Error : PigUnit in Windows->Eclipse (without Hadoop/Pig in Windows)

Suraj Nayak M Sun, 13 Jul 2014 11:30:48 -0700

Thanks Satish.

I tried to run Pig on Windows a year ago, but I did not manage to install it (even after installing Cygwin). This blog might help many Windows users use Pig with PigUnit :)


--
Suraj Nayak

On Sunday 13 July 2014 10:19 PM, Satish Kolli wrote:

Some Pig operations don't work on Windows, especially with Hadoop 1.x. The following article describes a couple of options (hacks) that you can use to run Pig scripts on Windows.


http://simpletoad.blogspot.com/2013/05/pigunit-issue-on-windows.html?m=1
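The stack trace in the log quoted below ends in org.apache.hadoop.fs.FileUtil.checkReturnValue. My reading of it (an assumption, not something stated in this thread) is that Hadoop 1.x sets the 0700 permission on its local staging directory through java.io.File calls and throws an IOException as soon as one of them returns false. On Windows, File.setReadable(false, ...) generally returns false because the file system has no way to mark a file unreadable. The minimal sketch below imitates that pattern with plain java.io.File so the behaviour can be observed without Hadoop. StagingPermissionCheck and its methods are hypothetical names, and the call sequence is a simplification, not Hadoop's actual code.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical stand-in (not Hadoop source) for the check behind
// "Failed to set permissions of path: ... to 0700".
class StagingPermissionCheck {

    // Same idea as the FileUtil.checkReturnValue frame in the stack trace:
    // a false return from java.io.File is turned into an IOException.
    static void checkReturnValue(boolean ok, File f, String perm) throws IOException {
        if (!ok) {
            throw new IOException("Failed to set permissions of path: " + f + " to " + perm);
        }
    }

    // Approximates chmod 0700 with java.io.File only: clear each permission
    // for everyone, then grant it back to the owner alone.
    static void setOwnerOnly(File f) throws IOException {
        checkReturnValue(f.setReadable(false, false), f, "0700");
        checkReturnValue(f.setReadable(true, true), f, "0700");
        checkReturnValue(f.setWritable(false, false), f, "0700");
        checkReturnValue(f.setWritable(true, true), f, "0700");
        checkReturnValue(f.setExecutable(false, false), f, "0700");
        checkReturnValue(f.setExecutable(true, true), f, "0700");
    }

    public static void main(String[] args) throws IOException {
        File staging = Files.createTempDirectory("staging").toFile();
        try {
            setOwnerOnly(staging);
            System.out.println("0700 applied to " + staging);
        } catch (IOException e) {
            // On Windows, setReadable(false, ...) is expected to return false,
            // because the file system does not implement a "not readable" permission.
            System.out.println("Rejected: " + e.getMessage());
        } finally {
            staging.delete();
        }
    }
}
```

On Linux this prints the temporary directory with 0700 applied; on Windows I would expect the "Rejected:" branch, which mirrors the error in Krishnan's log.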

On Jul 13, 2014 12:42 PM, "Krishnan K" <[email protected]> wrote:


    Hi Suraj,

    Thanks for replying.

    
    \tmp\hadoop-krkrishnamoorthy\mapred\staging\krkrishnamoorthy502928296\.staging
    seems to refer to a path in the Unix filesystem. I don't have c:\temp.

    It is trying to set the permissions to 700, which apparently only works in
    a Unix environment.

    I'll try to set up Cygwin. Is that all that is required?

    Thanks!




    On Sun, Jul 13, 2014 at 7:51 AM, Suraj Nayak M <[email protected]> wrote:

    >  Hi Krishnan,
    >
    > Regarding the error, I can see this line:
    >
    > 14/07/12 17:55:31 ERROR security.UserGroupInformation:
    > PriviledgedActionException as:krkrishnamoorthy cause:java.io.IOException:
    > Failed to set permissions of path:
    > \tmp\hadoop-krkrishnamoorthy\mapred\staging\krkrishnamoorthy502928296\.staging
    > to 0700
    >
    > Do you have a C:\tmp folder on your C: drive?
    >
    > I have added my replies to your questions inline below.
    >
    >
    > On Sunday 13 July 2014 09:44 AM, Krishnan K wrote:
    >
    > Hi,
    >
    > I'm running a Pig script on my Windows machine. I don't have a Hadoop/Pig
    > environment installed.
    >
    > Some questions:
    > 1. Can I run PigUnit test cases on *Windows* without having any
    > *Hadoop/Pig environment set up*?
    >
    >  You can run PigUnit test cases locally. I have tried it on Linux: it works
    > and does not require Hadoop to be installed. If you have Cygwin installed,
    > you should also be able to run PigUnit test cases on Windows.
    >
    > 2. Can I run PigUnit test cases in *local* mode through Eclipse if I can
    > configure the cluster details? If yes, where can I provide my cluster
    > details?
    >
    >  No cluster configuration is needed in local mode.
    >
    > 3. Can I run PigUnit test cases in *mapreduce* mode through Eclipse if I
    > can configure the cluster details? If yes, where can I provide my cluster
    > details?
    >
    >  Copy the *.xml configuration files from the cluster to a folder on your
    > local machine and add that folder to the classpath. (I have not tested this.)
    >
    > 4. Can I build the Maven jar on my Windows machine without running the
    > test cases, and deploy it to a cluster that has Hadoop/Pig?
    >
    >  Yes. You can add the *-DskipTests* option to the Maven goal.
    > Alternatively, if you are using Eclipse to build the Maven jar, you can
    > check the option to skip tests in Eclipse's build dialog (where you
    > specify the goals).
    >
    >
    > Appreciate your help.
    >
    > I executed a PigUnit test case and it errored out. Please find the log
    > with the error details below:
    >
    > 14/07/12 17:55:30 INFO pigunit.PigTest: Using default local mode
    > 14/07/12 17:55:30 INFO executionengine.HExecutionEngine: Connecting to hadoop file system at: file:///
    > 14/07/12 17:55:30 INFO pigunit.PigTest: -- Load users from hdfs
    > users = LOAD 'src/test/resources/input/users.txt' USING PigStorage(',') AS
    > (id:long, firstName:chararray, lastName:chararray, country:chararray,
    > city:chararray, company:chararray);
    >
    > -- Load ratings from hdfs
    > awesomenessRating = LOAD 'src/test/resources/input/rating.txt' USING
    > PigStorage(',') AS (userId:long, rating:long);
    >
    > -- Join records by userId
    > joinedRecords = JOIN users BY id, awesomenessRating BY userId;
    >
    > -- Filter users with awesomenessRating > 150
    > filteredRecords = FILTER joinedRecords BY awesomenessRating::rating > 150;
    >
    > -- Generate fields that we are interested in
    > generatedRecords = FOREACH filteredRecords GENERATE
    >   users::id AS id,
    >   users::firstName AS firstName,
    >   users::country AS country,
    >   awesomenessRating::rating AS rating;
    >
    > -- Store results
    > STORE generatedRecords INTO 'src/test/resources/results/awesomeness' USING
    > PigStorage();
    >
    > 14/07/12 17:55:30 INFO util.Utils: Default bootup file C:\Users\krkrishnamoorthy/.pigbootup not found
    > users = LOAD 'src/test/resources/input/users.txt' USING PigStorage(',') AS
    > (id:long, firstName:chararray, lastName:chararray, country:chararray,
    > city:chararray, company:chararray);
    > --> users = LOAD 'src/test/resources/input/users.txt' USING PigStorage(',') AS
    > (id:long,firstName:chararray,lastName:chararray,country:chararray,city:chararray,company:chararray);
    > awesomenessRating = LOAD 'src/test/resources/input/rating.txt' USING
    > PigStorage(',') AS (userId:long, rating:long);
    > --> awesomenessRating = LOAD
    > 'src/test/resources/input/awesomeness-rating.txt' USING PigStorage(',') AS
    > (userId:long, rating:long);
    > STORE generatedRecords INTO 'src/test/resources/results/awesomeness' USING
    > PigStorage();
    > --> none
    > 14/07/12 17:55:31 INFO pigstats.ScriptState: Pig features used in the script: HASH_JOIN
    > 14/07/12 17:55:31 INFO optimizer.LogicalPlanOptimizer:
    > {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune,
    > DuplicateForEachColumnRewrite, FilterLogicExpressionSimplifier,
    > GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer,
    > LoadTypeCastInserter, MergeFilter, MergeForEach,
    > NewPartitionFilterOptimizer, PartitionFilterOptimizer,
    > PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
    > 14/07/12 17:55:31 INFO mapReduceLayer.MRCompiler: File concatenation threshold: 100 optimistic? false
    > 14/07/12 17:55:31 INFO mapReduceLayer.MRCompiler$LastInputStreamingOptimizer: Rewrite: POPackage->POForEach to POJoinPackage
    > 14/07/12 17:55:31 INFO mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1
    > 14/07/12 17:55:31 INFO mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1
    > 14/07/12 17:55:31 INFO pigstats.ScriptState: Pig script settings are added to the job
    > 14/07/12 17:55:31 INFO mapReduceLayer.JobControlCompiler: mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    > 14/07/12 17:55:31 INFO mapReduceLayer.JobControlCompiler: Setting up single store job
    > 14/07/12 17:55:31 INFO data.SchemaTupleFrontend: Key [pig.schematuple] is false, will not generate code.
    > 14/07/12 17:55:31 INFO data.SchemaTupleFrontend: Starting process to move generated code to distributed cache
    > 14/07/12 17:55:31 INFO data.SchemaTupleFrontend: Distributed cache not supported or needed in local mode. Setting key [pig.schematuple.local.dir] with code temp directory: C:\Users\KRKRIS~1\AppData\Local\Temp\1405212931260-0
    > 14/07/12 17:55:31 INFO mapReduceLayer.JobControlCompiler: Reduce phase detected, estimating # of required reducers.
    > 14/07/12 17:55:31 INFO mapReduceLayer.JobControlCompiler: Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
    > 14/07/12 17:55:31 INFO mapReduceLayer.InputSizeReducerEstimator: BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=-1
    > 14/07/12 17:55:31 INFO mapReduceLayer.JobControlCompiler: Could not estimate number of reducers and no requested or default parallelism set. Defaulting to 1 reducer.
    > 14/07/12 17:55:31 INFO mapReduceLayer.JobControlCompiler: Setting Parallelism to 1
    > 14/07/12 17:55:31 INFO mapReduceLayer.MapReduceLauncher: 1 map-reduce job(s) waiting for submission.
    > 14/07/12 17:55:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    > 14/07/12 17:55:31 ERROR security.UserGroupInformation: PriviledgedActionException as:krkrishnamoorthy cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-krkrishnamoorthy\mapred\staging\krkrishnamoorthy502928296\.staging to 0700
    > 14/07/12 17:55:31 INFO mapReduceLayer.MapReduceLauncher: 0% complete
    > 14/07/12 17:55:31 WARN mapReduceLayer.MapReduceLauncher: Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
    > 14/07/12 17:55:31 INFO mapReduceLayer.MapReduceLauncher: job null has failed! Stop running all dependent jobs
    > 14/07/12 17:55:31 INFO mapReduceLayer.MapReduceLauncher: 100% complete
    > 14/07/12 17:55:31 WARN mapReduceLayer.Launcher: There is no log file to write to.
    > 14/07/12 17:55:31 ERROR mapReduceLayer.Launcher: Backend error message during job submission
    > java.io.IOException: Failed to set permissions of path: \tmp\hadoop-krkrishnamoorthy\mapred\staging\krkrishnamoorthy502928296\.staging to 0700
    >   at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:691)
    >   at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:664)
    >   at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:514)
    >   at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:349)
    >   at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:193)
    >   at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
    >   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
    >   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    >   at java.security.AccessController.doPrivileged(Native Method)
    >   at javax.security.auth.Subject.doAs(Subject.java:422)
    >   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    >   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    >   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
    >   at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
    >   at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
    >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    >   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    >   at java.lang.reflect.Method.invoke(Method.java:483)
    >   at org.apache.pig.backend.hadoop20.PigJobControl.mainLoopAction(PigJobControl.java:157)
    >   at org.apache.pig.backend.hadoop20.PigJobControl.run(PigJobControl.java:134)
    >   at java.lang.Thread.run(Thread.java:744)
    >   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)
    >
    > 14/07/12 17:55:31 ERROR pigstats.SimplePigStats: ERROR: Failed to set permissions of path: \tmp\hadoop-krkrishnamoorthy\mapred\staging\krkrishnamoorthy502928296\.staging to 0700
    > 14/07/12 17:55:31 ERROR pigstats.PigStatsUtil: 1 map reduce job(s) failed!
    > 14/07/12 17:55:31 INFO pigstats.SimplePigStats: Detected Local mode. Stats reported below may be incomplete
    > 14/07/12 17:55:31 INFO pigstats.SimplePigStats: Script Statistics:
    >
    > HadoopVersion  PigVersion  UserId            StartedAt            FinishedAt           Features
    > 1.2.1          0.12.0      krkrishnamoorthy  2014-07-12 17:55:31  2014-07-12 17:55:31  HASH_JOIN
    >
    > Failed!
    >
    > Failed Jobs:
    > JobId Alias Feature Message Outputs
    > N/A awesomenessRating,joinedRecords,users HASH_JOIN Message: java.io.IOException: Failed to set permissions of path: \tmp\hadoop-krkrishnamoorthy\mapred\staging\krkrishnamoorthy502928296\.staging to 0700
    >   at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:691)
    >   at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:664)
    >   at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:514)
    >   at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:349)
    >   at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:193)
    >   at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
    >   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
    >   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    >   at java.security.AccessController.doPrivileged(Native Method)
    >   at javax.security.auth.Subject.doAs(Subject.java:422)
    >   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    >   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    >   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
    >   at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
    >   at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
    >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    >   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    >   at java.lang.reflect.Method.invoke(Method.java:483)
    >   at org.apache.pig.backend.hadoop20.PigJobControl.mainLoopAction(PigJobControl.java:157)
    >   at org.apache.pig.backend.hadoop20.PigJobControl.run(PigJobControl.java:134)
    >   at java.lang.Thread.run(Thread.java:744)
    >   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)
    > file:/tmp/temp49116140/tmp1118481539,
    >
    > Input(s):
    > Failed to read data from "file:///C:/Users/krkrishnamoorthy/workspace/test/pig-unit-example/src/test/resources/input/awesomeness-rating.txt"
    > Failed to read data from "file:///C:/Users/krkrishnamoorthy/workspace/test/pig-unit-example/src/test/resources/input/users.txt"
    >
    > Output(s):
    > Failed to produce result in "file:/tmp/temp49116140/tmp1118481539"
    >
    > Job DAG:
    > null
    >
    > 14/07/12 17:55:32 INFO mapReduceLayer.MapReduceLauncher: Failed!
    >
    >
    > Thanks,
    > Krishnan
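For question 3 in the thread (running in mapreduce mode by copying the cluster's *.xml files into a folder and adding that folder to the classpath, which Suraj notes he has not tested), one quick check is to ask the class loader whether those files are visible, since Hadoop's Configuration class looks up its site files, such as core-site.xml, as classpath resources. This is a minimal sketch, assuming the usual Hadoop 1.x file names core-site.xml, hdfs-site.xml and mapred-site.xml; ClusterConfigCheck is a hypothetical name.

```java
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reports which Hadoop site files a class loader can see.
class ClusterConfigCheck {

    // Assumption: the cluster's client settings live in these three files.
    static final String[] SITE_FILES = {"core-site.xml", "hdfs-site.xml", "mapred-site.xml"};

    // Maps each file name to its URL, or to null when it is not on the classpath.
    static Map<String, URL> locate(ClassLoader loader) {
        Map<String, URL> found = new LinkedHashMap<>();
        for (String name : SITE_FILES) {
            found.put(name, loader.getResource(name));
        }
        return found;
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        locate(loader).forEach((name, url) ->
                System.out.println(name + " -> " + (url == null ? "NOT on classpath" : url)));
    }
}
```

Run it from the same Eclipse project and run configuration as the PigUnit test; a file reported as "NOT on classpath" suggests the copied configuration folder has not been added.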
