Moritz Moeller created PIG-3246:
-----------------------------------
Summary: not possible to use remote filesystems (S3) in a pig script
Key: PIG-3246
URL: https://issues.apache.org/jira/browse/PIG-3246
Project: Pig
Issue Type: Bug
Environment: Apache Pig version 0.10.0-cdh4.2.0 (rexported)
Hadoop 2.0.0-cdh4.2.0
Reporter: Moritz Moeller
My Hadoop cluster is configured to use hdfs://namenode/. Both hdfs dfs and Pig
scripts work fine against it.
Now I want to read data from S3. hdfs dfs -ls s3n://mybucket/file.csv works
fine.
However, a Pig script doing LOAD 's3n://mybucket/test.csv' fails. It looks as
if Pig performs the LOAD request using an HDFS FileSystem.
I wasn't sure whether to mark this as a bug or an improvement, as I do not know
whether this is supposed to work. Since it is a basic feature of Hadoop, I
would expect it to work in Pig, too.
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: java.net.UnknownHostException: mybucket
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:288)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:452)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:469)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:336)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:233)
at java.lang.Thread.run(Thread.java:722)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:257)
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: sdfa
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:414)
at org.apache.hadoop.security.SecurityUtil.buildDTServiceName(SecurityUtil.java:295)
at org.apache.hadoop.fs.FileSystem.getCanonicalServiceName(FileSystem.java:247)
at org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:468)
at org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:452)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:205)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:269)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274)
... 13 more
Caused by: java.net.UnknownHostException: mybucket
... 25 more
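The innermost frames (SecurityUtil.buildTokenService, reached through
FileSystem.getCanonicalServiceName) suggest that the authority part of the
s3n:// location is being treated as a host name while delegation tokens are
collected, which would explain the UnknownHostException for the bucket name.
The sketch below is only an illustration of how such a location splits into
its parts. It uses the plain JDK java.net.URI class, not Hadoop or Pig code,
and the class name S3nUriParts is made up for this example.

```java
import java.net.URI;

// Hypothetical helper, not part of Pig or Hadoop: shows which part of an
// s3n:// location ends up in the "host" position of the URI.
final class S3nUriParts {

    // Returns {scheme, host, path} for the given location string.
    static String[] parts(String location) {
        URI uri = URI.create(location);
        return new String[] { uri.getScheme(), uri.getHost(), uri.getPath() };
    }

    public static void main(String[] args) {
        String[] p = parts("s3n://mybucket/test.csv");
        // The bucket name sits where a host name would normally be.
        System.out.println("scheme=" + p[0] + " host=" + p[1] + " path=" + p[2]);
    }
}
```

For 's3n://mybucket/test.csv' the host component is the bucket name, so any
code path that tries to resolve that component through DNS would fail in the
way shown in the trace.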