BTW, yes, the referenced S3 bucket does exist, and

hdfs dfs -ls s3n://agittens/CFSRArawtars

does list the entries, although it first prints the same warnings:

2015-12-10 00:26:53,815 WARN httpclient.RestS3Service (RestS3Service.java:performRequest(393)) - Response '/CFSRArawtars' - Unexpected response code 404, expected 200
2015-12-10 00:26:53,909 WARN httpclient.RestS3Service (RestS3Service.java:performRequest(393)) - Response '/CFSRArawtars_%24folder%24' - Unexpected response code 404, expected 200
2015-12-10 00:26:54,243 WARN httpclient.RestS3Service (RestS3Service.java:performRequest(393)) - Response '/CFSRArawtars' - Unexpected response code 404, expected 200
Found 2306 items
-rwxrwxrwx 1 177408 2015-11-18 00:26 s3n://agittens/CFSRArawtars/completefilelist
-rwxrwxrwx 1 60 2015-11-18 00:26 s3n://agittens/CFSRArawtars/copyallfileshere.sh
-rwxrwxrwx 1 1814040064 2015-11-18 00:26 s3n://agittens/CFSRArawtars/pgbh02.gdas.19790101-19790105.tar
-rwxrwxrwx 1 1788727808 2015-11-18 00:27 s3n://agittens/CFSRArawtars/pgbh02.gdas.19790106-19790110.tar
-rw
...

On Wed, Dec 9, 2015 at 4:24 PM, AlexG wrote:

> I've been using the same method to launch my clusters then pull my data
> from
> S3 to local hdfs:
>
> $SPARKHOME/ec2/spark-ec2 -k mykey -i ~/.ssh/mykey.pem -s 29
> --instance-type=r3.8xlarge --placement-group=pcavariants
> --copy-aws-credentials --hadoop-major-version=2 --spot-price=2.8 launch
> mycluster --region=us-west-2
>
> then
>
> ephemeral-hdfs/bin/hadoop distcp s3n://agittens/CFSRArawtars CFSRArawtars
>
> Before this worked as I'd expect.
> Within the last several days, I've been
> getting this error when I run the distcp command:
> 2015-12-10 00:16:43,113 WARN httpclient.RestS3Service
> (RestS3Service.java:performRequest(393)) - Response '/CFSRArawtars' -
> Unexpected response code 404, expected 200
> 2015-12-10 00:16:43,207 WARN httpclient.RestS3Service
> (RestS3Service.java:performRequest(393)) - Response
> '/CFSRArawtars_%24folder%24' - Unexpected response code 404, expected 200
> 2015-12-10 00:16:43,422 WARN httpclient.RestS3Service
> (RestS3Service.java:performRequest(393)) - Response '/CFSRArawtars' -
> Unexpected response code 404, expected 200
> 2015-12-10 00:16:43,513 WARN httpclient.RestS3Service
> (RestS3Service.java:performRequest(393)) - Response
> '/CFSRArawtars_%24folder%24' - Unexpected response code 404, expected 200
> 2015-12-10 00:16:43,737 WARN httpclient.RestS3Service
> (RestS3Service.java:performRequest(393)) - Response '/CFSRArawtars' -
> Unexpected response code 404, expected 200
> 2015-12-10 00:16:43,830 WARN httpclient.RestS3Service
> (RestS3Service.java:performRequest(393)) - Response
> '/CFSRArawtars_%24folder%24' - Unexpected response code 404, expected 200
> 2015-12-10 00:16:44,015 WARN httpclient.RestS3Service
> (RestS3Service.java:performRequest(393)) - Response '/CFSRArawtars' -
> Unexpected response code 404, expected 200
> 2015-12-10 00:16:46,141 WARN conf.Configuration
> (Configuration.java:warnOnceIfDeprecated(824)) - io.sort.mb is deprecated.
> Instead, use mapreduce.task.io.sort.mb
> 2015-12-10 00:16:46,141 WARN conf.Configuration
> (Configuration.java:warnOnceIfDeprecated(824)) - io.sort.factor is
> deprecated. Instead, use mapreduce.task.io.sort.factor
> 2015-12-10 00:16:46,630 INFO service.AbstractService
> (AbstractService.java:init(81)) -
> Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
> 2015-12-10 00:16:46,630 INFO service.AbstractService
> (AbstractService.java:start(94)) -
> Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
> 2015-12-10 00:16:47,135 INFO mapreduce.JobSubmitter
> (JobSubmitter.java:submitJobInternal(368)) - number of splits:21
>
> Then the job hangs and does nothing until I kill it. Any idea what the
> problem is and how to fix it, or a work-around for getting my data off S3
> quickly? It is around 4 TB.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/distcp-suddenly-broken-with-spark-ec2-script-setup-tp25658.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
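
A side note on reading the warnings: the key '/CFSRArawtars_%24folder%24' is percent-encoded, and %24 is the URL encoding of '$', so the request is for '/CFSRArawtars_$folder$'. As far as I know, that suffix is the directory-marker convention the s3n filesystem probes for, so these 404s may just be lookups for marker objects that were never created rather than the cause of the hang; treat that as an assumption, not a diagnosis. A minimal sketch of the decoding (the variable names are only for illustration):

```shell
# Key exactly as it appears in the RestS3Service warning.
key='/CFSRArawtars_%24folder%24'

# %24 is the percent-encoding of '$'; in a sed replacement '$' is a literal character.
decoded=$(printf '%s' "$key" | sed 's/%24/$/g')

echo "$decoded"   # prints /CFSRArawtars_$folder$
```
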
