BTW, yes, the referenced S3 bucket does exist, and

hdfs dfs -ls s3n://agittens/CFSRArawtars

does list the entries, although it first prints the same warnings:

2015-12-10 00:26:53,815 WARN  httpclient.RestS3Service
(RestS3Service.java:performRequest(393)) - Response '/CFSRArawtars' -
Unexpected response code 404, expected 200
2015-12-10 00:26:53,909 WARN  httpclient.RestS3Service
(RestS3Service.java:performRequest(393)) - Response
'/CFSRArawtars_%24folder%24' - Unexpected response code 404, expected 200
2015-12-10 00:26:54,243 WARN  httpclient.RestS3Service
(RestS3Service.java:performRequest(393)) - Response '/CFSRArawtars' -
Unexpected response code 404, expected 200
Found 2306 items
-rwxrwxrwx   1     177408 2015-11-18 00:26
s3n://agittens/CFSRArawtars/completefilelist
-rwxrwxrwx   1         60 2015-11-18 00:26
s3n://agittens/CFSRArawtars/copyallfileshere.sh
-rwxrwxrwx   1 1814040064 2015-11-18 00:26
s3n://agittens/CFSRArawtars/pgbh02.gdas.19790101-19790105.tar
-rwxrwxrwx   1 1788727808 2015-11-18 00:27
s3n://agittens/CFSRArawtars/pgbh02.gdas.19790106-19790110.tar
-rw
...
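
For what it's worth, here is my reading of the two paths in those warnings. This is a sketch under an assumption, not something I've verified against the Hadoop version on this cluster: as far as I know, the s3n connector first asks S3 for an object named exactly like the path, then for the same key with a "_$folder$" marker suffix, and only then falls back to listing by prefix. If so, the 404s are just failed probes, which would fit the listing succeeding anyway. The helper names below (probe_keys, classify) are made up for illustration and are not part of any Hadoop or JetS3t API.

```python
from urllib.parse import unquote

# Suffix that (as far as I know) Hadoop's s3n connector appends to a key
# to mark an "empty directory" placeholder object. Stated as an assumption.
FOLDER_SUFFIX = "_$folder$"


def probe_keys(path):
    """Keys s3n is assumed to probe for `path` before listing by prefix."""
    key = path.strip("/")
    return [key, key + FOLDER_SUFFIX]


def classify(logged_path, directory):
    """Label a URL-encoded path copied from a RestS3Service 404 warning."""
    key = unquote(logged_path).strip("/")
    if key == directory:
        return "probe for a plain object"
    if key == directory + FOLDER_SUFFIX:
        return "probe for a folder marker"
    return "unrelated"


if __name__ == "__main__":
    # The two paths that appear in the warnings above.
    for logged in ["/CFSRArawtars", "/CFSRArawtars_%24folder%24"]:
        print(unquote(logged), "->", classify(logged, "CFSRArawtars"))
```

The only thing this really shows is that %24 decodes to "$", so the second path is the folder-marker key for the same directory. It says nothing about why distcp hangs afterwards.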

On Wed, Dec 9, 2015 at 4:24 PM, AlexG wrote:

> I've been using the same method to launch my clusters and then pull my
> data from S3 to local HDFS:
>
> $SPARKHOME/ec2/spark-ec2 -k mykey -i ~/.ssh/mykey.pem -s 29
> --instance-type=r3.8xlarge --placement-group=pcavariants
> --copy-aws-credentials --hadoop-major-version=2 --spot-price=2.8 launch
> mycluster --region=us-west-2
>
> then
>
> ephemeral-hdfs/bin/hadoop distcp s3n://agittens/CFSRArawtars CFSRArawtars
>
> Until recently this worked as I'd expect. Within the last several days,
> I've been getting the following output when I run the distcp command:
> 2015-12-10 00:16:43,113 WARN  httpclient.RestS3Service
> (RestS3Service.java:performRequest(393)) - Response '/CFSRArawtars' -
> Unexpected response code 404, expected 200
> 2015-12-10 00:16:43,207 WARN  httpclient.RestS3Service
> (RestS3Service.java:performRequest(393)) - Response
> '/CFSRArawtars_%24folder%24' - Unexpected response code 404, expected 200
> 2015-12-10 00:16:43,422 WARN  httpclient.RestS3Service
> (RestS3Service.java:performRequest(393)) - Response '/CFSRArawtars' -
> Unexpected response code 404, expected 200
> 2015-12-10 00:16:43,513 WARN  httpclient.RestS3Service
> (RestS3Service.java:performRequest(393)) - Response
> '/CFSRArawtars_%24folder%24' - Unexpected response code 404, expected 200
> 2015-12-10 00:16:43,737 WARN  httpclient.RestS3Service
> (RestS3Service.java:performRequest(393)) - Response '/CFSRArawtars' -
> Unexpected response code 404, expected 200
> 2015-12-10 00:16:43,830 WARN  httpclient.RestS3Service
> (RestS3Service.java:performRequest(393)) - Response
> '/CFSRArawtars_%24folder%24' - Unexpected response code 404, expected 200
> 2015-12-10 00:16:44,015 WARN  httpclient.RestS3Service
> (RestS3Service.java:performRequest(393)) - Response '/CFSRArawtars' -
> Unexpected response code 404, expected 200
> 2015-12-10 00:16:46,141 WARN  conf.Configuration
> (Configuration.java:warnOnceIfDeprecated(824)) - io.sort.mb is deprecated.
> Instead, use mapreduce.task.io.sort.mb
> 2015-12-10 00:16:46,141 WARN  conf.Configuration
> (Configuration.java:warnOnceIfDeprecated(824)) - io.sort.factor is
> deprecated. Instead, use mapreduce.task.io.sort.factor
> 2015-12-10 00:16:46,630 INFO  service.AbstractService
> (AbstractService.java:init(81)) -
> Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
> 2015-12-10 00:16:46,630 INFO  service.AbstractService
> (AbstractService.java:start(94)) -
> Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
> 2015-12-10 00:16:47,135 INFO  mapreduce.JobSubmitter
> (JobSubmitter.java:submitJobInternal(368)) - number of splits:21
>
> Then the job hangs and does nothing until I kill it. Any idea what the
> problem is and how to fix it, or a work-around for getting my data off S3
> quickly? It is around 4 TB.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/distcp-suddenly-broken-with-spark-ec2-script-setup-tp25658.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
