I have a lot of large zipped (not gzipped) files sitting in an Amazon S3 bucket that I want to process. What is the easiest way to process them with a Hadoop MapReduce job? Do I need to write code to transfer them out of S3, unzip them, and then move them to HDFS before running my job, or does Hadoop have support for processing zipped input files directly from S3?
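
For context, the only approach I can think of so far is a one-off copy job along these lines. This is just a rough sketch of the "download, unzip, push to HDFS" route I described above; the bucket name, paths, and the s3n:// scheme are placeholders (the right scheme depends on the Hadoop version), not something I've actually run:

import java.net.URI;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class UnzipS3ToHdfs {
    public static void main(String[] args) throws Exception {
        // Placeholder locations -- substitute your own bucket and target directory.
        String s3Zip = "s3n://my-bucket/archives/data.zip";
        String hdfsDir = "/user/hadoop/unzipped";

        Configuration conf = new Configuration();
        FileSystem s3 = FileSystem.get(URI.create(s3Zip), conf);   // S3 filesystem
        FileSystem hdfs = FileSystem.get(conf);                    // default (HDFS) filesystem

        // Stream the zip straight out of S3 and write each entry into HDFS.
        try (ZipInputStream zin = new ZipInputStream(s3.open(new Path(s3Zip)))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) continue;
                Path out = new Path(hdfsDir, entry.getName());
                try (FSDataOutputStream dst = hdfs.create(out)) {
                    // Copy the current entry; don't close the shared zip stream yet.
                    IOUtils.copyBytes(zin, dst, 4096, false);
                }
            }
        }
    }
}

If Hadoop can read zipped input directly from S3, I'd much rather skip this staging step entirely.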