Thanks very much, Tom. You saved me a lot of time by confirming that it isn't available yet. I'll go vote for HADOOP-1824.
On Tue, Mar 10, 2009 at 3:23 AM, Tom White <t...@cloudera.com> wrote:
> Hi Ken,
>
> Unfortunately, Hadoop doesn't yet support MapReduce on zipped files
> (see https://issues.apache.org/jira/browse/HADOOP-1824), so you'll
> need to write a program to unzip them and write them into HDFS first.
>
> Cheers,
> Tom
>
> On Tue, Mar 10, 2009 at 4:11 AM, jason hadoop <jason.had...@gmail.com> wrote:
> > Hadoop has support for S3; the compression support is handled at another
> > level and should also work.
> >
> > On Mon, Mar 9, 2009 at 9:05 PM, Ken Weiner <k...@gumgum.com> wrote:
> >
> >> I have a lot of large zipped (not gzipped) files sitting in an Amazon S3
> >> bucket that I want to process. What is the easiest way to process them
> >> with a Hadoop map-reduce job? Do I need to write code to transfer them
> >> out of S3, unzip them, and then move them to HDFS before running my
> >> job, or does Hadoop have support for processing zipped input files
> >> directly from S3?
> >
> > --
> > Alpha Chapters of my book on Hadoop are available
> > http://www.apress.com/book/view/9781430219422
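
For anyone else following the thread: since Tom's suggestion means writing an unzip program anyway, here is a minimal, self-contained sketch of the core unzip step using only java.util.zip. It builds a small in-memory archive to stand in for a file fetched from S3 and just prints each entry; in a real transfer job you would read the archive stream from S3 and write each entry into HDFS (e.g. via Hadoop's FileSystem.create) instead of printing. The class and entry names here are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class UnzipSketch {
    public static void main(String[] args) throws IOException {
        // Build a tiny zip in memory to stand in for an archive pulled from S3.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buf)) {
            zos.putNextEntry(new ZipEntry("log1.txt"));
            zos.write("hello from entry one".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("log2.txt"));
            zos.write("hello from entry two".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }

        // Stream the archive entry by entry. In a real job, each entry's
        // bytes would be copied to an HDFS OutputStream instead of stdout.
        try (ZipInputStream zis =
                 new ZipInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
            ZipEntry entry;
            byte[] chunk = new byte[4096];
            while ((entry = zis.getNextEntry()) != null) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int n;
                while ((n = zis.read(chunk)) != -1) {
                    out.write(chunk, 0, n);
                }
                System.out.println(entry.getName() + ": "
                        + new String(out.toByteArray(), StandardCharsets.UTF_8));
            }
        }
    }
}
```

Because ZipInputStream works on any InputStream, the same loop applies whether the source is a local file, an S3 object stream, or a byte buffer, so the unzip-and-load program stays small.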