Hi Yu Li,

HADOOP-7823 backported splittable Bzip2 support, along with the MapReduce support for it, into branch-1, and it is already available in the current 1.1.x releases.
Support for concatenated Bzip2 files (HADOOP-7386) is not implemented yet, AFAIK, but Chris suggests on HADOOP-6335 that HADOOP-4012 may have fixed it. Could you try it and report back?

On Mon, Dec 3, 2012 at 3:19 PM, Yu Li <car...@gmail.com> wrote:
> Dear all,
>
> About splitting support for bzip2, I checked on the JIRA list and found
> HADOOP-7386 marked as "Won't fix"; I also found some work done in
> branch-0.21 (also in trunk), say HADOOP-4012 and MAPREDUCE-830, but not
> integrated/migrated into branch-1, so I guess we don't support concatenated
> bzip2 in branch-1, correct? If so, is there any special reason? Many thanks!
>
> --
> Best Regards,
> Li Yu

--
Harsh J
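To try whether HADOOP-4012 handles concatenated Bzip2 input, you need a test file made of several independent bzip2 streams placed back to back. Below is a minimal local sketch in plain Python, not Hadoop code. The function names are hypothetical. It builds such input and contrasts two readers: one that stops after the first stream, which is the failure mode to look for, and one that decodes every stream.

```python
import bz2


def make_concatenated_bz2(parts):
    """Compress each part as an independent bzip2 stream and join the results.

    The output has the same layout as `cat a.bz2 b.bz2 > ab.bz2`.
    """
    return b"".join(bz2.compress(p) for p in parts)


def read_first_stream_only(data):
    """Mimic a reader that is not concatenation-aware.

    BZ2Decompressor stops at the end of the first stream; the bytes of any
    following streams are left in `unused_data` instead of being decoded.
    """
    decomp = bz2.BZ2Decompressor()
    out = decomp.decompress(data)
    return out, len(decomp.unused_data)


def read_all_streams(data):
    """Decode every stream; bz2.decompress() accepts multi-stream input (Python 3.3+)."""
    return bz2.decompress(data)


if __name__ == "__main__":
    parts = [b"line 1\nline 2\n", b"line 3\nline 4\n"]
    blob = make_concatenated_bz2(parts)

    first, skipped_bytes = read_first_stream_only(blob)
    print(first)              # b'line 1\nline 2\n'
    print(skipped_bytes > 0)  # True: the second stream was never decoded
    print(read_all_streams(blob) == b"".join(parts))  # True
```

One way to check a branch-1 build is to write `blob` to a `.bz2` file, put it into HDFS, and compare a MapReduce line count with the expected total (4 lines here). A lower count would suggest that records after the first stream are being dropped.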