Hi - I’m trying to use the Apache tar package (org.apache.tools.tar, from Ant 1.8.2) in a Java program that tars large files stored in HDFS. It currently fails on a file that is 17 GB in size. The same code works without any problem for smaller files; I tar smaller HDFS files all day long. It fails only on that one 17 GB file, and after three days of reading the source code I still can’t make sense of the error message. The exact file size at the time of the error is 17456999265 bytes. The exception I’m seeing is:
12/19/11 5:54 PM [BDM.main] EXCEPTION request to write '65535' bytes exceeds size in header of '277130081' bytes
12/19/11 5:54 PM [BDM.main] EXCEPTION org.apache.tools.tar.TarOutputStream.write(TarOutputStream.java:238)
12/19/11 5:54 PM [BDM.main] EXCEPTION com.yahoo.ads.ngdstone.tpbdm.HDFSTar.archive(HDFSTar.java:149)

One thing I noticed while staring at the numbers: 17456999265 mod 2^33 = 17456999265 - 2 * 8589934592 = 277130081, which is exactly the "size in header" in the message. That makes me suspect the size field in the tar header (11 octal digits in the ustar format, so at most 2^33 - 1 = 8589934591 bytes) is silently overflowing.

My code is below (out is the TarOutputStream and fs is the HDFS FileSystem; user, group, name, permissions, baseDir, buf, and nEntries are set earlier in the method):

    TarEntry entry = new TarEntry(p.getName());
    Path absolutePath = p.isAbsolute() ? p : new Path(baseDir, p); // HDFS path
    FileStatus fileStatus = fs.getFileStatus(absolutePath);        // HDFS file status
    entry.setNames(fileStatus.getOwner(), fileStatus.getGroup());
    entry.setUserName(user);
    entry.setGroupName(group);
    entry.setName(name);
    entry.setSize(fileStatus.getLen());
    entry.setMode(Integer.parseInt("0100" + permissions, 8)); // regular file + permission bits
    out.putNextEntry(entry); // out = TarOutputStream

    if (fileStatus.getLen() > 0) {
        InputStream in = fs.open(absolutePath); // large file in HDFS
        try {
            ++nEntries;
            int bytesRead = in.read(buf);
            while (bytesRead >= 0) {
                out.write(buf, 0, bytesRead);
                bytesRead = in.read(buf);
            }
        } finally {
            in.close();
        }
    }
    out.closeEntry();

Any ideas? Am I missing something in the way I set up the TarOutputStream or TarEntry? Or does the tar format itself have an implicit per-entry size limit that is never going to work for multi-gigabyte files?

Thanks!
Frank
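
P.S. In case it helps frame the question: below is a minimal sketch of the workaround I’m considering if the Ant classes really can’t go past 8 GiB, switching to Apache Commons Compress. I’m assuming a version of commons-compress new enough to have TarArchiveOutputStream.setBigNumberMode() and the BIGNUMBER_POSIX / LONGFILE_POSIX constants; I haven’t verified that against my classpath yet, and everything other than the commons-compress and Hadoop class names is made up for the example.

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
    import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BigEntryTar {
        // Tars one HDFS file into rawOut, writing the entry size via a PAX
        // extended header so entries larger than 8589934591 bytes (the
        // 11-octal-digit ustar limit) don't overflow the header field.
        public static void archive(FileSystem fs, Path src, String entryName,
                                   OutputStream rawOut) throws IOException {
            TarArchiveOutputStream out = new TarArchiveOutputStream(rawOut);
            out.setBigNumberMode(TarArchiveOutputStream.BIGNUMBER_POSIX); // sizes > 8 GiB via PAX
            out.setLongFileMode(TarArchiveOutputStream.LONGFILE_POSIX);   // long names via PAX too

            FileStatus status = fs.getFileStatus(src);
            TarArchiveEntry entry = new TarArchiveEntry(entryName);
            entry.setSize(status.getLen());
            entry.setUserName(status.getOwner());
            entry.setGroupName(status.getGroup());
            out.putArchiveEntry(entry);

            // Stream the HDFS file into the archive in 64 KB chunks.
            byte[] buf = new byte[64 * 1024];
            InputStream in = fs.open(src);
            try {
                int n;
                while ((n = in.read(buf)) >= 0) {
                    out.write(buf, 0, n);
                }
            } finally {
                in.close();
            }
            out.closeArchiveEntry();
            out.finish();
        }
    }

Does that look like the right direction, or is there a way to get the Ant tar classes to write entries this large?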