Hi,

I think I have found a bug in the new FileChannels.contentEquals(...) in
Commons IO 2.15.0, which in turn affects RandomAccessFiles.contentEquals(...),
PathUtils.fileContentEquals(...), FileUtils.contentEquals(...), and possibly
other methods. Before opening an issue in the ASF JIRA, I would like to
present my findings here.

My current working hypothesis:
If two files have exactly the same size and exactly the same content in
their last 8192 bytes (the value of IOUtils.DEFAULT_BUFFER_SIZE), then all
of the above methods return true, even if the files differ before those
last 8192 bytes.

Here is some example code:

import java.io.File;
import java.nio.charset.StandardCharsets;

import org.apache.commons.io.FileUtils;
import org.apache.commons.lang3.StringUtils; // StringUtils is from Commons Lang

// create two files with the same size but different content
// (3 different bytes followed by 8192 equal bytes)
File file1 = new File("file1.txt");
File file2 = new File("file2.txt");

String sameContent = StringUtils.repeat("x", 8192);

String content1 = "ABC" + sameContent;
String content2 = "XYZ" + sameContent;
FileUtils.writeStringToFile(file1, content1, StandardCharsets.US_ASCII);
FileUtils.writeStringToFile(file2, content2, StandardCharsets.US_ASCII);

// compare files (both are 8195 bytes long)
boolean equals = FileUtils.contentEquals(file1, file2);
System.out.println(equals);

I would expect this to print "false" because the first 3 bytes differ,
but the code prints "true". I tested with Eclipse Temurin 11.0.1, 17.0.7,
and 21.0.1, all on macOS.
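
As a cross-check that does not go through Commons IO at all, the JDK's own
Files.mismatch(Path, Path) reports the position of the first differing byte
(or -1 if the contents are identical). It is only available since Java 12,
so this would not run on the 11.0.x build. A minimal sketch using the same
file contents as above; the class and method names are just for illustration:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class MismatchCheck {
    // Writes the same two files as the reproduction (3 differing bytes,
    // then 8192 equal bytes) into the given directory and returns the
    // position of the first differing byte, or -1 if they are identical.
    static long firstMismatch(Path dir) throws IOException {
        String sameContent = "x".repeat(8192);
        Path file1 = Files.writeString(dir.resolve("file1.txt"), "ABC" + sameContent, StandardCharsets.US_ASCII);
        Path file2 = Files.writeString(dir.resolve("file2.txt"), "XYZ" + sameContent, StandardCharsets.US_ASCII);
        return Files.mismatch(file1, file2); // requires JDK 12+
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstMismatch(Path.of("."))); // prints 0: 'A' != 'X'
    }
}
```

Since the JDK finds a difference at byte 0, the files really do differ and
the "true" from FileUtils.contentEquals(...) is not caused by the test setup.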

I'm not an expert on FileChannels, but I think the problem is related to
the call to FileChannel.read(ByteBuffer) in
org.apache.commons.io.channels.FileChannels. Before this call, the buffer's
position is 0; after it, the position is 8192. Since the limit is also
8192, no bytes remain, and because ByteBuffer.equals(...) only compares the
remaining bytes (from position to limit), it effectively compares nothing.
Maybe there should be a call to ByteBuffer.rewind() or
ByteBuffer.position(0) after the read, so that the position is reset to 0
before the contents of the buffers are compared? But as stated before, I'm
not an expert on this topic, and I'm not sure whether my hypothesis is 100%
accurate. It is, however, the conclusion I reached after some experiments.
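
To illustrate the position/limit effect in isolation, here is a minimal
JDK-only sketch. It does not use the Commons IO code: ByteBuffer.put(...)
stands in for FileChannel.read(...), since both write at the current
position and advance it. Two fully written buffers with different content
compare as equal until they are flipped. ByteBuffer.flip() may be a better
fit than rewind() here, because it also sets the limit to the number of
bytes actually written, which matters when a read does not fill the whole
buffer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ByteBufferPositionDemo {
    // Fills a buffer the way a read would: bytes are written at the current
    // position, and the position ends up at the limit (buffer is full).
    static ByteBuffer filled(String content) {
        ByteBuffer buffer = ByteBuffer.allocate(content.length());
        buffer.put(content.getBytes(StandardCharsets.US_ASCII));
        return buffer;
    }

    public static void main(String[] args) {
        ByteBuffer a = filled("ABC");
        ByteBuffer b = filled("XYZ");

        // equals() only compares the remaining bytes (position..limit): none here
        System.out.println(a.remaining()); // 0
        System.out.println(a.equals(b));   // true, although the contents differ

        // flip() sets limit = position and position = 0, exposing the written bytes
        a.flip();
        b.flip();
        System.out.println(a.remaining()); // 3
        System.out.println(a.equals(b));   // false
    }
}
```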

Best regards,
Ste
