[ 
https://issues.apache.org/jira/browse/CASSANDRA-20495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17939257#comment-17939257
 ] 

Stefan Miklosovic edited comment on CASSANDRA-20495 at 3/28/25 3:45 PM:
------------------------------------------------------------------------

I have used a tool called "diffoscope" (1). Debian seems to use it a lot in its 
reproducible builds effort.

We have two artefacts to check: the src tarball and the bin tarball.

The src tarball does not look bad (HTML report in the attachments). As far as I 
can tell, it differs only in the timestamps at which the documentation files 
were created in each build. However, the resulting tarballs differ in size and 
it is not obvious why.

{code}
$ ls -la | grep src
-rw-rw-r--   1 fermat fermat 25779851 mar 28 14:37 apache-cassandra-5.1-SNAPSHOT-src-2.tar.gz
-rw-rw-r--   1 fermat fermat 25779819 mar 28 14:34 apache-cassandra-5.1-SNAPSHOT-src.tar.gz
{code}
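
To separate metadata-only differences from real content differences in the two 
src tarballs, a check along these lines might help. It is only a sketch and not 
what diffoscope does internally; the function names tar_index and 
compare_tarballs are made up for illustration. It also does not explain the 
size difference of the compressed files by itself, since the gzip layer is 
ignored here.

```python
import hashlib
import tarfile


def tar_index(path):
    """Map member name -> (size, mtime, sha256 of content) for a tarball."""
    index = {}
    # "r:*" lets tarfile detect the compression (gzip here) transparently.
    with tarfile.open(path, "r:*") as tar:
        for member in tar:
            digest = None
            if member.isfile():
                digest = hashlib.sha256(tar.extractfile(member).read()).hexdigest()
            index[member.name] = (member.size, member.mtime, digest)
    return index


def compare_tarballs(path_a, path_b):
    """Return (only_in_one, content_diffs, mtime_only_diffs) as sorted name lists."""
    a, b = tar_index(path_a), tar_index(path_b)
    only_in_one = sorted(set(a) ^ set(b))
    content_diffs, mtime_only = [], []
    for name in sorted(set(a) & set(b)):
        (size_a, mtime_a, sha_a), (size_b, mtime_b, sha_b) = a[name], b[name]
        if (size_a, sha_a) != (size_b, sha_b):
            content_diffs.append(name)   # the file bytes really differ
        elif mtime_a != mtime_b:
            mtime_only.append(name)      # same bytes, different timestamp
    return only_in_one, content_diffs, mtime_only
```

Calling compare_tarballs("apache-cassandra-5.1-SNAPSHOT-src.tar.gz", 
"apache-cassandra-5.1-SNAPSHOT-src-2.tar.gz") would then list which members 
differ only in mtime and which differ in content.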

The binary tarballs differ in size as well:

{code}
$ ls -la | grep bin
-rw-rw-r--   1 fermat fermat 72254180 mar 28 14:52 apache-cassandra-5.1-SNAPSHOT-bin-2.tar.gz
-rw-rw-r--   1 fermat fermat 72254203 mar 28 14:51 apache-cassandra-5.1-SNAPSHOT-bin.tar.gz
{code}

The HTML report says that the timestamps differ here too and that 
apache-cassandra-5.1-SNAPSHOT.jar, interestingly, differs across builds as well:

12072829 2025-03-28·13:51:41.000000 apache-cassandra-5.1-SNAPSHOT/lib/apache-cassandra-5.1-SNAPSHOT.jar
12072827 2025-03-28·13:34:38.000000 apache-cassandra-5.1-SNAPSHOT/lib/apache-cassandra-5.1-SNAPSHOT.jar

Apart from the timestamps, the two jars differ in size by 2 bytes. Huh ...

The output mentions:

{code}
apache-cassandra-5.1-SNAPSHOT/lib/apache-cassandra-5.1-SNAPSHOT.jar
Command `'zipdetails --redact --scan --utc {}'` failed with exit code 255. Standard output:
    Unknown option: redact
    Unknown option: utc
    Invalid command line option
    
    
    zipdetails [OPTIONS] file
    
    Display details about the internal structure of a Zip file.
    
    This is zipdetails version 2.02  [...]
Archive contents identical but files differ, possibly due to different compression levels. Falling back to binary comparison.
{code}

I think that either the jar was compressed slightly differently, or the 
compiler produces a slightly different jar at the binary level each time.
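
One way to test this hypothesis without a working zipdetails might be to 
compare the two jars entry by entry. The sketch below is an assumption-laden 
illustration, not a replacement for diffoscope: zip_index and compare_jars are 
hypothetical names, and equal CRC-32 plus equal uncompressed size is treated as 
"same content", which is a strong hint rather than a proof. It also ignores 
entry order and extra fields, which could equally account for a 2-byte 
difference.

```python
import zipfile


def zip_index(path):
    """Map entry name -> (CRC-32, uncompressed size, compressed size, timestamp)."""
    with zipfile.ZipFile(path) as zf:
        return {
            info.filename: (info.CRC, info.file_size, info.compress_size, info.date_time)
            for info in zf.infolist()
        }


def compare_jars(path_a, path_b):
    """Classify entries present in both archives by what differs between them."""
    a, b = zip_index(path_a), zip_index(path_b)
    report = {"content": [], "compressed_size_only": [], "timestamp_only": []}
    for name in sorted(set(a) & set(b)):
        crc_a, size_a, csize_a, time_a = a[name]
        crc_b, size_b, csize_b, time_b = b[name]
        if (crc_a, size_a) != (crc_b, size_b):
            report["content"].append(name)               # class/resource bytes differ
        elif csize_a != csize_b:
            report["compressed_size_only"].append(name)  # same bytes, packed differently
        elif time_a != time_b:
            report["timestamp_only"].append(name)        # only the entry mtime differs
    report["only_in_one"] = sorted(set(a) ^ set(b))
    return report
```

If "content" comes back empty while "compressed_size_only" does not, that would 
point at the compression rather than the compiler output.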

(1) https://diffoscope.org/


> Investigate what blocks us from having bit-by-bit reproducible builds / 
> release tarballs
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-20495
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20495
>             Project: Apache Cassandra
>          Issue Type: Task
>            Reporter: Stefan Miklosovic
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>         Attachments: diffoscope-cassandra-bin-tarball.html, 
> diffoscope-cassandra-src-tarball.html
>
>



