-1 for now, sorry.

Reviewed:

* signatures OK
* checksums  OK
* licences NOK
* reproducibility from sources


I think the .gitmodules problem should be solved, and the lack of an
explicit -source.tar.gz is not really good either, I think.

Several reasons:

1) Lack of explicit source package (this is "almost -1" for me, because
formally speaking the .sdist package fulfills the letter of the source
package requirement, but IMHO it does not necessarily fulfill the spirit).

I think it's not very clear which package is "source" and which are
"convenience/binary" packages. From what I see, the .tar.gz is **something
between** source package and the .sdist. It **looks** like an sdist package
(with PKG_INFO) - but also it contains "tests" - which is unusual for sdist
packages (however there is a big debate about it  [1]). The requirement for
"source" packages published by the ASF is that it contains all the sources
needed to build code and tests [2] (which your .sdist file has, so that's
cool) - it seems to some extent it follows the expectation. I think it must
be clear which of the packages is "-source" one and naming it like that and
keeping it separate from .sdist is a good idea.

We also in Airflow - for quite a while - took some of our .sdist files as
"source" releases when we released only some of the distributions that are
part of the monorepo. When we did it in the past, we explicitly mentioned
in our emails that those .sdist packages are the "source" packages as
expected by the ASF [3]. But eventually we entirely gave up on it (a few
weeks ago), because we opted to include essentially **everything** that is
in our source repo (we are essentially using git archive to produce the
-source.tar.gz). The main reason was that when we released **only**
.sdists, some of our important code (such as sources for docs) was not
published.

The .sdist of yours misses quite a number of files from the repo:

* a large number of examples
* docs sources - I think this is an important miss - while docs are
* telemetry folder
* .github and .gitmodules (are those gitmodules necessary to build the
project?)

It's likely that those files are excluded deliberately and are something
that you do not **want** to release at all, but I find it a bit strange to
remove docs and many examples. It seems that those who unpack sources from
the official source package cannot do all the same things as people who
check it out from the repo TAG. If someone takes it as "source" and never
looks at the GitHub repo - they will miss important sources (like docs
sources) that IMHO the users **should** have. Generally users should be
able to do the same with the "-source.tar.gz" as what they can when they
do `git checkout TAG` in your repo.

The AI-generated (undoubtedly, but that's ok ;) documentation in README.md
describes what goes in and out but it does not explain WHY. I think if you
**really** want to exclude some files from your source distribution you
should explain WHY in the documentation.

Just to add a bit of context. You might think that the "-source.tar.gz"
file is not that important, as nearly nobody will use it. Which is a fair
assessment ("nearly nobody") - but those who do are the important users -
those are downstream packagers, who might want to include burr in distros
for example. Many of the distros that are out there use the officially
signed and checksummed packages to build and install their packages. For
example this is what conda might want to do. Or Debian maintainers. Those
are important users and we need to make sure that they can do it easily.
The safest bet IMHO is to produce an explicit "-source.tar.gz" as a "git
archive" result - and not exclude things that you would normally commit to
the repo (note that you can have generated code committed to your repo,
while the rule is "no compiled code in your repo" - so that would probably
be the only thing to exclude, if your build process rebuilds those
generated files automatically). This can be done via .gitattributes, as we
do in Airflow [4].
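
To illustrate the "git archive" approach, here is a minimal sketch (an
assumption of how it could look, not the actual Airflow or Burr release
script). The function names, the prefix and the output file name are
hypothetical. Files marked `export-ignore` in .gitattributes are skipped by
git itself, so the script needs no exclusion logic of its own.

```python
import subprocess


def git_archive_command(tag: str, prefix: str, output: str) -> list[str]:
    """Assemble a `git archive` call that exports everything committed at `tag`.

    Paths carrying the `export-ignore` attribute in .gitattributes are
    left out by git, everything else in the tag ends up in the tarball.
    """
    return [
        "git", "archive",
        "--format=tar.gz",        # gzip-compressed tarball
        f"--prefix={prefix}/",    # top-level directory inside the archive
        "-o", output,             # output file
        tag,
    ]


def make_source_tarball(tag: str, prefix: str, output: str) -> None:
    """Run the command; must be called inside a git checkout with git on PATH."""
    subprocess.run(git_archive_command(tag, prefix, output), check=True)
```

For example, `make_source_tarball("v0.41.0", "apache-burr-0.41.0-incubating",
"apache-burr-0.41.0-incubating-source.tar.gz")` (hypothetical file name)
would produce a tarball that matches what `git checkout TAG` gives, minus
the export-ignored paths.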

2) The .gitmodules thing is the final reason why I gave -1. It's not clear
to me whether the BentoBurr project mentioned there is needed to build the
project or not. That project is not only archived, but also misses LICENCE
information, so while it is actually **excluded** from the .sdist package,
I think it should be either removed from the repo or included in
-source.tar.gz - generally an ASF project should not depend on any project
which has an unknown licence.

3) At least in Airflow we are using `shasum -a 512 FILE` and it produces
SHASUM + name of the file, which I think is a good idea to have in the
.sha512 file. Something that can be improved in the future.

The checksums match, but when I diff them against what shasum produces, I
get this:

< 77ad9cf9ddf508645d094ae18efce76482ff86339ffd2cd9dfe46af5d0545bdfa949c00ccc7beb3f6ae5f2c65523cc1a3db9a7425921c86fde5c4d54eb893111  apache_burr-0.41.0-py3-none-any.whl
---
> 77ad9cf9ddf508645d094ae18efce76482ff86339ffd2cd9dfe46af5d0545bdfa949c00ccc7beb3f6ae5f2c65523cc1a3db9a7425921c86fde5c4d54eb893111
Checking apache-burr-0.41.0-incubating.tar.gz.sha512
1c1
< 2e755584eb71fcede377d92f67024e3694cee4729da55e8b8d5b8739388c9046438e40cd2428003cca1e11a7b40abb897371d608db1ce3c0638d266c3de2c50a  apache-burr-0.41.0-incubating.tar.gz
---
> 2e755584eb71fcede377d92f67024e3694cee4729da55e8b8d5b8739388c9046438e40cd2428003cca1e11a7b40abb897371d608db1ce3c0638d266c3de2c50a
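
For reference, a minimal sketch of producing the "SHASUM + name of the
file" line without shasum, using only the Python standard library. The
function names are hypothetical; the only assumption about the format is
the digest, two spaces, then the file name, which is what `shasum -a 512
FILE` prints in its default mode.

```python
import hashlib
from pathlib import Path


def format_sha512_line(data: bytes, name: str) -> str:
    """Return '<hex digest>  <file name>' for the given content."""
    return f"{hashlib.sha512(data).hexdigest()}  {name}"


def sha512_line_for_file(path: Path) -> str:
    """Same line for a file on disk (reads the whole file into memory)."""
    return format_sha512_line(path.read_bytes(), path.name)


def digest_matches(checksum_text: str, data: bytes) -> bool:
    """Compare the first token of a checksum file with the real digest.

    Works both for files that contain only the digest and for files that
    contain the digest followed by the file name.
    """
    expected = checksum_text.split()[0].lower()
    return expected == hashlib.sha512(data).hexdigest()
```

Writing the output of `sha512_line_for_file` into the .sha512 file would
make a plain diff against the shasum output come out empty.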

4) Files with unknown licences in the .sdist file (since it looks like
-source). This is also quite a hard -1 because of the .tpl file.

There are a number of files with unapproved licenses (I unpacked the
.tar.gz, then downloaded Apache RAT 0.17 from
https://dist.apache.org/repos/dist/release/creadur/apache-rat-0.17/ and ran
it on the directory). While I understand why .jsonl files do not have a
licence header (json cannot contain comments), the best way to deal with
that is to add a .rat-excludes file in your repo - see the Airflow one [5]
- and make it part of the source package. This way you can add
-E .rat-excludes and it will exclude those files from the check. The .tpl
file seems to be a Jinja template, and those files allow for comments and
can easily embed license information that will be excluded from the final
generated json file.

! Unapproved:         23    A count of unapproved licenses.
! /burr/tracking/server/demo_data/demo_chatbot/chat-1-giraffe/log.jsonl
! /burr/tracking/server/demo_data/demo_chatbot/chat-2-geography/log.jsonl
! /burr/tracking/server/demo_data/demo_chatbot/chat-3-physics/log.jsonl
! /burr/tracking/server/demo_data/demo_chatbot/chat-4-philosophy/log.jsonl
! /burr/tracking/server/demo_data/demo_chatbot/chat-5-jokes/log.jsonl
! /burr/tracking/server/demo_data/demo_chatbot/chat-6-demonstrate-errors/log.jsonl
! /burr/tracking/server/demo_data/demo_chatbot_with_traces/chat-1-giraffe/log.jsonl
! /burr/tracking/server/demo_data/demo_chatbot_with_traces/chat-2-geography/log.jsonl
! /burr/tracking/server/demo_data/demo_chatbot_with_traces/chat-3-physics/log.jsonl
! /burr/tracking/server/demo_data/demo_chatbot_with_traces/chat-4-philosophy/log.jsonl
! /burr/tracking/server/demo_data/demo_chatbot_with_traces/chat-5-jokes/log.jsonl
! /burr/tracking/server/demo_data/demo_chatbot_with_traces/chat-6-demonstrate-errors/log.jsonl
! /burr/tracking/server/demo_data/demo_conversational-rag/rag-1-food/log.jsonl
! /burr/tracking/server/demo_data/demo_conversational-rag/rag-2-work-history/log.jsonl
! /burr/tracking/server/demo_data/demo_conversational-rag/rag-3-activities/log.jsonl
! /burr/tracking/server/demo_data/demo_conversational-rag/rag-4-everything/log.jsonl
! /burr/tracking/server/demo_data/demo_counter/count-to-1/log.jsonl
! /burr/tracking/server/demo_data/demo_counter/count-to-10/log.jsonl
! /burr/tracking/server/demo_data/demo_counter/count-to-100/log.jsonl
! /burr/tracking/server/demo_data/demo_counter/count-to-42/log.jsonl
! /burr/tracking/server/demo_data/demo_counter/count-to-50/log.jsonl
! /burr/tracking/server/s3/deployment/terraform/templates/ecs/burr_app.json.tpl

5) Bad naming of the `sdist` file.

I am not sure how you produced the .sdist file (again - no release
instructions) but when I tried to build it and compare what's in my .sdist
and your .sdist, I got quite a different result because the name of my
package (I tried it with flit, hatch and build) is (correctly)
*apache_burr-0.41.0-incubating.tar.gz* and yours was
*apache-burr-0.41.0-incubating.tar.gz*. We used to have the same in Airflow
and it caused us some serious problems when it comes to links to our .sdist
packages, and the general difference of .whl vs. sdist. **Some** old
tooling used to produce such names (old setuptools and old flit) but this
has since been properly implemented by both. The thing is that the .sdist
file name SHOULD contain the normalized distribution name - which replaces
all sequences of "-", "_" and "." with a single "_" and lowercases the
result [6] (unlike package names in PyPI, this follows the binary wheel
naming normalization, which uses "_" rather than "-" in the package name
[7]).
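
The normalization rule is small enough to sketch in a few lines. This is
only an illustration of the rule described in [6] and [7], with
hypothetical function names; it does not validate or normalize the version
part of the file name.

```python
import re


def normalize_sdist_name(project_name: str) -> str:
    """Collapse runs of '-', '_' and '.' into a single '_' and lowercase."""
    return re.sub(r"[-_.]+", "_", project_name).lower()


def has_normalized_name(filename: str, project_name: str) -> bool:
    """Check that an sdist file name starts with the normalized name plus '-'."""
    return filename.startswith(normalize_sdist_name(project_name) + "-")
```

With this, `has_normalized_name("apache_burr-0.41.0-incubating.tar.gz",
"apache-burr")` holds, while the same check on
`apache-burr-0.41.0-incubating.tar.gz` fails.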

6) Easier setup of the env

I noticed some small issue with the env when preparing the release -
missing `cli` extra when setting up the venv to build release. I fixed it
in [8] - also proposed a small addition of dev dependency group (might
split it if needed) and proposed that you might use some more modern
standardised features of packaging like dependency groups and inline script
metadata. See details in the PR - we can discuss it there.

7) Reproducibility from sources:

I tried to rebuild both the .sdist and .whl packages following the
instructions. Initially I had not compiled the UI and got the UI files
missing (of course) - I understand that full automation with a custom build
hook is deferred for later (which is OK) - but (as expected) the files in
the package have different mtime. This can be easily fixed by hard-coding
the SOURCE_DATE_EPOCH variable before the build [9], and since you are
already using instructions and scripts, that should be an easy addition to
your docs. In Airflow we have a prek hook that automatically regenerates
the date when release notes change, but at the beginning the mtime to be
used can simply be hard-coded to basically any date. This way whoever
follows your release process will get closer to a truly reproducible
package and diffoscope will start showing useful diffs in case there are
some [10].
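
To show why the mtime matters, here is a small self-contained sketch. It
is not how flit builds packages; it only demonstrates, with the standard
`tarfile` module and hypothetical helper names, that two archives with the
same content are byte-identical when the mtime is pinned and differ when it
is not.

```python
import io
import os
import tarfile


def source_date_epoch(default: int = 0) -> int:
    """Read SOURCE_DATE_EPOCH from the environment, falling back to `default`."""
    return int(os.environ.get("SOURCE_DATE_EPOCH", default))


def build_tar(files: dict[str, bytes], mtime: int) -> bytes:
    """Build an uncompressed tar in memory with a fixed mtime for every member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):          # stable member order
            info = tarfile.TarInfo(name)    # uid/gid/owner default to 0/""
            info.size = len(files[name])
            info.mtime = mtime              # the only timestamp in the header
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()
```

Calling `build_tar(files, source_date_epoch())` twice with the same
variable set gives identical bytes, so any remaining diffoscope output
points at real content differences.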

Summary of things:

MUST:
* add licence to the .tpl file - 4)
* explain (or likely remove) the .gitmodules BentoBurr reference - 2)
* explicit rules in docs about why you exclude certain files from the
source package - 1)
* separate -source.tar.gz package with all files, including docs (subject
to the rules about exclusion above) - 1)

SHOULD:
* proper naming of sdist artifacts (with _) (simply needs newer flit and a
doc update) - 5)
* add .rat-excludes that will allow using RAT to verify the official
source packages - 4)

NICE TO HAVE:
* shasum with filename - 3)
* simplify the env setup with inline metadata, dev dependency groups
(those are already supported in uv, hatch and others) - 6)
* reproducibility setup - 7)



[1] Debate about whether "tests" and "docs" should be included in .sdist
https://discuss.python.org/t/should-sdists-include-docs-and-tests/14578/26
[2] What should be included in source packages of ASF -
https://www.apache.org/legal/release-policy.html#source-packages
[3] Example email where Airflow PMC explicitly pointed to .sdist packages
being "source" packages (see the description of .sdist files)
https://lists.apache.org/thread/8ob972qkd7sy6k1pn5nskc2x0yjx2t2y
[4] The .gitattributes file in Airflow repo
https://github.com/apache/airflow/blob/main/.gitattributes
[5] RAT excludes in Airflow repo
https://github.com/apache/airflow/blob/main/.rat-excludes
[6] PEP-625 Filename of a Source Distribution -
https://peps.python.org/pep-0625/
[7] Binary packages distribution name normalization -
https://packaging.python.org/en/latest/specifications/binary-distribution-format/#escaping-and-unicode
[8] PR to fix missing cli extra and improving dev-env to use it
https://github.com/apache/burr/pull/604
[9] Flit reproducibility https://flit.pypa.io/en/stable/reproducible.html
[10] Diffoscope - tool to show reproducibility issues
https://diffoscope.org/

J.




On Sun, Nov 30, 2025 at 5:02 AM Elijah ben Izzy <
[email protected]> wrote:

> Hi all! Trying again!
>
>
> This is a call for a vote on releasing Apache Burr 0.41.0-incubating
> Release Candidate 2.
>
> This release includes the following changes (see CHANGELOG for details).
> See all commits since prior release:
> - https://github.com/apache/burr/compare/burr-0.40.2...main
>
> Key changes include:
> - pool-based async PG persister
> - multiple UI updates
> - Apache compatible licenses/build processes
> - bug fixes, typing, etc...
>
> The artifacts for this release candidate can be found at:
>
> https://dist.apache.org/repos/dist/dev/incubator/burr/0.41.0-incubating-RC2/
>
> The Git tag to be voted upon is: v0.41.0
>
> The release hash is 11783ba58f8c5bd161118976ced791a2f5bd78f3
>
> Release artifacts are signed with the following key:
> BB8B72B34AB9A664A109AA17A76CF4C80E4E5355
> The KEYS file is available at:
> https://downloads.apache.org/incubator/burr/KEYS
>
> Please download, verify, and test the release candidate. For testing use
> your best judgement. The following may suffice:
>
> 1. Build/run the UI following the instructions in scripts/README.md
> 2. Run the tests in tests/
> 3. Import into a jupyter notebook and play around
>
> Highly encourage you to pip install from source, run `burr` and play with
> the UI (some UI bugs I recently discovered will be filed)
>
> The vote will run for a minimum of 72 hours.
> Please vote:
>
> [ ] +1 Release this package as Apache Burr 0.41.0-incubating
> [ ] +0 No opinion
> [ ] -1 Do not release this package because... (Please provide a reason)
>
> Checklist for reference:
> [ ] Download links are valid.
> [ ] Checksums and signatures.
> [ ] LICENSE/NOTICE files exist
> [ ] No unexpected binary files
> [ ] All source files have ASF headers
> [ ] Can compile from source
>
> On behalf of the Apache Burr PPMC,
>
> Elijah ben Izzy ([email protected])
>
