@Jarek -- thanks, this is very clear (and absolutely worth getting right even if it means a delay in release).
I want to sum up just to make sure I understand the high level. Here are the themes I'm picking up:

1. License on everything -- JSON is the exception, but that's what .rat-excludes is for (the .tpl file looked like just another JSON file, which I think is why I missed it)
2. Underscores versus dashes, for consistency and to avoid trouble later on
3. A consistent, documented internal opinion on what *is* and *isn't* source
4. Clean up weird stuff (i.e. the BentoBurr submodule, which probably should just be removed from the repo)
5. Anything that makes development + verification easier for the developer (and any downstream consumer of the source)
6. More documentation overall

I think (3) and (6) are pretty big value-adds, i.e. where I should focus some time.

High-level, for this project, I want to throw this out:

1. Docs are *not* source -- not included in the distribution
2. Tests *are* source -- why? They let the developer download + run them as a self-verification step

This is pretty justifiable IMO:

*Our best practice across the various projects I maintain is to always run the tests from the source on the installed wheel. For each Python version and platform in our CI matrix, we do a clean sdist and wheel build (with build, of course), install the wheel in a clean env, and then run the tests from the source against that, using either a src dir, tox and/or python -I, plus pytest --import-mode=importlib, to ensure isolation from the source tree and that we’re always using the installed copy. It isn’t as important that the tests themselves work when packaged for end-distribution, but rather that the code under test works.*

Going to take a bit of time later this week to prep this and might reach out with more questions. Otherwise I'll also be reading over the resources to ensure that nothing slipped through.

Cheers,
Elijah

On Sun, Nov 30, 2025 at 8:31 AM Jarek Potiuk <[email protected]> wrote:
> -1 for now, sorry.
>
> Reviewed:
>
> * signatures OK
> * checksums OK
> * licences NOK
> * reproducibility from sources
>
> I think there is the .gitmodules problem that should be solved; also, the lack of an explicit -source.tar.gz is not really good, I think.
>
> Several reasons:
>
> 1) Lack of an explicit source package (this is "almost -1" for me, because formally speaking the .sdist package fulfils the letter of the source package requirement, but IMHO it does not necessarily fulfil the spirit).
>
> I think it's not very clear which package is "source" and which are "convenience/binary" packages. From what I see, the .tar.gz is **something between** a source package and an sdist. It **looks** like an sdist package (with PKG-INFO), but it also contains "tests", which is unusual for sdist packages (however, there is a big debate about it [1]). The requirement for "source" packages published by the ASF is that they contain all the sources needed to build the code and tests [2] (which your .sdist file has, so that's cool), so to some extent it follows the expectation. I think it must be clear which of the packages is the "-source" one, and naming it like that and keeping it separate from the .sdist is a good idea.
>
> In Airflow we also, for quite a while, treated some of our .sdist files as "source" releases when we released only some of the distributions that are part of the monorepo. When we did that, we explicitly mentioned in our emails that those .sdist packages are the "source" packages as expected by the ASF [3]. But eventually (a few weeks ago) we gave up on it entirely, because we opted to include essentially **everything** that is in our source repo (we are essentially using git archive to produce the -source.tar.gz). The main reason was that if we **only** released .sdists, some of our important code (such as the sources for docs) was not published.
>
> Your .sdist misses quite a number of files from the repo:
>
> * a large number of examples
> * docs sources - I think this is an important miss - while docs are
> * the telemetry folder
> * .github and .gitmodules (are those gitmodules necessary to build the project?)
>
> It's likely that those files are excluded deliberately and are something that you do not **want** to release at all, but I find it a bit strange to remove docs and many examples. It seems that those who unpack the sources from the official source package cannot do all the same things as people who check out the repo at the release TAG. If someone takes it as "source" and never looks at the GitHub repo, they will miss important sources (like the docs sources) that IMHO users **should** have. Generally, users should be able to do the same with the "-source.tar.gz" as they can when they do `git checkout TAG` in your repo.
>
> The AI-generated (undoubtedly, but that's OK ;) documentation in README.md describes what goes in and out, but it does not explain WHY. I think if you **really** want to exclude some files from your source distribution, you should explain WHY in the documentation.
>
> Just to add a bit of context: you might think that the "-source.tar.gz" file is not that important, as nearly nobody will use it. That is a fair assessment ("nearly nobody"), but those who do are the important users: downstream packagers, who might want to include Burr in distros, for example. Many of the distros out there use the officially signed and checksummed packages to build and install their packages. For example, this is what conda might want to do. Or Debian maintainers. Those are important users and we need to make sure that they can do it easily.
> The safest bet IMHO is to produce an explicit "-source.tar.gz" as a "git archive" result, and not to exclude things that you would normally commit to the repo (note that you can have generated code committed to your repo, and there is "no compiled code in your repo", so generated code would probably be the only thing to exclude, if your build process regenerates those files automatically). In Airflow this is done via .gitattributes [4].
>
> 2) The .gitmodules thing is the final reason why I gave -1. I am not sure (it's not clear) whether the BentoBurr submodule is needed to build the project or not. That project is not only archived but also lacks LICENCE information, so while it is actually **excluded** from the .sdist package, I think it should be either removed from the repo or included in the -source.tar.gz. Generally, an ASF project should not depend on any project with an unknown licence.
>
> 3) At least in Airflow we are using `shasum -a 512 FILE`, which produces the SHA sum + the name of the file; I think that is a good thing to have in the .sha512 file. Also something that can be improved in the future.
>
> The shasums are good, but when I diff them against what shasum produces, we get this:
>
> < 77ad9cf9ddf508645d094ae18efce76482ff86339ffd2cd9dfe46af5d0545bdfa949c00ccc7beb3f6ae5f2c65523cc1a3db9a7425921c86fde5c4d54eb893111  apache_burr-0.41.0-py3-none-any.whl
> ---
> > 77ad9cf9ddf508645d094ae18efce76482ff86339ffd2cd9dfe46af5d0545bdfa949c00ccc7beb3f6ae5f2c65523cc1a3db9a7425921c86fde5c4d54eb893111
> Checking apache-burr-0.41.0-incubating.tar.gz.sha512
> 1c1
> < 2e755584eb71fcede377d92f67024e3694cee4729da55e8b8d5b8739388c9046438e40cd2428003cca1e11a7b40abb897371d608db1ce3c0638d266c3de2c50a  apache-burr-0.41.0-incubating.tar.gz
> ---
> > 2e755584eb71fcede377d92f67024e3694cee4729da55e8b8d5b8739388c9046438e40cd2428003cca1e11a7b40abb897371d608db1ce3c0638d266c3de2c50a
>
> 4) Files with unknown licences in the .sdist file (since it looks like -source). This is also quite a hard -1 because of the .tpl file.
>
> There are a number of files with unapproved licenses (I unpacked the .tar.gz, downloaded https://dist.apache.org/repos/dist/release/creadur/apache-rat-0.17/ and ran it on the directory). While I understand why the .jsonl files do not have a licence (JSON cannot contain comments), the best way to deal with that is to add a .rat-excludes file to your repo (see the Airflow one [5]) and make it part of the source package. This way you can add -E .rat-excludes and it will exclude those files from the check. The .tpl file seems to be a Jinja template; those files allow comments and can easily embed licence information that will be excluded from the final generated JSON file.
>
> ! Unapproved: 23 A count of unapproved licenses.
> ! /burr/tracking/server/demo_data/demo_chatbot/chat-1-giraffe/log.jsonl
> ! /burr/tracking/server/demo_data/demo_chatbot/chat-2-geography/log.jsonl
> ! /burr/tracking/server/demo_data/demo_chatbot/chat-3-physics/log.jsonl
> ! /burr/tracking/server/demo_data/demo_chatbot/chat-4-philosophy/log.jsonl
> ! /burr/tracking/server/demo_data/demo_chatbot/chat-5-jokes/log.jsonl
> ! /burr/tracking/server/demo_data/demo_chatbot/chat-6-demonstrate-errors/log.jsonl
> ! /burr/tracking/server/demo_data/demo_chatbot_with_traces/chat-1-giraffe/log.jsonl
> ! /burr/tracking/server/demo_data/demo_chatbot_with_traces/chat-2-geography/log.jsonl
> ! /burr/tracking/server/demo_data/demo_chatbot_with_traces/chat-3-physics/log.jsonl
> ! /burr/tracking/server/demo_data/demo_chatbot_with_traces/chat-4-philosophy/log.jsonl
> ! /burr/tracking/server/demo_data/demo_chatbot_with_traces/chat-5-jokes/log.jsonl
> ! /burr/tracking/server/demo_data/demo_chatbot_with_traces/chat-6-demonstrate-errors/log.jsonl
> ! /burr/tracking/server/demo_data/demo_conversational-rag/rag-1-food/log.jsonl
> ! /burr/tracking/server/demo_data/demo_conversational-rag/rag-2-work-history/log.jsonl
> ! /burr/tracking/server/demo_data/demo_conversational-rag/rag-3-activities/log.jsonl
> ! /burr/tracking/server/demo_data/demo_conversational-rag/rag-4-everything/log.jsonl
> ! /burr/tracking/server/demo_data/demo_counter/count-to-1/log.jsonl
> ! /burr/tracking/server/demo_data/demo_counter/count-to-10/log.jsonl
> ! /burr/tracking/server/demo_data/demo_counter/count-to-100/log.jsonl
> ! /burr/tracking/server/demo_data/demo_counter/count-to-42/log.jsonl
> ! /burr/tracking/server/demo_data/demo_counter/count-to-50/log.jsonl
> ! /burr/tracking/server/s3/deployment/terraform/templates/ecs/burr_app.json.tpl
>
> 5) Bad naming of the `sdist` file.
>
> I am not sure how you produced the .sdist file (again, no release instructions), but when I tried to build it and compare what's in my .sdist and your .sdist, they were quite different, because the name of my package (I tried it with the flit, hatch and build packages) is (correctly) *apache_burr-0.41.0-incubating.tar.gz* and yours was *apache-burr-0.41.0-incubating.tar.gz*.
> We used to have the same in Airflow, and it caused us some serious problems with links to our .sdist packages and a general mismatch between .whl and sdist names. **Some** old tooling used to produce such names (old setuptools and old flit), but this has since been properly implemented by both. The thing is that the .sdist package name SHOULD contain the normalized distribution name, where every run of "-", "_" and "." is replaced with a single "_" and the result is lowercased [6] (unlike package names on PyPI, this follows the binary wheel naming normalization, which uses "_" rather than "-" in the package name [7]).
>
> 6) Easier setup of the env
>
> I noticed a small issue with the env when preparing the release: the `cli` extra was missing when setting up the venv to build the release. I fixed it in [8], also proposed a small addition of a dev dependency group (I can split it out if needed), and proposed that you might use some more modern, standardised packaging features like dependency groups and inline script metadata. See details in the PR; we can discuss it there.
>
> 7) Reproducibility from sources:
>
> I tried to rebuild both the .sdist and .whl packages following the instructions. Initially I had not compiled the UI, so those files were missing (of course). I understand that full automation with a custom build hook is deferred for later (which is OK), but (as expected) the files in the package have different mtimes. This can easily be fixed by hard-coding the SOURCE_DATE_EPOCH variable before the build [9], and since you are already using instructions and scripts, that should be an easy addition to your docs. In Airflow we have a prek hook that automatically regenerates the date when the release notes change, but to begin with, the mtime can simply be hard-coded to basically any date.
> This way, whoever follows your release process will get closer to a truly reproducible package, and diffoscope will start showing useful diffs in case there are any [10].
>
> Summary of things:
>
> MUST:
> * add a licence header to the .tpl file - 4)
> * explain (or likely remove) the .gitmodules BentoBurr reference - 2)
> * explicit rules in the docs about why you exclude certain files from the source package - 1)
> * a separate -source.tar.gz package with all files, including docs (subject to the exclusion rules above) - 1)
>
> SHOULD:
> * proper naming of sdist artifacts (with _) (simply needs a newer flit and a doc update) - 5)
> * add .rat-excludes so that RAT can be used to verify the official source packages - 4)
>
> NICE TO HAVE:
> * shasum with filename - 3)
> * simplify the env setup with inline metadata and dev dependency groups (already supported in uv, hatch and others) - 6)
> * reproducibility setup - 7)
>
> [1] Debate about whether "tests" and "docs" should be included in .sdist - https://discuss.python.org/t/should-sdists-include-docs-and-tests/14578/26
> [2] What should be included in ASF source packages - https://www.apache.org/legal/release-policy.html#source-packages
> [3] Example email where the Airflow PMC explicitly pointed to .sdist packages being "source" packages (see the description of .sdist files) - https://lists.apache.org/thread/8ob972qkd7sy6k1pn5nskc2x0yjx2t2y
> [4] The .gitattributes file in the Airflow repo - https://github.com/apache/airflow/blob/main/.gitattributes
> [5] RAT excludes in the Airflow repo - https://github.com/apache/airflow/blob/main/.rat-excludes
> [6] PEP 625, Filename of a Source Distribution - https://peps.python.org/pep-0625/
> [7] Binary distribution name normalization - https://packaging.python.org/en/latest/specifications/binary-distribution-format/#escaping-and-unicode
> [8] PR fixing the missing cli extra and improving the dev env to use it - https://github.com/apache/burr/pull/604
> [9] Flit reproducibility - https://flit.pypa.io/en/stable/reproducible.html
> [10] Diffoscope, a tool to show reproducibility issues - https://diffoscope.org/
>
> J.
>
> On Sun, Nov 30, 2025 at 5:02 AM Elijah ben Izzy <[email protected]> wrote:
>
> > Hi all! Trying again!
> >
> > This is a call for a vote on releasing Apache Burr 0.41.0-incubating Release Candidate 2.
> >
> > This release includes the following changes (see CHANGELOG for details). See all commits since the prior release:
> > - https://github.com/apache/burr/compare/burr-0.40.2...main
> >
> > Key changes include:
> > - pool-based async PG persister
> > - multiple UI updates
> > - Apache compatible licenses/build processes
> > - bug fixes, typing, etc...
> >
> > The artifacts for this release candidate can be found at:
> >
> > https://dist.apache.org/repos/dist/dev/incubator/burr/0.41.0-incubating-RC2/
> >
> > The Git tag to be voted upon is: v0.41.0
> >
> > The release hash is 11783ba58f8c5bd161118976ced791a2f5bd78f3
> >
> > Release artifacts are signed with the following key:
> > BB8B72B34AB9A664A109AA17A76CF4C80E4E5355
> > The KEYS file is available at:
> > https://downloads.apache.org/incubator/burr/KEYS
> >
> > Please download, verify, and test the release candidate. For testing, use your best judgement. The following may suffice:
> >
> > 1. Build/run the UI following the instructions in scripts/README.md
> > 2. Run the tests in tests/
> > 3. Import into a Jupyter notebook and play around
> >
> > I highly encourage you to pip install from source, run `burr` and play with the UI (some UI bugs I recently discovered will be filed).
> >
> > The vote will run for a minimum of 72 hours.
> > Please vote:
> >
> > [ ] +1 Release this package as Apache Burr 0.41.0-incubating
> > [ ] +0 No opinion
> > [ ] -1 Do not release this package because... (Please provide a reason)
> >
> > Checklist for reference:
> > [ ] Download links are valid.
> > [ ] Checksums and signatures.
> > [ ] LICENSE/NOTICE files exist
> > [ ] No unexpected binary files
> > [ ] All source files have ASF headers
> > [ ] Can compile from source
> >
> > On behalf of the Apache Burr PPMC,
> >
> > Elijah ben Izzy ([email protected])
> >
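For reference on Jarek's point 5 (and theme 2 in my summary), here is a minimal sketch of the name normalization that PEP 625 borrows from the binary distribution format spec: any run of "-", "_" or "." in the distribution name collapses to a single "_", and the result is lowercased. The helper names are hypothetical. The sketch passes the version string through untouched, so it does no PEP 440 normalization and does not handle the ASF "-incubating" suffix.

```python
import re


def normalize_dist_name(name: str) -> str:
    """Escape a distribution name for use in sdist/wheel filenames.

    Every run of "-", "_" or "." becomes a single "_", and the result is
    lowercased (binary distribution format escaping, reused by PEP 625).
    """
    return re.sub(r"[-_.]+", "_", name).lower()


def sdist_filename(name: str, version: str) -> str:
    # Hypothetical helper: assumes `version` is already in its final form;
    # PEP 440 version normalization is deliberately out of scope here.
    return f"{normalize_dist_name(name)}-{version}.tar.gz"


if __name__ == "__main__":
    print(normalize_dist_name("apache-burr"))
    print(sdist_filename("apache-burr", "0.41.0"))
```

In practice the build backend should produce this name for us, so as Jarek notes the fix is mostly a newer flit (or build/hatch) plus a doc update, not renaming files by hand.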

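Likewise for Jarek's point 3, here is a sketch of writing a .sha512 file whose single line has the same layout that `shasum -a 512 FILE` prints when run in the artifact's directory: hex digest, two spaces, file name. That layout lets the file be checked with `shasum -a 512 -c`. The helper names are hypothetical. In the release scripts, calling `shasum -a 512` directly is probably simpler; this only pins down the expected format.

```python
import hashlib
from pathlib import Path


def sha512_line(path: Path) -> str:
    """Return "<hex digest>  <file name>", the layout shasum uses in text mode."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        # Hash in 1 MiB chunks so large artifacts need not fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    # Use the base name, matching shasum's output when run next to the file.
    return f"{digest.hexdigest()}  {path.name}"


def write_checksum_file(path: Path) -> Path:
    # Hypothetical helper: writes e.g. foo.tar.gz.sha512 next to the artifact.
    out = path.with_name(path.name + ".sha512")
    out.write_bytes((sha512_line(path) + "\n").encode("utf-8"))
    return out
```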