Nice! Really clean and unambiguous. Best, Jernej
> On 24 Dec 2025, at 06:43, Stefan Krawczyk <[email protected]> wrote:
>
> Looks good to me!
>
>> On Wed, Dec 24, 2025, 7:53 AM Elijah ben Izzy <[email protected]> wrote:
>>
>> @Jarek + others -- I wanted to get your take on this. I'm pushing to get
>> it out soon (finally have a bit more focus time) and I drafted this. What
>> does the pod think about what is/isn't source?
>>
>> I'm generally happy with this, but I think the key part is that this is
>> documented/clear. See table at the top:
>>
>> https://github.com/apache/burr/blob/20f7790dd60cec8e397184cfe0b2aaa564f49f48/scripts/README.md
>>
>>> On Sun, Nov 30, 2025 at 12:58 PM Jarek Potiuk <[email protected]> wrote:
>>>
>>> Yep. All for moving forward fast even if it means some of the things
>>> that are non-MUST will be deferred. I tried to sort things into
>>> "MUST/SHOULD/NICE" to make it clear what is the highest priority - I
>>> think you got it mostly right. The "_" thing is really about making sure
>>> that the right tools (flit in this case) have some min-versions - recent
>>> enough to produce good naming.
>>>
>>> This is one more thing that I should have stressed - I think **some**
>>> part of it is the way you make sure the tooling is recent enough. The
>>> approach for Airflow is that we ALWAYS have a min version set for a
>>> dependency. "Released 6 months ago" is a good "rule of thumb". I think
>>> that if you follow the "env setup" and the env has some "min versions" -
>>> that solves at least some part of the issues I found in the release. The
>>> "_" in the name is definitely going to be fixed if min-version is set
>>> appropriately.
>>>
>>> J.
>>>
>>> On Sun, Nov 30, 2025 at 8:27 PM Elijah ben Izzy <[email protected]> wrote:
>>>
>>>> @Jarek -- thanks, this is very clear (and absolutely worth getting
>>>> right even if it means a delay in release).
>>>>
>>>> I want to sum up just to make sure I understand the high-level -- here
>>>> are the themes I'm picking up:
>>>>
>>>> 1. License on everything -- JSON is the exception but that's why we
>>>>    have .rat excludes (tpl looked like another JSON which I think is
>>>>    why I missed it)
>>>> 2. Underscores versus dashes for consistency to avoid trouble later on
>>>> 3. A consistent, documented internal opinion on what *is* and *isn't*
>>>>    source
>>>> 4. Clean up weird stuff (i.e. the bento burr submodule, that probably
>>>>    should just be removed from the repo)
>>>> 5. Anything to make development + verification easier on the developer
>>>>    (and any downstream consumer of the source)
>>>> 6. More documentation overall
>>>>
>>>> I think (3) and (6) are pretty big value-adds, i.e. where I should
>>>> focus some time. High-level, for this project, I want to throw this out:
>>>> 1. Docs are *not* source -- not included in distribution
>>>> 2. Tests *are* source -- why? These let the developer download + run in
>>>>    a self-verification attempt
>>>>
>>>> This is pretty justifiable IMO:
>>>>
>>>> *Our best practice across the various projects I maintain is to always
>>>> run the tests from the source on the installed wheel. For each Python
>>>> version and platform in our CI matrix, we do a clean sdist and wheel
>>>> build (with build, of course), install the wheel in a clean env, and
>>>> then run the tests from the source against that, using either a src
>>>> dir, tox and/or python -I, plus pytest --import-mode=importlib, to
>>>> ensure isolation from the source tree and we’re always using the
>>>> installed copy. It isn’t as important that the tests themselves work
>>>> when packaged for end-distribution, but rather that the code under test
>>>> works.*
>>>>
>>>> Going to take a bit of time later/this week to prep this and might
>>>> reach out with more questions.
>>>> Otherwise I'll also be reading over the resources to ensure that
>>>> nothing slipped up.
>>>>
>>>> Cheers,
>>>> Elijah
>>>>
>>>> On Sun, Nov 30, 2025 at 8:31 AM Jarek Potiuk <[email protected]> wrote:
>>>>
>>>>> -1 for now, sorry.
>>>>>
>>>>> Reviewed:
>>>>>
>>>>> * signatures OK
>>>>> * checksums OK
>>>>> * licences NOK
>>>>> * reproducibility from sources
>>>>>
>>>>> I think there is the .gitmodules problem that should be solved, also
>>>>> lack of -source.tar.gz explicitly is not really good I think.
>>>>>
>>>>> Several reasons:
>>>>>
>>>>> 1) Lack of explicit source package (this is "almost -1" for me,
>>>>> because formally speaking the .sdist package is fulfilling the letter
>>>>> of the source package, but IMHO it does not necessarily fulfill the
>>>>> spirit).
>>>>>
>>>>> I think it's not very clear which package is "source" and which are
>>>>> "convenience/binary" packages. From what I see, the .tar.gz is
>>>>> **something between** source package and the .sdist. It **looks** like
>>>>> an sdist package (with PKG-INFO) - but also it contains "tests" - which
>>>>> is unusual for sdist packages (however there is a big debate about it
>>>>> [1]). The requirement for "source" packages published by the ASF is
>>>>> that it contains all the sources needed to build code and tests [2]
>>>>> (which your .sdist file has, so that's cool) - it seems to some extent
>>>>> it follows the expectation. I think it must be clear which of the
>>>>> packages is the "-source" one, and naming it like that and keeping it
>>>>> separate from .sdist is a good idea.
>>>>>
>>>>> We also in Airflow - for quite a while - took some of our .sdist files
>>>>> as "source" releases when we released only some of the distributions
>>>>> that are part of the monorepo.
>>>>> When we did it in the past - in Airflow we explicitly mentioned in our
>>>>> emails that those .sdist packages are the "source" packages as
>>>>> expected by the ASF [3]. But eventually we entirely gave up on it (a
>>>>> few weeks ago), because we opted in to include essentially
>>>>> **everything** that is in the source repo of ours (we are essentially
>>>>> using git archive to produce the source-tar.gz). The main reason was
>>>>> that if we **only** release .sdist, some of our important code (such
>>>>> as sources for docs) was not published when we released only .sdists.
>>>>>
>>>>> The .sdist of yours misses quite a number of files from the repo:
>>>>>
>>>>> * big number of examples
>>>>> * docs sources - I think this is an important miss - while docs are
>>>>> * telemetry folder
>>>>> * .github and .gitmodules (are those gitmodules necessary to build the
>>>>>   project?)
>>>>>
>>>>> It's likely that those files are excluded deliberately and are
>>>>> something that you do not **want** to release at all, but I find it a
>>>>> bit strange to remove docs and many examples. It seems that those who
>>>>> unpack sources from the official source package cannot do all the same
>>>>> things as people who check it out from repo TAG. If someone takes it
>>>>> as "source" and never looks at the GitHub repo - they will miss
>>>>> important sources (like docs sources) that IMHO is something that the
>>>>> users **should** have. Generally users should be able to do the same
>>>>> with the "-source.tar.gz" as what they can when they do
>>>>> `git checkout TAG` in your repo.
>>>>>
>>>>> The AI-generated (undoubtedly but that's ok ;) documentation in
>>>>> README.md describes what goes in and out but it does not explain WHY.
>>>>> I think if you **really** want to exclude some files from your source
>>>>> distribution you should explain WHY in the documentation.
>>>>>
>>>>> Just to add a bit of context.
>>>>> You might think that the "-source.tar.gz" file is not that important,
>>>>> as nearly nobody will use it. Which is a fair assessment ("nearly
>>>>> nobody") - but those who do are the important users - those are
>>>>> downstream packagers, who might want to include burr in distros for
>>>>> example. Many of the distros that are out there use the officially
>>>>> signed and checksummed packages to build and install their packages.
>>>>> For example this is what conda might want to do. Or Debian
>>>>> maintainers. Those are important users and we need to make sure that
>>>>> they can do it easily. The safest bet is to produce an explicit
>>>>> "-source.tar.gz" as a "git archive" result IMHO - and not exclude
>>>>> things that you would normally commit to the repo (note that you can
>>>>> have generated code committed to your repo - and there is "no compiled
>>>>> code in your repo" - so that would probably be the only thing to
>>>>> exclude, if your build process rebuilds those generated files
>>>>> automatically). This can be done via .gitattributes, as in Airflow [4].
>>>>>
>>>>> 2) The .gitmodules thing is the final reason why I gave -1. I am not
>>>>> sure - it's not clear - whether the BentoBurr project mentioned there
>>>>> is needed to build the project or not. This project is not only
>>>>> archived, but also misses LICENCE information, so while it is actually
>>>>> **excluded** from .sdist package, I think it should be either removed
>>>>> from the repo or included in -sources.tar.gz - generally an ASF project
>>>>> should not depend on any project which has unknown licence.
>>>>>
>>>>> 3) At least in Airflow we are using `shasum -a 512 FILE` and it
>>>>> produces SHASUM + name of the file, which I think is a good idea to
>>>>> have in .asc file. Also something that can be improved in the future.
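A minimal sketch of the "SHASUM + name of the file" format from point 3), assuming the usual `shasum` text layout of hex digest, two spaces, then the file name. The helper names `sha512_line` and `verify_line` are made up for illustration and are not part of any Burr or Airflow script.

```python
import hashlib
import os
import tempfile


def sha512_line(path: str) -> str:
    """Return a checksum line shaped like `shasum -a 512 FILE` output."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so large artifacts do not need to fit in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    # Assumed layout: hex digest, two spaces, bare file name.
    return f"{digest.hexdigest()}  {os.path.basename(path)}"


def verify_line(line: str, directory: str) -> bool:
    """Recompute the digest of the file named in `line` and compare."""
    expected, name = line.split(maxsplit=1)
    actual = sha512_line(os.path.join(directory, name.strip()))
    return actual.split(maxsplit=1)[0] == expected.lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        artifact = os.path.join(tmp, "example-source.tar.gz")  # hypothetical name
        with open(artifact, "wb") as fh:
            fh.write(b"not a real release artifact")
        line = sha512_line(artifact)
        print(line)
        print("verified:", verify_line(line, tmp))
```

Keeping the file name next to the digest means a checksum file can be checked without the verifier having to guess which artifact it belongs to.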
>>>>>
>>>>> The shasums are good, but when I diff on what shasum produces, we have
>>>>> this:
>>>>>
>>>>> < 77ad9cf9ddf508645d094ae18efce76482ff86339ffd2cd9dfe46af5d0545bdfa949c00ccc7beb3f6ae5f2c65523cc1a3db9a7425921c86fde5c4d54eb893111  apache_burr-0.41.0-py3-none-any.whl
>>>>> ---
>>>>> > 77ad9cf9ddf508645d094ae18efce76482ff86339ffd2cd9dfe46af5d0545bdfa949c00ccc7beb3f6ae5f2c65523cc1a3db9a7425921c86fde5c4d54eb893111
>>>>> Checking apache-burr-0.41.0-incubating.tar.gz.sha512
>>>>> 1c1
>>>>> < 2e755584eb71fcede377d92f67024e3694cee4729da55e8b8d5b8739388c9046438e40cd2428003cca1e11a7b40abb897371d608db1ce3c0638d266c3de2c50a  apache-burr-0.41.0-incubating.tar.gz
>>>>> ---
>>>>> > 2e755584eb71fcede377d92f67024e3694cee4729da55e8b8d5b8739388c9046438e40cd2428003cca1e11a7b40abb897371d608db1ce3c0638d266c3de2c50a
>>>>>
>>>>> 4) Files with unknown licences in the .sdist file (since it looks like
>>>>> -sources). This is also quite a hard -1 because of the .tpl file.
>>>>>
>>>>> There are a number of files with unapproved licenses (I unpacked the
>>>>> .tar.gz, then downloaded RAT from
>>>>> https://dist.apache.org/repos/dist/release/creadur/apache-rat-0.17/
>>>>> and ran it on the directory). While I understand why .jsonl files do
>>>>> not have licence (json cannot contain comments), the best way to deal
>>>>> with that is to add .rat-excludes file in your repo - see Airflow one
>>>>> [5] and make it part of the source package. This way you can add
>>>>> -E .rat-excludes and it will exclude those files from check. The .tpl
>>>>> file seems to be a JINJA template and those files allow for comments
>>>>> and can easily embed license information that will be excluded in the
>>>>> final generated json file.
>>>>>
>>>>> ! Unapproved: 23 A count of unapproved licenses.
>>>>> ! /burr/tracking/server/demo_data/demo_chatbot/chat-1-giraffe/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_chatbot/chat-2-geography/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_chatbot/chat-3-physics/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_chatbot/chat-4-philosophy/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_chatbot/chat-5-jokes/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_chatbot/chat-6-demonstrate-errors/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_chatbot_with_traces/chat-1-giraffe/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_chatbot_with_traces/chat-2-geography/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_chatbot_with_traces/chat-3-physics/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_chatbot_with_traces/chat-4-philosophy/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_chatbot_with_traces/chat-5-jokes/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_chatbot_with_traces/chat-6-demonstrate-errors/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_conversational-rag/rag-1-food/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_conversational-rag/rag-2-work-history/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_conversational-rag/rag-3-activities/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_conversational-rag/rag-4-everything/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_counter/count-to-1/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_counter/count-to-10/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_counter/count-to-100/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_counter/count-to-42/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_counter/count-to-50/log.jsonl
>>>>> ! /burr/tracking/server/s3/deployment/terraform/templates/ecs/burr_app.json.tpl
>>>>>
>>>>> 5) Bad naming of `sdist` file.
>>>>>
>>>>> I am not sure how you produced the .sdist file (again - no release
>>>>> instructions) but when I tried to build it and compare what's in my
>>>>> .sdist and your .sdist, I got it quite different because the name of
>>>>> my package (tried it with flit, hatch and build packages) is
>>>>> (correctly) *apache_burr-0.41.0-incubating.tar.gz* and yours was
>>>>> *apache-burr-0.41.0-incubating.tar.gz*. We used to have the same in
>>>>> Airflow and it caused us some serious problems when it comes to links
>>>>> to our .sdist packages, and general difference of .whl vs. sdist.
>>>>> **Some** old tooling used to produce such names (old setuptools and
>>>>> old flit) but this has since been properly implemented by both. The
>>>>> thing is that the .sdist package name SHOULD contain the normalized
>>>>> distribution name - which replaces all sequences of "_-." with a
>>>>> single "_" and lowercases it [6] (unlike package names in PyPI, this
>>>>> follows the binary wheel naming normalization which uses "_" rather
>>>>> than "-" in package name [7]).
>>>>>
>>>>> 6) Easier setup of the env
>>>>>
>>>>> I noticed some small issue with the env when preparing the release -
>>>>> missing `cli` extra when setting up the venv to build release. I fixed
>>>>> it in [8] - also proposed a small addition of dev dependency group
>>>>> (might split it if needed) and proposed that you might use some more
>>>>> modern standardised features of packaging like dependency groups and
>>>>> inline script metadata. See details in the PR - we can discuss it
>>>>> there.
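The naming rule described in point 5) can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the email (runs of "-", "_" and "." collapse to a single "_", then lowercase); the function names are hypothetical, and the version string is passed through unchanged.

```python
import re


def normalize_sdist_name(project: str) -> str:
    """Normalize a project name the way the email describes for sdist files."""
    # Collapse every run of "-", "_" or "." into one "_" and lowercase the result.
    return re.sub(r"[-_.]+", "_", project).lower()


def sdist_filename(project: str, version: str) -> str:
    """Build `{normalized name}-{version}.tar.gz`; the version is used as given."""
    return f"{normalize_sdist_name(project)}-{version}.tar.gz"


if __name__ == "__main__":
    print(sdist_filename("apache-burr", "0.41.0-incubating"))
```

With the project name and version from this vote, the sketch yields the underscore form that the reviewer's own build produced rather than the dashed name in the release candidate.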
>>>>>
>>>>> 7) Reproducibility from sources:
>>>>>
>>>>> I tried to rebuild both .sdist and .whl package following the
>>>>> instructions and initially I had not compiled the UI and got them
>>>>> missing (of course) - I understand that full automation with custom
>>>>> build hook is deferred for later (which is OK) - but (as expected) the
>>>>> files in the package have different mtime. This can be easily fixed
>>>>> by hard-coding the SOURCE_DATE_EPOCH variable before the build [9]
>>>>> and since you are already using instructions and scripts, that should
>>>>> be an easy addition in your docs. In airflow we have a prek commit
>>>>> that automatically regenerates the date when release notes change but
>>>>> at the beginning the mtime to be used can be simply hard-coded to
>>>>> basically any date. This way whoever follows your release process will
>>>>> have it closer to a truly reproducible package and diffoscope will
>>>>> start showing useful diffs in case there are some [10]
>>>>>
>>>>> Summary of things:
>>>>>
>>>>> MUST
>>>>> * .tpl licence adding - 4)
>>>>> * explain (or likely remove) the .gitmodules BentoBurr reference - 2)
>>>>> * explicit rules in docs about why you exclude certain files from
>>>>>   source package - 1)
>>>>> * separate -source.tar.gz package with all files including docs and
>>>>>   likely all files (subject to rules about exclusion above) - 1)
>>>>>
>>>>> SHOULD:
>>>>> * proper naming of sdist artifacts (with _) (needs newer flit simply
>>>>>   and doc update) - 5)
>>>>> * add .rat-excludes that will allow to use RAT to verify the official
>>>>>   source packages - 4)
>>>>>
>>>>> NICE TO HAVE:
>>>>> * shasum with filename - 3)
>>>>> * simplify the env setup with inline metadata, dev dependency groups
>>>>>   (support for those already in uv, hatch and others) - 6)
>>>>> * reproducibility setup - 7)
>>>>>
>>>>> [1] Debate about whether "tests" and "docs" should be included in .sdist
>>>>> https://discuss.python.org/t/should-sdists-include-docs-and-tests/14578/26
>>>>> [2] What should be included in source packages of ASF -
>>>>> https://www.apache.org/legal/release-policy.html#source-packages
>>>>> [3] Example email where Airflow PMC explicitly pointed to .sdist
>>>>> packages being "source" packages (see the description of .sdist files)
>>>>> https://lists.apache.org/thread/8ob972qkd7sy6k1pn5nskc2x0yjx2t2y
>>>>> [4] The .gitattributes file in Airflow repo
>>>>> https://github.com/apache/airflow/blob/main/.gitattributes
>>>>> [5] RAT excludes in Airflow repo
>>>>> https://github.com/apache/airflow/blob/main/.rat-excludes
>>>>> [6] PEP-625 Filename of a Source Distribution -
>>>>> https://peps.python.org/pep-0625/
>>>>> [7] Binary packages distribution name normalization -
>>>>> https://packaging.python.org/en/latest/specifications/binary-distribution-format/#escaping-and-unicode
>>>>> [8] PR to fix missing cli extra and improving dev-env to use it
>>>>> https://github.com/apache/burr/pull/604
>>>>> [9] Flit reproducibility
>>>>> https://flit.pypa.io/en/stable/reproducible.html
>>>>> [10] Diffoscope - tool to show reproducibility issues
>>>>> https://diffoscope.org/
>>>>>
>>>>> J.
>>>>>
>>>>> On Sun, Nov 30, 2025 at 5:02 AM Elijah ben Izzy <[email protected]> wrote:
>>>>>
>>>>>> Hi all! Trying again!
>>>>>>
>>>>>> This is a call for a vote on releasing Apache Burr 0.41.0-incubating
>>>>>> Release Candidate 2.
>>>>>>
>>>>>> This release includes the following changes (see CHANGELOG for
>>>>>> details). See all commits since prior release:
>>>>>> - https://github.com/apache/burr/compare/burr-0.40.2...main
>>>>>>
>>>>>> Key changes include:
>>>>>> - pool-based async PG persister
>>>>>> - multiple UI updates
>>>>>> - Apache compatible licenses/build processes
>>>>>> - bug fixes, typing, etc...
>>>>>>
>>>>>> The artifacts for this release candidate can be found at:
>>>>>>
>>>>>> https://dist.apache.org/repos/dist/dev/incubator/burr/0.41.0-incubating-RC2/
>>>>>>
>>>>>> The Git tag to be voted upon is: v0.41.0
>>>>>>
>>>>>> The release hash is 11783ba58f8c5bd161118976ced791a2f5bd78f3
>>>>>>
>>>>>> Release artifacts are signed with the following key:
>>>>>> BB8B72B34AB9A664A109AA17A76CF4C80E4E5355
>>>>>> The KEYS file is available at:
>>>>>> https://downloads.apache.org/incubator/burr/KEYS
>>>>>>
>>>>>> Please download, verify, and test the release candidate. For testing
>>>>>> use your best judgement. The following may suffice:
>>>>>>
>>>>>> 1. Build/run the UI following the instructions in scripts/README.md
>>>>>> 2. Run the tests in tests/
>>>>>> 3. Import into a jupyter notebook and play around
>>>>>>
>>>>>> Highly encourage you to pip install from source, run `burr` and play
>>>>>> with the UI (some UI bugs I recently discovered will be filed)
>>>>>>
>>>>>> The vote will run for a minimum of 72 hours.
>>>>>> Please vote:
>>>>>>
>>>>>> [ ] +1 Release this package as Apache Burr 0.41.0-incubating
>>>>>> [ ] +0 No opinion
>>>>>> [ ] -1 Do not release this package because... (Please provide a reason)
>>>>>>
>>>>>> Checklist for reference:
>>>>>> [ ] Download links are valid.
>>>>>> [ ] Checksums and signatures.
>>>>>> [ ] LICENSE/NOTICE files exist
>>>>>> [ ] No unexpected binary files
>>>>>> [ ] All source files have ASF headers
>>>>>> [ ] Can compile from source
>>>>>>
>>>>>> On behalf of the Apache Burr PPMC,
>>>>>>
>>>>>> Elijah ben Izzy ([email protected])
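On the reproducibility point raised in the review quoted above (different mtimes between two builds of the same sources), here is a stdlib-only sketch of why pinning one timestamp helps. It is not how flit or the Burr release scripts build packages; `build_archive` and the sample file contents are invented for illustration, and the only idea borrowed from the thread is taking the timestamp from `SOURCE_DATE_EPOCH`.

```python
import gzip
import hashlib
import io
import os
import tarfile


def build_archive(files: dict[str, bytes], epoch: int) -> bytes:
    """Build a .tar.gz in memory with every timestamp pinned to `epoch`."""
    buf = io.BytesIO()
    # The gzip header carries its own mtime (and optionally a name), so pin both.
    with gzip.GzipFile(filename="", fileobj=buf, mode="wb", mtime=epoch) as gz:
        with tarfile.open(fileobj=gz, mode="w") as tar:
            for name in sorted(files):  # stable member order
                info = tarfile.TarInfo(name)
                info.size = len(files[name])
                info.mtime = epoch  # same mtime for every member
                info.mode = 0o644
                info.uid = info.gid = 0
                info.uname = info.gname = ""
                tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()


if __name__ == "__main__":
    # Fall back to an arbitrary fixed date when the variable is not set.
    epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "1700000000"))
    sample = {"pkg/__init__.py": b"", "pkg/core.py": b"VALUE = 1\n"}  # made-up files
    first = hashlib.sha512(build_archive(sample, epoch)).hexdigest()
    second = hashlib.sha512(build_archive(sample, epoch)).hexdigest()
    print("identical digests:", first == second)
```

Once two builds agree byte for byte, a tool such as diffoscope only reports differences that come from the sources or the toolchain rather than from the clock.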
