Nice! Really clean and unambiguous.

Best,
Jernej

> On 24 Dec 2025, at 06:43, Stefan Krawczyk <[email protected]> wrote:
> 
> Looks good to me!
> 
>> On Wed, Dec 24, 2025, 7:53 AM Elijah ben Izzy <[email protected]> wrote:
>> 
>> @Jarek + others -- I wanted to get your take on this. I'm pushing to get it
>> out soon (finally have a bit more focus time) and I drafted this. What does
>> the pod think about what is/isn't source?
>> 
>> I'm generally happy with this, but I think the key part is that this is
>> documented/clear. See table at the top:
>> 
>> 
>> https://github.com/apache/burr/blob/20f7790dd60cec8e397184cfe0b2aaa564f49f48/scripts/README.md
>> 
>> 
>>> On Sun, Nov 30, 2025 at 12:58 PM Jarek Potiuk <[email protected]> wrote:
>>> 
>>> Yep. All for moving forward fast, even if it means some of the things that
>>> are non-MUST will be deferred. I tried to sort things into
>>> "MUST/SHOULD/NICE" to make it clear what has the highest priority - I think
>>> you got it mostly right. The "_" thing is really about making sure that the
>>> right tools (flit in this case) have min versions set that are recent
>>> enough to produce good naming.
>>>
>>> This is one more thing that I should have stressed - I think **some** part
>>> of it is the way you make sure the tooling is recent enough. The approach in
>>> Airflow is that we ALWAYS have a min version set for a dependency.
>>> "Released 6 months ago" is a good rule of thumb. I think that if you follow
>>> the "env setup" and the env has some "min versions", that solves at least
>>> some part of the issues I found in the release. The "_" in the name is
>>> definitely going to be fixed if the min version is set appropriately.
>>> 
>>> J.
>>> 
>>> 
>>> On Sun, Nov 30, 2025 at 8:27 PM Elijah ben Izzy <[email protected]> wrote:
>>> 
>>>> @Jarek -- thanks, this is very clear (and absolutely worth getting right
>>>> even if it means a delay in release).
>>>>
>>>> I want to sum up just to make sure I understand the high-level -- here are
>>>> the themes I'm picking up:
>>>>
>>>> 1. License on everything -- JSON is the exception, but that's why we have
>>>> .rat-excludes (the .tpl looked like another JSON file, which I think is
>>>> why I missed it)
>>>> 2. Underscores versus dashes for consistency, to avoid trouble later on
>>>> 3. A consistent, documented internal opinion on what *is* and *isn't*
>>>> source
>>>> 4. Clean up weird stuff (e.g. the bento burr submodule, which probably
>>>> should just be removed from the repo)
>>>> 5. Anything to make development + verification easier on the developer
>>>> (and any downstream consumer of the source)
>>>> 6. More documentation overall
>>>>
>>>>
>>>> I think (3) and (6) are pretty big value-adds, i.e. where I should focus
>>>> some time. High-level, for this project, I want to throw this out:
>>>> 1. Docs are *not* source -- not included in distribution
>>>> 2. Tests *are* source -- why? These let the developer download + run them
>>>> in a self-verification attempt
>>>> 
>>>> This is pretty justifiable IMO:
>>>> 
>>>> *Our best practice across the various projects I maintain is to always run
>>>> the tests from the source on the installed wheel. For each Python version
>>>> and platform in our CI matrix, we do a clean sdist and wheel build (with
>>>> build, of course), install the wheel in a clean env, and then run the
>>>> tests from the source against that, using either a src dir, tox and/or
>>>> python -I, plus pytest --import-mode=importlib, to ensure isolation from
>>>> the source tree and we’re always using the installed copy. It isn’t as
>>>> important that the tests themselves work when packaged for
>>>> end-distribution, but rather that the code under test works.*
>>>>
>>>> Going to take a bit of time later/this week to prep this and might reach
>>>> out with more questions. Otherwise I'll also be reading over the resources
>>>> to ensure that nothing slipped through.
>>>> 
>>>> Cheers,
>>>> Elijah
>>>> 
>>>> On Sun, Nov 30, 2025 at 8:31 AM Jarek Potiuk <[email protected]> wrote:
>>>> 
>>>>> -1 for now, sorry.
>>>>> 
>>>>> Reviewed:
>>>>> 
>>>>> * signatures OK
>>>>> * checksums  OK
>>>>> * licences NOK
>>>>> * reproducibility from sources
>>>>> 
>>>>> 
>>>>> I think there is the .gitmodules problem that should be solved; also, the
>>>>> lack of an explicit -source.tar.gz is not really good, I think.
>>>>> 
>>>>> Several reasons:
>>>>> 
>>>>> 1) Lack of an explicit source package (this is "almost -1" for me, because
>>>>> formally speaking the .sdist package fulfills the letter of the source
>>>>> package requirement, but IMHO it does not necessarily fulfill the spirit).
>>>>>
>>>>> I think it's not very clear which package is the "source" package and
>>>>> which are "convenience/binary" packages. From what I see, the .tar.gz is
>>>>> **something between** a source package and an .sdist. It **looks** like an
>>>>> sdist package (with PKG-INFO) - but it also contains "tests", which is
>>>>> unusual for sdist packages (however, there is a big debate about it [1]).
>>>>> The requirement for "source" packages published by the ASF is that they
>>>>> contain all the sources needed to build code and tests [2] (which your
>>>>> .sdist file has, so that's cool) - so to some extent it follows the
>>>>> expectation. I think it must be clear which of the packages is the
>>>>> "-source" one, and naming it like that and keeping it separate from the
>>>>> .sdist is a good idea.
>>>>>
>>>>> In Airflow we also - for quite a while - treated some of our .sdist files
>>>>> as "source" releases when we released only some of the distributions that
>>>>> are part of the monorepo. When we did that, we explicitly mentioned in our
>>>>> emails that those .sdist packages are the "source" packages as expected by
>>>>> the ASF [3]. But eventually (a few weeks ago) we gave up on it entirely,
>>>>> because we opted to include essentially **everything** that is in our
>>>>> source repo (we are essentially using git archive to produce the
>>>>> source-tar.gz). The main reason was that when we released **only**
>>>>> .sdists, some of our important code (such as sources for docs) was not
>>>>> published.
>>>>> 
>>>>> The .sdist of yours misses quite a number of files from the repo:
>>>>> 
>>>>> * big number of  examples
>>>>> * docs sources - I think this is an important miss - while docs are
>>>>> * telemetry folder
>>>>> * .github and .gitmodules (are those gitmodules necessary to build the
>>>>> project?)
>>>>> 
>>>>> It's likely that those files are excluded deliberately and are something
>>>>> that you do not **want** to release at all, but I find it a bit strange to
>>>>> remove docs and many examples. It seems that those who unpack sources from
>>>>> the official source package cannot do all the same things as people who
>>>>> check out the repo TAG. If someone takes it as "source" and never looks at
>>>>> the GitHub repo, they will miss important sources (like docs sources) that
>>>>> IMHO users **should** have. Generally, users should be able to do the same
>>>>> with the "-source.tar.gz" as they can when they do `git checkout TAG` in
>>>>> your repo.
>>>>>
>>>>> The AI-generated (undoubtedly, but that's ok ;) documentation in README.md
>>>>> describes what goes in and out, but it does not explain WHY. I think if
>>>>> you **really** want to exclude some files from your source distribution,
>>>>> you should explain WHY in the documentation.
>>>>> 
>>>>> Just to add a bit of context. You might think that the "-source.tar.gz"
>>>>> file is not that important, as nearly nobody will use it. Which is a fair
>>>>> assessment ("nearly nobody") - but those who do are the important users -
>>>>> those are downstream packagers, who might want to include burr in distros,
>>>>> for example. Many of the distros that are out there use the officially
>>>>> signed and checksummed packages to build and install their packages. For
>>>>> example, this is what conda might want to do. Or Debian maintainers. Those
>>>>> are important users and we need to make sure that they can do it easily.
>>>>> The safest bet IMHO is to produce an explicit "-source.tar.gz" as a "git
>>>>> archive" result - and not exclude things that you would normally commit to
>>>>> the repo (note that you can have generated code committed to your repo -
>>>>> and there is "no compiled code in your repo" - so that would probably be
>>>>> the only thing to exclude, if your build process rebuilds those generated
>>>>> files automatically). This can be done via .gitattributes, as in
>>>>> Airflow [4].
>>>>> 
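As an illustration of the "git archive" approach described above, here is a minimal Python sketch that assembles and runs the command. The helper names, the `-source.tar.gz` file name and the directory prefix are hypothetical, chosen only for this example; the sketch relies on the standard behaviour that `git archive` leaves out paths marked `export-ignore` in .gitattributes.

```python
import subprocess


def git_archive_command(tag: str, release: str) -> list[str]:
    """Build a `git archive` invocation that packs the tagged tree into a tarball.

    The "-source.tar.gz" suffix and the directory prefix are assumptions made
    for illustration, not the project's documented release convention.
    """
    base = f"apache-burr-{release}"
    return [
        "git", "archive",
        "--format=tar.gz",
        f"--prefix={base}/",            # top-level directory inside the tarball
        "-o", f"{base}-source.tar.gz",  # output file
        tag,                            # tree-ish to export, e.g. the release tag
    ]


def make_source_tarball(repo_dir: str, tag: str, release: str) -> None:
    """Run the command inside a checkout of the repository."""
    subprocess.run(git_archive_command(tag, release), cwd=repo_dir, check=True)
```

Calling `make_source_tarball(".", "v0.41.0", "0.41.0-incubating")` from a checkout would export everything committed at that tag, minus whatever .gitattributes excludes.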
>>>>> 2) The .gitmodules thing is the final reason why I gave -1. It's not clear
>>>>> to me whether the BentoBurr project it mentions is needed to build the
>>>>> project or not. That project is not only archived, but also lacks LICENCE
>>>>> information, so while it is actually **excluded** from the .sdist package,
>>>>> I think it should be either removed from the repo or included in
>>>>> -sources.tar.gz - generally, an ASF project should not depend on any
>>>>> project which has an unknown licence.
>>>>> 
>>>>> 3) At least in Airflow we are using `shasum -a 512 FILE` and it
>>> produces
>>>>> SHASUM + name of the file, which I think is a good idea to have in
>> .asc
>>>>> file. Also something that can be improved in the future.
>>>>> 
>>>>> The Shasum are good, but when I diff on what shasum produces, we have
>>>>> this:
>>>>>
>>>>> < 77ad9cf9ddf508645d094ae18efce76482ff86339ffd2cd9dfe46af5d0545bdfa949c00ccc7beb3f6ae5f2c65523cc1a3db9a7425921c86fde5c4d54eb893111  apache_burr-0.41.0-py3-none-any.whl
>>>>> ---
>>>>> > 77ad9cf9ddf508645d094ae18efce76482ff86339ffd2cd9dfe46af5d0545bdfa949c00ccc7beb3f6ae5f2c65523cc1a3db9a7425921c86fde5c4d54eb893111
>>>>> Checking apache-burr-0.41.0-incubating.tar.gz.sha512
>>>>> 1c1
>>>>> < 2e755584eb71fcede377d92f67024e3694cee4729da55e8b8d5b8739388c9046438e40cd2428003cca1e11a7b40abb897371d608db1ce3c0638d266c3de2c50a  apache-burr-0.41.0-incubating.tar.gz
>>>>> ---
>>>>> > 2e755584eb71fcede377d92f67024e3694cee4729da55e8b8d5b8739388c9046438e40cd2428003cca1e11a7b40abb897371d608db1ce3c0638d266c3de2c50a
>>>>> 
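For reference, the "checksum plus file name" format discussed above can be reproduced without the shasum tool. This is a minimal sketch using only the Python standard library; the function name is hypothetical, and the two-space separator mirrors what `shasum -a 512 FILE` prints in its default text mode.

```python
import hashlib
import sys
from pathlib import Path


def sha512_line(path: Path) -> str:
    """Return '<hex digest>  <file name>', the layout shasum prints."""
    digest = hashlib.sha512()
    with path.open("rb") as handle:
        # Read in chunks so large release artifacts do not need to fit in memory.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return f"{digest.hexdigest()}  {path.name}"


if __name__ == "__main__":
    # Print one checksum line per artifact given on the command line.
    for argument in sys.argv[1:]:
        print(sha512_line(Path(argument)))
```

Redirecting the output for one artifact into a matching `.sha512` file gives a checksum file that a plain diff against fresh shasum output can verify.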
>>>>> 4) Files with unknown licences in the .sdist file (since it looks like
>>>>> -sources). This is also quite a hard -1 because of the .tpl file.
>>>>>
>>>>> There are a number of files with unapproved licenses (I unpacked the
>>>>> .tar.gz, then downloaded and ran Apache RAT from
>>>>> https://dist.apache.org/repos/dist/release/creadur/apache-rat-0.17/ on the
>>>>> directory). While I understand why .jsonl files do not have a licence
>>>>> header (JSON cannot contain comments), the best way to deal with that is
>>>>> to add a .rat-excludes file to your repo - see the Airflow one [5] - and
>>>>> make it part of the source package. This way you can add -E .rat-excludes
>>>>> and it will exclude those files from the check. The .tpl file seems to be
>>>>> a Jinja template, and those files allow comments and can easily embed
>>>>> license information that will be excluded from the final generated json
>>>>> file.
>>>>> 
>>>>> ! Unapproved:         23    A count of unapproved licenses.
>>>>> ! /burr/tracking/server/demo_data/demo_chatbot/chat-1-giraffe/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_chatbot/chat-2-geography/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_chatbot/chat-3-physics/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_chatbot/chat-4-philosophy/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_chatbot/chat-5-jokes/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_chatbot/chat-6-demonstrate-errors/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_chatbot_with_traces/chat-1-giraffe/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_chatbot_with_traces/chat-2-geography/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_chatbot_with_traces/chat-3-physics/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_chatbot_with_traces/chat-4-philosophy/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_chatbot_with_traces/chat-5-jokes/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_chatbot_with_traces/chat-6-demonstrate-errors/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_conversational-rag/rag-1-food/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_conversational-rag/rag-2-work-history/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_conversational-rag/rag-3-activities/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_conversational-rag/rag-4-everything/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_counter/count-to-1/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_counter/count-to-10/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_counter/count-to-100/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_counter/count-to-42/log.jsonl
>>>>> ! /burr/tracking/server/demo_data/demo_counter/count-to-50/log.jsonl
>>>>> ! /burr/tracking/server/s3/deployment/terraform/templates/ecs/burr_app.json.tpl
>>>>> 
>>>>> 5) Bad naming of the `sdist` file.
>>>>>
>>>>> I am not sure how you produced the .sdist file (again - no release
>>>>> instructions), but when I tried to build it and compare what's in my
>>>>> .sdist and your .sdist, I got quite a different result, because the name
>>>>> of my package (I tried it with flit, hatch and build) is (correctly)
>>>>> *apache_burr-0.41.0-incubating.tar.gz* and yours was
>>>>> *apache-burr-0.41.0-incubating.tar.gz*. We used to have the same in
>>>>> Airflow and it caused us some serious problems when it comes to links to
>>>>> our .sdist packages, and the general difference of .whl vs. sdist.
>>>>> **Some** old tooling used to produce such names (old setuptools and old
>>>>> flit) but this has since been properly implemented by both. The thing is
>>>>> that the .sdist package name SHOULD contain the normalized distribution
>>>>> name - which replaces all runs of "-", "_" and "." with a single "_" and
>>>>> lowercases the result [6] (unlike package names on PyPI, this follows the
>>>>> binary wheel naming normalization, which uses "_" rather than "-" in the
>>>>> package name [7]).
>>>>> 
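The normalization rule described above is small enough to sketch directly. This is an illustrative Python version, not the code flit or setuptools actually run, and the function name is hypothetical: runs of "-", "_" and "." collapse to a single "_" and the result is lowercased.

```python
import re


def normalize_sdist_name(distribution_name: str) -> str:
    """Normalize a distribution name for use in an sdist or wheel file name."""
    # Any run of '-', '_' or '.' becomes one underscore; the name is lowercased.
    return re.sub(r"[-_.]+", "_", distribution_name).lower()


print(normalize_sdist_name("apache-burr"))  # apache_burr
```

A check like this in a release script can flag an artifact whose file name still carries the un-normalized "apache-burr" prefix before it is uploaded.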
>>>>> 6) Easier setup of the env
>>>>>
>>>>> I noticed a small issue with the env when preparing the release - a
>>>>> missing `cli` extra when setting up the venv to build the release. I fixed
>>>>> it in [8] - I also proposed a small addition of a dev dependency group
>>>>> (might split it if needed) and proposed that you might use some more
>>>>> modern, standardised packaging features like dependency groups and inline
>>>>> script metadata. See details in the PR - we can discuss it there.
>>>>> 
>>>>> 7) Reproducibility from sources:
>>>>>
>>>>> I tried to rebuild both the .sdist and .whl packages following the
>>>>> instructions. Initially I had not compiled the UI and got those files
>>>>> missing (of course) - I understand that full automation with a custom
>>>>> build hook is deferred for later (which is OK) - but (as expected) the
>>>>> files in the package have different mtimes. This can be easily fixed by
>>>>> hard-coding the SOURCE_DATE_EPOCH variable before the build [9], and since
>>>>> you are already using instructions and scripts, that should be an easy
>>>>> addition to your docs. In Airflow we have a prek hook that automatically
>>>>> regenerates the date when release notes change, but at the beginning the
>>>>> mtime to be used can simply be hard-coded to basically any date. This way
>>>>> whoever follows your release process will get closer to a truly
>>>>> reproducible package, and diffoscope will start showing useful diffs in
>>>>> case there are some [10].
>>>>> 
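A minimal sketch of the SOURCE_DATE_EPOCH approach described above, assuming the `build` package is installed in the release environment. The timestamp value and the helper names are hypothetical; any fixed date works as long as every rebuild uses the same one.

```python
import os
import subprocess
import sys

# Hypothetical pinned timestamp (2025-11-30 00:00:00 UTC), in seconds since the epoch.
SOURCE_DATE_EPOCH = "1764460800"


def reproducible_build_env(epoch: str = SOURCE_DATE_EPOCH) -> dict[str, str]:
    """Copy the current environment and pin SOURCE_DATE_EPOCH for the build backend."""
    env = dict(os.environ)
    env["SOURCE_DATE_EPOCH"] = epoch
    return env


def build_artifacts(project_dir: str) -> None:
    """Build the sdist and wheel with file mtimes pinned to the fixed timestamp."""
    subprocess.run(
        [sys.executable, "-m", "build", "--sdist", "--wheel", project_dir],
        env=reproducible_build_env(),
        check=True,
    )
```

With the timestamp pinned, two builds of the same tag should differ only where the content really differs, which is what makes a diffoscope comparison readable.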
>>>>> Summary of things:
>>>>> 
>>>>> MUST:
>>>>> * add a licence header to the .tpl file - 4)
>>>>> * explain (or likely remove) the .gitmodules BentoBurr reference - 2)
>>>>> * explicit rules in docs about why you exclude certain files from the
>>>>> source package - 1)
>>>>> * separate -source.tar.gz package with all files including docs and likely
>>>>> all files (subject to rules about exclusion above) - 1)
>>>>>
>>>>> SHOULD:
>>>>> * proper naming of sdist artifacts (with _) (simply needs a newer flit and
>>>>> a doc update) - 5)
>>>>> * add .rat-excludes that will allow using RAT to verify the official
>>>>> source packages - 4)
>>>>>
>>>>> NICE TO HAVE:
>>>>> * shasum with filename - 3)
>>>>> * simplify the env setup with inline metadata, dev dependency groups
>>>>> (support for those already in uv, hatch and others) - 6)
>>>>> * reproducibility setup - 7)
>>>>> 
>>>>> 
>>>>> 
>>>>> [1] Debate about whether "tests" and "docs" should be included in .sdist -
>>>>> https://discuss.python.org/t/should-sdists-include-docs-and-tests/14578/26
>>>>> [2] What should be included in source packages of ASF -
>>>>> https://www.apache.org/legal/release-policy.html#source-packages
>>>>> [3] Example email where Airflow PMC explicitly pointed to .sdist packages
>>>>> being "source" packages (see the description of .sdist files) -
>>>>> https://lists.apache.org/thread/8ob972qkd7sy6k1pn5nskc2x0yjx2t2y
>>>>> [4] The .gitattributes file in Airflow repo -
>>>>> https://github.com/apache/airflow/blob/main/.gitattributes
>>>>> [5] RAT excludes in Airflow repo -
>>>>> https://github.com/apache/airflow/blob/main/.rat-excludes
>>>>> [6] PEP-625 Filename of a Source Distribution -
>>>>> https://peps.python.org/pep-0625/
>>>>> [7] Binary packages distribution name normalization -
>>>>> https://packaging.python.org/en/latest/specifications/binary-distribution-format/#escaping-and-unicode
>>>>> [8] PR to fix missing cli extra and improving dev-env to use it -
>>>>> https://github.com/apache/burr/pull/604
>>>>> [9] Flit reproducibility -
>>>>> https://flit.pypa.io/en/stable/reproducible.html
>>>>> [10] Diffoscope - tool to show reproducibility issues -
>>>>> https://diffoscope.org/
>>>>> 
>>>>> J.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Sun, Nov 30, 2025 at 5:02 AM Elijah ben Izzy <[email protected]> wrote:
>>>>> 
>>>>>> Hi all! Trying again!
>>>>>> 
>>>>>> 
>>>>>> This is a call for a vote on releasing Apache Burr 0.41.0-incubating
>>>>>> Release Candidate 2.
>>>>>>
>>>>>> This release includes the following changes (see CHANGELOG for details).
>>>>>> See all commits since prior release:
>>>>>> - https://github.com/apache/burr/compare/burr-0.40.2...main
>>>>>> 
>>>>>> Key changes include:
>>>>>> - pool-based async PG persister
>>>>>> - multiple UI updates
>>>>>> - Apache compatible licenses/build processes
>>>>>> - bug fixes, typing, etc...
>>>>>> 
>>>>>> The artifacts for this release candidate can be found at:
>>>>>> https://dist.apache.org/repos/dist/dev/incubator/burr/0.41.0-incubating-RC2/
>>>>>> 
>>>>>> The Git tag to be voted upon is: v0.41.0
>>>>>> 
>>>>>> The release hash is 11783ba58f8c5bd161118976ced791a2f5bd78f3
>>>>>> 
>>>>>> Release artifacts are signed with the following key:
>>>>>> BB8B72B34AB9A664A109AA17A76CF4C80E4E5355
>>>>>> The KEYS file is available at:
>>>>>> https://downloads.apache.org/incubator/burr/KEYS
>>>>>> 
>>>>>> Please download, verify, and test the release candidate. For testing, use
>>>>>> your best judgement. The following may suffice:
>>>>>>
>>>>>> 1. Build/run the UI following the instructions in scripts/README.md
>>>>>> 2. Run the tests in tests/
>>>>>> 3. Import into a jupyter notebook and play around
>>>>>>
>>>>>> I highly encourage you to pip install from source, run `burr` and play
>>>>>> with the UI (some UI bugs I recently discovered will be filed)
>>>>>> 
>>>>>> The vote will run for a minimum of 72 hours.
>>>>>> Please vote:
>>>>>> 
>>>>>> [ ] +1 Release this package as Apache Burr 0.41.0-incubating
>>>>>> [ ] +0 No opinion
>>>>>> [ ] -1 Do not release this package because... (Please provide a reason)
>>>>>> 
>>>>>> Checklist for reference:
>>>>>> [ ] Download links are valid.
>>>>>> [ ] Checksums and signatures.
>>>>>> [ ] LICENSE/NOTICE files exist
>>>>>> [ ] No unexpected binary files
>>>>>> [ ] All source files have ASF headers
>>>>>> [ ] Can compile from source
>>>>>> 
>>>>>> On behalf of the Apache Burr PPMC,
>>>>>> 
>>>>>> Elijah ben Izzy ([email protected])
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
