Thanks Joe and nice to have you here <3
On Wed, Jul 9, 2025 at 2:52 PM Joe Drumgoole <j...@joedrumgoole.com> wrote:

> Hi,
>
> I have recently been looking at Otava and getting it to support the latest
> versions of Python. Everything works fine up to Python 3.10.x and
> signal-processing-algorithms 1.3.5.
>
> However Python 3.11 changes the signature of Random.shuffle() (
> https://docs.python.org/3.10/library/random.html#random.shuffle) which
> breaks signal-processing-algorithms 1.3.5. To fix this we have to upgrade
> to at least signal-processing-algorithms 2.x.
>
> Unfortunately the 2.x series of the library has made substantial changes to
> the API. As a result this breaks the existing Otava code.
>
>

Unfortunately there isn't anyone from the MongoDB team here, nor am I aware of any new blogs or articles where they explain how they themselves used these. But ok, they did provide a README. If only we had someone on this project who knows/likes math :-D

So, it seems they have added two algorithms.

The first algorithm they added, Energy Statistics, is used to compare two different samples/distributions. This could be feature branch vs. main branch, or 7.0 vs. 8.0, or whatever. It turns out DataStax independently added a similar feature, which is now in Apache Otava:
https://github.com/apache/otava/blob/master/docs/BASICS.md#validating-performance-of-a-feature-branch

As far as I can tell, the way Piotr implemented this for Otava, it simply appends the benchmark results from the other branch and then runs e-divisive as usual. No need for a separate algorithm.

The second algorithm they added is an outlier detection algorithm. It's unclear to me what you would use outlier detection for, since the whole point of e-divisive is that it is already good at ignoring outliers.

As for the e-divisive implementation, last time I looked it was unchanged except for refactoring, and in particular they have not incorporated the significant improvements made by Piotr/DataStax.
So if we focus only on e-divisive use, I would say the MongoDB 2.0 version is far behind what we have in Otava. The main improvements in Otava are using Student's t-test for the significance check, restricting the analysis to a recent window, and incremental e-divisive (a performance optimization). These are covered in https://dl.acm.org/doi/10.1145/3578244.3583719

In other words, I would be inclined to just keep the 1.3.5 code we have for e-divisive and take over active maintainership of it, given that MongoDB is expected to work on the significantly refactored 2.0 version. (I'm open to arguments why the algorithms in 2.0 are useful, I'm just not aware of any.)

Note that some of the code added by Piotr and also myself into Otava would more naturally belong in the core signal processing library, and lives in Otava purely due to organizational boundaries.

> I am looking for guidance on a way forward. Our options are:
>
> * Rewrite Otava solely to support the new library API. This would mean
> removing support for users with versions older than 3.11.
> * Write a wrapper API for both versions. This will support older versions
> of Python, but requires some additional coding and that wrapper has to be
> maintained.
> * Fork the existing 1.3.5 library and make the changes to support 3.11. We
> can decide if we want to make the fork backwards compatible with older
> versions and it isolates us from further changes to
> signal-processing-algorithms.
> * Some other option I may not have thought of.
>
>

The way I see it, we have already diverged from the active MongoDB branch (and they may internally be working on something else completely, leaving the 2.0 version essentially a zombie already). So from my point of view we will live happily ever after with the 1.3.5 code, and upgrade all of it to the newest Python versions. Major users (like Alex) may want specific, not so new, Python versions to be supported.

henrik
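
P.S. To make the "keep 1.3.5 and upgrade it" option concrete: Python 3.11 removed the optional second `random` argument of random.shuffle() (it had been deprecated since 3.9), so a call that still passes it raises a TypeError. Below is a minimal sketch of a replacement that works on both old and new Python versions, assuming the library only needs a reproducible shuffle. The helper name `shuffled_copy` and its `seed` parameter are hypothetical and not part of signal-processing-algorithms; the actual call site in 1.3.5 may look different.

```python
import random
from typing import List, Optional


def shuffled_copy(values: List[float], seed: Optional[int] = None) -> List[float]:
    """Return a shuffled copy of `values`.

    Hypothetical helper: it avoids the `random=` argument that
    random.shuffle() no longer accepts from Python 3.11 onwards.
    """
    # A private generator gives reproducible results for a fixed seed
    # and leaves the global random state untouched.
    rng = random.Random(seed)
    result = list(values)
    # The one-argument form of shuffle() exists in every Python 3 release.
    rng.shuffle(result)
    return result
```

Since this only uses the one-argument form, the same code would run on 3.8 through 3.13, so supporting older Python versions for users like Alex should not need a separate code path.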