I reached out to a couple of people still at MongoDB. They may show up directly here later, but for now, to the question about the differences between the 2.x series and 1.3.5, I was told:
> the main difference (as far as I remember) of the 2.0 branch from the 1.0 branch is that it supports multivariate change point detection and improves the distance calculation logic for permutation testing to make it faster. We also cleaned up the 1.0 code in 2.0 to make it easier to read/understand and got rid of interfaces that we were not using.

I'm fairly confident that the multivariate code is not used in production, but there were some experiments in that direction to lower false positives and to deal with the very large number of time series we had ended up tracking.

That person is going to be out for a week. Hopefully they will show up and say hello directly to the mailing list after that. I've Bcc'd them on this reply, so they will see at least this.

David

On Wed, Jul 9, 2025 at 2:47 PM Henrik Ingo <hen...@nyrkio.com> wrote:

> Thanks Joe, and nice to have you here <3
>
> On Wed, Jul 9, 2025 at 2:52 PM Joe Drumgoole <j...@joedrumgoole.com> wrote:
>
> > Hi,
> >
> > I have recently been looking at Otava and getting it to support the latest versions of Python. Everything works fine up to Python 3.10.x and signal-processing-algorithms 1.3.5.
> >
> > However, Python 3.11 changes the signature of Random.shuffle() (https://docs.python.org/3.10/library/random.html#random.shuffle), which breaks signal-processing-algorithms 1.3.5. To fix this we have to upgrade to at least signal-processing-algorithms 2.x.
> >
> > Unfortunately, the 2.x series of the library has made substantial changes to the API. As a result, this breaks the existing Otava code.
>
> Unfortunately there isn't anyone from the MongoDB team here, nor am I aware of any new blogs or articles where they would explain how they themselves used these.
>
> But OK, they did provide a README. If only we had someone on this project who knows/likes math :-D
>
> So, it seems they have added two algorithms.
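[Editor's note: a minimal sketch of the Python 3.11 breakage Joe describes. In 3.11 the optional `random` parameter was removed from `shuffle()`, so any call that still passes it raises TypeError; the exact call site inside signal-processing-algorithms 1.3.5 is an assumption here, but shuffling is the natural ingredient of its permutation testing. A version-independent replacement is to shuffle through a seeded `Random` instance.]

```python
import random

# Pre-3.11, code could inject a custom randomness source into shuffle():
#
#     random.shuffle(values, random=my_rng.random)   # TypeError on 3.11+
#
# The `random` parameter was removed in Python 3.11, so any such call now
# fails. Shuffling via a Random instance works on every supported version
# and keeps permutation tests deterministic under a fixed seed.

def permute(values, seed):
    """Return a seeded random permutation of `values` (a copy, not in place)."""
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled

data = list(range(10))
perm = permute(data, seed=42)
print(sorted(perm) == data)  # → True: same elements, reproducible order
```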
> The first algorithm they added, Energy Statistics, is used to compare two different samples/distributions. This could be a feature branch vs. main branch, or 7.0 vs. 8.0, or whatever. It turns out Datastax independently added a similar feature, which is now in Apache Otava:
>
> https://github.com/apache/otava/blob/master/docs/BASICS.md#validating-performance-of-a-feature-branch
>
> As far as I can tell, the way Piotr implemented this for Otava, it simply appends the benchmark results from the other branch and then runs e-divisive as usual. No need for a separate algorithm.
>
> The second algorithm they added is an outlier detection algorithm. It's unclear to me what you would use outlier detection for; the whole point of e-divisive is that it is already good at ignoring outliers.
>
> As for the e-divisive implementation, last time I looked it was unchanged except for refactoring, and in particular it does not incorporate the significant improvements made by Piotr/Datastax. So if we focus only on e-divisive use, I would say the MongoDB 2.0 version is far behind what we have in Otava.
>
> The main improvements in Otava are using Student's t-test for the significance check, restricting the analysis to a recent window, and incremental e-divisive (a performance optimization). These are covered in https://dl.acm.org/doi/10.1145/3578244.3583719
>
> In other words, I would be inclined to keep the 1.3.5 code we have for e-divisive and take over active maintainership of it, given that MongoDB is expected to work on the significantly refactored 2.0 version. (I'm open to arguments why the algorithms in 2.0 are useful; I'm just not aware of any.)
>
> Note that some of the code added by Piotr and also myself into Otava would more naturally belong in the core signal processing library; it lives in Otava purely due to organizational boundaries.
>
> > I am looking for guidance on a way forward.
> > Our options are:
> >
> > * Rewrite Otava solely to support the new library API. This would mean removing support for users on Python versions older than 3.11.
> > * Write a wrapper API for both versions. This will support older versions of Python, but it requires some additional coding, and that wrapper has to be maintained.
> > * Fork the existing 1.3.5 library and make the changes needed to support 3.11. We can decide if we want to make the fork backwards compatible with older versions, and it isolates us from further changes to signal-processing-algorithms.
> > * Some other option I may not have thought of.
>
> The way I see it, we have already deviated from the active MongoDB branch (and they may internally be working on something else completely, leaving the 2.0 version essentially a zombie already).
>
> So from my point of view we will live happily ever after with the 1.3.5 code, and upgrade all of it to the newest Python versions. Major users (like Alex) may want to wish for specific, not-so-new, Python versions to be supported.
>
> henrik
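[Editor's note: a rough illustration of the branch-comparison approach Henrik describes above — append the feature-branch results to the main-branch history and check whether a change point is detected at the append boundary. This sketch uses a simplified, self-contained difference-of-means statistic in place of the actual E-divisive energy statistic used by Otava, and the benchmark numbers are invented.]

```python
def best_split(series):
    """Return (index, score) of the split maximizing a scaled
    difference-of-means statistic between the two segments.
    A stand-in for the E-divisive divergence measure, not the real thing."""
    n = len(series)
    best_i, best_score = None, 0.0
    for i in range(2, n - 1):  # require at least 2 points per side
        left, right = series[:i], series[i:]
        mean_l = sum(left) / len(left)
        mean_r = sum(right) / len(right)
        # Weight by segment sizes so balanced splits are not penalized.
        score = abs(mean_l - mean_r) * (len(left) * len(right)) ** 0.5 / n
        if score > best_score:
            best_i, best_score = i, score
    return best_i, best_score

main_branch = [100.1, 99.8, 100.3, 100.0, 99.9, 100.2]  # ops/sec, stable
feature_branch = [93.9, 94.2, 94.0, 94.1]               # ~6% slower

combined = main_branch + feature_branch
split, score = best_split(combined)
print(split)  # → 6: the change point lands exactly at the append boundary
```

A detected change point at the boundary (index 6, where the feature-branch results begin) suggests a real performance difference between the branches, which is why no separate two-sample algorithm is needed on top of the change point detector.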