Hi, I'd be grateful for suggestions/opinions about the API of scipy.interpolate. We currently rely on the FITPACK library to construct smoothing splines. The user-facing API is `splrep` and `UnivariateSpline`. There's a project to replicate/rework what FITPACK is doing, with two goals: - bring the equivalent functionality to CuPy - break the FITPACK monolith to make it easier to add enhancements to --- see, for instance https://github.com/scipy/scipy/issues/2579 for feature request going into its second decade of shelf life.
Specifically, here's a "stack" of three PRs aiming to replicate the spline smoothing of `splrep` and `UnivariateSpline`. The split is mainly to keep diff size manageable: - https://github.com/scipy/scipy/pull/19753 brings in linalg primitives (a version of the banded QR factorization) - https://github.com/scipy/scipy/pull/19873 adds a generator to construct successive knot vectors - https://github.com/scipy/scipy/pull/19970 includes the previous two, and adds an equivalent of `splrep`. Detailed code reviews of these PRs would of course be greatly appreciated. What I'd like to discuss here is the end API. The current state is this. We are adding two objects: 1. `generate_knots(x, y, k, s)` is a python generator which yields successive knot vectors for a given smoothing parameter `s`. The generator is finite, and the last value is what `splrep` would return (*). The use of the generator comes from https://github.com/scipy/scipy/issues/2579, only here we yield knot vectors, not splines. This way, we can decouple constructing the knots from smoothing criteria. The generator has two stopping criteria: either required smoothness has been reached (the deviation of the LSQ spline from data is less than the input `s`), or the desired number of knots have been placed. The latter functionality is present in the Fortran FITPACK, but effectively disabled in the python-to-fortran wrappers. (*) fine print: the knots are *almost* equivalent to what FITPACK returns. The core procedure is to split the data into some intervals, and place new knots into intervals with the maximum deviation. Now, if you have two intervals with nearly identical deviation, which of them is "maximal" boils down to numerical noise --- and this one may differ in Fortran and Python. 2. `make_splrep(x, y, k, s, t=None) -> BSpline` function. It accepts the same parameters as `splrep`, and returns a BSpline instance. If the knots, t, are not provided, it calls `generate_knots`, so we get the same result as `splrep(x, y, k=k, s=s)`. If knots are specified, it just uses them to construct the smoothing spline with the same procedure that FITPACK uses. So the first question is the new `make_splrep` name. Because of (*) above, just calling the new function `splrep` has backwards compat issues. Therefore, a new function, and the plan is to declare `splrep` and `UnivariateSpline` legacy and leave them be, at least for the time being. An alternative is, of course, to make a gradual transition along the lines of https://github.com/scipy/scipy/issues/19896#issuecomment-1898608008 --- hide the change behind a variable or an env var to switch "backends", and make the change from the legacy Fortran backend to the new one over several releases. Thoughts? Evgeni P.S. If you are using `splrep` or `UnivariateSpline`, it'd be super helpful if you could try `make_splrep` on your data. I would be very very interested in how it fares, whether the results are identical, and if they are not, how bad/serious the difference is.
_______________________________________________ SciPy-Dev mailing list -- scipy-dev@python.org To unsubscribe send an email to scipy-dev-le...@python.org https://mail.python.org/mailman3/lists/scipy-dev.python.org/ Member address: arch...@mail-archive.com