Hi,

I'd be grateful for suggestions/opinions about the API of
scipy.interpolate. We currently rely on the FITPACK library to construct
smoothing splines. The user-facing API is `splrep` and `UnivariateSpline`.
There's a project to replicate/rework what FITPACK is doing, with two goals:
- bring the equivalent functionality to CuPy
- break the FITPACK monolith to make it easier to add enhancements to ---
see, for instance https://github.com/scipy/scipy/issues/2579 for feature
request going into its second decade of shelf life.

Specifically, here's a "stack" of three PRs aiming to replicate the spline
smoothing of `splrep` and `UnivariateSpline`. The split is mainly to keep
diff size manageable:

- https://github.com/scipy/scipy/pull/19753 brings in linalg primitives (a
version of the banded QR factorization)
- https://github.com/scipy/scipy/pull/19873 adds a generator to construct
successive knot vectors
- https://github.com/scipy/scipy/pull/19970 includes the previous two, and
adds an equivalent of `splrep`.

Detailed code reviews of these PRs would of course be greatly appreciated.
What I'd like to discuss here is the end API. The current state is this. We
are adding two objects:

1. `generate_knots(x, y, k, s)` is a python generator which yields
successive knot vectors for a given smoothing parameter `s`. The generator
is finite, and the last value is what `splrep` would return (*). The use of
the generator comes from https://github.com/scipy/scipy/issues/2579, only
here we yield knot vectors, not splines. This way, we can decouple
constructing the knots from smoothing criteria.

The generator has two stopping criteria: either required smoothness has
been reached (the deviation of the LSQ spline from data is less than the
input `s`), or the desired number of knots have been placed. The latter
functionality is present in the Fortran FITPACK, but effectively disabled
in the python-to-fortran wrappers.

(*) fine print: the knots are *almost* equivalent to what FITPACK returns.
The core procedure is to split the data into some intervals, and place new
knots into intervals with the maximum deviation. Now, if you have two
intervals with nearly identical deviation, which of them is "maximal" boils
down to numerical noise --- and this one may differ in Fortran and Python.

2. `make_splrep(x, y, k, s, t=None) -> BSpline` function. It accepts the
same parameters as `splrep`, and returns a BSpline instance. If the knots,
t, are not provided, it calls `generate_knots`, so we get the same result
as `splrep(x, y, k=k, s=s)`.
If knots are specified, it just uses them to construct the smoothing spline
with the same procedure that FITPACK uses.

So the first question is the new `make_splrep` name. Because of (*) above,
just calling the new function `splrep` has backwards compat issues.
Therefore, a new function, and the plan is to declare `splrep` and
`UnivariateSpline` legacy and leave them be, at least for the time being.

An alternative is, of course, to make a gradual transition along the lines
of https://github.com/scipy/scipy/issues/19896#issuecomment-1898608008 ---
hide the change behind a variable or an env var to switch "backends", and
make the change from the legacy Fortran backend to the new one over several
releases.

Thoughts?

Evgeni

P.S. If you are using `splrep` or `UnivariateSpline`, it'd be super helpful
if you could try `make_splrep` on your data. I would be very very
interested in how it fares, whether the results are identical, and if they
are not, how bad/serious the difference is.
_______________________________________________
SciPy-Dev mailing list -- scipy-dev@python.org
To unsubscribe send an email to scipy-dev-le...@python.org
https://mail.python.org/mailman3/lists/scipy-dev.python.org/
Member address: arch...@mail-archive.com

Reply via email to