On 04-Apr-19 1:52 PM, Ray Kinsella wrote:
On 04/04/2019 11:54, Bruce Richardson wrote:
On Thu, Apr 04, 2019 at 10:29:19AM +0100, Burakov, Anatoly wrote:
On 03-Apr-19 4:42 PM, Ray Kinsella wrote:
Hi folks,
[SNIP]
Hi Ray,
My somewhat rambly 2 cents :)
While i think some solution has to be found for the situation, we also have
to balance this against speed of development and new features rollout.
For example, let's consider what i am intimately familiar with - the memory
rework. I have made enormous efforts to ensure that pre-18.05 and post-18.05
remain as ABI/API compatible as possible, but there were a couple of API
calls that were removed, and there couldn't have been any replacements
(these APIs were exposing internal structures that shouldn't have been
exposed in the first place), and 18.05 also broke the ABI compatibility,
because there was no way to do it without it (shared internal structures
needed to change in part to support multiprocess).
So, if i understand your proposal correctly, assuming a 2-year waiting
period for the deprecation of core APIs, you would essentially still be
waiting for the memory rework to land for a year more. Moreover, even
*after* it has landed, there was a continuous stream of improvements and
bugfixes, some of which have broken ABI compatibility as well. Some of them
were my fault (as in, i could've foreseen the need for those changes, but
didn't), but others came as a result of people using these new features in
the wild and reporting issues/problems/suggestions - i am but one man, after
all. Plus, you know, there's only 24 hours in a day, and some stuff takes
time to implement :)
Since this rework goes right at the heart of DPDK (arguably there isn't a
more "core" API than memory!), there is no (sane) way in the universe to 1)
keep backwards compatibility for this, or 2) keep two parallel versions of
it. We also need to test all that, and, to be honest, one validation cycle
for a release wouldn't be enough to figure out all of the kinks and
implications of such a case. It was really great that the memory rework
landed in 18.05 and we had time to improve and prepare it for an 18.11 LTS -
i think everyone can say that it's in much better shape in 18.11 than it was
in 18.05, but if we couldn't do an ABI break here or there, this rate of
improvements would have slowed down significantly.
Now, i understand that this is probably a highly exceptional case, but i'm
sure that maintainers of other parts of DPDK will have their own examples of
similar things happening.
I have no idea what a proper solution would look like. Any "splitting" of
the trees into "experimental" vs. "stable" will end up causing the same
issue - people choose to use stable over experimental because, well, it's
more stable, and new/experimental features don't get tested as much because
no one runs the thing in the first place.
TL;DR we have to be careful not to constrain the pace of
development/bugfixing just for the sake of having a stable API/ABI :)
Actually, I think we *do* need to constrain the pace of development for the
sake of ABI stability. At this stage DPDK has been around for quite a
number of years and so should be considered a fairly mature project - it
should just start acting like it.
I 100% agree.
If you break your users' stuff regularly enough, they will eventually
start looking around for an alternative that doesn't break their stuff
quite so regularly.
We often use the pace of innovation in DPDK as justification for ABI/API
breakages, but that approach is a real rarity among the Open Source
community. I can't think of any mature project off-hand that shares it.
I would ask: is Linux any less innovative because they offer a stable API
and have an absolute commitment to never breaking userspace? Would Linux
have ever been as popular as it is today if they broke userspace every
quarter?
The reality is that they (Linux) find workarounds and compromise
because there is an uber-maintainer, Linus, who had a strong ethos from
the start not to break their users' stuff - we need the same ethos in DPDK.
Now, in terms of features like the memory rework, that is indeed a case
where there was no alternative to a massive ABI break. However, for
that rework there was such a strong need for improvement in that area that
we can make the case for an ABI break to support it - and it is of a scale
that nothing other than an ABI change would do. For other areas and
examples, I doubt there are many in the last couple of years that are of
that scale.
I would also be inclined to agree with Bruce's point that the memory rework
was somewhat of an outlier; we don't see many like it.
My thoughts on the matter are:
1. I think we really need to do work to start hiding more of our data
structures - like what Stephen's latest RFC does. This hiding should reduce
the scope for ABI breaks.
2. Once done, I think we should commit to having an ABI break only in the
rarest of circumstances, and only with very large justification. I want us
to get to the point where DPDK releases can immediately be picked up by all
Linux distros and rolled out because they are ABI compatible.
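
As a rough illustration of the data-structure hiding in point 1, here is a minimal sketch of the usual opaque-handle pattern in C. All names (demo_ring and its functions) are hypothetical and are not DPDK APIs or taken from Stephen's RFC; the point is only that callers see an incomplete type and accessor functions, so the struct layout can change without an ABI break.

```c
/* Illustrative sketch only: demo_ring_* are hypothetical names, not DPDK APIs. */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* ---- What would live in the public header ---- */
struct demo_ring;                                   /* incomplete type: layout hidden */
struct demo_ring *demo_ring_create(uint32_t capacity);
uint32_t demo_ring_capacity(const struct demo_ring *r);
void demo_ring_free(struct demo_ring *r);

/* ---- What would live in the library's private .c file ---- */
struct demo_ring {
    uint32_t capacity;
    uint32_t head;
    uint32_t tail;
    /* Fields can be added or reordered here: callers never see
     * sizeof(struct demo_ring) or any member offset. */
};

struct demo_ring *demo_ring_create(uint32_t capacity)
{
    /* The library allocates the object, so its size is not part of the ABI. */
    struct demo_ring *r = calloc(1, sizeof(*r));

    if (r != NULL)
        r->capacity = capacity;
    return r;
}

uint32_t demo_ring_capacity(const struct demo_ring *r)
{
    assert(r != NULL);
    return r->capacity;       /* access goes through a function, not a field */
}

void demo_ring_free(struct demo_ring *r)
{
    free(r);
}
```

The trade-off is that accessors cost a function call where an inline field read used to be, which is presumably why this is harder to apply to fast-path structures than to control-path ones.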
The work that Anatoly describes removing APIs that exposed internal
structures and Stephen H's RFC similarly are good examples of the kind
of work required to prepare for this change. We need to take a good look
at the API and reduce the number of unnecessary internal structures
exposed.
I never expected it to be a big bang - but it is a definite
direction we need to move towards over the next few releases.
...in this case, we have to think long and hard about the fabled EAL
rework/split, and in general *specifying* what it is that we want to
support, and the use cases that we want to target. Right now there is a
huge mountain of technical debt and kludges and workarounds that has
accumulated over the years, and it exists precisely because "every
change breaks someone's workflow".
For example, in the memory subsystem alone, we have legacy mem, because
some use cases require huge amounts of contiguous memory, and not
everyone is using VFIO; there are all of the 32-bit-related workarounds
and hacks; there's the single-file-segments stuff that could have been
the default if not for the fact that we support kernels that don't
support fallocate(); there are two different ways of doing in-memory
mode, because not all kernels support memfds; there is a gargantuan
pile of workarounds (and "known issues", and just code in general) all
over the DPDK codebase just to support our multiprocess model and all of
the various warts that come with it.
In fact, i would even go as far as to say that *most* of the EAL ABI breaks
have been due to the fact that we store data in shared memory because of
multiprocess - so there is simply no way we can change these internal
data structures without ABI breaks, because even if they're not exposed
through user-facing API, they are still exposed by virtue of secondary
processes basically having an ABI contract with primary process instances.
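
To make the shared-memory point concrete, here is a minimal sketch of what happens when a primary and a secondary process disagree on the layout of a structure that lives in shared memory. The struct and function names are hypothetical and do not correspond to real EAL structures; a plain byte buffer stands in for the shared mapping.

```c
/* Illustrative sketch only: shared_cfg_* are hypothetical, not EAL structures. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout compiled into a hypothetical "old" primary process. */
struct shared_cfg_v1 {
    uint32_t magic;
    uint32_t nb_segments;
};

/* Layout compiled into a hypothetical "new" secondary process:
 * one field was inserted in the middle. */
struct shared_cfg_v2 {
    uint32_t magic;
    uint32_t flags;           /* new field */
    uint32_t nb_segments;     /* now 4 bytes further into the struct */
};

/* What the old primary does: write its view of the config into shared memory.
 * shm must be at least sizeof(struct shared_cfg_v2) bytes and zero-initialised. */
void write_cfg_as_v1(void *shm, uint32_t nb_segments)
{
    struct shared_cfg_v1 cfg = { .magic = 0xD1D1u, .nb_segments = nb_segments };

    memcpy(shm, &cfg, sizeof(cfg));
}

/* What the new secondary does: read the same bytes using its own layout. */
uint32_t read_nb_segments_as_v2(const void *shm)
{
    struct shared_cfg_v2 view;

    assert(shm != NULL);
    memcpy(&view, shm, sizeof(view));
    return view.nb_segments;  /* reads bytes the v1 writer never filled in */
}
```

Neither struct appears in a user-facing header here, yet the two binaries still have to agree on every offset, which is the implicit ABI contract between primary and secondary described above.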
So, if we are to cement our core API - we have to make a concrete effort
to specify what goes and what stays, if we want it to be maintainable.
The DPDK 1.0 specification, if you will :)
--
Thanks,
Anatoly