On Wed, Dec 02, 2015 at 06:44:19AM -0500, Neil Horman wrote:
> There's nothing "complex" about the simple fact that a project builds lots of
> libraries.  It's extremely common. Any graphic window manager has exactly the
> same situation, as do any number of tools that have multiple hardware backends
> implemented in user space (v4l, sane, iptables, to name just a few).
> 
> > Before I go into details, it would be nice if someone could please
> > explain why DPDK has to be "special" in needing to do this? I don't
> It's not special, see above.  Not saying the build environment can't be
> improved, but the fact that there are multiple libraries is pretty
> straightforward.

It's fine in principle for an upstream to ship multiple shared
libraries, but it is extra and unnecessary work unless there's a
*reason* to have multiple shared libraries. What are the reasons for
DPDK?

> > In Debian and Ubuntu, we manage a library transition (an ABI bump in a
> > library together with all dependencies moving to use the new ABI) by
> > concurrently packaging both the old and new libraries at once. This
> > works well with the norm for libraries. We ship one binary package per
> > soname, with the major version as part of the package name. This allows
> > a system to have two (or more) ABIs installed simultaneously. For a
> > library transition, we just package the new version and then that can
> > land and work concurrently as we then individually update every
> > dependent (library-consuming) package.

> So that's a distribution choice, not an upstream problem.

No, that's how shared libraries work. By design, multiple ABI versions
can be co-installed. That's why sonames have the ABI major version
inside them and the filenames reflect the sonames.

It is a distribution choice to exploit this capability. But it is an
upstream problem if this capability is broken.

By shipping multiple shared libraries, DPDK isn't breaking this
capability per se. But if the upstream expectation is that it's no
additional work for distributions because the multiple libraries can
just be bundled together into a single distribution package, then _this_
is what breaks the capability.

Instead DPDK needs to acknowledge that splitting libraries _does_ cause
additional packaging work for any distribution that wants to use the
multiple co-installed ABI feature of shared libraries as they are
designed.

For upstream, it then becomes a question of trade-off: does the
benefit of split libraries outweigh the extra work this creates for
packagers? To understand this, I first need to understand the rationale
for shipping multiple shared libraries specifically in DPDK, and I feel
that you (well, Red Hat) have yet to present a case.

>                                                            And it seems like a
> problem you should have already solved (note the examples above).  If you feel
> like you need to package multiple ABI versions in the same library, you can,
> just update the LIBABIVER of all the libraries, instead of the ones that truly
> change, so that each library is guaranteed a newer so version, to make the
> library file name unique.  Yes, you have to make a small change from upstream,
> but that's part of the work that distribution maintainers do.

If it makes sense for upstream, it would be better for all if the code
was maintained in one place rather than fragmented across distribution
patches. My argument here is that it _does_ make sense for upstream, which
is why I took the question to this list before we uploaded our first
patched version to Ubuntu.

> You must already have a solution to this; I can't imagine you package all the
> libraries for kde or gnome (or even pam) separately.

PAM modules are unversioned, since they are dynamically loaded plugins
and nothing actually links to them (in the sense that there are
no executables that link to them at exec time). The ABI is defined by
the version of PAM installed, not the version of the plugin. So I don't
think we can really compare to PAM.

I'm less familiar with KDE and GNOME packaging since I specialise in
server packaging. But taking GNOME, for example, I am unable to find any binary
packages where multiple versioned shared objects have been bundled.
Their shared library packaging matches my expectations. For
example (source -> binary -> filename):

  gdk-pixbuf -> libgdk-pixbuf2.0-0 -> /usr/lib/x86_64-linux-gnu/libgdk_pixbuf-2.0.so.0
  gconf -> libgconf-2-4 -> /usr/lib/x86_64-linux-gnu/libgconf-2.so.4
  gtk+3.0 -> libgtk-3-0 -> /usr/lib/x86_64-linux-gnu/libgtk-3.so.0
  pango1.0 -> libpango-1.0-0 -> /usr/lib/x86_64-linux-gnu/libpango-1.0.so.0

Each binary package supplies no more than one soname. (Again I've ignored
unversioned pluggable modules for the same reason as PAM). If this isn't
what you mean, please can you find me a counter-example? Given a soname
you can find the binary package that provides it at
https://www.debian.org/distrib/packages under "Search the contents of
packages". I suggest you set the distribution to "testing" to find more
current sonames.
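The naming pattern in those examples can be roughly sketched in C. This is only an approximation, assuming the usual Debian convention for runtime library packages: drop ".so.", append the soname's major version, insert a hyphen if the library name already ends in a digit, and map underscores and capitals to what package names allow. Maintainers can and do deviate from it, and the function name soname_to_package() is hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: derive the conventional runtime package name from a
 * soname of the form "<name>.so.<major>". Returns 0 on success, or -1 if the
 * soname is not of that form or the output buffer is too small.
 */
int soname_to_package(const char *soname, char *out, size_t outlen)
{
    const char *sep = strstr(soname, ".so.");
    if (sep == NULL || sep == soname || sep[4] == '\0')
        return -1;

    size_t namelen = (size_t)(sep - soname);
    const char *major = sep + 4;  /* ABI major version, e.g. "0" */

    /* "libgtk-3" + "0" would be ambiguous, so separate with '-' */
    int needs_dash = isdigit((unsigned char)soname[namelen - 1]) != 0;

    int n = snprintf(out, outlen, "%.*s%s%s", (int)namelen, soname,
                     needs_dash ? "-" : "", major);
    if (n < 0 || (size_t)n >= outlen)
        return -1;

    /* Package names allow neither underscores nor upper-case letters. */
    for (char *p = out; *p != '\0'; p++) {
        if (*p == '_')
            *p = '-';
        else
            *p = (char)tolower((unsigned char)*p);
    }
    return 0;
}
```

With this rule, libgtk-3.so.0, libgconf-2.so.4 and libpango-1.0.so.0 map to libgtk-3-0, libgconf-2-4 and libpango-1.0-0, matching the list above; libgdk_pixbuf-2.0.so.0 would map to libgdk-pixbuf-2.0-0 rather than the libgdk-pixbuf2.0-0 actually chosen by its maintainers. Because the ABI major version is part of the name, a hypothetical libfoo.so.1 and libfoo.so.2 yield distinct packages (libfoo1, libfoo2) that can be co-installed.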

Christian points out to me that libc6 does ship multiple sonames in a
single package, but I think it's acceptable to consider this to be a
special case that DPDK cannot really look to as an example. We don't
normally co-install multiple ABI versions of libc because a major ABI
bump in libc is extremely rare, and when we do it's a very special case
that is handled as a major distribution-wide project.

In answer to "You must already have a solution to this", we do. Our
solution is to produce one binary package per soname. My point is that
in the case of DPDK, this creates extra unnecessary work. Alternatively,
we could treat DPDK packaging as the same sort of gargantuan task that
packaging GNOME or KDE is, but without a good reason to split
libraries this would be an artificial and unnecessary burden placed on
packagers by DPDK upstream, which is why I am against upstream doing
this.

> > Packaging a library is usually virtually a no-op in Debian and Ubuntu
> > nowadays. Our tooling does it all for us. But packaging DPDK is far from
> > this currently because of all this added complexity. From my perspective
> > this is unnecessary and makes no sense. We could do all kinds of things
> > to work around it (that's what packaging is about) but then we'd have to
> > maintain that specialness and I don't see why it must be awkward like
> > this instead of just doing it the same way as every other library.
> > 
> > > The combined library as it is simply is no longer a viable option.
> > > Besides just being broken (witness the strange hacks people are coming
> > > up with to work around issues in it) it's ugly because it basically gives
> > > the middle finger to all the effort going into version compatibility,
> > > and it's also big. Few projects will use every library in DPDK, but with
> > > the combined library they're forced to lug the 800 pound gorilla along
> > > needlessly.
> > 
> > It's broken because it's broken upstream, and that's what we should fix.
> > Why is it not viable? How does it give the middle finger to effort going
> > into version compatibility?
> Because each individual library has a version script that gets applied during
> link to version symbols properly.  Those scripts don't get applied when
> building the combined library.

So this is just an upstream bug that needs resolving in the combined
library case? Then I appreciate Ferruh Yigit's efforts in fixing this
bug upstream. Thank you Ferruh Yigit.

> > Doing it the right way like every other
> > userspace library is what *gives us* version compatibility because then
> > distributions can straightforwardly install multiple ABI versions at
> > once.
> Again, not at all uncommon.  Your packaging methodology is the issue here,
> not the fact that there are multiple libraries.

No, our packaging methodology is sound, as I hope I've explained well
enough above. The real issue is the yet-to-be-justified decision to
split libraries, creating unnecessary packaging work given that we wish
to ship shared libraries properly rather than bundling all the sonames
together (which defeats the point of split libraries in the first
place).

> > Finally, I fail to see any "lug the 800 pound gorilla along" saving. We
> > (Ubuntu and Fedora) are both shipping all the libraries in one package,
> > whether split or combined, so they are all being lugged onto disk
> > anyway. Whether split or combined, there is no saving there. And memory
> > is hardly saved either because the kernel will just page in and out what
> > is needed in both cases. So how does this proposed change give us any
> > saving at all?
> > 
> Not true, initialization constructors for PMDs at the very least mean that
> every pmd will get paged in whether you want it or not using the combined
> library.  Individual libraries let you dynamically load them (via dlopen).  I
> think the same is true of several other facets of dpdk.
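To make the constructor point concrete, here is a minimal C sketch of a self-registering driver pattern, assuming GCC/Clang's __attribute__((constructor)). All names (REGISTER_PMD, register_driver, the "net_a"/"net_b" drivers) are hypothetical and are not DPDK's actual registration macros.

```c
#include <stddef.h>

/* Hypothetical driver registry; illustrative names, not DPDK's API. */
#define MAX_DRIVERS 8

struct pmd_driver {
    const char *name;
};

static const struct pmd_driver *registry[MAX_DRIVERS];
static size_t n_registered;

static void register_driver(const struct pmd_driver *drv)
{
    if (n_registered < MAX_DRIVERS)
        registry[n_registered++] = drv;
}

/*
 * Each "PMD" registers itself from a constructor. The constructor runs when
 * the object containing it is loaded (before main for the executable and its
 * link-time libraries, or during dlopen()), whether or not the application
 * ever uses that driver.
 */
#define REGISTER_PMD(ident, drvname)                              \
    static const struct pmd_driver ident##_drv = { drvname };     \
    static void ident##_init(void) __attribute__((constructor));  \
    static void ident##_init(void)                                \
    {                                                             \
        register_driver(&ident##_drv);                            \
    }

/* Two drivers "bundled" in one object, as in a combined library. */
REGISTER_PMD(pmd_a, "net_a")
REGISTER_PMD(pmd_b, "net_b")

size_t registered_driver_count(void)
{
    return n_registered;
}

const char *registered_driver_name(size_t i)
{
    return i < n_registered ? registry[i]->name : NULL;
}
```

Because every constructor in a loaded object runs, all drivers bundled into one combined library register (touching their code and data) at load time. With split libraries an application could instead dlopen() only the driver objects it needs, so only those constructors would run. Whether that difference is measurable in practice is a separate question.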

What's the objective impact of this? Can you quantify your claimed
saving? How does it compare to, say, the extra IOPS required in loading
multiple shared libraries and the extra pages that they could consume?
Are any of these things significant enough to cause an issue someone
will face in the real world?

On Tue, Dec 01, 2015 at 08:30:43AM -0500, Neil Horman wrote:
> On Tue, Dec 01, 2015 at 12:36:15PM +0000, Robie Basak wrote:
> > Why is limited symbol visibility a benefit in this case?
> > 
> Because it prevents an application from inadvertently using symbols that would
> otherwise appear in another library (i.e. if not using the combined library,
> you know you've used a symbol in another library because you are then forced
> to add that library to the build).

Does Ferruh Yigit's patch address this?

On Thu, Dec 03, 2015 at 09:59:24AM -0500, Neil Horman wrote:
> I've seen the patch, and I appreciate the effort, but it really seems to me
> like more of the same.  That is to say, it's a good effort, but it really
> creates additional infrastructure to allow a single library to be built, but
> the fact of the matter is, a single library isn't needed.  The build system
> is set up to create multiple libraries, and a linker script allows for the
> combined library functionality, without adding additional clutter to the
> Makefiles.  The argument that it's more work to support multiple libraries
> in some distributions simply doesn't ring true with me, because that must be
> a problem which is already solved for other popular projects which are
> architected in a similar fashion.

I think I've rebutted all of this above. If you think there's any part
left here that I've failed to address, please let me know and I can go
into it.

Thanks,

Robie
