On Tue, Jun 23, 2020 at 2:26 AM Jarek Potiuk <jarek.pot...@polidea.com>
wrote:
My understanding is that the bigger problem is the license of the
dependency (and its dependencies) rather than the official/unofficial
status. For Apache Yetus' test-patch functionality, we defaulted all of
our plugins to off because we couldn't depend upon GPL'd binaries being
available or giving the impression that they were required. By doing so,
it put the onus on the user to specifically enable features that depend
upon GPL'd functionality. It also pretty much nukes any idea of being
user friendly. :(
Indeed - Licensing is important, especially for source code
redistribution.
We used to have some GPL-install-on-your-own-if-you-want in the past but
those dependencies are gone already.
2) If it's not - how do we determine which images are "officially
maintained"?
Keep in mind that Docker themselves brand their images as 'official'
when they actually come from Docker instead of the organizations that
own that particular piece of software. It just adds to the complexity.
Not really. We actually plan to make our own Apache Airflow Docker image
an official one. Docker has very clear guidelines on how to make images
"official": https://docs.docker.com/docker-hub/official_images/ and
there is quite a long list of those:
https://github.com/docker-library/official-images/tree/master/library -
most of them maintained by the "authors" of the image. Docker has a
dedicated team that reviews and checks those images, and they encourage
the "authors" to maintain them. Quote from Docker's docs: "While it is
preferable to have upstream software authors maintaining their
corresponding Official Images, this is not a strict requirement."
3) If yes - where do we put the boundary - when is an image acceptable?
Are there any criteria we can use or constraints we can put on the
licences/organizations releasing the images we want to make
dependencies for released code of ours?
License means everything.
For software distribution - true. It is the "blocker". But I think my
question goes a bit beyond that - i.e. whether it's ok to
encourage/depend on the work maintained by organizations other than
Apache if they are not "official". My take is that it's likely OK to
depend on that provided that there is some kind of statement from those
organizations that they maintain it.
An example risk I see:
Airflow users depend heavily on the helm chart to install Airflow - what
happens if the community agrees to implement something that the
organization does not want to implement (for whatever reason)?
FWIW: every corporation I ever worked at would commission a
BlackDuck/Palamida report of a total software scan for its products.
There was some amount of: "this ASF project pulls a dependency FOO (non
ASF) that is declared to be licensed under the license X but it actually
isn't -- here's why..."
We trust our upstream dependencies, but I don't think we can verify
them as a foundation.
Hence we keep relying on that feedback coming from corporate sides.
Thanks,
Roman.
4) If some images are not acceptable, should we bring them in and
release them in a community-managed registry?
For the Apache Yetus docker image, we're including everything that the
project supports. *shrugs*
Yeah. That's perfectly OK in many cases. Our docker image is also
self-contained. However, Airflow is a bit special here. Airflow is an
orchestrator, which means that it can talk to many different services. We
have 58(!) "providers" - basically external services we can talk to. And
many of those services require many dependencies - for example Cassandra
(for production installation) requires a Cython-compiled driver (for
performance) and it takes 10 minutes to build it. The smaller the images
- the better - therefore the images we release contain the most
"popular" providers rather than all of them, but users can build their
own image from the sources if they want and add the extra dependencies
they need.
Another problem is - the helm chart uses - by definition - a collection
of images - so we will always have some images that the helm chart
depends on (pgbouncer is a good example). So it cannot really be
self-contained. We need to have dependencies, but the question is about
"who controls them" :)
J.
--
Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer
M: +48 660 796 129 <+48660796129>