Hi Arpit,

On Docker Hub, Hadoop images are tagged with identifiers that look like 
docker-hadoop-runner-latest or jdk11.  It is hard to tell whether the jdk11 
image is Hadoop 3 or Hadoop 2, because there is no consistency in the tag 
format.  This is my reasoning against tagging however one's heart desires: 
flexible naming causes confusion over the long run.

There is a good article on performing a Maven release with the M2 Release 
Plugin in Jenkins: https://dzone.com/articles/running-maven-release-plugin
When Jenkins performs a Maven release, it tags the source code with the version 
number, automatically uploads the artifacts to Nexus, and then resets the 
version number to the next SNAPSHOT.  If the dockerfile plugin is used, it can 
push the image to Docker Hub as part of the same release.
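
For reference, the steps Jenkins automates map onto the standard 
maven-release-plugin goals.  This is only a sketch; the actual SCM and Nexus 
coordinates come from the project's pom, and the version numbers here are 
illustrative:

```shell
# Tag the source, bump the pom to the next SNAPSHOT, and commit.
# -B suppresses the interactive prompts; the versions can be set explicitly:
mvn -B release:prepare -DreleaseVersion=3.3.0 -DdevelopmentVersion=3.4.0-SNAPSHOT

# Check out the freshly created tag and run deploy, which uploads the
# artifacts to the repository configured in <distributionManagement>:
mvn release:perform
```

A docker plugin bound to the deploy phase would then push the image during 
release:perform along with the jars.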

The proposed adjustment is to put the docker build in a maven profile.  Users 
who want to build it will need to add the -Pdocker flag to trigger it.
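
A minimal sketch of what that profile could look like in the module's pom.xml. 
The plugin choice and configuration here are illustrative assumptions, not the 
actual patch; this example uses the com.spotify dockerfile-maven-plugin:

```xml
<profiles>
  <profile>
    <!-- Activated only when the user passes -Pdocker on the command line -->
    <id>docker</id>
    <build>
      <plugins>
        <plugin>
          <groupId>com.spotify</groupId>
          <artifactId>dockerfile-maven-plugin</artifactId>
          <version>1.4.10</version>
          <executions>
            <execution>
              <id>build-docker-image</id>
              <goals>
                <goal>build</goal>
              </goals>
            </execution>
          </executions>
          <configuration>
            <!-- Image name and tag are assumptions for illustration -->
            <repository>apache/hadoop</repository>
            <tag>${project.version}</tag>
          </configuration>
        </plugin>
      </plugins>
    </build>
  </profile>
</profiles>
```

With that in place, "mvn package -Pdocker" builds the image alongside the 
jars, while a plain "mvn package" skips the docker step entirely.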

Regards,
Eric

On 3/19/19, 12:48 PM, "Arpit Agarwal" <aagar...@cloudera.com> wrote:

    Hi Eric,
    
    > Dockerfile is most likely to change to apply the security fix.
    
    I am not sure this is always true. Marton’s point about revising docker 
images independent of Hadoop versions is valid. 
    
    
    > When maven release is automated through Jenkins, this is a breeze
    > of clicking a button.  Jenkins even increment the target version
    > automatically with option to edit. 
    
    I did not understand this suggestion. Could you please explain in simpler 
terms or share a link to the description?
    
    
    > I will make adjustment accordingly unless 7 more people comes
    > out and say otherwise.
    
    What adjustment is this?
    
    Thanks,
    Arpit
    
    
    > On Mar 19, 2019, at 10:19 AM, Eric Yang <ey...@hortonworks.com> wrote:
    > 
    > Hi Marton,
    > 
    > Thank you for your input.  I agree with most of what you said, with a few 
exceptions.  A security fix should result in a new version of the image instead 
of replacing an existing version.  The Dockerfile is the file most likely to 
change when a security fix is applied.  If it did not change, the source 
becomes unstable and results in non-buildable code over time.  When the Maven 
release is automated through Jenkins, this is a breeze of clicking a button.  
Jenkins even increments the target version automatically, with an option to 
edit it.  It makes the release manager's job easier than Homer Simpson's job.
    > 
    > If versioning is done correctly, older branches can have the same docker 
subproject, and Hadoop 2.7.8 can be released for older Hadoop branches.  We 
don't create a timeline paradox by allowing the history of Hadoop 2.7.1 to 
change.  That release has passed; let it stay that way.
    > 
    > There is mounting evidence that the Hadoop community wants a docker 
profile for the developer image.  The precommit build will not catch some build 
errors, because more code is allowed to slip through when a profile-gated build 
process is used.  I will make the adjustment accordingly unless 7 more people 
come out and say otherwise.
    > 
    > Regards,
    > Eric
    > 
    > On 3/19/19, 1:18 AM, "Elek, Marton" <e...@apache.org> wrote:
    > 
    > 
    > 
    >    Thank you Eric for describing the problem.
    > 
    >    I have multiple small comments, trying to separate them.
    > 
    >    I. separated vs in-build container image creation
    > 
    >> The disadvantages are:
    >> 
    >> 1.  Require developer to have access to docker.
    >> 2.  Default build takes longer.
    > 
    > 
    >    These are not the only disadvantages (IMHO), as I wrote in the
    >    previous thread and in the issue [1]
    > 
    >    Using in-build container image creation doesn't allow us:
    > 
    >    1. to modify the image later (eg. to apply security fixes to the container
    >    itself or improvements to the startup scripts)
    >    2. to create images for older releases (eg. hadoop 2.7.1)
    > 
    >    I think there are two kinds of images:
    > 
    >    a) images for released artifacts
    >    b) developer images
    > 
    >    I would prefer to manage a) with separate branch repositories, but b)
    >    with an (optional!) in-build process.
    > 
    >    II. Agree with Steve. I think it's better to make it optional, as most
    >    of the time it's not required. I think it's better to support the default
    >    dev build with the default settings (= just enough to get started).
    > 
    >    III. Maven best practices
    > 
    >    (https://dzone.com/articles/maven-profile-best-practices)
    > 
    >    I think this is a good article. But it argues not against profiles per
    >    se, but against creating multiple versions of the same artifact with
    >    the same name (eg. jdk8/jdk11). In Hadoop, profiles are used to
    >    introduce optional steps. I think that's fine, as the maven
    >    lifecycle/phase model is very static (compare it with the tree-based
    >    approach in Gradle).
    > 
    >    Marton
    > 
    >    [1]: https://issues.apache.org/jira/browse/HADOOP-16091
    > 
    >    On 3/13/19 11:24 PM, Eric Yang wrote:
    >> Hi Hadoop developers,
    >> 
    >> In recent months, there have been various discussions on creating a docker 
build process for Hadoop.  The mailing list converged last month on making the 
docker build process inline, while the Ozone team was planning a new repository 
for Hadoop/Ozone docker images.  New feature work has started to add an inline 
docker image build process to the Hadoop build.
    >> A few lessons were learnt from making the docker build inline in 
YARN-7129.  The build environment must have docker for the docker build to 
succeed.  BUILD.txt states that for an easy build environment, use Docker.  
There is logic in place to ensure that the absence of docker does not trigger 
the docker build.  The inline process tries to be as non-disruptive as possible 
to existing development environments, with one exception: if docker’s presence 
is detected but the user does not have rights to run docker, the build will fail.
    >> 
    >> Now, some developers are pushing back on the inline docker build process 
because the existing environment did not make the docker build mandatory.  
However, there are benefits to using an inline docker build process.  The 
listed benefits are:
    >> 
    >> 1.  Source code tag, maven repository artifacts and docker hub artifacts 
can all be produced in one build.
    >> 2.  Less manual labor to tag different source branches.
    >> 3.  Reduce intermediate build caches that may exist in multi-stage 
builds.
    >> 4.  Release engineers and developers do not need to search a maze of 
build flags to acquire artifacts.
    >> 
    >> The disadvantages are:
    >> 
    >> 1.  Require developer to have access to docker.
    >> 2.  Default build takes longer.
    >> 
    >> There are workarounds for the above disadvantages: use the -DskipDocker 
flag to avoid the docker build completely, or -pl !modulename to bypass 
subprojects.
    >> Hadoop development has not followed Maven best practice, because a full 
Hadoop build requires a number of profile and configuration parameters.  Some 
evolutions work against Maven's design and require forking separate source 
trees for different subprojects and pom files.  Maven best practice 
(https://dzone.com/articles/maven-profile-best-practices) explains that 
profiles should not be used to trigger different artifact builds, because this 
pattern introduces artifact naming conflicts in the maven repository.  Maven 
offers flags to skip certain operations, such as -DskipTests, 
-Dmaven.javadoc.skip=true, -pl, or -DskipDocker.  It seems worthwhile to make 
some corrections so the Hadoop build follows best practice.
    >> 
    >> Some developers have advocated a separate build process for docker 
images.  We need consensus on the direction that will work best for the Hadoop 
development community.  Hence, my questions are:
    >> 
    >> Do we want an inline docker build process in maven?
    >> If yes, it would be the developer’s responsibility to pass the 
-DskipDocker flag to skip docker.  Docker is mandatory for the default build.
    >> If no, what will the release flow for docker images look like?
    >> 
    >> Thank you for your feedback.
    >> 
    >> Regards,
    >> Eric
    >> 
    > 
    >    ---------------------------------------------------------------------
    >    To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
    >    For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
    > 
    > 
    > 
    > 
    > ---------------------------------------------------------------------
    > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
    > For additional commands, e-mail: common-dev-h...@hadoop.apache.org
    
    


