Last year we discussed removing support for things like Hadoop 2.5 and
earlier. It was deprecated in Spark 2.1.0. I'd like to go ahead with this,
so I'm checking whether anyone has strong feelings about it.

The original rationale for separate Hadoop profiles was bridging the
significant difference between Hadoop 1 and 2, and the moderate differences
between the 2.0 alpha, 2.1 beta, and 2.2 final releases. 2.2 is really the
"stable" Hadoop 2, and releases from there to current are, from Spark's
perspective, comparatively very similar. We nevertheless continued to add a
separate build profile for every minor release, which no longer serves much
purpose.

The argument here is mostly that it will simplify the code a little (less
reflection, fewer profiles) and simplify the build -- we now have 6 profiles
x 2 build systems x 4 major branches in Jenkins, whereas master could go
down to 2 profiles.
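
Concretely, that's the difference between Jenkins covering one build per
Hadoop profile versus just the newer ones -- something like the commands
below (profile names are illustrative, assuming the profiles in the current
build):

  # one of several per-Hadoop-version combinations tested today:
  ./build/mvn -Phadoop-2.4 -Pyarn -DskipTests clean package

  # after dropping pre-2.6 support, roughly just these two:
  ./build/mvn -Phadoop-2.6 -Pyarn -DskipTests clean package
  ./build/mvn -Phadoop-2.7 -Pyarn -DskipTests clean package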

Realistically, I don't know how much we'd do to support Hadoop versions
before 2.6 anyway. Any distro user is long since on 2.6+.

Would this cause anyone significant pain? If so, let's talk about when
removing it would become realistic, and what needs to change to get there.
