Sorry for the slow response - I've been in bed with a fever the past couple of days and only sporadically on email. I'll respond to a few different points made in the above thread here:
Konst> We keep growing the HDFS umbrella with competing technologies
Konst> (http/web HDFS as an example) within it.

I suppose these are "competing technologies" in the sense that there are multiple options that address the same goal: NFS shared storage, BookKeeper-based storage, and QJM-based storage. On the other hand, they are quite distinct in my view:

- NFS shared storage: this simply re-uses existing code (FileJournalManager) that we already have for the non-HA case. It has several downsides for use in an HA cluster (see the HDFS-3077 design doc for a discussion of them).
- BK storage: this is advantageous for people who already run BookKeeper in their datacenter. For those not already running it, though, I think it adds considerable complexity, and it lacks several advantages of the QJM - again, see the design discussion on HDFS-3077 for details. The tl;dr version is: Hadoop-consistent metrics, security, IPC, and configuration.
- QJM storage: I won't expand on the advantages here; please refer to the design doc.

So, if they fulfill the same goal, why let them coexist in the codebase? I would argue that this is one of those things that often happens in a community-developed project: different users or vendors may have slightly different requirements, and therefore prefer different solutions. Some people already have an HA NFS filer and run all their HA using NFS, so the NFS shared storage seems "safest" to them. Others already run BookKeeper, so they feel most comfortable with that approach (or they want to use BookKeeper for other replicated logs). For me and a lot of our customers, QJM seems to be the best approach.

Rather than duking it out and trying to declare one the "winner", our answer has generally been "make it pluggable, and people can pick the implementation that's right for them". If at some point it turns out that everyone is using the same plug-in, by all means we should consider ejecting the others, e.g. to github. But while there are several active committers who have pledged to maintain a solution, it seems reasonable to keep it as part of HDFS core. In this case, I'm certainly planning on maintaining it, and it seems like others on the list are on board as well.
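To make the pluggability concrete: the shared edits directory is just a URI, and the NN picks a JournalManager implementation based on the URI scheme. A rough hdfs-site.xml sketch of what "picking" looks like (the hostnames and the "mycluster" journal ID below are placeholders for illustration, not from any real deployment):

    <!-- QJM: the qjournal:// scheme selects the quorum journal
         implementation; the value lists the JournalNodes (default
         RPC port 8485) and a journal ID. -->
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
    </property>

    <!-- NFS alternative: a plain file:// URI on the filer mount,
         handled by the existing FileJournalManager. -->
    <!--
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>file:///mnt/filer1/shared-edits</value>
    </property>
    -->

Point the same key at a different URI and you get a different journal implementation, without touching the rest of the NN configuration.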
Konst> Which makes the project harder to stabilize and release.
Konst> Not touching MR/Yarn here.
Konst> My concern is that if we add another 6000 lines of code to Hadoop-2,
Konst> it will take yet another x months for stabilization.

As you pointed out in your first email, the new code is mostly stand-alone. The changes to the core NN are really quite small, and they actually benefit all of the storage implementations (e.g. enhancing the web UI to display non-file JMs). So, if you don't choose to add a qjournal:// URI on your cluster, it won't affect your stability at all.

On the other hand, I happen to believe it is as stable as the rest of Hadoop. Several committers, as well as folks in the community and our internal QA team, have been testing it for a couple of months now, with extensive fault injection, real workloads, etc., and it has held up nicely. Stability is of course subjective in some sense, so if you want to consider it unstable simply because it's new, I'll respect that. But I don't think you can reject new code from a project just because it's new code.

Konst> While it is not clear why people cannot just use NFS filers for shared storage,
Konst> as you originally designed.

This was always seen as a step along the way to the final solution. As several people have said, NFS filers are not available in every organization. It was a useful point along the way, as it provided a working, production-ready HA solution several months back, whereas waiting for a non-NFS solution would have delayed HA quite a bit. For reference, check out Sanjay and Aaron's presentation from Hadoop World last year:
http://www.slideshare.net/cloudera/hadoop-world-2011-hdfs-name-node-high-availablity-aaron-myers-cloudera-sanjay-radia-hortonworks
("Other options to share NN metadata" under "Future Work")

Konst> Don't understand your argument. Else where?
Konst> One way or another users will be talking to Todd.

Not sure what you meant by this. Sure, I wrote a lot of the code, but it's a contribution to the community, and in no way do I have a monopoly on helping people use it (I certainly hope I don't!). The branch to be merged includes full docs on how to get running, and I certainly hope that other Hadoop vendors will ship and recommend this option too. And those who choose to consume Hadoop directly from Apache, rather than through a vendor, should be able to use it equally well, without patching together bits and pieces from a number of repositories.

Thanks
-Todd

On Thu, Sep 27, 2012 at 9:59 AM, Stack <st...@duboce.net> wrote:
> On Thu, Sep 27, 2012 at 2:06 AM, Konstantin Shvachko
> <shv.had...@gmail.com> wrote:
>> The SPOF is in HDFS. This project is about shared storage
>> implementation, that could be replaced by NFS or BookKeeper or
>> something else.
>
> You cannot equate QJM to a solution that requires an NFS filer. A
> filer is just not possible in the deploys I am privy to.
>
>> Same if QJ failed. You go to the creators, which will be somewhat
>> closer than NetApp, because it is still Hadoop.
>>
>
> You seem to be undoing yourself with the above setup Konstantin. At
> our deploy, we can't do NetApp so calling them will never happen. If
> a problem in QJM, it's Apache HDFS, so it'll be fixed by the community
> -- hopefully w/ input by creators -- as any other issue would be fixed
> (or not).
>
> St.Ack

--
Todd Lipcon
Software Engineer, Cloudera