Sorry for the slow response - I've been in bed with a fever the past couple of days and only sporadically on email. I'll respond to a few different points made in the above thread here:
Konst> We keep growing the HDFS umbrella with competing technologies
Konst> (http/web HDFS as an example) within it.

I suppose these are "competing technologies" in the sense that there are multiple options that address the same goal: NFS shared storage, BookKeeper-based storage, and QJM-based storage. On the other hand, they are quite distinct in my view:

- NFS shared storage: this simply re-uses existing code (FileJournalManager) that we already have for the non-HA case. It has several downsides for use in an HA cluster (see the HDFS-3077 design doc for a discussion of them).
- BK storage: this is advantageous for people who already run BookKeeper in their datacenter. For those not already running it, though, I think it adds considerable complexity, and it lacks several advantages of the QJM - again, see the design discussion on HDFS-3077 for details. The tl;dr version is: Hadoop-consistent metrics, security, IPC, and configuration.
- QJM storage: I won't expand on the advantages here; please refer to the design doc.

So, if they fulfill the same goal, why let them coexist in the codebase? I would argue that this is one of those things that often happens in a community-developed project: different users or vendors may have slightly different requirements, and therefore prefer different solutions. Some people already have an HA NFS filer and run all their HA using NFS, so the NFS shared storage seems "safest" to them. Others already run BookKeeper, so they feel most comfortable with that approach (or they want to use BookKeeper for other replicated logs). For me and a lot of our customers, QJM seems to be the best approach.

Rather than duking it out and trying to declare one the "winner", our answer has generally been "make it pluggable, and people can pick the implementation that's right for them". If at some point it turns out that everyone is using the same plug-in, by all means we should consider ejecting the others, e.g. to github. But while there are several active committers who have pledged to maintain a solution, it seems reasonable to keep it as part of HDFS core. In this case, I'm certainly planning on maintaining it, and it seems like others on the list are on board as well.
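To make the pluggability concrete: the shared edits directory is just a URI, and the NN picks a JournalManager implementation based on the URI scheme. A rough hdfs-site.xml sketch of what "picking" looks like (the hostnames and the "mycluster" journal ID below are placeholders for illustration, not from any real deployment):

    <!-- QJM: the qjournal:// scheme selects the quorum journal
         implementation; the value lists the JournalNodes (default
         RPC port 8485) and a journal ID. -->
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
    </property>

    <!-- NFS alternative: a plain file:// URI on the filer mount,
         handled by the existing FileJournalManager. -->
    <!--
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>file:///mnt/filer1/shared-edits</value>
    </property>
    -->

Point the same key at a different URI and you get a different journal implementation, without touching the rest of the NN configuration.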
Konst> Which makes the project harder to stabilize and release.
Konst> Not touching MR/Yarn here.
Konst> My concern is that if we add another 6000 lines of code to Hadoop-2,
Konst> it will take yet another x months for stabilization.

As you pointed out in your first email, the new code is mostly stand-alone. The changes to the core NN are really quite small, and they actually benefit all of the storage implementations (e.g. enhancing the web UI to display non-file JMs). So, if you don't choose to add a qjournal:// URI on your cluster, it won't affect your stability at all.

On the other hand, I happen to believe it is as stable as the rest of Hadoop. Several committers, as well as folks in the community and our internal QA team, have been testing it for a couple of months now, with extensive fault injection, real workloads, etc., and it has held up nicely. Stability is of course subjective in some sense, so if you want to consider it unstable simply because it's new, I'll respect that. But I don't think you can reject new code from a project just because it's new code.

Konst> While it is not clear why people cannot just use NFS filers for shared storage,
Konst> as you originally designed.

This was always seen as a step along the way to the final solution. As several people have said, NFS filers are not available in every organization. It was a useful point along the way, as it provided a working, production-ready HA solution several months back, whereas waiting for a non-NFS solution would have delayed HA quite a bit. For reference, check out Sanjay and Aaron's presentation from Hadoop World last year:
http://www.slideshare.net/cloudera/hadoop-world-2011-hdfs-name-node-high-availablity-aaron-myers-cloudera-sanjay-radia-hortonworks
("Other options to share NN metadata" under "Future Work")

Konst> Don't understand your argument. Else where?
Konst> One way or another users will be talking to Todd.

Not sure what you meant by this. Sure, I wrote a lot of the code, but it's a contribution to the community, and in no way do I have a monopoly on helping people use it (I certainly hope I don't!). The branch to be merged includes full docs on how to get running, and I certainly hope that other Hadoop vendors will ship and recommend this option too. And those who choose to consume Hadoop directly from Apache, rather than through a vendor, should be able to use it equally well, without patching together bits and pieces from a number of repositories.

Thanks
-Todd

On Thu, Sep 27, 2012 at 9:59 AM, Stack <st...@duboce.net> wrote:
> On Thu, Sep 27, 2012 at 2:06 AM, Konstantin Shvachko
> <shv.had...@gmail.com> wrote:
>> The SPOF is in HDFS. This project is about shared storage
>> implementation, that could be replaced by NFS or BookKeeper or
>> something else.
>
> You cannot equate QJM to a solution that requires an NFS filer. A
> filer is just not possible in the deploys I am privy to.
>
>> Same if QJ failed. You go to the creators, which will be somewhat
>> closer than NetApp, because it is still Hadoop.
>>
>
> You seem to be undoing yourself with the above setup Konstantin. At
> our deploy, we can't do NetApp so calling them will never happen. If
> a problem in QJM, it's Apache HDFS, so it'll be fixed by the community
> -- hopefully w/ input by creators -- as any other issue would be fixed
> (or not).
>
> St.Ack

--
Todd Lipcon
Software Engineer, Cloudera