Thanks for responding, Harsh.

I agree. Hadoop Common does a good job of maintaining a stable and public
FileSystem and FileContext API. The advantage of maintaining client libraries
outside of Hadoop Common is that the release owner of the library has much more
autonomy and agility in maintaining it. From the glusterfs plugin perspective,
I concur with this. In contrast, if my library were managed inside of Hadoop
Common, I'd have to spend the time to earn committer status to have an
equivalent amount of autonomy and agility, which is overkill for someone who
just wants to maintain 400 lines of code.

I ruminated a bit about one con, which is that because the plugin isn't
shipped with Hadoop Common, it might be harder for the Hadoop user community
to find out about it and obtain it. However, if you consider the LZO codec,
the fact that it's not bundled certainly doesn't hamper its adoption.

You mentioned testing. I don't think regression across Hadoop releases is as
big an issue, because (based on my understanding) you really just have two
FileSystem interfaces (abstract classes) to worry about with respect to
compliance, namely the FileSystem interface in Hadoop 1.0 and the FileSystem
interface in Hadoop 2.0. However, this is a broader topic that I also want to
discuss, so I'll tee it up in a separate thread.

Regards
Steve Watt


----- Original Message -----
From: "Harsh J" <ha...@cloudera.com>
To: common-dev@hadoop.apache.org
Sent: Thursday, May 23, 2013 1:37:30 PM
Subject: Re: [DISCUSS] - Committing client code to 3rd Party FileSystems within 
Hadoop Common

I think we do a fairly good job of maintaining a stable and public FileSystem
and FileContext API for third-party plugins to exist outside of Apache
Hadoop but still be able to work well across versions.

The question of testing does come up, though, specifically that of testing
against trunk to catch regressions across various implementations, but it'd
be a lot of work for us to also maintain glusterfs dependencies and
mechanisms as part of trunk.

We do provide trunk build snapshot artifacts publicly for downstream
projects to test against, which I think may help cover the continuous
testing concerns, if there are those.

Right now, I don't think the S3 FS we maintain really works all that well.
I also recall, per recent conversations on the lists, that AMZN has started
shipping their own library for a better implementation rather than
perfecting the implementation we have here (correct me if I am wrong, but I
think the changes were not all contributed back). I see some work going on
for OpenStack's Swift, for which I think Steve also raised a similar
discussion here: http://search-hadoop.com/m/W1S5h2SrxlG, but I don't recall
if the conversation proceeded at the time.

What's your perspective as the releaser though? Would you not find
maintaining this outside easier, especially in terms of maintaining your
code for quicker releases, for both bug fixes and features - also given
that you can CI it against Apache Hadoop trunk at the same time?


On Thu, May 23, 2013 at 11:47 PM, Stephen Watt <sw...@redhat.com> wrote:

> (Resending - I think the first time I sent this out it got lost within all
> the ByLaws voting)
>
> Hi Folks
>
> My name is Steve Watt and I am presently working on enabling glusterfs to
> be used as a Hadoop FileSystem. Most of the work thus far has involved
> developing a Hadoop FileSystem plugin for glusterfs. I'm getting to the
> point where the plugin is becoming stable and I've been trying to
> understand where the right place is to host/manage/version it.
>
> Steve Loughran was kind enough to point out a few past threads in the
> community (such as
> http://lucene.472066.n3.nabble.com/Need-to-add-fs-shim-to-use-QFS-td4012118.html)
> that show a project disposition to move away from Hadoop Common containing
> client code (plugins) for 3rd party FileSystems. This makes sense and
> allows the filesystem plugin developer more autonomy as well as reduces
> Hadoop Common's dependence on 3rd Party libraries.
>
> Before I embark down that path, can the PMC/Committers verify that the
> preference is still to have client code for 3rd Party FileSystems hosted
> and managed outside of Hadoop Common?
>
> Regards
> Steve Watt
>



-- 
Harsh J
