This is being targeted for release 2.3.

The 2.1.x release stream is for stabilization. Once it is stable, 2.2 GA
will be released. The features currently in development, including
HDFS-2832, will go into 2.3.


On Thu, Aug 8, 2013 at 2:04 PM, Matevz Tadel <mta...@ucsd.edu> wrote:

> Thanks Colin, I subscribed to HDFS-2832 so that I can follow the
> development there. I assume this is targeting release 2.1.
>
> Best,
> Matevz
>
>
>
> On 08/08/13 12:10, Colin McCabe wrote:
>
>> There is work underway to decouple the block layer and the namespace
>> layer of HDFS from each other.  Once this is done, block behaviors
>> like the one you describe will be easy to implement.  It's a use case
>> very similar to the hierarchical storage management (HSM) use case
>> that we've discussed before.  Check out HDFS-2832.  Hopefully there
>> will be a design document posted soon.
>>
>> cheers,
>> Colin
>>
>>
>> On Thu, Aug 8, 2013 at 11:52 AM, Matevz Tadel <mta...@ucsd.edu> wrote:
>>
>>> Hi everybody,
>>>
>>> I'm jumping in as Jeff is away due to an unexpected annoyance involving
>>> Californian wildlife.
>>>
>>>
>>> On 8/7/13 7:47 PM, Andrew Wang wrote:
>>>
>>>>
>>>> Blocks are supposed to be an internal abstraction within HDFS, and
>>>> aren't an inherent part of FileSystem (the user-visible class used to
>>>> access all Hadoop filesystems).
>>>>
>>>
>>>
>>> Yes, but it's a really useful abstraction :) Do you really believe the
>>> blocks could be abandoned in the next couple of years? I mean, it's such
>>> a simple and effective solution ...
>>>
>>>
>>>> Is it possible to instead deal with files and offsets? On a read
>>>> failure, you could open a stream to the same file on the backup
>>>> filesystem, seek to the old file position, and retry the read. This
>>>> feels like it's possible via wrapping.
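>>>>
>>>> A minimal sketch of that wrapping idea (the RetryingInput class, its
>>>> backup FileSystem handle, and the path field are illustrative, not an
>>>> existing Hadoop API):
>>>>
>>>>   import java.io.IOException;
>>>>   import org.apache.hadoop.fs.FSDataInputStream;
>>>>   import org.apache.hadoop.fs.FileSystem;
>>>>   import org.apache.hadoop.fs.Path;
>>>>
>>>>   /** Retries a failed read against a backup FileSystem at the same offset. */
>>>>   class RetryingInput {
>>>>     private final FSDataInputStream primary;
>>>>     private final FileSystem backupFs;   // the backup filesystem mentioned above
>>>>     private final Path path;             // same file on both filesystems
>>>>
>>>>     RetryingInput(FSDataInputStream primary, FileSystem backupFs, Path path) {
>>>>       this.primary = primary;
>>>>       this.backupFs = backupFs;
>>>>       this.path = path;
>>>>     }
>>>>
>>>>     int read(byte[] buf, int off, int len) throws IOException {
>>>>       long pos = primary.getPos();              // remember the old file position
>>>>       try {
>>>>         return primary.read(buf, off, len);
>>>>       } catch (IOException e) {
>>>>         try (FSDataInputStream backup = backupFs.open(path)) {
>>>>           backup.seek(pos);                     // seek to the old position
>>>>           int n = backup.read(buf, off, len);   // retry the read on the backup
>>>>           if (n > 0) {
>>>>             primary.seek(pos + n);              // keep the primary stream in sync
>>>>           }
>>>>           return n;
>>>>         }
>>>>       }
>>>>     }
>>>>   }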
>>>>
>>>
>>>
>>> As Jeff briefly mentioned, all USCMS sites export their data via XRootd
>>> (not all of them use HDFS!), and we developed a specialization of the
>>> XRootd caching proxy that can fetch only the requested blocks (the block
>>> size is passed from our input stream class to the XRootd client (via
>>> JNI) and on to the proxy server) and keep them in a local cache. This
>>> allows us to do three things:
>>>
>>> 1. the first time we notice a block is missing, a whole block is fetched
>>> from elsewhere and further access requests from the same process get
>>> fulfilled with zero latency;
>>>
>>> 2. later requests from other processes asking for this block are
>>> fulfilled immediately (well, after the initial 3 retries);
>>>
>>> 3. we have a list of blocks that were fetched and we could (this is what
>>> we want to look into in the near future) re-inject them into HDFS if the
>>> data loss turns out to be permanent (bad disk vs. node that was
>>> offline/overloaded for a while).
>>>
>>> Handling exceptions at the block level thus gives us just what we need.
>>> As the input stream is the place where these errors become known, it is,
>>> I think, also the easiest place to handle them.
>>>
>>> I'll understand if you find opening up these interfaces in the central
>>> repository unacceptable. We can always apply the patch at the OSG level,
>>> where the rpms for all our deployments get built.
>>>
>>> Thanks & Best regards,
>>> Matevz
>>>
>>>
>>>> On Wed, Aug 7, 2013 at 3:29 PM, Jeff Dost <jd...@ucsd.edu
>>>> <mailto:jd...@ucsd.edu>> wrote:
>>>>
>>>>      Thank you for the suggestion, but we don't see how simply wrapping a
>>>>      FileSystem object would be sufficient in our use case.  The reason is
>>>>      that we need to catch and handle read exceptions at the block level.
>>>>      There aren't any public methods available in the high-level FileSystem
>>>>      abstraction layer that would give us the fine-grained control we need
>>>>      over block-level read failures.
>>>>
>>>>      Perhaps if I outline the steps more clearly it will help explain what
>>>>      we are trying to do.  Without our enhancements, suppose a user opens a
>>>>      file stream and starts reading the file from Hadoop.  After some time,
>>>>      at some position into the file, if there happen to be no replicas
>>>>      available for a particular block for whatever reason (datanodes have
>>>>      gone down due to disk issues, etc.), the stream will throw an
>>>>      IOException (BlockMissingException or similar) and the read will fail.
>>>>
>>>>      What we are doing is, rather than letting the stream fail, we have
>>>>      another stream queued up that knows how to fetch, from outside our
>>>>      Hadoop cluster, the blocks that couldn't be retrieved.  So we need to
>>>>      be able to catch the exception at this point, and these externally
>>>>      fetched bytes then get read into the user-supplied read buffer.  Hadoop
>>>>      can then proceed to read the next blocks of the file from the stream.
>>>>
>>>>      So, as you can see, this method of failover on demand allows an input
>>>>      stream to keep reading data without having to start all over again if
>>>>      a failure occurs (assuming the remote bytes were successfully fetched).
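>>>>
>>>>      To make that concrete, here is a rough sketch of the kind of read
>>>>      override we have in mind inside our DFSInputStream subclass (this
>>>>      assumes the access modifiers are opened up as in our patch, and
>>>>      fetchFromFallback() is a hypothetical name for the hook into the
>>>>      external fetcher, not an existing Hadoop API):
>>>>
>>>>        @Override
>>>>        public synchronized int read(byte[] buf, int off, int len)
>>>>            throws IOException {
>>>>          try {
>>>>            return super.read(buf, off, len);   // normal HDFS read path
>>>>          } catch (BlockMissingException e) {   // org.apache.hadoop.hdfs
>>>>            long pos = getPos();                // position of the failed read
>>>>            // Fill the user-supplied buffer from the external source instead.
>>>>            int n = fetchFromFallback(pos, buf, off, len);
>>>>            if (n > 0) {
>>>>              // Resume normal HDFS reads after the externally fetched bytes.
>>>>              seek(pos + n);
>>>>            }
>>>>            return n;
>>>>          }
>>>>        }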
>>>>
>>>>      As a final note I would like to mention that we will be providing our
>>>>      failover module to the Open Science Grid.  Since we hope to provide
>>>>      this as a benefit to all OSG users running at participating T2
>>>>      computing clusters, we will be committed to maintaining this software
>>>>      and any changes to Hadoop needed to make it work.  In other words, we
>>>>      will be willing to maintain any implementation changes that may become
>>>>      necessary as Hadoop internals change in future releases.
>>>>
>>>>      Thanks,
>>>>      Jeff
>>>>
>>>>
>>>>      On 8/7/13 11:30 AM, Andrew Wang wrote:
>>>>
>>>>          I don't think exposing DFSClient and DistributedFileSystem
>>>>          members is necessary to achieve what you're trying to do. We've
>>>>          got wrapper FileSystems like FilterFileSystem and ViewFileSystem
>>>>          which you might be able to use for inspiration, and the HCFS wiki
>>>>          lists some third-party FileSystems that might also be helpful.
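>>>>
>>>>          For instance, a FilterFileSystem-based wrapper could look roughly
>>>>          like the sketch below (FailoverFileSystem and FailoverInputStream
>>>>          are placeholder names for the wrapper and for whatever stream
>>>>          implements the retry logic; they are not existing Hadoop classes):
>>>>
>>>>            import java.io.IOException;
>>>>            import org.apache.hadoop.fs.FSDataInputStream;
>>>>            import org.apache.hadoop.fs.FileSystem;
>>>>            import org.apache.hadoop.fs.FilterFileSystem;
>>>>            import org.apache.hadoop.fs.Path;
>>>>
>>>>            public class FailoverFileSystem extends FilterFileSystem {
>>>>              private final FileSystem backupFs;
>>>>
>>>>              public FailoverFileSystem(FileSystem primary, FileSystem backupFs) {
>>>>                super(primary);   // 'fs' in FilterFileSystem now points at primary
>>>>                this.backupFs = backupFs;
>>>>              }
>>>>
>>>>              @Override
>>>>              public FSDataInputStream open(Path f, int bufferSize) throws IOException {
>>>>                // Wrap the real stream; fall back to backupFs on read failures.
>>>>                // FailoverInputStream must implement Seekable and PositionedReadable
>>>>                // so that FSDataInputStream will accept it.
>>>>                return new FSDataInputStream(
>>>>                    new FailoverInputStream(fs.open(f, bufferSize), backupFs, f));
>>>>              }
>>>>            }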
>>>>
>>>>
>>>>          On Wed, Aug 7, 2013 at 11:11 AM, Joe Bounour <jboun...@ddn.com
>>>>          <mailto:jboun...@ddn.com>> wrote:
>>>>
>>>>              Hello Jeff
>>>>
>>>>              Is it something that could go under the HCFS project?
>>>>              http://wiki.apache.org/hadoop/HCFS
>>>>              (I might be wrong?)
>>>>
>>>>              Joe
>>>>
>>>>
>>>>              On 8/7/13 10:59 AM, "Jeff Dost" <jd...@ucsd.edu
>>>>              <mailto:jd...@ucsd.edu>> wrote:
>>>>
>>>>                  Hello,
>>>>
>>>>                  We work in a software development team at the UCSD CMS
>>>>                  Tier2 Center.  We would like to propose a mechanism that
>>>>                  allows one to subclass DFSInputStream in a clean way from
>>>>                  an external package.  First I'd like to give some
>>>>                  motivation for why, and then I will proceed with the
>>>>                  details.
>>>>
>>>>                  We have a 3-petabyte Hadoop cluster that we maintain for
>>>>                  the LHC experiment at CERN.  There are other T2 centers
>>>>                  worldwide that contain mirrors of the same data we host.
>>>>                  We are working on an extension to Hadoop so that, when
>>>>                  reading a file, if it is found that there are no available
>>>>                  replicas of a block, an external interface is used to
>>>>                  retrieve that block of the file from another data center.
>>>>                  The external interface is necessary because not all T2
>>>>                  centers involved in CMS run a Hadoop cluster as their
>>>>                  storage backend.
>>>>
>>>>                  In order to implement this functionality, we need to
>>>>                  subclass DFSInputStream and override the read method, so
>>>>                  that we can catch IOExceptions that occur on client reads
>>>>                  at the block level.
>>>>
>>>>                  The basic steps required:
>>>>                  1. Invent a new URI scheme for the customized
>>>>                  "FileSystem" in core-site.xml:
>>>>                      <property>
>>>>                        <name>fs.foofs.impl</name>
>>>>                        <value>my.package.FooFileSystem</value>
>>>>                        <description>My Extended FileSystem for foofs:
>>>>                        uris.</description>
>>>>                      </property>
>>>>
>>>>                  2. Write new classes, included in the external package,
>>>>                  that subclass the following:
>>>>                  FooFileSystem subclasses DistributedFileSystem
>>>>                  FooFSClient subclasses DFSClient
>>>>                  FooFSInputStream subclasses DFSInputStream
>>>>
>>>>                  Now any client commands that explicitly use the foofs://
>>>>                  scheme in paths to access the Hadoop cluster can open
>>>>                  files with a customized InputStream that extends the
>>>>                  functionality of the default Hadoop client DFSInputStream.
>>>>                  In order to make this happen for our use case, we had to
>>>>                  change some access modifiers in the DistributedFileSystem,
>>>>                  DFSClient, and DFSInputStream classes provided by Hadoop.
>>>>                  In addition, we had to comment out the check in the
>>>>                  namenode code that only allows URI schemes of the form
>>>>                  "hdfs://".
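>>>>
>>>>                  With the property from step 1 in place, a client picks up
>>>>                  the new scheme through the usual FileSystem machinery,
>>>>                  roughly like this (the path is just an illustration):
>>>>
>>>>                    Configuration conf = new Configuration();
>>>>                    FileSystem fs = FileSystem.get(
>>>>                        URI.create("foofs:///store/user/test.root"), conf);
>>>>                    // fs is now an instance of my.package.FooFileSystem, so
>>>>                    // the stream it opens is backed by our FooFSInputStream.
>>>>                    FSDataInputStream in = fs.open(
>>>>                        new Path("/store/user/test.root"));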
>>>>
>>>>                  Attached is a patch file we apply to Hadoop.  Note that
>>>>                  we derived this patch by modding the Cloudera release
>>>>                  hadoop-2.0.0-cdh4.1.1, which can be found at:
>>>>
>>>>                  http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.1.1.tar.gz
>>>>
>>>>                  We would greatly appreciate any advice on whether or not
>>>>                  this approach sounds reasonable, and whether you would
>>>>                  consider accepting these modifications into the official
>>>>                  Hadoop code base.
>>>>
>>>>                  Thank you,
>>>>                  Jeff, Alja & Matevz
>>>>                  UCSD Physics
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>


-- 
http://hortonworks.com/download/
