Hi Matt,

I'd also recommend implementing this in a somewhat pluggable way -- e.g. a
configuration option for a Deleter class. The default Deleter can be the one
we use today, which just removes the file, and you could plug in a
SecureDeleter. I'd also see some use cases for a Deleter implementation
which doesn't actually delete the block, but instead moves it to a local
trash directory which is purged a day or two later. This sort of policy
can help recover data as a last-ditch effort if there is some kind of
accidental deletion and there aren't snapshots in place.
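[Editor's note: Todd's pluggable-policy idea could be sketched roughly as below. This is a minimal illustration only -- Deleter, DefaultDeleter, and TrashDeleter are hypothetical names, not existing HDFS classes, and a real patch would hook into the datanode's block-deletion path rather than raw java.io.File.]

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical pluggable deletion policy (illustrative name, not an HDFS API). */
interface Deleter {
    void delete(File blockFile) throws IOException;
}

/** Default policy: just unlink the file, as HDFS does today. */
class DefaultDeleter implements Deleter {
    public void delete(File blockFile) throws IOException {
        if (!blockFile.delete()) {
            throw new IOException("Failed to delete " + blockFile);
        }
    }
}

/** Trash-style policy: move the block aside so it can be recovered for a while. */
class TrashDeleter implements Deleter {
    private final File trashDir;

    TrashDeleter(File trashDir) {
        this.trashDir = trashDir;
    }

    public void delete(File blockFile) throws IOException {
        // A background task (not shown) would purge trashDir a day or two later.
        File target = new File(trashDir, blockFile.getName());
        if (!blockFile.renameTo(target)) {
            throw new IOException("Failed to move " + blockFile + " to " + trashDir);
        }
    }
}
```

The configuration would then select which implementation to instantiate, so a SecureDeleter could be dropped in without touching the deletion call sites.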

-Todd

On Thu, Aug 15, 2013 at 11:50 AM, Andrew Wang <andrew.w...@cloudera.com> wrote:

> Hi Matt,
>
> Here are some code pointers:
>
> - When doing a file deletion, the NameNode turns the file into a set of
> blocks that need to be deleted.
> - When datanodes heartbeat in to the NN (see BPServiceActor#offerService),
> the NN replies with blocks to be invalidated (see BlockCommand and
> DatanodeProtocol.DNA_INVALIDATE).
> - The DN processes these invalidates in
> BPServiceActor#processCommandFromActive (look for DNA_INVALIDATE again).
> - The magic lines you're looking for are probably in
> FsDatasetAsyncDiskService#run, since we delete blocks in the background.
>
> Best,
> Andrew
>
>
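[Editor's note: the overwrite step Matt asks about could look something like the sketch below. It is a hypothetical illustration -- SecureErase and overwriteAndDelete are made-up names, not HDFS code -- showing the cyclic ones/zeros/random passes before the existing unlink, as might run from a background deletion service like the one Andrew points to.]

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.security.SecureRandom;
import java.util.Arrays;

/** Hypothetical sketch of a shred-style secure delete (not an HDFS API). */
class SecureErase {
    static void overwriteAndDelete(File blockFile, int passes) throws IOException {
        SecureRandom rng = new SecureRandom();
        byte[] buf = new byte[64 * 1024];
        // "rws" asks for synchronous writes of content and metadata.
        try (RandomAccessFile raf = new RandomAccessFile(blockFile, "rws")) {
            long len = raf.length();
            for (int pass = 0; pass < passes; pass++) {
                // Cycle through all-ones, all-zeros, and random data.
                if (pass % 3 == 0) {
                    Arrays.fill(buf, (byte) 0xFF);
                } else if (pass % 3 == 1) {
                    Arrays.fill(buf, (byte) 0x00);
                } else {
                    rng.nextBytes(buf);
                }
                raf.seek(0);
                long written = 0;
                while (written < len) {
                    int n = (int) Math.min(buf.length, len - written);
                    raf.write(buf, 0, n);
                    written += n;
                }
            }
        }
        // Fall through to the ordinary unlink once the overwrites succeed.
        if (!blockFile.delete()) {
            throw new IOException("Failed to delete " + blockFile);
        }
    }
}
```

Note that overwriting in place only helps on filesystems and devices that actually rewrite the same physical sectors; SSDs and copy-on-write filesystems may retain old data regardless.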
> On Thu, Aug 15, 2013 at 5:31 AM, Matt Fellows <
> matt.fell...@bespokesoftware.com> wrote:
>
> > Hi,
> > I'm looking into writing a patch for HDFS which will provide a new method
> > for securely deleting the contents of a block on all the nodes on which it
> > exists. By securely delete I mean: overwrite with 1's/0's/random data
> > cyclically, such that the data could not be recovered forensically.
> >
> > I'm not currently aware of any existing code or methods which provide
> > this, so I was going to implement it myself.
> >
> > I figured DataNode.java was probably the place to start looking into how
> > this could be done, so I've read the source for it, but it hasn't really
> > enlightened me a great deal.
> >
> > I'm assuming I need to tell the NameNode that a particular block id must
> > be deleted on all DataNodes which hold it; then, as each DataNode calls
> > home, it would be instructed to securely delete the relevant block, and
> > it would oblige.
> >
> > Unfortunately I have no idea where to begin, so I was looking for some
> > pointers.
> >
> > I guess specifically I'd like to know:
> >
> > 1. Where the hdfs CLI commands are implemented
> > 2. How a DataNode identifies a block, and how the NameNode could inform a
> > DataNode to delete a block
> > 3. Where the existing "delete" is implemented, so I can make sure my
> > secure delete makes use of it after successfully blanking the block
> > contents
> > 4. Whether I've got the right idea about this at all
> >
> > Kind regards,
> > Matt Fellows
> >
> > --
> >  First Option Software Ltd
> > Signal House
> > Jacklyns Lane
> > Alresford
> > SO24 9JJ
> > Tel: +44 (0)1962 738232
> > Mob: +44 (0)7710 160458
> > Fax: +44 (0)1962 600112
> > Web: www.bespokesoftware.com
> >
> > ____________________________________________________
> >
> > This is confidential, non-binding and not company endorsed - see full
> > terms at www.fosolutions.co.uk/emailpolicy.html
> >
> > First Option Software Ltd Registered No. 06340261
> > Signal House, Jacklyns Lane, Alresford, Hampshire, SO24 9JJ, U.K.
> > ____________________________________________________
> >
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera
