On Wed, Jul 23, 2014 at 8:15 AM, Arjun <baksh...@mail.uc.edu> wrote:

> Hi,
>
> I want to write a block placement policy that takes the size of the file
> being placed into account. Something like what is done in CoHadoop or BEEMR
> paper. I have the following questions:
>
>
Hadoop uses a stream metaphor.  At the time you're deciding where to place the blocks of a DFSOutputStream, you don't know how many bytes the user code is going to write.  It could be terabytes, or nothing.

You could potentially start placing the later replicas differently, once
the first few blocks have been written.  You would probably need to modify
the BlockPlacementPolicy interface to supply that information.  I could be
wrong, but as far as I can see, there's no way to access it with the
current API.
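A rough, self-contained sketch of the idea in plain Java (this is not the
real BlockPlacementPolicy API; the threshold, node names, and method
signature are all made up).  It only shows the decision itself: place
replicas one way while the file still looks small, and switch strategies
once the bytes written so far cross a threshold:

```java
import java.util.List;

// Hypothetical sketch, not Hadoop's actual placement API.  It illustrates
// switching placement strategy based on how much data has been written so
// far, since the final file size is unknown when the stream is opened.
public class SizeAwarePlacementSketch {
    // Illustrative cutoff; a real policy would make this configurable.
    static final long SMALL_FILE_THRESHOLD = 64L * 1024 * 1024; // 64 MB

    // Choose target nodes for the next block, given bytes written so far.
    // While the file still looks small, co-locate replicas (CoHadoop-style);
    // once it has grown past the threshold, spread replicas across racks.
    static List<String> chooseTargets(long bytesWrittenSoFar,
                                      List<String> colocatedNodes,
                                      List<String> spreadNodes) {
        return bytesWrittenSoFar < SMALL_FILE_THRESHOLD
                ? colocatedNodes
                : spreadNodes;
    }

    public static void main(String[] args) {
        List<String> colocated = List.of("rack1-node1", "rack1-node2", "rack1-node3");
        List<String> spread    = List.of("rack1-node1", "rack2-node1", "rack3-node1");

        // 1 MB written so far: still treated as a small file.
        System.out.println(chooseTargets(1L * 1024 * 1024, colocated, spread));
        // 200 MB written so far: switch to spreading across racks.
        System.out.println(chooseTargets(200L * 1024 * 1024, colocated, spread));
    }
}
```

The point is that the decision can only use bytes written *so far*, which
is exactly the information the current chooseTarget call does not hand you.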

cheers,
Colin



> 1- Is srcPath in chooseTarget the path to the original un-chunked file, or
> it is a path to a single block?
>
> 2- Will a simple new File(srcPath) do?
>
> 3- I've spent time looking at the Hadoop source code. I can't find a way to go
> from srcPath in chooseTarget to a file size. Every function that I think could
> do it, in FSNamesystem, FSDirectory, etc., is either non-public or cannot be
> called from inside the blockmanagement package or the block placement class.
>
> How do I go from srcPath in blockplacement class to size of the file being
> placed?
>
> Thank you,
>
> AB
>