Hi,

Thanks, Vinay, for your response. I don't need blocks of variable size, but
setting only the block size probably won't help in my case. Let me give an
example to explain what I am trying to do.

Let's say the main file has 12 integers, 1 to 12. The block size is such that
each block holds 3 integers. If I ask HDFS to create the blocks, it would
create 4 blocks: the first would have 1-3, the second 4-6, and so on.
According to my requirement, the data in the main file is partitioned into
3 clusters: (1,2,3,4), (5,6,7,8) and (9,10,11,12). When the blocks are
created, I need data from all partitions to be represented in each block.
So in this case, the first block would have (1,5,9), the second (2,6,10),
etc. In other words, I want to change how the data is allocated to each of
the blocks.
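
To make the difference concrete, here is a minimal sketch in plain Python
(not HDFS code; the function names are hypothetical and only model the
layout). It assumes all partitions have the same size.

```python
def sequential_blocks(data, block_size):
    """Model of the default layout: cut the file into consecutive chunks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def interleaved_blocks(partitions):
    """Model of the desired layout: block i holds the i-th item of every
    partition. Assumes equally sized partitions (zip stops at the shortest)."""
    return [list(group) for group in zip(*partitions)]


data = list(range(1, 13))                         # the integers 1..12
partitions = [data[0:4], data[4:8], data[8:12]]   # the 3 clusters

print(sequential_blocks(data, 3))
# [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
print(interleaved_blocks(partitions))
# [[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]
```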

Is it feasible to change the default block creation policy in the current
implementation?

Regards,
Abhishek Das

On Tue, Feb 17, 2015 at 2:25 AM, Vinayakumar B <vinayakum...@apache.org>
wrote:

> Hi Abhishek,
> Are your partitions all the same size? If yes, then you can set that as the
> block size.
>
> If not, you can use the latest feature: variable block size.
> To verify your use case,
> you can close the current block after each partition's data is written and
> append to a new block for the next partition's data.
> This feature is not yet available in any release. Hope to see it in the
> future 2.7 release. As of now you can verify it in any of the trunk/branch-2
> builds.
>
> Hope this helps.
>
> -Vinay
> On Feb 17, 2015 8:30 AM, "Abhishek Das" <abhishek.b...@gmail.com> wrote:
>
> > Hi,
> >
> > I am new to this group. I have a question regarding block creation in
> > HDFS. By default the file is split into multiple blocks of size equal to
> > the block size. I need to introduce a new block creation policy into the
> > system. In my case the main file is divided into multiple partitions. My
> > goal is to create blocks in which data from each partition of the file is
> > represented. Is it possible to introduce the new policy? If yes, what
> > would be the starting point in the code I should look at?
> >
> > Regards,
> > Abhishek Das
> >
>
