It has to do with the HDFS block size.
I had many small files and the performance became much better when I
merged them.
The default block size is 64 MB, so merge your files into chunks of <= 64 MB (what I did
and recommend),
or reconfigure your Hadoop:
<property>
  <name>dfs.block.size</name>
  <value>67108864</value>
  <description>The default block size for new files.</description>
</property>
To merge, do something like
cat * | rotatelogs ./merged/m 64M
and it will merge and chop the data for you (rotatelogs ships with Apache httpd).
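If rotatelogs isn't handy, here's a minimal sketch of the same merge-and-chop step using GNU split instead; the demo file names and the tiny 150-byte chunk size are just for illustration (in real use you'd pass -b 64m to match the 64 MB block size):

```shell
#!/bin/sh
# Sketch: merge many small files into fixed-size chunks with split.
# Demo inputs (hypothetical names); real use: cat * | split -b 64m - merged/m
mkdir -p demo merged
printf 'a%.0s' $(seq 1 100) > demo/part1   # 100-byte file
printf 'b%.0s' $(seq 1 100) > demo/part2   # 100-byte file
# Concatenate everything and cut the stream into 150-byte pieces,
# written as merged/maa, merged/mab, ...
cat demo/part1 demo/part2 | split -b 150 - merged/m
ls merged/
```

split writes sequentially named output files (maa, mab, ...), so each chunk except possibly the last comes out at exactly the requested size, just as rotatelogs caps each output file at 64M.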
yoav.morag wrote:
hi all -
can anyone comment on the performance cost of merging many small files into
an increasingly large MapFile? Will that cost depend on the size of
the larger MapFile (since I have to rewrite it), or is there a built-in
strategy to split it into smaller parts, affecting only those which were
touched?
thanks -
Yoav.