Hello Hadoop users,
I have an idea for a small feature for the getmerge tool. I recently
needed the newline option -nl because the files I needed to merge did
not end with a newline.
I was merging all the files from one directory, and unfortunately this
directory also included empty files, which led to multiple consecutive
newlines being appended after some files.
I had to remove them manually afterwards.
In this situation it might be good to have another argument that allows
skipping empty files. I have written down two changes one could try at
the end of this mail. Do you consider this a good improvement to the
command line tools?
Things one could try to implement this feature:
The call to IOUtils.copyBytes(in, out, getConf(), false); doesn't
return the number of bytes copied. Returning it would be convenient, as
one could skip appending the newline when 0 bytes were copied.
Or one could check the file size beforehand.
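
To make the first idea concrete, here is a minimal sketch in plain Java,
without any Hadoop dependencies. It is not the actual getmerge code: the
class GetMergeSketch, the helper copyCounting and the skipEmpty flag are
all hypothetical names I made up for illustration, and plain
java.io streams stand in for the HDFS input files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class GetMergeSketch {

    // Hypothetical helper: copies a stream and, unlike the call quoted
    // above, reports how many bytes were written.
    static long copyCounting(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    // Merges all inputs into out. With addNewline set, a newline follows
    // each input; with skipEmpty also set (assumed new option), inputs
    // that contributed 0 bytes get no newline.
    static void merge(List<InputStream> inputs, OutputStream out,
                      boolean addNewline, boolean skipEmpty) throws IOException {
        for (InputStream in : inputs) {
            long copied = copyCounting(in, out);
            boolean wasEmpty = (copied == 0);
            if (addNewline && !(skipEmpty && wasEmpty)) {
                out.write('\n');
            }
        }
    }

    // Convenience wrapper for trying the sketch with in-memory strings.
    static String mergeStrings(boolean addNewline, boolean skipEmpty, String... contents) {
        List<InputStream> inputs = new ArrayList<>();
        for (String s : contents) {
            inputs.add(new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8)));
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            merge(inputs, out, addNewline, skipEmpty);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Three "files", the middle one empty.
        System.out.print(mergeStrings(true, false, "a", "", "b")); // a, blank line, b
        System.out.print(mergeStrings(true, true, "a", "", "b"));  // a, b
    }
}
```

The second idea would give the same result in this sketch: instead of
counting bytes during the copy, one would look at the length of each
file first (in Hadoop, for example, via FileStatus.getLen()) and skip
both the copy and the newline when it is 0.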
Please let me know if you would consider this useful and whether it is
worth a feature ticket in Jira.
Thank you
Jan