You are using gzip, so the files won't be splittable. You may be better off using the snappy codec with sequence files.
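As a rough sketch of what that could look like for an HDFS sink (the agent name "a1", sink name "k1", and path below are just placeholders, not taken from your config; the roll interval of 600s matches the 10-minute roll you mentioned):

# hypothetical agent/sink names for illustration only
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events
# write sequence files compressed with snappy instead of a gzip CompressedStream
a1.sinks.k1.hdfs.fileType = SequenceFile
a1.sinks.k1.hdfs.codeC = snappy
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.batchSize = 100
# roll every 10 minutes, as in your setup
a1.sinks.k1.hdfs.rollInterval = 600

You'd also need the snappy native libraries available on the Flume and Hadoop nodes for this to work.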
On Thu, Jan 30, 2014 at 10:51 AM, Jimmy <jimmyj...@gmail.com> wrote:
> I am running a few tests and would like to confirm whether this is true...
>
> hdfs.codeC = gzip
> hdfs.fileType = CompressedStream
> hdfs.writeFormat = Text
> hdfs.batchSize = 100
>
> Now let's assume I have a large number of transactions and I roll the file
> every 10 minutes.
>
> It seems the tmp file stays at 0 bytes and flushes all at once after 10
> minutes, whereas if I don't use compression, the file grows as data are
> written to HDFS.
>
> Is this correct?
>
> Do you see any drawback in using CompressedStream with very large files?
> In my case a 120MB compressed file (block size) is 10x that uncompressed.