Thanks both for explaining! Snappy is doing fine for me at the moment but I was curious about the other options.
I'll have a look at the parquet tool and see if that can help me a bit as well.

On Wed, 22 Aug 2018 at 08:05, Jörn Franke <jornfra...@gmail.com> wrote:

> No, Parquet and ORC have internal compression, which must be used rather than
> the external compression that you are referring to.
>
> Internal compression can be decompressed in parallel, which is
> significantly faster. Internally, Parquet supports only snappy, gzip, lzo,
> brotli (2.4), lz4 (2.4), zstd (2.4).
>
> On 22. Aug 2018, at 07:33, Tanvi Thacker <tanvithack...@gmail.com> wrote:
>
> Hi Patrick,
>
> *What are other formats supported?*
> - As far as I know, you can set any compression with any format (ORC, Text
> with snappy, gzip, etc.). Are you looking for any specific format or
> compression?
>
> How can I verify a file is compressed and with what algorithm?
> - You may check parquet-tools
> <https://github.com/apache/parquet-mr/tree/master/parquet-tools> to see if
> they provide any meta information about compression.
>
> On another note, if you already have uncompressed data and
> you are creating a table with snappy compression, you need to use
> "CREATE TABLE new_compressed_table AS SELECT * FROM un_compressed_table" in
> order to actually compress the data.
>
> Regards,
> Tanvi Thacker
>
> On Fri, Aug 10, 2018 at 6:30 AM Patrick Duin <patd...@gmail.com> wrote:
>
>> Hi,
>>
>> I have some Hive tables in Parquet format and I am trying to find out how
>> best to enable compression.
>>
>> I've done a bit of searching and the information is a bit scattered, but I
>> found I can use this Hive property to enable compression. It needs to be set
>> before doing an insert.
>>
>> set parquet.compression=SNAPPY;
>>
>> What other formats are supported?
>> How can I verify a file is compressed and with what algorithm?
>>
>> Thanks,
>> Patrick
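
For reference, the steps discussed in this thread could be put together roughly as follows. This is only a sketch: the table names are the placeholders used in the thread, and `STORED AS PARQUET` is my assumption about how the target table should be declared.

```sql
-- Set the Parquet codec for this session; it must be set before the write.
SET parquet.compression=SNAPPY;

-- Rewrite the existing uncompressed data into a new Parquet table.
-- Table names are placeholders taken from the thread.
CREATE TABLE new_compressed_table
STORED AS PARQUET
AS SELECT * FROM un_compressed_table;
```

Data that was already written is not recompressed by changing the property alone, which is why the rewrite into a new table is needed. To check the result, the `meta` command in parquet-tools should, as far as I know, list the codec for each column chunk of a data file.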