Thanks both for explaining!

Snappy is doing fine for me at the moment, but I was curious about the other
options.

I'll have a look at parquet-tools and see if that can help me a bit as
well.



On Wed, 22 Aug 2018 at 08:05, Jörn Franke <jornfra...@gmail.com> wrote:

> No, Parquet and ORC have internal compression, which should be used instead
> of the external compression that you are referring to.
>
> Internal compression can be decompressed in parallel, which is
> significantly faster. Internally, Parquet supports only Snappy, gzip, LZO,
> Brotli (2.4), LZ4 (2.4) and zstd (2.4).
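A minimal HiveQL sketch of declaring the codec as part of the file format itself (internal compression), assuming a Hive version that honors these table properties. The table names `events_parquet` and `events_orc` and their columns are hypothetical. `parquet.compression` is the Parquet table property and `orc.compress` is the ORC one. Which codec names are accepted depends on the Hive, Parquet and ORC versions in use.

```sql
-- Hypothetical Parquet table: the codec is applied inside the Parquet files,
-- so each file records which codec its data was written with.
CREATE TABLE events_parquet (id BIGINT, payload STRING)
STORED AS PARQUET
TBLPROPERTIES ("parquet.compression"="GZIP");

-- Hypothetical ORC table: ORC uses its own property name for the codec.
CREATE TABLE events_orc (id BIGINT, payload STRING)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY");
```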
>
> On 22. Aug 2018, at 07:33, Tanvi Thacker <tanvithack...@gmail.com> wrote:
>
> Hi Patrick,
>
> *What other formats are supported?*
> - As far as I know, you can set any compression with any format (ORC, Text
> with Snappy, gzip, etc.). Are you looking for a specific format or
> compression?
>
> How can I verify a file is compressed and with what algorithm?
> - You could check parquet-tools
> <https://github.com/apache/parquet-mr/tree/master/parquet-tools> to see
> whether it provides meta information about compression.
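From the Hive side, you can at least check which codec a table is *configured* with. This sketch assumes a hypothetical table `events_parquet`. It only reports table metadata and does not inspect the files themselves. Reading the file footers with a tool such as parquet-tools is what confirms how the data was actually written.

```sql
-- Shows the table-level codec property, if one was set (hypothetical table name).
SHOW TBLPROPERTIES events_parquet("parquet.compression");

-- Lists all table parameters plus the "Location:" of the data files,
-- which is the directory whose files a footer-reading tool would inspect.
DESCRIBE FORMATTED events_parquet;
```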
>
> On another note, if you already have uncompressed data, setting Snappy
> compression on a table does not compress the existing files. You need to
> rewrite the data, e.g. with "CREATE TABLE new_compressed_table AS SELECT *
> FROM un_compressed_table", in order to actually compress it.
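A sketch of that rewrite as a CTAS, assuming `un_compressed_table` already exists and that `new_compressed_table` is a new, hypothetical target name. Hive places some restrictions on CTAS targets (for example around partitioning in older versions), so check the documentation for your Hive version.

```sql
-- Reads every row of the uncompressed table and writes it into
-- new Parquet files compressed with Snappy.
CREATE TABLE new_compressed_table
STORED AS PARQUET
TBLPROPERTIES ("parquet.compression"="SNAPPY")
AS SELECT * FROM un_compressed_table;
```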
>
> Regards,
> Tanvi Thacker
>
> On Fri, Aug 10, 2018 at 6:30 AM Patrick Duin <patd...@gmail.com> wrote:
>
>> Hi,
>>
>> I have some Hive tables in Parquet format and I am trying to find out how
>> best to enable compression.
>>
>> I've done a bit of searching and the information is a bit scattered, but I
>> found I can use this Hive property to enable compression. It needs to be
>> set before doing an insert.
>>
>> set parquet.compression=SNAPPY;
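A minimal sketch of the session-level approach, assuming an existing Parquet table. `my_parquet_table` and `staging_table` are hypothetical names. The setting only affects files written after it, which is why it has to come before the insert.

```sql
-- Session-level setting: applies to Parquet files written later in this session.
SET parquet.compression=SNAPPY;

-- Rewrites the table's data, so the newly written files use Snappy.
INSERT OVERWRITE TABLE my_parquet_table
SELECT * FROM staging_table;
```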
>>
>> What other formats are supported?
>> How can I verify a file is compressed and with what algorithm?
>>
>> Thanks,
>> Patrick
>>
>
