Re: [C++][Parquet] Support different compression algorithms per row group

2024-03-25 Thread Antoine Pitrou

Regardless of whether they have different compression ratios, it doesn't explain why you would want a different compression *algorithm* altogether. The choice of a compression algorithm should basically be driven by two concerns: the acceptable space/time tradeoff (do you want to minimize d

Re: [C++][Parquet] Support different compression algorithms per row group

2024-03-25 Thread Gang Wu

Sometimes rows from different row groups may have different compression ratios when data distribution varies a lot among them. It seems to me that a harder problem is how you would figure out that pattern before the data is written and compressed. If that is not a problem in your case, it would be

Re: [C++][Parquet] Support different compression algorithms per row group

2024-03-23 Thread Aldrin

Hi Andrei, I tried finding more details on block compression in parquet (or compression per data page) and I couldn't find anything to satisfy my curiosity about how it can be used and how it performs. I hate being the person to just say "test it first," so I want to also recommend figuring out

Re: [C++][Parquet] Support different compression algorithms per row group

2024-03-23 Thread Antoine Pitrou

Hello Andrei, On 23/03/2024 at 13:23, Andrei Lazăr wrote: At this very moment, specifying different compression algorithms per column is supported and in my use case it is extremely helpful, as I have some columns (mostly containing floats), for which a compression algorithm like Snappy (or

Re: [C++][Parquet] Support different compression algorithms per row group

2024-03-23 Thread Andrei Lazăr

Hi Aldrin, thanks for taking the time to reply to my email! In my understanding, compression on Parquet files happens on the Data Page level for every column, meaning that even across a row group, there can be multiple units of data compression, and most certainly there are going to be different

Re: [C++][Parquet] Support different compression algorithms per row group

2024-03-22 Thread Aldrin

Hello! I don't do much with compression, so I could be wrong, but I assume a compression algorithm spans the whole column and areas of large variance generally benefit less from the compression, but the encoding still provides benefits across separate areas (e.g. separate row groups). My impress

Re: [C++][Parquet] Support different compression algorithms per row group

2024-03-22 Thread Andrei Lazăr

Hi Gang, Thanks a lot for getting back to me! My use case is relatively simple: I was playing around with some data and I wanted to benchmark different compression algorithms in an effort to speed up data retrieval in a simple Parquet-based database that I am playing around with.

Re: [C++][Parquet] Support different compression algorithms per row group

2024-03-20 Thread Gang Wu

Hi Andrei, What is your use case? IMHO, exposing this kind of configuration will force users to know how the writer will split row groups, which does not look simple to me. Best, Gang On Thu, Mar 21, 2024 at 2:25 AM Andrei Lazăr wrote: > Hi all, > > I would like to propose adding support for wr

[C++][Parquet] Support different compression algorithms per row group

2024-03-20 Thread Andrei Lazăr

Hi all, I would like to propose adding support for writing a Parquet file with different compression algorithms for every row group. In my understanding, the Parquet format allows this; however, it seems to me that there is no way to achieve this from the C++ implementation. Does anyone have any t
