Re: Compression in Arrow - Question

2020-08-30 Thread Micah Kornfield
The data is often (index: time, value: float) or (index: float (length measure), value: float), but not always: the value could be one of int(8,16,32,64), float(32,64), string, vector(float32/64), etc. Hence why I'm liking Arrow as the standard 'format' for this da

RE: Compression in Arrow - Question

2020-08-30 Thread mark
be safely encoded within.

-----Original Message-----
From: Micah Kornfield
Sent: Sunday, August 30, 2020 6:20 PM
To: Wes McKinney
Cc: dev
Subject: Re: Compression in Arrow - Question

Agreed, I think it would be useful to make sure the "compute" interfaces have the right hooks to sup

Re: Compression in Arrow - Question

2020-08-30 Thread Micah Kornfield
Agreed, I think it would be useful to make sure the "compute" interfaces have the right hooks to support alternate encodings.

On Sunday, August 30, 2020, Wes McKinney wrote:
> That said, there is nothing preventing the development of programming interfaces for compressed / encoded data right n

Re: Compression in Arrow - Question

2020-08-30 Thread Wes McKinney
That said, there is nothing preventing the development of programming interfaces for compressed / encoded data right now. When it comes to transporting such data, that's when we will have to decide on what to support and what new metadata structures are required. For example, we could add RLE to C
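To make "programming interfaces for compressed / encoded data" concrete, here is a minimal run-length encoding sketch in Python that supports random access without decoding the whole array. The layout (a list of run values plus a list of cumulative run ends) and the helper names `rle_encode`, `rle_decode`, and `rle_get` are assumptions for this example, not an existing Arrow API or a proposal from the thread.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def rle_encode(data: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Collapse consecutive equal values into (values, run_ends).

    run_ends[i] is the exclusive logical end of run i, so run i covers
    positions [run_ends[i - 1], run_ends[i]) (with run_ends[-1] = 0 implied).
    """
    values: List[float] = []
    run_ends: List[int] = []
    for i, x in enumerate(data):
        if values and values[-1] == x:
            run_ends[-1] = i + 1  # extend the current run
        else:
            values.append(x)      # start a new run of length 1
            run_ends.append(i + 1)
    return values, run_ends


def rle_decode(values: List[float], run_ends: List[int]) -> List[float]:
    """Expand runs back into a plain list of values."""
    out: List[float] = []
    start = 0
    for v, end in zip(values, run_ends):
        out.extend([v] * (end - start))
        start = end
    return out


def rle_get(values: List[float], run_ends: List[int], i: int) -> float:
    """Random access without decoding: binary-search the run containing i."""
    length = run_ends[-1] if run_ends else 0
    if not 0 <= i < length:
        raise IndexError(i)
    return values[bisect_right(run_ends, i)]
```

For example, `rle_encode([1.0, 1.0, 1.0, 5.0, 5.0, 2.0])` yields `([1.0, 5.0, 2.0], [3, 5, 6])`, and `rle_get` at index 4 returns `5.0` after a binary search over three run ends rather than a scan over six values.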

Re: Compression in Arrow - Question

2020-08-29 Thread Micah Kornfield
Hi Mark,

See the most recent previous discussion about alternate encodings [1]. This is something that should be added in the long run; I'd personally prefer to start with simpler encodings. I don't think we should add anything more with regard to compression/encoding until at least 3 languages su

Compression in Arrow - Question

2020-08-29 Thread mark
I was looking at compression in Arrow and had a couple of questions. If I've understood correctly, compression is currently only used 'in flight', in either IPC or Arrow Flight, using block compression, and the data is still decoded into RAM at the destination in full array form. Is this correct? Given that
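For reference, the Arrow columnar format specification describes IPC body compression as per-buffer: each buffer is compressed on its own (LZ4_FRAME or ZSTD) and prefixed with its uncompressed length as a little-endian int64, where -1 means the bytes that follow are not compressed. The Python sketch below mimics that framing to show why the receiver ends up with fully materialized arrays. It substitutes the standard-library `zlib` codec, which is not one of Arrow's IPC codecs, and `compress_buffer` / `decompress_buffer` are hypothetical helpers.

```python
import struct
import zlib
from array import array

UNCOMPRESSED = -1  # length prefix meaning "body stored as-is"


def compress_buffer(raw: bytes) -> bytes:
    """Sender side: compress one buffer, prefixed by its uncompressed length."""
    packed = zlib.compress(raw)
    if len(packed) >= len(raw):
        # Compression did not help: send the raw bytes, flagged with -1.
        return struct.pack("<q", UNCOMPRESSED) + raw
    return struct.pack("<q", len(raw)) + packed


def decompress_buffer(frame: bytes) -> bytes:
    """Receiver side: the whole buffer is restored before any value is read."""
    (length,) = struct.unpack_from("<q", frame, 0)
    body = frame[8:]
    if length == UNCOMPRESSED:
        return body
    raw = zlib.decompress(body)
    if len(raw) != length:
        raise ValueError("length prefix does not match decompressed size")
    return raw


# Sender: a float64 column with long repeated stretches compresses well.
column = array("d", [1.5] * 1000 + [2.5] * 1000)
wire = compress_buffer(column.tobytes())

# Receiver: decompress fully, then reinterpret the bytes as float64 values.
received = array("d")
received.frombytes(decompress_buffer(wire))
```

The savings exist only on the wire: `wire` is much smaller than the 16,000-byte column, but `received` is a fully decoded buffer of the original size, which is the in-memory behavior the question describes.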