Re: [Discuss] Support an alternative memory layout for varchar/varbinary vectors

2019-07-14 Thread Fan Liya
@Wes McKinney, Thanks a lot for your comments and effort. The JIRA looks good. I will track it. Best, Liya Fan On Fri, Jul 12, 2019 at 10:31 PM Wes McKinney wrote: > hi Liya -- yes, it seems reasonable to defer the conversion from your > pointer-based extension representation to a proper VarCh

Re: [Discuss] Support an alternative memory layout for varchar/varbinary vectors

2019-07-12 Thread Wes McKinney
hi Liya -- yes, it seems reasonable to defer the conversion from your pointer-based extension representation to a proper VarCharVector until you need to send over IPC. Note that Java extension types do not yet have a mechanism to trigger a conversion automatically when the IPC step is reached.

Re: [Discuss] Support an alternative memory layout for varchar/varbinary vectors

2019-07-11 Thread Fan Liya
@Wes McKinney, Thanks a lot for the brainstorming. I think your ideas are reasonable and feasible. About IPC, my idea is that we can send the vector as a PointerStringVector, and receive it as a VarCharVector, so that the overhead of memory compaction can be hidden. What do you think? Best, Liya

Re: [Discuss] Support an alternative memory layout for varchar/varbinary vectors

2019-07-11 Thread Fan Liya
@Uwe L. Korn Thanks a lot for the suggestion. I think this is exactly what we are doing right now. Best, Liya Fan On Thu, Jul 11, 2019 at 9:44 PM Wes McKinney wrote: > hi Liya -- have you thought about implementing this as an > ExtensionType / ExtensionVector? You actually can already do this,

Re: [Discuss] Support an alternative memory layout for varchar/varbinary vectors

2019-07-11 Thread Wes McKinney
hi Liya -- have you thought about implementing this as an ExtensionType / ExtensionVector? You actually can already do this, so if this helps you reference strings stored in some external memory then that seems reasonable. Such a PointerStringVector could have a method that converts it into the Arr
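The conversion method Wes mentions can be sketched as follows. This is a minimal, dependency-free Java sketch, not the Arrow Java API. `PointerStringVector` and `OffsetsAndData` are hypothetical names. Each value is a reference (buffer, start, length) into externally owned memory, and plain `byte[]` arrays stand in for the external segments. `toOffsetsAndData()` performs the compaction copy into the contiguous layout a `VarCharVector` uses: an `int` offsets array with n + 1 entries plus a single data buffer. A real extension vector would write into `VarCharVector` buffers through Arrow's own classes instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Contiguous variable-width layout: value i occupies data[offsets[i], offsets[i + 1]). */
final class OffsetsAndData {
    final int[] offsets;
    final byte[] data;

    OffsetsAndData(int[] offsets, byte[] data) {
        this.offsets = offsets;
        this.data = data;
    }

    /** Decodes value i as UTF-8 from its [start, end) slice of the data buffer. */
    String get(int i) {
        int start = offsets[i];
        int end = offsets[i + 1];
        return new String(data, start, end - start, StandardCharsets.UTF_8);
    }
}

/** Hypothetical pointer-based string vector: each value refers to a slice of some external buffer. */
final class PointerStringVector {
    // One entry per value: the buffer holding the bytes, the start position, and the length.
    private final List<byte[]> buffers = new ArrayList<>();
    private final List<Integer> starts = new ArrayList<>();
    private final List<Integer> lengths = new ArrayList<>();

    /** Records a reference only; no bytes are copied here. */
    void append(byte[] buffer, int start, int length) {
        buffers.add(buffer);
        starts.add(start);
        lengths.add(length);
    }

    int size() {
        return buffers.size();
    }

    /** Compacts the scattered values into one data buffer plus an offsets array. */
    OffsetsAndData toOffsetsAndData() {
        int n = size();
        int[] offsets = new int[n + 1];
        // First pass: prefix sums of the lengths give the offsets (offsets[0] stays 0).
        for (int i = 0; i < n; i++) {
            offsets[i + 1] = offsets[i] + lengths.get(i);
        }
        // Second pass: copy each value's bytes into its slot in the contiguous buffer.
        byte[] data = new byte[offsets[n]];
        for (int i = 0; i < n; i++) {
            System.arraycopy(buffers.get(i), starts.get(i), data, offsets[i], lengths.get(i));
        }
        return new OffsetsAndData(offsets, data);
    }
}
```

For example, appending the slices (6, 5), (0, 5) and (12, 5) of the UTF-8 bytes of "hello arrow world" yields offsets [0, 5, 10, 15] and the data "arrowhelloworld". The second loop is the memory-compaction cost discussed in this thread. Deferring the conversion until IPC changes only where that cost is paid.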

Re: [Discuss] Support an alternative memory layout for varchar/varbinary vectors

2019-07-11 Thread Uwe L. Korn
Hello Liya Fan, here your best approach is to copy into the Arrow format, as you can then use this as the basis for working with both the Arrow-native representation and your internal representation. You will have to use two different offset vectors, as those two will always differ, but in the ca

Re: [Discuss] Support an alternative memory layout for varchar/varbinary vectors

2019-07-11 Thread Antoine Pitrou
Same as Uwe. Regards Antoine. On 11/07/2019 at 14:05, Uwe L. Korn wrote: > Hello Liya, > > I'm quite -1 on this type as Arrow is about efficient columnar structures. We > have opened the standard also to matrix-like types but always keep the > constraint of consecutive memory. Now also a

Re: [Discuss] Support an alternative memory layout for varchar/varbinary vectors

2019-07-11 Thread Fan Liya
Hi Korn, Thanks a lot for your comments. Your comments make sense to me: allowing non-consecutive memory segments would break some good design choices of Arrow. However, there is widespread user demand for non-consecutive memory segments. I am wondering how we can help such

Re: [Discuss] Support an alternative memory layout for varchar/varbinary vectors

2019-07-11 Thread Uwe L. Korn
Hello Liya, I'm quite -1 on this type, as Arrow is about efficient columnar structures. We have also opened the standard to matrix-like types, but we always keep the constraint of consecutive memory. Now also adding types where memory is no longer consecutive but spread across the heap will make the sco

[Discuss] Support an alternative memory layout for varchar/varbinary vectors

2019-07-10 Thread Fan Liya
Hi all, We are thinking of providing varchar/varbinary vectors with a different memory layout, one that is used in a wide range of systems. This layout differs from that of VarCharVector in the following ways: 1. Instead of storing (start offset, end offset), the new layout stores
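For reference, the existing VarCharVector layout that this proposal is compared against keeps one offsets buffer with n + 1 entries and one contiguous data buffer. The start offset of value i is offsets[i], and its end offset is offsets[i + 1], so each value's end doubles as the next value's start. The sketch below models the two buffers with plain Java arrays rather than Arrow's own buffer classes. The class name and sample values are illustrative only.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative model of the two buffers behind a VarCharVector holding ["joe", "", "mark"]. */
final class VarCharLayoutExample {
    // Value i occupies DATA[OFFSETS[i], OFFSETS[i + 1]), so n values need n + 1 offsets.
    static final int[] OFFSETS = {0, 3, 3, 7};
    // All values' bytes stored back to back in one contiguous buffer.
    static final byte[] DATA = "joemark".getBytes(StandardCharsets.UTF_8);

    static int valueCount() {
        return OFFSETS.length - 1;
    }

    static String get(int i) {
        int start = OFFSETS[i];   // start offset of value i
        int end = OFFSETS[i + 1]; // end offset of value i (also the start offset of value i + 1)
        return new String(DATA, start, end - start, StandardCharsets.UTF_8);
    }
}
```

Because neighbouring values share a boundary, the bytes of all values must sit back to back in one buffer. This is the consecutive-memory constraint discussed in the replies.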