Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-30 Thread Yi Cao
Hi Jorge, I think your previous comments here can probably solve my issue. Could you please provide more insight into how this can be achieved equivalently for C++ objects (shared_ptr) put into an Arrow table? Thanks a lot in advance. “This use-case seems semantically equivalent with storing python

Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-11 Thread Yi Cao
First of all, thank you so much for your input and great insights! An integer-pointer round trip does not seem a reliable approach to me. We experienced subtle UB in some cases before, which is one of the reasons we are looking at Arrow. Regarding Jorge's 4 options, Options 1-3 are not considered due to (de)serial

Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-11 Thread Jorge Cardoso Leitão
AFAIK uintptr_t being internally stored as an integer does not make it equivalent to uint64_t - compilers use the type to set them apart, see the example in [1]. ptr2int2ptr can result in UB in subtle ways, due to how C/C++ are specified and translated to LLVM IR. Storing pointers as arrow intege

Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-10 Thread Andrew Bell
On Thu, Oct 10, 2024 at 4:18 PM Felipe Oliveira Carvalho wrote:
>
> Hi,
>
> Yi Cao's request comes from a misunderstanding of where the performance of
> Arrow comes from.
>
> Arrow arrays follow the SoA paradigm [1]. The moment you start thinking about
> individual objects with an associated ref

Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-10 Thread Felipe Oliveira Carvalho
Hi, Yi Cao's request comes from a misunderstanding of where the performance of Arrow comes from. Arrow arrays follow the SoA paradigm [1]. The moment you start thinking about individual objects with an associated ref-count (std::shared_ptr) is the moment you've given up the SoA approach and you a
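A minimal sketch of the SoA point above, using only the C++ standard library rather than Arrow itself. The `Product` type and the `ProductRows`, `ProductColumns`, `ToColumns`, and `SumPrices` names are hypothetical and chosen for illustration; `std::vector` merely stands in for a contiguous column buffer.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical row type, for illustration only.
struct Product {
  std::string brand;
  double price;
};

// Array-of-structures: every row is a separately allocated, ref-counted object.
using ProductRows = std::vector<std::shared_ptr<Product>>;

// Structure-of-arrays: one contiguous buffer per column, no per-row ownership.
struct ProductColumns {
  std::vector<std::string> brand;
  std::vector<double> price;
};

// Copy each field of each row into its column.
ProductColumns ToColumns(const ProductRows& rows) {
  ProductColumns cols;
  cols.brand.reserve(rows.size());
  cols.price.reserve(rows.size());
  for (const auto& row : rows) {
    assert(row != nullptr);  // this sketch does not model null rows
    cols.brand.push_back(row->brand);
    cols.price.push_back(row->price);
  }
  return cols;
}

// Column layout: a single pass over one contiguous buffer.
double SumPrices(const ProductColumns& cols) {
  double total = 0.0;
  for (double p : cols.price) total += p;
  return total;
}

// Row layout: same result, but each element requires following a pointer.
double SumPrices(const ProductRows& rows) {
  double total = 0.0;
  for (const auto& row : rows) total += row->price;
  return total;
}
```

Both overloads of `SumPrices` return the same value; the difference is only in how memory is laid out and traversed, which is the property columnar kernels rely on.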

Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-10 Thread Aldrin
I'm fairly sure uintptr_t is an integer type for holding a pointer in C++ (docs specifically say "to void" aka `void*`). It should be equivalent to uint64_t on 64-bit systems, but where I agree it is risky is that it is going to be platform dependent and there are likely nuances for certain comp
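A small sketch of one risk with storing a pointer as an integer: the integer carries only the address, not the `shared_ptr` ownership, so the object can be destroyed while the integer is still sitting in a column. This illustrates the lifetime issue only and does not address the pointer-provenance concerns raised elsewhere in the thread. `Product`, `ToInteger`, `FromInteger`, and `IntegerOutlivesObject` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical payload type, for illustration only.
struct Product {
  int id;
};

// Convert the managed object's address to an integer. Only the address is
// copied; the shared_ptr's reference count is not affected.
std::uintptr_t ToInteger(const std::shared_ptr<Product>& p) {
  return reinterpret_cast<std::uintptr_t>(static_cast<void*>(p.get()));
}

// Convert the integer back to a raw pointer. The result may only be
// dereferenced while the original object is still alive.
Product* FromInteger(std::uintptr_t bits) {
  return static_cast<Product*>(reinterpret_cast<void*>(bits));
}

// Returns true if the object was destroyed even though an integer copy of
// its address still exists.
bool IntegerOutlivesObject() {
  auto owner = std::make_shared<Product>(Product{42});
  std::weak_ptr<Product> watcher = owner;

  const std::uintptr_t bits = ToInteger(owner);
  assert(FromInteger(bits) == owner.get());  // round trip while alive
  assert(owner.use_count() == 1);            // the integer holds no reference

  owner.reset();  // last owner gone: the Product is destroyed here
  (void)bits;     // bits now names freed memory and must not be dereferenced
  return watcher.expired();
}
```

`IntegerOutlivesObject()` returns `true`: nothing stored as an integer keeps the object alive, so any scheme built on it needs a separate owner for every pointed-to object.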

Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-10 Thread Jorge Cardoso Leitão
Hi, This use-case seems semantically equivalent with storing python objects in arrow for the purpose of putting them in an arrow table. This can be achieved by some form of pickling or indirection (I recall Polars and others doing one of these). Imo there are different approaches with different t

Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-09 Thread Weston Pace
If your goal is to use Arrow to do the computation then having shared pointers will not help. Arrow's computation kernels (filters, selection, etc.) are designed to be fast because they run on columns of data. If you have a collection of objects (rows) then there isn't going to be anything in Arr

Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-09 Thread Aldrin
Hello! I think the main goal you're trying to achieve is to use Arrow for processing some product details (e.g. brand name) in a tabular format without storing the entirety of product details in the table itself. I would think that you could store all of the product details in Arrow without to

Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-09 Thread Andrew Bell
You could give each product an ID number and use that as a proxy.
On Wed, Oct 9, 2024 at 5:01 PM Yi Cao wrote:
>
> Let's take a simple example. No network connection is involved. Say I can
> have an array table of digital products, which has one column of shared_ptr
> pointing to a product obj
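The ID-as-proxy suggestion could look roughly like the sketch below. It uses plain `std::vector` columns as a stand-in for an Arrow table (in Arrow the `id` column would presumably be an int64 array), and a side map that owns the `shared_ptr`s. All names here (`Product`, `ProductTable`, `ProductRegistry`, `AddProduct`, `FilterIdsByBrand`, `Lookup`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical heap-allocated object that should not be copied into the table.
struct Product {
  std::string brand;
  std::string description;
};

// Stand-in for a table: only plain values, one vector per column.
struct ProductTable {
  std::vector<std::int64_t> id;
  std::vector<std::string> brand;
};

// Side registry that owns the objects; the table refers to them by ID.
using ProductRegistry =
    std::unordered_map<std::int64_t, std::shared_ptr<Product>>;

// Append one row to the table and register the object under the same ID.
void AddProduct(ProductTable& table, ProductRegistry& registry,
                std::int64_t id, std::shared_ptr<Product> product) {
  assert(product != nullptr);
  table.id.push_back(id);
  table.brand.push_back(product->brand);
  registry.emplace(id, std::move(product));
}

// Columnar filter: scan the brand column, return the IDs of matching rows.
std::vector<std::int64_t> FilterIdsByBrand(const ProductTable& table,
                                           const std::string& brand) {
  std::vector<std::int64_t> ids;
  for (std::size_t i = 0; i < table.brand.size(); ++i) {
    if (table.brand[i] == brand) ids.push_back(table.id[i]);
  }
  return ids;
}

// Resolve an ID back to its object; returns nullptr for an unknown ID.
std::shared_ptr<Product> Lookup(const ProductRegistry& registry,
                                std::int64_t id) {
  auto it = registry.find(id);
  if (it == registry.end()) return nullptr;
  return it->second;
}
```

Filtering runs over plain columns, and the objects are fetched from the registry only for the rows that survive the filter. The registry keeps every object alive for as long as it exists, so lifetimes stay explicit instead of being hidden inside a column.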

Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-09 Thread Yi Cao
Let's take a simple example. No network connection is involved. Say I can have an array table of digital products, which has one column of shared_ptr pointing to a product object allocated on heap. I would like to do filtering on the column "brand" using the value "Samsung". Therefore I can get al

Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-09 Thread Andrew Bell
On Wed, Oct 9, 2024, 12:27 PM Yi Cao wrote:
> If I place these shared ptrs in a vector, how can I make this vector saved
> in Arrow table as a column? Is it possible?
>
What do you mean by "saved"? I don't understand the point of placing shared pointers in an arrow array. It's essentially equiv

Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-09 Thread Yi Cao
If I place these shared ptrs in a vector, how can I make this vector saved in Arrow table as a column? Is it possible? On Wed, 9 Oct 2024 at 16:59, Andrew Bell wrote: > On Wed, Oct 9, 2024 at 11:41 AM Yi Cao wrote: > > > > > > Hi, > > > I want to store pointers to avoid copy of large amount of

Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-09 Thread Yi Cao
Can you elaborate more about this? How can I use shared_ptr as a buffer in array?
On Wed, 9 Oct 2024 at 16:50, Felipe Oliveira Carvalho wrote:
> You would have to use a std::shared_ptr as a buffer in one of the
> array layouts in a manner that’s compatible with the type.
>
> On Wed, 9 Oct 2024 a

Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-09 Thread Andrew Bell
On Wed, Oct 9, 2024 at 11:41 AM Yi Cao wrote:
>
> Hi,
> I want to store pointers to avoid copy of large amount of data. And then I
> can pass such table and extract pointers from the column and access object it
> points to.
Is there some reason not to place your shared_ptr's in a C++ container

Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-09 Thread Felipe Oliveira Carvalho
You would have to use a std::shared_ptr as a buffer in one of the array layouts in a manner that’s compatible with the type.
On Wed, 9 Oct 2024 at 12:41 Yi Cao wrote:
> Hi,
> I want to store pointers to avoid copy of large amount of data. And then I
> can pass such table and extract pointers fro

Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-09 Thread Yi Cao
Hi,
I want to store pointers to avoid copy of large amount of data. And then I can pass such table and extract pointers from the column and access object it points to.
Thanks
On Wed, 9 Oct 2024 at 13:14, Xiufeng Huang wrote:
> I think arrow structures are supposed to hold data. Why would you wa

Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-09 Thread Xiufeng Huang
I think arrow structures are supposed to hold data. Why would you want to store pointers in arrow structures anyway?
On Wed, Oct 9, 2024 at 3:29 PM Yi Cao wrote:
> Hi Arrow community,
> Need some advice here!
>
> Our C++ application processes tabular data and Apache Arrow looks
> promising in o

[DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-09 Thread Yi Cao
Hi Arrow community, Need some advice here! Our C++ application processes tabular data and Apache Arrow looks promising in our case. I am trying to implement the scenario below in C++ Arrow, but I cannot find a solution or a similar example. Could anyone please share your thoughts? Say here is a tabl