Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-09 Thread Yi Cao
Can you elaborate on this? How can I use a shared_ptr as a buffer in an array? On Wed, 9 Oct 2024 at 16:50, Felipe Oliveira Carvalho wrote: > You would have to use a std::shared_ptr as a buffer in one of the > array layouts in a manner that’s compatible with the type. > > On Wed, 9 Oct 2024 a

Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-09 Thread Yi Cao
If I place these shared ptrs in a vector, how can I make this vector saved in Arrow table as a column? Is it possible? On Wed, 9 Oct 2024 at 16:59, Andrew Bell wrote: > On Wed, Oct 9, 2024 at 11:41 AM Yi Cao wrote: > > > > > > Hi, > > > I want to store pointers to avoid copy of large amount of

Extract objects from CompressedOutputStream

2024-10-09 Thread Robert McLeod
I am trying to write multiple tables/tensors into a single stream/file. Writing is straightforward, and I can read everything back out, but every attempt I have made to pick an individual element out of a compressed stream has failed. For example, I would like to extract only Tensor #1 from the stream. I p

Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-09 Thread Aldrin
Hello! I think the main goal you're trying to achieve is to use Arrow for processing some product details (e.g. brand name) in a tabular format without storing the entirety of product details in the table itself. I would think that you could store all of the product details in Arrow without to

Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-09 Thread Felipe Oliveira Carvalho
You would have to use a std::shared_ptr as a buffer in one of the array layouts in a manner that’s compatible with the type. On Wed, 9 Oct 2024 at 12:41 Yi Cao wrote: > Hi, > I want to store pointers to avoid copy of large amount of data. And then I > can pass such table and extract pointers fro

Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-09 Thread Yi Cao
Let's take a simple example. No network connection is involved. Say I have an Arrow table of digital products with one column of shared_ptr, each pointing to a product object allocated on the heap. I would like to filter on the column "brand" using the value "Samsung". Therefore I can get al

Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-09 Thread Andrew Bell
On Wed, Oct 9, 2024, 12:27 PM Yi Cao wrote: > If I place these shared ptrs in a vector, how can I make this vector saved > in Arrow table as a column? Is it possible? > What do you mean by "saved"? I don't understand the point of placing shared pointers in an arrow array. It's essentially equiv

Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-09 Thread Andrew Bell
You could give each product an ID number and use that as a proxy. On Wed, Oct 9, 2024 at 5:01 PM Yi Cao wrote: > > Let's take a simple example. No network connection is involved. Say I can > have an array table of digital products, which has one column of shared_ptr > pointing to a product obj
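The ID-proxy suggestion can be sketched roughly as follows. This is a minimal illustration, not code from the thread: `Product`, `ProductRegistry`, `ProductColumns`, `FilterIdsByBrand`, and `Resolve` are hypothetical names, and plain `std::vector` columns stand in for Arrow arrays so the example stays self-contained. In a real Arrow table the `ids` and `brands` vectors would presumably be an int64 column and a string column.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical heavy object that should not be copied into the table.
struct Product {
    std::string brand;
    std::string description;  // stands in for the large payload
};

// Side registry owned by the application: ID -> shared_ptr<Product>.
using ProductRegistry =
    std::unordered_map<std::int64_t, std::shared_ptr<Product>>;

// Assumption: std::vector columns stand in for Arrow arrays.
// `ids` plays the role of an int64 column, `brands` of a string column.
struct ProductColumns {
    std::vector<std::int64_t> ids;
    std::vector<std::string> brands;
};

// Return the IDs of all rows whose brand equals `brand`.
std::vector<std::int64_t> FilterIdsByBrand(const ProductColumns& cols,
                                           const std::string& brand) {
    assert(cols.ids.size() == cols.brands.size());
    std::vector<std::int64_t> out;
    for (std::size_t i = 0; i < cols.ids.size(); ++i) {
        if (cols.brands[i] == brand) out.push_back(cols.ids[i]);
    }
    return out;
}

// Resolve IDs back to the shared objects; no Product is copied.
std::vector<std::shared_ptr<Product>> Resolve(
    const ProductRegistry& registry, const std::vector<std::int64_t>& ids) {
    std::vector<std::shared_ptr<Product>> out;
    for (std::int64_t id : ids) {
        auto it = registry.find(id);
        if (it != registry.end()) out.push_back(it->second);
    }
    return out;
}
```

The table then carries only small, copyable values (the ID and the attributes used for filtering), while ownership of the large objects stays in the registry outside the table.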

Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-09 Thread Yi Cao
Hi, I want to store pointers to avoid copying a large amount of data. I can then pass such a table around, extract the pointers from the column, and access the objects they point to. Thanks On Wed, 9 Oct 2024 at 13:14, Xiufeng Huang wrote: > I think arrow structures are supposed to hold data. Why would you wa

Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-09 Thread Andrew Bell
On Wed, Oct 9, 2024 at 11:41 AM Yi Cao wrote: > > Hi, > I want to store pointers to avoid copy of large amount of data. And then I > can pass such table and extract pointers from the column and access object it > points to. Is there some reason not to place your shared_ptr's in a C++ container

Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-09 Thread Weston Pace
If your goal is to use Arrow to do the computation then having shared pointers will not help. Arrow's computation kernels (filters, selection, etc.) are designed to be fast because they run on columns of data. If you have a collection of objects (rows) then there isn't going to be anything in Arr

[DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-09 Thread Yi Cao
Hi Arrow community, Need some advice here! Our C++ application processes tabular data, and Apache Arrow looks promising for our case. I am trying to implement the scenario below with Arrow C++ but cannot find a solution or a similar example. Could anyone please share your thoughts? Say here is a tabl

PyArrow <-> Pandas Timestamp Conversion Error

2024-10-09 Thread Karthik Deivasigamani via user
Hi, I have a simple use case: merging data from multiple Parquet files into a single file. Usually I'm dealing with 50 files of size 100k and trying to form a single Parquet file. The code looks something like this: dfs = [] full_schema = None for s3_url in s3_urls: table = ds.dataset(s3_url

Re: [DISCUSS][C++] Store C++ shared_ptr in arrow table

2024-10-09 Thread Xiufeng Huang
I think Arrow structures are supposed to hold data. Why would you want to store pointers in Arrow structures anyway? On Wed, Oct 9, 2024 at 3:29 PM Yi Cao wrote: > Hi Arrow community, > Need some advice here! > > Our C++ application processes tabular data and Apache Arrow looks > promising in o

Re: Extract objects from CompressedOutputStream

2024-10-09 Thread Aldrin
I could be wrong, but I think zstd naively (or by default) requires the whole stream to be decompressed before you can access any data within it (it is not "splittable" and does not support random access). There are ways to provide this capability by essentially compressing in segments. The best
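The idea of compressing in segments can be sketched roughly as follows. This is an illustration under stated assumptions, not the approach from the thread: a toy run-length codec (`RleCompress` / `RleDecompress`) stands in for a real compressor such as zstd, and `SegmentedBlob`, `WriteSegments`, and `ReadSegment` are hypothetical names. The point is only that each object is compressed independently and an (offset, length) index is kept, so one object can be decoded without touching the others.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy run-length codec standing in for a real compressor such as zstd
// (assumption for illustration; this is not the zstd API).
std::string RleCompress(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == in[i] && run < 255) ++run;
        out.push_back(static_cast<char>(static_cast<unsigned char>(run)));
        out.push_back(in[i]);
        i += run;
    }
    return out;
}

std::string RleDecompress(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i + 1 < in.size(); i += 2) {
        const std::size_t run =
            static_cast<std::size_t>(static_cast<unsigned char>(in[i]));
        out.append(run, in[i + 1]);
    }
    return out;
}

struct SegmentedBlob {
    std::string bytes;  // concatenated, independently compressed segments
    std::vector<std::pair<std::size_t, std::size_t>> index;  // (offset, length)
};

// Compress each object on its own and record where its frame lives.
SegmentedBlob WriteSegments(const std::vector<std::string>& objects) {
    SegmentedBlob blob;
    for (const std::string& obj : objects) {
        const std::string frame = RleCompress(obj);
        blob.index.emplace_back(blob.bytes.size(), frame.size());
        blob.bytes += frame;
    }
    return blob;
}

// Decode only segment i; the other segments are never decompressed.
std::string ReadSegment(const SegmentedBlob& blob, std::size_t i) {
    const std::pair<std::size_t, std::size_t>& entry = blob.index.at(i);
    return RleDecompress(blob.bytes.substr(entry.first, entry.second));
}
```

The trade-off is that independent segments usually compress somewhat worse than one continuous stream, in exchange for random access to individual objects.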