Hello!

I think the main goal you're trying to achieve is to use Arrow for processing 
some product details (e.g. brand name) in a tabular format without storing the 
entirety of product details in the table itself.

I suspect you could store all of the product details in Arrow without too much
overhead (most of it paid when you first load the data into memory), but I won't
dive into details there since you want to avoid that.

As Andrew mentioned, you could use a column of vector positions instead of a 
column of shared_ptr, then use the vector positions to access wherever you're 
storing your shared pointers. This is similar to a foreign key to a different 
table.
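
To make the foreign key idea concrete, here is a minimal sketch. To keep it dependency-free, plain std::vector columns stand in for the Arrow columns; in an actual Arrow table the index column would presumably be an unsigned integer array. The Product struct, ProductTable, BuildTable, and FilterByBrand are all hypothetical names invented for illustration, not part of Arrow.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical product type; the fields are illustrative only.
struct Product {
    std::string brand;
    std::string model;
};

// Side storage that owns the products. Values in the index column are
// positions into this vector (the "foreign key").
using ProductStore = std::vector<std::shared_ptr<Product>>;

// Stand-in for the table: in Arrow, "brand" would be a string column
// and "product_idx" an unsigned integer column (assumption).
struct ProductTable {
    std::vector<std::string> brand;
    std::vector<std::uint64_t> product_idx;
};

// Copy only the filterable attribute into the table and record each
// product's position in the store.
ProductTable BuildTable(const ProductStore& store) {
    ProductTable table;
    for (std::uint64_t i = 0; i < store.size(); ++i) {
        table.brand.push_back(store[i]->brand);
        table.product_idx.push_back(i);
    }
    return table;
}

// Filter rows on the brand column, then follow the index column back
// into the store to reach the full product details.
std::vector<std::shared_ptr<Product>> FilterByBrand(
        const ProductTable& table, const ProductStore& store,
        const std::string& brand) {
    std::vector<std::shared_ptr<Product>> matches;
    for (std::size_t row = 0; row < table.brand.size(); ++row) {
        if (table.brand[row] != brand) continue;
        const std::uint64_t idx = table.product_idx[row];
        assert(idx < store.size());  // an index must refer to a live slot
        matches.push_back(store[idx]);
    }
    return matches;
}
```

The table only carries the columns you filter on plus one integer per row; the store keeps ownership of the products, so it has to stay alive (and keep its ordering) for as long as the index column is in use.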

An alternative, but delicate (aka really risky), approach could be to store the
raw pointer in a column of type uintptr_t (which you might approximate with
uint64_t). The column would not take part in reference counting, so the shared
pointers have to outlive every use of it. There may not be much benefit over
the foreign key approach, since you'd still have to iterate over the column
values and cast each one back in order to dereference the pointer, but it may
reduce the cost of an indirect lookup, depending on how you're storing your
shared pointers.
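
A rough sketch of the raw pointer variant, again with a plain std::vector of uint64_t standing in for the Arrow column so that it compiles without Arrow. Product, EncodePointers, and DecodePointer are hypothetical names for illustration only. Note that the encoded values are only meaningful inside the current process and only while the owning shared pointers are alive.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical product type; the fields are illustrative only.
struct Product {
    std::string brand;
    std::string model;
};

// Encode each raw pointer as a 64-bit integer cell. The shared_ptr
// owners are NOT copied, so the cells do not keep the objects alive.
std::vector<std::uint64_t> EncodePointers(
        const std::vector<std::shared_ptr<Product>>& owners) {
    static_assert(sizeof(std::uintptr_t) <= sizeof(std::uint64_t),
                  "a pointer must fit in a uint64_t cell");
    std::vector<std::uint64_t> column;
    column.reserve(owners.size());
    for (const auto& owner : owners) {
        const auto bits = reinterpret_cast<std::uintptr_t>(owner.get());
        column.push_back(static_cast<std::uint64_t>(bits));
    }
    return column;
}

// Cast a cell back to a pointer. Dereferencing the result is only
// valid while the original owner still holds the object.
const Product* DecodePointer(std::uint64_t cell) {
    assert(cell != 0);  // a zero cell would decode to a null pointer
    return reinterpret_cast<const Product*>(
        static_cast<std::uintptr_t>(cell));
}
```

Compared with the index column, this skips the lookup into a side vector, but nothing stops a cell from dangling once its owner is released, and the values are meaningless if the column is ever written out or sent to another process.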




# ------------------------------

# Aldrin


https://github.com/drin/

https://gitlab.com/octalene

https://keybase.io/octalene


On Wednesday, October 9th, 2024 at 14:12, Andrew Bell 
<andrew.bell...@gmail.com> wrote:

> You could give each product an ID number and use that as a proxy.
> 

> On Wed, Oct 9, 2024 at 5:01 PM Yi Cao cao.yi.s...@gmail.com wrote:
> 

> > Let's take a simple example. No network connection is involved. Say I can 
> > have an array table of digital products, which has one column of shared_ptr 
> > pointing to a product object allocated on heap. I would like to do 
> > filtering on the column "brand" using the value "Samsung". Therefore I can 
> > get all rows of "Samsung" products and by accessing the column of shared
> > pointer, I can access details of this product. Without using a shared
> > pointer, I would have to copy the product details into multiple columns of 
> > this table. If I save all these shared pointers in a separate vector, then 
> > I cannot do filtering like that in the arrow table.
> > 

> > The challenge for me is how to store a shared_ptr in a "cell" of an arrow 
> > table. It seems to me only the primitive types are supported, but I would 
> > like to confirm. I think the "extension" type might help with my scenario 
> > but I'm not sure how to make it work. If it's a simple type like integer, I 
> > can do IntBuilder to build an array and make a record batch out of it.
> > 

> > Hope this provides a bit of clarity. Thank you.
> > 

> > On Wed, 9 Oct 2024 at 19:12, Andrew Bell andrew.bell...@gmail.com wrote:
> > 

> > > On Wed, Oct 9, 2024, 12:27 PM Yi Cao cao.yi.s...@gmail.com wrote:
> > > 

> > > > If I place these shared ptrs in a vector, how can I make this vector 
> > > > saved in Arrow table as a column? Is it possible?
> > > 

> > > What do you mean by "saved"?
> > > 

> > > I don't understand the point of placing shared pointers in an arrow 
> > > array. It's essentially equivalent to storing the pointers in a vector. 
> > > You can't write shared pointers to a data store or send them across a 
> > > network connection.
> 
> --
> Andrew Bell
> andrew.bell...@gmail.com
