Hello! I think your main goal is to use Arrow to process some product details (e.g. brand name) in a tabular format without storing all of the product details in the table itself.
I think you could store all of the product details in Arrow without too much overhead (when you first load them into memory), but I won't dive into details there since you want to avoid it.

As Andrew mentioned, you could use a column of vector positions instead of a column of shared_ptr, then use those positions to index into wherever you're storing your shared pointers. This is similar to a foreign key into a different table.

An alternate, but delicate (aka really risky), approach could be to store the raw pointer as a column of type uintptr_t (which you might approximate with a uint64_t). There may not be much benefit compared to the foreign key approach, since you'd still have to iterate over the column values and cast each one in order to dereference the pointer, but it may reduce the cost of an indirect lookup depending on how you're storing your shared pointers.

# ------------------------------ #
Aldrin

https://github.com/drin/
https://gitlab.com/octalene
https://keybase.io/octalene

On Wednesday, October 9th, 2024 at 14:12, Andrew Bell <andrew.bell...@gmail.com> wrote:

> You could give each product an ID number and use that as a proxy.
>
> On Wed, Oct 9, 2024 at 5:01 PM Yi Cao cao.yi.s...@gmail.com wrote:
>
> > Let's take a simple example. No network connection is involved. Say I can
> > have an array table of digital products, which has one column of shared_ptr
> > pointing to a product object allocated on heap. I would like to do
> > filtering on the column "brand" using the value "Samsung". Therefore I can
> > get all rows of "Samsung" products and by accessing the column of shared
> > pointer , I can access details of this product. Without using a shared
> > pointer, I would have to copy the product details into multiple columns of
> > this table. If I save all these shared pointers in a separate vector, then
> > I cannot do filtering like that in the arrow table.
> >
> > The challenge for me is how to store a shared_ptr in a "cell" of an arrow
> > table. It seems to me only the primitive types are supported, but I would
> > like to confirm. I think the "extension" type might help with my scenario
> > but I'm not sure how to make it work. If it's a simple type like integer, I
> > can do IntBuilder to build an array and make a record batch out of it.
> >
> > Hope this provides a bit of clarity. Thank you.
> >
> > On Wed, 9 Oct 2024 at 19:12, Andrew Bell andrew.bell...@gmail.com wrote:
> >
> > > On Wed, Oct 9, 2024, 12:27 PM Yi Cao cao.yi.s...@gmail.com wrote:
> > >
> > > > If I place these shared ptrs in a vector, how can I make this vector
> > > > saved in Arrow table as a column? Is it possible?
> > >
> > > What do you mean by "saved"?
> > >
> > > I don't understand the point of placing shared pointers in an arrow
> > > array. It's essentially equivalent to storing the pointers in a vector.
> > > You can't write shared pointers to a data store or send them across a
> > > network connection.
>
> --
> Andrew Bell
> andrew.bell...@gmail.com
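
To make the two suggestions at the top of this message a bit more concrete, here is a rough sketch. It deliberately does not use the Arrow API: each std::vector in ProductTable is only a stand-in for one Arrow column (a string column, an int64 column, and a uint64 column). The names Product, ProductStore, ProductTable, build_table, filter_rows, lookup_by_index, lookup_by_address, and make_example_store are all made up for illustration, and so is the sample data. Treat it as one possible shape for the idea, not as how it has to be done.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical product record; it lives on the heap, outside the table.
struct Product {
    std::string brand;
    std::string model;
    double price;
};

// The shared_ptrs stay in this owning side vector, never in the columns.
using ProductStore = std::vector<std::shared_ptr<Product>>;

// Stand-in for an Arrow table: each vector plays the role of one column.
// (Assumption: a real version would hold Arrow string/int64/uint64 arrays.)
struct ProductTable {
    std::vector<std::string> brand;           // column to filter on
    std::vector<std::int64_t> product_idx;    // "foreign key": position in the store
    std::vector<std::uint64_t> product_addr;  // raw address, non-owning (risky)
};

// The uint64_t column can only approximate uintptr_t if the address fits.
static_assert(sizeof(std::uintptr_t) <= sizeof(std::uint64_t),
              "pointer does not fit in a 64-bit column value");

ProductTable build_table(const ProductStore& store) {
    ProductTable t;
    for (std::size_t i = 0; i < store.size(); ++i) {
        t.brand.push_back(store[i]->brand);
        t.product_idx.push_back(static_cast<std::int64_t>(i));
        t.product_addr.push_back(static_cast<std::uint64_t>(
            reinterpret_cast<std::uintptr_t>(store[i].get())));
    }
    return t;
}

// Filter on the brand column and return the matching row numbers.
std::vector<std::size_t> filter_rows(const ProductTable& t,
                                     const std::string& brand) {
    std::vector<std::size_t> rows;
    for (std::size_t r = 0; r < t.brand.size(); ++r) {
        if (t.brand[r] == brand) rows.push_back(r);
    }
    return rows;
}

// Approach 1: follow the index column into the side vector.
std::shared_ptr<Product> lookup_by_index(const ProductTable& t,
                                         const ProductStore& store,
                                         std::size_t row) {
    const std::int64_t idx = t.product_idx[row];
    assert(idx >= 0 && static_cast<std::size_t>(idx) < store.size());
    return store[static_cast<std::size_t>(idx)];
}

// Approach 2: cast the stored integer back to a pointer. Nothing here keeps
// the object alive; the pointer dangles if the store releases the product.
const Product* lookup_by_address(const ProductTable& t, std::size_t row) {
    return reinterpret_cast<Product*>(
        static_cast<std::uintptr_t>(t.product_addr[row]));
}

// Made-up sample data, only for trying the functions above.
ProductStore make_example_store() {
    return {
        std::make_shared<Product>(Product{"Samsung", "model-a", 100.0}),
        std::make_shared<Product>(Product{"Apple", "model-b", 200.0}),
        std::make_shared<Product>(Product{"Samsung", "model-c", 300.0}),
    };
}
```

With the sample store, filter_rows(table, "Samsung") yields rows 0 and 2, and either lookup function takes a row number to the full Product. The index version returns a shared_ptr, so ownership stays well defined. The address version skips the lookup in the side vector, but it is only safe while the store still holds every product the column points to.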