I think your original code, which roundtrips through RecordBatch (`pa.RecordBatch.from_pandas(df).to_struct_array()`), is the best option at the moment. The RecordBatch<->StructArray part is a cheap (zero-copy) conversion, and by using RecordBatch.from_pandas you can rely on all of the pandas<->Arrow conversion logic that is implemented in pyarrow. That logic keeps the data columnar, in contrast to `df.itertuples()`, which converts the data into rows of Python objects as an intermediate step.
Given that the conversion through RecordBatch works nicely, I am not sure it is worth adding new APIs to convert directly between StructArray and pandas DataFrames.

Joris

On Mon, 12 Jun 2023 at 20:32, Spencer Nelson <swnel...@uw.edu> wrote:
>
> Here's a one-liner that does it, but I expect it's moderately slower than
> the RecordBatch version:
>
> pa.array(df.itertuples(index=False), type=pa.struct([pa.field(col,
> pa.from_numpy_dtype(df.dtypes[col])) for col in df.columns]))
>
> Most of the complexity is in the 'type'. It's less scary than it looks, and
> if you can afford multiple lines I think it's almost readable:
>
> fields = [pa.field(col, pa.from_numpy_dtype(df.dtypes[col])) for col in
> df.columns]
> pa_type = pa.struct(fields)
> pa.array(df.itertuples(index=False), type=pa_type)
>
> But this seems like a classic XY problem. What is the root issue you're
> trying to solve? Why avoid RecordBatch?
>
> On Mon, Jun 12, 2023 at 11:14 AM Li Jin <ice.xell...@gmail.com> wrote:
>
> > Gentle bump.
> >
> > Not a big deal if I need to use the API above to do so, but bump in case
> > someone has a better way.
> >
> > On Fri, Jun 9, 2023 at 4:34 PM Li Jin <ice.xell...@gmail.com> wrote:
> >
> > > Hello,
> > >
> > > I am looking for the best ways for converting Pandas DataFrame <-> Struct
> > > Array.
> > >
> > > Currently I have:
> > >
> > > pa.RecordBatch.from_pandas(df).to_struct_array()
> > >
> > > and
> > >
> > > pa.RecordBatch.from_struct_array(s_array).to_pandas()
> > >
> > > - I wonder if there is a direct way to go from DataFrame <-> Struct Array
> > > without going through RecordBatch?
> > >
> > > Thanks,
> > > Li
> > >
> >
>