Here's a one-liner that does it, but I expect it's moderately slower than the RecordBatch version:
pa.array(df.itertuples(index=False), type=pa.struct([pa.field(col, pa.from_numpy_dtype(df.dtypes[col])) for col in df.columns]))

Most of the complexity is in the 'type'. It's less scary than it looks, and if you can afford multiple lines I think it's almost readable:

fields = [pa.field(col, pa.from_numpy_dtype(df.dtypes[col])) for col in df.columns]
pa_type = pa.struct(fields)
pa.array(df.itertuples(index=False), type=pa_type)

But this seems like a classic XY problem. What is the root issue you're trying to solve? Why avoid RecordBatch?

On Mon, Jun 12, 2023 at 11:14 AM Li Jin <ice.xell...@gmail.com> wrote:

> Gentle bump.
>
> Not a big deal if I need to use the API above to do so, but bump in case
> someone has a better way.
>
> On Fri, Jun 9, 2023 at 4:34 PM Li Jin <ice.xell...@gmail.com> wrote:
>
> > Hello,
> >
> > I am looking for the best ways for converting Pandas DataFrame <-> Struct
> > Array.
> >
> > Currently I have:
> >
> > pa.RecordBatch.from_pandas(df).to_struct_array()
> >
> > and
> >
> > pa.RecordBatch.from_struct_array(s_array).to_pandas()
> >
> > - I wonder if there is a direct way to go from DataFrame <-> Struct Array
> > without going through RecordBatch?
> >
> > Thanks,
> > Li