Here's a one-liner that does it, but I expect it's moderately slower than
the RecordBatch version, since itertuples walks the rows in Python:

pa.array(df.itertuples(index=False), type=pa.struct([pa.field(col,
pa.from_numpy_dtype(df.dtypes[col])) for col in df.columns]))

Most of the complexity is in the 'type'. It's less scary than it looks, and
if you can afford multiple lines I think it's almost readable:

fields = [pa.field(col, pa.from_numpy_dtype(df.dtypes[col])) for col in
df.columns]
pa_type = pa.struct(fields)
pa.array(df.itertuples(index=False), type=pa_type)

But this seems like a classic XY problem. What is the root issue you're
trying to solve? Why avoid RecordBatch?

On Mon, Jun 12, 2023 at 11:14 AM Li Jin <ice.xell...@gmail.com> wrote:

> Gentle bump.
>
> Not a big deal if I need to use the API above to do so, but bump in case
> someone has a better way.
>
> On Fri, Jun 9, 2023 at 4:34 PM Li Jin <ice.xell...@gmail.com> wrote:
>
> > Hello,
> >
> > I am looking for the best way to convert Pandas DataFrame <-> Struct
> > Array.
> >
> > Currently I have:
> >
> > pa.RecordBatch.from_pandas(df).to_struct_array()
> >
> > and
> >
> > pa.RecordBatch.from_struct_array(s_array).to_pandas()
> >
> > - I wonder if there is a direct way to go from DataFrame <-> Struct Array
> > without going through RecordBatch?
> >
> > Thanks,
> > Li
> >
>