[ https://issues.apache.org/jira/browse/ARROW-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17662308#comment-17662308 ]
Rok Mihevc commented on ARROW-5286:
-----------------------------------

This issue has been migrated to [issue #21754|https://github.com/apache/arrow/issues/21754] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details.

> [Python] support Structs in Table.from_pandas given a known schema
> ------------------------------------------------------------------
>
>                 Key: ARROW-5286
>                 URL: https://issues.apache.org/jira/browse/ARROW-5286
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Assignee: Joris Van den Bossche
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.14.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> ARROW-2073 implemented creating a StructArray from an array of tuples (in
> addition to from dicts).
> This works in {{pyarrow.array}} (specifying the proper type):
> {code}
> In [2]: df = pd.DataFrame({'tuples': [(1, 2), (3, 4)]})
>
> In [3]: struct_type = pa.struct([('a', pa.int64()), ('b', pa.int64())])
>
> In [4]: pa.array(df['tuples'], type=struct_type)
> Out[4]:
> <pyarrow.lib.StructArray object at 0x7f1b02ff6818>
> -- is_valid: all not null
> -- child 0 type: int64
>   [
>     1,
>     3
>   ]
> -- child 1 type: int64
>   [
>     2,
>     4
>   ]
> {code}
> But does not yet work when converting a DataFrame to Table while specifying
> the type in a schema:
> {code}
> In [5]: pa.Table.from_pandas(df, schema=pa.schema([('tuples', struct_type)]))
> ---------------------------------------------------------------------------
> KeyError                                  Traceback (most recent call last)
> ~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in get_logical_type(arrow_type)
>      68     try:
> ---> 69         return logical_type_map[arrow_type.id]
>      70     except KeyError:
> KeyError: 24
> During handling of the above exception, another exception occurred:
> NotImplementedError                       Traceback (most recent call last)
> <ipython-input-5-c18748f9b954> in <module>
> ----> 1 pa.Table.from_pandas(df, schema=pa.schema([('tuples', struct_type)]))
> ~/scipy/repos/arrow/python/pyarrow/table.pxi in pyarrow.lib.Table.from_pandas()
> ~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in dataframe_to_arrays(df, schema, preserve_index, nthreads, columns, safe)
>     483         metadata = construct_metadata(df, column_names, index_columns,
>     484                                       index_descriptors, preserve_index,
> --> 485                                       types)
>     486     return all_names, arrays, metadata
>     487
> ~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in construct_metadata(df, column_names, index_levels, index_descriptors, preserve_index, types)
>     207             metadata = get_column_metadata(df[col_name], name=sanitized_name,
>     208                                            arrow_type=arrow_type,
> --> 209                                            field_name=sanitized_name)
>     210             column_metadata.append(metadata)
>     211
> ~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in get_column_metadata(column, name, arrow_type, field_name)
>     149     dict
>     150     """
> --> 151     logical_type = get_logical_type(arrow_type)
>     152
>     153     string_dtype, extra_metadata = get_extension_dtype_info(column)
> ~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in get_logical_type(arrow_type)
>      77     elif isinstance(arrow_type, pa.lib.Decimal128Type):
>      78         return 'decimal'
> ---> 79     raise NotImplementedError(str(arrow_type))
>      80
>      81
> NotImplementedError: struct<a: int64, b: int64>
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)