[ https://issues.apache.org/jira/browse/ARROW-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17661372#comment-17661372 ]
Rok Mihevc commented on ARROW-4350:
-----------------------------------

This issue has been migrated to [issue #20917|https://github.com/apache/arrow/issues/20917] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details.

> [Python] dtype=object arrays cannot be converted to a list-of-list ListArray
> ----------------------------------------------------------------------------
>
>                 Key: ARROW-4350
>                 URL: https://issues.apache.org/jira/browse/ARROW-4350
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.11.1, 0.12.0
>            Reporter: yu peng
>            Assignee: Wes McKinney
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.14.0
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Nested numpy arrays (as the scalar value) cannot be converted to a list-of-list type array:
> {code}
> arr = np.empty(2, dtype=object)
> arr[:] = [np.array([1, 2]), np.array([2, 3])]
> pa.array([arr, arr])
> {code}
> results in
> {code:java}
> ArrowTypeError: only size-1 arrays can be converted to Python scalars
> {code}
> Starting from lists of lists works fine:
> {code}
> lists = [[1, 2], [2, 3]]
> pa.array([lists, lists]).type
> {code}
> {code:none}
> ListType(list<item: list<item: int64>>)
> {code}
> Specifying the type explicitly as {{pa.array([arr, arr], type=pa.list_(pa.list_(pa.int64())))}} does not help.
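Since the report notes that plain lists of lists convert fine, one possible workaround on affected versions is to flatten the nested object arrays into plain Python lists before calling {{pa.array}}. The sketch below is only an illustration under that assumption, not the fix that was applied in Arrow; the names {{to_nested_lists}} and {{roundtrip_example}} are hypothetical helpers introduced here.

```python
def to_nested_lists(value):
    """Recursively turn array-like containers into plain Python lists.

    Hypothetical helper, not part of pyarrow or numpy.
    """
    # NumPy arrays expose tolist(); for dtype=object arrays it returns the
    # inner arrays unchanged, so the result still has to be walked below.
    if hasattr(value, "tolist"):
        value = value.tolist()
    if isinstance(value, (list, tuple)):
        return [to_nested_lists(item) for item in value]
    return value


def roundtrip_example():
    """Rebuild the failing input from the report and convert it.

    Imports are local so the helper above stays usable without
    numpy or pyarrow installed.
    """
    import numpy as np
    import pyarrow as pa

    arr = np.empty(2, dtype=object)
    arr[:] = [np.array([1, 2]), np.array([2, 3])]
    # Convert to plain nested lists first, then hand them to pyarrow.
    return pa.array(to_nested_lists([arr, arr]))
```

With numpy and pyarrow installed, {{roundtrip_example()}} should produce the same list-of-list type as the lists-of-lists example above. The extra copy into Python objects costs time and memory, so this is best limited to small inputs.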
> Due to this, a round-trip does not work, because the list-of-list type gives back an array of arrays in Python:
> {code}
> In [2]: lists = [[1, 2], [2, 3]]
>    ...: a = pa.array([lists, lists])
>
> In [3]: a.to_pandas()
> Out[3]:
> array([array([array([1, 2]), array([2, 3])], dtype=object),
>        array([array([1, 2]), array([2, 3])], dtype=object)], dtype=object)
>
> In [4]: pa.array(a.to_pandas())
> ---------------------------------------------------------------------------
> ArrowTypeError                            Traceback (most recent call last)
> <ipython-input-4-9fee6dc9d0b8> in <module>
> ----> 1 pa.array(a.to_pandas())
>
> ~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib.array()
>
> ~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib._ndarray_to_array()
>
> ~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
>
> ArrowTypeError: only size-1 arrays can be converted to Python scalars
> {code}
> ----
> Original report:
> {code:java}
> In [19]: df = pd.DataFrame({'a': [[[1], [2]], [[2], [3]]], 'b': [1, 2]})
> In [20]: df.iloc[0].to_dict()
> Out[20]: {'a': [[1], [2]], 'b': 1}
> In [21]: pa.Table.from_pandas(df).to_pandas().iloc[0].to_dict()
> Out[21]: {'a': array([array([1]), array([2])], dtype=object), 'b': 1}
> In [24]: np.array(df.iloc[0].to_dict()['a']).shape
> Out[24]: (2, 1)
> In [25]: pa.Table.from_pandas(df).to_pandas().iloc[0].to_dict()['a'].shape
> Out[25]: (2,)
> {code}
> Adding extra array type is not functioning as expected.
>
> More importantly, the following round-trip fails:
>
> {code:java}
> In [108]: df = pd.DataFrame({'a': [[[1, 2],[2, 3]], [[1,2], [2, 3]]], 'b': [[1, 2],[2, 3]]})
> In [109]: df
> Out[109]:
>                   a       b
> 0  [[1, 2], [2, 3]]  [1, 2]
> 1  [[1, 2], [2, 3]]  [2, 3]
> In [110]: pa.Table.from_pandas(pa.Table.from_pandas(df).to_pandas())
> ---------------------------------------------------------------------------
> ArrowTypeError                            Traceback (most recent call last)
> <ipython-input-110-4a09836f807e> in <module>()
> ----> 1 pa.Table.from_pandas(pa.Table.from_pandas(df).to_pandas())
> /Users/pengyu/.pyenv/virtualenvs/starscream/2.7.11/lib/python2.7/site-packages/pyarrow/table.pxi in pyarrow.lib.Table.from_pandas()
>    1215         <pyarrow.lib.Table object at 0x7f05d1fb1b40>
>    1216         """
> -> 1217         names, arrays, metadata = pdcompat.dataframe_to_arrays(
>    1218             df,
>    1219             schema=schema,
> /Users/pengyu/.pyenv/virtualenvs/starscream/2.7.11/lib/python2.7/site-packages/pyarrow/pandas_compat.pyc in dataframe_to_arrays(df, schema, preserve_index, nthreads, columns, safe)
>     379         arrays = [convert_column(c, t)
>     380                   for c, t in zip(columns_to_convert,
> --> 381                                   convert_types)]
>     382     else:
>     383         from concurrent import futures
> /Users/pengyu/.pyenv/virtualenvs/starscream/2.7.11/lib/python2.7/site-packages/pyarrow/pandas_compat.pyc in convert_column(col, ty)
>     374             e.args += ("Conversion failed for column {0!s} with type {1!s}"
>     375                        .format(col.name, col.dtype),)
> --> 376             raise e
>     377
>     378     if nthreads == 1:
> ArrowTypeError: ('only size-1 arrays can be converted to Python scalars', 'Conversion failed for column a with type object')
> {code}
>

--
This message was sent by Atlassian Jira (v8.20.10#820010)