Joachim Haga created ARROW-7112:
-----------------------------------
Summary: Wrong contents when initializing a pyarrow.Table from boolean DataFrame
Key: ARROW-7112
URL: https://issues.apache.org/jira/browse/ARROW-7112
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.14.1
Environment: Tested with 0.14.1 and 0.14.0.RAY from pip3 on Ubuntu
Reporter: Joachim Haga
When initializing a Table from a boolean pandas.DataFrame _that is not in
Fortran order_, the contents of the resulting Table are different from the
contents of the DataFrame.
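The report does not identify a root cause. As background only, the sketch below shows how memory order changes the layout of a single column of a 2-D boolean NumPy array. This is one plausible (unconfirmed) reason why order could matter to the conversion; it uses NumPy alone and does not reproduce the pyarrow bug itself. The name mask_f is illustrative.

{code:python}
import numpy as np

# Background only: this shows memory layout, not the pyarrow bug itself.
mask = np.full((3, 3), False)                # default C (row-major) order
mask_f = np.full((3, 3), False, order="F")   # Fortran (column-major) order

# In C order, consecutive elements of a column are a full row (3 bytes for
# a 3-column bool array) apart, so the column is a strided, non-contiguous view.
print(mask[:, 1].strides, mask[:, 1].flags["C_CONTIGUOUS"])      # (3,) False

# In Fortran order, each column occupies one contiguous run of memory.
print(mask_f[:, 1].strides, mask_f[:, 1].flags["C_CONTIGUOUS"])  # (1,) True
{code}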
Sample:
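{code:python}
import numpy as np
import pandas as pd
import pyarrow as pa

# 3x3 boolean array in the default C (row-major) order; column 1 is all True
mask = np.full((3, 3), False)
mask[:, 1] = True

df = pd.DataFrame(mask)
print(df)
# Round-trip through a pyarrow.Table and back to pandas
print(pa.table(df).to_pandas())
{code}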
The output:
{noformat}
       0     1      2
0  False  True  False
1  False  True  False
2  False  True  False
       0      1      2
0  False   True  False
1  False  False  False
2  False  False  False
{noformat}
That is, column 1 differs before and after the round trip through pa.Table:
only the first row is still True.
If I add *{{order='F'}}* to the *{{np.full}}* invocation, the result is as
expected. The problem also seems to disappear if I use *{{dtype=int}}*.
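A minimal sketch of the order='F' workaround described above, with a round-trip check. It assumes that np.asfortranarray (which returns a Fortran-ordered array) behaves the same as passing order='F' to np.full; the helper roundtrip_matches is hypothetical, and it compares cell values only so the check does not depend on how column labels are restored.

{code:python}
import numpy as np
import pandas as pd
import pyarrow as pa


def roundtrip_matches(df: pd.DataFrame) -> bool:
    """Hypothetical helper: True if df's cell values survive
    pandas -> pyarrow.Table -> pandas unchanged.

    Only shapes and values are compared, not column labels.
    """
    result = pa.table(df).to_pandas()
    return df.shape == result.shape and np.array_equal(df.to_numpy(), result.to_numpy())


mask = np.full((3, 3), False)
mask[:, 1] = True

# Workaround per the report (assumed equivalent to order='F' in np.full):
# build the DataFrame from a Fortran-ordered copy of the array.
df_fortran = pd.DataFrame(np.asfortranarray(mask))
print(roundtrip_matches(df_fortran))
{code}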
--
This message was sent by Atlassian Jira
(v8.3.4#803005)