Joachim Haga created ARROW-7112:
-----------------------------------

Summary: Wrong contents when initializing a pyarrow.Table from boolean DataFrame
Key: ARROW-7112
URL: https://issues.apache.org/jira/browse/ARROW-7112
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.14.1
Environment: Tested with 0.14.1 and 0.14.0.RAY from pip3 on Ubuntu
Reporter: Joachim Haga

When initializing a Table from a boolean pandas.DataFrame _that is not in Fortran order_, the contents of the resulting Table differ from the contents of the DataFrame. Sample:

{code:java}
import pandas as pd
import pyarrow as pa
import numpy as np

# 3x3 boolean array in NumPy's default (C) order, with column 1 set to True
mask = np.full((3,3), False)
mask[:,1] = True
df = pd.DataFrame(mask)

# Compare the DataFrame with its round trip through a pyarrow Table
print(df)
print(pa.table(df).to_pandas())
{code}

The output:

{noformat}
       0      1      2
0  False   True  False
1  False   True  False
2  False   True  False
       0      1      2
0  False   True  False
1  False  False  False
2  False  False  False
{noformat}

I.e., column 1 differs before and after round-tripping through pa.Table: only its first value is still True.

If I add *{{order='F'}}* to the *{{np.full}}* invocation, the result is as expected. Also, the problem seems to disappear if I use {{*dtype=int*}}.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
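
For reference, the layout difference between the failing and the working case can be inspected with NumPy alone. The sketch below is only an illustration of what {{order='F'}} changes in memory; the helper name {{column_layout}} is made up for this example, and nothing here is a confirmed diagnosis of where pyarrow goes wrong.

```python
import numpy as np

def column_layout(order):
    """Build the 3x3 boolean mask from the report in the given memory order.

    Returns (strides of the whole array, strides of column 1), in bytes.
    `column_layout` is a hypothetical helper, not part of pandas or pyarrow.
    """
    mask = np.full((3, 3), False, order=order)
    mask[:, 1] = True
    return mask.strides, mask[:, 1].strides

# Default C order: the values of one column are 3 bytes apart (not contiguous).
print(column_layout("C"))
# Fortran order: the values of one column sit next to each other in memory.
print(column_layout("F"))
```

In C order each column is a strided view into the array, while in Fortran order each column is a contiguous run of bytes. Whether the conversion code mishandles the strided boolean case is an assumption that this snippet does not test.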