[ https://issues.apache.org/jira/browse/ARROW-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17660933#comment-17660933 ]
Rok Mihevc commented on ARROW-3909:
-----------------------------------

This issue has been migrated to [issue #20522|https://github.com/apache/arrow/issues/20522] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details.

> [Python] Table.from_pandas call that seemingly should zero copy does not
> ------------------------------------------------------------------------
>
>                 Key: ARROW-3909
>                 URL: https://issues.apache.org/jira/browse/ARROW-3909
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Wes McKinney
>            Priority: Major
>             Fix For: 0.12.0
>
>
> While doing some performance testing, I noticed that a {{Table.from_pandas}}
> call that ought to be zero-copy / free was taking 50ms
> {code}
> import pandas as pd
> import pyarrow as pa
> import numpy as np
> K = 1000
> N = 50000000
> df = pd.DataFrame({'ints': np.tile(np.arange(K), N // K)})
> table = pa.Table.from_pandas(df)
> {code}
> I see
> {code}
> In [14]: timeit table = pa.Table.from_pandas(df)
> 51.9 ms ± 751 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
> {code}
> I haven't determined what's going on (is it counting nulls?), and initial
> attempts to get a Flamegraph produced a bunch of "unknown" entries



--
This message was sent by Atlassian Jira
(v8.20.10#820010)