[ 
https://issues.apache.org/jira/browse/ARROW-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17660933#comment-17660933
 ] 

Rok Mihevc commented on ARROW-3909:
-----------------------------------

This issue has been migrated to [issue 
#20522|https://github.com/apache/arrow/issues/20522] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [Python] Table.from_pandas call that seemingly should zero copy does not
> ------------------------------------------------------------------------
>
>                 Key: ARROW-3909
>                 URL: https://issues.apache.org/jira/browse/ARROW-3909
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Wes McKinney
>            Priority: Major
>             Fix For: 0.12.0
>
>
> While doing some performance testing, I noticed that a {{Table.from_pandas}} 
> call that ought to be zero-copy (and therefore essentially free) was taking 
> about 50 ms:
> {code}
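> import pandas as pd
> import pyarrow as pa
> import numpy as np
> K = 1000
> N = 50000000
> # One integer column with no nulls: the values 0..K-1 repeated N // K times
> df = pd.DataFrame({'ints': np.tile(np.arange(K), N // K)})
> # Expected to wrap the existing NumPy memory rather than copy it
> table = pa.Table.from_pandas(df)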
> {code}
> I see:
> {code}
> In [14]: timeit table = pa.Table.from_pandas(df)
> 51.9 ms ± 751 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
> {code}
> I haven't determined what's going on yet (is it counting nulls?), and initial 
> attempts to produce a flame graph yielded a bunch of "unknown" entries.


