On Tue, 1 Oct 2019 at 21:03, Maarten Ballintijn wrote:
I ran cProfile to better understand what is going on in pandas. Using your code
below, I find that pandas runs a loop over the generic datetime64 conversion
when the datetime64 is not in 'ns'.
The conversion unpacks the time into a date-time struct and converts the
date-time struct back into
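To illustrate the resolution issue described above, here is a minimal sketch using NumPy only. The sample data and variable names are hypothetical, and whether casting up front actually avoids the slow conversion path in a given pandas version is an assumption that has not been verified here.

```python
import numpy as np

# Hypothetical sample data: ten timestamps at millisecond resolution,
# built by reinterpreting int64 counts as datetime64[ms].
ts_ms = np.arange(10, dtype="int64").view("datetime64[ms]")

# Vectorized cast to nanosecond resolution. NumPy does this for the
# whole array at once, without a per-element Python loop.
ts_ns = ts_ms.astype("datetime64[ns]")

print(ts_ms.dtype, ts_ns.dtype)
```

Passing `ts_ns` rather than `ts_ms` to pandas would be one way to test whether the non-'ns' path is what costs the time.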
Some answers to the other questions:
On Sat, 28 Sep 2019 at 22:16, Maarten Ballintijn wrote:
> ...
> This leaves me with the following questions:
>
> - Who should I talk to to get this resolved in Pandas?
>
You can open an issue on their tracker:
https://github.com/pandas-dev/pandas/issues/
On Sat, Sep 28, 2019 at 3:16 PM Maarten Ballintijn wrote:
Hi Joris,
Thanks for your detailed analysis!
We can leave the impact of the large DateTimeIndex on the performance for
another time.
(Notes: my laptop has sufficient memory to support it, no error is thrown,
the resulting DateTimeIndex from the expression is identical to your version
or the ot
From looking a little bit further into this, it seems that it is mainly
pandas that is slower in creating a Series from an array of datetime64
compared to an array of ints,
especially if it is not nanosecond resolution:
In [29]: a_int = pa.array(np.arange(10))
In [30]: %timeit a_int.to_
Hi Maarten,
Thanks for the reproducible script. I ran it on my laptop on pyarrow
master, and I am not seeing the difference between the two datetime indexes:
Versions:
Python: 3.7.3 | packaged by conda-forge | (default, Mar 27 2019, 23:01:00)
[GCC 7.3.0] on linux
numpy: 1.16.4
pandas: 0.26.0.dev0+
Hi,
The code to show the performance issue with DateTimeIndex is at:
https://gist.github.com/maartenb/256556bcd6d7c7636d400f3b464db18c
It shows three cases: 0) int index, 1) datetime index, 2) datetime index
created in a slightly roundabout way
I’m a little confused by the two d
hi
On Tue, Sep 24, 2019 at 9:26 AM Maarten Ballintijn wrote:
Hi Wes,
Thanks for your quick response.
Yes, we’re using Python 3.7.4, from miniconda and conda-forge, and:
numpy: 1.16.5
pandas: 0.25.1
pyarrow: 0.14.1
It looks like 0.15 is close, so I can wait for that.
Theoretically I see three components driving the performance:
hi Maarten,
Are you using the master branch or 0.14.1? There are a number of
performance regressions in 0.14.0/0.14.1 that are addressed in the
master branch, to appear as 0.15.0 relatively soon.
As a file format, Parquet (and columnar formats in general) is not
known to perform well with more th