[ https://issues.apache.org/jira/browse/ARROW-18298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635819#comment-17635819 ]
Joris Van den Bossche commented on ARROW-18298:
-----------------------------------------------

bq. I thought initially it was just how it was presented, as going back to pandas in this example from the table gives the "correct" representation of the value:

Yes, in this case that is the cause of the confusion. The dates are not "wrong" after conversion to Arrow; they are just printed in UTC without any indication of this, which is confusing. We have ARROW-14567 to track this issue.

bq. However, placing mixed timezones makes the behavior more apparent in that it is coercing to the first timezone.

That's a separate issue, and one that doesn't come up very often: pandas, for example, also requires a single timezone per column when the column has a datetime64 dtype. But indeed, Arrow's timestamp type requires a single timezone, so when we encounter multiple ones, we currently coerce to the first one. I think it would be better to coerce to UTC instead (-> ARROW-5912).
There is some discussion about the use case of actually having multiple timezones in a single array at ARROW-16540.

> [Python] datetime shifted when using pyarrow.Table.from_pandas to load a pandas DateFrame containing datetime with timezone
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-18298
>                 URL: https://issues.apache.org/jira/browse/ARROW-18298
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 9.0.0
>         Environment: MacOS M1, Python 3.8.13
>            Reporter: Adam Ling
>            Priority: Major
>
> Problem:
> When using pyarrow.Table.from_pandas to load a pandas DataFrame that contains a timestamp object with timezone information, the created Table object shifts the datetime while still keeping the timezone information.
> Please see my scripts.
>
> Reproduce scripts:
> {code:python}
> import pandas as pd
> import pyarrow
>
> # One timezone-aware timestamp (UTC-07:00 on this date)
> ts = pd.Timestamp("2022-10-21 22:46:17", tz="America/Los_Angeles")
> df = pd.DataFrame({"TS": [ts]})
> table = pyarrow.Table.from_pandas(df)
>
> print(df)
> # Output:
> #                          TS
> # 0 2022-10-21 22:46:17-07:00
>
> print(table)
> # Output:
> # pyarrow.Table
> # TS: timestamp[ns, tz=America/Los_Angeles]
> # ----
> # TS: [[2022-10-22 05:46:17.000000000]]
> {code}
>
> Expected results:
> The table should not shift the datetime when timezone information is provided.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
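The claim that the stored value is correct and only the printout is in UTC can be illustrated without pyarrow. The sketch below is a minimal stdlib-only model, not pyarrow code. It assumes, as the Arrow format specifies for timestamps that carry a timezone, that the value is an integer count of nanoseconds since the Unix epoch in UTC, with the timezone kept only in the column type. It also substitutes a fixed UTC-07:00 offset for America/Los_Angeles, which matches the `-07:00` pandas printed for this date.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-07:00 offset stands in for America/Los_Angeles,
# matching the "-07:00" pandas printed for 2022-10-21.
PDT = timezone(timedelta(hours=-7), "PDT")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

local = datetime(2022, 10, 21, 22, 46, 17, tzinfo=PDT)

# Model of the stored value: nanoseconds since the Unix epoch, in UTC.
# The timezone is not part of the value; it only lives in the column type.
stored_ns = (local - EPOCH) // timedelta(microseconds=1) * 1_000

# Rendering the stored integer as UTC wall-clock time gives what print(table) showed.
as_utc = EPOCH + timedelta(microseconds=stored_ns // 1_000)
print(as_utc.strftime("%Y-%m-%d %H:%M:%S"))  # 2022-10-22 05:46:17

# Interpreting the same integer in the column's timezone recovers the original value.
print(as_utc.astimezone(PDT).isoformat())     # 2022-10-21T22:46:17-07:00
```

With pyarrow itself, converting the table back with `table.to_pandas()` shows the original `22:46:17-07:00` value, which is what the first quoted remark observed.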
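For the mixed-timezone case, the two policies discussed in the comment (label the array with the first value's timezone, as currently done, or with UTC, as proposed in ARROW-5912) can be compared with a small stdlib-only model. `to_single_tz_array` and `render` are hypothetical helpers, not pyarrow functions. The sketch assumes every value is first normalized to a UTC instant, so the policies differ only in the single timezone attached to the result; it does not reproduce pyarrow's actual conversion code.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_single_tz_array(values, policy="utc"):
    """Hypothetical model of converting aware datetimes into one Arrow-like
    timestamp array: UTC epoch microseconds plus a single timezone label."""
    if not values:
        raise ValueError("need at least one value")
    # Normalize every instant to UTC, whatever offset it carried.
    micros = [(v - EPOCH) // timedelta(microseconds=1) for v in values]
    if policy == "first":
        tz = values[0].tzinfo  # coerce to the first value's timezone
    elif policy == "utc":
        tz = timezone.utc      # coerce to UTC (the ARROW-5912 proposal)
    else:
        raise ValueError(f"unknown policy: {policy!r}")
    return micros, tz


def render(micros, tz):
    """Display stored UTC microseconds as wall-clock times in the array's timezone."""
    return [(EPOCH + timedelta(microseconds=m)).astimezone(tz).isoformat() for m in micros]


# Fixed offsets stand in for real zones (assumption: PDT and CEST apply on these dates).
PDT = timezone(timedelta(hours=-7), "PDT")
CEST = timezone(timedelta(hours=2), "CEST")
values = [
    datetime(2022, 10, 21, 22, 46, 17, tzinfo=PDT),
    datetime(2022, 10, 22, 9, 0, 0, tzinfo=CEST),
]

first = to_single_tz_array(values, policy="first")
utc = to_single_tz_array(values, policy="utc")
print(render(*first))  # ['2022-10-21T22:46:17-07:00', '2022-10-22T00:00:00-07:00']
print(render(*utc))    # ['2022-10-22T05:46:17+00:00', '2022-10-22T07:00:00+00:00']
```

Under this model both policies keep the same instants and differ only in how the values are displayed; labelling with UTC avoids showing a CEST value as if it belonged to Los Angeles. Independently of either policy, a column can be normalized to UTC on the pandas side before conversion, for example with `pd.to_datetime(..., utc=True)`.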