[ https://issues.apache.org/jira/browse/ARROW-18298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635819#comment-17635819 ]

Joris Van den Bossche commented on ARROW-18298:
-----------------------------------------------

bq. I thought initially it was just how it was presented, as going back to 
pandas in this example from the table gives the "correct" representation of the 
value:

Yes, in this case that is the cause of the confusion. The dates are not "wrong"
after conversion to Arrow; they are just confusingly printed in UTC without any
indication of this. We have ARROW-14567 to track this issue.
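
To make the UTC point concrete, here is a small stdlib-only sketch (it does not call pyarrow). It assumes a fixed UTC-07:00 offset in place of the full America/Los_Angeles rules, which matches PDT on 2022-10-21, and models the stored value as integer nanoseconds since the UTC epoch, which is how Arrow stores timestamps whatever timezone the type carries.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-07:00 offset stands in for America/Los_Angeles
# (PDT was in effect on 2022-10-21), so no tz database is needed.
PDT = timezone(timedelta(hours=-7), "PDT")

# The wall-clock value from the report, in local time.
local = datetime(2022, 10, 21, 22, 46, 17, tzinfo=PDT)

# The stored value: nanoseconds since the UTC epoch. This integer is the
# same no matter which timezone the column is labeled with.
epoch_ns = int(local.timestamp()) * 1_000_000_000

# Rendering the stored value as UTC gives the wall clock seen in the
# printed table (05:46:17 on the following day).
as_utc = datetime.fromtimestamp(epoch_ns // 1_000_000_000, tz=timezone.utc)

# Converting back into the column's timezone recovers the original value.
as_local = as_utc.astimezone(PDT)

print(local.isoformat())   # 2022-10-21T22:46:17-07:00
print(as_utc.isoformat())  # 2022-10-22T05:46:17+00:00
print(as_local == local)   # True
```

Both renderings describe the same instant; only the display differs, which is why going back to pandas shows the expected local time.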

bq. However, placing mixed timezones makes the behavior more apparent in that 
it is coercing to the first timezone.

That's a separate issue, and one that doesn't come up that often: pandas, for
example, also requires a single timezone per column if you have a datetime64
dtype. But indeed, Arrow's timestamp type requires a single timezone, so when
we encounter multiple ones, we currently coerce to the first one. I think it
would be better to coerce to UTC instead (-> ARROW-5912). There is some
discussion about the use case of actually having multiple timezones in a single
array at ARROW-16540.
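
As a rough sketch of what coercing to UTC would mean, the snippet below normalizes timezone-aware Python datetimes with different offsets to UTC, so that they could share one single-timezone column without changing any instant. `normalize_to_utc` is a hypothetical helper written for illustration, not part of pyarrow, and the two fixed offsets are assumptions standing in for real zones.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def normalize_to_utc(values: list[datetime]) -> list[datetime]:
    """Hypothetical helper: convert tz-aware datetimes in mixed zones to UTC.

    This only illustrates the "coerce to UTC" idea; it is not how pyarrow
    implements the conversion.
    """
    if any(v.tzinfo is None or v.utcoffset() is None for v in values):
        raise ValueError("all values must be timezone-aware")
    # astimezone() keeps the instant and only changes the labeled zone.
    return [v.astimezone(timezone.utc) for v in values]


# Assumed fixed offsets for illustration (PDT and CEST).
pdt = timezone(timedelta(hours=-7))
cest = timezone(timedelta(hours=2))

mixed = [
    datetime(2022, 10, 21, 22, 46, 17, tzinfo=pdt),
    datetime(2022, 10, 22, 9, 0, 0, tzinfo=cest),
]

normalized = normalize_to_utc(mixed)
for value in normalized:
    print(value.isoformat())
# 2022-10-22T05:46:17+00:00
# 2022-10-22T07:00:00+00:00
```

Aware datetimes compare by instant, so each normalized value still equals its original; only the single shared label (UTC) differs, rather than every value being relabeled with whichever zone happened to come first.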



> [Python] datetime shifted when using pyarrow.Table.from_pandas to load a 
> pandas DataFrame containing datetime with timezone
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-18298
>                 URL: https://issues.apache.org/jira/browse/ARROW-18298
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 9.0.0
>         Environment: MacOS M1, Python 3.8.13
>            Reporter: Adam Ling
>            Priority: Major
>
> Problem:
> When using pyarrow.Table.from_pandas to load a pandas DataFrame which 
> contains a timestamp object with timezone information, the created Table 
> object will shift the datetime, while still keeping the timezone information. 
> Please see my script.
>  
> Script to reproduce:
> {code:python}
> import pandas as pd
> import pyarrow
> ts = pd.Timestamp("2022-10-21 22:46:17", tz="America/Los_Angeles")
> df = pd.DataFrame({"TS": [ts]})
> table = pyarrow.Table.from_pandas(df)
> print(df)
> """
>                          TS
> 0 2022-10-21 22:46:17-07:00
> """
> print(table)
> """
> pyarrow.Table
> TS: timestamp[ns, tz=America/Los_Angeles]
> ----
> TS: [[2022-10-22 05:46:17.000000000]]""" {code}
> Expected results:
> The table should not shift the datetime when timezone information is provided.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
