This is an awesome sentiment. Thank you, release orchestrators and
contributors!
Cheers,
Lucas
On Thu, Aug 27, 2020 at 1:26 PM Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:
> Hi,
>
>
>
> I am writing to just thank all those involved in the release process.
>
> Sometimes the work of rele
So it seems that in `pyarrow==0.15.0`, `Table.columns` now returns ChunkedArray
instead of Column. This has broken `Table.cast()`, which just calls
`Table.itercolumns()` and expects the yielded values to have a `.cast()`
method, which ChunkedArray doesn't.
Was `Table.cast()` missed in cleaning up after
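In the meantime, a possible workaround sketch (assuming a pyarrow where table
columns are ChunkedArrays without `.cast()` but `Array.cast()` is available):
cast each column chunk-by-chunk and rebuild the table against the target
schema.

    import pyarrow as pa

    def cast_table(table, target_schema):
        # ChunkedArray.cast() is unavailable here, so cast each chunk of
        # every column individually, then rebuild against the new schema.
        columns = []
        for column, field in zip(table.itercolumns(), target_schema):
            chunks = [chunk.cast(field.type) for chunk in column.chunks]
            columns.append(pa.chunked_array(chunks, type=field.type))
        return pa.Table.from_arrays(columns, schema=target_schema)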
Lucas Pickup created ARROW-1436:
---
Summary: PyArrow Timestamps written to Parquet as INT96 appear in
Spark as 'bigint'
Key: ARROW-1436
URL: https://issues.apache.org/jira/browse/ARROW-1436
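For context, pyarrow's Parquet writer exposes a flag for emitting INT96
timestamps; a sketch of requesting it (the flag exists in
pyarrow.parquet.write_table; the file name is illustrative, and whether Spark
then reads the column correctly is exactly what this issue is about):

    from datetime import datetime
    import pyarrow as pa
    import pyarrow.parquet as pq

    # Request INT96 (rather than INT64) physical storage for timestamps.
    table = pa.table({'ts': [datetime(2017, 8, 30, 10, 47)]})
    pq.write_table(table, 'ts_int96.parquet',
                   use_deprecated_int96_timestamps=True)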
Lucas Pickup created ARROW-1435:
---
Summary: PyArrow not propagating timezone information from Parquet
to Python
Key: ARROW-1435
URL: https://issues.apache.org/jira/browse/ARROW-1435
Project: Apache Arrow
Please reply to: lucas.pic...@microsoft.com
Outlook isn't playing nice.
Apologies, Lucas Pickup
-----Original Message-----
From: Lucas Pickup [mailto:lucas.pic...@microsoft.com.INVALID]
Sent: Wednesday, August 30, 2017 10:47 AM
To: dev@arrow.apache.org
Subject: PyArrow not retaining Pa
__index_level_0__: int64
-- metadata --
pandas: {"pandas_version": "0.20.3",
         "columns": [{"name": "DateNaive", "pandas_type": "datetime",
                      "numpy_type": "datetime64[ns]", "metadata": null},
                     {"name": "DateAware", "pandas_type": "datetimetz",
                      "numpy_type": "datetime64[ns, UTC]",
                      "metadata": {"timezone": "UTC"}}],
         "index_columns": ["__index_level_0__"]}
>>>
>>> pyarrowDF = pyarrowTable.to_pandas()
>>> pyarrowDF
DateNaive DateAware
0 2015-07-05 23:50:00 2015-07-05 23:50:00
>>>
This was on PyArrow 0.6.0.
Cheers, Lucas Pickup
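A minimal pyarrow-only sketch of the round trip described above (this is not
the original pyspark script; the file name is illustrative):

    import pandas as pd
    import pyarrow as pa
    import pyarrow.parquet as pq

    # Round-trip one naive and one UTC-aware timestamp through Parquet,
    # then check whether the timezone survives into the restored frame.
    df = pd.DataFrame({
        'DateNaive': pd.to_datetime(['2015-07-05 23:50:00']),
        'DateAware': pd.to_datetime(['2015-07-05 23:50:00']).tz_localize('UTC'),
    })
    pq.write_table(pa.Table.from_pandas(df), 'dates.parquet')
    print(pq.read_table('dates.parquet').to_pandas().dtypes)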
Here is the pyspark script I used to see this difference.
On Mon, 28 Aug 2017 at 09:20 Lucas Pickup wrote:
> Hi all,
>
> Very sorry if people already responded to this at
> lucas.pic...@microsoft.com. There was an INVALID identifier attached to
> the end of the reply address.
# Rebuild the column with a GMT-aware timestamp type (pyarrow 0.6-era Column API).
newArray = pa.array([val.as_py() for val in chunkedToArray(table[i].data)],
                    pa.timestamp('ns', tz='GMT'))
newColumn = pa.Column.from_array(newField, newArray)
table = table.remove_column(i)
table = table.add_column(i, newColumn)
return table
Cheers, Lucas Pickup
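For readers on current pyarrow, where Column is gone and table columns are
ChunkedArrays, a sketch of the same re-typing (it assumes `Array.cast()` and
`Table.set_column()`, both present in current releases; the chunkedToArray
helper is no longer needed):

    import pyarrow as pa

    def tag_timestamps_as_gmt(table):
        # Re-type naive ns timestamps as GMT-aware by casting each chunk
        # and swapping the column in place.
        for i, field in enumerate(table.schema):
            if field.type == pa.timestamp('ns'):
                target = pa.timestamp('ns', tz='GMT')
                chunks = [c.cast(target) for c in table.column(i).chunks]
                table = table.set_column(i, pa.field(field.name, target),
                                         pa.chunked_array(chunks, type=target))
        return table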
From: Lucas Pick
Date
0 2015-07-06 06:50:00
1 2015-07-06 06:30:00
I would've expected to end up with the same datetime from both readers, since
there was no timezone attached at any point; it's just a date and time value.
Am I missing anything here? Or is this a bug?
Cheers, Lucas Pickup
The corresponding Spark code is here:
https://github.com/apache/spark/blob/cba826d00173a945b0c9a7629c66e36fa73b723e/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L565
I was wondering whether there is a reason why the implementations differ so
significantly when it comes to schema generation?
Cheers, Lucas Pickup