Hi all,
I’ve been playing around with the Vector and Matrix UDTs in pyspark.ml and
I’ve found myself wanting more.
There is a minor issue: with Arrow serialization enabled, these
types don’t serialize properly in Python UDF calls or in toPandas. There’s
a natural representation for the
Great.
If we can upgrade the Parquet dependency from 1.8.2 to 1.8.3 in Apache
Spark 2.3.1, let's upgrade the ORC dependency from 1.4.1 to 1.4.3 together.
Currently, the patch is merged only into the master branch. 1.4.1 has the
following issue.
https://issues.apache.org/jira/browse/SPARK-23340
Best
Seems like this would make sense... we usually make maintenance releases
for bug fixes after a month anyway.
On Wed, Apr 11, 2018 at 12:52 PM, Henry Robinson wrote:
>
>
> On 11 April 2018 at 12:47, Ryan Blue wrote:
>
>> I think a 1.8.3 Parquet release makes sense for the 2.3.x releases of
>> S
On 11 April 2018 at 12:47, Ryan Blue wrote:
> I think a 1.8.3 Parquet release makes sense for the 2.3.x releases of
> Spark.
>
> To be clear though, this only affects Spark when reading data written by
> Impala, right? Or does Parquet CPP also produce data like this?
>
I don't know about parquet
I think a 1.8.3 Parquet release makes sense for the 2.3.x releases of Spark.
To be clear though, this only affects Spark when reading data written by
Impala, right? Or does Parquet CPP also produce data like this?
On Wed, Apr 11, 2018 at 12:35 PM, Henry Robinson wrote:
> Hi all -
>
> SPARK-2385
Hi all -
SPARK-23852 (where a query can silently give wrong results thanks to a
predicate pushdown bug in Parquet) is a fairly bad bug. In other projects
I've been involved with, we've released maintenance releases for bugs of
this severity.
Since Spark 2.4.0 is probably a while away, I wanted to
Hi,
I'm looking into the Parquet format support for the File source in
Structured Streaming.
The docs mention the use of the option 'mergeSchema' to merge the schemas
of the part files found.[1]
What would be the practical use of that in a streaming context?
In its batch counterpart, `mergeSchem