Can we fix this bug in Spark 3.5.0?
https://issues.apache.org/jira/browse/SPARK-44884
On Wed, Aug 30, 2023 at 11:51 AM Sean Owen wrote:
> It looks good except that I'm getting errors running the Spark Connect
> tests at the end (Java 17, Scala 2.13). It looks like I missed something
> necessary t
It looks good except that I'm getting errors running the Spark Connect
tests at the end (Java 17, Scala 2.13). It looks like I missed something
necessary to build; is anyone else getting this?
[ERROR] [Error]
/tmp/spark-3.5.0/connector/connect/server/target/generated-test-sources/protobuf/java/org/apach
Thanks for the detailed explanation.
Regards,
Chetan
On Tue, Aug 29, 2023, 4:50 PM Mich Talebzadeh
wrote:
> OK, let us take a deeper look here
>
> ANALYZE TABLE mytable COMPUTE STATISTICS FOR COLUMNS (c1, c2), c3
>
> In the above, we are explicitly grouping columns c1 and c2 together for
> wh
+1 (non binding)
Tested Spark Connect fully isolated and with the PySpark build. Also tested
some of the new PySpark ML Connect features.
On Tue 29. Aug 2023 at 18:25 Yuanjian Li wrote:
> Please vote on releasing the following candidate (RC3) as Apache Spark
> version 3.5.0.
>
> The vote is open u
Please vote on releasing the following candidate (RC3) as Apache Spark
version 3.5.0.
The vote is open until 11:59pm Pacific time Aug 31st and passes if a
majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
[ ] +1 Release this package as Apache Spark 3.5.0
[ ] -1 Do not release this pac
OK, let us take a deeper look here.
ANALYZE TABLE mytable COMPUTE STATISTICS FOR COLUMNS (c1, c2), c3
In the above, we are explicitly grouping columns c1 and c2 together for which
we want to compute statistics. Additionally, we are also computing
statistics for column c3 independently. This ap
Thank you, Martin! I got it working now using the same shading rules in my
project as in Spark.
From: Martin Grund
Date: Monday, 28. August 2023 at 17:58
To: Stefan Hagedorn
Cc: dev@spark.apache.org
Subject: Re: Spark Connect: API mismatch in SparkSesession#execute
Hi Stefan,
There are some c
Hi,
If we are taking this up, could we also support multi-column statistics,
such as:
ANALYZE TABLE mytable COMPUTE STATISTICS FOR COLUMNS (c1,c2), c3
This should give better cardinality estimates for conditions involving both
c1 and c2.
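To illustrate why grouped statistics could matter, here is a minimal, self-contained sketch in plain Python. It is not Spark code and does not reflect how Spark's optimizer is implemented; the toy table, the helper `ndv`, and the uniform-distribution estimator (rows divided by number of distinct values) are all assumptions made for illustration. It compares the row-count estimate for `c1 = 'Paris' AND c2 = 'FR'` when the two columns are treated as independent against the estimate from a joint distinct-value count over (c1, c2).

```python
# Toy table (hypothetical data): c1 is a city, c2 is its country,
# so c1 fully determines c2 and the two columns are strongly correlated.
rows = [
    ("Paris", "FR"), ("Paris", "FR"), ("Lyon", "FR"), ("Lyon", "FR"),
    ("Berlin", "DE"), ("Berlin", "DE"), ("Munich", "DE"), ("Munich", "DE"),
]

def ndv(values):
    """Number of distinct values (hypothetical helper)."""
    return len(set(values))

n = len(rows)
ndv_c1 = ndv(r[0] for r in rows)   # per-column statistic for c1
ndv_c2 = ndv(r[1] for r in rows)   # per-column statistic for c2
ndv_c1_c2 = ndv(rows)              # joint statistic over (c1, c2)

# Estimated rows matching: c1 = 'Paris' AND c2 = 'FR'
# With per-column stats only, assume independence and multiply selectivities.
est_independent = n / (ndv_c1 * ndv_c2)
# With a joint statistic, use the distinct count of the column pair directly.
est_joint = n / ndv_c1_c2

actual = sum(1 for r in rows if r == ("Paris", "FR"))

print(f"independent estimate: {est_independent}")
print(f"joint estimate:       {est_joint}")
print(f"actual rows:          {actual}")
```

On this toy data the independence assumption underestimates the result by half, while the joint count matches the actual row count. Real data is rarely this clean, so the size of the improvement in practice would depend on how correlated the columns are.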
Thanks.
On Tue, 29 Aug 2023 at 09:05, Mich Talebzadeh
wrote:
> short
Short answer, off the top of my head:
My point was with regard to Cost Based Optimizer (CBO) in traditional
databases. The concept of a rowkey in HBase is somewhat similar to that of
a primary key in RDBMS.
Now in databases with automatic deduplication features (i.e. ignore
duplication of rowkey), inser