@Dongjoon Hyun Thanks! This is a blocking
ticket. It returns a wrong result due to our undefined behavior. I agree we
should revert the newly added map-oriented functions. In the 3.0 release, we
need to define the behavior of duplicate keys in the MAP data type and fix
all the related issues that are
Ah, now I see the problem. `map_filter` has a very weird semantic that is
neither "earlier entry wins" nor "latter entry wins".
I've opened https://github.com/apache/spark/pull/22821 to remove these
newly added map-related functions from FunctionRegistry (for 2.4.0), so that
they are invisible to end users.
Yeah, I can pretty much agree with that. Before we get into release
candidates, it's not as big a deal if something gets labeled as a blocker.
Once we are into an RC, I'd like to see any discussions as to whether
something is or isn't a blocker at least cross-referenced in the RC VOTE
thread so tha
For the first question, that is the `bin/spark-sql` result. I didn't check STS,
but it will return the same result as `bin/spark-sql`.
> I think map_filter is implemented correctly. map(1,2,1,3) is actually
map(1,2) according to the "earlier entry wins" semantic. I don't think this
will change in 2.4.1.
For
Just my two cents from past experience. As the release manager of Spark
2.3.2, I felt the release was significantly delayed by blocker issues. The vote
failed several times because of one or two "blocker issues". I think during the
RC period, each "blocker issue" should be carefully evaluated by the related
PMCs a
> Let's understand statements like "X is not a blocker" to mean "I don't
think that X is a blocker". Interpretations not proclamations, backed up by
reasons, not all of which are appeals to policy and precedent.
It might not be a big deal, and is off topic, but I rather hope people
explicitly avoid
> spark-sql> select map(1,2,1,3); // Spark 2.4.0 RC4
> {1:3}
Are you running in the thrift-server? Then maybe this is caused by the bug
in `Dataset.collect` as I mentioned above.
I think map_filter is implemented correctly. map(1,2,1,3) is actually
map(1,2) according to the "earlier entry wins" semantic.
Thank you for the follow-ups.
Then, will Spark 2.4.1 end up returning `{1:2}`, differently from the
following (including Spark/Scala)?
I hoped to fix the `map_filter`, but now Spark looks inconsistent in many
ways.
scala> sql("select map(1,2,1,3)").show // Spark 2.2.2
+---+
|map(1,
Hi Dongjoon,
Thanks for reporting it! This is indeed a bug that needs to be fixed.
The problem is not about the function `map_filter`, but about how map
type values are created in Spark when there are duplicated keys.
In programming languages like Java/Scala, when creating a map, the later
entry wins
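For comparison, plain Scala (no Spark involved) resolves a duplicated key by
letting the later entry win:

scala> Map(1 -> 2, 1 -> 3)
res0: scala.collection.immutable.Map[Int,Int] = Map(1 -> 3)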
I don't think a separate API or RPCs etc. would be necessary for queryable
state if the state can be exposed as just another data source. Then SQL
queries can be issued against it just like SQL queries against any other
data source.
For now I think the "memory" sink could be used as a s
Hi, All.
-0 due to the following issue. From Spark 2.4.0, users may get an incorrect
result when they use the new `map_filter` with `map_concat` functions.
https://issues.apache.org/jira/browse/SPARK-25823
SPARK-25823 aims only to fix the data correctness issue from
`map_filter`.
PMC members a
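For context, the flagged combination looks roughly like this (a sketch only;
the values are illustrative, and the 2.4.0 RC result is exactly what
SPARK-25823 reports as incorrect, so none is asserted here):

// In a 2.4.0 RC spark-shell: map_concat produces a map whose key 1 is
// duplicated, and map_filter is then applied on top of it.
spark.sql("SELECT map_filter(map_concat(map(1, 2), map(1, 3)), (k, v) -> true)").show(false)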
I don't know; possibly just because it wasn't available when Kryo
was first used in the project.
Skimming the code, the KryoSerializerInstance looks like a wrapper
that provides a Kryo object to do work. It already maintains a 'pool'
of just 1 instance. Is the point that KryoSerializer can sha
Hi,
I am wondering about the implementation of KryoSerializer, specifically the
lack of use of KryoPool, which is recommended by Kryo themselves.
Looking at the code, it seems that KryoSerializer.newInstance is frequently
called, followed by a serialize, and then this instance goes out of scope,
t
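For reference, the pooled pattern the Kryo documentation recommends looks
roughly like this sketch (Kryo 4.x pool API; the configuration inside create()
is a placeholder, not what Spark actually registers):

import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.pool.{KryoFactory, KryoPool}

// Factory that builds a configured Kryo instance on demand.
val factory: KryoFactory = new KryoFactory {
  override def create(): Kryo = {
    val kryo = new Kryo()
    // register application classes here
    kryo
  }
}

// Pool that reuses instances instead of constructing a new Kryo per serialization.
val pool: KryoPool = new KryoPool.Builder(factory).softReferences().build()

val kryo = pool.borrow()
try {
  // ... serialize / deserialize with `kryo` ...
} finally {
  pool.release(kryo)
}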
Shifting this to dev@. See the PR https://github.com/apache/spark/pull/22144
for more context.
There will be no objective, complete definition of "blocker", or even of
"regression" or "correctness issue". Many cases are clear, some are not. We can
draw up more guidelines, and feel free to open PRs against
Any update on this, or has anybody faced a similar issue? Any suggestion will
be appreciated.
Thanks
-Davinder
From: Davinder Kumar
Sent: Wednesday, October 17, 2018 11:01 AM
To: dev
Subject: Hadoop-Token-Across-Kerberized-Cluster
Hello All,
Need one he
Severity: Low
Vendor: The Apache Software Foundation
Versions Affected:
1.3.x release branch and later, including master
Description:
Spark's Apache Maven-based build includes a convenience script, 'build/mvn',
that downloads and runs a zinc server to speed up compilation. This server
will accept