A very simple example is:

    sql("select create_map(1, 'a', 2, 'b')")
      .union(sql("select create_map(2, 'b', 1, 'a')"))
      .distinct

By definition, a map should not care about the order of its entries, so the
above query should return one record. However, it returns two records before
SPARK-19893
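
To make the expectation concrete, here is a minimal sketch in plain Scala (no Spark involved) of the difference between comparing map entries as ordered sequences and comparing them as maps. It only illustrates the semantics described above and is not how Spark stores or compares map values; the object and value names are hypothetical.

```scala
object MapOrderDemo {
  // The same two entries, listed in different orders (hypothetical sample data).
  val first: Seq[(Int, String)] = Seq(1 -> "a", 2 -> "b")
  val second: Seq[(Int, String)] = Seq(2 -> "b", 1 -> "a")

  // Comparing the raw entry sequences is order-sensitive.
  val sameAsSequences: Boolean = first == second

  // Comparing the entries as maps ignores entry order.
  val sameAsMaps: Boolean = first.toMap == second.toMap

  // Deduplication follows the same equality: the ordered sequences stay
  // distinct, while the two maps collapse into a single element.
  val distinctSequences: Int = Seq(first, second).distinct.size
  val distinctMaps: Int = Seq(first.toMap, second.toMap).distinct.size

  def main(args: Array[String]): Unit = {
    println(s"equal as sequences: $sameAsSequences")
    println(s"equal as maps: $sameAsMaps")
    println(s"distinct sequences: $distinctSequences, distinct maps: $distinctMaps")
  }
}
```

An order-sensitive comparison keeps both rows after deduplication, which matches the two-record result described above; an order-insensitive one keeps a single row.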
On Sat,
Thanks!
Yes, this would be an option, of course.
HDFS or Alluxio.
Sincerely,
Michael Shtelma
On Fri, Jan 12, 2018 at 3:26 PM, Georg Heiler wrote:
> You could store the jar in HDFS. Then even in YARN cluster mode your given
> workaround should work.
> Michael Shtelma wrote on Fri, Jan 12, 2018 at
Hi,
It looks like ResolvedDataSourceSuite [1] is a left-over
(after ResolveDataSource?).
If it is not to be deleted, ResolvedDataSourceSuite should surely be renamed.
Correct?
[1]
https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/sources/ResolvedDataSourceSuite.s
Hi dev,
I have a question about how join strategies are defined.
I see that CartesianProductExec is used only for InnerJoin, while for other
kinds of joins BroadcastNestedLoopJoinExec is used.
For reference:
https://github.com/apache/spark/blob/cd9f49a2aed3799964976ead06080a0f7044a0c3/sql/core/src
Hello everyone,
after one month without any reply on Stack Overflow (
https://stackoverflow.com/questions/47789265/inconsistency-in-handling-duplicate-names-in-dataframe-schema)
I am posting the question here.
Context: I am refactoring some code of mine, transforming Scala methods
with a signature
The signatures and licenses look OK. Except for the missing k8s package,
the contents look OK. Tests look pretty good with "-Phive -Phadoop-2.7
-Pyarn" on Ubuntu 17.10, except that KafkaContinuousSourceSuite seems to
hang forever. That was just fixed and needs to get into an RC?
Aside from the Blo
Wan, thanks a lot! I see the issue now.
Do we have any JIRAs open for the future work to be done on this?
--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/