Re: Distinct on Map data type -- SPARK-19893

2018-01-13 Thread Wenchen Fan
A very simple example is sql("select create_map(1, 'a', 2, 'b')").union(sql("select create_map(2, 'b', 1, 'a')")).distinct. By definition, a map should not care about the order of its entries, so the above query should return one record. However, it returns 2 records before SPARK-19893. On Sat,
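
The point about entry order can be illustrated without a Spark session. The following is only a sketch in plain Scala: it models each map value as an ordered sequence of entries (an assumption made for illustration, not a description of Spark's internal representation), and the names MapDistinctSketch, canonical, row1 and row2 are hypothetical. It shows why an order-sensitive comparison yields two records while an order-insensitive one yields one.

```scala
// Hypothetical illustration, no Spark dependency: each "row" holds one map value,
// modeled here as an ordered sequence of (key, value) entries.
object MapDistinctSketch {
  // Canonical form: entries sorted by key, so insertion order no longer matters.
  def canonical(m: Seq[(Int, String)]): Seq[(Int, String)] = m.sortBy(_._1)

  val row1: Seq[(Int, String)] = Seq(1 -> "a", 2 -> "b") // like create_map(1, 'a', 2, 'b')
  val row2: Seq[(Int, String)] = Seq(2 -> "b", 1 -> "a") // like create_map(2, 'b', 1, 'a')

  // Order-sensitive comparison treats the two rows as different values.
  val naiveDistinct: Seq[Seq[(Int, String)]] = Seq(row1, row2).distinct

  // Comparing canonical forms treats them as the same map.
  val orderInsensitiveDistinct: Seq[Seq[(Int, String)]] =
    Seq(row1, row2).map(canonical).distinct

  def main(args: Array[String]): Unit = {
    println(naiveDistinct.size)            // number of records under order-sensitive comparison
    println(orderInsensitiveDistinct.size) // number of records when entry order is ignored
  }
}
```

Sorting by key is just one way to obtain an order-independent comparison; it assumes the key type has an ordering.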

Re: Compiling Spark UDF at runtime

2018-01-13 Thread Michael Shtelma
Thanks! Yes, this would be an option, of course: HDFS or Alluxio. Sincerely, Michael Shtelma On Fri, Jan 12, 2018 at 3:26 PM, Georg Heiler wrote: > You could store the jar in HDFS. Then even in YARN cluster mode your given > workaround should work. > Michael Shtelma schrieb am Fr. 12. Jan. 2018 u

Remove or rename? What does ResolvedDataSourceSuite test?

2018-01-13 Thread Jacek Laskowski
Hi, It looks like ResolvedDataSourceSuite [1] is a left-over (after ResolveDataSource?). If not to be deleted, ResolvedDataSourceSuite should surely be renamed. Correct? [1] https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/sources/ResolvedDataSourceSuite.s

Join Strategies

2018-01-13 Thread Marco Gaido
Hi dev, I have a question about how join strategies are defined. I see that CartesianProductExec is used only for InnerJoin, while for other kinds of joins BroadcastNestedLoopJoinExec is used. For reference: https://github.com/apache/spark/blob/cd9f49a2aed3799964976ead06080a0f7044a0c3/sql/core/src

transformSchema method policy for "duplicated" column names

2018-01-13 Thread Alessandro Solimando
Hello everyone, after one month without any reply on Stack Overflow (https://stackoverflow.com/questions/47789265/inconsistency-in-handling-duplicate-names-in-dataframe-schema), I am posing the question here. Context: I am refactoring some code of mine, transforming Scala methods with a signature

Re: [VOTE] Spark 2.3.0 (RC1)

2018-01-13 Thread Sean Owen
The signatures and licenses look OK. Except for the missing k8s package, the contents look OK. Tests look pretty good with "-Phive -Phadoop-2.7 -Pyarn" on Ubuntu 17.10, except that KafkaContinuousSourceSuite seems to hang forever. That was just fixed and needs to get into an RC? Aside from the Blo

Re: Distinct on Map data type -- SPARK-19893

2018-01-13 Thread ckhari4u
Wan, thanks a lot! I see the issue now. Do we have any JIRAs open for the future work to be done on this?