-Phadoop-provided still includes hadoop jars

2020-10-12 Thread Kimahriman
When I try to build a distribution with either -Phive or -Phadoop-cloud along with -Phadoop-provided, I still end up with Hadoop jars in the distribution. Specifically, with -Phive and -Phadoop-provided, hadoop-annotations, hadoop-auth, and hadoop-common are included in the Spark jars,
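
One way to confirm which Hadoop jars ended up in a built distribution is to list them directly. The following is only a minimal sketch: the helper name `find_hadoop_jars` is hypothetical, and it assumes the distribution keeps its jars in a `jars/` subdirectory and that Hadoop artifacts are named `hadoop-*.jar`.

```python
from pathlib import Path


def find_hadoop_jars(dist_dir):
    """Return the sorted file names of hadoop-*.jar under <dist_dir>/jars.

    Assumption: the distribution stores its jars in a "jars" subdirectory.
    """
    jars_dir = Path(dist_dir) / "jars"
    # Match only files whose names start with "hadoop-" and end in ".jar".
    return sorted(p.name for p in jars_dir.glob("hadoop-*.jar"))
```

Calling `find_hadoop_jars("dist")` (where `dist` is a placeholder path to the built distribution) returns an empty list when no jar in `jars/` matches that name pattern.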

sortWithinPartitions in Structured Streaming

2020-04-08 Thread Kimahriman
Currently, all sorting is disallowed in Structured Streaming queries. Disallowing global sorting makes sense, since you can't sort an infinite list, but could non-global sorting (i.e. sortWithinPartitions) be allowed? I'm running into this with an external source I'm using, but not sure if this wou
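
To illustrate the distinction being drawn, here is a plain-Python model of a per-partition sort. It is not Spark code: the function `sort_within_partitions` and the list-of-lists representation of a batch are assumptions made for illustration only. The point is that each partition is a finite list, so sorting it never requires seeing the rest of the stream.

```python
def sort_within_partitions(partitions, key=None):
    """Sort each partition on its own; nothing is ordered across partitions."""
    return [sorted(partition, key=key) for partition in partitions]


# One hypothetical batch split into two partitions.
batch = [[5, 4], [1, 0]]
result = sort_within_partitions(batch)
# Each inner list is now sorted, but the data as a whole is not globally ordered,
# because the partitions are never merged or compared with each other.
```

A global sort, by contrast, would have to compare rows across every partition and every future batch, which is the part that cannot work on unbounded input.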