We really need some documentation defining what non-deterministic means.
AFAIK, non-deterministic expressions may produce a different result for the
same input row if the previously processed input rows differ.
The optimizer tries its best not to change the input sequence
of non-deterministic expressions.
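As a minimal sketch (the UDF body here is assumed for illustration), such an
expression can be created in Scala by flagging a UDF with asNondeterministic,
which tells the optimizer not to duplicate or reorder its evaluation:

import org.apache.spark.sql.functions.udf

// The result depends on randomness, not only on the input row,
// so the UDF is flagged as non-deterministic.
val noisy = udf((x: Int) => x + scala.util.Random.nextInt(10)).asNondeterministic()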
+1
On Thu, Nov 7, 2019 at 6:54 PM Shane Knapp wrote:
> +1
>
> On Thu, Nov 7, 2019 at 6:08 PM Hyukjin Kwon wrote:
> >
> > +1
> >
> > On Wed, Nov 6, 2019 at 11:38 PM, Wenchen Fan wrote:
> >>
> >> Sounds reasonable to me. We should make the behavior consistent within
> Spark.
> >>
> >> On Tue, Nov 5, 2019 at 6:29 AM Bryan Cutler wrote:
+1
On Thu, Nov 7, 2019 at 6:08 PM Hyukjin Kwon wrote:
>
> +1
>
> On Wed, Nov 6, 2019 at 11:38 PM, Wenchen Fan wrote:
>>
>> Sounds reasonable to me. We should make the behavior consistent within Spark.
>>
>> On Tue, Nov 5, 2019 at 6:29 AM Bryan Cutler wrote:
>>>
> >>> Currently, when a PySpark Row is created with keyword arguments, the fields are sorted alphabetically.
+1
On Wed, Nov 6, 2019 at 11:38 PM, Wenchen Fan wrote:
> Sounds reasonable to me. We should make the behavior consistent within
> Spark.
>
> On Tue, Nov 5, 2019 at 6:29 AM Bryan Cutler wrote:
>
>> Currently, when a PySpark Row is created with keyword arguments, the
>> fields are sorted alphabetically.
Hi all,
To enable wide-scale community testing of the upcoming Spark 3.0 release,
the Apache Spark community has posted a preview release of Spark 3.0. This
preview is *not a stable release in terms of either API or functionality*,
but it is meant to give the community early access to try the code that will
become Spark 3.0.
That was very interesting, thanks Enrico.
Sean, IIRC it also prevents push-down of the UDF in Catalyst in some cases.
Regards,
Ruben
> On 7 Nov 2019, at 11:09, Sean Owen wrote:
>
> Interesting, what does non-deterministic do except have this effect?
> aside from the naming, it could be a fine use of this flag if that's all it
> effectively does.
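As a minimal sketch of the push-down point Ruben mentions (a DataFrame df with
an integer column "id" is assumed), a filter over the output of a
non-deterministic UDF is not pushed below the projection that computes it:

import org.apache.spark.sql.functions.{col, udf}

val double = udf((x: Int) => x * 2)

// Deterministic: Catalyst may substitute the UDF into the filter
// and push the predicate below the projection.
df.select(double(col("id")).as("y")).filter(col("y") > 0).explain()

// Non-deterministic: the filter stays above the projection.
df.select(double.asNondeterministic()(col("id")).as("y"))
  .filter(col("y") > 0)
  .explain()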
Interesting, what does non-deterministic do except have this effect?
aside from the naming, it could be a fine use of this flag if that's
all it effectively does. I'm not sure I'd introduce another flag with
the same semantics just over naming. If anything 'expensive' also
isn't the right word, more like 'do not evaluate more than once'.
Hi all,
Running expensive deterministic UDFs that return complex types, followed
by multiple references to those results, causes Spark to evaluate the UDF
multiple times per row. This has been reported and discussed before:
SPARK-18748 and SPARK-17728.
// hypothetical example: an expensive function returning a complex type
val f: Int => Array[Int] = x => Array(x, x * x)
val udfF = udf(f)
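A minimal sketch of the behavior (an active SparkSession spark is assumed, and
the UDF body is illustrative only): two references to the UDF result make the
collapsed plan evaluate the UDF twice per row, while the non-deterministic
variant is evaluated once and its result reused:

import org.apache.spark.sql.functions.{col, udf}

val data = spark.range(5).toDF("id")

// hypothetical expensive function returning a complex type
val expensive = udf((x: Long) => Array(x, x * x))

// Two references to "arr": after the projections are collapsed,
// the UDF appears (and runs) twice per row in the physical plan.
data.select(expensive(col("id")).as("arr"))
  .select(col("arr")(0), col("arr")(1))
  .explain()

// Marking the UDF non-deterministic keeps the projections from being
// collapsed, so the UDF runs only once per row.
val evalOnce = expensive.asNondeterministic()
data.select(evalOnce(col("id")).as("arr"))
  .select(col("arr")(0), col("arr")(1))
  .explain()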