Re: Inconsistent schema on Encoders.bean (reported issues from user@)

2020-06-26 Thread Jungtaek Lim
I just revisited the issue and realized it is resolved in Spark 3.0.0. ExpressionEncoder was refactored in Spark 3.0.0 and the schema was removed as part of the refactor; the schema seems to have been the root cause, as the schema and the data types of the serializer don't match in such a case. ExpressionEncoder in

Re: Inconsistent schema on Encoders.bean (reported issues from user@)

2020-05-25 Thread Jungtaek Lim
I meant that how Spark interprets Java Beans is not consistently defined. Contrary to what you guessed, in most paths Spark uses "read-only" properties. (All the failed existing tests in my experiment have "read-only" properties.) The problematic case is when a Java bean is used for read-write; one case i

Re: Inconsistent schema on Encoders.bean (reported issues from user@)

2020-05-24 Thread Sean Owen
Java Beans are well-defined; it's valid to have a getter- or setter-only property. That doesn't mean Spark can meaningfully use such a property, as it typically has to both read and write them. I guess it depends on context. For example, I don't see how you can have a deserializer without setters,
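To make the point about getter-only and setter-only properties concrete, here is a minimal sketch using only the JDK's java.beans.Introspector. It does not involve Spark at all; the Sample bean and the describe helper are hypothetical names made up for this illustration.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

public class BeanPropertyDemo {
    // Hypothetical bean with three kinds of properties.
    public static class Sample {
        private int id;
        private String name;

        public int getId() { return id; }                       // read-write: getter
        public void setId(int id) { this.id = id; }             // read-write: setter
        public String getLabel() { return "sample-" + id; }     // getter-only property
        public void setName(String name) { this.name = name; }  // setter-only property
    }

    // Lists each bean property with whether it has a read and/or write method.
    public static List<String> describe(Class<?> beanClass) throws IntrospectionException {
        // Stop at Object.class so the implicit "class" property is excluded.
        BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
        List<String> out = new ArrayList<>();
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            out.add(pd.getName()
                + " read=" + (pd.getReadMethod() != null)
                + " write=" + (pd.getWriteMethod() != null));
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        describe(Sample.class).forEach(System.out::println);
    }
}
```

All three properties are reported by the introspector, which is why a consumer has to decide for itself whether a property without a getter or without a setter is usable.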

Re: Inconsistent schema on Encoders.bean (reported issues from user@)

2020-05-24 Thread Jungtaek Lim
OK, I just tried the change, and it breaks a bunch of existing UTs. https://github.com/apache/spark/pull/28611 Note that I modified all the places where Spark extracts columns from properties having only a "read method" so that they require both "read" & "write". It doesn't only change the code path of Encode
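The difference between the two selection rules mentioned here can be sketched outside Spark. The following is a simplified illustration with plain JDK introspection, not Spark's actual code; the Event bean and both helper methods are hypothetical.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BeanColumnSelection {
    // Hypothetical bean: "ts" is read-write, "summary" is derived and has no setter.
    public static class Event {
        private long ts;

        public long getTs() { return ts; }
        public void setTs(long ts) { this.ts = ts; }
        public String getSummary() { return "ts=" + ts; }
    }

    // Rule A: every property with a read method becomes a column.
    public static List<String> readableProps(Class<?> c) throws IntrospectionException {
        List<String> names = new ArrayList<>();
        for (PropertyDescriptor pd : Introspector.getBeanInfo(c, Object.class).getPropertyDescriptors()) {
            if (pd.getReadMethod() != null) {
                names.add(pd.getName());
            }
        }
        Collections.sort(names); // sorted for a deterministic result
        return names;
    }

    // Rule B: only properties with both a read and a write method become columns.
    public static List<String> readWriteProps(Class<?> c) throws IntrospectionException {
        List<String> names = new ArrayList<>();
        for (PropertyDescriptor pd : Introspector.getBeanInfo(c, Object.class).getPropertyDescriptors()) {
            if (pd.getReadMethod() != null && pd.getWriteMethod() != null) {
                names.add(pd.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("read only rule:    " + readableProps(Event.class));
        System.out.println("read + write rule: " + readWriteProps(Event.class));
    }
}
```

For this bean the two rules yield different column sets (summary and ts versus ts alone), which is the kind of mismatch that arises when one code path applies one rule and another path applies the other.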

Re: Inconsistent schema on Encoders.bean (reported issues from user@)

2020-05-10 Thread Jungtaek Lim
The first case is not tied to batch / streaming, as Encoders.bean simply fails when inferring the schema. The second case is tied to streaming, and I've described the reason in the last reply. I'm not sure we don't have a similar case for batch, though. (If there're some operators only relying on the sequ

Re: Inconsistent schema on Encoders.bean (reported issues from user@)

2020-05-10 Thread Wenchen Fan
Is it a problem only for streaming, or does it affect batch queries as well? On Fri, May 8, 2020 at 11:42 PM Jungtaek Lim wrote: > The first case of user report is obvious - according to the user report, > AVRO generated code contains getter which denotes to itself hence Spark > disallows (throws exc

Re: Inconsistent schema on Encoders.bean (reported issues from user@)

2020-05-08 Thread Jungtaek Lim
The first case of user report is obvious - according to the user report, AVRO generated code contains getter which denotes to itself hence Spark disallows (throws exception), but it doesn't have matching setter method (if I understand correctly) so technically it shouldn't matter. For the second c

Re: Inconsistent schema on Encoders.bean (reported issues from user@)

2020-05-08 Thread Wenchen Fan
Can you give some simple examples to demonstrate the problem? I think the inconsistency would bring problems but don't know how. On Fri, May 8, 2020 at 3:49 PM Jungtaek Lim wrote: > (bump to expose the discussion to more readers) > > On Mon, May 4, 2020 at 4:57 PM Jungtaek Lim > wrote: > >> Hi

Re: Inconsistent schema on Encoders.bean (reported issues from user@)

2020-05-08 Thread Jungtaek Lim
(bump to expose the discussion to more readers) On Mon, May 4, 2020 at 4:57 PM Jungtaek Lim wrote: > Hi devs, > > There're couple of issues being reported on the user@ mailing list which > results in being affected by inconsistent schema on Encoders.bean. > > 1. Typed datataset from Avro generat

Inconsistent schema on Encoders.bean (reported issues from user@)

2020-05-04 Thread Jungtaek Lim
Hi devs, There are a couple of issues reported on the user@ mailing list which turn out to be affected by the inconsistent schema on Encoders.bean. 1. Typed datataset from Avro generated classes? [1] 2. spark structured streaming GroupState returns weird values from sate [2] Below is a part of