(bump to expose the discussion to more readers)

On Mon, May 4, 2020 at 4:57 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
wrote:

> Hi devs,
>
> A couple of issues reported on the user@ mailing list turn out to be
> caused by an inconsistent schema in Encoders.bean.
>
> 1. Typed datataset from Avro generated classes? [1]
> 2. spark structured streaming GroupState returns weird values from sate [2]
>
> Below is a part of JavaTypeInference.inferDataType() which handles beans:
>
>
> https://github.com/apache/spark/blob/f72220b8ab256e8e6532205a4ce51d50b69c26e9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala#L139-L157
>
> It collects properties based on the availability of a getter.
>
> (The same rule is applied in `SQLContext.beansToRows` as well.)
>
> JavaTypeInference.serializerFor() and JavaTypeInference.deserializerFor()
> do not follow that rule. They collect properties based on the availability
> of both a getter and a setter.
> (They also call JavaTypeInference.inferDataType() internally, so the
> inconsistency arises even when only these methods are called.)
>
> This inconsistency produces runtime issues when a Java bean has only a
> getter for some property, even when no backing field exists for that
> getter, since getter/setter methods are identified purely by naming
> convention.
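
To make the mismatch concrete, here is a minimal sketch that uses plain
java.beans.Introspector rather than Spark itself. The Person bean and the two
helper methods are hypothetical names chosen for illustration. The filters
only mirror the "getter only" and "getter and setter" rules described above,
and are not copied from JavaTypeInference.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BeanPropertyMismatch {

    // Hypothetical bean: "name" has a getter and a setter, while
    // "displayName" has only a getter and no backing field at all.
    public static class Person {
        private String name;

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }

        // Recognized as a bean property purely by naming convention.
        public String getDisplayName() { return "Person(" + name + ")"; }
    }

    // Properties selected when only a getter is required
    // (the rule described for inferDataType()).
    public static List<String> readable(Class<?> cls) {
        return select(cls, false);
    }

    // Properties selected when both a getter and a setter are required
    // (the rule described for serializerFor()/deserializerFor()).
    public static List<String> readableAndWritable(Class<?> cls) {
        return select(cls, true);
    }

    private static List<String> select(Class<?> cls, boolean requireSetter) {
        List<String> names = new ArrayList<>();
        try {
            for (PropertyDescriptor p
                    : Introspector.getBeanInfo(cls).getPropertyDescriptors()) {
                // Skip the "class" property inherited from Object.getClass().
                if (p.getName().equals("class")) continue;
                if (p.getReadMethod() == null) continue;
                if (requireSetter && p.getWriteMethod() == null) continue;
                names.add(p.getName());
            }
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
        Collections.sort(names); // stable order for comparison
        return names;
    }

    public static void main(String[] args) {
        System.out.println(readable(Person.class));            // [displayName, name]
        System.out.println(readableAndWritable(Person.class)); // [name]
    }
}
```

The two lists differ by "displayName", which is the kind of property that
would appear in the schema under one rule but be missing under the other.
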
>
> I feel this is something we should fix, but I would like to hear opinions
> on how to fix it. If a user query uses such problematic beans but hasn't
> hit the issue yet, fixing it would drop some columns, which would be
> backward incompatible. I think this is still the way to go, but if we are
> more concerned about not breaking existing queries, we may want to at
> least document the ideal form of the bean Spark expects.
>
> I'd like to hear opinions on this.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> 1.
> https://lists.apache.org/thread.html/r8f8e680e02955cdf05b4dd34c60a9868288fd10a03f1b1b8627f3d84%40%3Cuser.spark.apache.org%3E
> 2.
> http://mail-archives.apache.org/mod_mbox/spark-user/202003.mbox/%3ccafx8l21dzbyv5m1qozs3y+pcmycwbtjko6ytwvkydztq7u4...@mail.gmail.com%3e
>
