(bump to expose the discussion to more readers)

On Mon, May 4, 2020 at 4:57 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
> Hi devs,
>
> There are a couple of issues reported on the user@ mailing list that
> stem from an inconsistent schema in Encoders.bean.
>
> 1. "Typed datataset from Avro generated classes?" [1]
> 2. "spark structured streaming GroupState returns weird values from
>    sate" [2]
>
> Below is the part of JavaTypeInference.inferDataType() that handles
> beans:
>
> https://github.com/apache/spark/blob/f72220b8ab256e8e6532205a4ce51d50b69c26e9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala#L139-L157
>
> It collects properties based on the availability of a getter.
>
> (The same applies to `SQLContext.beansToRows`.)
>
> JavaTypeInference.serializerFor() and
> JavaTypeInference.deserializerFor() do not work that way. They collect
> properties based on the availability of both a getter and a setter.
> (They call JavaTypeInference.inferDataType() internally, so the
> inconsistency shows up even when only these methods are called.)
>
> This inconsistency produces runtime issues when a Java bean has only a
> getter for some property, even when there is no field behind that
> getter, because getter/setter methods are determined by naming
> convention alone.
>
> I feel this is something we should fix, but I would like to see
> opinions on how to fix it. If a user query uses such problematic beans
> but has not hit the issue so far, fixing it would drop some columns,
> which would be backward incompatible. I think this is still the way to
> go, but if we are more concerned about not breaking existing queries,
> we may want to at least document the ideal form of the bean Spark
> expects.
>
> Would like to hear opinions on this.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> 1. https://lists.apache.org/thread.html/r8f8e680e02955cdf05b4dd34c60a9868288fd10a03f1b1b8627f3d84%40%3Cuser.spark.apache.org%3E
> 2. http://mail-archives.apache.org/mod_mbox/spark-user/202003.mbox/%3ccafx8l21dzbyv5m1qozs3y+pcmycwbtjko6ytwvkydztq7u4...@mail.gmail.com%3e
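
For readers new to the thread, here is a minimal sketch of the mismatch described above. It uses only the JDK's java.beans.Introspector and does not call Spark; the Person bean, the class name BeanPropertyDemo, and both helper methods are hypothetical names made up for illustration. The assumption is that the two filters roughly mirror the "getter only" rule and the "getter and setter" rule from the quoted message, not that this is Spark's actual code.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BeanPropertyDemo {

    // Hypothetical bean: "name" has a getter and a setter, while
    // "displayName" has only a getter and no backing field at all.
    public static class Person {
        private String name;

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }

        // Treated as a bean property purely by naming convention.
        public String getDisplayName() { return "Person(" + name + ")"; }
    }

    // Properties that have a getter (the "getter only" rule).
    static List<String> readableProperties(Class<?> beanClass) {
        return collect(beanClass, false);
    }

    // Properties that have both a getter and a setter.
    static List<String> readableAndWritableProperties(Class<?> beanClass) {
        return collect(beanClass, true);
    }

    private static List<String> collect(Class<?> beanClass, boolean requireSetter) {
        List<String> names = new ArrayList<>();
        try {
            for (PropertyDescriptor pd
                    : Introspector.getBeanInfo(beanClass).getPropertyDescriptors()) {
                // Skip the "class" property contributed by Object.getClass().
                if ("class".equals(pd.getName())) continue;
                if (pd.getReadMethod() == null) continue;
                if (requireSetter && pd.getWriteMethod() == null) continue;
                names.add(pd.getName());
            }
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
        Collections.sort(names); // make the order deterministic
        return names;
    }

    public static void main(String[] args) {
        System.out.println("getter only:       " + readableProperties(Person.class));
        System.out.println("getter and setter: " + readableAndWritableProperties(Person.class));
    }
}
```

In this sketch displayName shows up in the first list but not in the second, which is the kind of disagreement between the inferred schema and the serializer/deserializer that the quoted message describes.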