Hi Dawid,

Thanks for your reply. Good to know it is for historic and compatibility reasons.
The reason I started looking into the POJO rules is to understand whether a Scala case class can conform to them and thereby support schema evolution. In our case, we store several Scala case classes in the RocksDB state backend, and those classes evolve over time, mostly by new fields being added. Each time a class changes, we have to start with fresh state because the Flink job cannot restore from the previous savepoint. It looks like the classes fall back to Kryo, and Kryo doesn't support schema evolution. We'd like to support schema evolution so that the job can start from a savepoint.

I have a few questions about this. In v1.11 or a later version:

(1) What is the best way to support schema evolution for a Scala case class if it cannot follow the POJO rules? Should I implement a custom serializer (extending TypeSerializer) with its own serializer snapshot for case classes?
(2) What is the purpose of ScalaCaseClassSerializer versus CaseClassSerializer? Why are there two serializers for case classes?
(3) Why doesn't the case class serializer have a matching serializer snapshot? Which serializer does a Scala case class fall back to? Is it Kryo?

I'd appreciate it if you could answer my questions.

Regards,
Naehee

On Thu, Jul 8, 2021 at 3:25 AM Dawid Wysakowicz <dwysakow...@apache.org> wrote:

> Hi Naehee,
>
> The short answer would be: for historic and compatibility reasons. It
> was implemented that way back in the day, and we don't want to change
> the default type extraction logic. Otherwise, user jobs that rely on
> the default type extraction logic for storing state would end up with
> state stored in a way that is incompatible with the updated serializer.
>
> This is not a problem for Table/SQL programs, as we control the state
> internally, and that's why we were able to change the requirements for
> POJOs in Table/SQL programs.
> [1]
>
> Best,
>
> Dawid
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/types/#user-defined-data-types
>
> On 08/07/2021 00:09, Naehee Kim wrote:
>
> According to the Flink doc, Flink recognizes a data type as a POJO type
> (and allows "by-name" field referencing) if the following conditions
> are fulfilled:
>
> - The class is public and standalone (no non-static inner class).
> - The class has a public no-argument constructor.
> - All non-static, non-transient fields in the class (and all
>   superclasses) are either public (and non-final) or have a public
>   getter- and setter- method that follows the Java Beans naming
>   conventions for getters and setters.
>
> PojoSerializer uses Java reflection to access an object's fields. One
> of PojoSerializer's constructors calls setAccessible(true) for all
> fields:
>
>     for (int i = 0; i < numFields; i++) {
>         this.fields[i].setAccessible(true);
>     }
>
> Then, to my knowledge, it can set a field regardless of the field's
> access modifier (private, public, ...).
>
> However, another of its constructors, the one called by
> PojoSerializerSnapshot, does not call setAccessible(true). Does anyone
> know why setAccessible(true) is not called there? And why must fields
> be public or have a public getter- and setter- method?
>
> Regards,
> Naehee
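As a concrete illustration of the three POJO conditions quoted above, here is a minimal sketch of a class shape that satisfies them. The class name and fields are made up for illustration; only the structural requirements come from the Flink documentation.

```java
// Illustrative sketch of a Flink-POJO-shaped class: public and
// standalone, public no-argument constructor, and all non-static,
// non-transient fields public and non-final. The name and fields
// are hypothetical.
public class UserEvent {
    // Public, non-final fields are directly accessible to a
    // reflection-based serializer without getters/setters.
    public String userId;
    public long timestamp;

    // A field added in a later version of the class; POJO-style
    // schema evolution copes with fields added or removed by name.
    public String country;

    // Required: public no-argument constructor.
    public UserEvent() {}
}
```

A Scala case class, by contrast, typically has final constructor-bound vals and no no-argument constructor, which is why it does not meet these conditions out of the box.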
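For readers following the setAccessible discussion in the quoted mail, here is a minimal self-contained sketch, using plain JDK reflection with no Flink involved, of why a serializer that writes fields reflectively needs setAccessible(true) for non-public fields. The Holder and AccessDemo classes are made up for illustration.

```java
import java.lang.reflect.Field;

// Hypothetical class with a private field, standing in for a user type
// whose fields a reflection-based serializer wants to write. Top-level
// (not nested) so AccessDemo is not a nestmate and access checks apply.
class Holder {
    private int value = 1;
}

public class AccessDemo {
    public static void main(String[] args) throws Exception {
        Holder h = new Holder();
        Field f = Holder.class.getDeclaredField("value");

        // Without setAccessible(true), writing a private field of
        // another class fails the reflective access check.
        try {
            f.setInt(h, 2);
        } catch (IllegalAccessException e) {
            System.out.println("blocked: " + e.getClass().getSimpleName());
        }

        // After setAccessible(true) the same write succeeds, which is
        // what a reflection-based serializer relies on for non-public
        // fields.
        f.setAccessible(true);
        f.setInt(h, 2);
        System.out.println("value=" + f.getInt(h));
    }
}
```

This only demonstrates the JDK mechanism; it does not answer why the snapshot-side constructor skips the call, which remains the open question above.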