Hi Team,

During the root cause analysis of an Iceberg serialization issue [1], we
found that *DataOutputSerializer.writeUTF* has a hard limit on the length
of the string: the encoded form may not exceed 65535 bytes (64k), because
the length is written as a 2-byte prefix. This is inherited from the
*DataOutput.writeUTF* method, where the JDK explicitly defines this
limit [2].

For our use case we need to be able to serialize longer UTF strings, so we
will need to define a *writeLongUTF* method with a specification similar to
that of *writeUTF*, but without the length limit.
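
To make the idea concrete, here is a minimal sketch of what such a method
could look like. It is not an existing Flink or Iceberg API: the class
*LongUtfSketch* and the helpers *readLongUTF*, *roundTrip*,
*plainWriteUtfRejects* and *repeat* are hypothetical names, and the wire
format (a 4-byte length prefix followed by standard UTF-8 bytes) is an
assumption. In particular it is not byte-compatible with *writeUTF*, which
uses a 2-byte prefix and the JDK's modified UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class LongUtfSketch {

    // Hypothetical writeLongUTF: 4-byte length prefix, then standard UTF-8 bytes.
    // Assumption: this differs from writeUTF (2-byte prefix, modified UTF-8).
    static void writeLongUTF(DataOutput out, String value) throws IOException {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Hypothetical counterpart that reads what writeLongUTF wrote.
    static String readLongUTF(DataInput in) throws IOException {
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("Negative string length: " + length);
        }
        byte[] bytes = new byte[length];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Serializes and deserializes a string in memory.
    static String roundTrip(String value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            writeLongUTF(new DataOutputStream(buffer), value);
            byte[] written = buffer.toByteArray();
            return readLongUTF(new DataInputStream(new ByteArrayInputStream(written)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true if the JDK's own writeUTF refuses the string as too long.
    static boolean plainWriteUtfRejects(String value) {
        try {
            new DataOutputStream(new ByteArrayOutputStream()).writeUTF(value);
            return false;
        } catch (UTFDataFormatException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a string of 'count' copies of 'c' (avoids String.repeat, Java 11+).
    static String repeat(char c, int count) {
        char[] chars = new char[count];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        String big = repeat('a', 100_000);
        System.out.println("writeUTF rejects it: " + plainWriteUtfRejects(big));
        System.out.println("round trip intact:   " + roundTrip(big).equals(big));
    }
}
```

The sketch is written against the plain *DataOutput* / *DataInput*
interfaces, so it would work with any implementation of them. Whether the
real method should keep the modified UTF-8 encoding of *writeUTF* instead
is an open design choice.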

My questions are:
- Would this be useful for Flink users in general? Should we add this
method to *DataOutputSerializer*?
- Or is it specific to Iceberg, so that we should keep it in the Iceberg
connector code?

Thanks,
Peter

[1] - https://github.com/apache/iceberg/issues/9410
[2] - https://docs.oracle.com/javase/8/docs/api/java/io/DataOutput.html#writeUTF-java.lang.String-
