This is a good idea, even though it has some issues that we should probably document and warn users about:

1. Java-based serialization is really practical for JVM-based systems, but we should add a warning or documentation, because Java serialization is not deterministic between JVMs, so this could be a source of issues. (Companies usually run the same version of the JVM, so this is less critical, but it can still happen, especially now with all the different versions of Java and OpenJDK-based flavors.)
2. This is not cross-language compatible; the String-based representation (or even an Avro-based representation of Schema) can be used in every language.

Even with these issues, I think it is worth making Schema Serializable just for ease of use.

Is the plan to fully serialize it, or just to convert it to a String and serialize the String, as done in the issue Doug mentioned? If we take the first approach, we need to test properly that (de)serialization works correctly with a Schema that has elements of the full specification. Does anyone know if we already have a test schema that covers the full ‘schema’ specification, so that we could reuse it?

On Mon, Jul 15, 2019 at 11:46 PM Driesprong, Fokko <fo...@driesprong.frl> wrote:
>
> Correct me if I'm wrong here. But as far as I understood the way of
> serializing the schema is using Avro, as it is part of the file. To avoid
> confusion there should be one way of serializing.
>
> However, I'm not sure if this is worth the hassle of not simply
> implementing serializable. Also Flink there is a rather far from optimal
> implementation:
> https://github.com/apache/flink/blob/master/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/ParquetAvroWriters.java#L72
> This converts it to JSON and back while distributing the schema to the
> executors.
>
> Cheers, Fokko
>
> On Mon, Jul 15, 2019 at 23:03, Doug Cutting <cutt...@gmail.com> wrote:
>
> > I can't think of a reason Schema should not implement Serializable.
> >
> > There's actually already an issue & patch for this:
> >
> > https://issues.apache.org/jira/browse/AVRO-1852
> >
> > Doug
> >
> > On Mon, Jul 15, 2019 at 6:49 AM Ismaël Mejía <ieme...@gmail.com> wrote:
> >
> > > +d...@avro.apache.org
> > >
> > > On Mon, Jul 15, 2019 at 3:30 PM Ryan Skraba <r...@skraba.com> wrote:
> > > >
> > > > Hello!
> > > >
> > > > I'm looking for any discussion or reference why the Schema object
> > > > isn't serializable -- I'm pretty sure this must have already been
> > > > discussed (but the keywords +avro +serializable +schema have MANY
> > > > results in all the searches I did: JIRA, stack overflow, mailing
> > > > list, web)
> > > >
> > > > In particular, I was at a demo today where we were asked why Schemas
> > > > needed to be passed as strings to run in distributed tasks. I
> > > > remember running into this problem years ago with MapReduce, and
> > > > again in Spark, and again in Beam...
> > > >
> > > > Is there any downside to making a Schema implement
> > > > java.lang.Serializable? The only thing I can think of is that the
> > > > schema _should not_ be serialized with the data, and making it
> > > > non-serializable loosely enforces this (at the cost of continually
> > > > writing different flavours of "Avro holders" for when you really do
> > > > want to serialize it).
> > > >
> > > > Willing to create a JIRA and work on the implementation, of course!
> > > >
> > > > All my best, Ryan
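
For reference, here is a rough sketch of the "make it a String and serialize the String" option discussed in this thread, which is also the kind of "Avro holder" Ryan describes. It is only an illustration under stated assumptions: FakeSchema is a hypothetical stand-in for a non-Serializable schema class (it is not Avro's Schema, and the actual patch may look quite different), and SchemaHolder, SchemaHolderSketch, and roundTrip are made-up names. The holder keeps the schema in a transient field and writes only its String form through the standard writeObject/readObject hooks.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SchemaHolderSketch {

    /** Hypothetical stand-in for a schema type that is NOT Serializable. */
    static final class FakeSchema {
        private final String json;

        private FakeSchema(String json) {
            this.json = json;
        }

        /** Assumed factory: rebuilds a schema from its String form. */
        static FakeSchema parse(String json) {
            return new FakeSchema(json);
        }

        @Override
        public String toString() {
            return json;
        }
    }

    /** Serializable wrapper that ships the schema as a String. */
    static final class SchemaHolder implements Serializable {
        private static final long serialVersionUID = 1L;

        // transient: default Java serialization skips this field.
        private transient FakeSchema schema;

        SchemaHolder(FakeSchema schema) {
            this.schema = schema;
        }

        FakeSchema get() {
            return schema;
        }

        // Write only the String representation of the schema.
        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            out.writeObject(schema.toString());
        }

        // Re-parse the String representation on the receiving side.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            schema = FakeSchema.parse((String) in.readObject());
        }
    }

    /** Serializes the holder to bytes and reads it back, as a remote task would. */
    static SchemaHolder roundTrip(SchemaHolder holder)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(holder);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (SchemaHolder) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SchemaHolder original =
                new SchemaHolder(FakeSchema.parse("{\"type\":\"string\"}"));
        SchemaHolder copy = roundTrip(original);
        System.out.println(copy.get());
    }
}
```

Since only the String crosses the wire, the serialized form does not depend on the internal fields of the schema class, which may soften the JVM compatibility concern in point 1. The price is a re-parse on every deserialization.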