[ 
https://issues.apache.org/jira/browse/AVRO-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138507#comment-17138507
 ] 

Ryan Skraba commented on AVRO-2299:
-----------------------------------

Hello -- once again, my apologies for arriving so late to this JIRA!  I have 
some specific feedback for the JIRA that might make it easier to merge quickly!

Everyone seems to agree that the Parsing Canonical Form for schemas is not 
sufficient for advanced "storage and comparison" of schemas, especially around 
long-lived Schema Registries and versioning/resolving schemas as artifacts.  
Fair enough, that's not what it's for!

The requirement is to be able to identify the "same" schemas regardless of any 
custom annotations (a.k.a. user JsonProperties in the Java SDK) that might be 
present.

If I understand your use case correctly, you'd like to be able to store a 
"cleaned" schema for describing and versioning your persistence, but also allow 
the devs to add/remove useful custom annotations (such as GDPR info) during 
processing.  It should be easy for the dev to find the cleaned schema from the 
annotated schema, or to determine that two differently annotated schemas are 
the "same".

[~tjwp]'s resolution canonical form is slightly different, excluding some 
reserved attributes in the spec that aren't used in resolution either (notably 
doc, order, and logicalType).

I'd like to propose **not** adding new canonical forms to the spec, but simply 
adding the tools to "normalize" any schema according to the same rules as the 
existing Parsing Canonical Form, but with an allowlist/blocklist for reserved 
and user properties.  And, of course, if logicalType is included, all of its 
sub-attributes should be included (for user-defined types).
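
To make the idea concrete, here is a rough sketch in Python working on the 
parsed JSON of a schema. It is not an existing Avro API: the names `normalize`, 
`RESERVED` and the `keep` parameter are hypothetical, the reserved key list is 
simply the one quoted in the issue description, and the sketch does not 
reproduce the attribute ordering or name-qualification rules of the Parsing 
Canonical Form, nor the logicalType sub-attribute handling mentioned above.

```python
import json

# Reserved keys as listed in the issue description (an assumption for this
# sketch; the spec reserves others, such as "order" and "logicalType").
RESERVED = {"doc", "fields", "items", "name", "namespace",
            "size", "symbols", "values", "type", "aliases", "default"}


def normalize(node, keep=RESERVED):
    """Return a copy of a parsed schema keeping only allowlisted properties."""
    if isinstance(node, list):
        # A union: normalize each branch.
        return [normalize(branch, keep) for branch in node]
    if not isinstance(node, dict):
        # A type name such as "string" or a reference to a named type.
        return node
    out = {}
    for key, value in node.items():
        if key not in keep:
            continue  # drop user properties and any blocked reserved keys
        if key in ("type", "items", "values"):
            out[key] = normalize(value, keep)  # nested schema
        elif key == "fields":
            out[key] = [normalize(field, keep) for field in value]
        else:
            out[key] = value  # doc, default, symbols, ... copied as-is
    return out


annotated = json.loads("""
{
  "name": "testSchema",
  "namespace": "com.avro",
  "type": "record",
  "fields": [
    {"name": "email", "type": "string", "doc": "email id",
     "user_field_prop": "xxxxx"}
  ],
  "user_schema_prop": "xxxxxx"
}
""")

plain = normalize(annotated)
print(json.dumps(plain, indent=2))

# Two differently annotated schemas are the "same" if they normalize equally.
other = dict(annotated, user_schema_prop="something else")
print(normalize(other) == plain)
```

For the input schema from the issue description, this keeps name, namespace, 
type, fields and doc and drops both user properties. Passing a smaller 
allowlist, for example `RESERVED - {"doc"}`, would give something closer to a 
resolution-oriented form without adding a new canonical form to the spec.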

It seems to me that this would be less constraining and a more generally useful 
strategy, providing a useful schema transformation tool for some language SDKs 
without multiplying the number of "Canonical Forms" supported or making them an 
obligatory part of a language SDK.

> Get Plain Schema
> ----------------
>
>                 Key: AVRO-2299
>                 URL: https://issues.apache.org/jira/browse/AVRO-2299
>             Project: Apache Avro
>          Issue Type: New Feature
>          Components: java
>    Affects Versions: 1.9.0, 1.8.2, 1.9.1
>            Reporter: Rumeshkrishnan Mohan
>            Assignee: Doug Cutting
>            Priority: Major
>              Labels: features
>
> {panel:title=Avro Schema Reserved Keys:}
> "doc", "fields", "items", "name", "namespace",
>  "size", "symbols", "values", "type", "aliases", "default"
> {panel}
> Avro also supports user-defined properties for both Schema and Field.
> Is there a way to get the schema with only the reserved properties (key, value)? 
> Input Schema: 
> {code:java}
> {
>   "name": "testSchema",
>   "namespace": "com.avro",
>   "type": "record",
>   "fields": [
>     {
>       "name": "email",
>       "type": "string",
>       "doc": "email id",
>       "user_field_prop": "xxxxx"
>     }
>   ],
>   "user_schema_prop": "xxxxxx"
> }{code}
> Expected Plain Schema:
> {code:java}
> {
>   "name": "testSchema",
>   "namespace": "com.avro",
>   "type": "record",
>   "fields": [
>     {
>       "name": "email",
>       "type": "string",
>       "doc": "email id"
>     }
>   ]
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
