tballison commented on PR #2916:
URL: https://github.com/apache/tika/pull/2916#issuecomment-4846751029

   Even with agents to help out, I can't stomach 11k lines of code to nail down 
maybe 80% of an open set.
   
   I'm really worried about maintenance within the project and then clients 
having to rebuild their protos when we change metadata definitions.
   
   We've had churn on value types EVEN for Dublin Core over the history of the 
project. Even if we limit custom handling to that, clients will still have to 
rebuild their protos when we make changes.
   
   I'd be ok, maybe, with special handling for Dublin Core and some of the Tika 
core properties: media type, etc.
   
   Fellow devs (@nddipiazza) what do you think about this?
   
   From Claude: The lossless catch-all is the right idea and the part that 
belongs in Tika — it's what should replace the removed fields map. I'd simplify 
its shape, though: from repeated MetadataEntry with a typed oneof to a plain 
multivalue map<string, StringList>. That keeps the native dict lookup clients 
had with the old map<string,string>, fixes the real gap (multivalue), and drops 
the per-value typing — which for dynamic keys forces clients to branch on a 
6-way union on every read without giving them a compile-time typed accessor 
anyway. A new or renamed metadata key still never forces a client rebuild, 
because a key is data, not schema. On top of that map I'd add only 
special-cased DC + a few core props as typed strings.
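   
   To make the client-side difference concrete, here is a rough sketch in plain 
Java of what reading a multivalue map could look like. It does not use the 
generated proto classes: the class name MetadataMapSketch, the helper methods, 
and the sample keys are all hypothetical, and a Map<String, List<String>> only 
stands in for what a map<string, StringList> field might expose.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a generated map<string, StringList> field.
// Nothing here is Tika's or protobuf's actual API.
public class MetadataMapSketch {

    // Groups (key, value) pairs by key, keeping every value in arrival order.
    static Map<String, List<String>> toMultiValueMap(String[][] pairs) {
        Map<String, List<String>> map = new LinkedHashMap<>();
        for (String[] pair : pairs) {
            map.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return map;
    }

    // Dict-style lookup: no branching on a per-value type, and an
    // unknown key simply yields an empty list.
    static List<String> values(Map<String, List<String>> map, String key) {
        return map.getOrDefault(key, Collections.emptyList());
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = toMultiValueMap(new String[][] {
            {"dc:creator", "Alice"},
            {"dc:creator", "Bob"},
            {"Content-Type", "application/pdf"},
        });
        System.out.println(values(metadata, "dc:creator"));     // [Alice, Bob]
        System.out.println(values(metadata, "custom:new-key")); // []
    }
}
```

   A key the client has never seen before is just another map entry here, which 
is the "a key is data, not schema" point: no regenerated code is needed to read 
it.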
   
   @krickert what, specifically, do you need within the Tika project and what 
can you do outside of Tika to meet your objectives?


