[https://issues.apache.org/jira/browse/TIKA-930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404869#comment-13404869]
Nick Burch commented on TIKA-930:
---------------------------------
In terms of DCMI, we tend to take a pragmatic view of metadata standards. If
it's useful and won't cause confusion, use it! Try to keep things simple,
though: don't include a whole standard just for the sake of it. But if it
provides value, go for it.
> Consolidation of Some Tika Core Properties
> ------------------------------------------
>
> Key: TIKA-930
> URL: https://issues.apache.org/jira/browse/TIKA-930
> Project: Tika
> Issue Type: Improvement
> Components: metadata
> Affects Versions: 1.2
> Reporter: Ray Gauss II
>
> There are a few properties in TikaCoreProperties which overlap, and I think we
> should minimize ambiguity by consolidating each overlapping set into a single
> composite property with the clearest name, with the most general
> specification referenced as its primary property and the others, plus the
> deprecated strings, as its secondaries.
> Here's the proposed pseudo-code for the changes:
> Remove TikaCoreProperties.SUBJECT
> TikaCoreProperties.KEYWORDS <- DublinCore.SUBJECT, { Office.KEYWORDS, MSOffice.KEYWORDS, Metadata.SUBJECT }
>
> Remove TikaCoreProperties.DATE
> TikaCoreProperties.CREATION_DATE <- DublinCore.DATE, { Office.CREATION_DATE, MSOffice.CREATION_DATE, Metadata.DATE }
>
> Remove TikaCoreProperties.MODIFIED
> TikaCoreProperties.SAVE_DATE <- DublinCore.MODIFIED, { Office.SAVE_DATE, MSOffice.LAST_SAVED, Metadata.MODIFIED, "Last-Modified" }
> And here is an example of the Java changes:
> {code:title=TikaCoreProperties.java *Before*}
> /**
>  * @see DublinCore#SUBJECT
>  */
> public static final Property SUBJECT =
>         Property.composite(DublinCore.SUBJECT,
>                 new Property[] { Property.internalText(Metadata.SUBJECT) });
>
> /**
>  * @see Office#KEYWORDS
>  */
> public static final Property KEYWORDS =
>         Property.composite(Office.KEYWORDS,
>                 new Property[] { Property.internalTextBag(MSOffice.KEYWORDS) });
> {code}
> would become
> {code:title=TikaCoreProperties.java *After*}
> /**
>  * @see DublinCore#SUBJECT
>  * @see Office#KEYWORDS
>  */
> public static final Property KEYWORDS =
>         Property.composite(DublinCore.SUBJECT,
>                 new Property[] {
>                         Office.KEYWORDS,
>                         Property.internalTextBag(MSOffice.KEYWORDS),
>                         Property.internalText(Metadata.SUBJECT)
>                 });
> {code}
> Since this would require some refactoring of the parsers that use the
> properties being removed, I thought it best to get feedback before working up
> a full patch.
> Does this seem like a reasonable approach?
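
A minimal, hypothetical Java sketch of what the proposed consolidation means in practice: each composite has one primary key and a list of secondary keys, and writing through the composite populates all of them, so code that still reads a deprecated key keeps seeing the value. The `Composite` and `Metadata` classes below are illustrative stand-ins, not Tika's real `Property` or `Metadata` API. The keys are the Java identifiers from the proposal used as placeholder strings, not the actual metadata names, and multi-valued properties such as keywords are simplified to single strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, self-contained model of the proposed consolidation.
 * It does NOT use Tika's Property/Metadata classes; keys are placeholder
 * strings named after the Java identifiers in the proposal.
 */
public class CompositePropertySketch {

    /** A composite: one primary key plus secondary keys kept in sync with it. */
    static final class Composite {
        final String primary;
        final List<String> secondaries;

        Composite(String primary, String... secondaries) {
            this.primary = primary;
            this.secondaries = Collections.unmodifiableList(Arrays.asList(secondaries));
        }

        /** Primary first, then the secondaries in declaration order. */
        List<String> allKeys() {
            List<String> keys = new ArrayList<>();
            keys.add(primary);
            keys.addAll(secondaries);
            return keys;
        }
    }

    // The three consolidations proposed above (placeholder keys).
    static final Composite KEYWORDS = new Composite("DublinCore.SUBJECT",
            "Office.KEYWORDS", "MSOffice.KEYWORDS", "Metadata.SUBJECT");
    static final Composite CREATION_DATE = new Composite("DublinCore.DATE",
            "Office.CREATION_DATE", "MSOffice.CREATION_DATE", "Metadata.DATE");
    static final Composite SAVE_DATE = new Composite("DublinCore.MODIFIED",
            "Office.SAVE_DATE", "MSOffice.LAST_SAVED", "Metadata.MODIFIED",
            "Last-Modified");

    /** Minimal string-valued store (hypothetical, not Tika's Metadata). */
    static final class Metadata {
        private final Map<String, String> values = new LinkedHashMap<>();

        /** Writing through a composite sets the primary and every secondary. */
        void set(Composite property, String value) {
            for (String key : property.allKeys()) {
                values.put(key, value);
            }
        }

        /** Raw lookup by key, as a parser using an old name would do. */
        String get(String key) {
            return values.get(key);
        }

        /** Reading through a composite consults its primary key. */
        String get(Composite property) {
            return values.get(property.primary);
        }
    }

    public static void main(String[] args) {
        Metadata md = new Metadata();
        md.set(KEYWORDS, "metadata, tika");
        // Code still reading the deprecated key sees the same value.
        System.out.println(md.get("MSOffice.KEYWORDS")); // metadata, tika
        System.out.println(md.get(KEYWORDS));           // metadata, tika
    }
}
```

In this sketch, reading through the composite goes to the primary key (the Dublin Core property), matching the proposal's choice of the most general specification as the primary.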