[ https://issues.apache.org/jira/browse/TIKA-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17930387#comment-17930387 ]
Tim Allison commented on TIKA-4381:
-----------------------------------

OK. The trick was to actually read the specs and to rely on some great work already done on Apache POI. Specifically, the work in parsing the nameidchunks was critical. That data has to be used to map the local-in-file storage ids to the actual ids and long ids in MS-OXPROPS.

This is what we're now pulling from an appointment msg file in our unit tests:

{noformat}
mapi:raw:PidLidAgingDontAgeMe : false
mapi:raw:PidLidAppointmentAuxiliaryFlags : 0
mapi:raw:PidLidAppointmentColor : 0
mapi:raw:PidLidAppointmentCounterProposal : false
mapi:raw:PidLidAppointmentDuration : 30
mapi:raw:PidLidAppointmentEndWhole : 2017-02-28T19:00:00Z
mapi:raw:PidLidAppointmentNotAllowPropose : false
mapi:raw:PidLidAppointmentProposalNumber : 0
mapi:raw:PidLidAppointmentProposedDuration : 0
mapi:raw:PidLidAppointmentSequence : 0
mapi:raw:PidLidAppointmentStartWhole : 2017-02-28T18:30:00Z
mapi:raw:PidLidAppointmentStateFlags : 0
mapi:raw:PidLidAppointmentSubType : false
mapi:raw:PidLidAutoFillLocation : false
mapi:raw:PidLidBusyStatus : 2
mapi:raw:PidLidClipEnd : 2017-02-28T19:00:00Z
mapi:raw:PidLidClipStart : 2017-02-28T18:30:00Z
mapi:raw:PidLidCommonEnd : 2017-02-28T19:00:00Z
mapi:raw:PidLidCommonStart : 2017-02-28T18:30:00Z
mapi:raw:PidLidConferencingType : 0
mapi:raw:PidLidCurrentVersion : 166965
mapi:raw:PidLidFInvited : false
mapi:raw:PidLidIntendedBusyStatus : -1
mapi:raw:PidLidPrivate : false
mapi:raw:PidLidRecurrenceType : 0
mapi:raw:PidLidRecurring : false
mapi:raw:PidLidReminderDelta : 15
mapi:raw:PidLidReminderSet : false
mapi:raw:PidLidReminderSignalTime : 4501-01-01T00:00:00Z
mapi:raw:PidLidReminderTime : 2017-02-28T18:30:00Z
mapi:raw:PidLidResponseStatus : 0
mapi:raw:PidLidSideEffects : 369
mapi:raw:PidLidTaskMode : 0
mapi:raw:PidLidValidFlagStringProof : 2017-02-28T18:42:23Z
{noformat}

> Improve extraction of metadata from Appointment/Task msgs
> ---------------------------------------------------------
>
>                 Key: TIKA-4381
>                 URL: https://issues.apache.org/jira/browse/TIKA-4381
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>         Attachments: Parser.java
>
>
> Our metadata extraction on msgs is mostly focused on "NOTE"/regular emails.
> We could do more to improve extraction from appointments, tasks and other msg
> types.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
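
The id mapping described in the comment above can be illustrated with a minimal sketch. This is not the Tika or POI code: `NamedPropertySketch`, `NameIdEntry`, `resolve` and the sample entry list are hypothetical names made up for illustration. The sketch assumes that named properties get local ids starting at 0x8000, that the local id minus 0x8000 is an index into the file's name-id entries, and that each entry carries a property set GUID plus a long id (LID). The GUIDs and LIDs are a small excerpt that should be checked against MS-OXPROPS before being relied on. String-named properties and the binary layout of the mapping streams are left out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NamedPropertySketch {

    // Property set GUIDs (excerpt; verify against MS-OXPROPS).
    static final String PSETID_APPOINTMENT = "00062002-0000-0000-c000-000000000046";
    static final String PSETID_COMMON = "00062008-0000-0000-c000-000000000046";

    // Hypothetical stand-in for one parsed name-id entry of a msg file.
    static final class NameIdEntry {
        final String propertySetGuid;
        final long longId;

        NameIdEntry(String propertySetGuid, long longId) {
            this.propertySetGuid = propertySetGuid;
            this.longId = longId;
        }
    }

    // Canonical names keyed by "guid:lid" (tiny excerpt for illustration).
    static final Map<String, String> CANONICAL_NAMES = new HashMap<>();
    static {
        CANONICAL_NAMES.put(key(PSETID_APPOINTMENT, 0x8205), "PidLidBusyStatus");
        CANONICAL_NAMES.put(key(PSETID_APPOINTMENT, 0x8213), "PidLidAppointmentDuration");
        CANONICAL_NAMES.put(key(PSETID_COMMON, 0x8501), "PidLidReminderDelta");
    }

    static String key(String guid, long lid) {
        return guid + ":" + Long.toHexString(lid);
    }

    // Maps a local-in-file property id to a canonical name, or null if unknown.
    static String resolve(int localPropertyId, List<NameIdEntry> entries) {
        int index = localPropertyId - 0x8000; // assumed offset for named properties
        if (index < 0 || index >= entries.size()) {
            return null;
        }
        NameIdEntry entry = entries.get(index);
        return CANONICAL_NAMES.get(key(entry.propertySetGuid, entry.longId));
    }

    public static void main(String[] args) {
        // Made-up per-file mapping: the order differs from file to file.
        List<NameIdEntry> entries = Arrays.asList(
                new NameIdEntry(PSETID_COMMON, 0x8501),
                new NameIdEntry(PSETID_APPOINTMENT, 0x8213));
        // prints "mapi:raw:PidLidAppointmentDuration"
        System.out.println("mapi:raw:" + resolve(0x8001, entries));
    }
}
```

The point of the lookup is that the same local id (0x8001 here) can refer to a different property in another file, so the per-file entries have to be consulted before a name from MS-OXPROPS can be attached.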