Thanks Nick,

WRT test files, there's a test file attached to the Bugzilla entry. Did you 
mean something more targeted than that?

I've added some extra information from the Microsoft open specifications to the 
Bugzilla item, and it suggests that using this header for date metadata 
extraction in POI is a bit suspect.  It might be better to drop the use of the 
message submission chunk for date extraction moving forward.

For moving HSMF forward, I'll have a think about what you said.  My knowledge 
of the current code base is pretty slack.  Only the message submission chunk 
really!  Is the current code built upon HPSF or POIFS?  If so, I might start 
having a play with the Microsoft specs...

Adrian



-----Original Message-----
From: Nick Burch [mailto:[email protected]] 
Sent: 18 March 2015 12:47
To: POI Developers List
Subject: Re: Bugzilla item #57678 (Incorrect year parsing in message submission 
chunk)

On Tue, 17 Mar 2015, Adrian Conlon wrote:
> I submitted a bug report + patch a week or so ago, and I was wondering 
> whether one of the committers could take a look and see whether it 
> looks OK or not.
>
> https://bz.apache.org/bugzilla/show_bug.cgi?id=57678

Are you sure the logic for when to switch from 19xx to 20xx is right? If you 
could produce a test file and/or reference in the spec, that'd help!

> I realise that it isn't a major bug, but I'm using this as "testing 
> the water" for making other bug fixes for HSMF. With that in mind, I'd 
> appreciate pointers on making my changes as painless as possible for 
> committers to accept.

HSMF started life before Microsoft released the file format specs, and is based 
around what we could figure out easily from hex dumps. It turns out that we got 
some key parts back-to-front. As such, pretty much only "variable length" 
properties are supported. While we do have some support for fixed length 
properties now (which actually cover most of them), we don't have a link 
between the properties in the property table and their variable length chunks 
with their values in.

What it really needs is someone to spend some time with the spec, work out 
exactly how a variable length property in the properties chunk maps to a value 
chunk, and code up some logic to do that. With that in place, we can deprecate 
much of the current code driven by the value chunks, and replace it with a 
"proper" way of going via the properties list. That will also mean we can 
expose and use a lot more properties than we currently do, and possibly also 
avoid some hacky things like parsing string message headers to try to find dates.
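
As a rough illustration of the kind of mapping described above (this is not 
POI's actual code): in the MS-OXMSG layout, as far as I understand it, the 
value of a variable length property is stored in a stream whose name is the 
prefix "__substg1.0_" followed by the property ID and the property type, each 
as four upper-case hex digits. The class and method names below are 
hypothetical, and the naming rule should be checked against the spec before 
relying on it.

```java
public class ChunkNameSketch {
    // Assumed prefix for value streams in a .msg file (taken from my reading of MS-OXMSG).
    static final String PREFIX = "__substg1.0_";

    // Hypothetical helper: builds the value stream name for a property,
    // given its 16-bit property ID and 16-bit property type.
    static String valueStreamName(int propertyId, int propertyType) {
        return String.format("%s%04X%04X", PREFIX,
                propertyId & 0xFFFF, propertyType & 0xFFFF);
    }

    public static void main(String[] args) {
        // Property ID 0x0037 with type 0x001F (a Unicode string)
        System.out.println(valueStreamName(0x0037, 0x001F));
    }
}
```

With something like this in place, each variable length entry read from the 
properties list could be looked up by name rather than discovered by walking 
the value chunks.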

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected] For additional commands, 
e-mail: [email protected]


