Re: equation images in a .doc

2014-05-29 Thread Andreas Beeker
Hi Bing, I haven't (re-)searched for the checksum algorithm yet, but I would be happy if you told me what you have found out. Thank you, Andi On 29.05.2014 04:49, Bing Ran wrote: I figured out the checksum:) - To unsu

Re: equation images in a .doc

2014-05-28 Thread Bing Ran
I figured out the checksum:) 2014-05-29 10:37 GMT+08:00 Bing Ran : > Thanks Andi! I'm working towards that direction. > > I'm wondering how the checksum is calculated. That's the last two bytes in > the header... > > Thanks for helping! > > Bing > > > > > > 2014-05-29 4:04 GMT+08:00 Andreas Bee

Re: equation images in a .doc

2014-05-28 Thread Bing Ran
Thanks Andi! I'm working towards that direction. I'm wondering how the checksum is calculated. That's the last two bytes in the header... Thanks for helping! Bing 2014-05-29 4:04 GMT+08:00 Andreas Beeker : > Hi Bing, > > the wmf code is in a branch, because I'd like to commit (save) it wit
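
For readers wondering the same thing: the snippet above is cut off before any answer, so the following is only a sketch based on the commonly documented layout of the 22-byte placeable WMF header, not necessarily what was worked out in this thread. In that layout the last two bytes hold the XOR of the ten little-endian 16-bit words that precede them. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PlaceableChecksumSketch {

    /**
     * XORs the first ten little-endian 16-bit words (bytes 0..19) of a
     * placeable WMF header. Assumes the array holds at least 20 bytes.
     */
    public static int checksum(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        int sum = 0;
        for (int i = 0; i < 10; i++) {
            // Absolute read; mask to treat the short as unsigned.
            sum ^= buf.getShort(i * 2) & 0xFFFF;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Header with only the magic key 0x9AC6CDD7 set, everything else zero.
        byte[] header = new byte[22];
        header[0] = (byte) 0xD7;
        header[1] = (byte) 0xCD;
        header[2] = (byte) 0xC6;
        header[3] = (byte) 0x9A;
        System.out.printf("checksum = 0x%04X%n", checksum(header));
    }
}
```

The result would be stored little-endian in bytes 20 and 21 of the header.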

Re: equation images in a .doc

2014-05-28 Thread Andreas Beeker
Hi Bing, the wmf code is in a branch, because I'd like to commit (save) it without interfering with the trunk ... and it's still far from being finished. I'm not sure how the github synchronisation works, but I guess "it" only fetches the trunk. And for the header: I couldn't find the reference o

Re: equation images in a .doc

2014-05-28 Thread Bing Ran
I used JWord from IndependentSoft to extract the images and got the same set of wmf files, only 22 bytes longer. Those files displayed properly. So the headers were chopped off by POI. I'm wondering how tools like JWord recover the header information... 2014-05-29 0:34 GMT+08:00 Bing Ran : > Hi
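
How JWord does this is not stated in the thread. As a rough illustration only, a 22-byte placeable header could be rebuilt and prepended along the following lines, assuming the commonly documented layout (magic key, handle, bounding box, units per inch, reserved field, checksum). The bounding box and units-per-inch values cannot be derived from this sketch; the caller has to supply them from somewhere else, for example from the picture's dimensions in the .doc. All names here are hypothetical and this is not POI's API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PlaceableHeaderSketch {
    static final int HEADER_SIZE = 22;

    /**
     * Returns a new array: a 22-byte placeable header followed by the bare WMF data.
     * The bounding box and unitsPerInch are assumptions supplied by the caller.
     */
    public static byte[] prependHeader(byte[] wmf, int left, int top,
                                       int right, int bottom, int unitsPerInch) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE + wmf.length)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(0x9AC6CDD7);             // magic key
        buf.putShort((short) 0);            // handle, 0 for files on disk
        buf.putShort((short) left);
        buf.putShort((short) top);
        buf.putShort((short) right);
        buf.putShort((short) bottom);
        buf.putShort((short) unitsPerInch); // logical units per inch
        buf.putInt(0);                      // reserved

        // Checksum: XOR of the ten 16-bit words written so far.
        int checksum = 0;
        for (int i = 0; i < 10; i++) {
            checksum ^= buf.getShort(i * 2) & 0xFFFF;
        }
        buf.putShort((short) checksum);

        buf.put(wmf);                       // original metafile bytes follow
        return buf.array();
    }
}
```

The output is exactly 22 bytes longer than the input, which matches the size difference reported above.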

Re: equation images in a .doc

2014-05-28 Thread Bing Ran
Hi Andreas, Thanks for the answer. I acquired the raw data by overriding the AbstractWordConverter.processingImage()... in the hwpf package and calling picture.getContent(). I cannot immediately figure out how to reset the header after reading your code reference. BTW, I was using a local compil

Re: equation images in a .doc

2014-05-28 Thread Andreas Beeker
Hi Bing, maybe the wmfs are missing the wmf header, which can be chopped off when the wmf is embedded [2] - so if the size is 22 bytes too big, this would be a good indication. I've started to implement wmf parsing a while ago and maybe you can recreate the header with the WmfPlaceableHeader c
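
One quick way to test the missing-header theory is to look at the first bytes of the dumped data. The sketch below assumes the usual conventions: a placeable header starts with the key bytes D7 CD C6 9A, while a bare metafile starts directly with the standard header (type 1 or 2, followed by a header size of 9 words). The class and method names are invented for this example.

```java
public class WmfHeaderProbe {

    /** True if the data begins with the placeable key 0x9AC6CDD7 (little-endian). */
    public static boolean hasPlaceableKey(byte[] data) {
        return data.length >= 22
            && (data[0] & 0xFF) == 0xD7
            && (data[1] & 0xFF) == 0xCD
            && (data[2] & 0xFF) == 0xC6
            && (data[3] & 0xFF) == 0x9A;
    }

    /** True if the data begins like a standard WMF header without the placeable part. */
    public static boolean looksLikeBareWmf(byte[] data) {
        return data.length >= 18
            && (data[0] == 1 || data[0] == 2)  // type: memory or disk metafile
            && data[1] == 0
            && data[2] == 9                    // header size in 16-bit words
            && data[3] == 0;
    }
}
```

If `looksLikeBareWmf` returns true for the bytes from Picture.getContent(), the placeable header was most likely stripped when the image was embedded.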

equation images in a .doc

2014-05-28 Thread Bing Ran
Hi, I'm new to the list but I have a pressing need to extract all the embedded equation images from a Word 97 .doc file (not .docx). I know all those images are in WMF format. After I dumped the picture content (from the Picture.getContent()) to a file, I found that the file was not entirely a valid W