On Tue, Sep 17, 2013 at 09:37:16AM +0200, Sébastien Hinderer wrote:
> > If you can come up with a
> > sensible looking patch which works for your example file and doesn't
> > break others, I'm happy to add it to the package.
>
> Many thanks. I really appreciate it. The "doesn't break others" part might
> be a bit difficult to prove but I'll try.
I'm not expecting absolute proof, but it'd be good to test it on a
selection of Word documents, and compare the output with and without
the patch.
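
If it helps, that comparison could be scripted roughly like this. It's
only a sketch: it assumes you have an unpatched and a patched antiword
binary side by side (the paths are whatever you pass in), that each one
prints the extracted text to stdout when given a file name, and that
the test documents sit in one directory as *.doc files. The function
names are made up for illustration.

```python
import difflib
import subprocess
import sys
from pathlib import Path


def extract(cmd, doc):
    """Run an extractor command on one document and return its stdout as text."""
    result = subprocess.run(
        list(cmd) + [str(doc)],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        check=False,  # a failing extractor just yields whatever it printed
    )
    return result.stdout.decode("utf-8", errors="replace")


def compare_outputs(old_cmd, new_cmd, docs):
    """Return {path: unified diff} for every document whose output differs."""
    changed = {}
    for doc in docs:
        before = extract(old_cmd, doc).splitlines(keepends=True)
        after = extract(new_cmd, doc).splitlines(keepends=True)
        diff = list(difflib.unified_diff(before, after, "unpatched", "patched"))
        if diff:
            changed[str(doc)] = "".join(diff)
    return changed


# Hypothetical usage: compare.py OLD_BINARY NEW_BINARY DIRECTORY
if __name__ == "__main__" and len(sys.argv) == 4:
    old_bin, new_bin, directory = sys.argv[1:4]
    docs = sorted(Path(directory).glob("*.doc"))
    for path, diff in compare_outputs([old_bin], [new_bin], docs).items():
        print("=== " + path)
        print(diff)
```

Documents that produce no diff are unaffected by the patch; the ones
that do show up are the ones worth looking at by eye.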
> > You're probably best off talking to the upstream author, though I don't
> > think he's actively working on antiword now as the last release was
> > 2005-10-21.
>
> That was also my guess. Do you think the address I used in Cc of
> the original bug report is the best one to try to contact him? I'm
> asking because I noticed that you didn't Cc this address in your
> response, so that made me think that perhaps the address is not good.
Don't read anything into that - it's just an artifact of how I replied
(I just fetched the mailbox for the bug with bts show -m, so replied
to the message as it was before the X-Debbugs-Cc got processed).
> Thanks. If I can't talk to anybody and have to discover things by myself
> it may take me some time to come up with a patch because the spec of the
> format offered by Microsoft is non-trivial and, for me, not so easy to
> read and understand.
It might be worth trying some of the other options (if you haven't
already).

wv has a command line extractor (wvText), which in my experience handles
some files better than antiword (and others less well). Sadly it isn't
actively maintained upstream either these days (last release was just
under 3 years ago). ISTR antiword is faster than wvText.

There's wv2, but that doesn't come with a command line tool - it's
just a library. That's also not active upstream (last release nearly 4
years ago).

There's also unoconv, which uses LibreOffice to do the extraction - that
means the extraction code is actively maintained upstream, and it seems
to work with most files I've tried. The downside is that it is rather
slow and memory hungry, and I've found it randomly fails sometimes. I
think the issues stem from trying to remote control LibreOffice, which
of course thinks it's a GUI application rather than a command line tool
or library.
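
Since each tool handles some files better than others, one way to
combine them is to try the fast one first and fall back to the slow
one. A minimal sketch, with assumptions: the function and list names
are invented for illustration, every command is expected to print the
text to stdout when given the file name as its last argument, and the
unoconv flags shown should be checked against its man page before
relying on them.

```python
import subprocess


def first_successful_text(commands, doc):
    """Try each extractor in order; return (tool, text) from the first that works."""
    for cmd in commands:
        try:
            result = subprocess.run(
                list(cmd) + [str(doc)],
                stdout=subprocess.PIPE,
                stderr=subprocess.DEVNULL,
                timeout=120,  # guard against a hung converter
                check=False,
            )
        except (OSError, subprocess.TimeoutExpired):
            continue  # tool not installed, or it hung
        text = result.stdout.decode("utf-8", errors="replace")
        if result.returncode == 0 and text.strip():
            return cmd[0], text
    return None, ""


# Fast tool first, slower LibreOffice-based one as a fallback.
EXTRACTORS = [
    ["antiword"],
    ["unoconv", "-f", "txt", "--stdout"],  # flags assumed; verify locally
]
```

Calling first_successful_text(EXTRACTORS, "some.doc") would then return
the name of the tool that produced output along with the text, or
(None, "") if none of them managed it.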
Cheers,
Olly