RE: Finding document language?

2016-07-20 Thread Allison, Timothy B.
This doesn't answer your question on HWPF. When I last looked at this, a few years ago, I figured out how to get the language via OLE, but it was so rarely populated that it was better to run language id on the extracted content. For language id (in Java), consider optimaize or yalder.

Re: Finding document language?

2016-07-20 Thread Branden Visser
Hi Timothy, thanks for your reply. I'm not trying to learn what the language of a document is; I'm just trying to see whether the language of the document was set and, if so, what it was set to. That said, do you recall how to get the language metadata? Thanks, Branden

RE: Finding document language?

2016-07-20 Thread Allison, Timothy B.
Again, this may miss the mark of the document language. This [1] points out how to get the language from each run in HWPF: CharacterRun.getLanguageCode(); in XWPF, the lang can be stored in the run's properties: here is the text [1] http://stackoverflow.com/questions/28904283/generate-a-word-
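For the XWPF side, here is a rough sketch of reading the per-run language straight from the WordprocessingML, using only the JDK's XML parser. The sample markup is an assumption modeled on the w:lang element that sits inside a run's w:rPr; it is not copied from the message above. The class and method names (RunLanguages, runLanguages) are made up for illustration and are not POI API. A run with no w:lang comes back as null, which is the "was it set at all?" check.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Apache POI.
final class RunLanguages {
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Assumed sample: one run with a language set, one without.
    static final String SAMPLE =
        "<w:p xmlns:w=\"" + W_NS + "\">"
      + "<w:r><w:rPr><w:lang w:val=\"en-US\"/></w:rPr>"
      + "<w:t>here is the text</w:t></w:r>"
      + "<w:r><w:t>no language set</w:t></w:r>"
      + "</w:p>";

    // Returns one entry per run: the w:lang/@w:val value, or null if unset.
    static List<String> runLanguages(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        List<String> result = new ArrayList<>();
        NodeList runs = doc.getElementsByTagNameNS(W_NS, "r");
        for (int i = 0; i < runs.getLength(); i++) {
            Element run = (Element) runs.item(i);
            NodeList langs = run.getElementsByTagNameNS(W_NS, "lang");
            String val = null;
            if (langs.getLength() > 0) {
                Element lang = (Element) langs.item(0);
                if (lang.hasAttributeNS(W_NS, "val")) {
                    val = lang.getAttributeNS(W_NS, "val");
                }
            }
            result.add(val);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // One line per run, showing the language tag or null.
        for (String lang : runLanguages(SAMPLE)) {
            System.out.println(lang);
        }
    }
}
```

With a real .docx you would feed in word/document.xml from the package rather than the sample string. For HWPF, the equivalent per-run check is CharacterRun.getLanguageCode(), as mentioned above.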

Re: Finding document language?

2016-07-20 Thread Branden Visser
Thanks again Timothy, that info is helpful. It sounds like HWPF simply doesn't have any document-level language setting. FWIW, I've found that the language codes of the character runs are quite reliably set for the most part. Additionally, it seems for XWPF that the custom properties do retain th