Thanks again, Timothy, that info is helpful. It sounds like HWPF simply doesn't have any document-level language setting. FWIW, I've found that the language code of the character runs is set quite reliably for the most part.
Additionally, it seems for XWPF that the custom properties do retain the
document language when set in Word, which is nice. Whether or not it's
accurate or useful is another task ;)

All the best,
Branden

On Wed, Jul 20, 2016 at 10:24 AM, Allison, Timothy B. <talli...@mitre.org> wrote:
> Again, this may miss the mark of the document language.
>
> This [1] points out how to get the language from each run in HWPF:
> CharacterRun.getLanguageCode();
>
> in XWPF, the lang can be stored in the run's properties:
> <w:r><w:rPr><w:lang w:bidi="ar-QA"/></w:rPr><w:t xml:space="preserve">here is the text</w:t>
>
> [1] http://stackoverflow.com/questions/28904283/generate-a-word-document-using-different-languages
>
> -----Original Message-----
> From: Branden Visser [mailto:mrvis...@gmail.com]
> Sent: Wednesday, July 20, 2016 10:22 AM
> To: POI Users List <user@poi.apache.org>
> Subject: Re: Finding document language?
>
> Hi Timothy, thanks for your reply.
>
> I'm not trying to learn what the language of a document is, I'm actually just
> trying to see if the language of the document was set and if so, what it was
> set to. That said, do you recall how to get the language metadata?
>
> Thanks,
> Branden
>
> On Wed, Jul 20, 2016 at 6:12 AM, Allison, Timothy B. <talli...@mitre.org> wrote:
>> This doesn't answer your question on HWPF.
>>
>> Last I looked at this, a few years ago, I figured out how to get the
>> language via OLE, and it was so rarely populated that it was better to
>> run language id on the extracted content. For language id (in Java),
>> consider optimaize or yalder
>>
>> -----Original Message-----
>> From: Branden Visser [mailto:mrvis...@gmail.com]
>> Sent: Tuesday, July 19, 2016 6:38 PM
>> To: POI Users List <user@poi.apache.org>
>> Subject: Finding document language?
>>
>> Hi all,
>>
>> Does anyone know the best way to get the document language for both XWPF and
>> HWPF documents?
>>
>> I'm guessing if it's File > Properties > Custom, then it can be extracted
>> from the custom properties in XWPF, but is there a similar API available in
>> HWPF?
>>
>> Thanks,
>> Branden
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@poi.apache.org
>> For additional commands, e-mail: user-h...@poi.apache.org
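
As a footnote to the w:lang markup quoted in the thread: a minimal sketch, using only the JDK's DOM parser rather than POI, of pulling the language tags off a run's properties. The class name RunLanguages and the method languageTags are hypothetical names for illustration. The sample run adds the xmlns:w declaration and the closing </w:r> that the quoted snippet leaves out, on the assumption that it came from an ordinary document.xml.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of POI: reads w:lang attributes from raw run XML.
class RunLanguages {
    // WordprocessingML main namespace, normally bound to the "w" prefix.
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns "attribute=tag" entries for every w:lang element, in document order.
    static List<String> languageTags(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the *NS lookups below
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList langs = doc.getElementsByTagNameNS(W_NS, "lang");
            List<String> tags = new ArrayList<>();
            for (int i = 0; i < langs.getLength(); i++) {
                Element lang = (Element) langs.item(i);
                // w:lang may carry a default, an East Asian and a bidi language.
                for (String attr : new String[] {"val", "eastAsia", "bidi"}) {
                    String value = lang.getAttributeNS(W_NS, attr);
                    if (!value.isEmpty()) {
                        tags.add(attr + "=" + value);
                    }
                }
            }
            return tags;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse run XML", e);
        }
    }

    public static void main(String[] args) {
        String run = "<w:r xmlns:w=\"" + W_NS + "\">"
            + "<w:rPr><w:lang w:bidi=\"ar-QA\"/></w:rPr>"
            + "<w:t xml:space=\"preserve\">here is the text</w:t></w:r>";
        System.out.println(languageTags(run)); // prints [bidi=ar-QA]
    }
}
```

This only shows what a run declares about itself. As noted in the thread, whether that value is accurate is a separate question.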