PDFBox can handle multi-byte encodings.  A couple of recent fixes for
CJK languages are not in the 0.7.2 release but are included in the
nightly build.

Ben
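
For readers following the thread, here is a minimal sketch of why the
source encoding stops mattering once text has been extracted into a Java
String. It uses only the JDK standard library, not the PDFBox or POI API,
and the class name MultiByteDemo and its decode helper are hypothetical
names chosen for illustration. It assumes the extractor hands back a
java.lang.String, which is what an indexer such as Lucene would then
receive.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of PDFBox, POI, or Lucene.
public class MultiByteDemo {

    // Decode raw bytes into a String using an explicitly named charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Three Japanese characters, written as escapes so the source
        // file's own encoding cannot affect the result.
        String original = "\u65E5\u672C\u8A9E";

        // The same text occupies a different number of bytes per encoding.
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = original.getBytes(StandardCharsets.UTF_16BE);
        System.out.println("UTF-8 bytes:    " + utf8.length);
        System.out.println("UTF-16BE bytes: " + utf16.length);

        // Decoding with the matching charset yields identical Strings.
        String fromUtf8 = decode(utf8, StandardCharsets.UTF_8);
        String fromUtf16 = decode(utf16, StandardCharsets.UTF_16BE);
        System.out.println("Equal after decoding: " + fromUtf8.equals(fromUtf16));
    }
}
```

The point of the sketch is only that the byte-level encoding is resolved
at decode time; whether a given PDF or DOC file decodes correctly still
depends on the extraction library itself.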



On Fri, 10 Feb 2006, Zhang, Lisheng wrote:

> Hi,
>
> Currently we are using PDFBox to process PDF files and
> POI to process DOC/XLS files before sending the extracted
> strings to Lucene for indexing.
>
> Does anyone know whether PDFBox or POI can process
> multi-byte characters such as Japanese in various encodings
> (whatever is specified in the PDF or DOC)?
>
> Thanks very much for your help, Lisheng
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
