PDFBox can handle multi-byte encodings. There are a couple of recent fixes for CJK languages that are not in the 0.7.2 release but are included in the nightly build.
Ben

On Fri, 10 Feb 2006, Zhang, Lisheng wrote:

> Hi,
>
> Currently we are using PDFBox to process PDF files and
> POI to process DOC/XLS files, before send strings to lucene
> for indexing,
>
> Does any one know if PDFBox or POI can process multi-byte
> characters like Japanese with various encodings (whatever
> specified in PDF or DOC)?
>
> Thanks very much for helps, Lisheng
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]