Anyway, even if you get correct whitespace and newlines, this won't affect indexing.
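
To illustrate the point, here is a minimal sketch in Python. It is not Lucene's analyzer: `tokenize` is a hypothetical helper that lowercases the text and splits on anything that is not a letter or digit, which is only a rough stand-in for what a typical analyzer does. The sample strings are made up for the demonstration.

```python
import re

def tokenize(text):
    """Hypothetical stand-in for an analyzer: lowercase the text,
    then split on runs of characters that are not a-z or 0-9."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

# Same words, laid out three different ways (made-up sample text).
clean = "Extraction is only good enough"
reflowed = "Extraction   is\nonly\n\n good   enough"  # extra spaces and newlines
glued = "Extractionis onlygood enough"                # separators missing

# Extra or misplaced whitespace does not change the token stream.
print(tokenize(clean) == tokenize(reflowed))

# Missing separators do: neighbouring words fuse into one token.
print(tokenize(glued))
```

Under these assumptions, the amount and kind of whitespace makes no difference to the tokens. Only the case where there is no separator at all between two words changes what gets indexed, so that is the case worth checking in the extracted text.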

Best Regards
Alexander Aristov

On 3 December 2010 10:00, Lance Norskog <goks...@gmail.com> wrote:

> The text should come out as a stream of words separated by spaces, but
> without any of the formatting in the PDF. Extraction is only good enough
> to tell you that a word is somewhere inside a PDF file. Can you post a
> short bit of the text that it extracted?
>
> Also, you should try this test on different PDF files that were made
> with different software.
>
> On Thu, Dec 2, 2010 at 9:35 PM, Ganesh <emailg...@yahoo.co.in> wrote:
> > Hello all,
> >
> > I know this is not the right group for this question, but I thought
> > some of you might have run into the same problem.
> >
> > I am a newbie with Tika, using the latest version, 0.8. I extracted
> > text from a PDF document but found spaces and newlines missing.
> > Indexing the data gives wrong results. Could anyone in this group
> > help me? I am using Tika directly to extract the contents, which
> > later get indexed.
> >
> > Regards
> > Ganesh
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
>
> --
> Lance Norskog
> goks...@gmail.com