The main problem is that I am not getting whitespace and newline characters. This happens only with PDF documents.
Sample output: "Someofthedifferencesare", but it should be "Some of the differences are".

Regards
Ganesh

----- Original Message -----
From: "Alexander Aristov" <alexander.aris...@gmail.com>
To: <java-user@lucene.apache.org>
Sent: Friday, December 03, 2010 2:39 PM
Subject: Re: PDF text extracted without spaces

> Anyway, even if you get correct whitespace and newlines, this won't affect
> indexing.
>
> Best Regards
> Alexander Aristov
>
>
> On 3 December 2010 10:00, Lance Norskog <goks...@gmail.com> wrote:
>
>> The text should come out as a stream of words with spaces, but without
>> any of the formatting in the PDF. Extraction is only good enough to
>> tell you that a word is somewhere inside a PDF file. Can you post a
>> short bit of the text that it extracted?
>>
>> Also, you should try this test on different PDF files that were made
>> with different software.
>>
>> On Thu, Dec 2, 2010 at 9:35 PM, Ganesh <emailg...@yahoo.co.in> wrote:
>> > Hello all,
>> >
>> > I know this is not the right group for this question, but I thought some
>> > of you might have run into it.
>> >
>> > I am a newbie with Tika and am using the latest version, 0.8. I extracted
>> > text from a PDF document but found that spaces and newlines were missing,
>> > so indexing the data gives wrong results. Could anyone in this group help
>> > me? I am using Tika directly to extract the contents, which are later
>> > indexed.
>> >
>> > Regards
>> > Ganesh
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> > For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
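To illustrate why the missing spaces can make indexing give wrong results, here is a minimal sketch using the sample output above. It assumes a naive whitespace tokenizer as a stand-in for a real Lucene analyzer; the class `WhitespaceTokens` and its `tokenize` method are hypothetical and are not part of the Tika or Lucene API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: a naive whitespace tokenizer used only to show how
// missing spaces change the terms an index would see. Real Lucene analyzers
// do more (lowercasing, stop words, etc.), but also rely on separators.
public class WhitespaceTokens {

    // Split on runs of whitespace; an empty or blank input yields no tokens.
    public static List<String> tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return List.of();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        String extracted = "Someofthedifferencesare";   // text as extracted
        String expected = "Some of the differences are"; // text as it should be

        System.out.println(tokenize(extracted)); // [Someofthedifferencesare]
        System.out.println(tokenize(expected));  // [Some, of, the, differences, are]

        // A search for the term "differences" only matches the spaced text.
        System.out.println(tokenize(extracted).contains("differences")); // false
        System.out.println(tokenize(expected).contains("differences"));  // true
    }
}
```

With the spaces gone, the whole phrase collapses into a single term, so a query for any individual word in it finds nothing.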