The text should come out as a stream of words separated by spaces, but
without any of the formatting in the PDF. Extraction is only good enough
to tell you that a word is somewhere inside a PDF file. Can you post a
short bit of the text that it extracted?

Also, you should try this test on different PDF files that were made
with different software.
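
If it helps to narrow this down across files, here is a minimal sketch of a
sanity check on the extracted text. It is plain Java and makes no Tika calls;
`ExtractionCheck` and its two methods are names made up for illustration, and
the sample strings stand in for whatever your extraction step returned. A
very long "longest token" or a whitespace ratio near zero suggests the spaces
and newlines were lost during extraction and not later during indexing.

```java
import java.util.Locale;

// Hypothetical helper, not part of Tika or Lucene: summarizes how much
// whitespace survived in a piece of extracted text.
final class ExtractionCheck {

    // Length of the longest run of non-whitespace characters.
    static int longestToken(String text) {
        int longest = 0;
        int current = 0;
        for (int i = 0; i < text.length(); i++) {
            if (Character.isWhitespace(text.charAt(i))) {
                current = 0;
            } else {
                current++;
                if (current > longest) {
                    longest = current;
                }
            }
        }
        return longest;
    }

    // Fraction of characters that are whitespace; 0.0 for empty input.
    static double whitespaceRatio(String text) {
        if (text.isEmpty()) {
            return 0.0;
        }
        int whitespace = 0;
        for (int i = 0; i < text.length(); i++) {
            if (Character.isWhitespace(text.charAt(i))) {
                whitespace++;
            }
        }
        return (double) whitespace / text.length();
    }

    public static void main(String[] args) {
        // Stand-in samples; in practice pass the string your extraction returned.
        String spaced = "The quick brown fox\njumps over the lazy dog";
        String glued = "Thequickbrownfoxjumpsoverthelazydog";
        for (String sample : new String[] {spaced, glued}) {
            System.out.printf(Locale.ROOT, "longest=%d ratio=%.2f%n",
                    longestToken(sample), whitespaceRatio(sample));
        }
    }
}
```

Running the same check on PDFs produced by different software should show
whether the problem is tied to one kind of file.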

On Thu, Dec 2, 2010 at 9:35 PM, Ganesh <emailg...@yahoo.co.in> wrote:
> Hello all,
>
> I know this is not the right group for this question, but I thought some of
> you might have run into this.
>
> I am a newbie with Tika and am using the latest version, 0.8. I extracted text
> from a PDF document but found spaces and newlines missing. Indexing the data
> gives wrong results. Could anyone in this group help me? I am using
> Tika directly to extract the contents, which later get indexed.
>
> Regards
> Ganesh
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>



-- 
Lance Norskog
goks...@gmail.com
