Re: PDF text extracted without spaces

Lance Norskog Thu, 02 Dec 2010 23:00:51 -0800

The text should come out as a stream of words separated by spaces, but
without any of the formatting in the PDF. Extraction is only good enough
to tell you that a word appears somewhere in the PDF file.  Can you post
a short sample of the text that it extracted?


Also, you should try this test on different PDF files that were made
with different software.
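
To make that comparison less subjective, here is a minimal sketch in plain
Python (not part of Tika) that flags extracted text which looks like it lost
its spaces. It assumes you already have the extracted text as a string. The
function names and the 40-character threshold are my own assumptions, not
anything Tika defines, so adjust them for your documents.

```python
def whitespace_stats(text):
    """Return (whitespace_ratio, longest_token_length) for extracted text."""
    if not text:
        return 0.0, 0
    whitespace = sum(1 for ch in text if ch.isspace())
    tokens = text.split()
    longest = max((len(t) for t in tokens), default=0)
    return whitespace / len(text), longest


def looks_unspaced(text, max_token_len=40):
    # Assumption: ordinary prose rarely contains a token longer than
    # max_token_len characters, so a longer one suggests missing spaces.
    _, longest = whitespace_stats(text)
    return longest > max_token_len


good = "The text should come out as a stream of words"
bad = "Thetextshouldcomeoutasastreamofwordswithoutanyspaces"
print(looks_unspaced(good))
print(looks_unspaced(bad))
```

Running this over the output from several PDFs should show whether the
problem is in every file or only in files from one PDF producer.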

On Thu, Dec 2, 2010 at 9:35 PM, Ganesh <[email protected]> wrote:
> Hello all,
>
> I know this is not the right group for this question, but I thought some of
> you might have run into it.
>
> I am a newbie with Tika, using the latest version, 0.8. I extracted text
> from a PDF document but found spaces and newlines missing. Indexing the data
> gives wrong results. Could anyone in this group help me? I am using
> Tika directly to extract the contents, which later get indexed.
>
> Regards
> Ganesh
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>



-- 
Lance Norskog
[email protected]

