Re: PDF text extracted without spaces

Alexander Aristov Fri, 03 Dec 2010 01:10:19 -0800

Anyway, even if you get correct whitespace and newlines, this won't affect
indexing.
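
As a rough illustration of that point, here is a minimal sketch that assumes a plain whitespace tokenizer. The class and method names are hypothetical, and this is not the analysis code of Tika or of any particular indexer. It suggests that the exact kind of whitespace (single spaces, runs of spaces, newlines) makes no difference to the tokens produced, whereas separators that are missing altogether do merge words into one token:

```java
import java.util.ArrayList;
import java.util.List;

public class WhitespaceTokens {
    // Hypothetical helper: split text on any run of whitespace
    // (spaces, tabs, newlines) and drop empty pieces.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Different whitespace layout, same tokens.
        System.out.println(tokenize("PDF  text\nextracted with Tika"));
        System.out.println(tokenize("PDF text extracted with Tika"));
        // Missing separators: the words collapse into a single token.
        System.out.println(tokenize("PDFtextextracted with Tika"));
    }
}
```

Under these assumptions the first two calls yield the same five tokens, while the third yields only three, so a search for "text" would not match it.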


Best Regards
Alexander Aristov


On 3 December 2010 10:00, Lance Norskog <[email protected]> wrote:

> The text should come out as a stream of words with spaces, but without
> any of the formatting in the PDF. Extraction is only good enough to
> tell you that a word is somewhere inside a PDF file.  Can you post a
> short bit of the text that it extracted?
>
> Also, you should try this test on different PDF files that were made
> with different software.
>
> On Thu, Dec 2, 2010 at 9:35 PM, Ganesh <[email protected]> wrote:
> > Hello all,
> >
> > I know this is not the right group to ask this question, but I thought some
> > of you might have experienced this.
> >
> > I am a newbie with Tika. I am using the latest version, 0.8. I extracted
> > text from a PDF document but found spaces and newlines missing. Indexing the
> > data gives wrong results. Could anyone in this group help me? I am
> > using Tika directly to extract the contents, which later get indexed.
> >
> > Regards
> > Ganesh
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
> >
>
>
>
> --
> Lance Norskog
> [email protected]
