On Dec 1, 2008, at 8:22 AM, Grant Ingersoll wrote:
On Dec 1, 2008, at 8:01 AM, tiziano bernardi wrote:
I tried to use PDFBox, but it gives me an error saying
that the versions of Lucene and PDFBox are incompatible.
Lucene knows nothing about PDFBox, so I don't see how they could be
incompatible, unless you are referring to PDFBox's Lucene Document
creator, in which case, you should ask on the PDFBox mailing list.
I think, however, that it's pretty straightforward to create a
Lucene document from PDFBox, so you shouldn't need to rely on their
version.
Personally, I'd have a look at Tika (http://lucene.apache.org/tika),
which wraps PDFBox (and other extraction libraries) and gives you
back SAX-like events via a ContentHandler, which you can then use to
create Lucene documents. Else, I've been working on SOLR-284, which
integrates Tika into Solr, see https://issues.apache.org/jira/browse/SOLR-284
-Grant
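
To make the ContentHandler suggestion above concrete, here is a minimal sketch of a SAX handler that collects character events into a single text buffer. It uses only the JDK's SAX classes. It does not call Tika or Lucene: the XHTML string stands in for the events a parser such as Tika would emit, and toFields() is a hypothetical stand-in for building a Lucene document (the class name and the field names "path" and "contents" are assumptions, not anything from this thread).

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Collects the text carried by SAX character events.
final class TextCollectingHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver text in several chunks; append them all.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Keep words from adjacent elements from running together.
        text.append(' ');
    }

    // Returns the collected text with runs of whitespace collapsed.
    String getText() {
        return text.toString().replaceAll("\\s+", " ").trim();
    }

    // Hypothetical stand-in for a Lucene document: field name -> value.
    Map<String, String> toFields(String path) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("path", path);
        fields.put("contents", getText());
        return fields;
    }

    // Feeds an XHTML string through the JDK SAX parser into a new handler.
    static TextCollectingHandler parse(String xhtml) throws Exception {
        TextCollectingHandler handler = new TextCollectingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body><p>Hello</p><p>PDF world</p></body></html>";
        System.out.println(parse(xhtml).toFields("example.pdf"));
    }
}
```

In a real indexer, the same handler would be passed to the extraction library in place of the JDK parser, and each entry returned by toFields() would become a field on the document handed to the index writer.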
And for something out-of-the-box, you might also look at XTF:
http://www.cdlib.org/inside/projects/xtf/
which will index and display text, HTML, PDF (using PDFBox), and
several XML text formats (TEI, EAD, ...).
Or you can look at the sources to see how they use PDFBox.
-- Steve Majewski
I use PDFBox 0.7.3 and Lucene 2.1.0.

> Date: Mon, 1 Dec 2008 11:43:00 +0000
> From: [EMAIL PROTECTED]
> To: java-user@lucene.apache.org
> Subject: Re: Pdf in Lucene?
>
> Hi
>
> Lucene only indexes text so you'll have to get the text out of the PDF
> and feed it to lucene.
>
> Google for lucene pdf, or go straight to http://www.pdfbox.org/
>
> --
> Ian.
>
> 2008/12/1 tiziano bernardi <[EMAIL PROTECTED]>:
> >
> > Hi,
> > I want to index PDF files with Lucene. Is that possible?
> > If so, how?
> > Thanks Tiziano Bernardi
--------------------------
Grant Ingersoll
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]