Congrats, Chris! This was no doubt a big effort. I didn't do much
but lend some moral support. There was an odd issue you had with
certain types of queries at one point - did you get that resolved?
Erik
On May 25, 2005, at 3:40 PM, Chris Conrad wrote:
Hello,
I just wanted to let e
Thanks for your help. I was using 0.7.0. However, I installed 0.7.1 and
get the same result with the ExtractText utility. I will post an issue on
the PDFBox SourceForge site.
Tom
- Original Message -
From: "Ben Litchfield" <[EMAIL PROTECTED]>
To:
Sent: Wednesday, May 25, 2005 5:
Hi,
I agree with Ben Litchfield.
Before feeding the extracted text into the Lucene indexer,
you should check it. I am now using
java org.pdfbox.ExtractText to get the text
from PDF files.
[quote]
"Ben Litchfield" <[EMAIL PROTECTED]>
Can you run the following command line applica
There were some fixes around extra spaces in the 0.7.1 version of PDFBox.
If you are not using that version, please try it; otherwise, post an issue
on the PDFBox SourceForge site.
http://sourceforge.net/tracker/?group_id=78314&atid=552832
Thanks,
Ben
On Wed, 25 May 2005, Thomas X Hoban wrote:
In creating the index, the code passes StandardAnalyzer to the IndexWriter
constructor.
- Original Message -
From: "Chris Fraschetti" <[EMAIL PROTECTED]>
To:
Sent: Wednesday, May 25, 2005 4:53 PM
Subject: Re: Lucene - PDFBox
Also, which analyzer are you using when indexing your docu
Thanks for replying.
When I run the command, it generates a file with a "txt" extension. The
text in this file has spaces interspersed in odd spots. Here is output from
a file I ran the command on...
Marc h 29, 2005
Hello t here m y good friend.
HELLO
Legal Soft w are is GOOD.
I woul
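The stray spaces in the sample above are enough to break phrase queries, because the indexed tokens no longer line up with the query's tokens. A minimal sketch to see this, assuming a simplified tokenizer (lowercase, split on anything that is not a letter or digit) rather than Lucene's actual StandardAnalyzer; `PhraseCheck`, `tokenize` and `containsPhrase` are hypothetical names, and the consecutive-token check only approximates an exact (slop 0) phrase query:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class PhraseCheck {
    // Simplified stand-in for an analyzer (assumption: NOT Lucene's StandardAnalyzer):
    // lowercase the text, then split on any run of non-letter/non-digit characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // True if the phrase's tokens appear consecutively in the text's tokens,
    // roughly what an exact phrase query (slop 0) requires.
    static boolean containsPhrase(String text, String phrase) {
        List<String> doc = tokenize(text);
        List<String> query = tokenize(phrase);
        if (query.isEmpty()) {
            return false;
        }
        for (int i = 0; i + query.size() <= doc.size(); i++) {
            if (doc.subList(i, i + query.size()).equals(query)) {
                return true;
            }
        }
        return false;
    }
}
```

With the extracted line "Legal Soft w are is GOOD." the tokens are legal, soft, w, are, is, good, so the phrase "legal software" is not found, while a correctly extracted "Legal Software is GOOD." matches. That is why checking the ExtractText output first is worthwhile.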
Also, which analyzer are you using when indexing your documents?
On 5/25/05, Ben Litchfield <[EMAIL PROTECTED]> wrote:
> Can you run the following command line application on the PDF to verify
> that the extracted text is correct
> java org.pdfbox.ExtractText
> Ben
> On Wed, 25
Can you run the following command line application on the PDF to verify
that the extracted text is correct?
java org.pdfbox.ExtractText
Ben
On Wed, 25 May 2005, Thomas X Hoban wrote:
> First, I am new to Lucene.
> Is there anyone out there who has had trouble getting hits when running
First, I am new to Lucene.
Is there anyone out there who has had trouble getting hits when running phrase
queries against an index that contains content from PDF files? For PDF
documents, I create the document using LucenePDFDocument.getDocument(file) and
then add it to the index. For n
Hello,
I just wanted to let everyone know that we've officially announced
that the new SourceForge.net search system is based on Lucene. It's
been in operation for over a month now and we're very happy with it.
I'd also like to personally thank Erik Hatcher for helping me out
during dev
On Wednesday 25 May 2005 11:21, Kapil Chhabra wrote:
> 1. My application requires documents to be sorted on one of my indexed
> fields everytime.
> I use the hits.setSort() method to specify the field.
> In short my application will never use the scores generated by lucene
> search.
> Is calculatin
On Wednesday 25 May 2005 13:00, Barbara Krausz wrote:
> Hi,
> Consider a Query with e.g. 4 terms (t1,t2,t3,t4). I want to retrieve all
> documents which contain at least e.g. 3 of the queryterms. How can I
> implement this?
> The first idea is to use BooleanQueries such as
> (t1 and t2
Hi all,
I am sending this email to correct the solution that enables
SuffixQueries.
The definition of WILDTERM was buggy: it split a term into
two terms,
e.g. "term:te*st" was parsed as "term:te* term:st", which of course
was wrong.
Here is the right way to do it ...
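The corrected grammar itself is elided above. Purely as an illustration of the intended parse, here is a hypothetical regex (not the JavaCC WILDTERM production) that lets `*` and `?` appear anywhere inside a term, so an inner wildcard no longer ends the term; `WildTermSplit` is a made-up name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WildTermSplit {
    // field ':' followed by a run of word characters, '*' or '?'.
    // Because the wildcards are inside the character class, "te*st" stays one term.
    private static final Pattern FIELD_TERM = Pattern.compile("(\\w+):([\\w*?]+)");

    static List<String> parse(String query) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD_TERM.matcher(query);
        while (m.find()) {
            out.add(m.group(1) + ":" + m.group(2));
        }
        return out;
    }
}
```

Here "term:te*st" yields the single clause term:te*st, and a leading-wildcard (suffix) term such as "term:*st" is kept whole as well.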
This isn't exactly what you want, but an intermediate step, short of
going through all the terms, is something like what Luke does.
In Luke, on the Documents tab, you can enter a single letter in the
"Browse by term" field and then hit "next term", and it will give you the
next term, which you could th
I wrote a very simple SAX parser for our XML content: I check the text for the
search tokens (analyzer.tokenStream) and place a span tag
around each token found. This approach could work well with XHTML as well.
In other words, I could never get the highlighter to do what I wanted to
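A rough sketch of the SAX approach described above, using only the JDK's built-in SAX parser. Assumptions: a simple regex tokenizer stands in for analyzer.tokenStream, query terms are passed in already lowercased, element attributes are dropped from the output, and `SpanHighlighter` and the `hit` CSS class are hypothetical names:

```java
import java.io.StringReader;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SpanHighlighter extends DefaultHandler {
    // Stand-in tokenizer (assumption): runs of letters/digits, not a Lucene analyzer.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{Nd}]+");
    private final Set<String> terms;                        // lowercased query terms
    private final StringBuilder out = new StringBuilder();  // rebuilt markup
    private final StringBuilder text = new StringBuilder(); // pending character data

    SpanHighlighter(Set<String> terms) {
        this.terms = terms;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
        out.append('<').append(qName).append('>'); // attributes dropped for brevity
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        out.append("</").append(qName).append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // SAX may deliver one text node in several chunks
    }

    // Writes buffered text, wrapping each token found in the query terms in a span.
    private void flushText() {
        String s = text.toString();
        text.setLength(0);
        Matcher m = WORD.matcher(s);
        int last = 0;
        while (m.find()) {
            out.append(escape(s.substring(last, m.start())));
            String word = m.group();
            if (terms.contains(word.toLowerCase(Locale.ROOT))) {
                out.append("<span class=\"hit\">").append(escape(word)).append("</span>");
            } else {
                out.append(escape(word));
            }
            last = m.end();
        }
        out.append(escape(s.substring(last)));
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String highlight(String xml, Set<String> terms) {
        SpanHighlighter h = new SpanHighlighter(terms);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), h);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return h.out.toString();
    }
}
```

For example, highlighting "<p>Lucene is fast</p>" with the term "lucene" wraps only the first word in a span. Buffering text until the next tag keeps a token intact even if the parser splits it across several characters() calls.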
On May 25, 2005, at 7:00 AM, Barbara Krausz wrote:
Hi,
Consider a Query with e.g. 4 terms (t1,t2,t3,t4). I want to
retrieve all documents which contain at least e.g. 3 of the
queryterms. How can I implement this?
The first idea is to use BooleanQueries such as
(t1 and t2 and t3 and t4) or
Hi,
Consider a query with, e.g., 4 terms (t1, t2, t3, t4). I want to retrieve all
documents which contain at least, e.g., 3 of the query terms. How can I
implement this?
The first idea is to use BooleanQueries such as
(t1 and t2 and t3 and t4) or (t1 and t2 and t3) or (t1 and t2 and t4) or
(t1 and
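Barbara's first idea can be generated mechanically. A minimal sketch (the class and method names are hypothetical) that emits one AND clause per k-term subset and ORs the clauses together in QueryParser-style syntax. The all-terms clause can be left out, since a document containing all four terms already matches every three-term clause. The number of clauses is C(n, k), so this grows quickly as n increases; later Lucene releases added a minimum-should-match setting on BooleanQuery, which may be worth checking for in your version:

```java
import java.util.ArrayList;
import java.util.List;

class AtLeastKQuery {
    // Builds "(t1 AND t2 AND t3) OR (t1 AND t2 AND t4) OR ...": one clause per
    // k-element subset, so any document with k or more of the terms matches some clause.
    static String build(List<String> terms, int k) {
        List<String> clauses = new ArrayList<>();
        combine(terms, k, 0, new ArrayList<>(), clauses);
        return String.join(" OR ", clauses);
    }

    // Standard recursive enumeration of k-subsets, in index order.
    private static void combine(List<String> terms, int k, int from,
                                List<String> current, List<String> clauses) {
        if (current.size() == k) {
            clauses.add("(" + String.join(" AND ", current) + ")");
            return;
        }
        for (int i = from; i < terms.size(); i++) {
            current.add(terms.get(i));
            combine(terms, k, i + 1, current, clauses);
            current.remove(current.size() - 1); // backtrack
        }
    }
}
```

For t1..t4 and k = 3 this produces "(t1 AND t2 AND t3) OR (t1 AND t2 AND t4) OR (t1 AND t3 AND t4) OR (t2 AND t3 AND t4)".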
Yeah,
That works
Thanks
Peter Gelderbloem
-Original Message-
From: Chris Lamprecht [mailto:[EMAIL PROTECTED]
Sent: 24 May 2005 18:16
To: java-user@lucene.apache.org
Subject: Re: Query.toString() does not escape special characters
Hi Peter,
See the method escape(String s) of QueryParser,
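In practice QueryParser.escape(String s) is the method to call. To show what that kind of escaping does, here is a stdlib sketch with a hypothetical `QueryEscaper` class; the set of special characters below is an assumption based on the query syntax and may differ from the one in your Lucene version's QueryParser:

```java
class QueryEscaper {
    // Characters treated as query syntax. Assumed list: check it against your Lucene version.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // prefix each special character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

For example, the text a+b:c becomes a\+b\:c, which the parser then reads as literal characters rather than operators.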
1. My application requires documents to be sorted on one of my indexed
fields every time.
I use the hits.setSort() method to specify the field.
In short, my application will never use the scores generated by Lucene
search.
Is calculating scores an overhead? Can I skip the process somehow?
2. let C
Got you :-)
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Chris Hostetter
Sent: Wednesday, May 25, 2005 8:48 AM
To: java-user@lucene.apache.org
Subject: RE: tf=0 while lucene is finding matches?
: I believe I do use the index number for the explain(),
Hello,
I am currently looking for a way to navigate forward and backward among
the indexed terms.
For example, given a Term t, I would like to be able to get the next 10
terms or the previous 10 ones.
Getting the next terms is quite straightforward, using the terms(Term t)
method from IndexRea
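A TermEnum steps forward with next(), which is why the previous terms are the hard part. One workaround, sketched here only under the assumption that the terms of a single field fit in memory, is to copy them into a sorted set once and navigate in both directions. `TermBrowser` and its methods are hypothetical names, and plain strings stand in for Lucene's Term objects:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class TermBrowser {
    private final NavigableSet<String> terms = new TreeSet<>();

    // Assumption: the caller has already collected one field's terms (e.g. by walking a TermEnum).
    TermBrowser(Iterable<String> allTerms) {
        for (String t : allTerms) {
            terms.add(t);
        }
    }

    // Up to n terms >= start, ascending (similar to positioning an enumeration at start).
    List<String> next(String start, int n) {
        return take(terms.tailSet(start, true).iterator(), n);
    }

    // Up to n terms < start, nearest first.
    List<String> previous(String start, int n) {
        return take(terms.headSet(start, false).descendingIterator(), n);
    }

    private static List<String> take(Iterator<String> it, int n) {
        List<String> out = new ArrayList<>();
        while (it.hasNext() && out.size() < n) {
            out.add(it.next());
        }
        return out;
    }
}
```

With the terms apple, banana, cherry, date and fig, next("c", 2) gives cherry and date, and previous("c", 2) gives banana and apple. For an index with many distinct terms the one-time copy is expensive, so this is a sketch rather than a recommendation.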