date:20080512

Re: Filtering a SpanQuery

2008-05-12 Thread Paul Elschot

On Monday 12 May 2008 09:06:36, Eran Sevi wrote: > Thanks Paul, > > I'll give your code sample a try. > I still think that calling getSpans (the first line of code) that > returns millions of results is going to be much slower than calling > getSpans that's going to return only a few thousands of

Re: Question about startOffset and endOffset

2008-05-12 Thread Brendan Grainger

Hi Erick, Thanks for the reply. The use case I have is this: Say you have a synonym expansion like this: ac -> air conditioning And to keep it simple, a document where the first term is ac. When analyzing the document I currently create a token stream that looks something like this for the

Re: Numerical Range Query

2008-05-12 Thread Erick Erickson

Are you using NumberTools both at index and query time? Because this works exactly as I expect import org.apache.lucene.index.IndexWriter; import org.apache.lucene.store.FSDirectory; import org.apache.lucene.analysis.standard.StandardAnalyzer; import org.apache.lucene.document.Document; import

Re: Numerical Range Query

2008-05-12 Thread Dan Hardiker

Erick Erickson wrote: Although I'm a bit puzzled by what you're actually getting back. You might try using Luke to look at your index to see what's there. I've looked through with Luke and it doesn't look like much has changed between using NumberTools and not. NumberTools definitely does some

Re: Numerical Range Query

2008-05-12 Thread Erick Erickson

Yep, Lucene works with strings, not numbers, so the fact that you're not getting what you expect is expected. Although I'm a bit puzzled by what you're actually getting back. You might try using Luke to look at your index to see what's there. See the NumberTools class for some help here... B
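
To illustrate the point about strings versus numbers, here is a minimal sketch in plain Java. It does not use Lucene or NumberTools; the class `PaddedRange` and its methods `pad` and `inRange` are hypothetical names made up for this example, and fixed-width zero padding stands in for whatever encoding is actually used at index and query time. It only shows why a range check on raw numeric strings follows alphabetical order, and why an order-preserving encoding has to be applied on both sides.

```java
public class PaddedRange {

    // Hypothetical helper: left-pads a non-negative number with zeros to a
    // fixed width so that String order matches numeric order.
    static String pad(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("non-negative values only");
        }
        String digits = Long.toString(value);
        if (digits.length() > width) {
            throw new IllegalArgumentException("value does not fit in width");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = digits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    // Inclusive range check using String (lexicographic) comparison,
    // which is how terms are compared when they are stored as text.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // Raw strings: "100" sorts between "10" and "20" alphabetically.
        System.out.println(inRange("100", "10", "20"));
        // Padded strings: 00100 is outside 00010..00020, as expected numerically.
        System.out.println(inRange(pad(100, 5), pad(10, 5), pad(20, 5)));
        // Padded strings: 00015 is inside the range.
        System.out.println(inRange(pad(15, 5), pad(10, 5), pad(20, 5)));
    }
}
```

The same encoding has to be used for the indexed values and for both ends of the range in the query; mixing padded and unpadded terms brings back the alphabetical behaviour.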

Numerical Range Query

2008-05-12 Thread Dan Hardiker

Hi, I've got an application which stores ratings for content in a Lucene index. It works a treat for the most part, apart from the use-case I have for being able to filter out ratings that have less than a given number of rates. It kinda works, but seems to use Alpha ranging rather than Numer

Re: Can POI provide reliable text extraction results for production search engine for Word, Excel and PowerPoint formats?

2008-05-12 Thread Karl Wettin

Lukas Vlcek wrote: Hi, I need to find a reliable way how to extract content out of Word, Excel and PowerPoint formats prior to indexing and I am not sure if POI is the best way to go. Can anybody share experience with POI and/or other [commercial] Java library for text extraction from MS formats

Re: Question about startOffset and endOffset

2008-05-12 Thread Karl Wettin

Erick Erickson wrote: Offhand, I expect this will affect up span queries, phrase queries, and who knows what else? Maybe scoring? I believe that the offsets are just metadata stored with the term vectors, used by the highlighter etc. Phrase and span queries use term position in the stream (p

Re: Question about startOffset and endOffset

2008-05-12 Thread Erick Erickson

Is this a theoretical question or is there a use-case you're trying to support? If the latter, a statement of the problem you're trying to solve would be helpful. If the former, setting all your start offsets to 0 seems wrong. You're essentially saying that all tokens are at the beginning of the d

Question about startOffset and endOffset

2008-05-12 Thread Brendan Grainger

Hi, I have a TokenStream that inserts synonym tokens into the stream when matched. One thing I am wondering about is what is the effect of the startOffset and endOffset. I have something like this: Token synonymToken = new Token(originalToken.startOffset(), originalToken.endOffset(), "SYN

Re: Can POI provide reliable text extraction results for production search engine for Word, Excel and PowerPoint formats?

2008-05-12 Thread Nick Burch

On Mon, 12 May 2008, Lukas Vlcek wrote: I need to find a reliable way how to extract content out of Word, Excel and PowerPoint formats prior to indexing and I am not sure if POI is the best way to go. Can anybody share experience with POI and/or other [commercial] Java library for text extracti

posting lists of index are sorted?

2008-05-12 Thread Miguel Costa

Hi all, I have two questions related to the Lucene ranking. 1) Does anyone know how the posting lists (term -> doc1 doc2 doc3) from the index are sorted? Is a TFxIDF value, the boost value, or neither used to sort the documents (doc1 doc2 doc3)? Does Lucene compute the ranking for all the documents

Re: confused about an entry in the FAQ

2008-05-12 Thread Stephane Nicoll

I tried all this and I am confused about the result. I am trying to implement a hybrid query handler where I fetch the IDs from a database criteria and the IDs from a full-text Lucene query and I intersect them to return the result to the user. The database query and the intersection works fine ev

Search and retrieve the line data from the File

2008-05-12 Thread Madan Narra

Hi All, I am very new to Lucene and want to extend my skills with this tool. But I have an urgent assignment which I need to complete soon, so I haven't had much time to read the docs/books on the net. So please suggest how I can achieve the below task and the rest i can

[ANNOUNCE] Lucene Java 2.3.2 release available

2008-05-12 Thread Michael Busch

Release 2.3.2 of Lucene Java is now available! This release contains fixes for bugs found in 2.3.1. It does not contain any new features, API or file format changes, which makes it fully compatible with 2.3.0 and 2.3.1. The detailed change log is at:

Re: Filtering a SpanQuery

2008-05-12 Thread Eran Sevi

Thanks Paul, I'll give your code sample a try. I still think that calling getSpans (the first line of code) that returns millions of results is going to be much slower than calling getSpans that's going to return only a few thousands of results. Since the filtering is only performed after calling

16 matches
