Re: wildcard search constraints?

2006-02-17 Thread Chris Hostetter
: I have some strange behaviors: (1) The field query slug:abc*hurr2* works only if the field type is "Keyword". The query fails if the type is "Text". (2) On the other hand, query slug:abc-nws-hurr29 only works if the field type is "Text" and fails if the type is "Keyword". I think the

wildcard search constraints?

2006-02-17 Thread Xin Herbert Wu
Hi, I have a field "slug" in my Lucene index library whose string value is "abc-nws-hurr29". Using QueryParser with StandardAnalyzer, I see some strange behaviors: (1) The field query slug:abc*hurr2* works only if the field type is "Keyword". The query fails if th
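The thread is cut off before Chris Hostetter's explanation, so the following is only a plausible model, not his answer. QueryParser does not pass wildcard terms through the analyzer, so `abc*hurr2*` has to match one indexed term as a whole, while a plain query such as `abc-nws-hurr29` is analyzed and may be split into several tokens. A "Keyword" field stores the value as a single untokenized term; a "Text" field stores the analyzer's tokens. The sketch is plain Java with no Lucene dependency, and its `tokenize` method (lowercase, split on non-alphanumerics) is an assumed stand-in, not StandardAnalyzer's actual grammar.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Toy model (hypothetical, not Lucene code) of why a wildcard query and a
// plain query behave differently on untokenized vs. tokenized fields.
class WildcardSketch {

    // Stand-in tokenizer (assumption): lowercase, split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Terms the index holds for the field: one whole term, or the analyzer's tokens.
    static List<String> indexedTerms(String value, boolean tokenized) {
        return tokenized ? tokenize(value) : Arrays.asList(value);
    }

    // Turn a wildcard pattern (* and ?) into an equivalent regex.
    static Pattern wildcardToRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') sb.append(".*");
            else if (c == '?') sb.append('.');
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }

    // Wildcard query: pattern is NOT analyzed and must match one whole indexed term.
    static boolean wildcardMatches(String pattern, List<String> terms) {
        Pattern p = wildcardToRegex(pattern);
        for (String term : terms) {
            if (p.matcher(term).matches()) return true;
        }
        return false;
    }

    // Plain query: text IS analyzed; all resulting tokens must be indexed terms
    // (a simplification of the phrase query QueryParser would build).
    static boolean termQueryMatches(String queryText, List<String> terms) {
        return terms.containsAll(tokenize(queryText));
    }

    public static void main(String[] args) {
        String slug = "abc-nws-hurr29";
        for (boolean tokenized : new boolean[] {false, true}) {
            List<String> terms = indexedTerms(slug, tokenized);
            System.out.println((tokenized ? "Text   " : "Keyword") + " terms=" + terms
                    + " wildcard=" + wildcardMatches("abc*hurr2*", terms)
                    + " term=" + termQueryMatches(slug, terms));
        }
    }
}
```

Running `main` reports `wildcard=true term=false` for the Keyword case and `wildcard=false term=true` for the Text case, which mirrors behaviors (1) and (2) in the question.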

Re: StandardAnalyzer .. stemming

2006-02-17 Thread Mufaddal Khumri
Thank you. I think in my case I can just use the last approach you suggested. One more question: which jar is SnowballFilter part of? Chris Hostetter wrote: : The SnowballAnalyzer seems to offer stemming. The StandardAnalyzer on the other hand has a bunch of other niceness. What is the best pr

Re: StandardAnalyzer .. stemming

2006-02-17 Thread Chris Hostetter
: The SnowballAnalyzer seems to offer stemming. The StandardAnalyzer, on the other hand, has a bunch of other niceness. What is the best practice for leveraging both these analyzers while indexing and searching? Do I chain these up somehow, and if so, what APIs do I look at for doing so? Do I i

StandardAnalyzer .. stemming

2006-02-17 Thread Mufaddal Khumri
The SnowballAnalyzer seems to offer stemming. The StandardAnalyzer, on the other hand, has a bunch of other niceness. What is the best practice for leveraging both these analyzers while indexing and searching? Do I chain these up somehow, and if so, what APIs do I look at for doing so? Do I implemen
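Hostetter's reply is truncated here, so this is not his suggested code. In Lucene itself the usual way to combine behaviors is a custom Analyzer whose tokenStream method wraps a Tokenizer in a chain of TokenFilters (for example StandardTokenizer, then LowerCaseFilter, StopFilter, and the contrib SnowballFilter), used identically at index and query time. The dependency-free sketch below models only that chaining idea; the stop list and the suffix-stripping `STEM` step are toy assumptions and do not reproduce Snowball's rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

// Model of an analyzer chain: each stage consumes the previous stage's tokens,
// the way Lucene TokenFilters wrap a TokenStream. STEM is a toy, NOT Snowball.
class AnalyzerChainSketch {

    // Tokenizer stage: split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.split("[^A-Za-z0-9]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    static final UnaryOperator<List<String>> LOWERCASE = tokens -> {
        List<String> out = new ArrayList<>();
        for (String t : tokens) out.add(t.toLowerCase());
        return out;
    };

    // Toy stop list (assumption).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "of"));

    static final UnaryOperator<List<String>> STOP = tokens -> {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!STOP_WORDS.contains(t)) out.add(t);
        }
        return out;
    };

    // Toy stemmer: strips a few English suffixes; stands in for SnowballFilter.
    static final UnaryOperator<List<String>> STEM = tokens -> {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (t.endsWith("ing") && t.length() > 5) t = t.substring(0, t.length() - 3);
            else if (t.endsWith("es") && t.length() > 4) t = t.substring(0, t.length() - 2);
            else if (t.endsWith("s") && t.length() > 3) t = t.substring(0, t.length() - 1);
            out.add(t);
        }
        return out;
    };

    // The same chain must run at index time and at query time.
    static List<String> analyze(String text) {
        return STEM.apply(STOP.apply(LOWERCASE.apply(tokenize(text))));
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Indexing of Documents"));
    }
}
```

`analyze("The Indexing of Documents")` yields `[index, document]`, and both `analyze("searches")` and `analyze("searching")` yield `[search]`, so a query and a document that differ only in inflection meet on the same term.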

Re: Custom Sorting

2006-02-17 Thread Michael D. Curtin
SOME ONE wrote: Hi, I am using MultiFieldQueryParser (Lucene 1.9) to search title and body fields in the documents. The requirement is that documents with title match should be returned before the documents with body match. Using the default scoring, title matches do come before the body matche

Re: ArrayIndexOutOfBoundsException while closing the index writer

2006-02-17 Thread Otis Gospodnetic
Hi, Sorry if I made it sound like adding the finally block would solve your problem. It will not. From a cursory look at your code, I don't see why you are passing IndexWriter INTO the indexFile() method. I think you should change that. Then, your code should look like this: IndexWriter writer =
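Otis's code is cut off after `IndexWriter writer =`, so the exact version he proposed is not visible. The sketch below shows the general pattern his advice points at, under stated assumptions: the writer is opened once by the code that owns it, per-file work only builds a document (here the hypothetical `buildDocument`), and `close()` sits in a `finally` block. `FakeWriter` is a hypothetical stand-in so the example runs without Lucene; the real `IndexWriter.close()` can throw `IOException`, which this toy omits.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

class WriterLifecycleSketch {

    // Hypothetical stand-in for IndexWriter (the real close() throws IOException).
    static class FakeWriter implements Closeable {
        final List<String> docs = new ArrayList<>();
        boolean closed = false;

        void addDocument(String doc) {
            if (closed) throw new IllegalStateException("writer already closed");
            docs.add(doc);
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Hypothetical per-file helper: builds a document and never sees the writer.
    static String buildDocument(String fileName) {
        if (fileName.isEmpty()) throw new IllegalArgumentException("empty file name");
        return "doc:" + fileName;
    }

    // Opens the writer once, adds every document, and always closes it.
    static int indexAll(Supplier<FakeWriter> openWriter, List<String> files) {
        FakeWriter writer = openWriter.get();
        try {
            for (String file : files) {
                writer.addDocument(buildDocument(file));
            }
        } finally {
            writer.close(); // runs on success and when buildDocument throws
        }
        return files.size();
    }

    public static void main(String[] args) {
        FakeWriter[] opened = new FakeWriter[1];
        int n = indexAll(() -> opened[0] = new FakeWriter(), Arrays.asList("a.txt", "b.txt"));
        System.out.println(n + " documents, closed=" + opened[0].closed);
    }
}
```

Keeping the writer out of the per-file method means there is exactly one place where it is opened and one `finally` where it is closed, which makes a double close or a leaked writer easier to rule out.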

Custom Sorting

2006-02-17 Thread SOME ONE
Hi, I am using MultiFieldQueryParser (Lucene 1.9) to search title and body fields in the documents. The requirement is that documents with title match should be returned before the documents with body match. Using the default scoring, title matches do come before the body matches. But, I also need
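The requirement is truncated after "But, I also need", so this covers only the stated part: every title match ahead of every body-only match, regardless of score. Assuming each hit can be flagged as a title match (for example by also running a title-only query and collecting its document ids; that step is not shown), a two-key sort does it. `Hit` and its fields are hypothetical names, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TitleFirstSortSketch {

    // Hypothetical hit: document id, relevance score, and whether the title matched.
    static final class Hit {
        final String id;
        final float score;
        final boolean titleMatch;

        Hit(String id, float score, boolean titleMatch) {
            this.id = id;
            this.score = score;
            this.titleMatch = titleMatch;
        }
    }

    // Primary key: title matches first (false sorts before true, so sort on !titleMatch).
    // Secondary key: descending score within each group.
    static final Comparator<Hit> TITLE_FIRST =
            Comparator.comparing((Hit h) -> !h.titleMatch)
                    .thenComparing(Comparator.comparingDouble((Hit h) -> h.score).reversed());

    static List<String> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(TITLE_FIRST);
        List<String> ids = new ArrayList<>();
        for (Hit h : sorted) {
            ids.add(h.id);
        }
        return ids;
    }
}
```

For hits A (0.9, body only), B (0.2, title), C (0.5, title) and D (0.7, body only), `rank` returns `[C, B, A, D]`: the low-scoring title matches still come first.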

Re: Speedup indexing process

2006-02-17 Thread Michael D. Curtin
Java Programmer wrote: Hi, Maybe this question is trivial, but I need to ask it. I have a problem with indexing a large number of documents, and I am looking for a better solution. The task is to index about 33GB of CSV text data (each record about 30kB). It is of course possible to index this data, but I'm not ver

Re: Vector Space Model <-> Probabilistic Model

2006-02-17 Thread Malcolm
I know of one I used for my thesis. The reference is: Fuhr, N. 2001, "Models in information retrieval", , pp. 21-50. http://portal.acm.org/citation.cfm?id=567294 I may have an electronic version. If you need it, give me an email address, as this service doesn't allow attachments. Hope this helps, Mal

Re: Vector Space Model <-> Probabilistic Model

2006-02-17 Thread Karl Koch
Does anybody here know of a paper or book chapter that specifically discusses the individual advantages and disadvantages of each of the two models? I had a look at [Grossman/Frieder 1998] ("Information Retrieval - Algorithms and Heuristics"), in which it was stated that the question which model works bes

RE: BM25 Similarity implementation

2006-02-17 Thread Trieschnigg, R.B. (Dolf)
Sorry, the image wasn't sent: http://wwwhome.cs.utwente.nl/~trieschn/bm25.PNG > -----Original Message----- > From: Trieschnigg, R.B. (Dolf) > [mailto:[EMAIL PROTECTED]] > Sent: Friday 17 February 2006 10:54 > To: java-user@lucene.apache.org > Subject: RE: BM25 Similarity implementation

Speedup indexing process

2006-02-17 Thread Java Programmer
Hi, Maybe this question is trivial, but I need to ask it. I have a problem with indexing a large number of documents, and I am looking for a better solution. The task is to index about 33GB of CSV text data (each record about 30kB). It is of course possible to index this data, but I'm not very happy with the timings (abou
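The timing figures are cut off, but the stated sizes already fix the scale of the job. A quick calculation, with the units interpretation as an assumption: 33GB at roughly 30kB per record is about 1.1 million records. `hoursAt` is a helper for plugging in whatever indexing rate you actually measure; the 100 docs/s used in `main` is an arbitrary example, not a figure from the thread.

```java
class IndexingScaleSketch {

    // Record count reading GB and kB as binary units (2^30 and 2^10 bytes).
    static long recordsBinary(long gigabytes, long kilobytesPerRecord) {
        return gigabytes * 1024L * 1024L * 1024L / (kilobytesPerRecord * 1024L);
    }

    // Record count reading GB and kB as decimal units (10^9 and 10^3 bytes).
    static long recordsDecimal(long gigabytes, long kilobytesPerRecord) {
        return gigabytes * 1_000_000_000L / (kilobytesPerRecord * 1_000L);
    }

    // Wall-clock hours at a given sustained indexing rate.
    static double hoursAt(long records, double docsPerSecond) {
        return records / docsPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        long n = recordsBinary(33, 30);
        System.out.println(n + " records (binary units), "
                + recordsDecimal(33, 30) + " records (decimal units)");
        // 100 docs/s is an arbitrary example rate, not a measured figure.
        System.out.println(hoursAt(n, 100.0) + " hours at 100 docs/s");
    }
}
```

With binary units this gives 1,153,433 records and with decimal units 1,100,000; at the example rate of 100 docs/s the binary figure takes a little over 3.2 hours.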

Re: QueryParser behaviour ..

2006-02-17 Thread sergiu gordea
Yonik Seeley wrote: From the user's point of view I think it would make sense to build a phrase query only when quotes are found in the search string. You make an interesting point, Sergiu. Your proposal would increase the expressive power of the QueryParser by allowing the constructio
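To make Sergiu's proposal concrete: QueryParser turns a bare word that the analyzer splits into several tokens (with many analyzers, a hyphenated word such as `wi-fi`) into a phrase query. Under the proposal, only text inside quotes would become a phrase. The sketch below is a hypothetical toy parser illustrating that rule; it is not QueryParser's grammar and ignores operators, fields, and boosts, and its `analyze` method is an assumed stand-in analyzer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteAwareParserSketch {

    // Either "quoted text" or a run of non-whitespace characters.
    private static final Pattern CHUNK = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    // Hypothetical clause: a phrase (only from quotes) or a single term.
    static final class Clause {
        final boolean phrase;
        final List<String> tokens;

        Clause(boolean phrase, List<String> tokens) {
            this.phrase = phrase;
            this.tokens = tokens;
        }

        @Override
        public String toString() {
            return phrase ? "\"" + String.join(" ", tokens) + "\"" : tokens.get(0);
        }
    }

    // Toy analyzer (assumption): lowercase and split on non-alphanumerics.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    static List<Clause> parse(String query) {
        List<Clause> clauses = new ArrayList<>();
        Matcher m = CHUNK.matcher(query);
        while (m.find()) {
            if (m.group(1) != null) {
                // Quoted text: the only place a phrase clause is built.
                List<String> tokens = analyze(m.group(1));
                if (!tokens.isEmpty()) clauses.add(new Clause(true, tokens));
            } else {
                // Bare word: even if the analyzer splits it, emit independent terms.
                for (String t : analyze(m.group(2))) {
                    clauses.add(new Clause(false, Collections.singletonList(t)));
                }
            }
        }
        return clauses;
    }

    // Human-readable form of the parsed clauses.
    static List<String> describe(String query) {
        List<String> out = new ArrayList<>();
        for (Clause c : parse(query)) out.add(c.toString());
        return out;
    }
}
```

`describe("\"New York\" wi-fi")` gives `["new york", wi, fi]`: one phrase clause from the quotes plus two independent term clauses, where QueryParser as described in the thread would have produced a second phrase.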

RE: BM25 Similarity implementation

2006-02-17 Thread Trieschnigg, R.B. (Dolf)
> > I would like to implement the Okapi BM25 weighting function > > using my own Similarity implementation. Unfortunately BM25 > > requires the document length in the score calculation, which > > is not provided by the Scorer. > > How do you want to measure document length? If the number of >
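For reference, the standard Okapi BM25 term weight that needs the document length is sketched below, independent of Lucene's Scorer and Similarity classes. Assumptions: the common defaults k1 = 1.2 and b = 0.75, and the idf variant log(1 + (N - n + 0.5)/(n + 0.5)), which stays positive for very frequent terms (the classic form without the `1 +` can go negative). Where `docLength` and `avgDocLength` come from (for example a per-document length array built at index time) is left open, since that is exactly the problem the message raises.

```java
class Bm25Sketch {

    // Common default parameters (assumption; tune per collection).
    static final double K1 = 1.2;
    static final double B = 0.75;

    // idf variant log(1 + (N - n + 0.5) / (n + 0.5)); N = docs in collection, n = docs with the term.
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // BM25 weight of one query term in one document.
    static double termScore(double tf, double docLength, double avgDocLength,
                            long docCount, long docFreq) {
        // Length normalisation: > 1 for longer-than-average documents, < 1 for shorter ones.
        double lengthNorm = 1.0 - B + B * (docLength / avgDocLength);
        return idf(docCount, docFreq) * (tf * (K1 + 1.0)) / (tf + K1 * lengthNorm);
    }
}
```

A document's score for a query is the sum of `termScore` over the query terms it contains. For a document of exactly average length with tf = 1, the fraction reduces to 1 and the weight equals the idf; for the same tf, shorter documents score higher than longer ones.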

Re: Highlighting text for queries with huge numbers of terms

2006-02-17 Thread markharw00d
Hi Daniel/Chris, Unfortunately, the contrib/highlighter code in source control fails to meet our needs in two ways: 1. We don't just want fragments, we want *all* of the text, with highlights in the appropriate places (although we do offer a means to display just the fragments as w
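For requirement 1 (the whole text with every hit marked, not just fragments), here is a minimal sketch that does not use the contrib Highlighter API: scan the text once, word by word, and wrap any word whose lowercase form is in a set of query terms. A HashSet lookup per word keeps the cost proportional to the text length rather than to the number of query terms, which matters for queries with huge numbers of terms. Assumptions: query terms are supplied already lowercased, and words are runs of ASCII letters and digits. Terms produced by a stemming analyzer would not match the surface words unless the same analysis is applied to each word.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FullTextHighlightSketch {

    // A "word" here is a run of ASCII letters and digits (assumption).
    private static final Pattern WORD = Pattern.compile("[A-Za-z0-9]+");

    // Returns the full text with every query-term occurrence wrapped in open/close markers.
    static String highlight(String text, Set<String> queryTerms, String open, String close) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        Matcher m = WORD.matcher(text);
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start()); // copy the text between words unchanged
            String word = m.group();
            if (queryTerms.contains(word.toLowerCase(Locale.ROOT))) {
                out.append(open).append(word).append(close);
            } else {
                out.append(word);
            }
            last = m.end();
        }
        out.append(text, last, text.length()); // trailing text after the last word
        return out.toString();
    }
}
```

Called with the terms `lucene` and `rocks`, `highlight("Lucene in Action, lucene rocks.", terms, "<B>", "</B>")` returns `<B>Lucene</B> in Action, <B>lucene</B> <B>rocks</B>.`, preserving the original casing and punctuation.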