Search oddities

2006-05-25 Thread Tim.Wright
It appears that I was confused about the way analyzers are working. I assumed that a typical analyzer would just remove hyphens and treat the phrase as a space. We're just using StandardAnalyzer. When we search (using QueryParser) for the phrase "t-mobile" (including quotes) we're getting results

Handling hyphens and other punctuation in proper nouns

2006-05-24 Thread Tim.Wright
Hi all, We're having issues searching for proper nouns (names) which have punctuation in; things like "a-blah" or "blah'x". I suspect the StandardAnalyzer is replacing the punctuation with spaces, and we get back results that just contain "blah". Any suggestions? I'm guessing we could write our
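A minimal sketch of the suspicion described above, assuming an analyzer that simply treats punctuation as whitespace. This is not StandardAnalyzer's actual grammar; the class `NaivePunctuationTokenizer` and its `tokenize` method are hypothetical and exist only to show why a search for "a-blah" could match documents that merely contain "blah".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: NOT Lucene's StandardAnalyzer.
final class NaivePunctuationTokenizer {

    /** Lowercases the text and splits it on any run of non-letter, non-digit characters. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {   // split() can yield an empty leading element
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Under this assumed behaviour the name falls apart into separate terms,
        // so the term "blah" alone is enough to produce a hit.
        System.out.println(tokenize("a-blah"));
        System.out.println(tokenize("blah'x"));
    }
}
```

If the real analyzer behaves anything like this, the proper noun is never indexed as a single term, which would explain the loose matches.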

RE: Lucene 1.9.1 Query

2006-03-22 Thread Tim.Wright
You need to create a QueryParser instance and use that instead: QueryParser qp = new QueryParser("text", new StandardAnalyzer()); Query query = qp.parse(this.searchvalue); Cheers, Tim. -Original Message- From: WATHELET Thomas [mailto:[EMAIL PROTECTED] Sent: 22 March 2006 11:25 To: java

RE: TooManyClauses exception in Lucene (1.4)

2006-03-17 Thread Tim.Wright
Thanks to everyone for the explanation. Given that RangeQuery is clearly unsuitable for our requirements, ConstantScoreRangeQuery looks ideal. However, we're building our queries (at the moment) using QueryParser. Is there any way we can get QueryParser to use a ConstantScoreRangeQuery instead of

RE: TooManyClauses exception in Lucene (1.4)

2006-03-16 Thread Tim.Wright
Ouch! Yes, we're indexing with seconds, that's almost certainly the problem. :( I had no idea that rangequery worked by enumerating every possible value, that's terrifying. We have a requirement to index data going back for about 20 years, though, and although daily resolution would be fine, this
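A rough back-of-the-envelope sketch of why resolution matters here. It assumes, as the message says, that a range query expands to one clause per distinct term in the range, and that the clause limit is 1024 (the BooleanQuery default, to the best of my knowledge). The class `RangeTermEstimate` and its method are hypothetical, and leap days are ignored.

```java
// Hypothetical illustration: upper bound on distinct date terms a range could cover.
final class RangeTermEstimate {

    // Assumed default clause limit; check the value in your Lucene version.
    static final int ASSUMED_MAX_CLAUSES = 1024;

    /** Upper bound on distinct values over the given span, ignoring leap days. */
    static long distinctValues(long years, long unitsPerDay) {
        return years * 365L * unitsPerDay;
    }

    public static void main(String[] args) {
        long seconds = distinctValues(20, 24L * 60 * 60); // second resolution
        long days = distinctValues(20, 1);                // daily resolution

        System.out.println("seconds: " + seconds + ", over limit: " + (seconds > ASSUMED_MAX_CLAUSES));
        System.out.println("days:    " + days + ", over limit: " + (days > ASSUMED_MAX_CLAUSES));
    }
}
```

These are upper bounds: the real clause count depends on how many distinct values are actually indexed inside the queried range. Under these assumptions, even daily resolution can exceed the limit when a single range spans the full 20 years.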

TooManyClauses exception in Lucene (1.4)

2006-03-16 Thread Tim.Wright
Hi, We're using queryparser to generate my queries (not ideal, and we're planning on rewriting it, but at the moment we don't have the resources to do so). We have a default field "text" which contains all of our text fields, and a "date" field which is just a string field in the format -MM-

Changing default QueryParser operator from OR to AND

2006-02-10 Thread Tim.Wright
Hi guys, If QueryParser gets a phrase with a number of words (ie: "here are words") it uses the implicit operator OR - "here OR are OR words". LIA on p94 says the operator "by default is OR", implying that there may be some way to change this. We'd really like the default to be AND. Is that pos

RE: Sorting by calculated custom score at search time

2006-01-24 Thread Tim.Wright
Nick Vincent [mailto:[EMAIL PROTECTED] wrote: [snip] > From an earlier thread discussing a calculated score based on the hit > score and the age of document I gather that TSS regenerate their indexes > to alter the document boost based on date. I need to be able to sort by > either relevance or

RE: Boost value and LUKE

2006-01-23 Thread Tim.Wright
Ah - that's useful to know. Although in that case I'd suggest that the sensible thing for Luke to do would be to either remove the boost field, or show it as "unavailable", instead of (misleadingly) displaying it as 1.0... Cheers, Tim. -Original Message- From: Andrzej Bialecki [mailto:[

RE: Boost value and LUKE

2006-01-23 Thread Tim.Wright
I'm pretty sure this is a bug or incompatibility with Luke - I'm using boosted documents, and I seem to remember that Luke reported everything as 1.0, even though my test applications showed things correctly. The boost in the final app is working fine, so the functionality of Lucene appears to be

RE: Searching for "keyword" fields using QueryParser

2005-11-24 Thread Tim.Wright
Excellent, that's exactly what I needed. Many thanks! Cheers, Tim. -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: 24 November 2005 14:51 To: java-user@lucene.apache.org Subject: Re: Searching for "keyword" fields using QueryParser Tim, The trick is to use PerFi

Searching for "keyword" fields using QueryParser

2005-11-24 Thread Tim.Wright
Hi, Our index has a large text field, and a number of "keyword" fields with things such as the publication code, article reference and so on. We're analysing using the StandardAnalyzer, which works well. Obviously the fields which are defined as Field.Keyword don't run through the analyzer. Th

RE: Deleting documents

2005-09-16 Thread Tim.Wright
If you're indexing a field like this in order to be able to use it as a reference later, you should normally index it using Field.Keyword instead of Field.Text - if you use Text, it will go through your Analyzer, which is probably what's changing the case. (I think this is right - I'm sure someone

RE: Sorting results by both score and date

2005-09-16 Thread Tim.Wright
>> What I really want to do is sort by "A * (1-(B/700))", where A is the >> score, and B is the age (in days) of the document. IE - the score is >> basically "scaled down" with date. > Maybe the TSS case study will help, though they rebuild their index > nightly and can adjust the boost based on
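The quoted formula, "A * (1-(B/700))", can be sketched in plain Java as below. This only shows the arithmetic and says nothing about how to plug it into Lucene's scoring or sorting. The class and method names are hypothetical, and clamping at zero for documents older than 700 days is an assumption that the message does not state.

```java
// Hypothetical illustration of the age-weighted score from the message.
final class AgeWeightedScore {

    // Age in days at which the weighted score reaches zero, per the quoted formula.
    static final double HORIZON_DAYS = 700.0;

    /**
     * Returns score * (1 - ageDays / 700).
     * Assumption: the factor is clamped at zero so very old documents never score negative.
     */
    static double weight(double score, double ageDays) {
        double factor = 1.0 - (ageDays / HORIZON_DAYS);
        return score * Math.max(0.0, factor);
    }

    public static void main(String[] args) {
        System.out.println(weight(0.8, 0));     // brand-new document keeps its full score
        System.out.println(weight(0.8, 350));   // halfway to the horizon
        System.out.println(weight(0.8, 1400));  // past the horizon, clamped
    }
}
```

Without the clamp, any document older than 700 days would get a negative weighted score, which is probably not what a sort on this value should see.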

RE: Sorting results by both score and date

2005-09-16 Thread Tim.Wright
Ah - the one bit of LIA I haven't read yet is the case studies section! Many thanks, I'll check it out. Sorting by multiple fields isn't quite what I want - that sorts entirely by field A, then uses field B for records where A is identical, correct? What I really want to do is sort by "A * (1-(B/

Sorting results by both score and date

2005-09-16 Thread Tim.Wright
Hi, I'm working in an industry which is fairly time sensitive, and older documents are inherently less valuable. I'd like to be able to "weight" the score of search results, so that older documents score lower. I don't just want to sort by date, though - I'd still like results to be ordered by sco