Re: Indexing with weights

2011-01-24 Thread baleksan
W Sent via BlackBerry from T-Mobile -Original Message- From: Erick Erickson Date: Mon, 24 Jan 2011 16:16:54 To: Reply-To: java-user@lucene.apache.org Subject: Re: Indexing with weights I think all you need to do is index the keywords

Re: Query parse errors for dashes in Lucene (3.0.3)

2011-01-24 Thread Yuhan Zhang
Hi Andrew, you can escape the special characters that QueryParser reserves before parsing the string: String escaped = QueryParser.escape( queryString ); Query query = queryParser.parse( escaped ); Yuhan On Mon, Jan 24, 2011 at 6:03 PM, Andrew Kane wrote: > Wow, passing the buck doesn't really
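For readers without Lucene at hand, the escaping step can be sketched in plain Java. This is only an illustration: QueryEscapeSketch is a hypothetical class, not part of Lucene, and the list of reserved characters is an assumption based on the classic query syntax. In a real application you would call QueryParser.escape itself.

```java
public class QueryEscapeSketch {
    // Assumed set of characters the classic query syntax treats as special.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    /** Prefixes every assumed query-syntax character with a backslash. */
    public static String escape(String input) {
        StringBuilder sb = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // One of the queries from this thread, ending in a dangling dash.
        String userInput = "statistics on child labor laws 1930 -";
        System.out.println(escape(userInput));
    }
}
```

Once escaped, the trailing dash is treated as a literal character rather than as the exclusion operator, so the parser no longer expects a term after it.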

Re: Query parse errors for dashes in Lucene (3.0.3)

2011-01-24 Thread Andrew Kane
Wow, passing the buck doesn't really work for me. If you think Lucene is a *database* that's fine, but in your demo code (or wherever) you should have a translation routine to convert user input into *SQL/whatever language you're using* and solve 95% of the use cases. Does such a translation rout

Re: Query parse errors for dashes in Lucene (3.0.3)

2011-01-24 Thread Erick Erickson
Yes. You're confusing an *engine* with a full-blown application. The user here is a Java programmer. I argue that guessing, which is what you're asking for, is emphatically NOT in the domain of the search *engine*, which is what Lucene is. Imagine the poor programmer trying to understand why certa

Re: Query parse errors for dashes in Lucene (3.0.3)

2011-01-24 Thread Andrew Kane
What are you talking about?! A search engine isn't a compiler with a programmer for a user and a strict syntax. The job of a search engine is to produce the best results it can *for any given input*. Am I missing something here? Andrew. On Mon, Jan 24, 2011 at 5:15 PM, Adriano Crestani wrote

Re: Indexing with weights

2011-01-24 Thread Chris Schilling
Well, maybe this trick is better? while(parseFile) { String keyword = ...; String score = ...; doc.add(new Field("keywords", keyword, Field.Store.NO, Field.Index.ANALYZED)); doc.add(new NumericField(keyword).setDoubleValue(Double.parseDouble(score))); } Then, I guess I can sort based on

Re: Indexing with weights

2011-01-24 Thread Chris Schilling
Thanks Erick, So something like: while(parseFile) { String keyword = ...; String score = ...; doc.add(new Field("keywords", keyword, Field.Store.NO, Field.Index.ANALYZED)); doc.add(new Field("scores", score, Field.Store.YES, Field.Index.NOT_ANALYZED)); } How wou

Re: Query parse errors for dashes in Lucene (3.0.3)

2011-01-24 Thread Adriano Crestani
It's a valid syntax error, since - is the exclusion operator, so the QP expects a term, phrase, parenthesis, etc. after that. On Mon, Jan 24, 2011 at 5:05 PM, Andrew Kane wrote: > Shouldn't these two queries be fine? (from TREC million query track). > Should this be entered as a bug? > > Thanks,

Query parse errors for dashes in Lucene (3.0.3)

2011-01-24 Thread Andrew Kane
Shouldn't these two queries be fine? (from TREC million query track). Should this be entered as a bug? Thanks, Andrew. Cannot parse 'statistics on child labor laws 1930 -': Encountered "" at line 1, column 37. Was expecting one of: "(" ... "*" ... ... ... ... ...

Re: Indexing with weights

2011-01-24 Thread Erick Erickson
I think all you need to do is index the keywords in one field and weights in another. Then just search on keywords and sort on weight. Note: the field you sort on should NOT be tokenized. Best Erick On Mon, Jan 24, 2011 at 4:02 PM, Chris Schilling wrote: > Hello, > > I have a bunch of text doc

Indexing with weights

2011-01-24 Thread Chris Schilling
Hello, I have a bunch of text documents formatted like so: keyword1 wt1 keyword2 wt2 keyword3 wt3 I would like to index the documents based on the keywords. When I retrieve (search) for a keyword, I would like the list of documents to be sorted by the weight for that keyword. Is there an ex
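The behaviour asked for here can be modelled outside Lucene to make the requirement concrete. The sketch below is an assumption-laden illustration, not Lucene code: WeightedKeywordSketch and its methods are hypothetical names, and an in-memory map stands in for the index. It stores one weight per (document, keyword) pair and returns the matching documents ordered by that keyword's weight, highest first.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WeightedKeywordSketch {
    // docId -> (keyword -> weight); stands in for the index.
    private final Map<String, Map<String, Double>> docs = new LinkedHashMap<>();

    /** Records the weight of one keyword in one document. */
    public void add(String docId, String keyword, double weight) {
        docs.computeIfAbsent(docId, k -> new HashMap<>()).put(keyword, weight);
    }

    /** Returns ids of documents containing the keyword, highest weight first. */
    public List<String> search(String keyword) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, Double>> e : docs.entrySet()) {
            if (e.getValue().containsKey(keyword)) {
                hits.add(e.getKey());
            }
        }
        // Sort by this keyword's weight, descending.
        hits.sort((a, b) -> Double.compare(
                docs.get(b).get(keyword), docs.get(a).get(keyword)));
        return hits;
    }
}
```

Because the weight belongs to the keyword and not to the document as a whole, a single per-document sort field would not be enough. The sort key has to be looked up per keyword, which is what the per-keyword numeric field suggested elsewhere in this thread would achieve.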

Re: Trying to extend MappingCharFilter so that it only changes a token if the length of the token matches the length of singleMatch

2011-01-24 Thread Paul Taylor
On 22/01/2011 15:43, Koji Sekiguchi wrote: (11/01/20 22:19), Paul Taylor wrote: Trying to extend MappingCharFilter so that it only changes a token if the length of the token matches the length of singleMatch in NormalizeCharMap (currently the singleMatch just has to be found in the token I want

RE: Preserving original HTML file offsets for highlighting

2011-01-24 Thread Uwe Schindler
You can use HTMLStripCharFilter that is plugged into the chain before the Tokenizer. This one strips all HTML but preserves the Token positions, so you can later highlight using those positions. This filter is currently only released through Apache Solr, but in Lucene 4.0 it's part of the analysis

Preserving original HTML file offsets for highlighting

2011-01-24 Thread Karolina Bernat
Hi all, I'm new to Lucene and have a question about indexing/highlighting of HTML files with Lucene. What I need to do is highlight the hits (terms) in the original HTML file (or get the positions of the terms/tokens in the original file). This problem has already been described by Fred Toth in t

Re: Unusual scoring

2011-01-24 Thread Dmytro Barabash
Thanks a lot, Umesh! 2011/1/24 Umesh Prasad : > DisjunctionMaxQuery may be the one you are looking for. > http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/DisjunctionMaxQuery.html > *This is useful when searching for a word in multiple fields with > different boost factors (so that

Re: Unusual scoring

2011-01-24 Thread Umesh Prasad
DisjunctionMaxQuery may be the one you are looking for. http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/DisjunctionMaxQuery.html *This is useful when searching for a word in multiple fields with different boost factors (so that the fields cannot be combined equivalently into a sing
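As a rough illustration of how a disjunction-max score combines per-field scores, here is a plain-Java sketch. It reflects my understanding of the documented behaviour (the best sub-score, plus a tie-breaker multiple of the remaining sub-scores). DisMaxScoreSketch is a hypothetical name, the sub-scores are assumed to be non-negative, and Lucene's actual implementation may differ in detail.

```java
public class DisMaxScoreSketch {
    /**
     * Combines per-field scores: the maximum, plus tieBreaker times the
     * sum of all other scores. Assumes non-negative sub-scores.
     */
    public static float score(float[] subScores, float tieBreaker) {
        float max = 0f;
        float sum = 0f;
        for (float s : subScores) {
            sum += s;
            if (s > max) {
                max = s;
            }
        }
        return max + (sum - max) * tieBreaker;
    }

    public static void main(String[] args) {
        float[] perField = {0.2f, 0.9f, 0.4f};
        // With a tie-breaker of 0 only the best-matching field counts.
        System.out.println(score(perField, 0f));
    }
}
```

With the tie-breaker set to 0, documents are ranked purely by their best-matching field, which is close to the max-over-fields ordering asked about in this thread.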

Unusual scoring

2011-01-24 Thread Dmytro Barabash
Hi! My index contains a few (really 7) fields and I need to search by all of them. I use a BooleanQuery with seven TermQueries added to it. The problem: the result must be sorted by max(field.boost), not by Lucene’s default formula. I think, for this I need to implement MySimilarity (it will simply r