-Original Message-
From: Erick Erickson
Date: Mon, 24 Jan 2011 16:16:54
To:
Reply-To: java-user@lucene.apache.org
Subject: Re: Indexing with weights
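Hi Andrew,
you can escape the special characters that QueryParser reserves in the query string:
String escaped = QueryParser.escape( queryString );
// parse() is an instance method; "contents" and analyzer are placeholders (Version.LUCENE_30 assumes Lucene 3.0)
QueryParser parser = new QueryParser( Version.LUCENE_30, "contents", analyzer );
Query query = parser.parse( escaped );
Yuhan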
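For reference, here is a minimal plain-Java sketch of what that escape step does, assuming the reserved character set of the classic (Lucene 3.x) QueryParser. QueryEscaper is a hypothetical name used only for illustration; in real code you would call QueryParser.escape() itself.

```java
/**
 * Hypothetical re-implementation of what QueryParser.escape() does:
 * prefix every character the classic query syntax reserves with a backslash.
 */
final class QueryEscaper {
    // Reserved characters of the classic (Lucene 3.x) QueryParser syntax (assumed list).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    private QueryEscaper() {}

    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // mark the next character as literal text
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

For example, escaping "laws 1930 -" gives "laws 1930 \-", so the trailing hyphen is no longer read as an exclusion operator.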
On Mon, Jan 24, 2011 at 6:03 PM, Andrew Kane wrote:
Wow, passing the buck doesn't really work for me. If you think Lucene is a
*database* that's fine, but in your demo code (or wherever) you should have
a translation routine to convert user input into *SQL/whatever language
you're using* and solve 95% of the use cases. Does such a translation
rout
Yes. You're confusing an *engine* with a full-blown application.
The user here is a Java programmer. I argue that guessing, which
is what you're asking for, is emphatically NOT in the domain of the
search *engine*, which is what Lucene is. Imagine the poor programmer
trying to understand why certa
What are you talking about?! A search engine isn't a compiler with a
programmer for a user and a strict syntax. The job of a search engine is to
produce the best results it can *for any given input*. Am I missing
something here?
Andrew.
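One way an application (rather than the engine) could provide the forgiving behaviour argued for above is to try strict parsing first and fall back to escaping the whole input. The StrictParser interface below is a hypothetical stand-in for something like a QueryParser instance; this is a sketch of the pattern, not Lucene API.

```java
import java.util.function.Function;

/**
 * Hypothetical "forgiving" wrapper: try the input as query syntax first and,
 * only if the strict parser rejects it, retry with syntax characters escaped.
 */
final class LenientParse {
    /** Stand-in for a real parser (e.g. a QueryParser instance); throws on bad syntax. */
    interface StrictParser<Q> {
        Q parse(String input) throws Exception;
    }

    private LenientParse() {}

    static <Q> Q parseOrEscape(String input, StrictParser<Q> parser,
                               Function<String, String> escaper) {
        try {
            return parser.parse(input); // honour operators when the syntax is valid
        } catch (Exception syntaxError) {
            try {
                return parser.parse(escaper.apply(input)); // treat the input as literal text
            } catch (Exception stillInvalid) {
                throw new IllegalArgumentException(
                        "Cannot parse even after escaping: " + input, stillInvalid);
            }
        }
    }
}
```

The design choice is that valid operator syntax still works for users who know it, while malformed input degrades to a plain-text search instead of an error.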
On Mon, Jan 24, 2011 at 5:15 PM, Adriano Crestani wrote:
Well, maybe this trick is better?
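while(parseFile) {
    String keyword = ...;
    String score = ...;
    doc.add(new Field("keywords", keyword, Field.Store.NO,
            Field.Index.ANALYZED));
    // one numeric field per keyword, named after the keyword; NumericField takes a double, not a String
    doc.add(new NumericField(keyword).setDoubleValue(Double.parseDouble(score)));
}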
Then, I guess I can sort based on
Thanks Erick,
So something like:
while(parseFile) {
    String keyword = ...;
    String score = ...;
    doc.add(new Field("keywords", keyword, Field.Store.NO,
            Field.Index.ANALYZED));
    doc.add(new Field("scores", score, Field.Store.YES,
            Field.Index.NOT_ANALYZED));
}
How wou
It's a genuine syntax error: - is the exclusion operator, so QueryParser
expects a term, phrase, parenthesis, etc. after it.
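If an application prefers to repair such input rather than reject it, a minimal sketch (a hypothetical helper, not part of Lucene) is to drop operator-only tokens from the end of the query before handing it to the parser. It only handles a trailing "-", "+" or "!" and deliberately leaves exclusions such as "-labor" alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical pre-parse cleanup: drops operator-only tokens ("-", "+", "!")
 * from the end of a free-text query, where nothing follows for them to modify.
 */
final class DanglingOperatorCleaner {
    private DanglingOperatorCleaner() {}

    static String clean(String query) {
        List<String> tokens = new ArrayList<>(Arrays.asList(query.trim().split("\\s+")));
        // Strip from the end only; operators earlier in the query are left untouched.
        while (!tokens.isEmpty() && isBareOperator(tokens.get(tokens.size() - 1))) {
            tokens.remove(tokens.size() - 1);
        }
        return String.join(" ", tokens);
    }

    private static boolean isBareOperator(String token) {
        return token.equals("-") || token.equals("+") || token.equals("!");
    }
}
```

With this, "statistics on child labor laws 1930 -" becomes "statistics on child labor laws 1930" before parsing.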
On Mon, Jan 24, 2011 at 5:05 PM, Andrew Kane wrote:
Shouldn't these two queries be fine? (from TREC million query track).
Should this be entered as a bug?
Thanks,
Andrew.
Cannot parse 'statistics on child labor laws 1930 -': Encountered "<EOF>" at
line 1, column 37.
Was expecting one of:
"(" ...
"*" ...
...
...
...
...
I think all you need to do is index the keywords in one field and weights in
another.
Then just search on keywords and sort on weight.
Note: the field you sort on should NOT be tokenized.
Best
Erick
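To make "search on keywords, sort on weight" concrete, here is a small in-memory model in plain Java (not Lucene; WeightedKeywordIndex, add and search are illustrative names). Each posting stores the weight one document assigned to one keyword, and search returns document ids ordered by that weight, highest first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model: keyword -> postings carrying a per-document weight. */
final class WeightedKeywordIndex {
    private static final class Posting {
        final String docId;
        final double weight;

        Posting(String docId, double weight) {
            this.docId = docId;
            this.weight = weight;
        }
    }

    private final Map<String, List<Posting>> postings = new HashMap<>();

    /** Indexes one document given its keyword -> weight pairs. */
    void add(String docId, Map<String, Double> keywordWeights) {
        for (Map.Entry<String, Double> e : keywordWeights.entrySet()) {
            postings.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                    .add(new Posting(docId, e.getValue()));
        }
    }

    /** Returns ids of documents containing the keyword, highest weight first. */
    List<String> search(String keyword) {
        List<Posting> hits = new ArrayList<>(postings.getOrDefault(keyword, Collections.emptyList()));
        hits.sort(Comparator.comparingDouble((Posting p) -> p.weight).reversed());
        List<String> ids = new ArrayList<>();
        for (Posting p : hits) {
            ids.add(p.docId);
        }
        return ids;
    }
}
```

Design note: the weight lives on the (keyword, document) pair. When a document gives different weights to different keywords, a single per-document sort value cannot represent that, which is worth keeping in mind when mapping this onto index fields.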
On Mon, Jan 24, 2011 at 4:02 PM, Chris Schilling wrote:
Hello,
I have a bunch of text documents formatted like so:
keyword1 wt1
keyword2 wt2
keyword3 wt3
I would like to index the documents based on the keywords. When I retrieve
(search) for a keyword, I would like the list of documents to be sorted by the
weight for that keyword. Is there an ex
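A minimal sketch of reading one such file into a keyword-to-weight map, assuming whitespace-separated pairs, one per line, with decimal weights. KeywordWeightReader is a hypothetical name; blank lines are skipped and malformed lines rejected.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reader for the "keyword weight" per-line format. */
final class KeywordWeightReader {
    private KeywordWeightReader() {}

    static Map<String, Double> read(List<String> lines) {
        Map<String, Double> weights = new LinkedHashMap<>(); // keeps file order
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate blank lines
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected 'keyword weight': " + line);
            }
            // NumberFormatException (an IllegalArgumentException) if the weight is not numeric
            weights.put(parts[0], Double.parseDouble(parts[1]));
        }
        return weights;
    }
}
```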
On 22/01/2011 15:43, Koji Sekiguchi wrote:
(11/01/20 22:19), Paul Taylor wrote:
Trying to extend MappingCharFilter so that it only changes a token if the length of the token matches the length of singleMatch in NormalizeCharMap (currently the singleMatch just has to be found in the token I want
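To pin down the difference being asked about, here is a plain-Java sketch (a hypothetical helper, not MappingCharFilter or NormalizeCharMap code) contrasting whole-token mapping with match-anywhere substitution. Since a CharFilter is plugged in before the Tokenizer and sees characters rather than tokens, whole-token logic like this would more naturally run after tokenization, for example in a TokenFilter; that placement is a suggestion, not something stated in the thread.

```java
import java.util.Map;

/** Hypothetical contrast between whole-token and substring mapping. */
final class WholeTokenMapper {
    private WholeTokenMapper() {}

    /** Replaces the token only when it equals a key exactly (so the lengths match). */
    static String map(String token, Map<String, String> mappings) {
        String replacement = mappings.get(token);
        return replacement != null ? replacement : token;
    }

    /** For contrast: replaces every occurrence of every key inside the token. */
    static String substringMap(String token, Map<String, String> mappings) {
        String out = token;
        for (Map.Entry<String, String> e : mappings.entrySet()) {
            out = out.replace(e.getKey(), e.getValue());
        }
        return out;
    }
}
```

With a mapping from "&" to "and", map leaves "R&B" unchanged while substringMap turns it into "RandB".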
You can use HTMLStripCharFilter that is plugged into the chain before the
Tokenizer. This one strips all HTML but preserves the Token positions, so
you can later highlight using those positions.
This filter is currently only released through Apache Solr, but in Lucene
4.0 it's part of the analysis
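A simplified plain-Java illustration of why preserving positions matters for highlighting: tags are dropped, but each kept character remembers its offset in the original markup, so a hit found in the stripped text can be mapped back to the HTML. This is a hypothetical sketch of the idea, not how HTMLStripCharFilter is implemented; it ignores entities, comments, scripts, and '>' inside attribute values.

```java
import java.util.Arrays;

/** Hypothetical tag stripper that records, for each kept char, its offset in the input. */
final class OffsetPreservingStripper {
    final String text;        // markup with tags removed
    final int[] originalPos;  // originalPos[i] = offset of text.charAt(i) in the input

    OffsetPreservingStripper(String html) {
        StringBuilder sb = new StringBuilder();
        int[] pos = new int[html.length()];
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') { inTag = true; continue; }
            if (c == '>' && inTag) { inTag = false; continue; }
            if (!inTag) {
                pos[sb.length()] = i; // remember where this character came from
                sb.append(c);
            }
        }
        text = sb.toString();
        originalPos = Arrays.copyOf(pos, sb.length());
    }

    /** Maps a [start, end) range in the stripped text back to the original markup. */
    int[] toOriginal(int start, int end) {
        return new int[] { originalPos[start], originalPos[end - 1] + 1 };
    }
}
```

For "<p>Hello <b>world</b></p>", the stripped text is "Hello world", and the hit "world" maps back to the span of the original string that contains exactly "world", which is where a highlighter would insert its markup.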
Hi all,
I'm new to Lucene and have a question about indexing/highlighting of HTML
files with Lucene.
What I need to do is highlight the hits (terms) in the original HTML file
(or get the positions of the terms/tokens in the original file).
This problem has already been described by Fred Toth in t
Thanks a lot, Umesh!
2011/1/24 Umesh Prasad:
DisjunctionMaxQuery may be one you are looking for.
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/DisjunctionMaxQuery.html
*This is useful when searching for a word in multiple fields with
different boost factors (so that the fields cannot be combined equivalently
into a sing
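The javadoc's description of DisjunctionMaxQuery scoring (the maximum sub-query score, plus a tie-breaking increment for the other matching sub-queries) amounts to the arithmetic below. DisMaxScore is a hypothetical helper for illustration; scores are assumed non-negative and only the sub-queries that matched are passed in.

```java
/** Hypothetical arithmetic sketch of DisjunctionMaxQuery-style score combination. */
final class DisMaxScore {
    private DisMaxScore() {}

    /** max(subScores) + tieBreaker * (sum of the other sub-scores). */
    static double combine(double[] subScores, double tieBreaker) {
        double max = 0.0; // assumes non-negative scores
        double sum = 0.0;
        for (double s : subScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tieBreaker * (sum - max);
    }
}
```

With tieBreaker = 0 the combined score is simply the best matching field's score, which corresponds to the max-over-fields ordering asked for in the question below.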
Hi!
My index contains a few (actually 7) fields and I need to search across all
of them. I use a BooleanQuery with seven TermQueries added to it.
Problem: the results must be sorted by max(field.boost), not by Lucene’s
default formula.
I think, for this I need to implement MySimilarity (it will simply
r