Re: How to do ranking in lucene?

2009-12-13 Thread Weiwei Wang
It's already done. You can read this book for more information: http://www-nlp.stanford.edu/IR-book/ On Mon, Dec 14, 2009 at 1:37 PM, DHIVYA M wrote: > Hi all, > > Am using lucene 2.3.1. > Can anyone suggest me how to implement ranking in lucene? If its available > how is it done? > > Thanks in ad

How to do ranking in lucene?

2009-12-13 Thread DHIVYA M
Hi all, Am using lucene 2.3.1. Can anyone suggest me how to implement ranking in lucene? If its available how is it done? Thanks in advance, Dhivya

Problem in MappingCharFilter and PatternReplaceCharFilter

2009-12-13 Thread Weiwei Wang
Hi guys, 1. How do I deal with c++c++ or c++abc using MappingCharFilter? If I use a NormalizeMap("c++","cplusplus"), the analyzed result will be cpluspluscplusplus or cplusplusabc, which is not what I want. If I use a NormalizeMap("c++","cplusplus$"), the offset will not be correct. 2. I use PatternRe

CJKAnalyzer phrase slop?

2009-12-13 Thread Jason Rutherglen
Does CJK support phrase slop? (I'm assuming no)

Offset Problem

2009-12-13 Thread Weiwei Wang
The offset is incorrect for PatternReplaceCharFilter, so the highlighting result is wrong. How can I fix it? On Mon, Dec 14, 2009 at 11:43 AM, Weiwei Wang wrote: > All Solr source downloaded, and I found PatternReplaceCharFilter is very > useful for my project. > > Thanks > > > On Mon, Dec 14, 2009 at

Re: Looking for a MappingCharFilter that accepts regular expressions

2009-12-13 Thread Weiwei Wang
All Solr source downloaded, and I found PatternReplaceCharFilter is very useful for my project. Thanks On Mon, Dec 14, 2009 at 11:14 AM, Weiwei Wang wrote: > I need the source file not the patch file, where can i download it? > > > On Mon, Dec 14, 2009 at 1:15 AM, Koji Sekiguchi wrote: > >> Koji

Re: Looking for a MappingCharFilter that accepts regular expressions

2009-12-13 Thread Weiwei Wang
I need the source file, not the patch file. Where can I download it? On Mon, Dec 14, 2009 at 1:15 AM, Koji Sekiguchi wrote: > Koji Sekiguchi wrote: > >> Paul Taylor wrote: >> >>> I want my search to treat 'No. 1' and 'No.1' the same, because in our >>> context its one token I want 'No. 1' to beco

Re: Looking for a MappingCharFilter that accepts regular expressions

2009-12-13 Thread Koji Sekiguchi
Koji Sekiguchi wrote: Paul Taylor wrote: I want my search to treat 'No. 1' and 'No.1' the same, because in our context it's one token. I want 'No. 1' to become 'No.1'. I need to do this before tokenizing because the tokenizer would split one value into two terms and one into just one term. I al

Re: Looking for a MappingCharFilter that accepts regular expressions

2009-12-13 Thread Koji Sekiguchi
Paul Taylor wrote: I want my search to treat 'No. 1' and 'No.1' the same, because in our context it's one token. I want 'No. 1' to become 'No.1'. I need to do this before tokenizing because the tokenizer would split one value into two terms and one into just one term. I already use a NormalizeM

Re: Recover special terms from StandardTokenizer

2009-12-13 Thread Weiwei Wang
Another problem: how do I deal with c++c++ or c++abc using MappingCharFilter? If I use a NormalizeMap("c++","cplusplus"), the analyzed result will be cpluspluscplusplus or cplusplusabc, which is not what I want. If I use a NormalizeMap("c++","cplusplus$"), the offset will not be correct. On Sun, Dec 13

Re: Recover special terms from StandardTokenizer

2009-12-13 Thread Weiwei Wang
Thank you very much, Uwe, I found the problem. 2009/12/13 Uwe Schindler > MappingCharFilter definitely preserves the offsets from the original > reader. > You can verify that for your case with Lucene’s testcase > TestMappingCharFilter in the source distribution @ > /src/test/org/apache/lucene/an

RE: Recover special terms from StandardTokenizer

2009-12-13 Thread Uwe Schindler
MappingCharFilter definitely preserves the offsets from the original reader. You can verify that for your case with Lucene’s testcase TestMappingCharFilter in the source distribution @ /src/test/org/apache/lucene/analysis/TestMappingCharFilter.java: public void test2to4() throws Exception { CharS

Re: Recover special terms from StandardTokenizer

2009-12-13 Thread Weiwei Wang
LowercaseCharFilter is necessary because MappingCharFilter requires a NormalizeCharMap. We lowercase the stream so that we only need to provide lowercase mappings in the NormalizeCharMap, e.g. we provide the single mapping (c++-->cplusplus) instead of both (c++-->cplusplus) and (C++-->cplusplus). C++ is only an exampl

RE: Recover special terms from StandardTokenizer

2009-12-13 Thread Uwe Schindler
I think your problem is the LowercaseCharFilter, which does not pass correctOffset() to the underlying CharFilter. Does it work better without your LowerCaseCharFilter (which is redundant, because there is already a LowerCaseFilter in the Tokenizer chain)? As you are only looking for "c++", just also a

Re: Index Update

2009-12-13 Thread Weiwei Wang
Gotcha, thanks, Mike. On Sun, Dec 13, 2009 at 7:28 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > IndexWriter is transactional: if you do the deletes & adds during a > single IndexWriter session (ie, no commit in between), then simply > call IndexWriter.rollback to get back to your s

Re: Index Update

2009-12-13 Thread Michael McCandless
IndexWriter is transactional: if you do the deletes and adds during a single IndexWriter session (i.e., with no commit in between), then simply call IndexWriter.rollback to get back to your starting index if anything goes wrong. If nothing goes wrong, call IndexWriter.commit. Outside readers, even newly
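[Editor's note] The commit-or-rollback pattern described above can be sketched in plain Java. This is only an illustration under stated assumptions: ToyWriter and all of its methods are hypothetical, in-memory stand-ins invented for this sketch and are not Lucene's IndexWriter API. Only the shape of the control flow (all deletes and adds in one session, commit on success, rollback on failure) comes from the message.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for a transactional writer; NOT Lucene's IndexWriter. */
final class ToyWriter {
    // State that outside readers can see.
    private List<String> committed = new ArrayList<>();
    // Uncommitted working state of the current session.
    private List<String> pending = new ArrayList<>();

    void deleteAll() { pending.clear(); }

    void add(String doc) {
        if (doc == null) throw new IllegalArgumentException("null document");
        pending.add(doc);
    }

    /** Publishes the pending state to readers. */
    void commit() { committed = new ArrayList<>(pending); }

    /** Discards all changes made since the last commit. */
    void rollback() { pending = new ArrayList<>(committed); }

    List<String> visibleToReaders() { return new ArrayList<>(committed); }

    /** Replaces every document, or leaves the committed state untouched if anything fails. */
    static boolean replaceAll(ToyWriter writer, List<String> newDocs) {
        try {
            writer.deleteAll();                 // no commit between the delete...
            for (String doc : newDocs) {
                writer.add(doc);                // ...and the adds
            }
            writer.commit();                    // nothing went wrong: publish
            return true;
        } catch (RuntimeException e) {
            writer.rollback();                  // something went wrong: restore starting state
            return false;
        }
    }
}
```

In this toy model a failed replaceAll leaves visibleToReaders() exactly as it was before the call, which is the property the message attributes to a single uncommitted IndexWriter session.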

Re: Recover special terms from StandardTokenizer

2009-12-13 Thread Weiwei Wang
Thanks, Uwe. Maybe I was not very clear. My situation is like this: Analyzer: NormalizeCharMap RECOVERY_MAP = new NormalizeCharMap(); RECOVERY_MAP.add("c++","cplusplus$"); CharFilter filter = new LowercaseCharFilter(reader); filter = new RosaMappingCharFilter(RECOVERY_MAP,filter);

RE: Recover special terms from StandardTokenizer

2009-12-13 Thread Uwe Schindler
MappingCharFilter preserves the offsets in the stream *before* filtering. So if you store the original string (without c++ replaced) in a stored field, you can highlight using the given offsets. The highlighter must again use the same analyzer, or use FastVectorHighlighter. - Uwe Schindler H.-H.
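[Editor's note] What "preserving the offsets before filtering" means can be shown with a small plain-Java toy model. This is a sketch under stated assumptions, not Lucene's MappingCharFilter: the class OffsetMappingSketch and its methods correctStart and correctEnd are hypothetical names made up here. The idea it illustrates is that each character of the filtered text remembers which position of the original text it came from, so token offsets computed on the filtered text can be mapped back for highlighting the stored original string.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of an offset-preserving character mapping; NOT Lucene's MappingCharFilter. */
final class OffsetMappingSketch {
    private final StringBuilder filtered = new StringBuilder();
    // For every char of the filtered text: the offset in the original text it came from.
    private final List<Integer> originalOffsets = new ArrayList<>();

    OffsetMappingSketch(String original, String from, String to) {
        if (from.isEmpty()) throw new IllegalArgumentException("empty match string");
        int i = 0;
        while (i < original.length()) {
            if (original.startsWith(from, i)) {
                for (int k = 0; k < to.length(); k++) {
                    filtered.append(to.charAt(k));
                    // Replacement chars beyond the match length point at the last matched char.
                    originalOffsets.add(i + Math.min(k, from.length() - 1));
                }
                i += from.length();
            } else {
                filtered.append(original.charAt(i));
                originalOffsets.add(i);
                i++;
            }
        }
    }

    String filtered() { return filtered.toString(); }

    /** Maps a token start offset in the filtered text back to the original text. */
    int correctStart(int filteredStart) { return originalOffsets.get(filteredStart); }

    /** Maps an exclusive token end offset in the filtered text back to the original text. */
    int correctEnd(int filteredEnd) {
        if (filteredEnd == 0) return 0;
        return originalOffsets.get(filteredEnd - 1) + 1;
    }
}
```

For example, with the original text "i like c++ code" and the mapping c++ to cplusplus, the token cplusplus occupies offsets 7 to 16 in the filtered text, and the corrected offsets 7 to 10 select "c++" in the original string, which is the span a highlighter would need to mark.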

Re: Recover special terms from StandardTokenizer

2009-12-13 Thread Weiwei Wang
Problem solved. Now another problem has come up. As I want to use Highlighter in my system, the token offset is incorrect after the MappingCharFilter is used. Koji, do you know how to fix the offset problem? On Sun, Dec 13, 2009 at 11:12 AM, Weiwei Wang wrote: > I use Luke to check the result and