Re: FastVectorHighlighter StringIndexOutofBounds bug

2011-05-22 Thread Weiwei Wang
1. Source string: 7
2. WhitespaceTokenizer + EGramTokenFilter
3. FastVectorHighlighter
4. Debug info: subInfos=(777((8,11))777((5,8))777((2,5)))/3.0(2,102); srcIndex is not correctly computed in the second iteration of the outer for-loop.
2011/5/23 Weiwei Wang > the following code ha
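The debug info lists the subInfos in descending offset order: (8,11), then (5,8), then (2,5). If a fragment builder walks them in that order and advances a single srcIndex, the second iteration asks for a substring whose end lies before its start, which throws StringIndexOutOfBoundsException. Below is a minimal plain-Java sketch of one guard, assuming the hits can simply be sorted by start offset before the fragment is assembled. Hit and makeFragment are hypothetical stand-ins, not Lucene's WeightedFragInfo/SubInfo classes, and this is not necessarily how the bug was fixed upstream.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified fragment builder; not Lucene's FragmentsBuilder API.
final class FragmentSketch {

    // One matched term, with offsets relative to the whole field value (end exclusive).
    static final class Hit {
        final int start;
        final int end;

        Hit(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // src is the fragment text; s is the offset of src's first char within the field value.
    static String makeFragment(String src, int s, List<Hit> hits, String preTag, String postTag) {
        // Sort by start offset so srcIndex only ever moves forward.
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> Integer.compare(a.start, b.start));

        StringBuilder fragment = new StringBuilder();
        int srcIndex = 0;
        for (Hit hit : sorted) {
            int start = hit.start - s;
            int end = hit.end - s;
            if (start < srcIndex) {
                continue; // overlaps the previous hit; skip it rather than read backwards
            }
            fragment.append(src, srcIndex, start)
                    .append(preTag)
                    .append(src, start, end)
                    .append(postTag);
            srcIndex = end;
        }
        fragment.append(src, srcIndex, src.length());
        return fragment.toString();
    }
}
```

Called with the hits in the reported order (8,11), (5,8), (2,5) on "xx777777777yy" with s = 0, the sketch returns xx<b>777</b><b>777</b><b>777</b>yy instead of throwing.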

FastVectorHighlighter StringIndexOutofBounds bug

2011-05-22 Thread Weiwei Wang
The following code throws a StringIndexOutOfBoundsException when multiple matched terms need highlighting:
private String makeFragment( WeightedFragInfo fragInfo, String src, int s, String[] preTags, String[] postTags, Encoder encoder ){
StringBuilder fragment = new StringBuilder();
int srcIn

Re: lucene PorterStemmer

2011-04-24 Thread Weiwei Wang
Search for "lucene snowball". On 2011-4-24 at 7:14 PM, "wenlei zhou" wrote: > > Hi, guys > I want to stem some English text. > For example: > > String stemTerm(String term){ > ... > } > > We can get the stemmed term of the input term. > > Does any one knows how to use lucene to achieve this target? > > Sincerely

ICU Chinese words

2011-04-23 Thread Weiwei Wang
Hi, all, I'm working on a Chinese contact search project, and I need to transform Chinese words into their Pinyin form, e.g. 中国 -> zhongguo. The problem I've encountered is that some Chinese characters have more than one transliteration, e.g. 贾 -> jia, 贾 -> gu, ... I already used the ICUTransformFilter
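One common way to handle characters with several readings is to expand a word into every combination of readings and index all of them, for example as extra tokens at the same position. Below is a minimal plain-Java sketch of the expansion step only, assuming a hypothetical reading table supplied by the caller. It does not use ICU; a real table (or an ICU transliterator) would have to provide the candidate readings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper: expands a word into all combinations of per-character readings.
final class PinyinVariants {

    // readings maps a character to its candidate Pinyin readings; unknown characters are kept as-is.
    // Iterates over UTF-16 chars, so it assumes BMP characters (true for common CJK).
    static List<String> expand(String word, Map<Character, List<String>> readings) {
        List<String> results = Collections.singletonList("");
        for (char c : word.toCharArray()) {
            List<String> options =
                    readings.getOrDefault(c, Collections.singletonList(String.valueOf(c)));
            List<String> next = new ArrayList<>();
            for (String prefix : results) {
                for (String reading : options) {
                    next.add(prefix + reading);
                }
            }
            results = next;
        }
        return results;
    }
}
```

With a table mapping 贾 to [jia, gu] and 明 to [ming], expand("贾明") yields jiaming and guming. The number of variants grows multiplicatively, which is usually acceptable for short contact names.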

Re: How to make search distributed and scalable

2011-04-21 Thread Weiwei Wang
> > > On 4/19/2011 12:11 AM, Weiwei Wang wrote: > >> Hi, buddies, >> I'm reading something about solr and elastic-search, the thing i have >> been curious is how to make search engine distributed(using something like >> hadoop?). >> >>

How to make search distributed and scalable

2011-04-18 Thread Weiwei Wang
de me with more material to help me understand the distributed architecture of such open-source search engines? Thanks~ --- Weiwei Wang gtalk: ww.wang...@gmail.com

Re: [ANN] Luke 1.0.0 for Lucene 3.0

2009-12-26 Thread Weiwei Wang
d Unix, System Integration > http://www.sigram.com Contact: info at sigram dot com > > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > -- Weiwei Wang Alex Wang 王巍巍 Room 403, Men

Re: How to get an Apache public license

2009-12-24 Thread Weiwei Wang
help - if you own the > copyright and patent rights to the work, you may release it under the > Apache > license by putting the code somewhere for people to get (sourceforge, > google-code, github, your own server, wherever), and include the > above-mentioned license file. > >

Re: Document category identification in query

2009-12-21 Thread Weiwei Wang
' name. > But you could have a try. If you have interesting results, let me know. > Thanks! > > That's my opinion. > > 2009/12/21 Alex > > > Hi ! > > > > Many thanks to both of you for your suggestions and answers! > > > > What Weiwei Wang

Re: Search for "similar documents" with a query as another document ?

2009-12-21 Thread Weiwei Wang
ery. > > > > > > > > > -- > > Spica Framework: http://code.google.com/p/spica > > http://www.twitter.com/pcdinh > > http://groups.google.com/group/phpvietnam > > > -- Weiwei Wang Alex Wang 王巍巍 Room 403, Mengmin Wei Building Computer Science Department Gulou Campus of Nanjing University Nanjing, P.R.China, 210093 Homepage: http://cs.nju.edu.cn/rl/weiweiwang

Re: How to do alias(Pinyin) search in Lucene

2009-12-16 Thread Weiwei Wang
> tips complimentary to Robert's suggestions.. > > http://wiki.apache.org/lucene-java/ImproveSearchingSpeed > > HTH > Erick > > 2009/12/15 Weiwei Wang > > > Thanks Robert, a lot is learned from

Re: Regex Help

2009-12-15 Thread Weiwei Wang
Same reason: I do not know which delimiter regex to use :-( On Wed, Dec 16, 2009 at 3:02 PM, Ghazal Gharooni wrote: > Hello, > Why don't you use String Tokenizer for splitting the result? > > > On Tue, Dec 15, 2009 at 9:45 PM, Weiwei Wang wrote: > > I want to split

Regex Help

2009-12-15 Thread Weiwei Wang
I want to split this parsed result string:
name:"zhong guo" name:friend server:172.16.65.79
into
name:"zhong guo"
name:friend
server:172.16.65.79
How can I write a regular expression to do that? I'm not familiar with regex, and the few patterns I tried didn't work
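Rather than searching for a delimiter, it can be easier to match the clauses themselves: a field name, a colon, then either a double-quoted phrase or a run of non-space characters. Below is a minimal Java sketch, assuming values never contain escaped quotes and field names contain no spaces; ClauseSplitter is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls field:value clauses out of a parsed query string.
final class ClauseSplitter {

    // field name, a colon, then either a "double-quoted phrase" or a run of non-space characters
    private static final Pattern CLAUSE = Pattern.compile("\\S+?:(?:\"[^\"]*\"|\\S+)");

    static List<String> split(String parsed) {
        List<String> clauses = new ArrayList<>();
        Matcher m = CLAUSE.matcher(parsed);
        while (m.find()) {
            clauses.add(m.group());
        }
        return clauses;
    }

    public static void main(String[] args) {
        // Prints each clause on its own line.
        for (String clause : split("name:\"zhong guo\" name:friend server:172.16.65.79")) {
            System.out.println(clause);
        }
    }
}
```

The main method prints name:"zhong guo", name:friend and server:172.16.65.79 on three separate lines. Matching the clauses sidesteps the problem that the space inside "zhong guo" looks exactly like the separating spaces.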

Re: How to do alias(Pinyin) search in Lucene

2009-12-15 Thread Weiwei Wang
mes a necessity if you are using expensive objects like this. > > 2009/12/15 Weiwei Wang > > > Finally, i make it run, however, it works so slow > > > > 2009/12/15 Weiwei Wang > > > > > got it, thanks, Robert > > > > > > > > > On T

Re: How to do alias(Pinyin) search in Lucene

2009-12-15 Thread Weiwei Wang
Finally, I got it running; however, it is very slow. 2009/12/15 Weiwei Wang > got it, thanks, Robert > > > On Tue, Dec 15, 2009 at 10:19 PM, Robert Muir wrote: > >> if you have lucene 2.9 or 3.0 source code, just run patch -p0 < >> /path/to/LUCENE-XXYY.patch fr

Re: Document category identification in query

2009-12-15 Thread Weiwei Wang
rom ? > > Thanks. > -- Weiwei Wang

Re: How to do alias(Pinyin) search in Lucene

2009-12-15 Thread Weiwei Wang
to integrate some of this soon for a future > release. > > On Tue, Dec 15, 2009 at 9:13 AM, Weiwei Wang wrote: > > > Yes, i found the patch file LUCENE-1488.patch and there's no icu > directory > > in my dowloaded contrib directory. > > > > I'm

Re: How to do alias(Pinyin) search in Lucene

2009-12-15 Thread Weiwei Wang
e, Dec 15, 2009 at 9:51 PM, Robert Muir wrote: > look at the latest patch file attached to the issue, it should work with > lucene 2.9 or greater (I think) > > 2009/12/15 Weiwei Wang > > > where can i find the source code? > > > > On Tue, Dec 15, 2009 at 9:40 P

Re: How to do alias(Pinyin) search in Lucene

2009-12-15 Thread Weiwei Wang
("中国")); >ICUTransformFilter filter = new ICUTransformFilter(tokenizer, pinyin); >assertTokenStreamContents(filter, new String[] { "zhongguo" } ); > > > 2009/12/15 Weiwei Wang > > > Hi, guys, > > I'm implementing a search engine bas

How to do alias(Pinyin) search in Lucene

2009-12-15 Thread Weiwei Wang
guo the results will include documents containing "中国" or even Chinese. Does anybody here know how to achieve this? -- Weiwei Wang

Re: I need to implement a TokenFilter to break season07

2009-12-15 Thread Weiwei Wang
WordDelimiterFilter is implemented with the old API, in which nextToken is called. On Tue, Dec 15, 2009 at 7:17 PM, Koji Sekiguchi wrote: > Weiwei Wang wrote: > >> Hi, all >> I currently need a TokenFilter to break token season07 into two >> tokens >> season 07

Re: Lucene Analyzer that can handle C++ vs C#

2009-12-15 Thread Weiwei Wang
us" is > not > >> really an option for us. > >> -- Weiwei Wang

Re: Search correction

2009-12-15 Thread Weiwei Wang
nce > measurement function to build up a suggestion array from the > suggestion index. > I have no comparison how good it works with chinese text though. > simon > > On Tue, Dec 15, 2009 at 8:08 AM, Weiwei Wang wrote: > > Hi, all, > > Most of us should have so

I need to implement a TokenFilter to break season07

2009-12-15 Thread Weiwei Wang
Hi, all, I currently need a TokenFilter to break the token season07 into two tokens, season and 07. I tried PatternReplaceCharFilter to replace "season07" with "season 07"; however, the offsets are not correct for highlighting. For this reason, I want to implement a TokenFilter, but I do not know how to
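A filter that splits the token itself, rather than rewriting characters before tokenization, can derive each piece's offsets from the parent token's start offset, so highlighting still points at the original text. Below is a minimal plain-Java sketch of that splitting logic, assuming the token text has the same length as its original span (no earlier filter changed its length). In a Lucene 2.9/3.0 TokenFilter the resulting values would be written to the token's OffsetAttribute, which is not shown here; AlphaNumSplitter and Piece are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a token such as season07 at letter/digit boundaries.
final class AlphaNumSplitter {

    // One sub-token with offsets into the original text (end is exclusive).
    static final class Piece {
        final String text;
        final int start;
        final int end;

        Piece(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + "[" + start + "," + end + ")";
        }
    }

    // term is the token text; tokenStart is the token's start offset in the original text.
    static List<Piece> split(String term, int tokenStart) {
        List<Piece> pieces = new ArrayList<>();
        int runStart = 0;
        for (int i = 1; i <= term.length(); i++) {
            boolean boundary = i == term.length()
                    || Character.isDigit(term.charAt(i)) != Character.isDigit(term.charAt(i - 1));
            if (boundary) {
                pieces.add(new Piece(term.substring(runStart, i), tokenStart + runStart, tokenStart + i));
                runStart = i;
            }
        }
        return pieces;
    }
}
```

For season07 starting at offset 10, the sketch yields season[10,16) and 07[16,18), both of which still map onto the original characters.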

Re: Offset Problem

2009-12-14 Thread Weiwei Wang
Got it, thanks, Koji. On Mon, Dec 14, 2009 at 9:19 PM, Koji Sekiguchi wrote: > Weiwei Wang wrote: > >> The offset is incorrect for PatternReplaceCharFilter so the hilighting >> result is wrong. >> >> How to fix it? >> >> >> > As I noted in the

Re: SnowballAnalyzer and StopAnalyzer.ENGLISH_STOP_WORDS_SET ?

2009-12-14 Thread Weiwei Wang
o I can > use the non deprecated stopwords from StopAnalyzer, or am I barking up the > wrong tree here? > > Thanks > Nick

Re: How to do ranking in lucene?

2009-12-13 Thread Weiwei Wang
done? > > Thanks in advance, > Dhivya -- Weiwei Wang

Problem in MappingCharFilter and PatternReplaceCharFilter

2009-12-13 Thread Weiwei Wang
he offset will not be correct.
2. I use PatternReplaceCharFilter to turn season07 (or something similar) into season 07, using the pattern below:
String pattern = "([\\p{Alpha}]+)(\\d+)";
String replace = "1,{ },2";
The offsets are not correct for highlighting. -- Weiwe

Offset Problem

2009-12-13 Thread Weiwei Wang
The offsets produced by PatternReplaceCharFilter are incorrect, so the highlighting result is wrong. How can I fix it? On Mon, Dec 14, 2009 at 11:43 AM, Weiwei Wang wrote: > All solr souce downloaded, and I found PatternReplaceCharFilter is very > useful for my project. > > Thanks > > >

Re: Looking for a MappingCharFilter that accepts regular expressions

2009-12-13 Thread Weiwei Wang
I've downloaded all of the Solr source, and I found PatternReplaceCharFilter very useful for my project. Thanks On Mon, Dec 14, 2009 at 11:14 AM, Weiwei Wang wrote: > I need the source file not the patch file, where can i download it? > > > On Mon, Dec 14, 2009 at 1:15 AM, Koji Sek

Re: Looking for a MappingCharFilter that accepts regular expressions

2009-12-13 Thread Weiwei Wang
Hi Paul, >> >> I've written a patch for this kind of purpose. See: >> >> https://issues.apache.org/jira/browse/SOLR-1653 >> >> Koji >> >> Oops. I thou

Re: Recover special terms from StandardTokenizer

2009-12-13 Thread Weiwei Wang
fset will not be correct. On Sun, Dec 13, 2009 at 9:38 PM, Weiwei Wang wrote: > Thank you very much, Uwe, I found the problem. > > > 2009/12/13 Uwe Schindler > >> MappingCharFilter definitely preserves the offsets from the original >> reader. >> Yo can verify

Re: Recover special terms from StandardTokenizer

2009-12-13 Thread Weiwei Wang
arFilter calls and verify > where your offsets change. > > I cannot help more. > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -Original Message- > > From: Weiwei Wa

Re: Recover special terms from StandardTokenizer

2009-12-13 Thread Weiwei Wang
g for "C++" and > you > are done, why lowercasing all because of one char? > > And what's RosaMappingCharFilter? A pink one? *g* > > - > Uwe Schindler

Re: Index Update

2009-12-13 Thread Weiwei Wang
to get back to your starting index, if > anything goes wrong. If nothing goes wrong, call IndexWriter.commit. > > Outside readers, even newly opened ones, will see either the starting > state, or the finished state, and nothing in between. > > Mike > > On Sat, Dec 12, 2009

Re: Recover special terms from StandardTokenizer

2009-12-13 Thread Weiwei Wang
> Uwe Schindler > > -Original Message- > > From: Weiwei Wang [mailto:ww.wang...@gmail.com] > > Sent: Sunday, December 13, 2009 11:43 AM > > To: java-user

Re: Recover special terms from StandardTokenizer

2009-12-13 Thread Weiwei Wang
Problem solved. Now another problem has come up: I want to use Highlighter in my system, but the token offsets are incorrect after MappingCharFilter is applied. Koji, do you know how to fix the offset problem? On Sun, Dec 13, 2009 at 11:12 AM, Weiwei Wang wrote: > I use Luke to check the result
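Offsets go wrong when a char filter changes the length of the text but the tokenizer's offsets are not mapped back to the original input; Lucene's CharStream/CharFilter classes provide a correctOffset(int) method for that mapping. Below is a minimal plain-Java sketch of the bookkeeping for a single literal replacement such as c++ -> cplusplus. MappedText and its methods are hypothetical, not the Lucene API, and offsets that fall inside a replacement can only be approximated; token start and end offsets map exactly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of offset bookkeeping for a length-changing char filter.
final class MappedText {
    final String text; // filtered text that a tokenizer would see
    // Each entry is {filteredOffset, cumulativeDiff}: from filteredOffset on,
    // originalOffset = filteredOffset + cumulativeDiff. Entries are in ascending order.
    private final List<int[]> corrections;

    private MappedText(String text, List<int[]> corrections) {
        this.text = text;
        this.corrections = corrections;
    }

    // Replace every occurrence of 'from' with 'to', recording offset corrections.
    static MappedText replaceAll(String input, String from, String to) {
        if (from.isEmpty()) {
            throw new IllegalArgumentException("from must not be empty");
        }
        StringBuilder out = new StringBuilder();
        List<int[]> corrections = new ArrayList<>();
        int diff = 0;
        int i = 0;
        while (i < input.length()) {
            if (input.startsWith(from, i)) {
                out.append(to);
                diff += from.length() - to.length();
                corrections.add(new int[] {out.length(), diff});
                i += from.length();
            } else {
                out.append(input.charAt(i));
                i++;
            }
        }
        return new MappedText(out.toString(), corrections);
    }

    // Map an offset in the filtered text back to the original input.
    int correctOffset(int filteredOffset) {
        int diff = 0;
        for (int[] c : corrections) {
            if (c[0] > filteredOffset) {
                break;
            }
            diff = c[1];
        }
        return filteredOffset + diff;
    }
}
```

For NashQ/c++.test, the token cplusplus spans [6, 15) in the filtered text, and correctOffset maps that back to [6, 9), the position of c++ in the original string, so a highlighter would mark the right characters.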

Re: Index Update

2009-12-12 Thread Weiwei Wang
; but there's no reason to hold off on committing changes in > your indexwriter for a long time and risk losing the changes > should your program abort. > > HTH > Erick > > On Sat, Dec 12, 2009 at 9:08 AM, Weiwei Wang wrote: > > > Gotcha, tThanks, Uwe > > >

Re: Recover special terms from StandardTokenizer

2009-12-12 Thread Weiwei Wang
I used Luke to check the result and found that only c exists as a term; no cplusplus was found in the index. On Sun, Dec 13, 2009 at 10:34 AM, Weiwei Wang wrote: > Thanks, Koji, I followed your advice and change my analyzer as shown below: > NormalizeCharMap RECOVERY_MAP = new NormalizeC

Re: Recover special terms from StandardTokenizer

2009-12-12 Thread Weiwei Wang
rch c++ on the built index, nothing is returned. I do not know what's wrong with my code. Waiting for your reply. On Fri, Dec 11, 2009 at 9:43 PM, Weiwei Wang wrote: > Thanks, Koji > > > On Fri, Dec 11, 2009 at 7:59 PM, Koji Sekiguchi wrote: > >> MappingChar

Re: Index Update

2009-12-12 Thread Weiwei Wang
> -Original Message- > > From: Weiwei Wang [mailto:ww.wang...@gmail.com] > > Sent: Saturday, December 12, 2009 1:25 PM > > To: java-user@lucene.apache.org > > Subj

Tell me the difference

2009-12-12 Thread Weiwei Wang
Hi, all, suppose I want to index the string NashQ/c++.test, and I use the following procedure to process it:
NormalizeCharMap RECOVERY_MAP = new NormalizeCharMap();
RECOVERY_MAP.add("c++","cplusplus$");
CharFilter filter = new LowercaseCharFilter(reader);//LowercaseCharFilter, see the

Index Update

2009-12-12 Thread Weiwei Wang
service. I want to use only one copy of the index and do both updating and searching on it. Will the updating process affect the search results while it is running? -- Weiwei Wang

Re: Recover special terms from StandardTokenizer

2009-12-11 Thread Weiwei Wang
s expressed here belong to everybody, the opinions to me. The >> distinction is yours to draw >> >> >> On Fri, Dec 11, 2009 at 11:00 AM, Weiwei Wang >> wrote: >> >> >> >>> Hi, all, >>>I designed a ftp search engine based o

Recover special terms from StandardTokenizer

2009-12-10 Thread Weiwei Wang
Hi, all, I designed an FTP search engine based on Lucene and made a few modifications to StandardTokenizer. My problem is that StandardTokenizer tokenizes C++ as c, and I want to recover C++ from StandardTokenizer's TokenStream. What should I do? -- Weiwei Wang

Re: Index file compatibility and a migration plan to lucene 3

2009-12-09 Thread Weiwei Wang
er first is a better idea for most > situations, but I wanted to know if writer first would work for me for 2.3.1 > -> 3.0.0. > > -Original Message- > From: Weiwei Wang [mailto:ww.wang...@gmail.com] > Sent: 09 December 2009 12:21 > To: java-user@lucene.apache.org > Subject:

Re: Index file compatibility and a migration plan to lucene 3

2009-12-09 Thread Weiwei Wang
to > http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formats > > Does this sound like a good plan? > > -- Weiwei Wang

Re: How to do date range search in 3.0

2009-12-08 Thread Weiwei Wang
> -Original Message- > > From: Weiwei Wang [mailto:ww.wang...@gmail.com] > > Sent: Wednesday, December 09, 2009 4:23 AM > > To: java-user@lucene.apache.org > > Subject: HOW to do date range se

How to do date range search in 3.0

2009-12-08 Thread Weiwei Wang
fore in version 2.4.1 (I'm updating my project from version 2.4.1 to Lucene 3.0.0). Could anybody here offer me a solution? -- Weiwei Wang
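Range queries on dates operate on indexed terms, so a common approach is to index dates as fixed-width, most-significant-field-first strings (Lucene's DateTools.dateToString produces such strings) and search them with a term range query, which in 3.0 is TermRangeQuery; NumericField with NumericRangeQuery is an alternative. Below is a minimal plain-Java sketch of why the string encoding works, assuming day resolution and UTC. DateTerms is a hypothetical name and no Lucene classes are used.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper illustrating sortable date terms; not Lucene's DateTools.
final class DateTerms {

    // Fixed width, most significant field first, UTC: string order equals time order.
    static String toTerm(Date date) {
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMdd");
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(date);
    }

    // An inclusive lexicographic range check, the comparison a term range query relies on.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }
}
```

Because every term has the same width and the year comes first, "20091208" falls between "20091201" and "20091231" as a plain string comparison, so no date parsing is needed at query time.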

Re: org.apache.lucene.search.RemoteSearchable missing

2009-12-08 Thread Weiwei Wang
Thanks. There are so many changes in 3.0.0. On Tue, Dec 8, 2009 at 8:32 PM, Mark Miller wrote: > Weiwei Wang wrote: > > Hi,all, > > I can't not find this class in the downloaded jar and I can't figure > out > > what's wrong. > > Does anybody here know h

org.apache.lucene.search.RemoteSearchable missing

2009-12-08 Thread Weiwei Wang
Hi, all, I can't find this class in the downloaded jar, and I can't figure out what's wrong. Does anybody here know how to fix it? -- Weiwei Wang

Re: About Lucene ...

2009-12-03 Thread Weiwei Wang
Stefan > > -- Weiwei Wang

Re: Need help regarding implementation of autosuggest using jquery

2009-12-01 Thread Weiwei Wang
> -Original Message- > From: DHIVYA M [mailto:dhivyakrishna...@yahoo.com] > Sent: Wednesday, November 25, 2009 8:06 AM > To: java user