1. source string: 7
2. WhitespaceTokenizer + EGramTokenFilter
3. FastVectorHighlighter,
4. debug info: subInfos=(777((8,11))777((5,8))777((2,5)))/3.0(2,102),
srcIndex is not computed correctly in the second iteration of the outer for-loop.
2011/5/23 Weiwei Wang
The following code throws a StringIndexOutOfBoundsException when multiple matched
terms need to be highlighted:
private String makeFragment( WeightedFragInfo fragInfo, String src, int s,
    String[] preTags, String[] postTags, Encoder encoder ){
  StringBuilder fragment = new StringBuilder();
  int srcIn
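The subInfos in the debug output above are in descending offset order ((8,11) before (5,8) before (2,5)). A fragment builder that walks the source string left to right and assumes ascending offsets ends up calling substring with a start before the position it has already copied past, which raises exactly this exception. Below is a minimal, self-contained sketch of that failure mode, not the actual FastVectorHighlighter source; the class name, method name and example string are made up:

// Illustrative sketch only, not the Lucene source: shows why match offsets
// that are not in ascending order break a left-to-right fragment builder.
public class SrcIndexSketch {
  static String highlight(String src, int[][] offsets, String pre, String post) {
    StringBuilder fragment = new StringBuilder();
    int srcIndex = 0; // end of the region of src already copied into the fragment
    for (int[] o : offsets) {
      // If o[0] < srcIndex (offsets out of order), this substring call throws
      // StringIndexOutOfBoundsException.
      fragment.append(src.substring(srcIndex, o[0]))
              .append(pre).append(src.substring(o[0], o[1])).append(post);
      srcIndex = o[1];
    }
    return fragment.append(src.substring(srcIndex)).toString();
  }

  public static void main(String[] args) {
    String src = "ab777777777"; // made-up stand-in for the real source string
    // Ascending offsets work fine:
    System.out.println(highlight(src, new int[][] { {2, 5}, {5, 8}, {8, 11} }, "<b>", "</b>"));
    // The order from the debug output, {8,11},{5,8},{2,5}, throws instead of highlighting.
  }
}

Sorting the offsets (or the subInfos they come from) by start offset before building the fragment avoids the exception.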
Search for "lucene snowball".
On 2011-4-24 at 7:14 PM, "wenlei zhou" wrote:
>
> Hi, guys
> I want to stem some English text.
> For example:
>
> String stemTerm(String term){
> ...
> }
>
> We can get the stemmed term of the input term.
>
> Does anyone know how to use Lucene to achieve this?
>
> Sincerely
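Following the "search lucene snowball" hint above, one way to get a stemTerm() of the shape asked for is to call the Snowball stemmer that ships alongside Lucene's contrib analyzers directly. A minimal sketch; the org.tartarus.snowball classes come from the Snowball jar bundled with Lucene, and the exact artifact depends on your Lucene version:

import org.tartarus.snowball.ext.EnglishStemmer;

public class StemExample {
  static String stemTerm(String term) {
    EnglishStemmer stemmer = new EnglishStemmer();
    stemmer.setCurrent(term.toLowerCase()); // Snowball expects lowercase input
    stemmer.stem();
    return stemmer.getCurrent();
  }

  public static void main(String[] args) {
    System.out.println(stemTerm("running")); // run
    System.out.println(stemTerm("cats"));    // cat
  }
}

Inside an analysis chain the more usual route is PorterStemFilter (or SnowballFilter) applied after a lowercase filter, so that indexed terms and query terms are stemmed the same way.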
Hi, all,
I'm working on a Chinese contact-search project, and I need to transform
Chinese words into their Pinyin form, e.g.
中国 --> zhongguo
The problem I've encountered is that some Chinese characters have more than
one transliteration, e.g. 贾 -> jia, 贾 -> gu, ...
I already used the ICUTransformFilter
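For reference, the Han-to-Pinyin step itself can be done with ICUTransformFilter from the Lucene ICU contrib module, much like the snippet quoted further down in this thread. A minimal sketch; the transliterator rule string (stripping tone marks and inter-syllable spaces) is an assumption, package locations differ across Lucene versions, and this does not by itself resolve multi-reading characters such as 贾:

import java.io.StringReader;
import org.apache.lucene.analysis.KeywordTokenizer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.icu.ICUTransformFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import com.ibm.icu.text.Transliterator;

public class PinyinSketch {
  public static void main(String[] args) throws Exception {
    // Han -> Latin, then drop tone marks and spaces (assumed rule string).
    Transliterator pinyin = Transliterator.getInstance(
        "Han-Latin; NFD; [[:NonspacingMark:][:Space:]] Remove");
    TokenStream ts = new ICUTransformFilter(
        new KeywordTokenizer(new StringReader("中国")), pinyin);
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    ts.reset();
    while (ts.incrementToken()) {
      System.out.println(term.toString()); // expected: zhongguo
    }
    ts.end();
    ts.close();
  }
}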
>
>
> On 4/19/2011 12:11 AM, Weiwei Wang wrote:
>
>> Hi, buddies,
>> I'm reading about Solr and Elasticsearch; the thing I've been curious
>> about is how to make a search engine distributed (using something like
>> Hadoop?).
>>
>>
Could you provide me with more material to help me understand the
distributed architecture of such open-source search engines?
Thanks~
---
Weiwei Wang
gtalk: ww.wang...@gmail.com
help - if you own the
> copyright and patent rights to the work, you may release it under the
> Apache
> license by putting the code somewhere for people to get (sourceforge,
> google-code, github, your own server, wherever), and include the
> above-mentioned license file.
>
>
' name.
> But you could have a try. If you have interesting results, let me know.
> Thanks!
>
> That's my opinion.
>
> 2009/12/21 Alex
>
> > Hi !
> >
> > Many thanks to both of you for your suggestions and answers!
> >
> > What Weiwei Wang
ery.
> > >
> >
> >
> > --
> > Spica Framework: http://code.google.com/p/spica
> > http://www.twitter.com/pcdinh
> > http://groups.google.com/group/phpvietnam
> >
>
--
Weiwei Wang
Alex Wang
王巍巍
Room 403, Mengmin Wei Building
Computer Science Department
Gulou Campus of Nanjing University
Nanjing, P.R.China, 210093
Homepage: http://cs.nju.edu.cn/rl/weiweiwang
> tips complementary to Robert's suggestions...
>
> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
>
> HTH
> Erick
>
> 2009/12/15 Weiwei Wang
>
> > Thanks Robert, a lot is learned from
Same reason: I do not know which delimiter regex to use :-(
On Wed, Dec 16, 2009 at 3:02 PM, Ghazal Gharooni wrote:
> Hello,
> Why don't you use String Tokenizer for splitting the result?
>
>
> On Tue, Dec 15, 2009 at 9:45 PM, Weiwei Wang wrote:
> > I want to split
I want to split this parsed result string: name:"zhong guo" name:friend
server:172.16.65.79
into
name:"zhong guo"
name:friend
server:172.16.65.79
How can I write a regular expression to do that?
I'm not familiar with regexes and have tried a few patterns that didn't work.
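One way to keep the quoted phrase together is to match the field:value clauses rather than split on whitespace. A minimal sketch; the pattern is a suggestion of mine, not something from this thread:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitClauses {
  public static void main(String[] args) {
    String parsed = "name:\"zhong guo\" name:friend server:172.16.65.79";
    // field:value, where the value is either a quoted phrase or a bare token
    Pattern clause = Pattern.compile("\\w+:(\"[^\"]*\"|\\S+)");
    Matcher m = clause.matcher(parsed);
    while (m.find()) {
      System.out.println(m.group());
    }
    // Prints:
    // name:"zhong guo"
    // name:friend
    // server:172.16.65.79
  }
}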
becomes a necessity if you are using expensive objects like this.
>
> 2009/12/15 Weiwei Wang
>
> > Finally, i make it run, however, it works so slow
> >
> > 2009/12/15 Weiwei Wang
> >
> > > got it, thanks, Robert
> > >
> > >
> > > On T
Finally, I made it run; however, it works very slowly.
2009/12/15 Weiwei Wang
> got it, thanks, Robert
>
>
> On Tue, Dec 15, 2009 at 10:19 PM, Robert Muir wrote:
>
>> if you have lucene 2.9 or 3.0 source code, just run patch -p0 <
>> /path/to/LUCENE-XXYY.patch fr
rom ?
>
> Thanks.
>
to integrate some of this soon for a future
> release.
>
> On Tue, Dec 15, 2009 at 9:13 AM, Weiwei Wang wrote:
>
> > Yes, I found the patch file LUCENE-1488.patch, and there's no icu directory
> > in my downloaded contrib directory.
> >
> > I'm
On Tue, Dec 15, 2009 at 9:51 PM, Robert Muir wrote:
> look at the latest patch file attached to the issue, it should work with
> lucene 2.9 or greater (I think)
>
> 2009/12/15 Weiwei Wang
>
> > where can i find the source code?
> >
> > On Tue, Dec 15, 2009 at 9:40 P
("中国"));
>ICUTransformFilter filter = new ICUTransformFilter(tokenizer, pinyin);
>assertTokenStreamContents(filter, new String[] { "zhongguo" } );
>
>
> 2009/12/15 Weiwei Wang
>
> > Hi, guys,
> > I'm implementing a search engine bas
guo the results will
include documents containing "中国" or even Chinese
Does anybody here know how to achieve this?
WordDelimiterFilter is implemented against an old version of the API, where
nextToken is called.
On Tue, Dec 15, 2009 at 7:17 PM, Koji Sekiguchi wrote:
> Weiwei Wang wrote:
>
>> Hi, all
>> I currently need a TokenFilter to break token season07 into two
>> tokens
>> season 07
us" is
> not
> >> really an option for us.
> >>
> >
distance
> measurement function to build up a suggestion array from the
> suggestion index.
> I have no comparison of how well it works with Chinese text, though.
> simon
>
> On Tue, Dec 15, 2009 at 8:08 AM, Weiwei Wang wrote:
> > Hi, all,
> > Most of us should have so
Hi, all,
I currently need a TokenFilter to break the token season07 into two tokens:
season and 07.
I tried PatternReplaceCharFilter to replace "season07" with "season 07";
however, the offsets are then wrong for highlighting. For this reason, I want
to implement a TokenFilter, but I do not know how to
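A minimal sketch of such a filter, not the implementation discussed in this thread: it splits a letters-then-digits token into two tokens and gives each part its slice of the original offsets, which is what keeps highlighting correct. Class and field names are my own, and the attribute classes are from the post-2.9 TokenStream API; a current WordDelimiterFilter covers this case as well:

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

public final class AlphaDigitSplitFilter extends TokenFilter {
  private static final Pattern ALPHA_DIGIT = Pattern.compile("(\\p{Alpha}+)(\\d+)");

  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
  private final PositionIncrementAttribute posIncAtt = addAttribute(PositionIncrementAttribute.class);

  private String pendingDigits;          // the "07" part waiting to be emitted
  private int pendingStart, pendingEnd;  // its offsets in the original text

  public AlphaDigitSplitFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (pendingDigits != null) {
      // Emit the digit part as its own token with its own offsets.
      termAtt.setEmpty().append(pendingDigits);
      offsetAtt.setOffset(pendingStart, pendingEnd);
      posIncAtt.setPositionIncrement(1);
      pendingDigits = null;
      return true;
    }
    if (!input.incrementToken()) {
      return false;
    }
    Matcher m = ALPHA_DIGIT.matcher(termAtt);
    if (m.matches()) {
      int start = offsetAtt.startOffset();
      // Stash the digit part, then shrink the current token to the alpha part.
      pendingDigits = m.group(2);
      pendingStart = start + m.end(1);
      pendingEnd = offsetAtt.endOffset();
      offsetAtt.setOffset(start, start + m.end(1));
      termAtt.setLength(m.end(1));
    }
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    pendingDigits = null;
  }
}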
got it, thanks, Koji
On Mon, Dec 14, 2009 at 9:19 PM, Koji Sekiguchi wrote:
> Weiwei Wang wrote:
>
>> The offsets are incorrect with PatternReplaceCharFilter, so the highlighting
>> result is wrong.
>>
>> How to fix it?
>>
>>
>>
> As I noted in the
so I can
> use the non-deprecated stop words from StopAnalyzer, or am I barking up the
> wrong tree here?
>
> Thanks
> Nick
>
done?
>
> Thanks in advance,
> Dhivya
>
the offset will not be correct.
2. I use PatternReplaceCharFilter to turn season07 (or something like it)
into season 07; the pattern I use is shown below:
String pattern = "([\\p{Alpha}]+)(\\d+)";
String replace = "1,{ },2";
The offsets are not correct for highlighting.
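As a point of reference, with java.util.regex that replacement would normally be written with group references ($1, $2) rather than "1,{ },2"; whether the CharFilter accepts the same replacement syntax depends on the version, so this sketch only shows the plain regex behaviour:

import java.util.regex.Pattern;

public class SeasonSplit {
  public static void main(String[] args) {
    Pattern p = Pattern.compile("(\\p{Alpha}+)(\\d+)");
    // Group references put the matched letters and digits back with a space between them.
    System.out.println(p.matcher("season07").replaceAll("$1 $2")); // season 07
  }
}

Even with a correct replacement, a CharFilter that inserts characters changes the length of the text it emits, so token offsets drift unless the filter also maps them back; that is the offset problem discussed in this thread.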
The offsets are incorrect with PatternReplaceCharFilter, so the highlighting
result is wrong.
How can I fix it?
On Mon, Dec 14, 2009 at 11:43 AM, Weiwei Wang wrote:
> I downloaded all the Solr source and found that PatternReplaceCharFilter is
> very useful for my project.
>
> Thanks
>
>
>
I downloaded all the Solr source and found that PatternReplaceCharFilter is very
useful for my project.
Thanks
On Mon, Dec 14, 2009 at 11:14 AM, Weiwei Wang wrote:
> I need the source file, not the patch file; where can I download it?
>
>
> On Mon, Dec 14, 2009 at 1:15 AM, Koji Sek
>>> Hi Paul,
>>
>> I've written a patch for this kind of purpose. See:
>>
>> https://issues.apache.org/jira/browse/SOLR-1653
>>
>> Koji
>>
>> Oops. I thou
offset will not be correct.
On Sun, Dec 13, 2009 at 9:38 PM, Weiwei Wang wrote:
> Thank you very much, Uwe, I found the problem.
>
>
> 2009/12/13 Uwe Schindler
>
>> MappingCharFilter definitely preserves the offsets from the original
>> reader.
>> You can verify
CharFilter calls and verify
> where your offsets change.
>
> I cannot help more.
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -Original Message-
> > From: Weiwei Wa
g for "C++" and
> you
> are done, why lowercasing all because of one char?
>
> And what's RosaMappingCharFilter? A pink one? *g*
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> &
call IndexWriter.rollback to get back to your starting index if
> anything goes wrong. If nothing goes wrong, call IndexWriter.commit.
>
> Outside readers, even newly opened ones, will see either the starting
> state, or the finished state, and nothing in between.
>
> Mike
>
> On Sat, Dec 12, 2009
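A minimal sketch of that commit-or-rollback pattern, assuming the Lucene 3.0 IndexWriter API; the index path and the addDocuments() helper are placeholders:

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class AtomicUpdateSketch {
  public static void main(String[] args) throws Exception {
    IndexWriter writer = new IndexWriter(
        FSDirectory.open(new File("/path/to/index")),
        new StandardAnalyzer(Version.LUCENE_30),
        IndexWriter.MaxFieldLength.UNLIMITED);
    try {
      addDocuments(writer);  // the whole batch of updates
      writer.commit();       // readers opened from now on see the finished state
      writer.close();
    } catch (Exception e) {
      writer.rollback();     // discards everything since the last commit and closes the writer
      throw e;
    }
  }

  static void addDocuments(IndexWriter writer) throws Exception {
    Document doc = new Document();
    // ... add fields to doc here ...
    writer.addDocument(doc);
  }
}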
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -Original Message-
> > From: Weiwei Wang [mailto:ww.wang...@gmail.com]
> > Sent: Sunday, December 13, 2009 11:43 AM
> > To: java-user
Problem solved. Now another problem has come up.
I want to use the Highlighter in my system, but the token offsets are incorrect
after the MappingCharFilter is used.
Koji, do you know how to fix the offset problem?
On Sun, Dec 13, 2009 at 11:12 AM, Weiwei Wang wrote:
> I use Luke to check the result
but there's no reason to hold off on committing changes in
> your IndexWriter for a long time and risk losing the changes
> should your program abort.
>
> HTH
> Erick
>
> On Sat, Dec 12, 2009 at 9:08 AM, Weiwei Wang wrote:
>
> > Gotcha, thanks, Uwe
> >
>
I used Luke to check the result and found that only c exists as a term; no
cplusplus is found in the index.
On Sun, Dec 13, 2009 at 10:34 AM, Weiwei Wang wrote:
> Thanks, Koji, I followed your advice and changed my analyzer as shown below:
> NormalizeCharMap RECOVERY_MAP = new NormalizeC
search c++ on the built
index, nothing is returned.
I do not know what's wrong with my code. Waiting for your reply.
On Fri, Dec 11, 2009 at 9:43 PM, Weiwei Wang wrote:
> Thanks, Koji
>
>
> On Fri, Dec 11, 2009 at 7:59 PM, Koji Sekiguchi wrote:
>
>> MappingChar
-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -Original Message-
> > From: Weiwei Wang [mailto:ww.wang...@gmail.com]
> > Sent: Saturday, December 12, 2009 1:25 PM
> > To: java-user@lucene.apache.org
> > Subj
Hi, all,
Suppose I want to index the string NashQ/c++.test, and I use the
following procedure to do the processing:
NormalizeCharMap RECOVERY_MAP = new NormalizeCharMap();
RECOVERY_MAP.add("c++","cplusplus$");
CharFilter filter = new LowercaseCharFilter(reader);//LowercaseCharFilter,
see the
service.
I want to use only one copy of the index and do updating and searching on the
same index. Will the updating process affect the search results while the
updating process is running?
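One common answer, sketched below as an assumption rather than something stated in this thread: the writer can keep updating while searches run against an already-open reader; that reader only sees the changes after it is reopened past a commit. reopen() is the Lucene 2.9/3.0 API; later versions renamed it:

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

public class ReaderRefreshSketch {
  private volatile IndexSearcher searcher;

  public ReaderRefreshSketch(IndexReader initialReader) {
    this.searcher = new IndexSearcher(initialReader);
  }

  // Call periodically, or after IndexWriter.commit, to pick up new documents.
  public synchronized void maybeRefresh() throws Exception {
    IndexReader current = searcher.getIndexReader();
    IndexReader newer = current.reopen();  // returns the same reader if nothing changed
    if (newer != current) {
      searcher = new IndexSearcher(newer);
      current.close();                     // safe once no in-flight search still uses it
    }
  }

  public IndexSearcher getSearcher() {
    return searcher;
  }
}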
s expressed here belong to everybody, the opinions to me. The
>> distinction is yours to draw
>>
>>
>> On Fri, Dec 11, 2009 at 11:00 AM, Weiwei Wang
>> wrote:
>>
>>
>>
>>> Hi, all,
>>>I designed a ftp search engine based o
Hi, all,
I designed an FTP search engine based on Lucene and made a few
modifications to the StandardTokenizer.
My problem is:
C++ is tokenized as c by StandardTokenizer, and I want to recover it from
the TokenStream that StandardTokenizer produces.
What should I do?
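The approach this thread converges on is to rewrite c++ before the tokenizer ever sees it, using MappingCharFilter. A minimal sketch using the Lucene 2.9/3.0 class names (later versions moved and renamed some of them); the field name and example text are made up, and the same analyzer must be used at query time so a query for c++ is rewritten the same way:

import java.io.Reader;
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharReader;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.MappingCharFilter;
import org.apache.lucene.analysis.NormalizeCharMap;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.Version;

public class CPlusPlusAnalyzer extends Analyzer {
  private static final NormalizeCharMap MAP = new NormalizeCharMap();
  static {
    MAP.add("c++", "cplusplus");  // the char-level match is exact, so add both cases
    MAP.add("C++", "cplusplus");
  }

  @Override
  public TokenStream tokenStream(String fieldName, Reader reader) {
    Reader mapped = new MappingCharFilter(MAP, CharReader.get(reader));
    TokenStream ts = new StandardTokenizer(Version.LUCENE_30, mapped);
    return new LowerCaseFilter(ts);
  }

  public static void main(String[] args) throws Exception {
    Analyzer a = new CPlusPlusAnalyzer();
    TokenStream ts = a.tokenStream("content", new StringReader("how to index c++ code"));
    TermAttribute term = ts.addAttribute(TermAttribute.class);
    while (ts.incrementToken()) {
      System.out.println(term.term());  // how, to, index, cplusplus, code
    }
    ts.close();
  }
}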
Reader first is a better idea for most
> situations, but I wanted to know if writer first would work for me for 2.3.1
> -> 3.0.0.
>
> -Original Message-
> From: Weiwei Wang [mailto:ww.wang...@gmail.com]
> Sent: 09 December 2009 12:21
> To: java-user@lucene.apache.org
> Subject:
to
> http://wiki.apache.org/lucene-java/BackwardsCompatibility#File_Formats
>
> Does this sound like a good plan?
>
>
3 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -Original Message-
> > From: Weiwei Wang [mailto:ww.wang...@gmail.com]
> > Sent: Wednesday, December 09, 2009 4:23 AM
> > To: java-user@lucene.apache.org
> > Subject: HOW to do date range se
before in version 2.4.1 (I'm updating my project
from version 2.4.1 to Lucene 3.0.0).
Could anybody here offer me a solution?
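Given the subject above ("HOW to do date range se..."), the usual Lucene 3.0 answer is to index the date as a numeric long and query it with NumericRangeQuery, which replaces the term-range-over-date-strings style used in 2.4.x. A minimal sketch; the field name and the epoch-milliseconds encoding are assumptions:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.search.NumericRangeQuery;

public class DateRangeSketch {
  static Document makeDoc(long modifiedMillis) {
    Document doc = new Document();
    doc.add(new NumericField("modified").setLongValue(modifiedMillis));
    return doc;
  }

  static NumericRangeQuery<Long> lastWeek(long nowMillis) {
    long weekMillis = 7L * 24 * 60 * 60 * 1000;
    // Inclusive range [now - one week, now]
    return NumericRangeQuery.newLongRange("modified", nowMillis - weekMillis, nowMillis, true, true);
  }
}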
Thanks, so many changes in 3.0.0
On Tue, Dec 8, 2009 at 8:32 PM, Mark Miller wrote:
> Weiwei Wang wrote:
> > Hi,all,
> > I can't not find this class in the downloaded jar and I can't figure
> out
> > what's wrong.
> > Does anybody here know h
Hi, all,
I cannot find this class in the downloaded jar, and I can't figure out
what's wrong.
Does anybody here know how to fix it?
Stefan
>
> > >>> > eMail: u...@thetaphi.de
> > >>> >
> > >>> >
> > >>> > > -Original Message-
> > >>> > > From: DHIVYA M [mailto:dhivyakrishna...@yahoo.com]
> > >>> > > Sent: Wednesday, November 25, 2009 8:06 AM
> > >>> > > To: java user
>