Re: StopWords problem

2007-12-26 Thread Liaqat Ali
Grant Ingersoll wrote: On Dec 26, 2007, at 5:24 PM, Liaqat Ali wrote: No, at this level I am not using any stemming technique. I

Re: StopWords problem

2007-12-26 Thread Grant Ingersoll
On Dec 26, 2007, at 5:24 PM, Liaqat Ali wrote: No, at this level I am not using any stemming technique. I am just trying to elim

Re: StopWords problem

2007-12-26 Thread Liaqat Ali
Grant Ingersoll wrote: Are you altering (stemming) the token before it gets to the StopFilter? On Dec 26, 2007, at 5:08 PM, Liaqat Ali wrote: Doron Cohen wrote: On Dec 26, 2007 10:33 PM, Liaqat Ali <[EMAIL PROTECTED]> wrote: Using javac -encoding UTF-8 still raises the following error. ur

Re: StopWords problem

2007-12-26 Thread Grant Ingersoll
Are you altering (stemming) the token before it gets to the StopFilter? On Dec 26, 2007, at 5:08 PM, Liaqat Ali wrote: Doron Cohen wrote: On Dec 26, 2007 10:33 PM, Liaqat Ali <[EMAIL PROTECTED]> wrote: Using javac -encoding UTF-8 still raises the following error. urduIndexer.java : illegal

Re: StopWords problem

2007-12-26 Thread Liaqat Ali
Doron Cohen wrote: On Dec 26, 2007 10:33 PM, Liaqat Ali <[EMAIL PROTECTED]> wrote: Using javac -encoding UTF-8 still raises the following error. urduIndexer.java : illegal character: \65279 ? ^ 1 error What am I doing wrong? If you have the stop-words in a file, say one word in a l

Re: StopWords problem

2007-12-26 Thread Doron Cohen
On Dec 26, 2007 10:33 PM, Liaqat Ali <[EMAIL PROTECTED]> wrote: > Using javac -encoding UTF-8 still raises the following error. > > urduIndexer.java : illegal character: \65279 > ? > ^ > 1 error > > What am I doing wrong? > If you have the stop-words in a file, say one word in a line, they can be
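The preview above is cut off, so how the reply goes on to load the file is not shown here. As a rough sketch of the general idea only (stop words kept in a UTF-8 text file, one per line, outside the Java source), the following uses just the JDK. The class name `StopWordLoader` and both method names are hypothetical, and the resulting array is only assumed to be usable wherever a `String[]` of stop words is expected.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: turns a one-word-per-line
// stop-word list into a String[].
final class StopWordLoader {

    // Parses already-decoded file content into an array of stop words.
    static String[] parse(String content) {
        // Drop a leading byte-order mark (U+FEFF) if the editor wrote one.
        if (content.startsWith("\uFEFF")) {
            content = content.substring(1);
        }
        List<String> words = new ArrayList<>();
        // Split on any line break sequence and skip blank lines.
        for (String line : content.split("\\R")) {
            String word = line.trim();
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words.toArray(new String[0]);
    }

    // Reads the whole file as UTF-8 and parses it.
    static String[] loadFromFile(String path) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(path));
        return parse(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

Keeping the words in a data file means the Java source can stay plain ASCII, so the compiler never sees the Urdu characters or a byte-order mark.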

Re: StopWords problem

2007-12-26 Thread 李晓峰
Or you can save it as "Unicode" and compile with javac -encoding Unicode; this way you can still use Notepad. Liaqat Ali wrote: 李晓峰 wrote: "javac" has an option "-encoding", which tells the compiler the encoding the input source file is using, this will probably solve the problem. or you can try the unicode e

Re: StopWords problem

2007-12-26 Thread 李晓峰
It's Notepad. It adds a byte-order mark (BOM; in this case 65279, or 0xFEFF) at the front of your file, which javac does not recognize, for reasons not quite clear to me. Here is the bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058 It won't be fixed, so try to eliminate the BOM before co
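One way to eliminate the mark, sketched here under the assumption that the file was saved as UTF-8 (where U+FEFF is encoded as the three bytes EF BB BF), is to rewrite the file without those leading bytes. The class name `BomStripper` is hypothetical and the code uses only the JDK.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical utility: removes a leading UTF-8 byte-order mark from a file.
final class BomStripper {

    // True if the data starts with the UTF-8 encoding of U+FEFF (EF BB BF).
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    // Returns the data without the leading BOM, or unchanged if there is none.
    static byte[] strip(byte[] data) {
        return hasUtf8Bom(data) ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    // Usage: java BomStripper <file>  (rewrites the file in place)
    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java BomStripper <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        Files.write(file, strip(Files.readAllBytes(file)));
    }
}
```

A file saved in Notepad as "Unicode" (UTF-16) starts with a different two-byte mark, which this sketch deliberately leaves alone.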

Re: StopWords problem

2007-12-26 Thread Liaqat Ali
李晓峰 wrote: "javac" has an option "-encoding", which tells the compiler the encoding the input source file is using; this will probably solve the problem. Or you can try the Unicode escape: \u, then you can save it in ANSI; it is hard for humans to read, though. Or use an IDE; Eclipse is a good choic

Re: StopWords problem

2007-12-26 Thread 李晓峰
"javac" has an option "-encoding", which tells the compiler the encoding the input source file is using, this will probably solve the problem. or you can try the unicode escape: \u, then you can save it in ANSI, had for human to read though. or use an IDE, eclipse is a good choice, you can se

StopWords problem

2007-12-26 Thread Liaqat Ali
Hi, Doron Cohen. Thanks for your reply, but I am facing a small problem here. Since I am using Notepad for coding, in which format should the file be saved? public static final String[] URDU_STOP_WORDS = { "کے" ,"کی" ,"سے" ,"کا" ,"کو" ,"ہے" }; Analyzer analyzer = new StandardAnalyzer(

Analyzer choices for indexing and searching multiple languages

2007-12-26 Thread Jay Hill
I'm working on a project where we will be searching across several languages with a single query. There will be different categories which will include different groups of languages to search (e.g., category "a": English, French, Spanish; category "b": Spanish, Portuguese, Italian, etc.) Originally I

Re: Pagination ...

2007-12-26 Thread Mike Richmond
You might want to take a look at Solr (http://lucene.apache.org/solr/). You could either use Solr directly, or see how they implement paging. --Mike On Dec 26, 2007 12:12 PM, Zhou Qi <[EMAIL PROTECTED]> wrote: > Using the search function for pagination will carry out unnecessary index > searc

Re: Index lucene database details.

2007-12-26 Thread Zhou Qi
Hi Grant, The exception is thrown from a Java native method: "Failed to merge indexes, java.lang.OutOfMemoryError: Java heap space". (I have set -Xmx1024m in the JVM.) I guess it is similar to the problem that appeared in a previous thread ( http://www.nabble.com/Index-merge-and-java-heap-space-tt50

Re: Pagination ...

2007-12-26 Thread Zhou Qi
Using the search function for pagination will carry out unnecessary index searches when you go to the previous or next page. Generally, most of the information need (e.g. 80%) can be satisfied by the first 100 documents (20%). In Lucene, the number of returned documents is set to 100 for the sake of speed. I am not
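A minimal sketch of the arithmetic behind that idea, assuming the ranked results of a single search are kept around and each page is cut from them instead of searching again. The class `PageWindow` and its method are hypothetical names for illustration and are not part of Lucene.

```java
// Hypothetical value type: the half-open range [start, end) of result
// positions that belong to one page.
final class PageWindow {
    final int start; // index of the first hit on the page (inclusive)
    final int end;   // index just past the last hit on the page (exclusive)

    private PageWindow(int start, int end) {
        this.start = start;
        this.end = end;
    }

    // Computes the window for a zero-based page number, clamped to totalHits.
    static PageWindow of(int totalHits, int pageSize, int page) {
        if (totalHits < 0 || pageSize <= 0 || page < 0) {
            throw new IllegalArgumentException("invalid paging arguments");
        }
        long from = (long) page * pageSize; // long avoids int overflow
        int start = (int) Math.min(from, totalHits);
        int end = (int) Math.min(from + pageSize, totalHits);
        return new PageWindow(start, end);
    }
}
```

With the first 100 hits cached, moving to the previous or next page only changes `page`; the caller iterates from `start` to `end` over the cached results, and an empty window (`start == end`) signals a page past the end.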

RE: Pagination ...

2007-12-26 Thread Dragon Fly
Any advice on this? Thanks. > From: [EMAIL PROTECTED] > To: java-user@lucene.apache.org > Subject: Pagination ... > Date: Sat, 22 Dec 2007 10:19:30 -0500 > > > Hi, > > What is the most efficient way to do pagination in Lucene? I have always done > the following because this "flavor" of the se

Re: optimize Index problem

2007-12-26 Thread Grant Ingersoll
Great, I think. Except now I am really interested in the exception and what settings you had for heap size, Lucene version, etc. On Dec 23, 2007, at 11:03 PM, Zhou Qi wrote: Hi, Grant After I adjusted the mergeFactor of IndexWriter from 1000 to 100, it worked. Thank you. 22 Dec 20

Re: Index lucene database details.

2007-12-26 Thread Grant Ingersoll
I would start at the Lucene Java home page (http://lucene.apache.org/java ) and dig in from there. There are a number of good docs on Scoring and the IR model used (Boolean plus Vector.) From there, I would dig into the javadocs and whip up some example code that indexes a set of tokens an

Re: Modifying StopAnalyzer

2007-12-26 Thread Doron Cohen
> > can we modify the StopAnalyzer to insert Stop Words of > another language, instead of English, like Urdu given below: > public static final String[] URDU_STOP_WORDS = { "پر", "کا", "کی", "کو" }; > "new StandardAnalyzer(URDU_STOP_WORDS)" should work. Regards, Doron

Modifying StopAnalyzer

2007-12-26 Thread Liaqat Ali
Hi Erick, thanks for your suggestion; putting the declaration of the StringBuffer variable sb inside the for loop works well. I want to ask another question: can we modify the StopAnalyzer to use stop words of another language instead of English, such as the Urdu ones given below: public stati