Grant Ingersoll wrote:
On Dec 26, 2007, at 5:24 PM, Liaqat Ali wrote:
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
No, at this level I am not using any stemming technique. I am just
trying to elim
Grant Ingersoll wrote:
Are you altering (stemming) the token before it gets to the StopFilter?
On Dec 26, 2007, at 5:08 PM, Liaqat Ali wrote:
Doron Cohen wrote:
On Dec 26, 2007 10:33 PM, Liaqat Ali <[EMAIL PROTECTED]> wrote:
Using javac -encoding UTF-8 still raises the following error.
urduIndexer.java : illegal
Doron Cohen wrote:
On Dec 26, 2007 10:33 PM, Liaqat Ali <[EMAIL PROTECTED]> wrote:
> Using javac -encoding UTF-8 still raises the following error.
>
> urduIndexer.java : illegal character: \65279
> ?
> ^
> 1 error
>
> What I am doing wrong?
>
If you have the stop-words in a file, say one word in a line,
they can be
Or you can save it as "Unicode" and compile with javac -encoding Unicode.
This way you can still use Notepad.
Liaqat Ali wrote:
李晓峰 wrote:
"javac" has an option "-encoding", which tells the compiler the
encoding the input source file is using, this will probably solve the
problem.
or you can try the unicode e
It's Notepad.
It adds a byte-order mark (BOM; in this case 65279, or 0xFEFF) to the front of
your file, which javac does not recognize, for reasons not quite clear to me.
Here is the bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058
It won't be fixed, so try to eliminate the BOM before co
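One way to eliminate the BOM without changing editors is to strip it from the saved file before compiling. The following is only a sketch, assuming the file was saved as UTF-8 (where U+FEFF appears as the three bytes EF BB BF); the class and method names are hypothetical and are not part of javac or Lucene.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper: removes a leading UTF-8 byte-order mark. */
final class BomStripper {

    /** Returns the bytes without a leading EF BB BF, or the same array if there is none. */
    static byte[] stripUtf8Bom(byte[] data) {
        boolean hasBom = data.length >= 3
                && data[0] == (byte) 0xEF
                && data[1] == (byte) 0xBB
                && data[2] == (byte) 0xBF;
        return hasBom ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    /** Rewrites the file only when it starts with a BOM; returns true if it was changed. */
    static boolean stripInPlace(Path file) throws IOException {
        byte[] original = Files.readAllBytes(file);
        byte[] stripped = stripUtf8Bom(original);
        if (stripped.length == original.length) {
            return false; // no BOM found, leave the file untouched
        }
        Files.write(file, stripped);
        return true;
    }
}
```

Calling BomStripper.stripInPlace on the path of urduIndexer.java before running javac -encoding UTF-8 should leave the Urdu string literals intact and remove only the three leading bytes.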
李晓峰 wrote:
"javac" has an option, "-encoding", which tells the compiler which
encoding the input source file uses. This will probably solve the
problem.
Or you can try the Unicode escape, \u, and then save the file as
ANSI; it is hard for humans to read, though.
Or use an IDE. Eclipse is a good choice; you can se
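To illustrate the Unicode-escape route, here is a rough sketch of a converter that turns non-ASCII characters into \uXXXX form so that the source file can be saved in a plain single-byte encoding. The class and method names are made up for this example; the JDK of that era also shipped a native2ascii tool for the same job.

```java
/** Hypothetical helper: rewrites non-ASCII characters as Java Unicode escapes. */
final class UnicodeEscaper {

    /** ASCII characters pass through unchanged; every other UTF-16 unit is escaped. */
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                out.append(c);
            } else {
                // Four lowercase hex digits per UTF-16 code unit.
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }
}
```

Each Urdu stop word can be passed through escapeNonAscii once and the result pasted into the array literal; the compiled strings are identical to the originals.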
Hi, Doron Cohen
Thanks for your reply, but I am facing a small problem here. Since I
am using Notepad for coding, in which format should the file be saved?
public static final String[] URDU_STOP_WORDS = { "کے" ,"کی" ,"سے" ,"کا"
,"کو" ,"ہے" };
Analyzer analyzer = new StandardAnalyzer(
I'm working on a project where we will be searching across several languages
with a single query. There will be different categories which will include
different groups of languages to search (e.g. category "a": English, French,
Spanish; category "b": Spanish, Portuguese, Italian, etc.). Originally I
You might want to take a look at Solr (http://lucene.apache.org/solr/). You
could either use Solr directly, or see how they implement paging.
--Mike
On Dec 26, 2007 12:12 PM, Zhou Qi <[EMAIL PROTECTED]> wrote:
> Using the search function for pagination will carry out unnecessary index
> searc
Hi Grant,
The exception is thrown from a Java native method: "Failed to merge indexes,
java.lang.OutOfMemoryError: Java heap space". (I have set -Xmx1024m in the
JVM.)
I guess it is similar to the problem that appeared in a previous thread (
http://www.nabble.com/Index-merge-and-java-heap-space-tt50
Using the search function for pagination carries out an unnecessary index
search each time you go to the previous or next page. Generally, most of the
information need (e.g. 80%) can be satisfied by the first 100 documents
(20%). In Lucene, the number of returned documents is set to 100 for the sake of
speed.
I am not
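The idea of serving pages from hits that were already retrieved, instead of searching again, can be sketched in plain Java. This assumes the first batch of hits (for example the first 100) has been cached as a list; ResultPager and its parameters are hypothetical names and not a Lucene API.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: slices one page out of an already-retrieved hit list. */
final class ResultPager {

    /**
     * Returns the hits for a 1-based page number, or an empty list
     * when the page lies beyond the cached hits.
     */
    static <T> List<T> page(List<T> cachedHits, int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        long from = (long) (pageNumber - 1) * pageSize; // long avoids int overflow
        if (from >= cachedHits.size()) {
            return Collections.emptyList();
        }
        int end = (int) Math.min(from + pageSize, cachedHits.size());
        return cachedHits.subList((int) from, end);
    }
}
```

Moving to the previous or next page then only changes pageNumber; a new index search would be needed only when the requested page falls outside the cached hits.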
Any advice on this? Thanks.
> From: [EMAIL PROTECTED]
> To: java-user@lucene.apache.org
> Subject: Pagination ...
> Date: Sat, 22 Dec 2007 10:19:30 -0500
>
>
> Hi,
>
> What is the most efficient way to do pagination in Lucene? I have always done
> the following because this "flavor" of the se
Great, I think. Except now I am really interested in the exception
and what settings you had for heap size, Lucene version, etc.
On Dec 23, 2007, at 11:03 PM, Zhou Qi wrote:
Hi , Grant
After I adjusted the mergeFactor of the IndexWriter from 1000 to 100, it
worked.
Thank you.
22 Dec 20
I would start at the Lucene Java home page (http://lucene.apache.org/java)
and dig in from there. There are a number of good docs on Scoring
and the IR model used (Boolean plus Vector). From there, I would dig
into the javadocs and whip up some example code that indexes a set of
tokens an
>
> can we modify the StopyAnalyzer to insert Stop Words of
> another language, instead of English, like Urdu given below:
> public static final String[] URDU_STOP_WORDS = { "پر", "کا", "کی", "کو" };
>
"new StandardAnalyzer(URDU_STOP_WORDS)" should work.
Regards,
Doron
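For readers who want to see what supplying a custom stop-word list amounts to, here is a plain-Java sketch of stop-word removal. It is only an illustration: it is not Lucene's implementation, the class and method names are hypothetical, and the whitespace tokenization is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of stop-word filtering; not Lucene code. */
final class StopWordSketch {

    /** Splits the text on whitespace and drops every token found in stopWords. */
    static List<String> removeStopWords(String text, String[] stopWords) {
        Set<String> stopSet = new HashSet<>(Arrays.asList(stopWords));
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty() && !stopSet.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

The URDU_STOP_WORDS array from the question can be passed as the second argument. Matching is exact, so a stop word is removed only when the token reaches the filter unchanged.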
Hi, Erick
Thanks for your suggestion; putting the declaration of the StringBuffer
variable sb inside the for loop works well. I want to ask another
question: can we modify the StopAnalyzer to insert stop words of
another language, instead of English, like the Urdu ones given below:
public stati