Try printing all these after you close the writer:
- ((FSDirectory) dir).getFile().getAbsolutePath()
- dir.list().length (call it n)
- dir.list()[0], ..., dir.list()[n-1]
This should at least help you verify that an index was created and where.
Regards,
Doron
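That check can be sketched in plain Java, with no Lucene classes needed; the "index" path below is only a placeholder for wherever the writer was opened:

```java
import java.io.File;
import java.util.Arrays;

public class IndexDirCheck {
    // Returns the file names inside the supposed index directory,
    // or null if no such directory exists (i.e. no index was created).
    static String[] indexFiles(File dir) {
        return dir.isDirectory() ? dir.list() : null;
    }

    public static void main(String[] args) {
        // "index" is a placeholder; pass the real path as the first argument.
        File dir = new File(args.length > 0 ? args[0] : "index");
        System.out.println("Looking in: " + dir.getAbsolutePath());
        String[] names = indexFiles(dir);
        if (names == null) {
            System.out.println("No index directory was created.");
        } else {
            System.out.println(names.length + " files: " + Arrays.toString(names));
        }
    }
}
```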
Doron Cohen wrote:
On Dec 27, 2007 11:49 AM, Liaqat Ali <[EMAIL PROTECTED]> wrote:
> I got your point. The given program does not give any error during
> compilation and runs fine, but it does not create any
> index. When StandardAnalyzer() is called without the stop-word list it
> works well, b
Doron Cohen wrote:
This is not a self-contained program - it is incomplete, and it depends
on files on *your* disk...
Still, can you show why you're saying it indexes stopwords?
Can you print here a few samples of IndexReader.terms().term()?
BR, Doron
On Dec 27, 2007 10:22 AM, Liaqat Ali <[EMAIL PROTECTED]> wrote:
Hi Liaqat,
Are you sure that the Urdu characters are being correctly interpreted
by the JVM even during the file I/O operation?
I would expect Unicode characters to be encoded as multi-byte
sequences and so, the string-matching operations would fail (if the
literals are different from the
Doron Cohen wrote:
Hi Liaqat,
This part of the code seems correct and should work, so the problem
must be elsewhere.
Can you post a short program that demonstrates the problem?
You can start with something like this:
Document doc = new Document();
doc.add(new Field("text",URDU_STOP_WORDS[0] +
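The message is truncated there; a hedged sketch of how such a self-contained test could look with the Lucene 2.x-era API used in this thread (the class name, the sample text, and the RAMDirectory choice are my own additions, not Doron's original code):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.store.RAMDirectory;

public class StopWordCheck {
    static final String[] URDU_STOP_WORDS = { "کے", "کی", "سے", "کا", "کو", "ہے" };

    public static void main(String[] args) throws Exception {
        // In-memory index so the test depends on nobody's disk.
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer =
                new IndexWriter(dir, new StandardAnalyzer(URDU_STOP_WORDS), true);
        Document doc = new Document();
        // Index one stop word next to one ordinary token.
        doc.add(new Field("text", URDU_STOP_WORDS[0] + " lucene",
                Field.Store.NO, Field.Index.TOKENIZED));
        writer.addDocument(doc);
        writer.close();
        // Dump every indexed term; the stop word should NOT appear.
        IndexReader reader = IndexReader.open(dir);
        TermEnum terms = reader.terms();
        while (terms.next()) {
            System.out.println(terms.term());
        }
        reader.close();
    }
}
```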
Grant Ingersoll wrote:
On Dec 26, 2007, at 5:24 PM, Liaqat Ali wrote:
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
No, at this level I am not using any stemming technique. I am just
trying to elim
Grant Ingersoll wrote:
Are you altering (stemming) the token before it gets to the StopFilter?
On Dec 26, 2007, at 5:08 PM, Liaqat Ali wrote:
Doron Cohen wrote:
On Dec 26, 2007 10:33 PM, Liaqat Ali <[EMAIL PROTECTED]> wrote:
Using javac -encoding UTF-8 still raises the following error.
urduIndexer.java : illegal
Doron Cohen wrote:
On Dec 26, 2007 10:33 PM, Liaqat Ali <[EMAIL PROTECTED]> wrote:
> Using javac -encoding UTF-8 still raises the following error.
>
> urduIndexer.java : illegal character: \65279
> ?
> ^
> 1 error
>
> What am I doing wrong?
>
If you have the stop-words in a file, say one word in a line,
they can be
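Loading one stop word per line can be sketched in plain Java like this (the file name is a placeholder; the explicit UTF-8 decoding and BOM skipping are my assumptions, tying in with the encoding trouble discussed below):

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class StopWordFile {
    // Reads one stop word per line, skipping blank lines, decoding
    // explicitly as UTF-8 so the JVM default encoding never matters.
    static String[] loadStopWords(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, "UTF-8"));
        List<String> words = new ArrayList<String>();
        String line;
        while ((line = reader.readLine()) != null) {
            // Notepad may prepend a BOM (U+FEFF); drop it if present.
            if (line.startsWith("\uFEFF")) line = line.substring(1);
            line = line.trim();
            if (line.length() > 0) words.add(line);
        }
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) throws IOException {
        // "urdu-stopwords.txt" is a placeholder name.
        String[] stopWords = loadStopWords(new FileInputStream("urdu-stopwords.txt"));
        System.out.println(stopWords.length + " stop words loaded");
    }
}
```

The resulting array can then be handed straight to the StandardAnalyzer(String[]) constructor.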
Or you can save it as "Unicode" and use javac -encoding Unicode;
this way you can still use Notepad.
Liaqat Ali wrote:
李晓峰 wrote:
It's the Notepad.
It adds a byte-order mark (BOM, in this case 65279, or 0xFEFF) in front of
your file, which javac does not recognize, for reasons not quite clear to me.
Here is the bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058
It won't be fixed, so try to eliminate the BOM before compiling.
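Eliminating the BOM can also be done programmatically rather than by re-saving from an editor; a small sketch (modern java.nio style, and the command-line usage is my assumption):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class StripBom {
    // The UTF-8 encoding of U+FEFF (decimal 65279) is the three bytes EF BB BF.
    static byte[] stripUtf8Bom(byte[] bytes) {
        if (bytes.length >= 3 && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB && (bytes[2] & 0xFF) == 0xBF) {
            return Arrays.copyOfRange(bytes, 3, bytes.length);
        }
        return bytes; // no BOM: leave the content untouched
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical): java StripBom urduIndexer.java
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        Files.write(Paths.get(args[0]), stripUtf8Bom(bytes));
    }
}
```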
李晓峰 wrote:
"javac" has an option "-encoding", which tells the compiler the encoding
the input source file is using; this will probably solve the problem.
Or you can try the unicode escape \uXXXX; then you can save it in ANSI,
hard for humans to read though.
Or use an IDE; eclipse is a good choice, you can se
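For the escape route, here is what that looks like for one of the stop words from this thread; \u06A9 and \u06D2 are the code points of ک and ے, so the source file stays pure ASCII and neither -encoding nor the BOM matters (the class name is my own):

```java
public class EscapeDemo {
    // "کے" (keheh + yeh barree) written as ASCII-only unicode escapes:
    static final String KAF_YE = "\u06A9\u06D2";

    public static void main(String[] args) {
        System.out.println(KAF_YE + " has " + KAF_YE.length() + " chars");
    }
}
```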
Hi, Doron Cohen
Thanks for your reply, but I am facing a small problem here. As I
am using Notepad for coding, in which format should the file be saved?
public static final String[] URDU_STOP_WORDS = { "کے" ,"کی" ,"سے" ,"کا"
,"کو" ,"ہے" };
Analyzer analyzer = new StandardAnalyzer(