lucene for Arabic and Urdu

2007-09-18 Thread Liaqat Ali
the scratch using Lucene. Liaqat Ali - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

setting up lucene

2007-09-25 Thread Liaqat Ali
Hi all, I'm facing problems setting up Lucene. Could someone kindly guide me in this

Integration of Lucene

2007-10-24 Thread Liaqat Ali
Hi all, I'm developing a search engine for the Urdu language, and I want to use Lucene for that purpose. The situation is: --- I have a corpus of 2000 Urdu (a variant of Persian and Arabic) documents in XML form; how do I build an index of them using Lucene? --- There will also be a need for some stemming

Corpus interpretation

2007-10-24 Thread Liaqat Ali
I want to index an Urdu language corpus (200 documents in CES XML DTD format). Is it necessary to break the XML file into 200 separate files, or can it be indexed in its original form using Lucene? Kindly guide me in this regard.

Lucene setting

2007-11-19 Thread Liaqat Ali
Hi all, can someone explain this line to me? I encountered it while setting up Lucene: "Connect to the top-level of your Lucene installation". Kindly guide me in this regard. Liaqat Ali

Lucene Setting

2007-11-19 Thread Liaqat Ali
I'm new to Lucene and want to clear up some questions. When I unpacked Lucene, which I downloaded from the Apache site, I ran the BUILD.txt file, and there are five steps to set up Lucene. Lucene Build Instructions $Id: BUILD.txt 476955 2006-11-19 22:28:41Z hossman $ Basic steps: 0) Instal

Problem in Running Lucene Demo

2007-11-19 Thread Liaqat Ali
his Regard Liaqat Ali

Help needed

2007-11-23 Thread Liaqat Ali
de me in this regard Liaqat Ali

Problem with indexing

2007-11-24 Thread Liaqat Ali
situation. Kindly guide me in this regard. Liaqat Ali

LIA example problem

2007-11-25 Thread Liaqat Ali
Hello, I'm studying Lucene in Action. In chapter 2, the first example is generating errors in this part of the code: doc.add(Field.Keyword("id", keywords[i])); doc.add(Field.UnIndexed("country", unindexed[i])); doc.add(Field.UnStored("contents", unstored[i])); doc.add(Field.Text("cit

Problem with Add method

2007-11-29 Thread Liaqat Ali
This code generates an error. Kindly tell me which parameters should be used when we use the constructors. Document doc = new Document(); doc.add(Field.Keyword("id", keywords[i])); doc.add(Field.UnIndexed("country", unindexed[i])); doc.add(Field.UnStored("contents", unstored[i]));

Deprecated API

2007-11-29 Thread Liaqat Ali
I'm studying LIA, but there is a problem with the code. When I run the code I get errors. The errors are related to the use of deprecated APIs. Kindly suggest the right APIs, and also how to handle this situation in other code. package lia.indexing; import org.apache.lucene.stor

FSDirectory Again

2007-11-30 Thread Liaqat Ali
No, you are not getting me. I have this original code. What should I use instead of this code to create a directory, because dir = FSDirectory.getDirectory(indexDir, true) is deprecated? import org.apache.lucene.store.Directory; import org.apache.lucene.store.FSDirectory; protected Directo

FSDirectory

2007-11-30 Thread Liaqat Ali
I'm facing a problem with this code: dir = new FSDirectory(); dir.getDirectory(indexDir, true); I get an error that FSDirectory has protected access. So what should I use instead? Liaqat

Indexing Non-English text

2007-12-04 Thread Liaqat Ali
Hi, I'm facing a problem while indexing a small .txt file with Lucene. The file I want to index is in the Urdu language (a variant of Arabic and Persian). But the index I get is in Unicode form, not in the real form (the original Urdu text). This program works well for a file in Englis
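The snippet is cut off before any cause is identified, so the following is only a guess at one common culprit: reading the file with the platform default charset instead of the charset it was saved in. A minimal sketch, assuming the Urdu file is saved as UTF-8 and Java 7 or later is available; the class and method names are hypothetical and not from the thread.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not from the thread): decodes text explicitly as UTF-8. */
final class Utf8FileReader {

    /** Decodes raw bytes as UTF-8, independent of the platform default charset. */
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Reads a whole file and decodes it as UTF-8 (assumes the file was saved as UTF-8). */
    static String readUtf8(Path path) throws IOException {
        return decodeUtf8(Files.readAllBytes(path));
    }

    public static void main(String[] args) throws IOException {
        // Write a short Urdu-script sample (spelled with Unicode escapes) and read it back.
        Path tmp = Files.createTempFile("urdu-sample", ".txt");
        String sample = "\u0627\u0631\u062F\u0648";
        Files.write(tmp, sample.getBytes(StandardCharsets.UTF_8));
        System.out.println("round trip ok: " + sample.equals(readUtf8(tmp)));
        Files.delete(tmp);
    }
}
```

The decoded string can then be handed to whatever indexing code is in use. Whether the text displays correctly afterwards also depends on the console or viewer used to inspect the index.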

Indexing XML document

2007-12-04 Thread Liaqat Ali
Hi all, I want to index an XML file containing 200 Urdu language (a variant of Arabic and Persian) documents. This corpus is in CES format and includes information about the author and much more. I just want to extract the textual data of each document, and the related doc number and title in each documen
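One way to pull a doc number, title, and text out of a single multi-document XML file is the DOM parser that ships with the JDK. The sketch below is illustrative only: the element names `doc`, `num`, `title`, and `body` are placeholders, since the snippet does not show the actual CES tag names, and the class name is hypothetical. Each extracted record could then be turned into one index entry.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper: splits one corpus XML string into per-document records. */
final class CorpusSplitter {

    /** Returns one {docNumber, title, text} array per assumed "doc" element. */
    static List<String[]> split(String xml) {
        Document dom;
        try {
            dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse corpus XML", e);
        }
        NodeList docs = dom.getElementsByTagName("doc"); // assumed element name
        List<String[]> records = new ArrayList<>();
        for (int i = 0; i < docs.getLength(); i++) {
            Element doc = (Element) docs.item(i);
            records.add(new String[] {
                childText(doc, "num"),   // assumed element name
                childText(doc, "title"), // assumed element name
                childText(doc, "body")   // assumed element name
            });
        }
        return records;
    }

    /** Text of the first descendant with the given tag, or "" if there is none. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<corpus>"
                + "<doc><num>1</num><title>First</title><body>some text</body></doc>"
                + "<doc><num>2</num><title>Second</title><body>more text</body></doc>"
                + "</corpus>";
        for (String[] r : split(xml)) {
            System.out.println(r[0] + " | " + r[1] + " | " + r[2]);
        }
    }
}
```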

Errors while running LIA code.

2007-12-06 Thread Liaqat Ali
Hi, I am trying to run some code from Lucene in Action, but it generates some errors. There is one warning at compile time, and the errors occur at run time. The code and errors are given below. Kindly give me some clue. Thanks... Code: ///package lia.handlingtypes.xml; import lia.handl

Re: Errors while running LIA code.

2007-12-06 Thread Liaqat Ali
Michael McCandless wrote: See this thread for one suggestion: http://www.gossamer-threads.com/lists/lucene/java-user/55465 Mike "Liaqat Ali" <[EMAIL PROTECTED]> wrote: Hi, I am trying to run some code from Lucene in Action, but it generates some errors. There is on

problem in indexing documents

2007-12-25 Thread Liaqat Ali
Hello, I am trying to build an index of 191 documents stored in 191 text files. I developed a program that works well with files containing a single line, but files with multiple lines pose a problem. So I added a while loop to extract all the data from each document. But it has some logical er
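A frequent source of logic errors in this kind of loop is sharing one text buffer across all files, so that each document also contains the text of the previous ones. The sketch below is a generic illustration, not the poster's program: it reads every line of one source into a buffer created fresh for that source. The class name is hypothetical, and `StringReader` stands in for the 191 files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical helper: collects all lines of one document into a single string. */
final class WholeDocumentReader {

    /** Reads every line from the source; the buffer is local, so nothing leaks between documents. */
    static String readAll(Reader source) {
        StringBuilder sb = new StringBuilder(); // fresh buffer for each document
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two in-memory stand-ins for multi-line text files.
        String[] files = {"line one\nline two", "only line"};
        for (int i = 0; i < files.length; i++) {
            String content = readAll(new StringReader(files[i]));
            System.out.println("document " + i + " has " + content.length() + " characters");
        }
    }
}
```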

Modifying StopAnalyzer

2007-12-26 Thread Liaqat Ali
Hi Erick, thanks for your suggestion; putting the declaration of the StringBuffer variable sb inside the for loop works well. I want to ask another question: can we modify the StopAnalyzer to use stop words of another language instead of English, such as the Urdu ones given below: public stati

StopWords problem

2007-12-26 Thread Liaqat Ali
Hi Doron Cohen, thanks for your reply, but I am facing a small problem here. As I am using Notepad for coding, in which format should the file be saved? public static final String[] URDU_STOP_WORDS = { "کے" ,"کی" ,"سے" ,"کا" ,"کو" ,"ہے" }; Analyzer analyzer = new StandardAnalyzer(

Re: StopWords problem

2007-12-26 Thread Liaqat Ali
李晓峰 wrote: "javac" has an option "-encoding", which tells the compiler the encoding the input source file is using; this will probably solve the problem. Or you can try the Unicode escape: \u, then you can save it in ANSI, hard for humans to read though. Or use an IDE; Eclipse is a good choic
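The Unicode-escape route mentioned in this reply can be automated. The following is a small sketch, with a hypothetical class name, that rewrites every non-ASCII UTF-16 code unit of a string as a `\uXXXX` escape, so the result can be pasted into a Java source file saved in a plain ASCII or ANSI encoding.

```java
/** Hypothetical helper: turns non-ASCII characters into Java Unicode escapes. */
final class UnicodeEscaper {

    /** Leaves ASCII untouched and writes every other UTF-16 code unit as a four-digit hex escape. */
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 128) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample input: two Arabic-script letters, spelled with escapes so this file stays ASCII.
        String word = "\u06A9\u06D2";
        System.out.println(escapeNonAscii(word));
    }
}
```

The JDK also ships a `native2ascii` tool in older releases that performs the same conversion on whole files.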

Re: StopWords problem

2007-12-26 Thread Liaqat Ali
Doron Cohen wrote: On Dec 26, 2007 10:33 PM, Liaqat Ali <[EMAIL PROTECTED]> wrote: Using javac -encoding UTF-8 still raises the following error. urduIndexer.java : illegal character: \65279 ? ^ 1 error What am I doing wrong? If you have the stop-words in a file, say one wor
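The character number in this error, 65279, is U+FEFF, the byte order mark. Windows Notepad writes one at the start of files it saves as UTF-8, so a likely explanation is that javac is tripping over that leading mark. A minimal sketch of removing it from text that has already been decoded; the class name is hypothetical, and re-saving the source file without the mark from another editor is an equally valid fix.

```java
/** Hypothetical helper: removes a leading byte order mark from decoded text. */
final class BomStripper {

    private static final char BOM = '\uFEFF'; // decimal 65279

    /** Returns the text without a leading U+FEFF; other text is returned unchanged. */
    static String stripBom(String text) {
        if (!text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        String withBom = BOM + "public class urduIndexer {}";
        System.out.println(stripBom(withBom));
    }
}
```

The same helper is useful when loading a stop-word list from a Notepad-saved file, because otherwise the first word in the list carries the invisible mark and never matches a token.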

Re: StopWords problem

2007-12-26 Thread Liaqat Ali
Grant Ingersoll wrote: Are you altering (stemming) the token before it gets to the StopFilter? On Dec 26, 2007, at 5:08 PM, Liaqat Ali wrote: Doron Cohen wrote: On Dec 26, 2007 10:33 PM, Liaqat Ali <[EMAIL PROTECTED]> wrote: Using javac -encoding UTF-8 still raises the following

Re: StopWords problem

2007-12-26 Thread Liaqat Ali
Grant Ingersoll wrote: On Dec 26, 2007, at 5:24 PM, Liaqat Ali wrote: No, at this level I am not using any stemming technique. I

Re: StopWords problem

2007-12-27 Thread Liaqat Ali
"text", URDU_STOP_WORDS[0] + " regular text", Store.YES, Index.TOKENIZED)); indexWriter.addDocument(doc); Now URDU_STOP_WORDS[0] should not appear within the index terms. You can easily verify this by iterating IndexReader.terms(); Regards, Doron On Dec 27, 2007 9:36 AM

Re: StopWords problem

2007-12-27 Thread Liaqat Ali
Doron Cohen wrote: This is not a self-contained program - it is incomplete, and it depends on files on *your* disk... Still, can you show why you're saying it indexes stopwords? Can you print here a few samples of IndexReader.terms().term()? BR, Doron On Dec 27, 2007 10:22 AM, Liaqa

Re: StopWords problem

2007-12-27 Thread Liaqat Ali
Doron Cohen wrote: On Dec 27, 2007 11:49 AM, Liaqat Ali <[EMAIL PROTECTED]> wrote: I got your point. The program given does not give any error during compilation and it is interpreted well. But it does not create any index. When the StandardAnalyzer() is called without Sto

Calculating Precision and Recall

2007-12-29 Thread Liaqat Ali
Hello all, I want to calculate the precision and recall of the current system, which is based on Lucene. What should the procedure be, and are there any tools available for this purpose? Kindly guide me. Regards, Liaqat
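For a single query, precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The sketch below computes both from a list of retrieved document ids and a set of relevance judgments. The class name and the document ids are made up for illustration, and the retrieved list is assumed to contain no duplicates.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: set-based precision and recall for one query. */
final class PrecisionRecall {

    /** Number of retrieved documents that are judged relevant. */
    static int hits(List<String> retrieved, Set<String> relevant) {
        int count = 0;
        for (String id : retrieved) {
            if (relevant.contains(id)) {
                count++;
            }
        }
        return count;
    }

    /** Relevant retrieved / all retrieved; defined here as 0 when nothing was retrieved. */
    static double precision(List<String> retrieved, Set<String> relevant) {
        return retrieved.isEmpty() ? 0.0 : (double) hits(retrieved, relevant) / retrieved.size();
    }

    /** Relevant retrieved / all relevant; defined here as 0 when there are no relevant documents. */
    static double recall(List<String> retrieved, Set<String> relevant) {
        return relevant.isEmpty() ? 0.0 : (double) hits(retrieved, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        List<String> retrieved = Arrays.asList("d1", "d2", "d3", "d4");
        Set<String> relevant = new HashSet<>(Arrays.asList("d1", "d3", "d5"));
        System.out.println("precision = " + precision(retrieved, relevant));
        System.out.println("recall    = " + recall(retrieved, relevant));
    }
}
```

With four retrieved documents, three relevant ones, and two in common, precision is 2/4 and recall is 2/3. Averaging these values over a set of queries gives system-level figures.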

Re: Calculating Precision and Recall

2007-12-29 Thread Liaqat Ali
, Liaqat Ali wrote: Hello all, I want to calculate the precision and recall of the current system, which is based on Lucene. What should the procedure be, and are there any tools available for this purpose? Kindly guide me. Regards, Liaqat

Scoring in Lucene (for Precision and Recall)

2008-01-02 Thread Liaqat Ali
Hello, I am using trec_eval for precision and recall calculation. trec_eval takes relevance judgments and a result file as arguments to calculate precision and recall. There is a similarity parameter in the result file. The score which is calculated by Lucene is equal to that similarity paramet
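The result file read by trec_eval has one line per retrieved document with six whitespace-separated columns: query id, an iteration field (conventionally `Q0`), document number, rank, similarity, and a run tag. A minimal sketch of formatting such a line, assuming the search engine's hit score is written into the similarity column; the class name, document id, and run tag are made up for illustration.

```java
import java.util.Locale;

/** Hypothetical helper: formats one line of a TREC-style results file. */
final class TrecRunLine {

    /** Columns: query id, iteration (Q0), document number, rank, similarity, run tag. */
    static String format(String queryId, String docNo, int rank, float score, String runTag) {
        // Locale.ROOT keeps the decimal separator a dot on every machine.
        return String.format(Locale.ROOT, "%s Q0 %s %d %.4f %s",
                queryId, docNo, rank, score, runTag);
    }

    public static void main(String[] args) {
        System.out.println(format("1", "doc7", 1, 0.5f, "urdu-run"));
    }
}
```

As far as I know, trec_eval orders documents by the similarity column rather than the rank column, so only the relative order of the scores within a query should matter; the trec_eval documentation is the authority on that point.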

Re: Scoring in Lucene (for Precision and Recall)

2008-01-02 Thread Liaqat Ali
hed each conference, my guess is one of them will explain it in more detail, or perhaps any docs for the trec_eval program will. -Grant On Jan 2, 2008, at 3:07 PM, Liaqat Ali wrote: Hello, I am using trec_eval for precision and recall calculation. trec_eval takes relevance judgments and a result fi

Open source Arabic stemmer

2008-01-16 Thread Liaqat Ali
Hi, kindly tell me about an open source Arabic stemmer that can be used with Lucene. Regards, Liaqat Ali