Re: Document similarities in Lucene (particularly using doc ids)

2007-08-20 Thread Lokeya
need > the similarity score for? Do you need to compare every item in set 1 > against every item in set 2? > > On Aug 19, 2007, at 11:19 PM, Lokeya wrote: > >> >> Hi, >> >> Thanks for your reply. >> >> I can use the getTermFreqVector() on Ind

Re: Document similarities in Lucene (particularly using doc ids)

2007-08-19 Thread Lokeya
> Hi, > > > On Aug 16, 2007, at 2:20 PM, Lokeya wrote: > >> >> Hi All, >> >> I have the following set up: a) Indexed set of docs. b) Ran 1st >> query and >> got top docs c) Fetched the id's from that and stored in a data >> struct

Document similarities in Lucene (particularly using doc ids)

2007-08-16 Thread Lokeya
Hi All, I have the following set up: a) Indexed a set of docs. b) Ran the 1st query and got top docs. c) Fetched the ids from those and stored them in a data structure. d) Ran the 2nd query, got top docs, fetched the ids and stored them in a data structure. Now I have 2 sets of doc ids, (set 1) and (set 2). I want
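One common way to compare the two result sets, suggested later in the thread, is to build a term-frequency vector per document and score document pairs by cosine similarity. The sketch below is stdlib-only Java: it assumes the term frequencies have already been extracted (in Lucene 2.x one would typically get them from IndexReader.getTermFreqVector(docId, field), which is mentioned in the replies) and represents each vector as a plain Map.

```java
import java.util.HashMap;
import java.util.Map;

public class CosineSimilarity {

    // Cosine similarity between two sparse term-frequency vectors.
    // Returns a value in [0, 1]; 1 means identical term distributions.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += (double) e.getValue() * e.getValue();
            Integer f = b.get(e.getKey());
            if (f != null) {
                dot += (double) e.getValue() * f;   // term present in both docs
            }
        }
        for (int f : b.values()) {
            normB += (double) f * f;
        }
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        Map<String, Integer> d1 = new HashMap<String, Integer>();
        d1.put("lucene", 3); d1.put("index", 1);
        Map<String, Integer> d2 = new HashMap<String, Integer>();
        d2.put("lucene", 1); d2.put("search", 2);
        System.out.printf("%.4f%n", cosine(d1, d2)); // prints 0.4243
    }
}
```

To compare every item in set 1 against every item in set 2, one would loop over the two doc-id lists and call cosine() on each pair; for large result sets this is O(n*m) similarity computations, so trimming each set to its top hits first keeps the cost manageable.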

Re: How to calculate centroid from HITS?

2007-04-19 Thread Lokeya
ou might also check the Carrot2 project, which has a number of > clustering algorithms and some Lucene support, although I don't know > if it does specifically what you want. > > On Apr 2, 2007, at 10:14 PM, Lokeya wrote: > >> >> Hi All, >> >> I have

Re: Issue with : Searcher.search() returning Hits of same length for different searches

2007-04-12 Thread Lokeya
to do this. Thanks in Advance. Daniel Naber-5 wrote: > > On Wednesday 11 April 2007 18:51, Lokeya wrote: > >> Thanks for your reply. I should have given more information and will >> keep in mind this for my future queries. > > If nothing else helps, please write a small,

Basic Question in Lucene Indexing.

2007-04-12 Thread Lokeya
I have one million records to index, each of which has "Title", "Description" and "Identifier" fields. If I take each document and index these fields individually, my program is very slow. So I took 100,000 records, got the values of these fields, and added them to the addDocument() method. Then I use the Index wri

Re: Issue with search() Help Appreciated.

2007-04-12 Thread Lokeya
The issue is solved. Luke was very helpful in debugging; in fact, it helped us identify a very basic mistake we were making. Lokeya wrote: > > I solved the issue by using: > > 1. The same Analyzer. > 2. Tokenizing terms during indexing. > > Now the issue is with the following code i

Re: OutOfMemory Error while searching Index - Help Appreciated.

2007-04-11 Thread Lokeya
But I am not very sure why this should throw an error. Erick Erickson wrote: > > That certainly seems odd. How much memory are you allocating > your JVM? > > Erick > > On 4/11/07, Lokeya <[EMAIL PROTECTED]> wrote: >> >> >> I have gone through

OutOfMemory Error while searching Index - Help Appreciated.

2007-04-11 Thread Lokeya
I have gone through the mailing list in search of posts about this error. Though there are many, I feel my problem is a little different, and I would like to get some advice on it. Details: 1. Using a machine with 2 GB of RAM. 2. Created an index of size 200 MB. 3. Trying to do a search on this for ce

Re: Issue with : Searcher.search() returning Hits of same length for different searches

2007-04-11 Thread Lokeya
nothing about your code. Imagine that a coworker had asked > you such a question. > > Best > Erick > > On 4/11/07, Lokeya <[EMAIL PROTECTED]> wrote: >> >> >> I am following all the points which are mentioned in the following link: >> >> &

Issue with : Searcher.search() returning Hits of same length for different searches

2007-04-10 Thread Lokeya
I am following all the points mentioned in the following link: http://wiki.apache.org/lucene-java/LuceneFAQ#head-3558e5121806fb4fce80fc022d889484a9248b71 I am having the following issues: 1. For the different queries I issue, I get a Hits object that always contains 21 documents, but gett

Re: Issue with search() Help Appreciated.

2007-04-10 Thread Lokeya
in some other manner. Please advise. Daniel Naber-5 wrote: > > On Tuesday 10 April 2007 08:40, Lokeya wrote: > >> But when i try to get hits.length() it is 0. >> >> Can anyone point out whats wrong ? > > Please check the FAQ first: > http://wiki.apache.

Issue with search() Help Appreciated.

2007-04-09 Thread Lokeya
I have indexed the docs successfully under the directory "LUCENE" in the current directory, which contains segments, _1.cfs and deletable. Now I am trying to use the following code to search the index, but I am not getting any HITS. But when I try to read through a Reader and get the document with the field mentioned

How to calculate centroid from HITS?

2007-04-02 Thread Lokeya
Hi All, I have run a query and got a HITS object, which is a collection of documents. I want to find the centroid of these documents. Centroid = the top 35 (for example) most common terms across all the documents in the HITS object. Is there any API in Lucene for this? Thanks in Advance.
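Lucene itself has no centroid API (a reply in this thread points at the Carrot2 project for clustering support), but the "top N most common terms" definition given in the post can be computed directly. The stdlib-only sketch below assumes the per-document term frequencies have already been extracted from the hits (e.g. via term-frequency vectors) into plain Maps; it sums the counts across all documents and returns the N highest-totalling terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Centroid {

    // Sum per-document term frequencies, then return the n terms with the
    // highest totals -- the "top N common terms" centroid from the post.
    static List<String> topTerms(List<Map<String, Integer>> docs, int n) {
        Map<String, Integer> totals = new HashMap<String, Integer>();
        for (Map<String, Integer> doc : docs) {
            for (Map.Entry<String, Integer> e : doc.entrySet()) {
                Integer t = totals.get(e.getKey());
                totals.put(e.getKey(), (t == null ? 0 : t) + e.getValue());
            }
        }
        List<Map.Entry<String, Integer>> entries =
            new ArrayList<Map.Entry<String, Integer>>(totals.entrySet());
        Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> a,
                               Map.Entry<String, Integer> b) {
                return b.getValue() - a.getValue(); // descending by total count
            }
        });
        List<String> top = new ArrayList<String>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        Map<String, Integer> d1 = new HashMap<String, Integer>();
        d1.put("lucene", 3); d1.put("index", 2); d1.put("query", 1);
        Map<String, Integer> d2 = new HashMap<String, Integer>();
        d2.put("lucene", 2); d2.put("search", 4);
        System.out.println(topTerms(Arrays.asList(d1, d2), 2)); // prints [lucene, search]
    }
}
```

For 35 terms, as in the post, one would simply call topTerms(docs, 35); stop-word filtering before counting usually makes the resulting centroid far more meaningful.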

Re: Issue while parsing XML files due to control characters, help appreciated.

2007-03-21 Thread Lokeya
. 700,000 times. If this is not clear please let me know. I haven't pasted the latest code where I have fixed the lock issue as well. If required I can do that. Thanks everyone for the quick turnaround; it really helped me a lot. Doron Cohen wrote: > > Lokeya <[EMAIL PROTECTED]> wr

Re: Issue while parsing XML files due to control characters, help appreciated.

2007-03-19 Thread Lokeya
/[EMAIL PROTECTED] There is another approach as well, but it has certain issues: try { writer.close(); } finally { FSDirectory fs = FSDirectory.getDirectory("./LUCENE", false); if (IndexReader.isLocked(fs)) { IndexReader.unlock(fs); } } Thanks a lot again. Lokeya wrote: > > I will t

Re: Issue while parsing XML files due to control characters, help appreciated.

2007-03-18 Thread Lokeya
xed.. > > If that re-structuring causes your lock error to go away, I'll be > baffled because it shouldn't (unless your version of Lucene > and filesystem is one of the "interesting" ones). > > But it'll make your code simpler... > > Best >

Re: Issue while parsing XML files due to control characters, help appreciated.

2007-03-18 Thread Lokeya
Grant Ingersoll-5 wrote: > > Move index writer creation, optimization and closure outside of your > loop. I would also use a SAX parser. Take a look at the demo code > to see an example of indexing. > > Cheers, > Grant > > On Mar 18, 2007, at 12:31 PM, Lokeya wr

Re: Issue while parsing XML files due to control characters, help appreciated.

2007-03-18 Thread Lokeya
doc.add(new > Field("Description",alist_Descr.get(k).toString(), > Field.Store.YES, Field.Index.UN_TOKENIZED)); > } > > > //Add the document created out of

Re: Issue while parsing XML files due to control characters, help appreciated.

2007-03-17 Thread Lokeya
Creating the IndexWriter takes time, especially when we are appending to the index file. So what is the best approach to handle this? Thanks in Advance. Erick Erickson wrote: > > See below... > > On 3/17/07, Lokeya <[EMAIL PROTECTED]> wrote: >> >> >> Hi, >> &g

Issue while parsing XML files due to control characters, help appreciated.

2007-03-16 Thread Lokeya
Hi, I am trying to index content from XML files, which are basically metadata collected from a website that has a huge collection of documents. This metadata XML contains control characters, which cause errors when trying to parse it with the DOM parser. I tried to use encoding = UTF-8 but lo