need
> the similarity score for? Do you need to compare every item in set 1
> against every item in set 2?
>
> On Aug 19, 2007, at 11:19 PM, Lokeya wrote:
>
>>
>> Hi,
>>
>> Thanks for your reply.
>>
>> I can use the getTermFreqVector() on Ind
> Hi,
>
>
> On Aug 16, 2007, at 2:20 PM, Lokeya wrote:
>
>>
>> Hi All,
>>
>> I have the following set up: a) Indexed a set of docs. b) Ran the 1st
>> query and
>> got top docs. c) Fetched the ids from that and stored them in a data
>> struct
Hi All,
I have the following set up: a) Indexed a set of docs. b) Ran the 1st query and
got top docs. c) Fetched the ids from that and stored them in a data structure.
d) Ran the 2nd query, got top docs, fetched the ids and stored them in a data
structure.
Now I have 2 sets of doc ids, (set 1) and (set 2).
I want
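A minimal sketch of one way to score that similarity, using the getTermFreqVector() approach mentioned above. It assumes a Lucene 2.x index in which the field (called "contents" here purely as a placeholder) was indexed with term vectors enabled (Field.TermVector.YES):

    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Map;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.TermFreqVector;

    public class SimilaritySketch {

        // Turn a term vector into a term -> frequency map.
        private static Map toMap(TermFreqVector tfv) {
            Map m = new HashMap();
            if (tfv == null) return m; // no term vector stored for this doc/field
            String[] terms = tfv.getTerms();
            int[] freqs = tfv.getTermFrequencies();
            for (int i = 0; i < terms.length; i++) {
                m.put(terms[i], new Integer(freqs[i]));
            }
            return m;
        }

        // Cosine similarity between the raw term-frequency vectors of two docs.
        public static double cosine(IndexReader reader, int docA, int docB,
                                    String field) throws Exception {
            Map a = toMap(reader.getTermFreqVector(docA, field));
            Map b = toMap(reader.getTermFreqVector(docB, field));
            double dot = 0, na = 0, nb = 0;
            for (Iterator it = a.entrySet().iterator(); it.hasNext();) {
                Map.Entry e = (Map.Entry) it.next();
                int fa = ((Integer) e.getValue()).intValue();
                na += (double) fa * fa;
                Integer fb = (Integer) b.get(e.getKey());
                if (fb != null) dot += (double) fa * fb.intValue();
            }
            for (Iterator it = b.values().iterator(); it.hasNext();) {
                int fb = ((Integer) it.next()).intValue();
                nb += (double) fb * fb;
            }
            return (na == 0 || nb == 0) ? 0.0
                                        : dot / (Math.sqrt(na) * Math.sqrt(nb));
        }
    }

Comparing every id in set 1 against every id in set 2 is then just a nested loop over cosine(); that is O(n*m) term-vector fetches, so for large sets it pays to cache the maps.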
ou might also check the Carrot2 project, which has a number of
> clustering algorithms and some Lucene support, although I don't know
> if it does specifically what you want.
>
> On Apr 2, 2007, at 10:14 PM, Lokeya wrote:
>
>>
>> Hi All,
>>
>> I have
to do this.
Thanks in Advance.
Daniel Naber-5 wrote:
>
> On Wednesday 11 April 2007 18:51, Lokeya wrote:
>
>> Thanks for your reply. I should have given more information and will
>> keep this in mind for my future queries.
>
> If nothing else helps, please write a small,
I have one million records to index, each of which has "Title",
"Description" and "Identifier" fields. If I take each record and index these
fields one at a time, my program is very slow. So I take 100,000 records, get
the values of these fields, and add them via the addDocument() method. Then I use the Index
wri
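Roughly the shape this is heading toward, as a sketch: the IndexWriter is created once, every document is added inside a single loop, and optimize()/close() run once at the end. The List of String[] {title, description, identifier} triples is only a stand-in for the real record source, and setMaxBufferedDocs(1000) is an illustrative tuning knob (Lucene 2.x):

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class BatchIndexer {
        public static void main(String[] args) throws Exception {
            List records = new ArrayList(); // placeholder for the 1M records
            records.add(new String[] { "a title", "a description", "id-0001" });

            // Create the writer once, outside any loop.
            IndexWriter writer =
                    new IndexWriter("./LUCENE", new StandardAnalyzer(), true);
            writer.setMaxBufferedDocs(1000); // buffer more docs in RAM per flush

            for (int i = 0; i < records.size(); i++) {
                String[] r = (String[]) records.get(i);
                Document doc = new Document();
                doc.add(new Field("Title", r[0],
                        Field.Store.YES, Field.Index.TOKENIZED));
                doc.add(new Field("Description", r[1],
                        Field.Store.YES, Field.Index.TOKENIZED));
                doc.add(new Field("Identifier", r[2],
                        Field.Store.YES, Field.Index.UN_TOKENIZED));
                writer.addDocument(doc);
            }

            writer.optimize(); // once, at the very end
            writer.close();
        }
    }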
The issue is solved. Luke was very helpful in debugging; in fact, it helped us
identify a very basic mistake we were making.
Lokeya wrote:
>
> I solved the issue by using:
>
> 1. The same Analyzer for indexing and searching.
> 2. Indexing with tokenized terms.
>
> Now the issue is with the following code i
But I
am not very sure why this should throw an error.
Erick Erickson wrote:
>
> That certainly seems odd. How much memory are you allocating
> your JVM?
>
> Erick
>
> On 4/11/07, Lokeya <[EMAIL PROTECTED]> wrote:
>>
>>
>> I have gone through
I have gone through the mailing list in search of posts about this error.
Though there are many, I feel my problem is a little different, and I would
like some advice on it.
Details:
1. Using a machine with RAM 2GB
2. Created an Index of size 200 MB.
3. Trying to do a search on this for ce
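Before going further, it is worth confirming how much heap the JVM actually received, since the default is often far below the machine's 2 GB. A two-line check (run with, say, java -Xmx512m HeapCheck):

    public class HeapCheck {
        public static void main(String[] args) {
            long max = Runtime.getRuntime().maxMemory();
            System.out.println("Max heap: " + (max / (1024 * 1024)) + " MB");
        }
    }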
nothing about your code. Imagine that a coworker had asked
> you such a question.
>
> Best
> Erick
>
> On 4/11/07, Lokeya <[EMAIL PROTECTED]> wrote:
>>
>>
>> I am following all the points which are mentioned in the following link:
>>
>>
I am following all the points which are mentioned in the following link:
http://wiki.apache.org/lucene-java/LuceneFAQ#head-3558e5121806fb4fce80fc022d889484a9248b71
I am having the following issues:
1. For the different queries I issue, I get a Hits object that always
contains 21 documents, but gett
in some other manner.
Please advise.
Daniel Naber-5 wrote:
>
> On Tuesday 10 April 2007 08:40, Lokeya wrote:
>
>> But when I try to get hits.length() it is 0.
>>
>> Can anyone point out what's wrong?
>
> Please check the FAQ first:
> http://wiki.apache.
I have indexed the docs successfully under the directory "LUCENE" in the
current directory, which contains segments, _1.cfs and deletable files.
Now I am trying to use the following code to search the index, but I am not
getting any HITS. However, when I try to read through a Reader, I can get the
document with the field mentioned
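For reference, a minimal search against that "LUCENE" directory (Lucene 2.x). The field name "contents" is a placeholder; it must match the name used at index time, and the analyzer must be the same one used while indexing, which is exactly the mismatch that turned out to be the problem in this thread:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;

    public class SearchSketch {
        public static void main(String[] args) throws Exception {
            IndexSearcher searcher = new IndexSearcher("./LUCENE");
            // Same analyzer as at index time, or the terms will not match.
            QueryParser parser =
                    new QueryParser("contents", new StandardAnalyzer());
            Query query = parser.parse("some search terms");
            Hits hits = searcher.search(query);
            System.out.println("hits: " + hits.length());
            searcher.close();
        }
    }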
Hi All,
I have run a query and got a Hits object, which is a collection of
documents. I want to find the centroid of these documents, where Centroid =
the top 35 (for example) most common terms across all the documents in the
Hits object.
Is there any API in Lucene for this?
Thanks in Advance.
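As far as I know there is no single Lucene call for this, but it can be assembled from getTermFreqVector(). A sketch, assuming the field ("contents" is again a placeholder) was indexed with term vectors enabled (Field.TermVector.YES); call it with n = 35 for the centroid described above:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.TermFreqVector;
    import org.apache.lucene.search.Hits;

    public class CentroidSketch {

        // Sum term frequencies over all hit docs; return the n most common terms.
        public static List topTerms(IndexReader reader, Hits hits,
                                    String field, int n) throws Exception {
            Map counts = new HashMap();
            for (int i = 0; i < hits.length(); i++) {
                TermFreqVector tfv = reader.getTermFreqVector(hits.id(i), field);
                if (tfv == null) continue; // no term vector for this doc/field
                String[] terms = tfv.getTerms();
                int[] freqs = tfv.getTermFrequencies();
                for (int j = 0; j < terms.length; j++) {
                    Integer old = (Integer) counts.get(terms[j]);
                    int sum = (old == null ? 0 : old.intValue()) + freqs[j];
                    counts.put(terms[j], new Integer(sum));
                }
            }
            // Sort term/count entries, highest count first, and keep the top n.
            List entries = new ArrayList(counts.entrySet());
            Collections.sort(entries, new Comparator() {
                public int compare(Object a, Object b) {
                    return ((Integer) ((Map.Entry) b).getValue()).intValue()
                         - ((Integer) ((Map.Entry) a).getValue()).intValue();
                }
            });
            return entries.subList(0, Math.min(n, entries.size()));
        }
    }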
700,000 times.
If this is not clear, please let me know. I haven't pasted the latest code,
where I have fixed the lock issue as well; if required, I can do that.
Thanks everyone for the quick turnaround. It really helped me a lot.
Doron Cohen wrote:
>
> Lokeya <[EMAIL PROTECTED]> wr
There is also another approach, but it has certain issues:

    try {
        writer.close();
    } finally {
        FSDirectory fs = FSDirectory.getDirectory("./LUCENE", false);
        // Note: no stray ';' after the if, or unlock() would always run.
        if (IndexReader.isLocked(fs)) {
            IndexReader.unlock(fs);
        }
    }
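A caution on that approach: IndexReader.unlock() forcibly removes the lock file, so it is only safe when no other process or thread is still writing to the index; it is intended for recovering from a crashed writer, not as a routine cleanup step.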
Thanks a lot again.
Lokeya wrote:
>
> I will t
xed..
>
> If that re-structuring causes your lock error to go away, I'll be
> baffled because it shouldn't (unless your version of Lucene
> and filesystem is one of the "interesting" ones).
>
> But it'll make your code simpler...
>
> Best
>
Grant Ingersoll-5 wrote:
>
> Move index writer creation, optimization and closure outside of your
> loop. I would also use a SAX parser. Take a look at the demo code
> to see an example of indexing.
>
> Cheers,
> Grant
>
> On Mar 18, 2007, at 12:31 PM, Lokeya wr
> doc.add(new Field("Description", alist_Descr.get(k).toString(),
>                   Field.Store.YES, Field.Index.UN_TOKENIZED));
> }
>
> // Add the document created out of
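One thing worth noting about that quoted code, since it ties into the no-hits problem elsewhere in this thread: Field.Index.UN_TOKENIZED indexes the whole value as a single term, so a query that is run through an analyzer will generally not match such a field. Tokenized fields with the same analyzer at index and search time, the fix mentioned above, avoid that mismatch.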
IndexWriter takes
time, and this happens especially when we are appending to the index file.
So what is the best approach to handle this?
Thanks in Advance.
Erick Erickson wrote:
>
> See below...
>
> On 3/17/07, Lokeya <[EMAIL PROTECTED]> wrote:
>>
>>
>> Hi,
>>
&g
Hi,
I am trying to index content from XML files, which are basically metadata
collected from a website that hosts a huge collection of documents.
This metadata XML contains control characters which cause errors when trying
to parse it with the DOM parser. I tried to use encoding = UTF-8 but lo
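A common workaround, sketched below, is to strip the characters that XML 1.0 does not allow before handing the text to the parser; only tab, newline, carriage return and characters in the valid Unicode ranges are kept:

    public class XmlCleaner {
        // Remove characters that are illegal in XML 1.0 documents.
        public static String stripInvalidXmlChars(String in) {
            StringBuffer out = new StringBuffer(in.length());
            for (int i = 0; i < in.length(); i++) {
                char c = in.charAt(i);
                if (c == '\t' || c == '\n' || c == '\r'
                        || (c >= 0x20 && c <= 0xD7FF)
                        || (c >= 0xE000 && c <= 0xFFFD)) {
                    out.append(c);
                }
            }
            return out.toString();
        }
    }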