Re: reading indice

2006-10-16 Thread heritrix . lucene
Read org.apache.lucene.index.IndexReader and org.apache.lucene.search.IndexSearcher. There are descriptions available in these docs. On 10/17/06, EDMOND KEMOKAI <[EMAIL PROTECTED]> wrote: Can someone tell me how to read an index into memory, or how to open an existing index for reading?

Re: searching for the part of a term.

2006-09-27 Thread heritrix . lucene
Hi, Thanks for your reply. : Since the overhead in first is the speed of the system, i think adopting : second method will be better. Since my index size is around 10GB, the second method is also taking a lot of time for queries like "am". One more thing that I found in http://www.gossame

Re: searching for the part of a term.

2006-09-26 Thread heritrix . lucene
system, I think adopting the second method will be better. Is there any other solution for this problem? Am I going in the right direction? It'll be great to see your response... Regards, On 9/23/06, heritrix. lucene <[EMAIL PROTECTED]> wrote: Hi All, How can i make my search so t

searching for the part of a term.

2006-09-23 Thread heritrix . lucene
Hi All, How can I make my search so that if I am looking for the term "counting", the documents containing "accounting" also come up? Similarly, if I am looking for the term "workload", documents containing "work" should also come up as search results. Wildcard query seems to work in the first case, bu

Re: is there any n-gram analyzer available??

2006-09-22 Thread heritrix . lucene
Thanks for your reply. This analyzer creates combinations of words. I am looking for an analyzer that breaks up the words into their n-grams. For example: 2-grams of google -> go, oo, og, gl, le, like that. Regards On 9/23/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: https://issues.ap
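The character n-gram splitting described in this message can be sketched in plain Java. This is only an illustration of the splitting itself, not a Lucene analyzer: the class name NGrams and the methods ngrams and ngramsFrom are hypothetical names chosen for this sketch. The second method covers the "3-grams and more, up to the length of the word" case raised in the original question of this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Lucene.
public class NGrams {

    /** Returns all contiguous character n-grams of length n, in order of position. */
    public static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        if (n <= 0) {
            return grams; // no grams for a non-positive size
        }
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    /** Returns the n-grams for every size from minN up to the full word length. */
    public static List<String> ngramsFrom(String word, int minN) {
        List<String> grams = new ArrayList<>();
        for (int n = Math.max(minN, 1); n <= word.length(); n++) {
            grams.addAll(ngrams(word, n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("google", 2)); // prints [go, oo, og, gl, le]
    }
}
```

Indexing such grams as terms would let a query for "counting" match "accounting" by shared grams, at the cost of a noticeably larger index.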

is there any n-gram analyzer available??

2006-09-22 Thread heritrix . lucene
Hi, I am looking for an analyzer that chops a given string into its n-grams. Basically, I want to index 3-grams and more, up to the length of a word. Can anybody tell me if there is any analyzer available for this? Regards..

Re: Searching is taking a lot...

2006-06-29 Thread heritrix . lucene
Ya you are correct. My idea will not work when there are lots of documents in the index and also there are lots of hits for that page. I am going with you :-) Thanx... On 6/29/06, James Pine <[EMAIL PROTECTED]> wrote: Hey, I'm not a performance guru, but it seems to me that if you've got

Re: Searching is taking a lot...

2006-06-29 Thread heritrix . lucene
perhaps that's not what you meant, perhaps you aren't iterating over any results, in which case using a HitCollector instead isn't necessarily going to bring that 17sec down. As I told you earlier, for the same query the minimum time is 2-3 sec, and this time is after several attempts (so i think upto th

Re: Searching is taking a lot...

2006-06-29 Thread heritrix . lucene
This will break performance. It is better to first collect all the document numbers (code without the proper declarations): public void collect(int id, float score) { if(docCount >= startDoc && docCount < endDoc) { docNrs.add(id); // or use int[] docNrs when possible. Why
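The page-window idea in the fragment above can be sketched as a standalone class. This is a minimal sketch under stated assumptions: the message is cut off, so the increment of docCount after each call and the accessor methods are guesses at the intended logic, and PageCollector is a hypothetical name that does not extend any Lucene class. In a real application the collect method would be the callback of a Lucene HitCollector.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone version of the collect() fragment; not a Lucene class.
public class PageCollector {
    private final int startDoc;  // index of the first hit on the page (inclusive)
    private final int endDoc;    // index just past the last hit on the page (exclusive)
    private final List<Integer> docNrs = new ArrayList<>();
    private int docCount = 0;    // number of hits seen so far

    public PageCollector(int startDoc, int endDoc) {
        this.startDoc = startDoc;
        this.endDoc = endDoc;
    }

    // Same signature as in the message: called once per matching document.
    public void collect(int id, float score) {
        if (docCount >= startDoc && docCount < endDoc) {
            docNrs.add(id); // keep only the document numbers inside the page window
        }
        docCount++; // assumption: the truncated original counts every hit
    }

    /** Document numbers that fell inside the requested page window. */
    public List<Integer> getDocNrs() {
        return docNrs;
    }

    /** Total number of hits seen, useful for showing the page count. */
    public int getTotalHits() {
        return docCount;
    }
}
```

Only the document numbers are stored during collection; the stored documents for the page would be loaded afterwards. Note that a collector of this kind sees hits in the order the search delivers them, which is not necessarily ranked by score, so the window is a window over that order.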

Re: Searching is taking a lot...

2006-06-28 Thread heritrix . lucene
I am using the Hits object to collect all documents. Let me tell you my problem. I am creating a web application. Every time a user looks for something, it searches the index and returns the results. Results may be in the millions. So for displaying results, I am doing pagination. Here the probl

Re: Searching is taking a lot...

2006-06-28 Thread heritrix . lucene
. I am using Hits for getting the results, searching with Searcher.search(). Is there any other way of improving its speed? Thanks and regards, On 6/27/06, heritrix. lucene <[EMAIL PROTECTED]> wrote: No. I am not sorting the data... On 6/27/06, Martin Braun <[EMAIL

Re: IndexSearcher in Servlet

2006-06-28 Thread heritrix . lucene
o o no I mean the searching would be fast or not... But now i have tested. The result that i found reveals that there would be no difference in terms of searching speed. But there is another thing that i want to ask. What if the index is changed in between. Will the indexReader give the results w

Re: IndexSearcher in Servlet

2006-06-28 Thread heritrix . lucene
Is there any difference in terms of speed between IndexReader and IndexSearcher?? On 6/27/06, Erik Hatcher <[EMAIL PROTECTED]> wrote: On Jun 27, 2006, at 10:32 AM, Fabrice Robini wrote: > That's also my case... > I create a new IndexSearcher at each query, but with a static and > instanciate

Re: search performance benchmarks

2006-06-27 Thread heritrix . lucene
Hi, Or Lucene is more like Google in this sense, meaning that the time doesn't depend on the size of the matched result. I found that it takes a long time if the result set is bigger (up to 25 sec for 29 M results). But for a smaller result set of approx. 10,000 it takes approx. 200 ms. On 6/2

Re: IndexSearcher in Servlet

2006-06-27 Thread heritrix . lucene
Hi All, I am sorry for my mistake. Now I agree with you. I had a mistake in my code: I was sharing the Hits object in the servlet, and that was my foolish mistake. Since I changed it and ran the test case again, there was no problem. I am using a single static IndexSearcher now :) Thanks

Re: IndexSearcher in Servlet

2006-06-27 Thread heritrix . lucene
Hi, I also had the same confusion. But today when i did the testing i found that it will merge your results. Therefore i believe that indexSearcher is not thread safe. I tried this on 10,000 requests per second. With Regards On 6/27/06, Ramana Jelda <[EMAIL PROTECTED]> wrote: Hi, You are wrong

Re: IndexSearcher in Servlet

2006-06-27 Thread heritrix . lucene
Hi, The same question i asked yesterday. :-) And now i know the answer :0 Creating a new searcher for each query will make your application very very slow... (leave this idea) U can not have a static indexsearcher object. It will merge all results and the user will get the result of their que

Re: Searching is taking a lot...

2006-06-27 Thread heritrix . lucene
No. I am not sorting the data... On 6/27/06, Martin Braun <[EMAIL PROTECTED]> wrote: Hi chris, > searching everytime using a new searcher was taking time. So For testing, i > made it a static one and reused the same. This gave me a lot of > improvement. > Previously my query was taking approx

Understanding Boolean Queries..

2006-06-27 Thread heritrix . lucene
Hi i am using lucene 1.9.1. My query is : (subject:cs OR author:ritchie) I am creating one Boolean query for two TermQueries. t1 = new Term("subject", "cs") t2 = new Term("author","ritchie") for this the BooleanQuery i created is: BooleanQuery mergedQuery = new BooleanQuery(); mergedQuery.add(n

Re: Searching is taking a lot...

2006-06-27 Thread heritrix . lucene
Hi, First of all, thanks for your attention... I think I've got the solution. Actually earlier, for each query I was creating a different searcher object. Creating the searcher object was not taking a lot. But searching every time using a new searcher was taking time. So For testing, i made i

Searching is taking a lot...

2006-06-26 Thread heritrix . lucene
Hi, I have created an index of 47 million documents. I have 1.28GB RAM. When I search this index, it takes 25 sec on average. Is there a way to get results in a fraction of a second? I hope there must be some way.. Thanks and regards..

Re: addIndexes() is taking infinite time ...

2006-06-22 Thread heritrix . lucene
so how can it be ignored?? On 6/22/06, Mike Streeton <[EMAIL PROTECTED]> wrote: From memory addIndexes() also does an optimization beforehand, this might be what is taking the time. Mike www.ardentia.com the home of NetSearch -Original Message- From: heritrix.lucene [mailto:[EMAIL

Re: addIndexes() is taking infinite time ...

2006-06-21 Thread heritrix . lucene
No. I haven't tried. Today I can try it. One thing I am wondering is what role the file system plays here. I mean, is there any difference if I am doing indexing on FAT32 or on EXT3? I'll have to find out. Can anybody shed some light on this?? With regards On 6/22/06,

What is a "Lazy Field"...

2006-06-21 Thread heritrix . lucene
Hi, Can anybody please tell me what a "Lazy Field" is? I noticed this term has come up several times in discussion... With Regards,

Re: addIndexes() is taking infinite time ...

2006-06-21 Thread heritrix . lucene
hi Otis, This time it took 10 hr 34 min to merge the indexes. During merging I noticed it was not completely using the CPU. I have 512MB RAM, and I found it used up to 256 MB. Are there any more possibilities to make it faster ... With Regards, On 6/21/06, heritrix. lucene

Re: addIndexes() is taking infinite time ...

2006-06-20 Thread heritrix . lucene
hi, thanks for your reply. Now i restarted my application with maxBufferedDocs=10,000. And i am sorry to say that i was adding those indexes one by one. :-) Anyway Can you please explain me the addIndex ? I want to know what exactly happens while adding these.. With Regards, On 6/20/06, Otis G

addIndexes() is taking infinite time ...

2006-06-20 Thread heritrix . lucene
Hi all, I had five different indexes:
1 having 15469008 documents
2 having 7734504 documents
3 having 7734504 documents
4 having 7734504 documents
5 having 7734504 documents
Which sums to 46407024. The constant values are:
maxMergeFactor = 1000
maxBufferedDocs = 1000
I wrote a simple program which

Re: How to do pagination on fethed result using lucene...

2006-06-20 Thread heritrix . lucene
Hi, Actually I forgot to write that my application is web based and I am running it on a Tomcat server. assuming your application is web based, the general consensus is to start by implementing your app so that each page reexecutes the search, reexecuting the search is not feasible as every time

How to do pagination on fethed result using lucene...

2006-06-19 Thread heritrix . lucene
Hi all, I have built a small application that gives some thousands of results. I want to display results the way Google does, using pagination. Here my question is, how will I maintain the sequence of displayed results? Should I associate the "Hits" object with the session? Assume i want to display

Re: Getting count on distinct values of a field.

2006-06-13 Thread heritrix . lucene
I am sorry for my stupid question. Thanks. :-) Regards, On 6/13/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: : But what if that word is present in other fields also. : does "docFreq " only look into that particular field ?? docFreq tells you the frequency of a term, a term is a field a

Re: Getting count on distinct values of a field.

2006-06-13 Thread heritrix . lucene
But what if that word is present in other fields also. does "docFreq " only look into that particular field ?? On 6/13/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: Look at the TermEnum class... iterate over the terms in your field, and docFreq is the number of docs with that term. : Date:

Re: IndexWriter.addIndexes & optimizatio

2006-06-12 Thread heritrix . lucene
Hi, I have processed approx. 50 million up to now. I kept the values of maxMergeFactor and maxBufferedDocs at 1000. I arrived at this value after several rounds of test runs. The indexing rate across the 50 M is 1 document per 4.85 ms. I am only using FSDirectory. Is there any other way to reduce this time??

Re: IndexWriter.addIndexes & optimizatio

2006-06-12 Thread heritrix . lucene
I want to index 1 billion documents. Which one do you think (I mean using fsDir or ramDir) is suitable for indexing this many documents? On 6/12/06, Flik Shen <[EMAIL PROTECTED]> wrote: It means that to pick both high maxBufferedDocs and mergeFactor will improve your indexing performance