Re: search performance benchmarks

2006-06-27 Thread heritrix . lucene
Hi, Or Lucene is more like Google in this sense, meaning that the time doesn't depend on the size of the matched result I found that it takes a long time if the result set is bigger (up to 25 sec for 29 M results). But for a smaller result set of approx. 10,000 it takes approx. 200 ms. On 6/2

RE: Scoring purely on term frequencies

2006-06-27 Thread W.H. van Atteveldt
Dear Chris, Thanks for your reply, explain is a good friend indeed :-) Actually, the problem was that the documents were weighted in the indexing phase using the default similarity, and this was cached (as documented). So switching the indexing to the HitCountSimilarity solves the problem of 'st

Re[3]: weird error with SVN of Lucene

2006-06-27 Thread Yura Smolsky
Hello, Chris. I have tried to CO it with HTTPS, and it works. Very weird... I don't use any proxies or other firewalls. Also, CO from other sites works fine... Can anyone explain this weird behavior of TortoiseSVN 1.3.3, Build 6219? :) CH> : i am trying to CO or update it for 2 hours... can you perfor

Re: merge index from different platform

2006-06-27 Thread Erik Hatcher
On Jun 27, 2006, at 3:17 PM, Beady Geraghty wrote: I wasn't very clear in my original note. I just want to make sure that I can merge indexes created from different platforms/different OSes without problem. So I understand from the response that this can be done. Yes, this can be done.

Re: merge index from different platform

2006-06-27 Thread Beady Geraghty
Thank you for the response. I wasn't very clear in my original note. I just want to make sure that I can merge indexes created from different platforms/different OSes without problem. So I understand from the response that this can be done. Thanks On 6/27/06, Erik Hatcher <[EMAIL PROTECT

Re: weird error with SVN of Lucene

2006-06-27 Thread Steven Rowe
On 06/27/06 at 1:00 PM, Yura Smolsky wrote: > svn co -r 417135 > http://svn.apache.org/repos/asf/lucene/java/trunk > lucene-java-2.0.0-417135 I successfully ran this exact command line just now -- no errors. It is strange that the revision number given with the checkout command (417135) does not

Re[2]: weird error with SVN of Lucene

2006-06-27 Thread Chris Hostetter
: i am trying to CO or update it for 2 hours... can you perform updates or COs? both. I tried the exact revision checkout you specified in your original email... svn co -r 417135 http://svn.apache.org/repos/asf/lucene/java/trunk lucene-java-2.0.0-417135 ...and had no problems. perhaps there

Re: weird error with SVN of Lucene

2006-06-27 Thread Ben Knear
Yura Smolsky wrote: Hello, Chris. CH> : svn co -r 417135 CH> http://svn.apache.org/repos/asf/lucene/java/trunk lucene-java-2.0.0-417135 CH> : svn: REPORT request failed on CH> '/repos/asf/!svn/bc/417505/lucene/java/trunk' CH> : svn: REPORT of '/repos/asf/!svn/bc/417505/lucene/java/trunk': CH> 40

RE: search performance benchmarks

2006-06-27 Thread Vladimir Olenin
Thanks, Mike. This info is actually quite helpful. What is the 'times 10 rule' you are referring to? Also, I wonder how Lucene is handling the growth of the result set returned by the query? In the various search engine implementations I did myself for several projects that was one of the things which

Re[2]: weird error with SVN of Lucene

2006-06-27 Thread Yura Smolsky
Hello, Chris. CH> : svn co -r 417135 CH> http://svn.apache.org/repos/asf/lucene/java/trunk lucene-java-2.0.0-417135 CH> : svn: REPORT request failed on CH> '/repos/asf/!svn/bc/417505/lucene/java/trunk' CH> : svn: REPORT of '/repos/asf/!svn/bc/417505/lucene/java/trunk': CH> 400 Bad Request (http://

Re: merge index from different platform

2006-06-27 Thread Erik Hatcher
On Jun 27, 2006, at 2:02 PM, Daniel Naber wrote: On Tuesday 27 June 2006 17:23, Beady Geraghty wrote: I tried to look at the segments file, thinking that it points to the various other files in the index directory, Use IndexWriter.addIndexes() to merge two or more indexes. Or use the Ind

Re: weird error with SVN of Lucene

2006-06-27 Thread Chris Hostetter
: svn co -r 417135 http://svn.apache.org/repos/asf/lucene/java/trunk lucene-java-2.0.0-417135 : svn: REPORT request failed on '/repos/asf/!svn/bc/417505/lucene/java/trunk' : svn: REPORT of '/repos/asf/!svn/bc/417505/lucene/java/trunk': 400 Bad Request (http://svn.apache.org : ) : make: *** [luce

RE: Scoring purely on term frequencies

2006-06-27 Thread Chris Hostetter
: Similarity that simply returns the number of matched terms per document : as the score. I tried making one that returns freq as tf and 1.0f as : anything else, but that gives strange results; same for something that : really returns 1.0f whatever. That's because when your tf function always retu

weird error with SVN of Lucene

2006-06-27 Thread Yura Smolsky
Hello. I have encountered a weird error with CO of Lucene, when I try to build PyLucene: [EMAIL PROTECTED] /cygdrive/d/workshop/PyLucene $ make svn co -r 417135 http://svn.apache.org/repos/asf/lucene/java/trunk lucene-java-2.0.0-417135 svn: REPORT request failed on '/repos/asf/!svn/bc/417505/lucen

Re: merge index from different platform

2006-06-27 Thread Daniel Naber
On Tuesday 27 June 2006 17:23, Beady Geraghty wrote: > I tried to look at the segments file, thinking that it points to the > various other > files in the index directory, Use IndexWriter.addIndexes() to merge two or more indexes. Regards Daniel -- http://www.danielnaber.de

Re: IndexSearcher in Servlet

2006-06-27 Thread Erik Hatcher
Yup, this is pretty much how I do it for lucenebook.com (though quite admittedly it's got a minuscule amount of data behind it, which rarely changes). I don't use a servlet initialization to put the searcher into application scope, though, as I'm using blojsom for the blogging system and i

Re: Lucene indexing RDF

2006-06-27 Thread Suba Suresh
I used Java libraries for RTF file formats. Refer to Manning's Lucene in Action book. It is helpful and gives pointers to where you can access different libraries. suba suresh. mcarcelen wrote: Hi, Do you know another library for indexing RDF? Thanks a lot for your help Teresa -Mensaje ori

Re: IndexSearcher in Servlet

2006-06-27 Thread Renaud Waldura
Erik: I commend you for giving all the information that's relevant. For the sake of simplicity, and because it is the vast majority of use cases, could you endorse the following as the simplest, most correct way (i.e. a best practice) to implement Lucene for Web applications. 1- create an In

Re: IndexWriter.addIndexes & optimizatio

2006-06-27 Thread Karel Tejnora
Depends on the document type; look at the setOmitNorms method in the Field class. heritrix.lucene wrote: Hi, Approx. 50 million I have processed up to now. I kept maxMergeFactor and maxBufferedDoc's value 1000. This value I got after several rounds of test runs. Indexing rate for each document in 50 M, is

Lucene indexing RDF

2006-06-27 Thread mcarcelen
Hi, Do you know another library for indexing RDF? Thanks a lot for your help Teresa -Original Message- From: Suba Suresh [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 27, 2006 17:38 To: java-user@lucene.apache.org Subject: Re: Lucene indexing pdf I used PDFBox library as menti

Re: Lucene indexing pdf

2006-06-27 Thread Suba Suresh
I used the PDFBox library as mentioned in Lucene in Action. It works for me. You can access it from www.pdfbox.org suba suresh mcarcelen wrote: Hi, I'm new with Lucene and I'm trying to index a pdf but when I query everything it returns nothing. Can anyone help me? Thanks a lot Teresa ---

Re: Lucene indexing pdf

2006-06-27 Thread Patrick Kimber
Hi Teresa, You need to convert the pdf file into text format before adding the text to the Lucene index. You may like to look at http://www.pdfbox.org/ for a library to convert pdf files to text format. Patrick On 27/06/06, mcarcelen <[EMAIL PROTECTED]> wrote: Hi, I'm new with Lucene and I'm t

Lucene indexing pdf

2006-06-27 Thread mcarcelen
Hi, I'm new with Lucene and I'm trying to index a pdf but when I query everything it returns nothing. Can anyone help me? Thanks a lot Teresa - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROT

merge index from different platform

2006-06-27 Thread Beady Geraghty
Hi, I am trying to merge in an index from a different node and probably a different platform. I tried some simple cases by copying an index created on a Windows machine and bringing it to a Linux server. I seem to be able to search from this index that is copied over. I would therefore assume that I c

RE: Scoring purely on term frequencies

2006-06-27 Thread W.H. van Atteveldt
Dear Ziv, List, I am probably doing something stupid... I was trying to create a Similarity that simply returns the number of matched terms per document as the score. I tried making one that returns freq as tf and 1.0f as anything else, but that gives strange results; same for something that reall

Re: IndexSearcher in Servlet

2006-06-27 Thread Erik Hatcher
On Jun 27, 2006, at 10:32 AM, Fabrice Robini wrote: That's also my case... I create a new IndexSearcher at each query, but with a static and instantiated Directory. new IndexSearcher(myDirectory) It seems to be OK... am I wrong ? You may be "ok" given your query patterns, but you won't benef

Re: IndexSearcher in Servlet

2006-06-27 Thread Erik Hatcher
Michael - you're absolutely right in your thinking. As long as IndexReader is long-lived you'll be fine. All caches internal to Lucene are based off the IndexReader, which is implicitly constructed under the covers of IndexSearcher if not specified directly. Erik On Jun 27, 2006

RE: IndexSearcher in Servlet

2006-06-27 Thread Fabrice Robini
That's also my case... I create a new IndexSearcher at each query, but with a static and instantiated Directory. new IndexSearcher(myDirectory) It seems to be OK... am I wrong ? -Original Message- From: Crump, Michael [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 27, 2006 16:04 To: java-us

Re: IndexSearcher in Servlet

2006-06-27 Thread Karel Tejnora
Singleton pattern is better. Then you can extend it to a proxy pattern. existing IndexReader really isn't that expensive and does get around

RE: IndexSearcher in Servlet

2006-06-27 Thread Crump, Michael
Hello, I have another question along this line. One of the points made in this thread was to never create a new IndexSearcher for each query. Is this true even in the case that an IndexSearcher is being created with a static or cached IndexReader using the IndexSearcher(IndexReader reader) const

RE: IndexSearcher in Servlet

2006-06-27 Thread Omar Didi
You can initialize your IndexSearcher in a servlet listener, and even warm it up with a few queries. That way, when the user sends the first query, it won't take a long time to load the index in RAM. > -Original Message- > From: Fabrice Robini [mailto:[EMAIL PROTECTED] > Sent: Tuesday, June 2

RE: IndexSearcher in Servlet

2006-06-27 Thread Fabrice Robini
Erik, Thank you for your reply. I'm going to use the static IndexSearcher in my Servlet (my index is static). Thanks :-) -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 27, 2006 12:49 To: java-user@lucene.apache.org Subject: Re: IndexSearcher in Servlet

Re: Understanding Boolean Queries..

2006-06-27 Thread Erik Hatcher
If you want OR, you need to make all clauses SHOULD, no MUSTs. Erik On Jun 27, 2006, at 4:50 AM, heritrix.lucene wrote: Hi i am using lucene 1.9.1. My query is : (subject:cs OR author:ritchie) I am creating one Boolean query for two TermQueries. t1 = new Term("subject", "cs") t2 = n
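Editor's note on the reply above: the difference between MUST and SHOULD clauses can be sketched in plain Java. This is only a toy model under stated assumptions, not Lucene code: the `Clause` class, the `query` helper, and the exact-match rule are hypothetical names invented for illustration. In Lucene 1.9 and later the corresponding constants are, as far as I know, `BooleanClause.Occur.MUST` and `BooleanClause.Occur.SHOULD`. The model encodes the usual rule that every MUST clause has to match, while a query made only of SHOULD clauses matches when at least one of them does.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of boolean clause semantics; these are NOT Lucene classes.
class BooleanOccurSketch {
    enum Occur { MUST, SHOULD }

    // Hypothetical clause: the given field must hold exactly the given value.
    static final class Clause {
        final String field;
        final String value;
        final Occur occur;

        Clause(String field, String value, Occur occur) {
            this.field = field;
            this.value = value;
            this.occur = occur;
        }

        boolean matches(Map<String, String> doc) {
            return value.equals(doc.get(field));
        }
    }

    // Every MUST clause has to match; if there are no MUST clauses,
    // at least one SHOULD clause has to match.
    static boolean matches(List<Clause> clauses, Map<String, String> doc) {
        boolean hasMust = false;
        boolean anyShould = false;
        for (Clause c : clauses) {
            boolean hit = c.matches(doc);
            if (c.occur == Occur.MUST) {
                hasMust = true;
                if (!hit) {
                    return false;
                }
            } else if (hit) {
                anyShould = true;
            }
        }
        return hasMust || anyShould;
    }

    // The two clauses from the thread, both added with the same occurrence flag.
    static List<Clause> query(Occur occur) {
        return Arrays.asList(
                new Clause("subject", "cs", occur),
                new Clause("author", "ritchie", occur));
    }

    public static void main(String[] args) {
        // A document that satisfies only the subject clause.
        Map<String, String> doc = new HashMap<>();
        doc.put("subject", "cs");
        doc.put("author", "knuth");

        System.out.println("all MUST (AND):  " + matches(query(Occur.MUST), doc));
        System.out.println("all SHOULD (OR): " + matches(query(Occur.SHOULD), doc));
    }
}
```

With both clauses marked MUST the sample document is rejected because the author clause fails; with both marked SHOULD it is accepted, which is the OR behaviour asked for in the original question.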

Re: IndexSearcher in Servlet

2006-06-27 Thread heritrix . lucene
Hi All, I am sorry for my mistake. Now I agree with you. I had a mistake in my code: I was sharing the Hits object in the servlet, and that was my foolish mistake. Since I changed it and ran the test case again, there has been no problem. I am using a single static IndexSearcher now :) Thanks

Re: IndexSearcher in Servlet

2006-06-27 Thread Erik Hatcher
On Jun 27, 2006, at 5:47 AM, Fabrice Robini wrote: What is your advice for webApplication ? It all depends :) - IndexSearcher pool ? No point in that. A single IndexSearcher for searches is all that is ever needed. Having a warming IndexSearcher, as Solr implements, makes sense in so
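Editor's note on the advice above: the "one shared searcher" arrangement might look roughly like the following plain-Java sketch. It does not use Lucene; `FakeSearcher` is a hypothetical stand-in for an expensive-to-open searcher, and the holder idiom is just one common way to get a lazily created, thread-safe single instance. Each call to `search` returns its own result, which mirrors the point made elsewhere in the thread that the searcher can be shared while per-request results should not be.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Plain-Java sketch of sharing one expensive searcher; NOT Lucene code.
class SharedSearcherSketch {

    // Hypothetical stand-in for a searcher that is costly to open.
    static final class FakeSearcher {
        static final AtomicInteger OPENED = new AtomicInteger();

        FakeSearcher() {
            OPENED.incrementAndGet(); // count how many times one is opened
        }

        // Stateless, so concurrent calls do not interfere; each call
        // returns a fresh result rather than a shared object.
        String search(String query) {
            return "results for " + query;
        }
    }

    // Initialization-on-demand holder: the JVM initializes Holder once,
    // thread-safely, on the first call to searcher().
    private static final class Holder {
        static final FakeSearcher INSTANCE = new FakeSearcher();
    }

    static FakeSearcher searcher() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            final int id = i;
            workers[i] = new Thread(() -> searcher().search("query " + id));
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        // prints "searchers opened: 1"
        System.out.println("searchers opened: " + FakeSearcher.OPENED.get());
    }
}
```

In a servlet setting the same idea is often expressed by creating the instance once at application start-up and storing it in application scope; the sketch only shows the sharing part, not reopening after the index changes.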

Re: Index confusion and organization

2006-06-27 Thread Aleksander M. Stensby
Thank you very much Erik! I will definitely check into this. I'm currently using xfire in my implementation. I guess the big issue was/is that the indexing is done by one application, and the searching from several different applications. (obviously) I have been bitten by the lucene-virus;)

Re: Index confusion and organization

2006-06-27 Thread Erik Hatcher
Aleksander - if you're wrapping Lucene with a web service, you'd do well to investigate Solr - http://incubator.apache.org/solr - as it handles all of the index management in a very elegant fashion. It currently does not support a SOAP interface, but rather a RESTful light-weight custom XM

RE: IndexSearcher in Servlet

2006-06-27 Thread Fabrice Robini
Hi Erik, What is your advice for webApplication ? - IndexSearcher pool ? - New IndexSearcher for each query ? - Something else ? Thanks a lot, Fab -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 27, 2006 11:41 To: java-user@lucene.apache.org Subject: Re

Re: IndexSearcher in Servlet

2006-06-27 Thread Erik Hatcher
On Jun 27, 2006, at 5:11 AM, heritrix.lucene wrote: Hi, I also had the same confusion. But today when i did the testing i found that it will merge your results. Therefore i believe that indexSearcher is not thread safe. I tried this on 10,000 requests per second. You must have something e

Re: IndexSearcher in Servlet

2006-06-27 Thread heritrix . lucene
Hi, I also had the same confusion. But today when I did the testing I found that it will merge your results. Therefore I believe that IndexSearcher is not thread safe. I tried this on 10,000 requests per second. With Regards On 6/27/06, Ramana Jelda <[EMAIL PROTECTED]> wrote: Hi, You are wrong

RE: IndexSearcher in Servlet

2006-06-27 Thread Ramana Jelda
Hi, You are wrong. In your case (if I ignore any updates to the index), one IndexSearcher object is enough. IndexSearcher is thread safe. Jelda > -Original Message- > From: heritrix.lucene [mailto:[EMAIL PROTECTED] > Sent: Tuesday, June 27, 2006 10:58 AM > To: java-user@lucene.apache.org > S

RE: IndexSearcher in Servlet

2006-06-27 Thread Fabrice Robini
Hi, Thanks a lot for your reply :-) I totally agree with you, I'm going to use a pool. Are there "design-patterns" in a Lucene Sandbox about IndexSearcher pool? What are the best practices ? Thanks a lot, Fab -Original Message- From: heritrix.lucene [mailto:[EMAIL PROTECTED] Sent: mar

Re: IndexSearcher in Servlet

2006-06-27 Thread heritrix . lucene
Hi, The same question I asked yesterday. :-) And now I know the answer :0 Creating a new searcher for each query will make your application very very slow... (leave this idea) You can not have a static IndexSearcher object. It will merge all results and the user will get the result of their que

Re: Searching is taking a lot...

2006-06-27 Thread heritrix . lucene
No. I am not sorting the data... On 6/27/06, Martin Braun <[EMAIL PROTECTED]> wrote: Hi chris, > searching everytime using a new searcher was taking time. So For testing, i > made it a static one and reused the same. This gave me a lot of > improvement. > Previously my query was taking approx

Understanding Boolean Queries..

2006-06-27 Thread heritrix . lucene
Hi, I am using Lucene 1.9.1. My query is : (subject:cs OR author:ritchie) I am creating one Boolean query for two TermQueries. t1 = new Term("subject", "cs") t2 = new Term("author","ritchie") for this the BooleanQuery I created is: BooleanQuery mergedQuery = new BooleanQuery(); mergedQuery.add(n

IndexSearcher in Servlet

2006-06-27 Thread Fabrice Robini
Hello, I have a question about the IndexSearcher(). I have a Servlet that has a searchDocument(String theQuery) method. This method instantiates a new IndexSearcher for each query: searchDocument(String theQuery) { Searcher searcher = new IndexSearcher(indexPath);

Re: Searching is taking a lot...

2006-06-27 Thread Martin Braun
Hi chris, > searching everytime using a new searcher was taking time. So For testing, i > made it a static one and reused the same. This gave me a lot of > improvement. > Previously my query was taking approx 25 sec. But now most of the queries > are taking time between the 100 and 800 ms. Do you

Re: Searching is taking a lot...

2006-06-27 Thread Paul Elschot
On Tuesday 27 June 2006 09:23, heritrix.lucene wrote: > Hi, > First of all, thanks for your attention... > I think i've got the solution. > Actually earlier, everytime for each query i was creating a different > searcher object. Creating searcher object was not taking a lot. But > searching everyti

Re: Searching is taking a lot...

2006-06-27 Thread heritrix . lucene
Hi, First of all, thanks for your attention... I think I've got the solution. Actually, earlier I was creating a different searcher object for each query. Creating the searcher object was not taking long, but searching every time with a new searcher was taking time. So for testing, I made i