Apache Lucene Search Engine

2006-07-03 Thread Sarvadnya Mutalik
Hi all, I need information about which version of the Apache Lucene search engine you recommend as stable enough for production or mission-critical systems. Thanks in advance. Regards, Sam

Re: Sorting & SQL-Database

2006-07-03 Thread Lukas Vlcek
Hi, looking at your problem, I can think of one solution for small and *midsize* result sets (and I have to say it may be similar to what Aleksander proposes). Write a workaround query in the following form: select addfield from ( select addfield, generated_counter from table where id = 2 union
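The workaround gives each Lucene hit a counter and UNIONs one SELECT per id, so the outer query can ORDER BY that counter. A minimal Java sketch that builds such a query for a given number of hits, assuming the thread's placeholder names table, id and addfield (TABLE is a reserved word in real SQL, so substitute your own table name) and a derived-table alias ranked, which MySQL requires. It uses UNION ALL rather than UNION because the distinct counters already keep rows apart, so duplicate elimination would be wasted work:

```java
import java.util.List;
import java.util.StringJoiner;

// Sketch: build the "UNION + generated counter" workaround so rows come back
// from SQL in the same order as Lucene's ranked id list.
// "table", "id" and "addfield" are the thread's placeholder names; the alias
// "ranked" is an assumption (MySQL needs an alias for every derived table).
class RankedUnionQuery {

    /** Builds SQL with one "?" placeholder per Lucene hit, in rank order. */
    static String build(int hitCount) {
        if (hitCount <= 0) {
            throw new IllegalArgumentException("need at least one hit");
        }
        StringJoiner unions = new StringJoiner(" UNION ALL ");
        for (int rank = 0; rank < hitCount; rank++) {
            // The literal rank becomes generated_counter for that row.
            unions.add("SELECT addfield, " + rank
                    + " AS generated_counter FROM table WHERE id = ?");
        }
        return "SELECT addfield FROM (" + unions + ") ranked ORDER BY generated_counter";
    }

    public static void main(String[] args) {
        List<Integer> luceneIds = List.of(2, 3, 19);
        System.out.println(build(luceneIds.size()));
        // Bind luceneIds.get(i) to placeholder i + 1 with a PreparedStatement.
    }
}
```

Binding the ids through placeholders rather than concatenating them keeps the query safe even if the id list ever comes from user input.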

wildcard in phrase query: problem with idf / scoring; QueryParser; MultiPhraseQuery

2006-07-03 Thread W.H. van Atteveldt
Dear List, I am using Lucene to count the number of hits of queries in documents (i.e. taking raw frequencies as scores), which seems to work fairly well using a modified Similarity that returns freq for tf and 1.0 for everything else, plus a HitCollector to collect the hits. I also want to allow 'pref
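MultiPhraseQuery does not interpret wildcards itself: each phrase position takes one or more concrete terms, so a prefix such as 'pref*' has to be expanded against the index's term dictionary first. The toy sketch below models that expansion with a sorted TreeSet standing in for the term dictionary. In Lucene 2.0 the real walk would start from IndexReader.terms(new Term(field, prefix)), and the resulting terms would go into MultiPhraseQuery.add(Term[]). The class name and example terms are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy model: expand a trailing-wildcard term ("pref*") into the concrete terms
// of a sorted term dictionary, i.e. the terms you would pass for one phrase
// position. The TreeSet is a stand-in for the index's term list.
class PrefixExpansion {

    static List<String> expand(NavigableSet<String> termDictionary, String prefix) {
        List<String> matches = new ArrayList<>();
        // Terms are sorted, so all matches form one contiguous run starting at
        // the first term >= prefix; stop at the first term that no longer matches.
        for (String term : termDictionary.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(List.of(
                "policy", "predict", "prefer", "preference", "prefix", "president"));
        System.out.println(expand(terms, "pref")); // [prefer, preference, prefix]
    }
}
```

Because the scan stops at the first non-matching term, its cost is proportional to the number of matching terms rather than to the size of the whole dictionary.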

Re: Sorting & SQL-Database

2006-07-03 Thread Monsur Hossain
On 6/30/06, Dominik Bruhn <[EMAIL PROTECTED]> wrote: SELECT id,addfield FROM table WHERE id IN ([LUCENERESULT]); Where LUCENERESULT is something like 2,3,19,3,5. This works fine but has one problem: the search result of Lucene is ordered by relevance, and so the id list is also sorted by relevance. But the
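One common alternative, not necessarily where Monsur's reply goes, is to keep Dominik's simple IN (...) query and restore Lucene's order in Java: index the returned rows by id, then walk the Lucene id list. Row is a hypothetical record for whatever the JDBC loop builds (the record syntax needs Java 16+):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: run "WHERE id IN (...)" unchanged, then restore Lucene's relevance
// order in Java. The database returns rows in arbitrary order, so index them
// by id and emit them in the order of the Lucene id list.
class RelevanceOrder {

    // Hypothetical row type: one result row from the JDBC loop.
    record Row(int id, String addfield) {}

    static List<Row> inLuceneOrder(List<Integer> luceneIds, List<Row> dbRows) {
        Map<Integer, Row> byId = new HashMap<>();
        for (Row row : dbRows) {
            byId.put(row.id(), row);
        }
        List<Row> ordered = new ArrayList<>();
        for (int id : luceneIds) {
            Row row = byId.get(id);
            if (row != null) { // skip ids the database no longer has
                ordered.add(row);
            }
        }
        return ordered;
    }
}
```

This costs one hash lookup per hit and leaves the SQL untouched, which keeps it cheap for the small and midsize result sets discussed in this thread.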

Re: QueryTerms vs. IndexTerms

2006-07-03 Thread Paul Elschot
On Monday 03 July 2006 19:52, Patricio wrote: > Hello, I'm a novice in Java. > I am trying to understand how the query terms are matched with the index terms to > calculate the Hits. > > I thought that the class "IndexSearcher" was responsible for this process, > but apparently the classes "Scorer" and "H

QueryTerms vs. IndexTerms

2006-07-03 Thread Patricio
Hello, I'm a novice in Java. I am trying to understand how the query terms are matched with the index terms to calculate the Hits. I thought that the class "IndexSearcher" was responsible for this process, but apparently the classes "Scorer" and "HitCollector" are essential to determine the retrieved docu
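Patricio's question is essentially how an inverted index is searched. The sketch below is a deliberately simplified toy, not Lucene's implementation: each term maps to a postings list of (document, frequency) pairs; a search looks up each query term, accumulates a per-document score, and hands every matching document to a callback shaped like HitCollector.collect(int doc, float score). Scoring by summed raw frequency is an assumption for brevity; Lucene's real Scorer classes apply the Similarity formula (tf, idf, norms and so on):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy inverted index (not Lucene's code): term -> (doc -> frequency).
// search() plays the role of the Scorer; Collector mirrors HitCollector.
class ToyInvertedIndex {

    interface Collector {
        void collect(int doc, float score);
    }

    private final Map<String, Map<Integer, Integer>> postings = new TreeMap<>();

    // "Analysis" here is just lowercasing and splitting on whitespace.
    void addDocument(int doc, String text) {
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) continue;
            postings.computeIfAbsent(token, t -> new TreeMap<>()).merge(doc, 1, Integer::sum);
        }
    }

    void search(List<String> queryTerms, Collector collector) {
        Map<Integer, Float> scores = new TreeMap<>();
        for (String term : queryTerms) {
            // Only terms present in the index contribute; others match nothing.
            Map<Integer, Integer> docs = postings.getOrDefault(term, Map.of());
            docs.forEach((doc, freq) -> scores.merge(doc, freq.floatValue(), Float::sum));
        }
        scores.forEach(collector::collect); // one callback per matching doc
    }
}
```

For example, indexing "lucene in action" as doc 0 and "lucene lucene search" as doc 1, then searching for lucene and search, calls the collector with (0, 1.0) and (1, 3.0).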

Re: Memory Leak IndexSearcher

2006-07-03 Thread Erick Erickson
Well, *assuming* that you're working in Java, you can't predict very much about when the garbage collector actually goes about freeing memory. Depending on how memory is measured, you may or may not be getting an accurate count. I wonder what would happen if you allowed the JVM only a *little* mo

Memory Leak IndexSearcher

2006-07-03 Thread Bruno Vieira
Hi everyone, I am working on a project with around 35000 documents in Lucene (8 text fields, each with at most 256 chars). Unfortunately, this index is updated constantly, and I need new items to show up in my search results as fast as possible. I have an IndexSearcher,
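One frequent source of steadily growing memory in this kind of setup is opening a new IndexSearcher after each update without closing the previous one. A minimal sketch, under that assumption, of a holder that shares one searcher and closes the old instance when reopening. S stands for the searcher type (Lucene 2.0's IndexSearcher has close() but predates java.io.Closeable, so a small wrapper is assumed), and the opener supplier is hypothetical:

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

// Sketch: keep ONE shared searcher; when the index changes, open a fresh one
// and close the previous one instead of leaving it for the garbage collector.
// Limitation: does not wait for searches still running on the old searcher.
final class SearcherHolder<S extends Closeable> {
    private final Supplier<S> opener; // hypothetical: opens a searcher on the index
    private S current;

    SearcherHolder(Supplier<S> opener) {
        this.opener = opener;
        this.current = opener.get();
    }

    synchronized S get() {
        return current;
    }

    /** Call after the index has been updated, not on every query. */
    synchronized void reopen() {
        S old = current;
        current = opener.get();
        try {
            old.close(); // release file handles and caches held by the old one
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reopening only when the index has actually changed also bounds the number of open index files. A production version would reference-count searchers so that in-flight queries finish before the old one is closed.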

Re: Unit testing: Hits not implementing an interface and being final

2006-07-03 Thread Alessio Pace
Hi, thanks for the answer, Erick. The problem is that I am using Lucene through the Hibernate support, to transparently map Java domain entities to a file-system Lucene index (no support for a RAM index at the moment, as far as I saw). So some of my unit tests (which collaborate at some level with

Re: Unit testing: Hits not implementing an interface and being final

2006-07-03 Thread Erick Erickson
Don't know if this helps or hurts, but my approach for unit tests was to build an index in a RAMDirectory for each test, index just enough documents that I could strictly control, and just do searches, man... True, the weakness was that the data sets are very small, and this is more of a "black

Re: Indexing PPT classes hslf

2006-07-03 Thread Nick Burch
On Mon, 3 Jul 2006, mcarcelen wrote: I've used the classes "org.apache.poi.hslf.extractor.PowerPointExtractor" and "org.apache.poi.hslf.extractor.QuickButCruddyTextExtractor" with Lucene 2.0 to extract text, but when I try to use the other classes such as "org.apache.poi.hslf.HSLFSlideShow", "org.a

Unit testing: Hits not implementing an interface and being final

2006-07-03 Thread Alessio Pace
Hi, I just wanted to share my issues with unit testing a component that collaborates with a Hits object. The scenario is: I have a web-page pagination component (say, it shows N results per page) over the Hits found in the Lucene index. I want to test the pagination itself, so I would like
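Because Hits is final and implements no interface, one workaround is to put a small interface of your own between the pagination code and Lucene. In production a thin adapter would delegate length() to hits.length() and get(i) to hits.doc(i) (wrapping its IOException); in unit tests a list-backed fake is enough. ResultList, Paginator and ListResults are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical seam between pagination and Lucene's final Hits class.
interface ResultList<T> {
    int length();
    T get(int index);
}

final class Paginator {
    /** Returns the items on 1-based page {@code page}, {@code perPage} at a time. */
    static <T> List<T> page(ResultList<T> results, int page, int perPage) {
        if (page < 1 || perPage < 1) {
            throw new IllegalArgumentException("page and perPage must be >= 1");
        }
        int start = (page - 1) * perPage;
        int end = Math.min(start + perPage, results.length());
        List<T> items = new ArrayList<>();
        for (int i = start; i < end; i++) { // no iterations when start >= length
            items.add(results.get(i));
        }
        return items;
    }
}

// Test double: a list-backed ResultList, no index required.
final class ListResults<T> implements ResultList<T> {
    private final List<T> items;

    ListResults(List<T> items) {
        this.items = items;
    }

    public int length() { return items.size(); }

    public T get(int index) { return items.get(index); }
}
```

The pagination arithmetic (start = (page - 1) * perPage, with the end clamped to the result count) can then be tested without touching any index.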

Re: Indexing very slow.

2006-07-03 Thread peter velthuis
hehe, that works. It's now racing through 10 000 docs in a couple of seconds :) 2006/7/3, Aleksander M. Stensby <[EMAIL PROTECTED]>: Ah, didn't see that. Yeah, you should have something like: new IndexWriter(..); for each document, writer.addDocument(..); then writer.optimize(); writer.close(). Batching it up wi

Indexing PPT classes hslf

2006-07-03 Thread mcarcelen
Hi all! I've used the classes "org.apache.poi.hslf.extractor.PowerPointExtractor" and "org.apache.poi.hslf.extractor.QuickButCruddyTextExtractor" with Lucene 2.0 to extract text, but when I try to use the other classes such as "org.apache.poi.hslf.HSLFSlideShow", "org.apache.poi.hslf.record.Record"

Re: Indexing very slow.

2006-07-03 Thread peter velthuis
I select it in parts, chunks of 5000 records with the LIMIT keyword. The thing is, it starts very fast but then slows down, so I doubt it has to do with tokenizing. 2006/7/3, Aleksander M. Stensby <[EMAIL PROTECTED]>: My guess is that if you actually do a complete select * from your db, and mana

Re: question

2006-07-03 Thread amit_kkumar
Thanks, it is working fine now.

Re: Indexing very slow.

2006-07-03 Thread Aleksander M. Stensby
Ah, didn't see that. Yeah, you should have something like: new IndexWriter(..); for each document, writer.addDocument(..); then writer.optimize(); writer.close(). Batching it up will make it faster, yes. On Mon, 03 Jul 2006 11:43:03 +0200, Volodymyr Bychkoviak <[EMAIL PROTECTED]> wrote: Problem is hidden

Re: Indexing very slow.

2006-07-03 Thread Volodymyr Bychkoviak
The problem is hidden in these lines: > writer.optimize(); > writer.close(); You should keep one index writer open for all document additions and close it only after adding the last document. optimize() merges all index segments into a single segment, and as the index grows it takes longer and lon
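Volodymyr's point about optimize() can be made concrete with a simplified cost model. Assume (an idealization, not Lucene's actual merge policy, and ignoring the ordinary segment merges Lucene performs while adding documents) that each optimize() call rewrites every document currently in the index into one segment. Optimizing after each of n additions then rewrites a growing index every time, while optimizing once at the end rewrites each document once:

```latex
W_{\text{per-doc}}(n) = \sum_{k=1}^{n} k = \frac{n(n+1)}{2},
\qquad
W_{\text{once}}(n) = n

W_{\text{per-doc}}(40\,000) = \frac{40\,000 \cdot 40\,001}{2} = 800\,020\,000
```

For the roughly 40 000 documents in the original post, that is about 800 million document rewrites versus 40 000 under this model, which fits the symptom of indexing that starts fast and then crawls.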

Re: Indexing very slow.

2006-07-03 Thread Aleksander M. Stensby
My guess is that if you actually do a complete select * from your db and manage all objects at once, this will be a problem for your JVM; maybe running out of memory is the problem you encounter, since strings tend to be a bit of a memory issue in Java :( My suggestion is that you do paginati
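Combining Aleksander's pagination suggestion with Volodymyr's advice to keep one IndexWriter open, the loop might look like the sketch below: read the table page by page (for example with LIMIT/OFFSET), feed every row to a single long-lived writer, and optimize and close once at the very end. DocumentSink is a hypothetical stand-in for IndexWriter (addDocument, optimize, close), and fetchPage for the JDBC query; neither is a real API:

```java
import java.util.List;
import java.util.function.BiFunction;

// Sketch: paged reads, ONE writer for the whole run, optimize/close once.
class ChunkedIndexer {

    // Hypothetical stand-in for IndexWriter.
    interface DocumentSink {
        void add(String record);
        void optimizeAndClose();
    }

    /** fetchPage.apply(offset, limit) returns up to `limit` records, empty when done. */
    static int indexAll(BiFunction<Integer, Integer, List<String>> fetchPage,
                        int pageSize, DocumentSink sink) {
        int offset = 0;
        while (true) {
            List<String> page = fetchPage.apply(offset, pageSize);
            if (page.isEmpty()) {
                break;
            }
            for (String record : page) {
                sink.add(record); // only the current page is held in memory
            }
            offset += page.size();
        }
        sink.optimizeAndClose(); // once, after the last document
        return offset;           // total number of records indexed
    }
}
```

The memory footprint stays bounded by pageSize, while the expensive optimize step runs exactly once regardless of how many pages there are.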

Indexing very slow.

2006-07-03 Thread peter velthuis
When I start the program it's fast: about 10 docs per second. But after about 15000 it slows down a lot. Now it does 1 doc per second, and it is at #40 000 after a whole night of indexing. These are VERY small docs with very little information. This is what and how I index it: Document d