Re: Lucene applicability

2010-08-25 Thread Lance Norskog
A stepping stone to the above is that, in DB terms, a Lucene index is only one table. It has a suite of indexing features that are very different from database search. The features are oriented to searching large bodies of text for "ideas" rather than concrete words. It searches a lot faster than a

Re: Sorting a Lucene index

2010-08-25 Thread Lance Norskog
It is also possible to sort by function. This allows you to avoid storing an array of one int per document. It is slower than the raw Lucene sort. On Wed, Aug 25, 2010 at 1:46 AM, Toke Eskildsen wrote: > On Wed, 2010-08-25 at 07:16 +0200, Shelly_Singh wrote: >> I have 1 bln documents to sort.

RE: Blocking on IndexSearcher search

2010-08-25 Thread Uwe Schindler
> I'm using Windows and I'll try NIO, good idea, my app is already memory > hungry in other areas so I guess MMapped is a no go, does it use heap or perm > memory? It uses address space for mapping the files into virtual memory (like a swap file) - this is why it only works well for 64bit VMs. The

Re: Blocking on IndexSearcher search

2010-08-25 Thread Paul Taylor
Uwe Schindler wrote: That lock contention is fine there as this is the central point where all IO is done. This does not mean that only one query is running in parallel, the queries are still running in parallel. But there is one place where all IO is waiting for one file descriptor. This is not

Re: Blocking on IndexSearcher search

2010-08-25 Thread Paul Taylor
Uwe Schindler wrote: Can you show us where it exactly blocks (e.g. use Ctrl-Break on Windows to print a thread dump)? IndexSearcher's methods are not synchronized and concurrent access is easily possible, all concurrent access is managed by the underlying IndexReader. Maybe you synchronize somewhere

RE: Blocking on IndexSearcher search

2010-08-25 Thread Uwe Schindler
Can you show us where it exactly blocks (e.g. use Ctrl-Break on Windows to print a thread dump)? IndexSearcher's methods are not synchronized and concurrent access is easily possible, all concurrent access is managed by the underlying IndexReader. Maybe you synchronize somewhere in your code? - U

Blocking on IndexSearcher search

2010-08-25 Thread Paul Taylor
Hi, my multithreaded code was always creating a new IndexSearcher for every search, but I changed over to the recommendation of creating just one index searcher and keeping it between searches. Now I find if I have multiple threads trying to search they block on the search method(), only one c

Re: Lucene applicability

2010-08-25 Thread Erick Erickson
The SOLR wiki has lots of good information, start there: http://wiki.apache.org/solr/ Otherwise, see below... On Wed, Aug 25, 2010 at 6:20 AM, Schreiner Wolfgang < wolfgang.schrei...@itsv.at> wrote: > Hi all, > > We are currently evaluating potential search frameworks (such as Hibernate > Search

Re: Lucene applicability

2010-08-25 Thread Chris Lu
I see you are coming from the database world. To get a better understanding of Lucene, I would suggest you use the free version of DBSight, which lets you create a Lucene index with SQL after a few clicks. Basically Lucene is more like a list of denormalized documents. So if you change your database

Lucene applicability

2010-08-25 Thread Schreiner Wolfgang
Hi all, We are currently evaluating potential search frameworks (such as Hibernate Search) which might be suitable to use in our project (using Spring, JPA with Hibernate) ... I am sending this E-Mail in hope you can advise me on a few issues that would help us in our decision making process.

RE: Sorting a Lucene index

2010-08-25 Thread Toke Eskildsen
On Wed, 2010-08-25 at 07:16 +0200, Shelly_Singh wrote: > I have 1 bln documents to sort. So, that would mean ( 8 bln bytes == 8GB RAM) > bytes. > All I have is 8 GB on my machine, so I do not think approach would work. This implies that your numeric value can be more than 2 billion. Are you sure
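The memory estimate discussed in this message can be reproduced with a short calculation. This is only a sketch: it assumes one sort value per document held in a flat in-memory array, and the function name and the 8-byte/4-byte widths are illustrative assumptions, not Lucene API.

```python
def sort_array_bytes(num_docs: int, bytes_per_value: int) -> int:
    """Bytes needed to hold one fixed-width sort value per document."""
    return num_docs * bytes_per_value


ONE_BILLION = 1_000_000_000

# 8-byte values (needed only if a value can exceed the 32-bit int range).
long_bytes = sort_array_bytes(ONE_BILLION, 8)

# 4-byte values (enough if every value fits in a signed 32-bit int).
int_bytes = sort_array_bytes(ONE_BILLION, 4)

print(f"8-byte values: {long_bytes / 1e9:.0f} GB")
print(f"4-byte values: {int_bytes / 1e9:.0f} GB")
```

If the values fit in 32 bits, the array is half the size of the 8 GB estimate, though it would still take a large share of an 8 GB machine.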

Re: Sorting a Lucene index

2010-08-25 Thread Ian Lea
1 billion i.e. 1,000,000,000? Either buy more RAM, lots more RAM, or skip Lucene sorting and do your own sorting for the top n hits. You might also want to look into sharding/distributing your index. -- Ian. On Wed, Aug 25, 2010 at 6:16 AM, Shelly_Singh wrote: > I have 1 bln documents to sor
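The "do your own sorting for the top n hits" suggestion can be sketched without any search library. This is a minimal illustration, assuming hits arrive as a stream of hypothetical (doc_id, sort_value) pairs; it is not how Lucene collects hits internally. The point is that only n entries are kept in memory at a time, rather than one value for every document in the index.

```python
import heapq
from typing import Iterable, List, Tuple

Hit = Tuple[int, int]  # (doc_id, sort_value); a hypothetical shape for illustration


def top_n_hits(hits: Iterable[Hit], n: int) -> List[Hit]:
    """Return the n hits with the smallest sort values, in ascending order.

    heapq.nsmallest consumes the iterable once and retains at most n items,
    so memory use depends on n, not on the total number of hits.
    """
    return heapq.nsmallest(n, hits, key=lambda hit: hit[1])


# Usage: hits can be any iterable, including a generator over a result stream.
sample_hits = [(0, 42), (1, 7), (2, 19), (3, 3), (4, 88)]
best_two = top_n_hits(iter(sample_hits), 2)
print(best_two)
```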