Re: Re[2]: Index Partitioning ( was Re: Search deadlocking under load)

2005-07-11 Thread Paul Smith
Many thanks for confirming the principles should work fine. It is a load off my mind! :) On index update, a small Event is triggered into a Buffer, that is periodically (every 30 seconds) processed to coalesce them, then ensure that any open IndexSearcher in the cache is closed. On 12/07
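
The buffering scheme described in this preview (update events collected, coalesced periodically, then used to close cached searchers) might look roughly like the sketch below. It is not the poster's code: the class `UpdateEventBuffer`, its method names, and the `invalidate` callback are all hypothetical, and a real callback would close the open IndexSearcher for the named index.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical buffer that coalesces index-update events by index name. */
final class UpdateEventBuffer {
    // A set, so repeated updates to the same index collapse into one entry.
    private final Set<String> pending = new HashSet<>();
    // Assumed callback; real code would close the cached searcher for this index.
    private final Consumer<String> invalidate;

    UpdateEventBuffer(Consumer<String> invalidate) {
        this.invalidate = invalidate;
    }

    /** Called on every index update; cheap, only records the index name. */
    synchronized void indexUpdated(String indexName) {
        pending.add(indexName);
    }

    /**
     * Meant to be called periodically (the post mentions every 30 seconds).
     * Returns the number of distinct indexes that were invalidated.
     */
    int flush() {
        Set<String> batch;
        synchronized (this) {
            batch = new HashSet<>(pending);
            pending.clear();
        }
        // Run callbacks outside the lock so writers are not blocked meanwhile.
        for (String name : batch) {
            invalidate.accept(name);
        }
        return batch.size();
    }
}
```

One way to drive `flush()` would be a `ScheduledExecutorService` with `scheduleWithFixedDelay`, but any periodic timer would do.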

RE: Search deadlocking under load

2005-07-11 Thread Nathan Brackett
Thanks for the advice. That ought to reduce contention a bit in that particular method. I've been reviewing a large amount of thread dumps today and I was wondering if it's common to see many threads that look like this: "tcpConnection-8080-20" daemon prio=5 tid=0x081ba000 nid=0x810ac00 waiting f

Re: How to get the un-stemed word

2005-07-11 Thread markharw00d
Would that show up in the TermVectors? Yes, but you would need a scheme for identifying "original, unstemmed" terms vs stems. For example, you could use another field and analyzer for the unstemmed forms. Andrew Boyd wrote: What about storing the unstemmed word with the same position as the

Re: Index Partitioning ( was Re: Search deadlocking under load)

2005-07-11 Thread Otis Gospodnetic
If you want really real-time updates of search results, then yes. However, maybe you can live with near-real-time results, in which case you can add some logic to your application to check for index version only every N requests/minutes/hours. Otis --- Aalap Parikh <[EMAIL PROTECTED]> wrote:

Re: How to get the un-stemed word

2005-07-11 Thread Andrew Boyd
What about storing the unstemmed word with the same position as the stemmed word? Would that show up in the TermVectors? -Original Message- From: mark harwood <[EMAIL PROTECTED]> Sent: Jul 8, 2005 10:44 AM To: java-user@lucene.apache.org, Andrew Boyd <[EMAIL PROTECTED]> Subject: Re: How t

Re: BooleanQuery$TooManyClauses

2005-07-11 Thread [EMAIL PROTECTED]
2500 vs 84. Wow. That's quite a few OR statements I would be saving following your guide of just indexing the parts of the datetime I plan to search on. Every ms counts. Now I have a clear picture of how range query works. Great stuff. Thanks. Btw, coming from a db background I'm so used to wri
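
As a rough illustration of why term granularity matters here: assuming a range query expands into one OR clause per distinct indexed term in the range, the clause count can be estimated from the number of distinct date values. The sketch below uses the range 199801 to 200512 from elsewhere in this thread and assumes every day in the range occurs in the index. The class and method names are hypothetical, and the figures it produces are not meant to reproduce the 2500 and 84 quoted in the post.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.temporal.ChronoUnit;

/** Hypothetical helper: upper bound on distinct terms a date range can match. */
final class RangeClauseEstimate {
    /** Distinct day-granularity terms in [from, to], both ends inclusive. */
    static long dayTerms(LocalDate from, LocalDate to) {
        return ChronoUnit.DAYS.between(from, to) + 1;
    }

    /** Distinct month-granularity terms in [from, to], both ends inclusive. */
    static long monthTerms(YearMonth from, YearMonth to) {
        return ChronoUnit.MONTHS.between(from, to) + 1;
    }

    public static void main(String[] args) {
        long days = dayTerms(LocalDate.of(1998, 1, 1), LocalDate.of(2005, 12, 31));
        long months = monthTerms(YearMonth.of(1998, 1), YearMonth.of(2005, 12));
        System.out.println("day-level terms:   " + days);
        System.out.println("month-level terms: " + months);
    }
}
```

Indexing only the granularity that is actually searched keeps the expanded query small, which is the saving the post refers to.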

Re: Index Partitioning ( was Re: Search deadlocking under load)

2005-07-11 Thread Aalap Parikh
>I don't really know a lot about what gets loaded into memory when you >make/use a new searcher, but the one thing i've learned from experience >is >that the FieldCache (which gets used when you sort on a field) contains >every term in the field you are sorting on, and an instance of >FieldCache

Re: Re[2]: Index Partitioning ( was Re: Search deadlocking under load)

2005-07-11 Thread Otis Gospodnetic
Paul - I'm doing the same (smaller indices) for Simpy.com for similar reasons (fast, independent and faster reindexing, etc.). Each index has its own IndexSearcher, and they are kept in an LRU data structure. Before each search the index version is checked, and a new IndexSearcher is created in case th
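
A minimal sketch of the pattern described above (one searcher per index, kept in an LRU structure, replaced when the index version changes) could look like this. It deliberately avoids the Lucene API: `CachedSearcher` is a hypothetical stand-in for an open IndexSearcher, and the `currentVersion` function is an assumed hook that a real application would back with its own index-version lookup.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

/** Hypothetical stand-in for an open searcher over one index. */
final class CachedSearcher {
    final String indexName;
    final long version;

    CachedSearcher(String indexName, long version) {
        this.indexName = indexName;
        this.version = version;
    }
}

/** Hypothetical LRU cache of searchers, refreshed when the index version changes. */
final class SearcherCache {
    private final int capacity;
    private final ToLongFunction<String> currentVersion; // assumed version lookup
    private final LinkedHashMap<String, CachedSearcher> lru;

    SearcherCache(int capacity, ToLongFunction<String> currentVersion) {
        this.capacity = capacity;
        this.currentVersion = currentVersion;
        // accessOrder=true makes iteration order least-recently-used first.
        this.lru = new LinkedHashMap<String, CachedSearcher>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, CachedSearcher> eldest) {
                // Real code would also close the evicted searcher here.
                return size() > SearcherCache.this.capacity;
            }
        };
    }

    /** Returns a searcher for the index, reopening it if the version moved on. */
    synchronized CachedSearcher get(String indexName) {
        long version = currentVersion.applyAsLong(indexName);
        CachedSearcher searcher = lru.get(indexName);
        if (searcher == null || searcher.version != version) {
            // Real code would close the stale searcher and open a new one.
            searcher = new CachedSearcher(indexName, version);
            lru.put(indexName, searcher);
        }
        return searcher;
    }

    synchronized boolean contains(String indexName) {
        return lru.containsKey(indexName);
    }

    synchronized int size() {
        return lru.size();
    }
}
```

Checking the version on every `get` gives real-time results; checking it only every N requests, as suggested elsewhere in this thread, would trade freshness for fewer checks.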

SearchBlox adds RSS and Atom Web Feeds Indexing in Version 3.0

2005-07-11 Thread Robert Selvaraj
SearchBlox Software has released Version 3.0 of its J2EE Content Search Software. SearchBlox delivers out-of-the-box search functionality for quick and easy integration with websites, applications, intranets and portals. SearchBlox uses the Lucene Search API and incorporates integrated HTTP/HTTPS

RE: Search deadlocking under load

2005-07-11 Thread Otis Gospodnetic
Hi Nick, Without looking at the source of that method, I'd suggest first trying the multifile index format (you can easily convert to it by setting the new format on IndexWriter and optimizing it). I'd be interested to know if this eliminates the problem, or at least makes it harder to hit. Otis

Text or Keyword

2005-07-11 Thread MariLuz Elola
Hi, I am not sure if I have to index using Field.Text or Field.Keyword. I know that : Keyword-Isn't analyzed, but is indexed and stored in the index verbatim. This type is suitable for fields whose original value should be preserved in its entirety, such as URLs, file system paths, dates, person

RE: Search deadlocking under load

2005-07-11 Thread Nathan Brackett
Hey Otis, Thanks for the hasty response and apologies for my delayed response. It was Friday and time to go :) The queries we're running are very varied (wildcard, phrase, normal). The index is only about a 1/2 gig in size (maybe 250,000 documents). The machine is running FreeBSD 5.3 with ~2 gig

Re: BooleanQuery$TooManyClauses

2005-07-11 Thread Erik Hatcher
On Jul 11, 2005, at 1:45 AM, [EMAIL PROTECTED] wrote: Did a google search on the problem when using the range search phrase of "+datefield:[199801 TO 200512]" (date stored as "MMDD") which returns 1 million hits. error: org.apache.lucene.search.BooleanQuery$TooManyClauses Adding "-Do

Re[2]: Index Partitioning ( was Re: Search deadlocking under load)

2005-07-11 Thread Sven Duzont
Hello, We are already using this design in production for an email job application system. Each client (company) has an account and may have multiple users. When a new client is created, a new lucene index is automatically created when new job-applications arrive for this account. Job applicati