Caching of BitSets from filters and Query.equals()

2007-03-05 Thread Antony Bowesman
Not sure if I'm going about this the right way, but I want to use Query instances as a key to a HashMap to cache BitSet instances from filtering operations. They are all for the same reader. That means equals() for any instance of the same generic Query would have to return true if the terms,
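What this message needs is for two separately built but equivalent queries to find the same HashMap entry, and that depends on equals() and hashCode() agreeing. Below is a minimal sketch that assumes a hypothetical immutable `TermKey` in place of a real Query, plus a hypothetical `computeBits` function that runs the filter against the single reader. If real Query objects are used as keys, check that the subclasses involved implement value-based equals()/hashCode(). Also avoid mutating a key after insertion, because that would strand its entry.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

/** Hypothetical immutable stand-in for a simple term query: equal field + text means equal key. */
final class TermKey {
    final String field;
    final String text;

    TermKey(String field, String text) {
        this.field = Objects.requireNonNull(field);
        this.text = Objects.requireNonNull(text);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof TermKey)) return false;
        TermKey other = (TermKey) o;
        return field.equals(other.field) && text.equals(other.text);
    }

    @Override
    public int hashCode() {
        return Objects.hash(field, text); // must agree with equals()
    }
}

/** One cached BitSet per key, valid only for the single reader it was computed against. */
final class BitSetCache<K> {
    private final Map<K, BitSet> cache = new HashMap<>();
    private final Function<K, BitSet> computeBits; // hypothetical: runs the filter on the reader
    private int misses = 0;

    BitSetCache(Function<K, BitSet> computeBits) {
        this.computeBits = computeBits;
    }

    /** Returns the cached BitSet for an equal key, computing it only on the first request. */
    synchronized BitSet get(K key) {
        BitSet bits = cache.get(key);
        if (bits == null) {
            misses++;
            bits = computeBits.apply(key);
            cache.put(key, bits);
        }
        return bits; // shared instance: callers must treat it as read-only
    }

    synchronized int misses() {
        return misses;
    }
}
```

Because every entry is tied to one reader, the whole cache would have to be discarded once that reader is reopened.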

Re: searching a date range

2007-03-05 Thread Daniel Noll
Mohammad Norouzi wrote: Hi, you know, actually we didn't index this field as a Date. We always use String instead of the Date type because we use both Hijri and Gregorian dates, so if we put in a Hijri date, DateField doesn't work properly. That is why we index this field as a String. What DateFi

Re: QueryParser and auto wildcard searches

2007-03-05 Thread Patrick Turcotte
Hi! I am new to Lucene and I am trying to customise the query parser to default to wildcard searches. For example, if the user types in "fenc", it should find "fence" and "fencing" and "fences" and "fenced". Looks like stemming to me! Maybe you should consider using a stemming analyzer instea

QueryParser and auto wildcard searches

2007-03-05 Thread Gavin
Hello, I am new to Lucene and I am trying to customise the query parser to default to wildcard searches. For example, if the user types in "fenc", it should find "fence" and "fencing" and "fences" and "fenced". I can not find a way to modify / extend the QueryParser to automatically create wildc
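The simplest way to get this behaviour is to rewrite the user's text before it reaches QueryParser. Below is a minimal sketch of that idea. `AutoWildcard` and its rule are hypothetical: a trailing `*` goes onto plain alphanumeric words only, and any token containing query syntax (quotes, field prefixes, existing wildcards) is passed through unchanged.

```java
import java.util.Set;

/** Hypothetical pre-processor: turns "fenc" into "fenc*" before the text reaches QueryParser. */
final class AutoWildcard {
    // Boolean operators and the range keyword must stay untouched.
    private static final Set<String> KEYWORDS = Set.of("AND", "OR", "NOT", "TO");

    static String addTrailingWildcards(String userInput) {
        StringBuilder out = new StringBuilder();
        for (String token : userInput.trim().split("\\s+")) {
            if (token.isEmpty()) continue; // blank input
            if (out.length() > 0) out.append(' ');
            out.append(token);
            // Only plain words get a '*'; anything containing query syntax
            // (quotes, colons, existing wildcards, parentheses...) is passed through as-is.
            boolean plainWord = token.chars().allMatch(Character::isLetterOrDigit);
            if (plainWord && !KEYWORDS.contains(token)) out.append('*');
        }
        return out.toString();
    }
}
```

Each `fenc*` becomes a prefix query. In Lucene 2.x a prefix query is expanded into one clause per matching term when it is rewritten, so very short prefixes can exceed BooleanQuery's clause limit (1024 by default). The stemming approach suggested in the reply avoids that cost.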

alternative scoring algorithm for PhraseQuery

2007-03-05 Thread Philipp Nanz
Hello folks, maybe one of you can help me with this (sorry, long read). I have implemented a FuzzyPhraseQuery that works similarly to Lucene's native PhraseQuery, i.e. it can retrieve phrases for a query with respect to insertions and term order. But in addition, it can also find matches with term

Re: IndexSearcher cache

2007-03-05 Thread Chris Hostetter
: initialized... I tried to create a searcher every time, but that led me to the Too-Many-Files-Open exception. So no matter what I do, I face a show-stopper. Were you closing the old searcher before opening the new one? Even if that was the cause of your problem, I still wouldn't recommend reop
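Chris's question points to a common cause: a new searcher is opened for each change, but the old one is never closed. Below is a minimal sketch of sharing one searcher, reopening only when the index has changed, and closing the old searcher once the last in-flight search is done with it. `SharedSearcher`, `Handle` and the `opener` supplier are hypothetical. Any `Closeable` stands in for IndexSearcher here so the example runs without Lucene.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical holder for one shared searcher with reference-counted reopen. */
final class SharedSearcher<S extends Closeable> {

    static final class Handle<S extends Closeable> {
        final S searcher;
        private final AtomicInteger refs = new AtomicInteger(1); // 1 = the holder's own reference

        private Handle(S searcher) {
            this.searcher = searcher;
        }
    }

    private final Supplier<S> opener; // hypothetical: opens a searcher on the current index
    private Handle<S> current;

    SharedSearcher(Supplier<S> opener) {
        this.opener = opener;
        this.current = new Handle<>(opener.get());
    }

    /** Every search starts with acquire()... */
    synchronized Handle<S> acquire() {
        current.refs.incrementAndGet();
        return current;
    }

    /** ...and ends with release(), typically in a finally block. */
    void release(Handle<S> handle) {
        if (handle.refs.decrementAndGet() == 0) {
            try {
                handle.searcher.close(); // last user gone: free its files
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Call after the index has been modified and the change should become visible. */
    void reopen() {
        Handle<S> old;
        synchronized (this) {
            old = current;
            current = new Handle<>(opener.get());
        }
        release(old); // drops the holder's reference; closes now if no search is using it
    }
}
```

Searches already running keep the snapshot they started with. Only new calls to acquire() see the reopened searcher.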

Re: IndexSearcher cache

2007-03-05 Thread Mark Miller
My two cents: Lucene often offers the least common denominator... for example, out of the box, Lucene best handles either a single-user/single-thread experience or a mostly read-only experience. I believe the main reason for this is that it serves the greatest variety of uses. You can, a

Re: IndexSearcher cache

2007-03-05 Thread mcmoisei
I agree those are benefits when you batch-process the indexes once or once in a while. The beauty of AOP is that I can intercept writes and change the index on the spot. At that point I'd need to let the searcher know, or drop it. If I do that, I will face issues on the search side since this

Re: IndexSearcher cache

2007-03-05 Thread Doron Cohen
Indeed, having to re-open a searcher/reader in order for searches to reflect index modifications may sometimes not fit the logic of a certain application well. But consider the features this design makes possible: (+) searches do not see index modifications until desired. (++) no need to sync

Re: Using Stemmers

2007-03-05 Thread Grant Ingersoll
Hi Mathieu, you can't add TokenFilters to an existing Analyzer. However, implementing an Analyzer that acts just like the StandardAnalyzer plus your Stemmer is pretty straightforward. StandardAnalyzer.tokenStream() looks like: /** Constructs a {@link StandardTokenizer} filtere
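Grant's point is that an Analyzer is a tokenizer followed by a fixed chain of filters. "StandardAnalyzer plus a stemmer" therefore means building a new chain that ends with one more stage, not modifying the existing analyzer. Below is a toy, dependency-free analogue of that structure; it is not Lucene's API. `crudeStem` is a hypothetical suffix stripper, not the Porter algorithm that Lucene's PorterStemFilter implements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Toy analogue of an Analyzer: a tokenizer followed by an ordered chain of filters. */
final class ToyAnalyzer {
    private final List<UnaryOperator<List<String>>> filters = new ArrayList<>();

    /** Appends a filter stage; stages run in the order they were added. */
    ToyAnalyzer then(UnaryOperator<List<String>> filter) {
        filters.add(filter);
        return this;
    }

    List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) { // "tokenizer": split on non-alphanumerics
            if (!t.isEmpty()) tokens.add(t);
        }
        for (UnaryOperator<List<String>> filter : filters) tokens = filter.apply(tokens);
        return tokens;
    }

    static List<String> lowerCase(List<String> in) {
        List<String> out = new ArrayList<>();
        for (String t : in) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    }

    static UnaryOperator<List<String>> stopWords(Set<String> stop) {
        return in -> {
            List<String> out = new ArrayList<>();
            for (String t : in) if (!stop.contains(t)) out.add(t);
            return out;
        };
    }

    /** Hypothetical crude stemmer: strips one common English suffix, keeping at least 3 letters. */
    static List<String> crudeStem(List<String> in) {
        List<String> out = new ArrayList<>();
        for (String t : in) {
            String stem = t;
            for (String suffix : new String[] {"ing", "ed", "es", "s"}) {
                if (t.endsWith(suffix) && t.length() - suffix.length() >= 3) {
                    stem = t.substring(0, t.length() - suffix.length());
                    break;
                }
            }
            out.add(stem);
        }
        return out;
    }
}
```

The "standard-like" chain would be `lowerCase` then `stopWords`. The stemming variant is the same chain with `crudeStem` appended as the last stage.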

Re: How can I use SortComparator in my case?

2007-03-05 Thread Doron Cohen
I too cannot think of an indexing configuration that would help here. However, it seems that all the required information exists at search time, more precisely at hit-collection time: - the doc-id and doc-score are known, and used when hits are collected. - The value of that certain field of inter
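Doron's observation is that during hit collection the doc id and score are known, and a per-document field value can be looked up at that point. Together these allow a custom order that a single sort field cannot express. Below is a minimal, Lucene-free sketch of that idea. It assumes the field value has already been read into an int per document (a hypothetical "bucket") and uses a hypothetical ordering: lower bucket first, then higher score. A bounded heap keeps only the top N hits.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Keeps the best N hits under a custom order that mixes a field value with the score. */
final class BucketedTopN {
    static final class Hit {
        final int doc;
        final float score;
        final int bucket; // hypothetical per-document field value, e.g. read from a cached array

        Hit(int doc, float score, int bucket) {
            this.doc = doc;
            this.score = score;
            this.bucket = bucket;
        }
    }

    /** Hypothetical order: lower bucket first, then higher score, then lower doc id. */
    static final Comparator<Hit> BEST_FIRST =
            Comparator.comparingInt((Hit h) -> h.bucket)
                    .thenComparing(Comparator.comparingDouble((Hit h) -> h.score).reversed())
                    .thenComparingInt(h -> h.doc);

    static List<Hit> topN(Iterable<Hit> collected, int n) {
        // Reversed order puts the *worst* retained hit at the head, so it is the one evicted.
        PriorityQueue<Hit> heap = new PriorityQueue<>(BEST_FIRST.reversed());
        for (Hit h : collected) {
            heap.add(h);
            if (heap.size() > n) heap.poll();
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(BEST_FIRST);
        return result;
    }
}
```

Memory stays proportional to N rather than to the total number of hits, which matters when results run into the thousands.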

Re: searching a date range

2007-03-05 Thread Chris Hostetter
: I overrode that method and just removed the try/catch block in which you put the Date-handling code, and now it works fine. My overridden method only returns new RangeQuery(...); Subclassing QueryParser to override getRangeQuery and eliminate the special Date code sounds like it would work ju

Re: Clearing locks

2007-03-05 Thread Chris Hostetter
There are also the static IndexReader.isLocked(Directory) and IndexReader.unlock(Directory) methods that encapsulate this logic for you... they've been around since at least 1.4.3. : Date: Sun, 4 Mar 2007 21:34:52 -0800 : From: Chris Lu <[EMAIL PROTECTED]> : Reply-To: java-user@lucene.apache.org

RE: Soliciting Design Thoughts on Date Searching

2007-03-05 Thread Steven Parkes
But, letting it stay in the text stream and not putting it in a separate date field would give you some trouble with ranges because things that weren't dates could mess you up. This is why Chris suggested putting a prefix on the token. For example, leading underscor

Missing .tii File

2007-03-05 Thread Tim Patton
I'm not sure how, but in moving an index over from 2.0 to 2.1 and changing my own code, one of the .tii files got deleted. I still have the .tis file, though; can I rebuild the missing file so I can open my index? Luke won't open it now, and I just want to make sure everything is OK before openi

Re: IndexSearcher cache

2007-03-05 Thread mcmoisei
That part is understood. However, as I described the problem initially (and the use case is a very practical way of dealing with documents in real life), they change, we edit them, and I don't want to run a batch re-indexing job every night... I just want it done on the spot. One instance Index

Using Stemmers

2007-03-05 Thread DECAFFMEYER MATHIEU
Hi, this is a very simple question, but I just can't find the resources I need... I am using the StandardAnalyzer: StandardAnalyzer stdAnalyzer; if ((stopWordList != null) && (stopWordList.length != 0)) { stdAnalyzer = new StandardAnalyzer(stopWordList); } else { stdAnalyzer = new Standa

Re: Soliciting Design Thoughts on Date Searching

2007-03-05 Thread Erick Erickson
If add() is tokenizing up front, then three calls would be the logical equivalent, and I wouldn't need to artificially add separators while doing a looping construct. document.add( Field.Text("interestingdates", "17760704" ) ); document.add( Field.Text("interestingdates"

Re: IndexSearcher cache

2007-03-05 Thread Erick Erickson
<1>. Every time you close/open a reader, you pay a significant penalty to warm up caches, etc. You may have to do some tricky dancing to coordinate among the sessions to be able to close/reopen the reader to allow updates to show up though. Erick On 3/5/07, Mohammad Norouzi <[EMAIL PROTECTED]>

Re: Soliciting Design Thoughts on Date Searching

2007-03-05 Thread Walt Stoneburner
...logical equivalent, and I wouldn't need to artificially add separators while doing a looping construct. document.add( Field.Text("interestingdates", "17760704" ) ); document.add( Field.Text("interestingdates", "20010911" ) ); document.add( Field.Text("inter

ApacheCon Promo

2007-03-05 Thread Grant Ingersoll
A little shameless self-promotion here: if you're not aware, there will be several Lucene-related talks at ApacheCon Europe (in Amsterdam) the first week in May. ApacheCon info is available at http://www.eu.apachecon.com. Here is the current schedule for the talks: * May 1: Lucene Boot Cam

Re: searching a date range

2007-03-05 Thread Mohammad Norouzi
Hi Erick, I took a look at your source code and saw that in the getRangeQuery() method you use a DateFormat like this: DateFormat df = DateFormat.getDateInstance(DateFormat.SHORT, locale); ... Date d1 = df.parse(part1); The last line of code doesn't work with our localized format. For example, I put the
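One way around locale-dependent parsing is to parse the Hijri input explicitly and convert it to a single Gregorian, fixed-width key. Doing that both at indexing time and when building the range query makes both calendars end up as comparable strings. Below is a minimal sketch for a modern JDK (java.time postdates this thread). It assumes a hypothetical input format of year/month/day separated by '/'. Note that java.time's HijrahDate follows the Umm al-Qura variant and supports only a limited range of years, which may not match every Hijri date this application handles.

```java
import java.time.LocalDate;
import java.time.chrono.HijrahDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: turns a Hijri "year/month/day" string into a Gregorian yyyyMMdd term. */
final class HijriDateTerms {
    static String hijriToTerm(String input) {
        String[] parts = input.trim().split("/");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected year/month/day, got: " + input);
        }
        // Throws java.time.DateTimeException if the date is invalid or outside the supported range.
        HijrahDate hijri = HijrahDate.of(
                Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Integer.parseInt(parts[2]));
        // toEpochDay() is calendar-independent, so it identifies the same day on the ISO calendar.
        LocalDate gregorian = LocalDate.ofEpochDay(hijri.toEpochDay());
        return gregorian.format(DateTimeFormatter.BASIC_ISO_DATE); // yyyyMMdd
    }
}
```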

Re: IndexSearcher cache

2007-03-05 Thread Mohammad Norouzi
Hi Erick, I am completely confused about this IndexReader. In my case, I have to keep the reader open because of pagination of the results, so I have to have a reader per session. The thing that baffles me is: can a single reader serve all the sessions at the same time? I mean 1- having one reader

Re: SELECT * FROM Index-file

2007-03-05 Thread Morten Simonsen
On Mon, 2007-03-05 at 07:52 -0500, Erick Erickson wrote: > Why not just call IndexReader.document(idx) where idx ranges > from 0 to IndexReader.maxDoc()? I believe if your index has some > deleted documents you'll have to handle null returns though That was exactly what I was looking for. Than

Re: SELECT * FROM Index-file

2007-03-05 Thread Ronnie Kolehmainen
Or MatchAllDocsQuery. /Ronnie Erick Erickson wrote: Why not just call IndexReader.document(idx) where idx ranges from 0 to IndexReader.maxDoc()? I believe if your index has some deleted documents you'll have to handle null returns though Sorry to lose you to the dark side ... Best Erick

Re: SELECT * FROM Index-file

2007-03-05 Thread Erick Erickson
Why not just call IndexReader.document(idx) where idx ranges from 0 to IndexReader.maxDoc()? I believe if your index has some deleted documents you'll have to handle null returns though Sorry to lose you to the dark side ... Best Erick On 3/5/07, Morten Simonsen <[EMAIL PROTECTED]> wrote:
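The loop Erick describes can be sketched as follows. Lucene's IndexReader is replaced by a hypothetical `DocSource` interface so the example runs on its own. The deletion check plays the role of `IndexReader.isDeleted(int)`, and testing it before fetching avoids depending on what `document()` does for a deleted slot. Ronnie's MatchAllDocsQuery is the query-based alternative.

```java
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical minimal view of an index reader: doc ids run from 0 to maxDoc() - 1. */
interface DocSource {
    int maxDoc();
    boolean isDeleted(int docId);
    Map<String, String> document(int docId);
}

final class ExportAll {
    /** Visits every live document once; returns how many were exported. */
    static int exportAll(DocSource reader, Consumer<Map<String, String>> sink) {
        int exported = 0;
        for (int id = 0; id < reader.maxDoc(); id++) { // maxDoc() is exclusive
            if (reader.isDeleted(id)) continue;        // deleted slots still occupy an id
            sink.accept(reader.document(id));          // e.g. turn the fields into one database row
            exported++;
        }
        return exported;
    }
}
```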

Re: How can I use SortComparator in my case?

2007-03-05 Thread Erick Erickson
There was a discussion recently where someone pointed me to FieldSortedHitQueue; you might try searching for that. Also try "buckets", which was the subject of that discussion. You can also think about clever indexing schemes with fields that allow you to sort however you really need to, although I

Re: searching a date range

2007-03-05 Thread Erick Erickson
I think you should search the archive for DateTools. There have been very extensive discussions of this topic that will give you answers far more quickly. Dates are strings in Lucene. There's no magic here. You don't need to override anything to get them to work, all you need to do is make sure t
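Dates work as plain strings in Lucene because a fixed-width format with the most significant field first makes string order identical to chronological order. A range query over terms relies on exactly that property. Below is a minimal sketch of it; the helper names are hypothetical. Lucene's DateTools produces strings of this kind from java.util.Date values.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class DateTerms {
    /** yyyyMMdd: fixed width, year first, so comparing strings compares dates. */
    static String toTerm(LocalDate date) {
        return date.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    /** Inclusive range test done purely on the strings, as a term range over these keys would be. */
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }
}
```

A non-padded format such as day/month/year lacks this property: "10/1/2007" sorts before "9/1/2007" even though it is the later date.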

Re: IndexSearcher cache

2007-03-05 Thread Erick Erickson
There was quite a long discussion thread on this topic relatively recently, try searching the archive for concurrence, perhaps IndexReader, etc. The short take-away is that you should share a single instance of the reader, since opening one is an expensive operation, and the first searches you pe
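Pagination does not require a reader per session. Each request can re-run the query against the one shared searcher and slice out the requested page, so the session only has to remember the query and the page number. The hypothetical helper below shows just the slicing arithmetic, with pages numbered from 0.

```java
import java.util.List;

final class Paging {
    /** Returns hits [page * pageSize, page * pageSize + pageSize), clipped to the list. */
    static <T> List<T> page(List<T> hits, int page, int pageSize) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize > 0");
        }
        long start = Math.min((long) page * pageSize, hits.size()); // long avoids int overflow
        long end = Math.min(start + pageSize, hits.size());
        return hits.subList((int) start, (int) end); // a view; copy it if the list may change
    }
}
```

A page past the end simply comes back empty, so the caller does not need a separate bounds check.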

Re: Clearing locks

2007-03-05 Thread Michael McCandless
"Chris Lu" <[EMAIL PROTECTED]> wrote: > They are not really unique. Here is my code to unlock the directory. > Notice there are two locks. > > public static void unlockDirectory(Directory dir) { > Lock dirLock = dir.makeLock(IndexWriter.WRITE_LOCK_NAME); > if (dirLock.isLocke

SELECT * FROM Index-file

2007-03-05 Thread Morten Simonsen
Hi I'm about to convert from Lucene index-files into a MySQL (sorry about that:) I thought I would run a "SELECT *" on the index-file, then read through all the "rows" (hits?) and process each of them into my new database. So I wrote this code: WildcardQuery query = new WildcardQuery(new

Re: More Precise Highlighting

2007-03-05 Thread mark harwood
I think the solution is fairly simple. Pass the "metadata" fieldname to the QueryTermExtractor - not the fieldname "author". QueryTermExtractor effectively provides just a list of strings (no fieldnames) which are then matched against strings found in the tokenStream which represents your conten

RE: How can I use SortComparator in my case?

2007-03-05 Thread Ramana Jelda
This will then be a big hassle. The results number in the 100s and sometimes in the 1000s. Hmm... is there no better way? Jelda > -----Original Message----- > From: Mordo, Aviran (EXP N-NANNATEK) [mailto:[EMAIL PROTECTED] > Sent: Friday, March 02, 2007 8:02 PM > To: java-user@lucene.apache.org > Subject: RE: H