Re: Incremental updates / slow searches.

2006-10-09 Thread Mathias Lux
Rickard Bäckman wrote: > Hi, > > we are using a search system based on Lucene and have recently tried to add > incremental updating of the index instead of building a new index every now > and then. However, we now run into problems as our searches start to take > a very long time to complete. > >

Re: Incremental updates / slow searches.

2006-10-09 Thread Yonik Seeley
On 10/9/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: don't forget to optimize your index every now and then as well... deleting a document just marks it as "deleted"; it still gets inspected by every query during scoring at least once to see that it can skip it. Optimizing is the only thing t

Re: Incremental updates / slow searches.

2006-10-09 Thread Chris Hostetter
don't forget to optimize your index every now and then as well... deleting a document just marks it as "deleted"; it still gets inspected by every query during scoring at least once to see that it can skip it. Optimizing is the only thing that truly removes the "deleted" documents. : Date: Mo

Re: wildcard and span queries

2006-10-09 Thread Erick Erickson
Doron: Thanks for the suggestion, I'll certainly put it on my list, depending upon what the PM decides. This app is genealogy research, and users *can* put in their own wildcards... This is why I love this list... lots of smart people giving me suggestions I never would have thought of ... Th

Re: FieldSelectorResult instance descriptions?

2006-10-09 Thread Grant Ingersoll
See http://www.gossamer-threads.com/lists/lucene/java-dev/33964?search_string=Lazy%20Field%20Loading;#33964 for the discussion on Java Dev from way back if you want more background info. To some extent, I still think Lazy Fields are in the early adopter stage, since they haven't officially b

RE: QueryParser syntax French Operator

2006-10-09 Thread Patrick Turcotte
Hi, I was thinking of something along those lines. Last week, I was able to take time to understand the JavaCC syntax and possibilities. I have some cleaning up, testing and documentation to do, but basically, I was able to expand the AND / OR / NOT patterns at r

Re: FieldSelectorResult instance descriptions?

2006-10-09 Thread Chris Hostetter
: If you read the entire source as I did, it becomes clear ! :) : The interesting code is in FieldsReader. Not necessarily. There can be differences between how constants are used and how they are supposed to be used (depending on whether or not the code using them has any bugs in it) : NO_LOAD

Re: wildcard and span queries

2006-10-09 Thread Doron Cohen
"Erick Erickson" <[EMAIL PROTECTED]> wrote on 09/10/2006 13:09:21: > ... The kicker is that what we are indexing is > OCR data, some of which is pretty trashy. So you wind up with "interesting" > words in your index, things like rtyHrS. So the whole question of allowing > very specific queries on d

Re: wildcard and span queries

2006-10-09 Thread Erick Erickson
I've already started that conversation with the PM, I'm just trying to get a better idea of what's possible. I'll whimper tooth and nail to keep from having to do a lot of work to add a feature to a product that nobody in their right mind would ever use. As far as the grammar, we don't actually

Re: wildcard and span queries

2006-10-09 Thread Paul Elschot
Erick, On Monday 09 October 2006 21:20, Erick Erickson wrote: > OK, forget the stuff about "TooManyBooleanClauses". I finally figured out > that if I specify the surround to have the same semantics as a SpanRegex > (i.e., and(eri*, mal*)) it blows up with TooManyBooleanClauses. So that makes > mor

Re: wildcard and span queries

2006-10-09 Thread Erick Erickson
OK, forget the stuff about "TooManyBooleanClauses". I finally figured out that if I specify the surround to have the same semantics as a SpanRegex (i.e., and(eri*, mal*)) it blows up with TooManyBooleanClauses. So that makes more sense to me now. Specifying 20w(eri*, mal*) is what I was using bef

Re: wildcard and span queries

2006-10-09 Thread Erick Erickson
OK, I'm using the surround code, and it seems to be working...with the following questions (always, more questions)... I'm getting an exception sometimes of TooManyBasicQueries. I can control this by initializing BasicQueryFactory with a larger number. Do you have any cautions about upping this

Re: Lucene searching algorithm

2006-10-09 Thread Grant Ingersoll
Hi Michael, I think there are a number of good resources on this: 1. http://lucene.apache.org/java/scoring.html covers the basics of searching. The bottom has some pseudo code as well. 2. Lucene In Action 3. Search this list and other places for information on the Vector Space Model.

Re: threadsafe QueryParser?

2006-10-09 Thread Yonik Seeley
On 10/9/06, Stanislav Jordanov <[EMAIL PROTECTED]> wrote: Method static public Query parse(String query, String field, Analyzer analyzer) in class QueryParser is deprecated in 1.9.1 and the suggestion is: /"Use an instance of QueryParser and the {@link #parse(String)} method instead."

Re: Incremental updates / slow searches.

2006-10-09 Thread Yonik Seeley
The biggest thing would be to limit how often you open a new IndexSearcher, and when you do, warm up the new searcher in the background while you continue serving searches with the existing searcher. This is the strategy that Solr uses. There is also the issue of if you are analyzing/merging doc
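The warm-then-swap strategy described above can be sketched without any Lucene classes. This is only an illustration under stated assumptions: the class name SearcherSwapSketch, the type parameter S (standing in for whatever searcher type the application uses), and the opener/warmer callbacks are all hypothetical and are not part of Lucene or Solr. The point it shows is that request threads keep using the old searcher until the new one has been opened and warmed in the background.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Sketch only: S is a stand-in for the application's searcher type. */
final class SearcherSwapSketch<S> {
    private final AtomicReference<S> current;

    SearcherSwapSketch(S initial) {
        this.current = new AtomicReference<>(initial);
    }

    /** Searcher that request threads should use right now. */
    S get() {
        return current.get();
    }

    /**
     * Opens a new searcher, warms it, and only then publishes it.
     * Returns the previous searcher so the caller can close it once
     * searches still running against it have finished.
     */
    S refresh(Supplier<S> opener, Consumer<S> warmer) {
        S fresh = opener.get();   // hypothetical: open a searcher on the updated index
        warmer.accept(fresh);     // hypothetical: run a few typical queries to fill caches
        return current.getAndSet(fresh);
    }

    public static void main(String[] args) throws InterruptedException {
        SearcherSwapSketch<String> holder = new SearcherSwapSketch<>("searcher-v1");
        // Refresh on a background thread; the main thread keeps serving meanwhile.
        Thread background = new Thread(() -> {
            String old = holder.refresh(() -> "searcher-v2", s -> { /* warm-up queries */ });
            System.out.println("retired " + old);
        });
        background.start();
        System.out.println("serving with " + holder.get());
        background.join();
        System.out.println("now serving with " + holder.get());
    }
}
```

Calling refresh rarely (for example on a timer rather than after every update) is what limits how often a new searcher is opened. Closing the returned old searcher safely is left to the caller, since it depends on how in-flight searches are tracked.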

Re: TermQuery and PhraseQuery..problem with word with space

2006-10-09 Thread Ismail Siddiqui
In fav_stores I see "Banana Republic" and "Ann Taylor", and I am searching with the capitals. On 10/9/06, Erick Erickson <[EMAIL PROTECTED]> wrote: OK, when you look in the "fav_stores" field in Luke, what do you see? And, are you searching on "Banana Republic" with the capitals? I

Re: TermQuery and PhraseQuery..problem with word with space

2006-10-09 Thread Doron Cohen
I would guess that one of your assumptions is wrong... The assumptions to check are: At indexing: - lpf.getLuceneFieldName() == "fav_stores" - pa.getPersonProfileChoice().getChoice() == "Banana Republic" At search: - the query is created like this: new TermQuery(new Term("fav_stores","Banana R

Re: TermQuery and PhraseQuery..problem with word with space

2006-10-09 Thread Erick Erickson
OK, when you look in the "fav_stores" field in Luke, what do you see? And, are you searching on "Banana Republic" with the capitals? If so, and your index has the letters in lower case, that's your problem. Erick On 10/9/06, Ismail Siddiqui <[EMAIL PROTECTED]> wrote: I am using StandardAnalyz

Re: deleteDocuments being ingnored

2006-10-09 Thread Simon Willnauer
System.out.println("Indexing " + f.getAbsolutePath()); Document doc = new Document(); doc.add(new Field("contents",loadContents (doc),Field.Store.NO,Field.Index.TOKENIZED)); doc.add(new Field("filename", f.getAbsolutePath(),Field.Stor

Re: deleteDocuments being ingnored

2006-10-09 Thread cfowler
My apologies, the IndexReader code I included was a commented out trial. Here is the active version. Sorry for the error: IndexReader ir = IndexReader.open(indexDir); System.out.println(">>>" + ir.numDocs()); int deleted = ir.deleteDocuments(new Ter

deleteDocuments being ingnored

2006-10-09 Thread cfowler
Hello, I'm brand new to this, so hopefully you can help me. I'm attempting to use the IndexReader object in Lucene v2 to delete and re-add documents. I very easily set up an index and my documents are added. Now I'm trying to update the same index by deleting the document before readdin

Re: TermQuery and PhraseQuery..problem with word with space

2006-10-09 Thread Ismail Siddiqui
I am using StandardAnalyzer while indexing the field. I am also creating a field called full_text in which I am adding all these individual fields as TOKENIZED. Here is the code: while(choiceIt.hasNext()){ PersonProfileAnswer pa=(PersonProfileAnswer)choiceIt.next(); if(p

Re: How to search with empty content

2006-10-09 Thread Scott
You can get all documents by using MatchAllDocsQuery. Kumar, Samala Santhosh (TPKM) wrote: I want to search without giving any input: when I leave the search text box blank, it should give me all the documents present in the index. Please give me some solution or pointers. regards Sa

Re: Performing a like query

2006-10-09 Thread Steven Rowe
Hi Rahil, Rahil wrote: > I was just wondering whether there is a > difference between the regular expression you sent me i.e. > (i) \s*(?:\b|(?<=\S)(?=\s)|(?<=\s)(?=\S))\s* > >and > (ii) \\b > > as they lead to the same output. For example, the string search "testing > a-new string=3/4

How to search with empty content

2006-10-09 Thread Kumar, Samala Santhosh (TPKM)
I want to search without giving any input: when I leave the search text box blank, it should give me all the documents present in the index. Please give me some solution or pointers. regards Santhosh

Re: highlight optimization

2006-10-09 Thread Erick Erickson
The fastest way to see if opening/closing your searcher is a problem would be to write a tiny little program that opened the index, fired off a few queries and timed each one. The queries can be canned, of course. I'm thinking this is, say, less than 20 lines (including imports). If you're familia
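A timing harness along these lines might look like the sketch below. It deliberately leaves out the Lucene calls: the search callback is a hypothetical stand-in for "run this query against an already opened searcher", and the class and method names (QueryTimingSketch, timeQueries) are invented for illustration. Timing the index-open step separately from the per-query step is what separates the two costs.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

final class QueryTimingSketch {
    /** Runs each canned query once; returns elapsed nanoseconds per query, in input order. */
    static Map<String, Long> timeQueries(List<String> queries, Consumer<String> search) {
        Map<String, Long> elapsed = new LinkedHashMap<>();
        for (String q : queries) {
            long start = System.nanoTime();
            search.accept(q);                       // the real search call goes here
            elapsed.put(q, System.nanoTime() - start);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Stand-in for a real search; replace the body with the application's own code.
        Consumer<String> fakeSearch = q -> q.toLowerCase();
        List<String> canned = Arrays.asList("lucene", "highlight", "best fragment");
        timeQueries(canned, fakeSearch).forEach(
            (q, nanos) -> System.out.println(q + ": " + (nanos / 1_000_000.0) + " ms"));
    }
}
```

If the first query is much slower than the rest, the cost is probably in opening or warming rather than in the queries themselves.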

threadsafe QueryParser?

2006-10-09 Thread Stanislav Jordanov
Method static public Query parse(String query, String field, Analyzer analyzer) in class QueryParser is deprecated in 1.9.1 and the suggestion is: /"Use an instance of QueryParser and the {@link #parse(String)} method instead."/ My question is: in the context of a multithreaded app, is

highlight optimization

2006-10-09 Thread Stelios Eliakis
Hi, I have a collection of 500 txt documents and I have implemented a web application (JSP) for searching these documents. In addition, the application shows the BestFragment of each result and highlights the query terms. My application is quite slow (about 2.5-3 seconds for each query) even if I run it

Re: Performing a like query

2006-10-09 Thread Rahil
Hi Steve Thanks for your response. I was just wondering whether there is a difference between the regular expression you sent me i.e. (i) \s*(?:\b|(?<=\S)(?=\s)|(?<=\s)(?=\S))\s* and (ii) \\b as they lead to the same output. For example, the string search "testing a-new string=3/4

Incremental updates / slow searches.

2006-10-09 Thread Rickard Bäckman
Hi, we are using a search system based on Lucene and have recently tried to add incremental updating of the index instead of building a new index every now and then. However, we now run into problems as our searches start to take a very long time to complete. Our index is about 8-9 GB and we

Re: lucene link database

2006-10-09 Thread mark harwood
>>if you search the archive for database you'll bet a bunch of threads This was a hybrid implementation I did which worked with HSQLDB and Derby: http://www.mail-archive.com/java-user@lucene.apache.org/msg02953.html Cheers Mark - Original Message From: Erick Erickson <[EMAIL PROTECTED

Re: TermQuery and PhraseQuery..problem with word with space

2006-10-09 Thread Doron Cohen
> I am trying to index a field which has more than one word with space e.g. > "My Word" > I am indexing it UN_TOKENIZED .. but when I use TermQuery to query "My Word" > it's not yielding any result.. Seems that it should work. A few things to check: - make sure you are indexing with UN_TOKENIZED. - c