Rickard Bäckman wrote:
> Hi,
>
> we are using a search system based on Lucene and have recently tried to add
> incremental updating of the index instead of building a new index every now
> and then. However, we now run into problems as our searches start to take
> a very long time to complete.
>
>
On 10/9/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
don't forget to optimize your index every now and then as well... deleting
a document just marks it as "deleted"; it still gets inspected by every
query during scoring, at least once, to see that it can skip it. Optimizing
is the only thing that truly removes the "deleted" documents.
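For reference, a minimal sketch of the delete-then-optimize cycle described
above, assuming the Lucene 1.9/2.0 API; the index path and the "id" term are
made-up placeholders:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class DeleteAndOptimize {
    public static void main(String[] args) throws Exception {
        String indexDir = "/path/to/index";   // hypothetical location

        // Deleting only marks documents as deleted; they still sit in the
        // segments and get skipped at search time.
        IndexReader reader = IndexReader.open(indexDir);
        reader.deleteDocuments(new Term("id", "42"));
        reader.close();

        // Optimizing merges the segments and physically drops the
        // documents that were marked as deleted.
        IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), false);
        writer.optimize();
        writer.close();
    }
}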
Doron:
Thanks for the suggestion, I'll certainly put it on my list, depending upon
what the PM decides. This app is genealogy research, and users *can* put
in their own wildcards...
This is why I love this list... lots of smart people giving me suggestions I
never would have thought of ...
Th
See http://www.gossamer-threads.com/lists/lucene/java-dev/33964?
search_string=Lazy%20Field%20Loading;#33964 for the discussion on
Java Dev from wayback if you want more background info.
To some extent, I still think Lazy Fields are in the early adopter
stage, since they haven't officially b
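To give a flavour of what the lazy-loading work looks like, here is a rough
sketch against the FieldSelector API that was on trunk around that time; the
field names and index path are invented, and the exact constants/signatures
may still shift, since as noted this is early-adopter territory:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.FieldSelector;
import org.apache.lucene.document.FieldSelectorResult;
import org.apache.lucene.index.IndexReader;

public class LazyLoadSketch {
    public static void main(String[] args) throws Exception {
        // Load "title" eagerly, load "body" lazily, skip everything else.
        FieldSelector selector = new FieldSelector() {
            public FieldSelectorResult accept(String fieldName) {
                if ("title".equals(fieldName)) return FieldSelectorResult.LOAD;
                if ("body".equals(fieldName)) return FieldSelectorResult.LAZY_LOAD;
                return FieldSelectorResult.NO_LOAD;
            }
        };

        IndexReader reader = IndexReader.open("/path/to/index");
        Document doc = reader.document(0, selector);
        // The lazy field's bytes are only pulled from the index (by
        // FieldsReader) when the value is actually asked for.
        String body = doc.getFieldable("body").stringValue();
        System.out.println(body);
        reader.close();
    }
}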
Hi,
I was thinking of something along those lines.
Last week, I was able to take time to understand the JavaCC syntax and
possibilities.
I have some cleaning up, testing and documentation to do, but basically, I
was able to expand the AND / OR / NOT patterns at r
: If you read the entire source as I did, it becomes clear ! :)
: The interesting code is in FieldsReader.
Not necessarily. There can be differences between how constants are
used and how they are supposed to be used (depending on whether or not the
code using them has any bugs in it)
: NO_LOAD
"Erick Erickson" <[EMAIL PROTECTED]> wrote on 09/10/2006 13:09:21:
> ... The kicker is that what we are indexing is
> OCR data, some of which is pretty trashy. So you wind up with "interesting"
> words in your index, things like rtyHrS. So the whole question of allowing
> very specific queries on d
I've already started that conversation with the PM, I'm just trying to get a
better idea of what's possible. I'll whimper tooth and nail to keep from
having to do a lot of work to add a feature to a product that nobody in
their right mind would ever use.
As far as the grammar, we don't actually
Erick,
On Monday 09 October 2006 21:20, Erick Erickson wrote:
OK, forget the stuff about "TooManyBooleanClauses". I finally figured out
that if I specify the surround to have the same semantics as a SpanRegex (
i.e., and(eri*, mal*)) it blows up with TooManyBooleanClauses. So that makes
more sense to me now.
Specifying 20w(eri*, mal*) is what I was using bef
OK, I'm using the surround code, and it seems to be working...with the
following questions (always, more questions)...
I'm getting an exception sometimes of TooManyBasicQueries. I can control
this by initializing BasicQueryFactory with a larger number. Do you have any
cautions about upping this
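For anyone following along, this is roughly how the surround code gets driven,
as far as I can tell from the contrib sources; the package names, the field
name and the 16 * 1024 limit below are from memory / made up, so treat it as a
sketch rather than gospel:

import org.apache.lucene.queryParser.surround.parser.QueryParser;
import org.apache.lucene.queryParser.surround.query.BasicQueryFactory;
import org.apache.lucene.queryParser.surround.query.SrndQuery;
import org.apache.lucene.search.Query;

public class SurroundSketch {
    public static void main(String[] args) throws Exception {
        // Parse the surround expression...
        SrndQuery srnd = QueryParser.parse("20w(eri*, mal*)");

        // ...then rewrite it to a plain Lucene Query. The limit passed to
        // BasicQueryFactory is what triggers TooManyBasicQueries when the
        // wildcards expand to too many terms.
        BasicQueryFactory factory = new BasicQueryFactory(16 * 1024);
        Query query = srnd.makeLuceneQueryField("contents", factory);
        System.out.println(query);
    }
}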
Hi Michael,
I think there are a number of good resources on this:
1. http://lucene.apache.org/java/scoring.html covers the basics of
searching. The bottom has some pseudo code as well.
2. Lucene In Action
3. Search this list and other places for information on the Vector
Space Model.
On 10/9/06, Stanislav Jordanov <[EMAIL PROTECTED]> wrote:
Method
static public Query parse(String query, String field, Analyzer analyzer)
in class QueryParser is deprecated in 1.9.1 and the suggestion is: "Use
an instance of QueryParser and the {@link #parse(String)} method
instead."
The biggest thing would be to limit how often you open a new
IndexSearcher, and when you do, warm up the new searcher in the
background while you continue serving searches with the existing
searcher. This is the strategy that Solr uses.
There is also the issue of if you are analyzing/merging doc
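A rough sketch of that strategy, not Solr's actual code; the warm-up query,
field name and index path are placeholders:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class SearcherManagerSketch {
    private volatile IndexSearcher current;
    private final String indexDir;

    public SearcherManagerSketch(String indexDir) throws Exception {
        this.indexDir = indexDir;
        this.current = new IndexSearcher(indexDir);
    }

    // All incoming searches share the same warmed searcher.
    public Hits search(Query q) throws Exception {
        return current.search(q);
    }

    // Called after index updates: open and warm a new searcher in the
    // background, then swap it in; searches keep using the old one meanwhile.
    public void reopen() throws Exception {
        IndexSearcher fresh = new IndexSearcher(indexDir);
        // "Warming": run a few representative queries so caches and file
        // buffers are populated before real traffic hits the new searcher.
        QueryParser qp = new QueryParser("contents", new StandardAnalyzer());
        fresh.search(qp.parse("warmup"));
        IndexSearcher old = current;
        current = fresh;
        old.close();   // in production, wait for in-flight searches to finish first
    }
}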
In fav_stores I see "Banana Republic" and "Ann Taylor" there, and I am
searching it with the capitals.
On 10/9/06, Erick Erickson <[EMAIL PROTECTED]> wrote:
OK, when you look in the "fav_stores" field in Luke, what do you see?
And, are you searching on "Banana Republic" with the capitals? I
I would guess that one of your assumptions is wrong...
The assumptions to check are:
At indexing:
- lpf.getLuceneFieldName() == "fav_stores"
- pa.getPersonProfileChoice().getChoice() == "Banana Republic"
At search:
- the query is created like this:
new TermQuery(new Term("fav_stores", "Banana Republic"))
OK, when you look in the "fav_stores" field in Luke, what do you see?
And, are you searching on "Banana Republic" with the capitals? If so, and
your index has the letters in lower case, that's your problem.
Erick
On 10/9/06, Ismail Siddiqui <[EMAIL PROTECTED]> wrote:
I am using StandardAnalyz
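To make the case issue concrete, a small sketch (index path made up, field
name from the example above): StandardAnalyzer lower-cases and splits "Banana
Republic" at index time, so an exact TermQuery has to use the indexed form, or
the query should be built through the same analyzer:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class CaseCheck {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher("/path/to/index");

        // Likely 0 hits: the index holds the terms "banana" and "republic",
        // not the single term "Banana Republic".
        Query exact = new TermQuery(new Term("fav_stores", "Banana Republic"));
        System.out.println(searcher.search(exact).length());

        // Goes through the same analyzer as indexing, so it should match.
        QueryParser qp = new QueryParser("fav_stores", new StandardAnalyzer());
        Query analyzed = qp.parse("\"Banana Republic\"");
        System.out.println(searcher.search(analyzed).length());

        searcher.close();
    }
}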
System.out.println("Indexing " + f.getAbsolutePath());
Document doc = new Document();
doc.add(new Field("contents", loadContents(doc), Field.Store.NO, Field.Index.TOKENIZED));
doc.add(new Field("filename", f.getAbsolutePath(), Field.Stor
My apologies, the IndexReader code I included was a commented out trial.
Here is the active version. Sorry for the error:
IndexReader ir = IndexReader.open(indexDir);
System.out.println(">>>" + ir.numDocs());
int deleted = ir.deleteDocuments(new Ter
Hello,
I'm brand new to this, so hopefully you can help me. I'm
attempting to use the IndexReader object in Lucene v2 to delete and readd
documents. I very easily set up an index and my documents are added. Now
I'm trying to update the same index by deleting the document before
readdin
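The usual pattern in Lucene 1.9/2.0 is to delete through an IndexReader keyed
on a unique id field and then add through an IndexWriter; a minimal sketch
(paths, field names and values below are invented):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class UpdateDocument {
    public static void main(String[] args) throws Exception {
        String indexDir = "/path/to/index";
        String id = "doc-123";   // unique key stored in an "id" field

        // 1. Delete the old copy through an IndexReader; closing the
        //    reader is what flushes the deletion to disk.
        IndexReader reader = IndexReader.open(indexDir);
        reader.deleteDocuments(new Term("id", id));
        reader.close();

        // 2. Re-add the new version through an IndexWriter (create=false
        //    so the existing index is appended to, not overwritten).
        IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), false);
        Document doc = new Document();
        doc.add(new Field("id", id, Field.Store.YES, Field.Index.UN_TOKENIZED));
        doc.add(new Field("contents", "new text", Field.Store.NO, Field.Index.TOKENIZED));
        writer.addDocument(doc);
        writer.close();
    }
}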
I am using StandardAnalyzer while indexing the field..
I am also creating a field called full_text in which I am adding all
these individual fields as TOKENIZED.
here is the code
while(choiceIt.hasNext()){
PersonProfileAnswer pa=(PersonProfileAnswer)choiceIt.next();
if(p
You can get all documents by using MatchAllDocsQuery.
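For example, a minimal sketch (the index path is a placeholder):

import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;

public class AllDocs {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher("/path/to/index");
        // MatchAllDocsQuery matches every (non-deleted) document in the index.
        Hits hits = searcher.search(new MatchAllDocsQuery());
        System.out.println(hits.length() + " documents in the index");
        searcher.close();
    }
}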
Kumar, Samala Santhosh (TPKM) wrote:
I want to search without giving any input: when I leave the search text box
blank, it should give me all the documents present in the
index. Please give me some solutions or pointers.
regards
Sa
Hi Rahil,
Rahil wrote:
> I was just wondering whether there is a
> difference between the regular expression you sent me i.e.
> (i) \s*(?:\b|(?<=\S)(?=\s)|(?<=\s)(?=\S))\s*
>
>and
> (ii) \\b
>
> as they lead to the same output. For example, the string search "testing
> a-new string=3/4
I want to search without giving any input: when I leave the search text box
blank, it should give me all the documents present in the
index. Please give me some solutions or pointers.
regards
Santhosh
The fastest way to see if opening/closing your searcher is a problem would
be to write a tiny little program that opened the index, fired off a few
queries and timed each one. The queries can be canned, of course. I'm
thinking this is, say, less than 20 lines (including imports). If you're
familia
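Something along these lines, assuming a "contents" field and a canned query
list; the index path is a placeholder:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class TimeQueries {
    public static void main(String[] args) throws Exception {
        // Open the searcher ONCE, outside the loop.
        IndexSearcher searcher = new IndexSearcher("/path/to/index");
        QueryParser qp = new QueryParser("contents", new StandardAnalyzer());
        String[] canned = { "foo", "bar AND baz", "\"exact phrase\"" };

        for (int i = 0; i < canned.length; i++) {
            Query q = qp.parse(canned[i]);
            long start = System.currentTimeMillis();
            int hits = searcher.search(q).length();
            System.out.println(canned[i] + ": " + hits + " hits in "
                    + (System.currentTimeMillis() - start) + " ms");
        }
        searcher.close();
    }
}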
Method
static public Query parse(String query, String field, Analyzer analyzer)
in class QueryParser is deprecated in 1.9.1 and the suggestion is: "Use
an instance of QueryParser and the {@link #parse(String)} method instead."
My question is: in the context of a multi-threaded app, is
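Regarding the multi-threading part of the question: the generated QueryParser
is not safe to share across threads that parse concurrently, so a common
approach is simply to create a new instance per parse (it is cheap). A sketch,
with a made-up field name:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class ParsePerRequest {
    // Creating a fresh QueryParser for every request avoids sharing
    // parser state between threads.
    public static Query parseUserQuery(String userInput) throws Exception {
        QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
        return parser.parse(userInput);
    }
}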
Hi,
I have a collection of 500 txt documents and I have implemented a web
application (JSP) for searching these documents.
In addition, the application shows the BestFragment of each result and
highlights the query terms.
My application is quite slow (about 2.5-3 seconds for each query) even if I
run it
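The highlighting step can be expensive because each displayed hit's stored
text is re-analyzed, so here is one common shape of contrib Highlighter usage
as a sketch; the "contents" field name is an assumption, and it is worth first
ruling out reopening the IndexSearcher on every request, as mentioned
elsewhere in this thread:

import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;

public class SnippetHelper {
    // Produce the best fragment of one hit's stored text for display.
    public static String bestFragment(Query query, String storedText) throws Exception {
        Highlighter highlighter = new Highlighter(new QueryScorer(query));
        TokenStream tokens = new StandardAnalyzer()
                .tokenStream("contents", new StringReader(storedText));
        return highlighter.getBestFragment(tokens, storedText);
    }
}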
Hi Steve
Thanks for your response. I was just wondering whether there is a
difference between the regular expression you sent me i.e.
(i) \s*(?:\b|(?<=\S)(?=\s)|(?<=\s)(?=\S))\s*
and
(ii) \\b
as they lead to the same output. For example, the string search "testing
a-new string=3/4
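A tiny sketch for comparing the two patterns on arbitrary inputs; they can
agree on some strings but differ where whitespace sits next to punctuation,
since the longer pattern also splits at those transitions and swallows the
surrounding whitespace, so it is worth checking against real data:

import java.util.Arrays;
import java.util.regex.Pattern;

public class BoundaryCompare {
    public static void main(String[] args) {
        String input = "testing a-new string=3/4";
        Pattern full = Pattern.compile("\\s*(?:\\b|(?<=\\S)(?=\\s)|(?<=\\s)(?=\\S))\\s*");
        Pattern simple = Pattern.compile("\\b");
        // Print both tokenizations side by side for comparison.
        System.out.println(Arrays.asList(full.split(input)));
        System.out.println(Arrays.asList(simple.split(input)));
    }
}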
Hi,
we are using a search system based on Lucene and have recently tried to add
incremental updating of the index instead of building a new index every now
and then. However, we now run into problems as our searches start to take
a very long time to complete.
Our index is about 8-9GB large and we
>>if you search the archive for database you'll get a bunch of threads
This was a hybrid implementation I did which worked with HSQLDB and Derby:
http://www.mail-archive.com/java-user@lucene.apache.org/msg02953.html
Cheers
Mark
> I am trying to index a field which has more than one word with a space, e.g.
> "My Word"
> I am indexing it UN_TOKENIZED, but when I use TermQuery to query "My Word"
> it's not yielding any result.
Seems that it should work.
A few things to check:
- make sure you are indexing with UN_TOKENIZED.
- c
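The checklist above is cut off, but the basic round trip it is checking looks
like this (scratch index path and the "title" field name are made up):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

public class UntokenizedRoundTrip {
    public static void main(String[] args) throws Exception {
        String dir = "/tmp/untokenized-test";   // hypothetical scratch index

        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
        Document doc = new Document();
        // UN_TOKENIZED: the value is indexed as the single exact term "My Word".
        doc.add(new Field("title", "My Word", Field.Store.YES, Field.Index.UN_TOKENIZED));
        writer.addDocument(doc);
        writer.close();

        IndexSearcher searcher = new IndexSearcher(dir);
        // The TermQuery must match the indexed term exactly, including case and the space.
        int hits = searcher.search(new TermQuery(new Term("title", "My Word"))).length();
        System.out.println(hits);   // expect 1
        searcher.close();
    }
}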