two copies of indexes vs. master/slave indexes

2008-05-16 Thread jian chen
I have seen two different designs for incremental index updates. 1) Have two copies of the index, A and B. Incremental updates happen on index A while index B is being used for search. Then, hot swap the two indexes. Bring index B up to date and perform incremental updates thereafter. In this s
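The two-copy design described above can be sketched in plain Java. This is only an illustration of the bookkeeping, not Lucene code: the class `TwoCopyIndex`, its methods, and the use of a `List<String>` as a stand-in for an index are all hypothetical. Every method is synchronized to keep the sketch simple and correct; a real implementation would instead need to let in-flight searches on the old copy finish before updating it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the "two copies, hot swap" design.
// A List<String> plays the role of an index; this is not Lucene API.
final class TwoCopyIndex {
    private List<String> searchCopy = new ArrayList<>(); // copy currently used for search
    private List<String> updateCopy = new ArrayList<>(); // copy receiving incremental updates
    private final List<String> pending = new ArrayList<>(); // updates the search copy has not seen yet

    // Incremental updates go only to the update copy.
    synchronized void add(String doc) {
        updateCopy.add(doc);
        pending.add(doc);
    }

    // Hot swap: the updated copy starts serving searches, and the
    // previously searched copy is brought up to date before it
    // becomes the target for further updates.
    synchronized void swap() {
        List<String> old = searchCopy;
        searchCopy = updateCopy;
        old.addAll(pending);
        pending.clear();
        updateCopy = old;
    }

    // Searches only ever look at the search copy.
    synchronized boolean isSearchable(String doc) {
        return searchCopy.contains(doc);
    }

    public static void main(String[] args) {
        TwoCopyIndex index = new TwoCopyIndex();
        index.add("doc1");
        System.out.println(index.isSearchable("doc1")); // not visible until swap()
        index.swap();
        System.out.println(index.isSearchable("doc1")); // visible after swap()
    }
}
```

The `pending` list is what keeps the two copies from drifting apart: each update is applied once directly and once on the next swap.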

Re: All results

2008-05-16 Thread Otis Gospodnetic
I can't tell where you have a bug, but a couple of things look bad here: 1) java + HTML 2) opening/closing of the searcher for each request 3) empty if block in the else block. who knows, maybe you are hitting that. I think you simply need to do a little bit more debugging on your end. Write a

Re: simultaneous read and writes to the RAMDirectory

2008-05-16 Thread Otis Gospodnetic
I don't think there are any problems with doing both operations at the same time. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: jian chen <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Sent: Friday, May 16, 2008 7:12:34 PM > Subj

simultaneous read and writes to the RAMDirectory

2008-05-16 Thread jian chen
Lucene gurus, I have a question regarding RAMDirectory usage. Can the IndexWriter keep adding documents to the index while the IndexReader is open on this RAMDirectory and searches are going on? I know that in the FSDirectory case, the IndexWriter can add documents to the index while the IndexReader rea

RE: setPositionIncrement questions

2008-05-16 Thread Chris Hostetter
: I ended up hacking StandardTokenizer::next() to check for $^$^$, and if it : is there then set the current Token PositionIncrement to 500 and resume the from what i remember of your use case, it probably would have been a lot easier to just add each paragraph as a separate field instance (and

Re: theoretical maximum score

2008-05-16 Thread Chris Hostetter
: Is it possible to compute a theoretical maximum score for a given query if : constraints are placed on 'tf' and 'lengthNorm'? If so, scores could be : compared to a 'perfect score' (a feature request from our customers) without thinking about it too hard, you'd also need to constrain: * field

Re: Possible Bug when Querying?

2008-05-16 Thread Matthew Hall
Very very interesting. I went ahead and turned on the AllowLeadingWildcard toggle and everything works just as expected now, which is odd in a way. I'm still not certain why a search for '\*ache*' would be considered to have a leading wildcard. I'm searching for the literal * character her

Re: Version 2.3 Does Not Index/Digest All Document Tokens

2008-05-16 Thread Grant Ingersoll
Can you reduce this down to a unit test? Thanks, Grant On May 16, 2008, at 11:37 AM, Dan Rugg wrote: After upgrading to version 2.3.x from 2.2.0, we started experiencing issues with our index searches. Some searches produced false positives, while others produce no hits for terms known to b

Version 2.3 Does Not Index/Digest All Document Tokens

2008-05-16 Thread Dan Rugg
After upgrading to version 2.3.x from 2.2.0, we started experiencing issues with our index searches. Some searches produced false positives, while others produced no hits for terms known to be in specific documents that were digested. After setting up tests that created indexes containing single

Re: Boosting Search

2008-05-16 Thread Vinicius Carvalho
Thanks a lot. That was it: new times: 4ms -> MySQL, 2-4ms -> Lucene. Now I tried a few times, with a pause and an open index, so it would simulate the correct behaviour of my index during server usage. Regards On Fri, May 16, 2008 at 2:25 PM, Karl Wettin <[EMAIL PROTECTED]> wrote: > > 16 May

Re: Boosting Search

2008-05-16 Thread Chris Lu
You may need some more data to really compare the performance. From previous experience, I would expect MySQL's search time to increase as data grows, while Lucene's time stays almost unchanged. -- Chris Lu - Instant Scalable Full-Text Search On Any Database/Application

Re: All results

2008-05-16 Thread Hasan Diwan
On 15/05/2008, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > What does your code look like? If you are using Hits, what does > hits.length() give you? My code is below: Hits hits = searcher.search(parsedQuery); if (hits.length() == 0) {

Re: Boosting Search

2008-05-16 Thread Karl Wettin
On 16 May 2008 at 19:20, Vinicius Carvalho wrote: > I know it's a dumb test There is a lot of initial latency. You want to "warm" the index. > but what can be done in order to speed things up? http://wiki.apache.org/lucene-java/BasicsOfPerformance karl

Boosting Search

2008-05-16 Thread Vinicius Carvalho
Hello there! We are starting with Lucene, and in order to justify its use, one of the benefits we need to show is performance. I do know that Lucene (like other full-text search engines) provides many more benefits than using a DBMS. OK, so here's a simple test: I have a table with 17,700 rows. It is stored on MySQL

Re: CLucene and Lucene

2008-05-16 Thread Otis Gospodnetic
Kevin, CLucene is behind Lucene Java in terms of functionality and index format, thus you'd have to be careful about which version of CLucene and Lucene Java you use. A better place to ask is the CLucene mailing list, as Ben & Co. know exactly which Lucene Java version they are compatible with

RE: CLucene and Lucene

2008-05-16 Thread Kevin Daly (kedaly)
From: Kevin Daly (kedaly) Sent: Friday, May 16, 2008 1:34 PM To: 'java-user@lucene.apache.org' Subject: CLucene and Lucene I have a question concerning interop between CLucene and Lucene. Is it possible to have a C++ application using CLucene acting as an

CLucene and Lucene

2008-05-16 Thread Kevin Daly (kedaly)
I have a question concerning interop between CLucene and Lucene. Is it possible to have a C++ application using CLucene acting as an IndexWriter, and then have a web application using Lucene to query the index? Could there be issues with locking under load, for example? I have done some basic

Re: Document clustering with Lucene

2008-05-16 Thread Grant Ingersoll
Do you want search result clustering or document clustering? My understanding of Carrot2 is it isn't designed for the latter. The difference being it is designed to work off of shorter snippets of text, as opposed to the whole document. FWIW, you _might_ find some help over on the Mahout

Re: search problem - not finding field values ending in "X"

2008-05-16 Thread Ulf Dittmer
D'oh! Of course - I'm using StandardAnalyzer. Changing to a PerFieldAnalyzerWrapper with a KeywordAnalyzer for that field fixes the issue. Thanks so much for the fast response. Ulf --- Ian Lea <[EMAIL PROTECTED]> wrote: > Hi > > > I bet you are using an analyzer that is downcasing > isbn:00714

Re: search problem - not finding field values ending in "X"

2008-05-16 Thread Ian Lea
Hi I bet you are using an analyzer that is downcasing isbn:007149216X to isbn:007149216x. I've been there! Options include creating the query programmatically, using PerFieldAnalyzerWrapper, downcasing everything yourself in advance. Or convert to ISBN-13. -- Ian. On Fri, May 16, 2008 at 10
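The "convert to ISBN-13" option mentioned above sidesteps the case problem entirely, because an ISBN-13 contains only digits. A minimal sketch of the standard conversion follows (prefix 978, drop the old check character, recompute the check digit with alternating weights 1 and 3). The class and method names are hypothetical, and the sketch does not verify the ISBN-10 check character itself.

```java
// Hypothetical helper; not part of Lucene.
final class IsbnConverter {
    private IsbnConverter() {}

    // Converts an unhyphenated ISBN-10 (nine digits plus a digit or X/x)
    // to its 13-digit form. The ISBN-10 check character is discarded
    // without being validated.
    static String toIsbn13(String isbn10) {
        if (isbn10 == null || !isbn10.matches("\\d{9}[\\dXx]")) {
            throw new IllegalArgumentException("not an ISBN-10: " + isbn10);
        }
        // Prepend the 978 prefix to the first nine digits.
        String body = "978" + isbn10.substring(0, 9);
        // ISBN-13 check digit: weights alternate 1, 3, 1, 3, ...
        int sum = 0;
        for (int i = 0; i < body.length(); i++) {
            int digit = body.charAt(i) - '0';
            sum += (i % 2 == 0) ? digit : 3 * digit;
        }
        int check = (10 - sum % 10) % 10;
        return body + check;
    }

    public static void main(String[] args) {
        // The ISBN from the thread; upper and lower case X give the same result.
        System.out.println(toIsbn13("007149216X")); // prints 9780071492164
        System.out.println(toIsbn13("007149216x")); // prints 9780071492164
    }
}
```

If the field is indexed and queried in this form, it no longer matters whether the analyzer lowercases its input.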

search problem - not finding field values ending in "X"

2008-05-16 Thread Ulf Dittmer
Hello- I'm experiencing a weird issue searching an index. The index has information about books, and one of the fields is the ISBN number. It is stored in the index in untokenized form to enable searches by ISBN. So a query like "isbn:0071490833" would return the Document for that book. But it doe