RE: Analyzers aren't reusable?? (lucene 4.2.1)

2013-12-05 Thread Scott Smith
phi.de eMail: u...@thetaphi.de > -----Original Message----- > From: Scott Smith [mailto:ssm...@mainstreamdata.com] > Sent: Thursday, December 05, 2013 9:36 PM > To: java-user@lucene.apache.org > Subject: Analyzers aren't reusable?? (lucene 4.2.1) > > I wrote the following

RE: Analyzers aren't reusable?? (lucene 4.2.1)

2013-12-05 Thread Scott Smith
D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -----Original Message----- > From: Scott Smith [mailto:ssm...@mainstreamdata.com] > Sent: Thursday, December 05, 2013 9:36 PM > To: java-user@lucene.apache.org > Subject: Analyzers aren't reusable?? (lucene 4.2.1) > > I

Analyzers aren't reusable?? (lucene 4.2.1)

2013-12-05 Thread Scott Smith
I wrote the following to demonstrate what for me was surprising behavior (this is Lucene 4.2.1). If you want to run this yourself, you should be able to since there are no references to anything other than standard lucene and java libraries. Basically, this is an analyzer that makes everything

RE: Highlighting phrases

2013-11-27 Thread Scott Smith
Never mind. I figured it out. Thanks anyway. -----Original Message----- From: Scott Smith [mailto:ssm...@mainstreamdata.com] Sent: Wednesday, November 27, 2013 9:27 AM To: java-user@lucene.apache.org Subject: Highlighting phrases I'm doing some highlighting with the following code fra

Highlighting phrases

2013-11-27 Thread Scott Smith
I'm doing some highlighting with the following code fragment: formatter = new SimpleHTMLFormatter(, ); Scorer score = new QueryScorer(myQuery); ht = new Highlighter(formatter, score); ht.setTextFragmenter(new NullFragmenter());

Phrase highlight

2013-11-26 Thread Scott Smith
I'm doing some highlighting with the following code fragment: formatter = new SimpleHTMLFormatter(, ); Scorer score = new QueryScorer(myQuery); ht = new Highlighter(formatter, score); ht.setTextFragmenter(new NullFragmenter());

RE: Can you escape characters you don't want the analyzer to modify

2013-09-18 Thread Scott Smith
ounds like you either need to have a custom analyzer or a field-aware analyzer. -- Jack Krupansky -----Original Message----- From: Scott Smith Sent: Tuesday, September 17, 2013 4:26 PM To: java-user@lucene.apache.org Subject: Can you escape characters you don't want the analyzer to modify S

Can you escape characters you don't want the analyzer to modify

2013-09-17 Thread Scott Smith
Suppose I have a string like "ab@cd%d". My analyzer will turn this into "ab cd d". Can I pass it "ab\@cd\%d" and force it to treat it as a single word? I want to use the Query parser, but I don't want it messing with fields that have not been analyzed.

Lucene Query Syntax with analyzed and unanalyzed text

2013-09-16 Thread Scott Smith
I want to be sure I understand this correctly. Suppose I have a search that I'm going to run through the query parser that looks like: body:"some phrase" AND keyword:"my-keyword" clearly "body" and "keyword" are field names. However, the additional information is that the "body" field is anal

RE: classic.QueryParser - bug or new behavior?

2013-05-19 Thread Scott Smith
g the whole term in quotes. Otherwise the slash (even embedded in the middle of a term!) indicates the start of a regex query term. -- Jack Krupansky -----Original Message----- From: Scott Smith Sent: Sunday, May 19, 2013 2:50 PM To: java-user@lucene.apache.org Subject: classic.QueryParser - bug o

classic.QueryParser - bug or new behavior?

2013-05-19 Thread Scott Smith
I just upgraded from lucene 4.1 to 4.2.1. We believe we are seeing some different behavior. I'm using org.apache.lucene.queryparser.classic.QueryParser. If I pass the string "20110920/EXPIRED" (w/o quotes) to the parser, I get: org.apache.lucene.queryparser.classic.ParseException: Cannot pars
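Jack Krupansky's reply in this thread explains that the 4.x classic QueryParser reads a slash, even one in the middle of a term, as the start of a regular-expression term, and suggests putting the whole term in quotes. A minimal sketch of that option, plus escaping the slash with a backslash, as plain string helpers (the class and method names are hypothetical); in real code the classic parser's static `QueryParser.escape(String)` is the more complete way to escape its special characters.

```java
/** Hypothetical helpers for protecting a raw term from classic QueryParser syntax. */
final class TermProtect {
    private TermProtect() {}

    /** Option 1: escape the slash, which the 4.x classic parser treats as the start of a regex term. */
    static String escapeSlash(String term) {
        // Double existing backslashes first so they are not misread as escapes.
        return term.replace("\\", "\\\\").replace("/", "\\/");
    }

    /** Option 2: wrap the whole term in double quotes, escaping embedded backslashes and quotes. */
    static String quote(String term) {
        String body = term.replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + body + "\"";
    }

    public static void main(String[] args) {
        String raw = "20110920/EXPIRED";
        System.out.println(escapeSlash(raw)); // 20110920\/EXPIRED
        System.out.println(quote(raw));       // "20110920/EXPIRED"
        // Either string would then be passed to the classic parser, e.g. parser.parse(quote(raw));
        // (parser construction omitted; it depends on the Version, field and analyzer in use)
    }
}
```

Note that the quoted form becomes a phrase query if the field's analyzer splits `20110920/EXPIRED` into separate tokens, which may or may not be what that field needs.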

RE: Lucene slow performance -- still broke

2013-03-20 Thread Scott Smith
hy on earth do you set: lbsm.setMaxMergeDocs(10); if you have 10 docs in a segment you don't want to merge anymore? I don't think you should set this at all. simon On Wed, Mar 20, 2013 at 10:48 PM, Scott Smith wrote: > First, I decided I wasn't comfortable doing closes on

RE: Lucene slow performance -- still broke

2013-03-20 Thread Scott Smith
tRAMBufferSizeMB(50.0); Any help in figuring out what is causing this problem would be appreciated. I do now have an offline system that I can play with so I can do some intrusive things if need be. Scott -----Original Message----- From: Scott Smith [mailto:ssm...@mainstreamdata.com] Sent:

RE: Lucene slow performance

2013-03-16 Thread Scott Smith
Subject: RE: Lucene slow performance Please forceMerge only once, not every time (only to clean up your index)! If you are doing a reindex already, just fix your close logic as discussed before. Scott Smith wrote: >Unfortunately, this is a production system which I can't touch (th

RE: Lucene slow performance

2013-03-15 Thread Scott Smith
nauer [mailto:simon.willna...@gmail.com] Sent: Friday, March 15, 2013 5:08 PM To: java-user@lucene.apache.org Subject: Re: Lucene slow performance On Sat, Mar 16, 2013 at 12:02 AM, Scott Smith wrote: > " Do you always close IndexWriter after adding few documents and when > closing, d

RE: Lucene slow performance

2013-03-15 Thread Scott Smith
March 16, 2013 12:08 AM > To: java-user@lucene.apache.org > Subject: Re: Lucene slow performance > > On Sat, Mar 16, 2013 at 12:02 AM, Scott Smith > wrote: > > " Do you always close IndexWriter after adding few documents and > > when > closing, disable "

RE: Lucene slow performance

2013-03-15 Thread Scott Smith
l the time with cancelling all merges)? Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -----Original Message----- > From: Scott Smith [mailto:ssm...@mainstreamdata.com] > Sent: Friday, March 15, 2013 11:15 PM > To: java-

RE: Lucene slow performance

2013-03-15 Thread Scott Smith
a custom merge policy or something like this, any special IndexWriter settings? On Fri, Mar 15, 2013 at 11:15 PM, Scott Smith wrote: > We have a system that is using lucene and the searches are very slow. The > number of documents is fairly small (less than 30,000) and each document is

RE: Lucene slow performance

2013-03-15 Thread Scott Smith
n has changed since 1.4, but does it not merge all of the various files into a few files? -----Original Message----- From: Scott Smith [mailto:ssm...@mainstreamdata.com] Sent: Friday, March 15, 2013 4:15 PM To: java-user@lucene.apache.org Subject: Lucene slow performance We have a system th

Lucene slow performance

2013-03-15 Thread Scott Smith
We have a system that is using lucene and the searches are very slow. The number of documents is fairly small (less than 30,000) and each document is typically only 2 to 10 kilo-characters. Yet, searches are taking 15-16 seconds. One of the things I noticed was that the index directory has sev

RE: Which stemmer?

2012-11-15 Thread Scott Smith
Thanks for the suggestions. I think Erick is correct as well. I'll let the customer decide. Here's an updated list. FYI, the minStem was the English Minimal Stemmer; I changed the label. Interesting to see where the minimal stemmer and Porter agree (and KStemmer doesn't). You may also find t

RE: Which stemmer?

2012-11-14 Thread Scott Smith
d how some common words are stemmed. -- Jack Krupansky -----Original Message----- From: Scott Smith Sent: Wednesday, November 14, 2012 10:55 AM To: java-user@lucene.apache.org Subject: Which stemmer? Does anyone have any experience with the stemmers? I know that Porter is what "everyone"

RE: CJKWidthFilter vs ICUFoldingFilter

2012-11-14 Thread Scott Smith
Thanks -----Original Message----- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Wednesday, November 14, 2012 12:17 PM To: java-user@lucene.apache.org Subject: Re: CJKWidthFilter vs ICUFoldingFilter On Wed, Nov 14, 2012 at 9:47 AM, Scott Smith wrote: > Reading the documentation for these

Which stemmer?

2012-11-14 Thread Scott Smith
Does anyone have any experience with the stemmers? I know that Porter is what "everyone" uses. Am I better off with KStemFilter (better performance) or ?? Does anyone understand the differences between the various stemmers and how to choose one over another?

CJKWidthFilter vs ICUFoldingFilter

2012-11-14 Thread Scott Smith
Reading the documentation for these two filters seems to imply that CJKWidthFilter is a subset of ICUFoldingFilter. Is that true? I'm basically using the CJKAnalyzer (from Lucene 4.0) but adding ICUFoldingFilter because I need umlauts and accent characters removed from any German, French, etc.
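For experimenting with what "folding" does to a given string, the JDK's `java.text.Normalizer` gives a rough stand-in: NFKD decomposition turns full-width Latin letters into their ASCII forms and splits accented letters into a base letter plus combining marks, which can then be dropped. This is an approximation only (the class name is hypothetical, and it assumes Java 8+ and lower-casing with `Locale.ROOT`); it is not what `CJKWidthFilter` or `ICUFoldingFilter` do internally, and German umlauts become plain vowels (u-umlaut to `u`), not `ue`.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.regex.Pattern;

/** Rough width + accent folding with the JDK only (hypothetical helper, not a Lucene filter). */
final class WidthAccentFold {
    // Strip only the Combining Diacritical Marks block (U+0300..U+036F), so Japanese
    // voicing marks (U+3099, U+309A) survive and can be recomposed below.
    private static final Pattern DIACRITICS = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    private WidthAccentFold() {}

    static String fold(String s) {
        // NFKD: full-width Latin -> ASCII, u-umlaut -> u + U+0308,
        // half-width katakana -> standard katakana (+ separate voicing mark).
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFKD);
        String stripped = DIACRITICS.matcher(decomposed).replaceAll("");
        // NFC recombines what is left (e.g. katakana + voicing mark), then lower-case for matching.
        return Normalizer.normalize(stripped, Normalizer.Form.NFC).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Full-width 'M' followed by "u-umlaut ller".
        System.out.println(fold("\uFF2D\u00FCller")); // muller
    }
}
```

Stripping only that block is a deliberate choice: removing every `\p{M}` character would also strip the Japanese voicing mark and turn a voiced kana such as ガ into カ.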

RE: Near Real Time for multiple applications

2012-11-07 Thread Scott Smith
ccandless.com] Sent: Tuesday, November 06, 2012 5:32 AM To: java-user@lucene.apache.org Subject: Re: Near Real Time for multiple applications On Mon, Nov 5, 2012 at 6:33 PM, Scott Smith wrote: > I've been reading about NRT thinking it might be good to integrate it into my > code. However,

RE: Highlighting html pages

2012-11-05 Thread Scott Smith
tags being properly nested. Cheers Scott -----Original Message----- From: Scott Smith [mailto:ssm...@mainstreamdata.com] Sent: Thursday, November 01, 2012 7:16 PM To: Michael Sokolov; java-user@lucene.apache.org Subject: RE: Highlighting html pages I was trying to play with this. Am I correct in

Near Real Time for multiple applications

2012-11-05 Thread Scott Smith
I've been reading about NRT thinking it might be good to integrate it into my code. However, I have a question. Suppose that the index writer and the index reader run in totally different JVMs (i.e., they are different applications and only communicate via the disk). Am I correct in thinking

RE: Highlighting html pages

2012-11-01 Thread Scott Smith
actory.com] Sent: Tuesday, October 23, 2012 9:04 PM To: java-user@lucene.apache.org Cc: Scott Smith Subject: Re: Highlighting html pages If you use HTMLStripCharFilter, it extracts the text only, leaving tags out, and remembering the word positions so that highlighting works properly. Should do ex

4.0 tokenStream or SimpleAnalyzer bug?

2012-11-01 Thread Scott Smith
I was doing some tokenizer/filter analysis attempting to fix a bug I have in highlighting under 4.0. I was running the displayTokensWithFullDetails code from LIA2. I would get an exception with a bad index value of -1. I fixed the problem by doing a reset() immediately after creating my Token

Highlighting and InvalidTokenOffsetsException in Lucene 4.0

2012-10-31 Thread Scott Smith
I'm migrating code from Lucene 3.5 to 4.0. I have the following code which is supposed to highlight text. I get the exception InvalidTokenOffsetsException. I have no idea what that means. I am using a custom analyzer which seems to work for searching/indexing, so I assume it will work here (

RE: Norms and Term Vectors in Lucene 4.0

2012-10-30 Thread Scott Smith
-user@lucene.apache.org Subject: Re: Norms and Term Vectors in Lucene 4.0 hey scott, On Mon, Oct 29, 2012 at 11:56 PM, Scott Smith wrote: > Converting some code to lucene 4.0, it appears that we can no longer set > whether we want to store norms or termvectors using the "sug

Norms and Term Vectors in Lucene 4.0

2012-10-29 Thread Scott Smith
Converting some code to lucene 4.0, it appears that we can no longer set whether we want to store norms or termvectors using the "sugared" Field classes (e.g., StringField() and TextField). I gather the defaults are to store norms and to not store termvectors? If I don't want norms on a field,

RE: lucene 4.0 indexReader is changed

2012-10-29 Thread Scott Smith
.html#openIfChanged? See: http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/index/DirectoryReader.html#openIfChanged(org.apache.lucene.index.DirectoryReader) -- Jack Krupansky -----Original Message----- From: Scott Smith Sent: Friday, October 26, 2012 7:54 PM To: java-user@lucene.apache.org Su

RE: Lucene 4.0 delete by ID

2012-10-29 Thread Scott Smith
2 01:47, Mossaab Bagdouri wrote: > Lucene document IDs are not stable. You could add a field with an ID > that you maintain. Your query would then be just a TermQuery on the ID. > > Regards, > Mossaab > > > 2012/10/26 Scott Smith > >> I'm currently converting

RE: Lucene 4.0 delete by ID

2012-10-29 Thread Scott Smith
cument IDs are not stable. You could add a field with an ID | > that you maintain. Your query would then be just a TermQuery on the | > ID. | > | > Regards, | > Mossaab | > | > | > 2012/10/26 Scott Smith | > | >> I'm currently converting some lucene code to

lucene 4.0 indexReader is changed

2012-10-26 Thread Scott Smith
How do I determine if the index has been modified in 4.0? The ifchanged() and isChanged() appear to have been removed.

Lucene 4.0 delete by ID

2012-10-26 Thread Scott Smith
I'm currently converting some lucene code to 4.0. It appears that you are no longer allowed to delete a document by its ID. Is that correct? Is my only option to figure some kind of query (which obviously isn't based on ID) and do the delete from there?

Highlighting html pages

2012-10-23 Thread Scott Smith
I need to take an html page that I retrieve from my lucene search and highlight all of the terms that are part of the search. I need to skip over any html tags since I don't want any words in tags which happen to match the search to be highlighted. Note that I don't want sections of the docum
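A reply quoted in this thread points to `HTMLStripCharFilter`, which removes markup while keeping offsets so the highlighter can map hits back to the original page. To show the underlying idea without Lucene, here is a naive stdlib sketch that splits a page into tag and text segments and wraps matching words only in the text. It assumes well-formed markup with no `>` inside attribute values, no comments or `<script>` blocks, and it does not protect entities such as `&amp;`; the class name and the lower-cased term set are hypothetical.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Naive term highlighter that never touches text inside HTML tags (hypothetical helper). */
final class HtmlTermHighlighter {
    // A tag is anything from '<' to the next '>' (naive: breaks on '>' inside attributes).
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    // A word is a run of letters or digits.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    private HtmlTermHighlighter() {}

    /** {@code terms} must already be lower-case; matching is case-insensitive. */
    static String highlight(String html, Set<String> terms, String pre, String post) {
        StringBuilder out = new StringBuilder();
        Matcher tag = TAG.matcher(html);
        int last = 0;
        while (tag.find()) {
            out.append(markText(html.substring(last, tag.start()), terms, pre, post));
            out.append(tag.group()); // copy the tag through untouched
            last = tag.end();
        }
        out.append(markText(html.substring(last), terms, pre, post));
        return out.toString();
    }

    /** Wraps matching words in a text-only segment. */
    private static String markText(String text, Set<String> terms, String pre, String post) {
        StringBuilder sb = new StringBuilder();
        Matcher w = WORD.matcher(text);
        int last = 0;
        while (w.find()) {
            sb.append(text, last, w.start()); // separators between words
            String word = w.group();
            if (terms.contains(word.toLowerCase(Locale.ROOT))) {
                sb.append(pre).append(word).append(post);
            } else {
                sb.append(word);
            }
            last = w.end();
        }
        sb.append(text.substring(last));
        return sb.toString();
    }

    public static void main(String[] args) {
        String page = "<p class=\"fred\">Fred met <i>Smith</i></p>";
        System.out.println(highlight(page, Set.of("fred", "smith"), "<b>", "</b>"));
        // <p class="fred"><b>Fred</b> met <i><b>Smith</b></i></p>
    }
}
```

The `fred` inside the `class` attribute is left alone because tag segments are copied verbatim; only the text between tags is scanned.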

RE: Lucene reorganizing indexes

2012-07-17 Thread Scott Smith
armed up after a commit and the never ending full GCs. Greets, Ralf -----Original Message----- From: Scott Smith [mailto:ssm...@mainstreamdata.com] Sent: Monday, July 16, 2012 10:29 PM To: java-user@lucene.apache.org Subject: Lucene reorganizing indexes We have an application that has

Lucene reorganizing indexes

2012-07-16 Thread Scott Smith
We have an application that has to do "real time" indexing of a number of documents. What it does is wake up about every 20 seconds and update the index with any changes that have been queued since the last time it ran. This involves adding and deleting several hundred documents. This is all

Bizarre Search order request

2012-05-25 Thread Scott Smith
I really need this on Solr, but thought I would start here as I suspect that, if it's possible, it's some kind of custom relevancy ranking that would need to be done in lucene and then used in SOLR. I will simplify the actual problem somewhat, but I think it will have the gist of what I want to

RE: MoreLikeThis Interface changes

2011-09-26 Thread Scott Smith
OK. Thanks -----Original Message----- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Monday, September 26, 2011 12:15 PM To: java-user@lucene.apache.org Subject: Re: MoreLikeThis Interface changes On Mon, Sep 26, 2011 at 2:06 PM, Scott Smith wrote: > "is" is the input stream

RE: MoreLikeThis Interface changes

2011-09-26 Thread Scott Smith
riginal Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Wednesday, September 21, 2011 6:59 PM To: java-user@lucene.apache.org Subject: Re: MoreLikeThis Interface changes On Wed, Sep 21, 2011 at 5:17 PM, Scott Smith wrote: > I'm updating my lucene code from 3.0 to 3.4.  There

RE: MoreLikeThis Interface changes

2011-09-22 Thread Scott Smith
Understand. Thanks for the information. -----Original Message----- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Wednesday, September 21, 2011 6:59 PM To: java-user@lucene.apache.org Subject: Re: MoreLikeThis Interface changes On Wed, Sep 21, 2011 at 5:17 PM, Scott Smith wrote: >

MoreLikeThis Interface changes

2011-09-21 Thread Scott Smith
I'm updating my lucene code from 3.0 to 3.4. There's a change in the MLT interface I'm confused about. I used the MLT.like(InputStream) method. It now appears I should change to the MLT.like(InputStreamReader, fieldname) method. Easy enough to create an InputStreamReader from an InputStream.
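Wrapping the stream is indeed a one-liner, but `new InputStreamReader(is)` without a charset decodes with the platform default encoding, which can mangle non-ASCII text. A minimal sketch that names the charset explicitly; UTF-8 is an assumption about the input, the class name is hypothetical, and the `like(Reader, String)` call appears only as a comment, following the signature described in the post.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: turn an InputStream into a Reader with an explicit charset. */
final class StreamToReader {
    private StreamToReader() {}

    /** Wraps a byte stream using the given charset rather than the platform default. */
    static Reader toReader(InputStream is, Charset charset) {
        return new InputStreamReader(is, charset);
    }

    /** Drains a Reader into a String (used here only to show the decoding). */
    static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = r.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "M\u00FCller".getBytes(StandardCharsets.UTF_8);
        try (Reader reader = toReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            // In the MoreLikeThis code the Reader would be handed on instead, e.g.
            // Query q = mlt.like(reader, "body");  // "body" is a hypothetical field name
            System.out.println(readAll(reader));
        }
    }
}
```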

RE: [ANNOUNCEMENT] NLP-based Analyzer library for Lucene

2011-02-14 Thread Scott Smith
One thing to note is that the Stanford POS Tagger is licensed under GPL v2. A commercial license is available, but it doesn't appear to be free ($3k minimum, if I read correctly). I wonder what it would take to make this available using OpenNLP, which has a friendlier license. -----Original Message

RE: QueryParser in 3.x

2010-09-17 Thread Scott Smith
mail.com] Sent: Friday, September 17, 2010 1:03 AM To: java-user@lucene.apache.org Subject: Re: QueryParser in 3.x On Fri, Sep 17, 2010 at 1:06 AM, Scott Smith wrote: > I recently upgraded to Lucene 3.0 and am seeing some new behavior that I > don't understand.  Perhaps someone can e

QueryParser in 3.x

2010-09-16 Thread Scott Smith
I recently upgraded to Lucene 3.0 and am seeing some new behavior that I don't understand. Perhaps someone can explain why. I have a custom analyzer. Part of the analyzer uses the ASCIIFoldingFilter. If I run a word with an umlaut through that analyzer using the AnalyzerDemo code in LIA2,

RE: Numeric Range Filter - bug or documentation oversight

2010-03-09 Thread Scott Smith
Thanks for looking at this, Uwe. I'll check my code again, but I tried changing it several times and it did seem to make a difference. Scott -----Original Message----- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Saturday, March 06, 2010 3:11 AM To: java-user@lucene.apache.org Subject: R

Numeric Range Filter - bug or documentation oversight

2010-03-05 Thread Scott Smith
I've been updating from 2.4.2 to 3.0.1. I had a number of issues (the Version object in the analyzers was an "interesting" addition; I guess I don't understand the use case for them. I understand what it says; I was just surprised and it caused me some problems since I create analyzers with reflect

Highlighting phrases in 2.9

2009-09-30 Thread Scott Smith
I've been looking at the changes I have to make in my code to go from 2.4.1 to 2.9. One of the features I have is to highlight query hits in documents which meet the search criteria. If the query has a phrase, then I need to highlight the phrase, but not isolated words from the phrase which also

RE: caching an indexreader

2009-06-19 Thread Scott Smith
gt; wrote: > On Fri, Jun 19, 2009 at 2:40 PM, Scott Smith > wrote: > > In my environment, one of the concerns is that new documents are > > constantly being added (and some documents may be deleted). This means > > that when a user does a search and pages through results, it

Filters vs Queries - revisited

2009-06-19 Thread Scott Smith
As I read about Filters, it seems to me that a filter is preferred for any portion of the query string where you are setting the boost to 0 (meaning you don't want it to contribute to the relevancy score). But, relevancy is only interesting if you are displaying the documents in relevancy ord

caching an indexreader

2009-06-19 Thread Scott Smith
In my environment, one of the concerns is that new documents are constantly being added (and some documents may be deleted). This means that when a user does a search and pages through results, it is possible that there are new items coming in which affect the search, thus changing where items are

RE: Queries and Filters

2009-06-17 Thread Scott Smith
> -----Original Message----- > From: Scott Smith [mailto:ssm...@mainstreamdata.com] > Sent: Wednesday, June 17, 2009 2:15 AM > To: java-user@lucene.apache.org > Subject: Queries and Filters > > The last few versions of lucene have deprecated several of the > interfaces we

RE: Getting results for a specific date

2009-06-17 Thread Scott Smith
Clarification: Obviously, I should have said "June 11" when I talked of a newer date. From: Scott Smith [mailto:ssm...@mainstreamdata.com] Sent: Tue 6/16/2009 5:41 PM To: java-user@lucene.apache.org Subject: Getting results for a specific date M

Queries and Filters

2009-06-16 Thread Scott Smith
The last few versions of lucene have deprecated several of the interfaces we were using and this is necessitating a fairly major upgrade of our code (which hasn't had much done to it for several years). I'm not complaining; the changes are probably necessary. In reading LIA2, I've learned abou

RE: Determining lucene version programmatically

2009-06-16 Thread Scott Smith
().getImplementationVersion(); cheers, João On Tue, Jun 16, 2009 at 11:36 PM, Scott Smith wrote: > Is there any way to programmatically determine the version of lucene > being loaded? > > > > -- Cumprimentos, João Carlos Galaio da Silva --

Getting results for a specific date

2009-06-16 Thread Scott Smith
Mostly, our users want to see search results in reverse date order (newer hits first). I know how to do that with a Sort object and it works fine. However, sometimes our users want to do a search and get results in date order starting at a certain date. Say for example, they want to start the

Determining lucene version programmatically

2009-06-16 Thread Scott Smith
Is there any way to programmatically determine the version of lucene being loaded?
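The reply in this thread reads the version from the jar manifest via `getImplementationVersion()`. A small sketch of that approach as a generic helper (hypothetical name): it returns the manifest's `Implementation-Version` for the jar that loaded a given class, or nothing when the class was not loaded from a jar carrying that entry. Passing a Lucene class, for example `org.apache.lucene.index.IndexWriter.class`, assumes the Lucene jar's manifest includes the entry.

```java
import java.util.Optional;

/** Hypothetical helper: read Implementation-Version from the manifest of a class's jar. */
final class JarVersion {
    private JarVersion() {}

    static Optional<String> implementationVersion(Class<?> c) {
        Package p = c.getPackage();
        // getPackage() is null for arrays and primitives (and for some unnamed-package classes);
        // getImplementationVersion() is null when the manifest lacks the entry.
        return Optional.ofNullable(p == null ? null : p.getImplementationVersion());
    }

    public static void main(String[] args) {
        // With Lucene on the classpath one would pass a Lucene class instead, e.g.
        // implementationVersion(org.apache.lucene.index.IndexWriter.class)
        System.out.println(implementationVersion(JarVersion.class).orElse("unknown"));
    }
}
```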

Optimization error

2009-02-02 Thread Scott Smith
I'm optimizing a database and getting the error: maxClauseCount is set to 1024 I understand what that means coming out of the query parser, but what does it mean coming from the optimizer? Scott

RE: Boosting results

2008-11-07 Thread Scott Smith
e you could make one filter on A. >> >> You could also consider a custom scorer that added 1,000,000 to every >> category A document. >> >> How much were you boosting by? What happens if you boost by a very large >> factor? >> As in ridiculously large?

Boosting results

2008-11-06 Thread Scott Smith
I'm interested in comments on the following problem. I have a set of documents. They fall into 3 categories. Call these categories A, B, and C. Each document has an indexed, non-tokenized field called "category" which contains A, B, or C (they are mutually exclusive categories). All

RE: Bug in CJKTokenizer

2008-07-18 Thread Scott Smith
ively. Steve On 07/18/2008 at 5:03 PM, Scott Smith wrote: > org.apache.lucene.analysis.cjk.CJKTokenizer is in the > "contrib" portion of lucene, so I'm not sure if this is the > right place to mention this or not. I was doing some > detailed analysis of how this to

Bug in CJKTokenizer

2008-07-18 Thread Scott Smith
org.apache.lucene.analysis.cjk.CJKTokenizer is in the "contrib" portion of lucene, so I'm not sure if this is the right place to mention this or not. I was doing some detailed analysis of how this tokenizer worked and noticed the following behavior (which I would classify as a bug). If you

RE: Highlighting phrases

2008-04-21 Thread Scott Smith
a customer, that will be 1 beer... On Sun, 2008-04-20 at 17:12 -0600, Scott Smith wrote: > I've written some code to highlight items from a search using the standard Highlighter class, QueryScorer, and NullFragmenter. Everything works fine except when we do phrases. If I search for "

Highlighting phrases

2008-04-20 Thread Scott Smith
I've written some code to highlight items from a search using the standard Highlighter class, QueryScorer, and NullFragmenter. Everything works fine except when we do phrases. If I search for "fred smith" (with the quotes), it highlights any instances of "fred smith" just as expected. However
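What is wanted here is positional: mark `fred smith` only where the two words are adjacent and in order, and leave a stray `fred` or `smith` unmarked. A stdlib sketch of that rule, assuming single-space-separated tokens, a case-insensitive exact match, and no analyzer (no stemming, punctuation handling, or stop words); the class name is hypothetical. In Lucene's highlighter the same behaviour depends on position-aware scoring, and which scorer provides it varies by version.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: highlight only complete, in-order occurrences of a phrase. */
final class PhraseHighlighter {
    private PhraseHighlighter() {}

    static String highlight(String text, List<String> phrase, String pre, String post) {
        String[] tokens = text.split(" ");
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < tokens.length) {
            if (matchesAt(tokens, i, phrase)) {
                // Wrap the whole run of phrase tokens in one pair of markers.
                String run = String.join(" ", Arrays.copyOfRange(tokens, i, i + phrase.size()));
                out.append(pre).append(run).append(post);
                i += phrase.size();
            } else {
                out.append(tokens[i]); // isolated phrase words are left as-is
                i++;
            }
            if (i < tokens.length) out.append(' ');
        }
        return out.toString();
    }

    /** True if the phrase occurs starting at token {@code start}. */
    private static boolean matchesAt(String[] tokens, int start, List<String> phrase) {
        if (phrase.isEmpty() || start + phrase.size() > tokens.length) return false;
        for (int k = 0; k < phrase.size(); k++) {
            String t = tokens[start + k].toLowerCase(Locale.ROOT);
            if (!t.equals(phrase.get(k).toLowerCase(Locale.ROOT))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(highlight("Fred Smith met fred Jones and Smith",
                List.of("fred", "smith"), "<b>", "</b>"));
        // <b>Fred Smith</b> met fred Jones and Smith
    }
}
```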

RE: Lucene highlighting

2007-11-28 Thread Scott Smith
Since what I'm dealing with is well-formed html, I wonder if I could modify the tokenizer to skip the html elements and then use the NullFragmenter. I can probably isolate the html text. Sounds like I have a plan or at least something to try. Thanks From: M

RE: Lucene highlighting

2007-11-28 Thread Scott Smith
. What kind of documents are you indexing? Matthijs Scott Smith wrote: > I've been looking at the highlighter examples. All of them seem to deal with > fragments. I need to highlight an entire document as it is displayed (i.e., > highlight all of the keywords in it). Can someo

Lucene highlighting

2007-11-27 Thread Scott Smith
I've been looking at the highlighter examples. All of them seem to deal with fragments. I need to highlight an entire document as it is displayed (i.e., highlight all of the keywords in it). Can someone point me to some examples of this or does the highlighter code not do this? Thanks Sco

RE: de-boosting fields

2006-12-13 Thread Scott Smith
ument. I guess that all makes sense; it just means I have to be careful about which queries I set the category boost to zero for and which I don't. -----Original Message----- From: Scott Smith [mailto:[EMAIL PROTECTED] Sent: Tuesday, December 12, 2006 3:31 PM To: java-user@lucene.apache.org Subj

RE: de-boosting fields

2006-12-12 Thread Scott Smith
I've implemented the zero-boost solution and it seems to be doing what I want. Thanks to everyone who had suggestions. -----Original Message----- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Monday, December 11, 2006 11:45 AM To: java-user@lucene.apache.org Subject: Re: de-boosting fiel

RE: de-boosting fields

2006-12-09 Thread Scott Smith
int . But again, don't be surprised if one of the more expert folks comes up with a *much* better idea Best Erick On 12/8/06, Scott Smith <[EMAIL PROTECTED]> wrote: > > I have a collection of documents for which I've always returned the > results sorted on the date

de-boosting fields

2006-12-08 Thread Scott Smith
I have a collection of documents for which I've always returned the results sorted on the date/time of the document (using a sort object in the search method on my Searcher). It works great. Suddenly, I have a requirement to return the documents in relevancy order. So, that's easy (I thought)

Large index question

2006-10-12 Thread Scott Smith
Suppose I want to index 500,000 documents (average document size is 4 KB). Let's assume I create a single index and that the index is static (I'm not going to add any new documents to it). I would guess the index would be around 2GB. Now, I do searches against this on a somewhat beefy mach
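The 2GB guess matches the raw text volume; the index itself can come out smaller or larger depending on which fields are stored and indexed. In decimal units (with 4 KiB documents the total is about 1.9 GiB):

$$500{,}000 \times 4\ \text{kB} = 2{,}000{,}000\ \text{kB} = 2\ \text{GB}$$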

RE: Performance question

2006-07-21 Thread Scott Smith
Interesting, and thanks for the answer. I guess I won't write code to control the order in which clauses get added; one less thing to do :-) -----Original Message----- From: Doron Cohen [mailto:[EMAIL PROTECTED] Sent: Thursday, July 20, 2006 6:47 PM To: java-user@lucene.apache.org Subject: Re: Performanc

Performance question

2006-07-20 Thread Scott Smith
I was reading a book on SQL query tuning. The gist of it was that the way to get the best performance (fastest execution) out of a SQL select statement was to "create" execution plans where the most selective term in the "where" clause is used first, the next most selective term is used next, etc.
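The intuition behind the answer in this thread can be sketched without Lucene: for an AND query, an engine can intersect sorted lists of document ids by walking the rarest list and advancing the others to each candidate, so the work is driven mainly by the rarest term rather than by the order in which clauses were added. The sketch below is a generic, simplified version over `int` arrays, with binary search standing in for skip lists; it is not Lucene's actual conjunction code, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of AND-intersection over sorted posting lists. */
final class PostingIntersect {
    private PostingIntersect() {}

    /** Intersects ascending, duplicate-free doc-id arrays; argument order does not matter. */
    static List<Integer> intersect(int[]... lists) {
        List<Integer> result = new ArrayList<>();
        if (lists.length == 0) return result;
        int[][] sorted = lists.clone();
        // Rarest list first: its ids are the only candidates worth checking.
        Arrays.sort(sorted, Comparator.comparingInt(a -> a.length));
        int[] pos = new int[sorted.length];
        outer:
        for (int doc : sorted[0]) {
            for (int k = 1; k < sorted.length; k++) {
                pos[k] = advance(sorted[k], pos[k], doc);       // first id >= doc
                if (pos[k] == sorted[k].length) break outer;    // list exhausted: no more matches
                if (sorted[k][pos[k]] != doc) continue outer;   // doc missing from list k
            }
            result.add(doc);
        }
        return result;
    }

    /** Smallest index i >= from with list[i] >= target, or list.length if none. */
    private static int advance(int[] list, int from, int target) {
        int lo = from, hi = list.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (list[mid] < target) lo = mid + 1; else hi = mid;
        }
        return lo;
    }
}
```

Calling `intersect(new int[]{1, 3, 5, 7, 9}, new int[]{3, 4, 5, 9}, new int[]{5, 9, 11})` yields `[5, 9]` in any argument order, because the sketch reorders the lists by length itself.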

RE: Managing a large archival (and constantly changing) database

2006-07-07 Thread Scott Smith
Thanks to everyone who commented. Clearly, I have a lot to think about, but thanks for the help. Scott -----Original Message----- From: Rob Staveley (Tom) [mailto:[EMAIL PROTECTED] Sent: Friday, July 07, 2006 2:53 PM To: java-user@lucene.apache.org Subject: RE: Managing a large archival (and co

Managing a large archival (and constantly changing) database

2006-07-06 Thread Scott Smith
I've been asked to do a project which provides full-text search for a large database of articles. The expectation is that most of the articles are fairly small (<2k bytes). There will be an initial population of around 400,000 articles. There will then be approximately 2000 new articles added ea

Can lucene do this?

2006-05-11 Thread Scott Smith
I'm building an application which has to provide "real-time" searching of emails as they come in. I have a number of search strings that I need to apply against each email as it comes in and then do something with the email based on which search string(s) get a hit. My initial thought was to

RE: Deletes and Hits

2005-05-04 Thread Scott Smith
pressed,indexed> DOC1: Document stored/uncompressed,indexed> HITS: 2 DOC0: Document stored/uncompressed,indexed> DOC1: Document stored/uncompressed,indexed> See also: http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexReader.html#isDeleted(int) Otis --- Scott Smith <[EM

Deletes and Hits

2005-04-28 Thread Scott Smith
Suppose I do a search and get a hit list. Before I access the hit list, my delete routine (running in another thread) comes along and deletes some documents. What happens if I now try to access documents that have been deleted? Scott

Assorted questions

2005-03-08 Thread Scott Smith
I needed to return my hits list in date/time order (instead of relevancy). So, I implemented a class that converted dates to an int and stored the integer as a field in my index. I passed a Sort object to the IndexSearcher (indicating that the sort field was convertible to int) to get things back
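One way to get an int that sorts in date order is to pack the calendar date as yyyyMMdd, so numeric order matches chronological order at day resolution for four-digit years. A minimal sketch using `java.time` (Java 8+, an assumption relative to the 2005-era code; the class name is hypothetical). Finer resolution needs care: seconds since the epoch overflow an `int` in January 2038.

```java
import java.time.LocalDate;

/** Hypothetical helper: encode dates as sortable ints (day resolution). */
final class SortableDate {
    private SortableDate() {}

    /** 2005-03-08 -> 20050308; numeric order equals chronological order for years 1..9999. */
    static int toInt(LocalDate d) {
        return d.getYear() * 10_000 + d.getMonthValue() * 100 + d.getDayOfMonth();
    }

    /** Inverse of {@link #toInt}. */
    static LocalDate fromInt(int v) {
        return LocalDate.of(v / 10_000, (v / 100) % 100, v % 100);
    }

    public static void main(String[] args) {
        System.out.println(toInt(LocalDate.of(2005, 3, 8))); // 20050308
    }
}
```

The resulting value would be indexed as an untokenized field and named in the Sort as an int field, as the post describes.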

large indexes

2005-03-08 Thread Scott Smith
I need to create an index that will potentially have a million+ documents. I know Lucene can accomplish this. However, the other requirement is that I need to be continually updating it during the day (adding 1-30 documents per minute). I guess I had thought that I might try to have an ac