best practices for generating queries from users' questions?

2023-09-20 Thread qrdl kaggle
Given a knowledge base indexed by Lucene, users often phrase their searches as questions. Is there a good reference (code/paper/doc) on how to translate those natural-language questions into an effective and accurate Lucene query?

Re: Autosuggest/Autocomplete: What are the best practices to build Suggester?

2021-11-18 Thread Michael Wechner
a little confused, because all the implementation examples I found were using an in-memory directory. My bad, everything is good now, thank you :-) Michael On 18.11.21 at 09:47 Michael Wechner wrote: Hi I recently started to use the Autosuggest/Autocomplete package as suggested by Ro

Autosuggest/Autocomplete: What are the best practices to build Suggester?

2021-11-18 Thread Michael Wechner
Hi I recently started to use the Autosuggest/Autocomplete package as suggested by Robert https://www.mail-archive.com/java-user@lucene.apache.org/msg51403.html which works very well, thanks again for your help :-) But it is not clear to me what the best practices are for building a suggester
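For anyone landing on this thread later: a minimal sketch of building a persistent suggester with the Lucene suggest module, kept on disk rather than in an in-memory directory. The paths, the source field "title" and the weight field "weight" are hypothetical, and constructor signatures vary somewhat between Lucene versions (the no-arg constructors shown are from recent releases).

```java
import java.nio.file.Paths;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.suggest.DocumentDictionary;
import org.apache.lucene.search.suggest.Lookup;
import org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SuggesterSketch {
    public static void main(String[] args) throws Exception {
        // Keep the suggester on disk so it survives restarts (instead of an in-memory directory)
        Directory mainIndex = FSDirectory.open(Paths.get("/path/to/main-index"));       // hypothetical
        Directory suggestIndex = FSDirectory.open(Paths.get("/path/to/suggest-index")); // hypothetical

        try (DirectoryReader reader = DirectoryReader.open(mainIndex);
             AnalyzingInfixSuggester suggester =
                     new AnalyzingInfixSuggester(suggestIndex, new StandardAnalyzer())) {
            // Feed the suggester from a stored field; "weight" is a stored numeric field
            // used to rank suggestions (both field names are made up for this sketch)
            suggester.build(new DocumentDictionary(reader, "title", "weight"));

            List<Lookup.LookupResult> hits = suggester.lookup("luce", false, 5);
            for (Lookup.LookupResult hit : hits) {
                System.out.println(hit.key + " (" + hit.value + ")");
            }
        }
    }
}
```

Note that build() rebuilds the suggester from scratch, so it is typically run offline or on a schedule rather than after every document update.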

Re: Best practices in boosting by proximity?

2013-05-05 Thread Gili Nachum
Hi Karl, I guess I must have individual terms in my query, alongside the SHOULD phrases with slops, since I don't want to miss results, even if the term distance is huge. Slop - will enrich the phrases with them. Shingles - Good idea. I'll index bi-grams if performance becomes an issue. In

Re: Best practices in boosting by proximity?

2013-05-04 Thread Karl Wettin
I just realized this mail contained several incomplete sentences. I blame Norwegian beers. Please allow me to try it once again: The simplest solution is to make use of slop in PhraseQuery, SpanNearQuery, etc(?). Also consider permutations of #isInOrder() with alternative query boosts. Eve

Re: Best practices in boosting by proximity?

2013-05-04 Thread Karl Wettin
The simplest solution is to use slop in PhraseQuery, SpanNearQuery, etc(?). Also consider permutations of #isInOrder() with alternative query boosts. Even though slop will create a greater score the closer the terms are, it might still in some cases (usually when combined with other subq

Best practices in boosting by proximity?

2013-05-04 Thread Gili Nachum
Hi. I would like hits that contain the search terms in proximity to each other to be ranked higher than hits in which the terms are scattered across the doc. Is there a best practice to achieve that? I also want all hits to contain all of the search terms (implicit AND):
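A minimal sketch of the approach Karl describes in his replies above: required terms plus an optional sloppy phrase that only boosts the score. It is written against the pre-5.0 BooleanQuery/PhraseQuery API that matches this thread's era; the field name, slop and boost values are made up.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class ProximityBoostSketch {
    static Query build(String field, String... terms) {
        BooleanQuery query = new BooleanQuery();

        // Implicit AND: every term must be present somewhere in the document
        for (String term : terms) {
            query.add(new TermQuery(new Term(field, term)), Occur.MUST);
        }

        // Optional sloppy phrase: matching documents score higher the closer the
        // terms are, but documents are not dropped when the terms are far apart
        PhraseQuery phrase = new PhraseQuery();
        for (String term : terms) {
            phrase.add(new Term(field, term));
        }
        phrase.setSlop(50);      // generous slop so distant terms still get some credit
        phrase.setBoost(5.0f);   // weight of the proximity signal relative to the term matches
        query.add(phrase, Occur.SHOULD);

        return query;
    }
}
```

Later Lucene versions express the same idea with BooleanQuery.Builder, PhraseQuery.Builder and BoostQuery, but the structure of the query is unchanged.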

Re: HA Configuration / Best Practices

2011-02-09 Thread Ian Lea
> 2 servers at the WS (Lucene) tier, that implies at least 2 indexes. > As far as best practices go: > 1) What is the typical architecture for Lucene in an HA configuration? > 2) How are indexes typically maintained in some sort of sync? i.e. if a request comes

HA Configuration / Best Practices

2011-02-08 Thread BrightMinds Dev
servers at the WS (Lucene) tier, that implies at least 2 indexes. As far as best practices go: 1) What is the typical architecture for Lucene in an HA configuration? 2) How are indexes typically maintained in some sort of sync? i.e. if a request comes in to do a search on the UI tier and returns a

Re: Best practices for multiple languages?

2011-01-20 Thread Paul Libbrecht
Isn't this approach somewhat bad for term frequency? Words that appear in several languages would be a lot more frequent (hence less significant). I still prefer the split-field method with proper query expansion. This way, the term frequency is evaluated on the corpus of one lan
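A rough illustration of the query-expansion side of the split-field approach Paul mentions: the same user term is searched across a few hypothetical per-language fields (title_en, title_de, title_fr), written against the pre-5.0 BooleanQuery API used in the proximity sketch earlier. In practice each field's own analyzer should be applied to the user input first (e.g. via a per-field QueryParser), which this sketch skips.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class LanguageExpansionSketch {
    // Hypothetical per-language fields; each is analyzed with its own language analyzer at index time
    static final String[] LANGUAGE_FIELDS = { "title_en", "title_de", "title_fr" };

    static Query expand(String term) {
        BooleanQuery query = new BooleanQuery();
        for (String field : LANGUAGE_FIELDS) {
            // OR across the language fields: a document only needs to match in its own language,
            // and term statistics stay per language because each field has its own corpus
            query.add(new TermQuery(new Term(field, term)), Occur.SHOULD);
        }
        return query;
    }
}
```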

Re: AW: Best practices for multiple languages?

2011-01-20 Thread Bill Janssen
Dominique Bejean wrote: > Hi, > During a recent Solr project we needed to index documents in a lot of languages. The natural solution with Lucene and Solr is to define one field per language. Each field is configured in the schema.xml file to use language-specific processing (tokenizing

Re: AW: Best practices for multiple languages?

2011-01-20 Thread Dominique Bejean
Hi, During a recent Solr project we needed to index documents in a lot of languages. The natural solution with Lucene and Solr is to define one field per language. Each field is configured in the schema.xml file to use language-specific processing (tokenizing, stop words, stemmer, ...). Th

Re: AW: Best practices for multiple languages?

2011-01-19 Thread Bill Janssen
Paul Libbrecht wrote: > I did several changes of this sort and the precision and recall measures improved, in particular in the presence of language-indication failure, which happened to be very common in our authoring environment. There are two kinds of failures: no language, or wrong languag

Re: AW: Best practices for multiple languages?

2011-01-19 Thread Trejkaz
On Thu, Jan 20, 2011 at 9:08 AM, Paul Libbrecht wrote: >>> Wouldn't it be better to prefer precise matches (a field that is analyzed with StandardAnalyzer for example) but also allow matches that are stemmed? >> StandardAnalyzer isn't quite precise, is it? StandardFilter does some kind o

Re: AW: Best practices for multiple languages?

2011-01-19 Thread Paul Libbrecht
On 19 Jan 2011 at 20:56, Bill Janssen wrote: > Paul Libbrecht wrote: >> So you are only indexing "analyzed" and querying "analyzed". Is that correct? > Yes, that's correct. I fall back to StandardAnalyzer if no language-specific analyzer is available. >> Wouldn't it be better to

Re: AW: Best practices for multiple languages?

2011-01-19 Thread Bill Janssen
Paul Libbrecht wrote: > So you are only indexing "analyzed" and querying "analyzed". Is that correct? Yes, that's correct. I fall back to StandardAnalyzer if no language-specific analyzer is available. > Wouldn't it be better to prefer precise matches (a field that is analyzed with StandardA

Re: AW: Best practices for multiple languages?

2011-01-19 Thread Paul Libbrecht
So you are only indexing "analyzed" and querying "analyzed". Is that correct? Wouldn't it be better to prefer precise matches (a field that is analyzed with StandardAnalyzer for example) but also allow matches that are stemmed? paul On 19 Jan 2011 at 19:21, Bill Janssen wrote: > Clemens Wyss w

Re: AW: Best practices for multiple languages?

2011-01-19 Thread Bill Janssen
Clemens Wyss wrote: > > 1) Docs in different languages -- every document is one language > > 2) Each document has fields in different languages > We mainly have 1)-models. I've recently done this for UpLib. I run a language-guesser over the document to identify the primary language when the docu

Re: Best practices for multiple languages?

2011-01-19 Thread Paul Libbrecht
Because it does not find "junks" when you search "junk". Or... chevaux when you search cheval. paul On 19 Jan 2011 at 18:59, Luca Rondanini wrote: > why not just use the StandardAnalyzer? It works pretty well even with Asian languages! > On Wed, Jan 19, 2011 at 12:23 AM, Shai E

Re: Best practices for multiple languages?

2011-01-19 Thread Luca Rondanini
why not just use the StandardAnalyzer? It works pretty well even with Asian languages! On Wed, Jan 19, 2011 at 12:23 AM, Shai Erera wrote: > If you index documents, each in a different language, but all its fields are of the same language, then what you can do is the following: > Creat

Re: Best practices for multiple languages?

2011-01-19 Thread Shai Erera
If you index documents, each in a different language, but all its fields are of the same language, then what you can do is the following: Create separate indexes per language --- This will work and is not too hard to set up. Requires some mainten

Re: Best practices for multiple languages?

2011-01-18 Thread Paul Libbrecht
But for this, you need a skillfully designed: - set of fields - multiplexing analyzer - query expansion In one of my projects, we do not split language by fields and it's a pain... I'm having recurring issues in one sense or the other. - the "die" example that Otis mentioned is a good one: stop-

AW: Best practices for multiple languages?

2011-01-18 Thread Clemens Wyss
> To: java-user@lucene.apache.org > Subject: Re: Best practices for multiple languages? > Hi > There are two types of multi-language docs: > 1) Docs in different languages -- every document is one language > 2) Each document has fields in different languages

Re: Best practices for multiple languages?

2011-01-18 Thread Vinaya Kumar Thimmappa
I think we should be using Lucene with the Snowball jars, which means one index for all languages (of course the size of the index is always a matter of concern). Hope this helps. -vinaya On Tuesday 18 January 2011 11:23 PM, Clemens Wyss wrote: What is the "best practice" to support multiple languages, i

Re: Best practices for multiple languages?

2011-01-18 Thread Shai Erera
Hi There are two types of multi-language docs: 1) Docs in different languages -- every document is one language 2) Each document has fields in different languages I've dealt with both, and there are different solutions to each. Which of them is yours? Shai On Tue, Jan 18, 2011 at 7:53 PM, Cleme

Re: Best practices for multiple languages?

2011-01-18 Thread Otis Gospodnetic
lucene.com/ - Original Message > From: Clemens Wyss > To: "java-user@lucene.apache.org" > Sent: Tue, January 18, 2011 12:53:57 PM > Subject: Best practices for multiple languages? > > What is the "best practice" to support multiple languages, i.e. >Lu

Best practices for multiple languages?

2011-01-18 Thread Clemens Wyss
What is the "best practice" to support multiple languages, i.e. Lucene-Documents that have multiple language content/fields? Should a) each language be indexed in a seperate index/directory or should b) the Documents (in a single directory) hold the diverse localized fields? We most often will b

RE: Best practices for searcher memory usage?

2010-07-16 Thread Toke Eskildsen
On Thu, 2010-07-15 at 20:53 +0200, Christopher Condit wrote: [Toke: 140GB single segment is huge] > Sorry - I wasn't clear here. The total index size ends up being 140GB, but to try to help improve performance we build 50 separate indexes (which end up being a bit under 3GB each) and then ope

RE: Best practices for searcher memory usage?

2010-07-15 Thread Christopher Condit
> [Toke: No frequent updates] > So everything is rebuilt from scratch each time? Or do you mean that you're only adding new documents, not changing old ones? Everything is reindexed from scratch - indexing speed is not essential to us... > Either way, optimizing to a single 140GB segment is

RE: Best practices for searcher memory usage?

2010-07-15 Thread Toke Eskildsen
On Wed, 2010-07-14 at 20:28 +0200, Christopher Condit wrote: [Toke: No frequent updates] > Correct - in fact there are no updates and no deletions. We index everything offline when necessary and just swap the new index in... So everything is rebuilt from scratch each time? Or do you mean that

Re: Best practices for searcher memory usage?

2010-07-14 Thread Lance Norskog
Glen, thank you for this very thorough and informative post. Lance Norskog

Re: Best practices for searcher memory usage?

2010-07-14 Thread Glen Newton
There are a number of strategies, on the Java or OS side of things: - Use huge pages[1]. Esp. on 64-bit and lots of RAM. For long-running, large-memory (and GC-busy) applications, this has achieved significant improvements. Like 300% on EJBs. See [2],[3],[4]. For a great article introducing and benc

RE: Best practices for searcher memory usage?

2010-07-14 Thread Christopher Condit
Hi Toke- > > * 20 million documents [...] > > * 140GB total index size > > * Optimized into a single segment > > I take it that you do not have frequent updates? Have you tried to see if you > can get by with more segments without significant slowdown? Correct - in fact there are no updates and n

Re: Best practices for searcher memory usage?

2010-07-14 Thread Michael McCandless
You can also set the termsIndexDivisor when opening the IndexReader. The terms index is an in-memory data structure and it can consume a LOT of RAM when your index has many unique terms. Flex (only on Lucene's trunk / next major release (4.0)) has reduced this RAM usage (as well as the RAM required

Re: Best practices for searcher memory usage?

2010-07-14 Thread Toke Eskildsen
On Tue, 2010-07-13 at 23:49 +0200, Christopher Condit wrote: > * 20 million documents [...] > * 140GB total index size > * Optimized into a single segment I take it that you do not have frequent updates? Have you tried to see if you can get by with more segments without significant slowdown? > Th

Re: Best practices for searcher memory usage?

2010-07-13 Thread Paul Libbrecht
On 13 Jul 2010 at 23:49, Christopher Condit wrote: * are there performance optimizations that I haven't thought of? The first and most important one I'd think of is to get rid of NFS. You can happily do a local copy which might, even for 10 GB, take less than 30 seconds at server start. pa
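A rough sketch of Paul's suggestion: copy the index from the NFS mount to local disk at startup and open the searcher on the local copy. The paths are hypothetical, and the copy assumes the remote index is not being modified while it is copied (e.g. a fresh snapshot that is swapped in atomically).

```java
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;

public class LocalCopySketch {
    public static IndexSearcher openLocalCopy() throws Exception {
        Path remote = Paths.get("/mnt/nfs/lucene-index");  // hypothetical NFS mount
        Path local = Paths.get("/var/tmp/lucene-index");   // hypothetical local scratch directory

        Files.createDirectories(local);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(remote)) {
            for (Path file : files) {
                // Lucene index files live flat in one directory, so a shallow copy is enough
                Files.copy(file, local.resolve(file.getFileName()), StandardCopyOption.REPLACE_EXISTING);
            }
        }

        // Search the local copy; local disk plus the OS page cache is far friendlier than NFS
        return new IndexSearcher(DirectoryReader.open(FSDirectory.open(local)));
    }
}
```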

Best practices for searcher memory usage?

2010-07-13 Thread Christopher Condit
We're getting up there in terms of corpus size for our Lucene indexing application: * 20 million documents * all fields need to be stored * 10 short fields / document * 1 long free text field / document (analyzed with a custom shingle-based analyzer) * 140GB total index size * Optimized into a s

Re: Realtime search best practices

2009-10-13 Thread Michael McCandless
On Tue, Oct 13, 2009 at 5:23 AM, Ganesh wrote: > In case of 2.4.1, the reader after reopen, will be warmed before actual use. You mean you must warm it after you call reopen, before using it, right? > In 2.9, public void setMergedSegmentWarmer(IndexWriter.IndexReaderWarmer > warmer), does warm

Re: Realtime search best practices

2009-10-13 Thread Michael McCandless
OK I opened https://issues.apache.org/jira/browse/LUCENE-1976. Mike On Tue, Oct 13, 2009 at 6:05 AM, Michael McCandless wrote: > I agree isCurrent doesn't work right for an NRT reader.  Right now, it > will always return "true" because it's sharing the segmentInfos in use > by the writer. > > Si

Re: Realtime search best practices

2009-10-13 Thread Michael McCandless
I agree isCurrent doesn't work right for an NRT reader. Right now, it will always return "true" because it's sharing the segmentInfos in use by the writer. Similarly, getVersion will lie. I'll open an issue to track how to fix it. Mike On Mon, Oct 12, 2009 at 6:12 PM, Yonik Seeley wrote: > Go

Re: Realtime search best practices

2009-10-13 Thread Ganesh
performance? Is warming necessarily required in 2.9? Is warming only the very first time not enough? Do we need to do it on every request? Regards Ganesh - Original Message - From: "Yonik Seeley" To: Sent: Tuesday, October 13, 2009 3:42 AM Subject: Re: Realtime s

Re: Realtime search best practices

2009-10-12 Thread Yonik Seeley
Good point on isCurrent - I think it should only be with respect to the latest index commit point? and we should clarify that in the javadoc. [...] > // but what does the nrtReader say? > // it does not have access to the most recent commit > // state, as there's been a commit (with documents) > /

Re: Realtime search best practices

2009-10-12 Thread melix
> However, it's still a bit unclear how to do it efficiently. > Is the following implementation a good way to achieve it? The context is concurrent read/writes on an index:

Re: Realtime search best practices

2009-10-12 Thread Jake Mannix
melix wrote: > Hi, > I'm going to replace an old reader/writer synchronization mechanism we had implemented with the

Re: Realtime search best practices

2009-10-12 Thread John Wang
I think it was my email Yonik responded to and he is right, I was being lazy and didn't read the javadoc very carefully. My bad. Thanks for the javadoc change. -John On Mon, Oct 12, 2009 at 1:57 PM, Yonik Seeley wrote: > On Mon, Oct 12, 2009 at 4:35 PM, Jake Mannix wrote: > > It may be surpri

Re: Realtime search best practices

2009-10-12 Thread Jake Mannix
On Mon, Oct 12, 2009 at 1:57 PM, Yonik Seeley wrote: > On Mon, Oct 12, 2009 at 4:35 PM, Jake Mannix > wrote: > > It may be surprising, but in fact I have read that > > javadoc. > > It was not your email I responded to. > Sorry, my bad then - you said "guys" and John and I were the last two to b

Re: Realtime search best practices

2009-10-12 Thread Michael McCandless
then it is necessary to first call IndexWriter.commit. >> Mike >> On Mon, Oct 12, 2009 at 5:24 AM, melix

Re: Realtime search best practices

2009-10-12 Thread Jake Mannix
> I'm going to replace an old reader/writer synchronization mechanism we had implemented with the new near realtime search facilities in Lucene 2.9.

Re: Realtime search best practices

2009-10-12 Thread Yonik Seeley
On Mon, Oct 12, 2009 at 4:35 PM, Jake Mannix wrote: > It may be surprising, but in fact I have read that javadoc. It was not your email I responded to. > It talks about not needing to close the writer, but doesn't specifically talk about what the relationship between commit() calls a

Re: Realtime search best practices

2009-10-12 Thread Michael McCandless
> However, it's still a bit unclear how to do it efficiently. > Is the following implementation a good way to achieve it? The context is concurrent read/writes on an index:

Re: Realtime search best practices

2009-10-12 Thread Jason Rutherglen
and cache a searcher for some seconds, but I'm not sure it's the best thing to do. > Thanks, > Cedric

Re: Realtime search best practices

2009-10-12 Thread Jake Mannix
> 2. create a writer on this directory > 3. on each write request, add document to the writer > 4. on each read request, > a. use writer.getReader() to obtain an up-to-date reader > b. create an IndexSearcher with that reader

Re: Realtime search best practices

2009-10-12 Thread Yonik Seeley
getReader() to obtain an up-to-date reader > b. create an IndexSearcher with that reader > c. perform Query > d. close IndexSearcher > 5. on application close > a. close writer > b. close directory

Re: Realtime search best practices

2009-10-12 Thread John Wang
each request. I could introduce some kind of delay and cache a searcher for some seconds, but I'm not sure it's the best thing to do. > Thanks, > Cedric

Re: Realtime search best practices

2009-10-12 Thread Jake Mannix
On Mon, Oct 12, 2009 at 12:26 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > On Mon, Oct 12, 2009 at 3:17 PM, Jake Mannix > wrote: > > > Wait, so according to the javadocs, the IndexReader which you got from > > the IndexWriter forwards calls to reopen() back to > IndexWriter.getRea

Re: Realtime search best practices

2009-10-12 Thread Michael McCandless
On Mon, Oct 12, 2009 at 3:17 PM, Jake Mannix wrote: > Wait, so according to the javadocs, the IndexReader which you got from > the IndexWriter forwards calls to reopen() back to IndexWriter.getReader(), > which means that if the user has a NRT reader, and the user keeps calling > reopen() on it,

Re: Realtime search best practices

2009-10-12 Thread Jake Mannix
application close > a. close writer > b. close directory > While this seems to be ok, I'm really wondering about the performance of opening a searcher for each request. I could introduce some kind of delay and cache a searcher for some seconds, b

Re: Realtime search best practices

2009-10-12 Thread Michael McCandless
be ok, I'm really wondering about the performance of opening a searcher for each request. I could introduce some kind of delay and cache a searcher for some seconds, but I'm not sure it's the best thing to do. > Thanks, > Cedric

Re: Realtime search best practices

2009-10-12 Thread Jake Mannix
not sure it's the best thing to do. > Thanks, > Cedric

Realtime search best practices

2009-10-12 Thread melix
I'm not sure it's the best thing to do. Thanks, Cedric -- View this message in context: http://www.nabble.com/Realtime-search-best-practices-tp25852756p25852756.html Sent from the Lucene - Java Users mailing list archive at Nabble.com.
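To summarize the pattern discussed in this thread, a minimal sketch against the Lucene 2.9-era API (IndexWriter.getReader() and IndexReader.reopen()); the directory, analyzer and query are placeholders, and later releases express the same idea with DirectoryReader.open(writer) or SearcherManager.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Version;

public class NrtSketch {
    private final IndexWriter writer;
    private volatile IndexReader reader;

    NrtSketch(Directory dir) throws Exception {
        writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_29),
                                 IndexWriter.MaxFieldLength.UNLIMITED);
        reader = writer.getReader();   // near-real-time reader, no commit needed
    }

    void index(Document doc) throws Exception {
        writer.addDocument(doc);       // becomes visible to readers after the next reopen
    }

    TopDocs search(Query query, int n) throws Exception {
        IndexReader newReader = reader.reopen();   // cheap if nothing has changed
        if (newReader != reader) {
            // Production code needs reference counting (or SearcherManager in later
            // versions) so in-flight searches don't lose their reader on this close
            reader.close();
            reader = newReader;
        }
        return new IndexSearcher(reader).search(query, n);
    }
}
```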

RE: best practices for reloading an index for a searcher

2007-12-06 Thread Beyer,Nathan
ailto:[EMAIL PROTECTED] Sent: Thursday, December 06, 2007 12:10 PM To: java-user@lucene.apache.org Subject: Re: best practices for reloading an index for a searcher If by reload you mean closing and opening the reader, then yes. You need to do this in order to see the changes since the *last* tim

Re: best practices for reloading an index for a searcher

2007-12-06 Thread Erick Erickson
If by reload you mean closing and opening the reader, then yes. You need to do this in order to see the changes since the *last* time you opened the reader. Think of it as the reader taking a snapshot of the index and using that for its lifetime. Be aware that opening a reader (and running the fi

best practices for reloading an index for a searcher

2007-12-06 Thread Beyer,Nathan
I did some searching on the Lucene site and wiki, but didn't quite find what I was looking for regarding a basic approach to how and when to reload index data. I have a long-running process that will be continually indexing and concurrently searching the same index, and I'm looking for a basic a
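As a follow-up for readers finding this thread today: Erick's advice above (close and reopen the reader to see changes) is now usually expressed with DirectoryReader.openIfChanged, which returns null when the index has not changed. A small sketch, with the swap left deliberately simplistic (no reference counting):

```java
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;

public class ReloadSketch {
    private DirectoryReader reader;
    private IndexSearcher searcher;

    // Call periodically (e.g. from a timer), or after known index updates
    synchronized void maybeReload() throws Exception {
        DirectoryReader newReader = DirectoryReader.openIfChanged(reader);
        if (newReader != null) {           // null means nothing changed since the last open
            DirectoryReader old = reader;
            reader = newReader;
            searcher = new IndexSearcher(newReader);
            old.close();                   // real code must wait for in-flight searches to finish
        }
    }
}
```

SearcherManager wraps this same pattern, including the reference counting that the sketch omits.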

RE: best practices

2006-01-17 Thread Chris Hostetter
e.org : To: java-user@lucene.apache.org : Subject: RE: best practices : : If that's it, that's fine. I guess I had in mind something else? For : example, one of mine uses categories (something mentioned quite a bit), : but it has some slight differences from what I've seen before. Items

RE: best practices

2006-01-17 Thread John Powers
Like I said, if this wiki is it, perfect! Maybe it is what I was thinking of. -Original Message- From: Pasha Bizhan [mailto:[EMAIL PROTECTED] Sent: Tuesday, January 17, 2006 11:38 AM To: java-user@lucene.apache.org Subject: RE: best practices Hi, > -Original Message-

RE: best practices

2006-01-17 Thread Pasha Bizhan
Hi, > -Original Message- > From: John Powers [mailto:[EMAIL PROTECTED] > > Is there any repository of best practices? Does LIA represent that? > I was thinking about a blog or something that everyone could > post their solutions into. I think http://wiki.apache.

best practices

2006-01-17 Thread John Powers
Hi, Is there any repository of best practices? Does LIA represent that? I was thinking about a blog or something that everyone could post their solutions into. I've written only 4 implementations of Lucene, but each was so very different, I was thinking it might be nice to know what eve

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-07-18 Thread Chris Lamprecht
See the paper at: http://labs.google.com/papers/mapreduce.html "MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a re

RE: Best Practices for Distributing Lucene Indexing and Searching

2005-07-18 Thread Peter Gelderbloem
I am thinking of having a cluster of one indexer and a few searchers 1 to n. The indexer will consist of a number of stages as defined in SEDA. I must still do this decomposition. The resulting index will be published via message queue to the searchers, which will stop doing searches long enough to upda

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-07-15 Thread Andrzej Bialecki
Paul Smith wrote: I'm not sure how generic or Nutch-specific Doug and Mike's MapReduce code is in Nutch, I haven't been paying close enough attention. Me too.. :) I didn't even know Nutch was now fully in the ASF, and I'm a Member... :-$ Let me pipe in on behalf of the Nutch project... T

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-07-14 Thread Paul Smith
On 15/07/2005, at 3:57 PM, Otis Gospodnetic wrote: The problem that I saw (from your email only) with the "ship the full little index to the Queen" approach is that, from what I understand, you eventually do addIndexes(Directory[]) in there, and as this optimizes things in the end, this means y

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-07-14 Thread Otis Gospodnetic
>>> an insignificant time. You also have to use bookkeeping to work out if a 'job' has not been completed in time (maybe failure by the worker) and decide whether the job should be resubmi

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-07-14 Thread Erik Hatcher
On Jul 14, 2005, at 9:45 PM, Paul Smith wrote: Cl, I should go have a look at that.. That begs another question though, where does Nutch stand in terms of the ASF? Did I read (or dream) that Nutch may be coming in under ASF? I guess I should get myself subscribed to the Nutch mailing

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-07-14 Thread Paul Smith
lucene system based on this architecture? Any advice would be greatly appreciated. Peter Gelderbloem Registered in England 3186704 -Original Message- From: Luke Francl [mailto:[EMAIL PROTECTED] Sent: 13 May 2005 22:04 To: java-user@lucene.apache.org Subject: Re: Best Practices

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-07-14 Thread Paul Smith
On Tue, 2005-03-01 at 19:23, Chris Hostetter wrote: I don't really consider reading/writing to an NF

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-07-14 Thread Erik Hatcher
edu/~mdw/proj/seda/ I am just reading up on it now. Does anyone have experience building a Lucene system based on this architecture? Any advice would be greatly appreciated. Peter Gelderbloem

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-07-14 Thread Paul Smith
w/proj/seda/ I am just reading up on it now. Does anyone have experience building a Lucene system based on this architecture? Any advice would be greatly appreciated. Peter Gelderbloem

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-07-14 Thread Otis Gospodnetic
architecture? Any advice would be greatly appreciated. > Peter Gelderbloem

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-07-14 Thread Paul Smith
On Tue, 2005-03-01 at 19:23, Chris Hostetter wrote: I don't really consider reading/writing

RE: Best Practices for Distributing Lucene Indexing and Searching

2005-07-14 Thread Peter Gelderbloem
Gelderbloem Registered in England 3186704 -Original Message- From: Luke Francl [mailto:[EMAIL PROTECTED] Sent: 13 May 2005 22:04 To: java-user@lucene.apache.org Subject: Re: Best Practices for Distributing Lucene Indexing and Searching On Tue, 2005-03-01 at 19:23, Chris Hostetter wrote

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-05-13 Thread Luke Francl
On Tue, 2005-03-01 at 19:23, Chris Hostetter wrote: > I don't really consider reading/writing to an NFS mounted FSDirectory to be viable for the very reasons you listed; but I haven't really found any evidence of problems if you take the approach that a single "writer" node indexes to local

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-03-09 Thread Doug Cutting
Yonik Seeley wrote: I'm trying to support an interface where documents can be added one at a time at a high rate (via HTTP POST). You don't know all of the documents ahead of time, so you can't delete them all ahead of time. A simple solution is to queue documents as they're posted. When either
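A rough sketch of the queueing idea Doug describes (not his code): posted documents go onto a bounded queue and a single background thread drains them into the IndexWriter in batches. The class and field names, batch size and flush policy are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

public class QueuedIndexerSketch implements Runnable {
    private final BlockingQueue<Document> queue = new LinkedBlockingQueue<>(10_000);
    private final IndexWriter writer;

    QueuedIndexerSketch(IndexWriter writer) {
        this.writer = writer;
    }

    // Called from the HTTP POST handler; blocks if the indexer falls behind
    public void post(Document doc) throws InterruptedException {
        queue.put(doc);
    }

    @Override
    public void run() {
        List<Document> batch = new ArrayList<>();
        try {
            while (!Thread.currentThread().isInterrupted()) {
                Document first = queue.poll(1, TimeUnit.SECONDS);
                if (first == null) {
                    continue;                 // nothing arrived within the wait window
                }
                batch.add(first);
                queue.drainTo(batch, 999);    // take up to 1000 documents per batch
                for (Document doc : batch) {
                    writer.addDocument(doc);
                }
                writer.commit();              // make the whole batch visible to new readers
                batch.clear();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```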

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-03-09 Thread Yonik Seeley
I'm trying to support an interface where documents can be added one at a time at a high rate (via HTTP POST). You don't know all of the documents ahead of time, so you can't delete them all ahead of time. Given this constraint, it seems like you can do one of two things: 1) collect all the docume

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-03-09 Thread Doug Cutting
Yonik Seeley wrote: This strategy looks very promising. One drawback is that documents must be added directly to the main index for this to be efficient. This is a bit of a problem if there is a document uniqueness requirement (a unique id field). This is easy to do with a single index. Here's th

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-03-09 Thread Yonik Seeley
This strategy looks very promising. One drawback is that documents must be added directly to the main index for this to be efficient. This is a bit of a problem if there is a document uniqueness requirement (a unique id field). If one takes the approach of adding docs to a separate lucene index
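For the single-index case Doug refers to, a tiny sketch of enforcing uniqueness with IndexWriter.updateDocument, which deletes any previous document carrying the same id term before adding the new one. The "id" and "body" field names are illustrative and the field classes are from the post-4.0 API.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class UniqueIdSketch {
    static void addOrReplace(IndexWriter writer, String id, String body) throws Exception {
        Document doc = new Document();
        doc.add(new StringField("id", id, Field.Store.YES));   // untokenized unique key
        doc.add(new TextField("body", body, Field.Store.NO));  // analyzed content

        // Deletes any existing document with this id term, then adds the new one
        writer.updateDocument(new Term("id", id), doc);
    }
}
```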