Lucene SpellChecker returns no suggestions after changing Server

2008-12-08 Thread Matthias W.
Hi, I'm using Lucene's SpellChecker (Lucene 2.1.0) class to get suggestions. Until now my testing server was a VMWare-Image from http://es.cohesiveft.com (Ubuntu 8.10, Tomcat6, Java5). Now I'm using a Debian Etch Server with Tomcat5.5 and Java6. Code-Sample: String index

Re: Issue upgrading from lucene 2.3.2 to 2.4 (moving from bitset to docidset)

2008-12-08 Thread Paul Elschot
Michael, The change from BitSet to DocIdSetIterator implies that you'll need to choose an underlying data structure yourself. A minimal approach would be to use DocIdBitSet around BitSet, but there are better ways. For your application you might consider replacing Java's BitSet with Lucene's Open
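
As a rough illustration of the "minimal approach" mentioned above, the sketch below walks a java.util.BitSet as a sequence of ascending document ids. It uses only the JDK. The class name BitSetDocIdIterator and its methods are hypothetical stand-ins, loosely modeled on a next()/doc()/skipTo() style of doc-id iterator. It is not Lucene's DocIdBitSet.

```java
import java.util.BitSet;

/** Hypothetical stand-in for a doc-id iterator backed by java.util.BitSet. */
final class BitSetDocIdIterator {
    private final BitSet bits;
    private int doc = -1; // no current document until next() or skipTo() succeeds

    BitSetDocIdIterator(BitSet bits) {
        this.bits = bits;
    }

    /** Current document id; only meaningful after next() or skipTo() returned true. */
    int doc() {
        return doc;
    }

    /** Advances to the next set bit; returns false when there are no more. */
    boolean next() {
        return moveTo(doc + 1);
    }

    /** Advances to the first set bit at or after target, never moving backwards. */
    boolean skipTo(int target) {
        return moveTo(Math.max(target, doc + 1));
    }

    private boolean moveTo(int from) {
        int found = bits.nextSetBit(from);
        if (found < 0) {
            return false; // exhausted; doc keeps its last valid value
        }
        doc = found;
        return true;
    }
}
```

Post-query filtering code that used to test individual bits could instead step through such an iterator. Whether that is a good fit depends on how sparse the set is, which is presumably why the reply hints at better underlying structures.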

Issue upgrading from lucene 2.3.2 to 2.4 (moving from bitset to docidset)

2008-12-08 Thread Michael Stoppelman
Hi all, I'm working on upgrading to Lucene 2.4.0 from 2.3.2 and was trying to integrate the new DocIdSet changes since the o.a.l.search.Filter#bits() method is now deprecated. For our app we actually heavily rely on bits from the Filter to do post-query filtering (I explain why below). For example,

Re: IR Pattern Language

2008-12-08 Thread Grant Ingersoll
IR is information retrieval, which I introduced, not Rob. I'm sure there are patterns that could be abstracted, I just don't know that anyone has formally done them, say like the Gang of Four did. On Dec 8, 2008, at 9:07 AM, Erick Erickson wrote: This still doesn't tell us why you care. N

Re: lucene search options

2008-12-08 Thread no spam
Yes, I've also seen that syntax used to search for null values. You can do -(reporter:* AND -reporter:[* to *]) which says all values minus docs with a value. Your suggestion did the trick, thanks! On Mon, Dec 8, 2008 at 11:40 AM, Erick Erickson <[EMAIL PROTECTED]> wrote: > That'll teach me to scan

Re: IndexWriter.flush performance

2008-12-08 Thread Michael McCandless
IndexWriter.close() does a commit. Otherwise you will (in 3.0) need to do it by hand. Mike Laurent Mimoun wrote: Michael McCandless-2 wrote: So you should use commit sparingly, and, open your IndexWriter with autoCommit=false. Thank you for your response. But I would be astonished

Re: IndexWriter.flush performance

2008-12-08 Thread Laurent Mimoun
Michael McCandless-2 wrote: > > > So you should use commit sparingly, and, open your IndexWriter with > autoCommit=false. > Thank you for your response. But I would be astonished that no code is provided in the Lucene API to do the job of regularly committing modifications: do I really hav

Re: lucene search options

2008-12-08 Thread no spam
The way I got that query was by doing: new MatchAllDocsQuery().toString(). I thought the "matchalldocsquery" part was a bit odd but figured it might be a known keyword in Lucene. Thanks for the help! On Mon, Dec 8, 2008 at 11:40 AM, Erick Erickson <[EMAIL PROTECTED]> wrote: > That'll teach me t

Re: lucene search options

2008-12-08 Thread Erick Erickson
That'll teach me to scan e-mail. You can't use MatchAllDocsQuery that way. What you're actually searching for is the word "matchalldocsquery" in the field "summary". Which returns nothing. Then you're subtracting any documents with reporter *mark*. That isn't what you're after at all. If you're do

Re: Open IndexReader read-only

2008-12-08 Thread Mark Miller
Chris Bamford wrote: Mark > Look for the static factory methods on IndexReader. I take it you mean IndexReader.open (dir, true) ? Yeah. If so, how do I then pass that into DelayCloseIndexSearcher() so that I can continue to rely on all the existing calls like: IndexReader reader = con

Re: lucene search options

2008-12-08 Thread no spam
Yes, that is set. It works if I do a query like this: status:* -reporter:*mark* The status field only has a few possible values. On Mon, Dec 8, 2008 at 10:54 AM, Erick Erickson <[EMAIL PROTECTED]> wrote: > Have you enabled leading wildcards? They are not (or at least weren't > last I knew) enabl

RE: Indexing accented characters, then searching by any form

2008-12-08 Thread Dora
It seems that the index and search processes do not work in the same way: the "tokenStream" method is called at search time, while for indexing "reusableTokenStream" is called. Overriding reusableTokenStream (like I did for tokenStream) fixed the problem. -- View this message in context

Re: TopDocs - Get all docs?

2008-12-08 Thread Erick Erickson
I'm a great fan of not changing working code for a "might be better sometime in the far future if lots of things change" ... Erick On Mon, Dec 8, 2008 at 10:54 AM, Donna L Gresh <[EMAIL PROTECTED]> wrote: > Erick- > Thanks for the pointer; in my app the difference is between 30 > milliseconds an

Re: TopDocs - Get all docs?

2008-12-08 Thread Donna L Gresh
Erick- Thanks for the pointer; in my app the difference is between 30 milliseconds and 45 milliseconds (and this is a once-a-day kind of thing), but hey it's always worth doing something the better way in case my index ever gets a whole lot bigger or the use case changes-- thanks. Donna L. Gres

Re: lucene search options

2008-12-08 Thread Erick Erickson
Have you enabled leading wildcards? They are not (or at least weren't last I knew) enabled by default. From http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-4d62118417eaef0dcb87f4370583f809848ea695 Best Erick On Mon, Dec 8, 2008 at 10:24 AM, no spam <[EMAIL PROTECTED]> wrote: > T

Re: lucene search options

2008-12-08 Thread no spam
The reason our users want to do this is because they want to search for instances where certain negative conditions are true. My client is the news industry and this is metadata for things like reporter, type, etc. Sometimes you want -reporter:mark for example and this is the only criteria to sea

Re: TopDocs - Get all docs?

2008-12-08 Thread Erick Erickson
Is empid indexed? If it is, this should run *much* faster if you use TermEnum/TermDocs to fetch all the empids. FWIW Erick On Mon, Dec 8, 2008 at 9:17 AM, Donna L Gresh <[EMAIL PROTECTED]> wrote: > I have a need to get the list of all "empid"s (defined by me) in the index > so that I can re
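
The idea behind this suggestion can be illustrated without Lucene: when a field is indexed, its distinct values already sit in a sorted term dictionary, so listing them means walking the terms rather than loading every document. The sketch below is a toy model built on JDK collections only. ToyIndex, addDocument, distinctValues and docsFor are hypothetical names chosen for illustration, not the TermEnum/TermDocs API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy inverted index: field -> sorted term -> ids of documents containing that term. */
final class ToyIndex {
    private final Map<String, TreeMap<String, List<Integer>>> postings =
            new TreeMap<String, TreeMap<String, List<Integer>>>();
    private int nextDocId = 0;

    /** Indexes one document given as field/value pairs; returns its document id. */
    int addDocument(Map<String, String> fields) {
        int docId = nextDocId++;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            TreeMap<String, List<Integer>> terms = postings.get(e.getKey());
            if (terms == null) {
                terms = new TreeMap<String, List<Integer>>();
                postings.put(e.getKey(), terms);
            }
            List<Integer> docs = terms.get(e.getValue());
            if (docs == null) {
                docs = new ArrayList<Integer>();
                terms.put(e.getValue(), docs);
            }
            docs.add(docId);
        }
        return docId;
    }

    /** Walks the term dictionary of one field; no document is loaded. */
    List<String> distinctValues(String field) {
        TreeMap<String, List<Integer>> terms = postings.get(field);
        if (terms == null) {
            return new ArrayList<String>();
        }
        return new ArrayList<String>(terms.keySet());
    }

    /** Ids of the documents that contain the given term in the given field. */
    List<Integer> docsFor(String field, String term) {
        TreeMap<String, List<Integer>> terms = postings.get(field);
        if (terms == null || !terms.containsKey(term)) {
            return new ArrayList<Integer>();
        }
        return new ArrayList<Integer>(terms.get(term));
    }
}
```

In this model the cost of distinctValues grows with the number of distinct terms, not with the number of documents, which is the intuition behind enumerating terms instead of fetching every hit.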

Re: TopDocs - Get all docs?

2008-12-08 Thread Donna L Gresh
I have a need to get the list of all "empid"s (defined by me) in the index so that I can remove the ones that are "stale" by my definition; in this snippet I'm returning all the "empids" for later processing, but the core is very simple. public Vector getIndexIds() throws Exception {

Re: Open IndexReader read-only

2008-12-08 Thread Chris Bamford
Mark > Look for the static factory methods on IndexReader. I take it you mean IndexReader.open (dir, true) ? If so, how do I then pass that into DelayCloseIndexSearcher() so that I can continue to rely on all the existing calls like: IndexReader reader = contentSearcher.getIndexReader();

Re: IR Pattern Language

2008-12-08 Thread Erick Erickson
This still doesn't tell us why you care. Nor have you explained what IR stands for in your usage. Nor what you want Lucene to do in that space. It's really hard to respond to such a vague question usefully. Best Erick On Mon, Dec 8, 2008 at 3:20 AM, Robert Young <[EMAIL PROTECTED]> wrote: > I am

Re: Problem with PorterStemFilter

2008-12-08 Thread Erick Erickson
Your output says you couldn't find "ugli", but you indexed "ugly". I assume that's just a typo, and the stemmer probably makes it moot anyway. I don't see anything obvious in the code, but here's what I'd suggest... 1> Write this out to an FSDir rather than a RAMDir, get a copy of Luke (goo

Re: I would want to know more about the lucene implementation in C++

2008-12-08 Thread Ariel
Thank you, very much. On Thu, Dec 4, 2008 at 11:33 AM, Otis Gospodnetic < [EMAIL PROTECTED]> wrote: > There is CLucene. It's not a part of Apache, but lives on SourceForge, > I think. > > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message -

Re: Open IndexReader read-only

2008-12-08 Thread Mark Miller
Look for the static factory methods on IndexReader. - Mark Chris Bamford wrote: Thanks Mark. I have identified the spot where I need to do the surgery. However, I discover that IndexReader is abstract, but it seems crazy that I need to make a concrete class for which I have no need to add a

Re: Open IndexReader read-only

2008-12-08 Thread Chris Bamford
Thanks Mark. I have identified the spot where I need to do the surgery. However, I discover that IndexReader is abstract, but it seems crazy that I need to make a concrete class for which I have no need to add any of my own logic... Is there a suitable subclass I can use? The documented one

Re: Open IndexReader read-only

2008-12-08 Thread Mark Miller
Chris Bamford wrote: So does that mean if you don't explicitly open an IndexReader, the IndexSearcher will do it for you? Or what? Right. The IndexReader takes a Directory, and the IndexSearcher takes an IndexReader - there are sugar constructors though - An IndexSearcher will also accept

Re: Fragment Highlighter Phrase?

2008-12-08 Thread Mark Miller
Ian Vink wrote: Is there a way to get phrases counted in the list of fragments that come back from Highlighter.GetBestFragments() in general. It seems to only take words into account. Ian Not sure I fully understand, but have you tried the SpanScorer? It allows the Highlighter to work with

Open IndexReader read-only

2008-12-08 Thread Chris Bamford
Hi Can someone guide me please? I have inherited a Lucene application and am attempting to update the API from 2.0 to 2.4. I note that the 2.4 CHANGELOG talks of opening an IndexReader with read-only=true to improve performance. Does anyone know how to do this? I have been combing my predeces

Re: IndexWriter.flush performance

2008-12-08 Thread Michael McCandless
Flushing is still done "synchronously" with an addDocument call. The time spent is in proportion to how large the RAM buffer is and how fast your IO system accepts writes. So, you'll be happily adding documents, until IW decides a flush is needed, and then it will flush (blocking) usin
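
The behaviour described here, where most add calls are cheap and an occasional one blocks while the buffer is written out, can be modeled with a small toy class. This is only a sketch under stated assumptions: ToyBufferedWriter and its methods are hypothetical, string length stands in for RAM usage, and nothing here reflects how IndexWriter is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy writer: buffers documents in memory and flushes inside add() once the buffer is full. */
final class ToyBufferedWriter {
    private final long maxBufferedUnits;
    private final List<String> buffer = new ArrayList<String>();
    private long bufferedUnits = 0; // character count, a stand-in for RAM usage
    private long flushedUnits = 0;
    private int flushCount = 0;

    ToyBufferedWriter(long maxBufferedUnits) {
        this.maxBufferedUnits = maxBufferedUnits;
    }

    /** Adds a document; returns true if this particular call also had to pay for a flush. */
    boolean add(String doc) {
        buffer.add(doc);
        bufferedUnits += doc.length();
        if (bufferedUnits >= maxBufferedUnits) {
            flush(); // synchronous: the caller waits here
            return true;
        }
        return false;
    }

    private void flush() {
        // The work done is proportional to how much was buffered.
        flushedUnits += bufferedUnits;
        buffer.clear();
        bufferedUnits = 0;
        flushCount++;
    }

    int flushCount() {
        return flushCount;
    }

    long flushedUnits() {
        return flushedUnits;
    }

    long pendingUnits() {
        return bufferedUnits;
    }
}
```

In this model a larger buffer means fewer flushes, but each one takes longer, which matches the trade-off described in the message.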

Re: Improving Indexing Performance

2008-12-08 Thread buFka
It is interesting and I think it will help us :) Thanks! buFka -- View this message in context: http://www.nabble.com/Improving-Indexing-Performance-tp20890720p20891965.html Sent from the Lucene - Java Users mailing list archive at Nabble.com.

Re: Improving Indexing Performance

2008-12-08 Thread Karsten F.
Hi buFka, take a look at http://wiki.apache.org/lucene-java/ImproveIndexingSpeed e.g. your example does not set mergeFactor or RAMBufferSizeMB. I also like the last tip: "Run a Java profiler". Because in my case, the performance problem vanished after I switched from jdom to saxon. (we are indexi

Improving Indexing Performance

2008-12-08 Thread buFka
Hi all, I can already index a very large database (8.0 million entries) with Lucene. For indexing and search, I'm using the following example: http://kalanir.blogspot.com/2008/06/indexing-database-using-apache-lucene.html The indexing takes about 4 hours. Can I speed up this process? -- View th

Re: IR Pattern Language

2008-12-08 Thread Robert Young
I am not trying to solve a specific problem right now, I'm just looking for a set of patterns for solving common problems in text processing and IR. Things like token sources and filters, query parsing, index distribution. Cheers Rob On Mon, Dec 8, 2008 at 2:49 AM, Grant Ingersoll <[EMAIL PROTECT