Hi,
I'm using Lucene's SpellChecker (Lucene 2.1.0) class to get suggestions.
Until now my testing server was a VMware image from http://es.cohesiveft.com
(Ubuntu 8.10, Tomcat6, Java5).
Now I'm using a Debian Etch Server with Tomcat5.5 and Java6.
Code-Sample:
String index
Michael,
The change from BitSet to DocIdSetIterator implies that you'll
need to choose an underlying data structure yourself.
A minimal approach would be to use DocIdBitSet around
BitSet, but there are better ways.
For your application you might consider replacing Java's BitSet with
Lucene's Open
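The access pattern behind this advice can be sketched with java.util.BitSet alone. This is a stdlib-only illustration, not Lucene API: the class and method names (BitSetPostFilter, docIds, postFilter) are hypothetical, and in Lucene 2.4 the BitSet would instead be wrapped in DocIdBitSet as the message suggests. It only shows the two operations involved: walking the set bits in ascending order, as a doc id iterator would, and keeping the query hits whose bit is set.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical helper, not part of Lucene: shows the access pattern only.
public class BitSetPostFilter {

    // Walk the set bits in ascending order, one doc id at a time.
    static List<Integer> docIds(BitSet bits) {
        List<Integer> ids = new ArrayList<Integer>();
        for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
            ids.add(doc);
        }
        return ids;
    }

    // Post-query filtering: keep only the hits whose bit is set.
    static List<Integer> postFilter(int[] hits, BitSet allowed) {
        List<Integer> kept = new ArrayList<Integer>();
        for (int doc : hits) {
            if (allowed.get(doc)) {
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        BitSet allowed = new BitSet();
        allowed.set(1);
        allowed.set(4);
        allowed.set(7);
        System.out.println(docIds(allowed));                                // [1, 4, 7]
        System.out.println(postFilter(new int[] {0, 1, 2, 4, 9}, allowed)); // [1, 4]
    }
}
```

Random access (allowed.get) is what post-query filtering needs, while an iterator only offers the ascending walk, which is presumably why the choice of underlying data structure matters here.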
Hi all,
I'm working on upgrading to Lucene 2.4.0 from 2.3.2 and was trying to
integrate the new DocIdSet changes, since the o.a.l.search.Filter#bits() method
is now deprecated. For our app we rely heavily on bits from the
Filter to do post-query filtering (I explain why below).
For example,
IR is information retrieval, which I introduced, not Rob. I'm sure
there are patterns that could be abstracted; I just don't know that
anyone has formally done so, the way the Gang of Four did.
On Dec 8, 2008, at 9:07 AM, Erick Erickson wrote:
This still doesn't tell us why you care. N
Yes, I've also seen that syntax used to search for null values. You can do
-(reporter:* AND -reporter:[* to *]) which says all values minus docs with a
value.
Your suggestion did the trick, thanks!
On Mon, Dec 8, 2008 at 11:40 AM, Erick Erickson <[EMAIL PROTECTED]> wrote:
> That'll teach me to scan
IndexWriter.close() does a commit.
Otherwise you will (in 3.0) need to do it by hand.
Mike
Laurent Mimoun wrote:
Michael McCandless-2 wrote:
So you should use commit sparingly, and, open your IndexWriter with
autoCommit=false.
Thank you for your response.
But I would be astonished
Michael McCandless-2 wrote:
>
>
> So you should use commit sparingly, and, open your IndexWriter with
> autoCommit=false.
>
Thank you for your response.
But I would be astonished that the Lucene API provides no code to do the
job of committing modifications regularly: do I really hav
I got that query by calling
new MatchAllDocsQuery().toString(). I thought the "matchalldocsquery" part
was a bit odd but figured it might be a known keyword in Lucene.
Thanks for the help!
On Mon, Dec 8, 2008 at 11:40 AM, Erick Erickson <[EMAIL PROTECTED]> wrote:
> That'll teach me t
That'll teach me to scan e-mail. You can't use MatchAllDocsQuery
that way.
What you're actually searching for is the word "matchalldocsquery"
in the field "summary". Which returns nothing. Then you're subtracting
any documents with reporter *mark*. That isn't what you're after at all.
If you're do
Chris Bamford wrote:
Mark
> Look for the static factory methods on IndexReader.
I take it you mean IndexReader.open (dir, true) ?
Yeah.
If so, how do I then pass that into DelayCloseIndexSearcher() so that
I can continue to rely on all the existing calls like:
IndexReader reader = con
Yes that is set. It works if I do a query like this:
status:* -reporter:*mark*
The status field only has a few possible values.
On Mon, Dec 8, 2008 at 10:54 AM, Erick Erickson <[EMAIL PROTECTED]> wrote:
> Have you enabled leading wildcards? They are not (or at least weren't
> last I knew) enabl
It seems that indexing and searching do not work in the same way:
the "tokenStream" method is called at search time, while for indexing
"reusableTokenStream" is called.
Overriding reusableTokenStream (as I did for tokenStream) fixed the
problem.
--
View this message in context
I'm a great fan of not changing working code for a "might be
better sometime in the far future if lots of things change" ...
Erick
On Mon, Dec 8, 2008 at 10:54 AM, Donna L Gresh <[EMAIL PROTECTED]> wrote:
> Erick-
> Thanks for the pointer; in my app the difference is between 30
> milliseconds an
Erick-
Thanks for the pointer; in my app the difference is between 30
milliseconds and 45 milliseconds (and this is a once-a-day kind of thing),
but hey it's always worth doing something the better way in case my index
ever gets a whole lot bigger or the use case changes-- thanks.
Donna L. Gres
Have you enabled leading wildcards? They are not (or at least weren't
last I knew) enabled by default. See
http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-4d62118417eaef0dcb87f4370583f809848ea695
Best
Erick
On Mon, Dec 8, 2008 at 10:24 AM, no spam <[EMAIL PROTECTED]> wrote:
> T
The reason our users want to do this is because they want to search for
instances where certain negative conditions are true. My client is the news
industry and this is metadata for things like reporter, type, etc.
Sometimes you want -reporter:mark, for example, and this is the only criterion
to sea
Is empid indexed? If it is, this should run *much* faster if you use
TermEnum/TermDocs to fetch all the empids.
FWIW
Erick
On Mon, Dec 8, 2008 at 9:17 AM, Donna L Gresh <[EMAIL PROTECTED]> wrote:
> I have a need to get the list of all "empid"s (defined by me) in the index
> so that I can re
I have a need to get the list of all "empid"s (defined by me) in the index
so that I can remove the ones that are "stale" by my definition; in this
snippet I'm returning all the "empids" for later processing, but the core
is very simple.
public Vector getIndexIds() throws Exception {
Mark
> Look for the static factory methods on IndexReader.
I take it you mean IndexReader.open (dir, true) ?
If so, how do I then pass that into DelayCloseIndexSearcher() so that I
can continue to rely on all the existing calls like:
IndexReader reader = contentSearcher.getIndexReader();
This still doesn't tell us why you care. Nor have you explained
what IR stands for in your usage. Nor what you want Lucene
to do in that space. It's really hard to respond to such a vague
question usefully.
Best
Erick
On Mon, Dec 8, 2008 at 3:20 AM, Robert Young <[EMAIL PROTECTED]> wrote:
> I am
Your output says you couldn't find "ugli", but you indexed "ugly". I
assume that's just a typo, and the stemmer probably makes it moot
anyway.
I don't see anything obvious in the code, but here's what I'd suggest...
1> write this out to a FSDir rather than a RAMDir, get a copy of Luke
(goo
Thank you, very much.
On Thu, Dec 4, 2008 at 11:33 AM, Otis Gospodnetic <
[EMAIL PROTECTED]> wrote:
> There is CLucene. It's not a part of Apache, but lives on SourceForge,
> I think.
>
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message -
Look for the static factory methods on IndexReader.
- Mark
Chris Bamford wrote:
Thanks Mark.
I have identified the spot where I need to do the surgery. However, I
discover that IndexReader is abstract, but it seems crazy that I need
to make a concrete class for which I have no need to add a
Thanks Mark.
I have identified the spot where I need to do the surgery. However, I
discover that IndexReader is abstract, but it seems crazy that I need to
make a concrete class for which I have no need to add any of my own
logic... Is there a suitable subclass I can use? The documented one
Chris Bamford wrote:
So does that mean if you don't explicitly open an IndexReader, the
IndexSearcher will do it for you? Or what?
Right. The IndexReader takes a Directory, and the IndexSearcher takes an
IndexReader - there are sugar constructors though - An IndexSearcher
will also accept
Ian Vink wrote:
Is there a way to get phrases counted in the list of fragments that come
back from Highlighter.GetBestFragments() in general?
It seems to only take words into account.
Ian
Not sure I fully understand, but have you tried the SpanScorer? It
allows the Highlighter to work with
Hi
Can someone guide me please?
I have inherited a Lucene application and am attempting to update the
API from 2.0 to 2.4.
I note that the 2.4 CHANGELOG talks of opening an IndexReader with
read-only=true to improve performance. Does anyone know how to do this?
I have been combing my predeces
Flushing is still done "synchronously" with an addDocument call. The
time spent is in proportion to how large the RAM buffer is, and, how
fast your IO system accepts writes.
So, you'll be happily adding documents, until IW decides a flush is
needed, and then it will flush (blocking) usin
It is interesting, and I think it will help us :)
Thanks!
buFka
--
View this message in context:
http://www.nabble.com/Improving-Indexing-Performance-tp20890720p20891965.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
--
Hi buFka,
take a look at
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
For example, your code does not set mergeFactor or RAMBufferSizeMB.
I also like the last tip: "Run a Java profiler"
Because in my case, the performance problem vanished after I switched from
JDOM to Saxon.
(we are indexi
Hi all,
I can already index a very large database (8.0 million entries) with Lucene.
For indexing and search, I'm using the following example:
http://kalanir.blogspot.com/2008/06/indexing-database-using-apache-lucene.html
The indexing takes about 4 hours. Can I speed up this process?
--
View th
I am not trying to solve a specific problem right now, I'm just looking for
a set of patterns for solving common problems in text processing and IR.
Things like token sources and filters, query parsing, index distribution.
Cheers
Rob
On Mon, Dec 8, 2008 at 2:49 AM, Grant Ingersoll <[EMAIL PROTECT