Hey!
Consider a bunch of documents that represent, say, students. These students
have the following attributes:
1) Student ID
2) Name
3) Self-description (optional)
So, all documents have id: and name:, but only some of the documents have an
added desc:
Assuming all of the fields are indexed,
So, it appears to me that the criterion for a "good suggestion" is the n-gram
overlap with a given term, not the edit distance.
Thus, if we're looking for "britney", but we mess up and type "birtney",
"kortney" will come up before "britney."
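To make the difference concrete, here is a minimal sketch in plain Java. It uses no Lucene classes, the class and method names are made up for illustration, and it is not meant to show how SpellChecker scores suggestions internally. It just counts shared character bigrams and computes a plain Levenshtein distance for the words above:

```java
import java.util.HashSet;
import java.util.Set;

class NgramVsEditDistance {

    // Collect the distinct character bigrams of a word.
    static Set<String> bigrams(String word) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 2 <= word.length(); i++) {
            grams.add(word.substring(i, i + 2));
        }
        return grams;
    }

    // Number of distinct bigrams the two words have in common.
    static int sharedBigrams(String a, String b) {
        Set<String> common = bigrams(a);
        common.retainAll(bigrams(b));
        return common.size();
    }

    // Plain Levenshtein distance (insert, delete, substitute), two-row DP.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String typed = "birtney";
        for (String candidate : new String[] {"britney", "kortney"}) {
            System.out.println(candidate
                + ": shared bigrams = " + sharedBigrams(typed, candidate)
                + ", edit distance = " + levenshtein(typed, candidate));
        }
    }
}
```

For "birtney", this gives 3 shared bigrams with "britney" and 4 with "kortney", so a ranking based purely on bigram overlap prefers "kortney". Plain Levenshtein counts the swapped "ir"/"ri" as two substitutions, so both candidates come out at distance 2; only a distance that treats a transposition as a single edit would rank "britney" strictly closer.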
Is there a way to force the SpellChecker to use the edit
I was wondering whether the Lucene SpellChecker class is thread-safe,
specifically indexDictionary(), such that the following would work:
for (int i = 0; i < numReaders; i++) {
    // spawn a new thread to run:
    spellchecker.indexDictionary(new LuceneDictionary(readers[i], myField));
}
Thanks,
Matt
Mmmkay. I think I'll wait, then.
Thank you so much for your help. I really appreciate it.
Also, I really dig Lucene, so thanks for your hard work!
-Matt
Michael McCandless-2 wrote:
>
>
> mattspitz wrote:
>
>> Is there no way to ensure consistency
I'm using an "unfinished" version
of Lucene. Is there a rough date for 2.4's release? I poked around the
website and couldn't find one.
Thanks,
Matt
Michael McCandless-2 wrote:
>
>
> mattspitz wrote:
>
>> Are the index files synced on writer.close()?
>
>
merging? I
don't really have a sense of which parts of the segments are kept in memory during
a merge. It doesn't make sense to me that Lucene would pull all of the
segments into memory to merge them, but I don't really know how it works.
Thank you so much,
Matt
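As a general illustration of why merging sorted data does not have to load everything at once, here is a minimal k-way merge sketch in plain Java. It is an assumption-laden toy, not Lucene's actual merge code: the class name, the Cursor helper, and the use of in-memory lists as stand-ins for on-disk segments are all made up for the example. The point is only that the merge holds one "head" entry per input at a time:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

class StreamingMerge {

    // One cursor per sorted input: its current head plus the rest of the stream.
    static final class Cursor {
        final String head;
        final Iterator<String> rest;

        Cursor(String head, Iterator<String> rest) {
            this.head = head;
            this.rest = rest;
        }
    }

    // Merge already-sorted inputs while holding one head element per input.
    static List<String> merge(List<Iterator<String>> sortedInputs) {
        PriorityQueue<Cursor> queue =
            new PriorityQueue<>((x, y) -> x.head.compareTo(y.head));
        for (Iterator<String> in : sortedInputs) {
            if (in.hasNext()) {
                queue.add(new Cursor(in.next(), in));
            }
        }
        // Stand-in for streaming each entry out to the merged output.
        List<String> out = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor smallest = queue.poll();
            out.add(smallest.head);
            if (smallest.rest.hasNext()) {
                queue.add(new Cursor(smallest.rest.next(), smallest.rest));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical "segments": each is a sorted list of terms.
        List<Iterator<String>> segments = Arrays.asList(
            Arrays.asList("apple", "lucene").iterator(),
            Arrays.asList("index", "merge", "term").iterator(),
            Arrays.asList("segment").iterator());
        System.out.println(merge(segments));
    }
}
```

With the three sample inputs, this prints [apple, index, lucene, merge, segment, term]. In a real setting the output would be written to disk as it is produced rather than collected in a list, so memory use would scale with the number of inputs, not their total size.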
Michael McCandless-2 wrote:
>
>
s what your maxBufferedSize
> setting is. If it's too low you will see lots of IO. Increasing it means
> less IO, but more JVM heap needed. Is your disk IO caused by searches or
> indexing only?
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
Hi! I'm using Lucene 2.3.2 to store a relatively large index of HTML
documents. I'm storing ~150 million documents, taking up 150 GB of space.
I index the HTML text, but I only store primary key information that allows
me to retrieve it later. Thus, my document size is small, but obviously, I