I think the way I've seen it done most often is to either index some
bi-grams which contain stop words (so "the database" and "search the" are
in the index as individual tokens), or else to index that piece of content
twice - once with stop words removed (and stemming, if you use it), and
then agai
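For illustration, here is a rough sketch of the bi-gram idea from the message above in plain Java, without any Lucene classes. The class name, the method name, and the tiny stop word list are all made up for the example; a real setup would do this inside an analyzer chain instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopwordBigrams {
    // Hypothetical stop word list, for illustration only.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "of"));

    // Emits every non-stop word as a single token, plus a bi-gram token for
    // each adjacent pair of words in which at least one word is a stop word.
    static List<String> tokens(String text) {
        String[] words = text.toLowerCase().trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            if (!STOP_WORDS.contains(words[i])) {
                out.add(words[i]);
            }
            boolean hasNext = i + 1 < words.length;
            if (hasNext && (STOP_WORDS.contains(words[i])
                    || STOP_WORDS.contains(words[i + 1]))) {
                out.add(words[i] + " " + words[i + 1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [search, search the, the database, database]
        System.out.println(tokens("search the database"));
    }
}
```

With tokens like these, "search the database" and "search database" no longer produce the same set of terms, which is what gives the scorer something to tell the two documents apart.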
Don't throw away the stopwords? :-) Lucene can't score something it
doesn't know exists. I suppose you could try to get fancy w/ payloads
and add payloads if stopwords exist, but I am just thinking out loud
there.
On Mar 21, 2008, at 9:20 PM, Chris Lu wrote:
Let's say "the" is consider
Let's say "the" is considered a stopword, and for example the two documents are
document A, content: "... search the database..."
document B, content: "... search database..."
So when the user's input is "search the database", searching with the
query content:"search database"~1 can return both.
But is ther
On Saturday 22 March 2008 00:32:32, Paul Elschot wrote:
> Milu,
>
> This is a PHP problem, not a Lucene one, so you might get better
> response at a PHP mailing list.
>
> The easy way around your problem is probably by invoking a shell
> script from php that exports the class path as you indicated
Milu,
This is a PHP problem, not a Lucene one, so you might get better
response at a PHP mailing list.
The easy way around your problem is probably by invoking a shell
script from php that exports the class path as you indicated,
so that java can see the correct classes.
Having said that, you'll
Hello,
My machine runs Ubuntu 7.10. I am working with Apache Lucene. I have finished
the indexer and tried the command-line Searcher (the default command-line tool
included in the Lucene package: http://lucene.apache.org/java/2_3_1/demo2.html).
When I use this at the command line:
java Searcher -query algorithm
Thank you so much, Michael and Grant, for your suggestions. I haven't tried
SnapshotDeletionPolicy yet (thanks for the hint, I will do it now), and I guess
incremental backup may not work in my case, since I have periodic index
cleaning jobs in the RAMDirectory.
Thanks again,
Roger
- Original Me
Hi Uwe,
Could you tell me which Analyzer you used when you measured such a big
indexing speedup?
If you use StandardAnalyzer (which uses StandardTokenizer), that may be the
reason. You can see the second-to-last report in the thread "Indexing
Speed: 2.3 vs 2.2 (real world numbers)". According to the repor
I think both the original approach and addIndexes below will work
here, though the original approach should be faster.
But, there are some caveats. You have to make sure you do the backup
with the writer on the ramDir closed. If there is a writer open, it
could be changing files during
I think you could try:
IndexWriter writer = new IndexWriter(fileDirectory, ...)
writer.addIndexes(ramDir)
-Grant
On Mar 20, 2008, at 2:47 PM, roger dimitri wrote:
Hi,
I am using the Directory class's copy method to periodically sync
my RAM based index to a file based index that's supposed