Re: factor in stopwords when searching

2008-03-21 Thread Jake Mannix
I think the way I've seen it done most often is to either index some bi-grams which contain stop words (so "the database" and "search the" are in the index as individual tokens), or else to index that piece of content twice - once with stop words removed (and stemming, if you use it), and then agai
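
A minimal sketch of the first idea in the message above (bi-grams that contain a stop word become tokens of their own), written in plain Java rather than against the Lucene analysis API. The class name `StopwordBigrams`, the method `tokens`, and the stop list are hypothetical and chosen only for illustration; a real setup would do this inside an analyzer or token filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StopwordBigrams {
    // Hypothetical stop list, for illustration only.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "of"));

    // Emits every non-stop word as a token, plus a bi-gram token for each
    // adjacent word pair in which at least one word is a stop word.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        String[] words = text.trim().toLowerCase().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            String word = words[i];
            if (word.isEmpty()) {
                continue; // empty input yields a single empty string
            }
            boolean isStop = STOP_WORDS.contains(word);
            if (!isStop) {
                out.add(word);
            }
            if (i + 1 < words.length) {
                String next = words[i + 1];
                if (isStop || STOP_WORDS.contains(next)) {
                    out.add(word + " " + next);
                }
            }
        }
        return out;
    }
}
```

With this sketch, "search the database" produces the tokens "search", "search the", "the database" and "database", while "search database" produces only "search" and "database". A query that also looks for the bi-gram tokens can therefore tell the two documents apart.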

Re: factor in stopwords when searching

2008-03-21 Thread Grant Ingersoll
Don't throw away the stopwords? :-) Lucene can't score something it doesn't know exists. I suppose you could try to get fancy w/ payloads and add payloads if stopwords exist, but I am just thinking out loud there. On Mar 21, 2008, at 9:20 PM, Chris Lu wrote: Let's say "the" is consider

factor in stopwords when searching

2008-03-21 Thread Chris Lu
Let's say "the" is considered a stopword. For example, take two documents: document A, content: "... search the database..."; document B, content: "... search database...". When the user's input is "search the database", searching with the query content:"search database"~1 returns both. But is ther
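
To make the problem concrete, here is a small plain-Java sketch (not Lucene code) of what stop-word removal does to the two example documents. The class `StopwordRemoval` and the method `analyze` are hypothetical names, and the stop list contains only "the", as in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StopwordRemoval {
    // Stop list assumed from the question: only "the".
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the"));

    // Lowercases, splits on whitespace, and drops stop words.
    static List<String> analyze(String text) {
        List<String> kept = new ArrayList<>();
        for (String word : text.trim().toLowerCase().split("\\s+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                kept.add(word);
            }
        }
        return kept;
    }
}
```

Under these assumptions, `analyze("search the database")` and `analyze("search database")` give the same token list, so an index built only from those tokens holds no information that could rank document A above document B.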

Re: Call Lucene default command line Search from PHP script

2008-03-21 Thread Paul Elschot
On Saturday 22 March 2008 00:32:32, Paul Elschot wrote: > Milu, > > This is a PHP problem, not a Lucene one, so you might get better > response at a PHP mailing list. > > The easy way around your problem is probably by invoking a shell > script from php that exports the class path as you indicated

Re: Call Lucene default command line Search from PHP script

2008-03-21 Thread Paul Elschot
Milu, This is a PHP problem, not a Lucene one, so you might get better response at a PHP mailing list. The easy way around your problem is probably by invoking a shell script from php that exports the class path as you indicated, so that java can see the correct classes. Having said that, you'll

Call Lucene default command line Search from PHP script

2008-03-21 Thread milu07
Hello, My machine runs Ubuntu 7.10. I am working with Apache Lucene. I have finished the indexer and tried the command-line Searcher (the default command line included in the Lucene package: http://lucene.apache.org/java/2_3_1/demo2.html). When I use this at command line: java Searcher -query algorithm

Re: backup RAMDirectory to file

2008-03-21 Thread roger dimitri
Thank you so much Michael and Grant for your suggestions. I haven't tried SnapshotDeletionPolicy yet (thanks for the hint, I will do it now) and I guess Incremental back up may not work in my case since I have periodic Index cleaning jobs in RAMDirectory. Thanks again, Roger - Original Me

Re: feedback: Indexing speed improvement lucene 2.2->2.3.1

2008-03-21 Thread Ivan Vasilev
Hi Uwe, Could you tell us which Analyzer you used when you measured such a big indexing speedup? If you use StandardAnalyzer (which uses StandardTokenizer), that may be the reason. You can see the second-to-last report in the thread "Indexing Speed: 2.3 vs 2.2 (real world numbers)". According to the repor

Re: backup RAMDirectory to file

2008-03-21 Thread Michael McCandless
I think both the original approach and addIndexes below will work here, though the original approach should be faster. But, there are some caveats. You have to make sure you do the backup with the writer on the ramDir closed. If there is a writer open, it could be changing files during

Re: backup RAMDirectory to file

2008-03-21 Thread Grant Ingersoll
I think you could try: IndexWriter writer = new IndexWriter(fileDirectory, ...) writer.addIndexes(ramDir) -Grant On Mar 20, 2008, at 2:47 PM, roger dimitri wrote: Hi, I am using the Directory class's copy method to periodically sync my RAM based index to a file based index that's supposed