Understanding/controlling role of Weight in IndexSearcher

2008-09-09 Thread Micah Jaffe
Quick summary of situation (using 2.3.2, StandardAnalyzer): I've taken a field that was being created as a "default" for a document, e.g. a giant string of glommed on values from other field values and instead created a boolean query to hit all the fields which would normally contribute to

10Gb of .nfsXXX files about a week old in NFS based index directory

2008-09-09 Thread David Loeng
Hi, We have a customer using lucene on an NFS directory, which contains ~10Gb of .nfs files. These files are the means by which NFS implements delete-on-close semantics (that is, if the index writer commits a delete of a file that is still held open by an index reader, the file is ren

Re: Incremental Indexing.

2008-09-09 Thread Jason Rutherglen
Hi Jang, Yes, and I have not completed it either... Perhaps when I do you can use it. Best regards, Jason On Tue, Sep 9, 2008 at 9:20 PM, 장용석 <[EMAIL PROTECTED]> wrote: > Thanks for your help. > I have about 40 documents in my index and it is constantly updated (price > or name, etc.). > I wil

Re: Incremental Indexing.

2008-09-09 Thread 장용석
Thanks for your help. I have about 40 documents in my index and it is constantly updated (price or name, etc.). I will try using the delete and add functions. And Jason, I am interested in it (actually, in Lucene), but I am worried that I do not understand all of the core logic of Lucene, and I am not good at e

RE: Re: Replacing FAST functionality at sesam.no - ShingleFilter+exact matching

2008-09-09 Thread Steven A Rowe
On 09/09/2008 at 4:38 PM, Mck wrote: > > > Looks to me like MultiPhraseQuery is getting in the way. Shingles > > that begin at the same word are given the same position by > > ShingleFilter, and Solr's FieldQParserPlugin creates a > > MultiPhraseQuery when it encounters tokens in a query with the

Re: Replacing FAST functionality at sesam.no - ShingleFilter+exact matching

2008-09-09 Thread Mck
> Looks to me like MultiPhraseQuery is getting in the way. Shingles > that begin at the same word are given the same position by > ShingleFilter, and Solr's FieldQParserPlugin creates a > MultiPhraseQuery when it encounters tokens in a query with the same > position. I think what you want is to

RE: Re: Replacing FAST functionality at sesam.no - ShingleFilter+exact matching

2008-09-09 Thread Steven A Rowe
Hi mck, On 09/09/2008 at 12:58 PM, Mck wrote: > > *ShortVersion* > > is there a way to make the ShingleFilter perform exact matching via > > inserting ^ $ begin/end markers? > > Reading through the mailing list i see how exact matching can > be done, a la STFW to myself... > > So the ShortVersi

Re: memory leak during Lucene Search

2008-09-09 Thread Chris Lu
Thanks for the link! I will post the problem there. In the meantime, any J2EE application developers should be aware of this problem and try to avoid Lucene checked out on or after May 23, 2008, svn version 659602. I tried svn 659601, which worked fine. I will follow up on this email list when the proble

Re: memory leak during Lucene Search

2008-09-09 Thread Grant Ingersoll
Just chipping in that I recall there being a number of discussions on java-dev about ThreadLocal and web containers and how they should be handled. Not sure if it pertains here or not, but you might find http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal helpful. You might a

Re: Lucene Index

2008-09-09 Thread Grant Ingersoll
Term frequency information is kept in the index. On Sep 9, 2008, at 11:54 AM, Marie-Christine Plogmann wrote: Hi all, I am currently using a (slightly modified) version of the IndexFiles demo class of Lucene to index a corpus. As I understand it, the index lists for each term the documents

Re: Similarity percentage between two Strings

2008-09-09 Thread Thiago Moreira
For those interested in my solution, I used this article as the basis for implementing the requirements: http://www.catalysoft.com/articles/StrikeAMatch.html Thanks. - Original Message - From: [EMAIL PROTECTED] Sent: Thu, September 4, 2008 1:20 Subject: Re: Similarity percentage betwee
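The approach in the linked article compares the adjacent letter pairs of two strings and scores them as twice the number of shared pairs divided by the total number of pairs. The following is a minimal sketch of that idea, not the poster's actual code; the class name `PairSimilarity` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper illustrating letter-pair similarity; not taken from the thread.
final class PairSimilarity {

    // Collects the adjacent two-character pairs of every whitespace-separated word.
    static List<String> wordLetterPairs(String text) {
        List<String> pairs = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            for (int i = 0; i + 1 < word.length(); i++) {
                pairs.add(word.substring(i, i + 2));
            }
        }
        return pairs;
    }

    // Returns a value in [0, 1]: 2 * shared pairs / (pairs in a + pairs in b).
    static double similarity(String a, String b) {
        List<String> pairsA = wordLetterPairs(a.toUpperCase(Locale.ROOT));
        List<String> pairsB = wordLetterPairs(b.toUpperCase(Locale.ROOT));
        int total = pairsA.size() + pairsB.size();
        if (total == 0) {
            return 0.0; // neither string contains a pair, so nothing can be compared
        }
        int shared = 0;
        for (String pair : pairsA) {
            // Removing the matched pair ensures repeated pairs are only counted once each.
            if (pairsB.remove(pair)) {
                shared++;
            }
        }
        return 2.0 * shared / total;
    }
}
```

With this sketch, `PairSimilarity.similarity("FRANCE", "FRENCH")` shares the pairs FR and NC out of ten pairs in total, giving 0.4.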

Re: Replacing FAST functionality at sesam.no - ShingleFilter+exact matching

2008-09-09 Thread Mck
> *ShortVersion* > is there a way to make the ShingleFilter perform exact matching via > inserting ^ $ begin/end markers? Reading through the mailing list i see how exact matching can be done, a la STFW to myself... So the ShortVersion now stands: For my query "abcd efgh ijkl" Why does a (perfe

Replacing FAST functionality at sesam.no - ShingleFilter+exact matching

2008-09-09 Thread Mck
-- original post was on solr's user list. -- -- i've reposted here as it's centered on the ShingleFilter which comes from lucene -- *ShortVersion* is there a way to make the ShingleFilter perform exact matching via inserting ^ $ begin/end markers? *LongVersion* At sesam.no we want to replace
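To make the begin/end marker idea concrete, here is a rough sketch in plain Java of what word shingles with `^` and `$` markers could look like. It does not use Lucene's ShingleFilter and says nothing about how that filter assigns positions; the class `MarkedShingles` and its method are hypothetical, and the input "abcd efgh ijkl" is the example query mentioned in the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not Lucene's ShingleFilter.
final class MarkedShingles {

    // Wraps the words of text in "^" and "$" markers and returns all shingles of the given size.
    static List<String> shingles(String text, int size) {
        List<String> tokens = new ArrayList<>();
        tokens.add("^"); // assumed begin marker
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                tokens.add(word);
            }
        }
        tokens.add("$"); // assumed end marker

        List<String> result = new ArrayList<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            result.add(String.join(" ", tokens.subList(i, i + size)));
        }
        return result;
    }
}
```

For the query "abcd efgh ijkl" and a shingle size of 2, this yields "^ abcd", "abcd efgh", "efgh ijkl" and "ijkl $", so a shingle that starts or ends the input can be told apart from the same words appearing mid-text.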

Lucene Index

2008-09-09 Thread Marie-Christine Plogmann
Hi all, I am currently using a (slightly modified) version of the IndexFiles demo class of Lucene to index a corpus. As I understand it, the index lists for each term the documents it occurs in. My question is now, if this is in terms of frequency counts (the term occurs x times within the docum

Re: which version of lucene do you recommend

2008-09-09 Thread Michael McCandless
We try very hard to keep Lucene's trunk usable, but, sneaky things do slip in from time to time so you certainly have to do your own testing. And if you hit problems, be sure to raise them! Which exact JRE version are you using? It could be you are hitting the Sun JRE bug described here

Re: Building Relationships between documents?

2008-09-09 Thread Erick Erickson
You must get your head out of the RDBMS world when using Lucene. There's nothing in Lucene that expresses relationships like a db. The usual solution is to de-normalize your database at index time so you can do reasonably simple searches that express your desired relationship... Best Erick On Tu
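As a sketch of what de-normalizing at index time can mean for the Entities/Products case in this thread: each product is flattened into one record that also carries the attributes of its entity, so a single search can match on either. The code below is plain Java with no Lucene calls; the class `Denormalizer`, the row layout, and the field names (`product_id`, `product_name`, `entity_id`, `entity_name`) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of flattening a one-to-many relationship before indexing.
final class Denormalizer {

    // Each product row is assumed to be {productId, productName, entityId}.
    static List<Map<String, String>> flatten(Map<String, String> entityNamesById,
                                             List<String[]> productRows) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (String[] row : productRows) {
            Map<String, String> fields = new LinkedHashMap<>();
            fields.put("product_id", row[0]);
            fields.put("product_name", row[1]);
            fields.put("entity_id", row[2]);
            // Copy the entity's attribute onto the product record so one query can see both.
            String entityName = entityNamesById.get(row[2]);
            if (entityName != null) {
                fields.put("entity_name", entityName);
            }
            docs.add(fields);
        }
        return docs;
    }
}
```

Each resulting map would then be turned into one indexed document, one field per entry. The cost of this design is that a change to an entity requires re-indexing all of its products.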

RE: Beginner: Specific indexing

2008-09-09 Thread Steven A Rowe
Hi Raymond, Check out SinkTokenizer/TeeTokenFilter: Look at the unit tests for usage hints:

Re: Building Relationships between documents?

2008-09-09 Thread Chris Lu
If you want to do it in just one search, yes, you have to put the Entities attributes into the documents. But you can search twice. The second time using values from the first search, say entitiy_id, to search the products. -- Chris Lu - Instant Scalable Full-Text Search O

which version of lucene do you recommend

2008-09-09 Thread Christian Reuschling
In the past, I had really good experiences with the svn versions of Lucene: I never had problems, and everything felt stable. Currently, I get unexpected exceptions from time to time: java.lang.RuntimeException: after flush: fdx size mismatch: 1 docs vs 0 length in bytes of _3g6n.fdx

Re: Beginner: Specific indexing

2008-09-09 Thread Raymond Balmès
Well, that is well explained in "Lucene in Action": if you want to search files you have to build a file parser, and there is a good example given. So that is not really my problem. But I thought I could go through the token stream only once, whereas I have to go through it twice: 1. for detecting my triplets, 2. for indexi

Building Relationships between documents?

2008-09-09 Thread lilalfyalien
Hi, I think I have a very easy question to answer (I am a Lucene beginner and like it very much!): I have built lucene documents and indexes from a dataset from a relational database. I have the table Entities and the table Products. Each product has one entity and Entities therefore can have mul

Re: Web Application Indexing Error

2008-09-09 Thread 叶双明
use classpath! 2008/9/9 Alexander Aristov <[EMAIL PROTECTED]> > Hi > > Build path and classpath at runtime are different matters. Where do you run > your servlet, in which container. > > Mainly all servlet containers should add all libraries located under > WEB-INF/lib, so you must place your luc

Re: Incremental Indexing.

2008-09-09 Thread Ian Lea
Such incremental indexing is standard practice and unlikely to cause a problem, particularly if you are only working with a few thousand documents. Instead of delete/add you could use IndexWriter.updateDocument(). -- Ian. 2008/9/9 장용석 <[EMAIL PROTECTED]>: > Hi~. > I have a question about lucen

Scoring

2008-09-09 Thread Ulrich Vachon
Hi all, Is it possible to get the score of each term that makes up the query? For example, with - query = "foo bar" I would like to have the score for "foo" and for "bar". Currently the score is based on the results matched by the full query "foo bar". Regards, Ulrich -Original Message- From: Antony Bowesman