lucene nightly build after 11/20

2006-12-19 Thread Yonik Seeley
Anyone using a Lucene nightly build dated later than 11/20 will want to upgrade to the next (future) nightly build, which will be dated 12/21:
http://issues.apache.org/jira/browse/LUCENE-754
Keep in mind that nightly builds are developer builds and are not always stable (though we try our best). :-) -Y

Re: sorting by per doc hit count

2006-12-19 Thread Mark Miller
I appreciate your help, Hoss. That has cleared up some things for me. The problem remains that I would like to be able to switch between the hits-per-doc Similarity and the default Similarity on any given search. I was hoping that I could index with DefaultSimilarity and store the norms for nor

Re: Extracting data from Lucene index files

2006-12-19 Thread Venkateshprasanna
> Take a look at TermDocs and TermEnum.
I need to get the frequency of each word in each of the documents I have indexed. This is what I could do with TermEnums and TermDocs: for each Term from the TermEnum, I have instantiated a TermDocs, and for each doc, I am trying to get the frequency of the Ter
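A minimal sketch of the loop described above, with no Lucene dependency: a sorted in-memory map stands in for the index, the outer loop plays the role of TermEnum (walking the terms in order) and the inner loop the role of TermDocs (reporting each document and the term's frequency in it). In Lucene 2.0 the real objects come from IndexReader.terms() and IndexReader.termDocs(Term); the class and method names below are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for an index: term -> (docId -> frequency).
// TreeMap keeps terms sorted, the way TermEnum walks the term dictionary.
class TermFreqTable {

    /** Builds postings from lowercased, whitespace-tokenized documents. */
    static TreeMap<String, TreeMap<Integer, Integer>> invert(String[] docs) {
        TreeMap<String, TreeMap<Integer, Integer>> postings = new TreeMap<>();
        for (int docId = 0; docId < docs.length; docId++) {
            for (String token : docs[docId].toLowerCase().split("\\s+")) {
                if (token.isEmpty()) continue;            // leading whitespace yields ""
                postings.computeIfAbsent(token, t -> new TreeMap<>())
                        .merge(docId, 1, Integer::sum);   // count occurrences per doc
            }
        }
        return postings;
    }

    /** Outer loop ~ TermEnum (each term), inner loop ~ TermDocs (each doc + freq). */
    static Map<Integer, Map<String, Integer>> perDocFrequencies(
            TreeMap<String, TreeMap<Integer, Integer>> postings) {
        Map<Integer, Map<String, Integer>> byDoc = new TreeMap<>();
        for (Map.Entry<String, TreeMap<Integer, Integer>> term : postings.entrySet()) {
            for (Map.Entry<Integer, Integer> posting : term.getValue().entrySet()) {
                byDoc.computeIfAbsent(posting.getKey(), d -> new TreeMap<>())
                     .put(term.getKey(), posting.getValue());
            }
        }
        return byDoc;
    }

    public static void main(String[] args) {
        String[] docs = {"lucene index lucene", "index search"};
        System.out.println(perDocFrequencies(invert(docs)));
        // prints {0={index=1, lucene=2}, 1={index=1, search=1}}
    }
}
```

Turning the term-major postings into a doc-major table like this means holding one entry per distinct (document, term) pair in memory, which is worth keeping in mind for a large index.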

Re: MultiFieldQueryParser doesn't properly filter out documents when the query string specifies to exclude certain terms

2006-12-19 Thread Daniel Naber
On Tuesday 19 December 2006 23:05, Scott Sellman wrote:
>     new BooleanClause.Occur[]{BooleanClause.Occur.SHOULD,
>     BooleanClause.Occur.SHOULD}
Why do you explicitly specify these operators?
> q.add(keywordQuery, BooleanClause.Occur.MUST); //true, false);
You seem to wra

Re: sorting by per doc hit count

2006-12-19 Thread Chris Hostetter
: Foolish me...override a static method...silly silly. Still, I think
: there must be some way. I don't care about the field
: normalization...there must be some way to make it return a constant 1
: when using a new Similarity class.
as discussed: norms are a value explicitly stored in your index

MultiFieldQueryParser doesn't properly filter out documents when the query string specifies to exclude certain terms

2006-12-19 Thread Scott Sellman
I am not sure if this is a problem with Lucene or if I am building my Query object improperly. It seems to me, when performing a search that should exclude certain terms, MultiFieldQueryParser doesn't filter out documents when it should. Consider the following example to clarify what I am talking
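A toy, set-based model of one way this symptom can arise, assuming the static MultiFieldQueryParser.parse(query, fields, flags, analyzer) overload parses the query once per field and joins the per-field queries with the given flags. With SHOULD flags, a term excluded with "-" is only excluded inside each field's own subquery, so a document can still match through a field that lacks the excluded term. The Doc type and both methods are hypothetical; this is an illustration, not Lucene code and not a diagnosis of the exact query in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy documents with two fields; each field is just a set of lowercase terms.
class PerFieldExclusion {

    static final class Doc {
        final int id;
        final Set<String> title;
        final Set<String> body;

        Doc(int id, String title, String body) {
            this.id = id;
            this.title = new HashSet<>(Arrays.asList(title.split(" ")));
            this.body = new HashSet<>(Arrays.asList(body.split(" ")));
        }
    }

    /** "term -excluded" evaluated against ONE field only. */
    static boolean fieldMatches(Set<String> field, String term, String excluded) {
        return field.contains(term) && !field.contains(excluded);
    }

    /** One subquery per field, OR-ed together (SHOULD): exclusion holds only per field. */
    static List<Integer> orOfPerFieldQueries(List<Doc> docs, String term, String excluded) {
        List<Integer> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (fieldMatches(d.title, term, excluded) || fieldMatches(d.body, term, excluded)) {
                hits.add(d.id);
            }
        }
        return hits;
    }

    /** Exclusion applied to the whole document: the excluded term in ANY field drops it. */
    static List<Integer> documentLevelExclusion(List<Doc> docs, String term, String excluded) {
        List<Integer> hits = new ArrayList<>();
        for (Doc d : docs) {
            boolean hasTerm = d.title.contains(term) || d.body.contains(term);
            boolean hasExcluded = d.title.contains(excluded) || d.body.contains(excluded);
            if (hasTerm && !hasExcluded) {
                hits.add(d.id);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
                new Doc(0, "lucene guide", "lucene basics"),
                new Doc(1, "lucene tips", "buy spam now"));
        System.out.println(orOfPerFieldQueries(docs, "lucene", "spam"));    // [0, 1]
        System.out.println(documentLevelExclusion(docs, "lucene", "spam")); // [0]
    }
}
```

Document 1 slips through the per-field version because its title subquery matches on its own; the document-level version roughly corresponds to putting the prohibited clause at the top level of the combined query instead of inside each field's subquery.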

Re: how to define default fields

2006-12-19 Thread Yonik Seeley
On 12/19/06, John Song <[EMAIL PROTECTED]> wrote:
> How do I define default fields? Is it done at index time or at search time? Strangely, I can't find any information on how default fields are defined.
"default" field is simply a QueryParser concept (see its constructors). It doe
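To illustrate the point that the default field is purely a query-parsing setting: in Lucene 2.0 it is the first argument of the QueryParser(String, Analyzer) constructor, and nothing about it is recorded at index time. The mini-parser below is hypothetical and far simpler than QueryParser; it only shows how clauses without a "field:" prefix get routed to whatever default field the caller picks at search time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini-parser (NOT Lucene's QueryParser). The default field is
// just a parse-time argument that fills in for clauses lacking "field:".
class DefaultFieldDemo {

    static List<String> parse(String query, String defaultField) {
        List<String> clauses = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) continue;               // empty query string
            if (token.indexOf(':') > 0) {
                clauses.add(token);                      // explicit field: keep as-is
            } else {
                clauses.add(defaultField + ":" + token); // bare term: route to default field
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        // Same query, two different default fields chosen at search time.
        System.out.println(parse("lucene title:nightly", "contents")); // [contents:lucene, title:nightly]
        System.out.println(parse("lucene title:nightly", "body"));     // [body:lucene, title:nightly]
    }
}
```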

Re: sorting by per doc hit count

2006-12-19 Thread Mark Miller
Foolish me...override a static method...silly silly. Still, I think there must be some way. I don't care about the field normalization...there must be some way to make it return a constant 1 when using a new Similarity class. Doron Cohen wrote: "Mark Miller" <[EMAIL PROTECTED]> wrote on 19/12

Re: Fwd: Lucene id generation

2006-12-19 Thread Erick Erickson
I see your point, but I have to ask whether this is a practical or a theoretical problem. If it's a practical one, perhaps you'd be willing to talk about the issue you're actually trying to solve, and maybe we can come up with a solution within the current framework. I know others on the list have

Re: sorting by per doc hit count

2006-12-19 Thread Mark Miller
Thanks for the tip, Doron. What if I replace the static decode method in Similarity so that it always returns 1 for the HitPerDocSimilarity? This would not require a re-index, right? Doron Cohen wrote: "Mark Miller" <[EMAIL PROTECTED]> wrote on 19/12/2006 09:21:00: LIA mentioned somethin

Re: Lucene scoring: coord_q_d factor

2006-12-19 Thread Doug Cutting
Karl Koch wrote: Are there any other papers that regard the combination of coordination level matching and TFxIDF as advantageous? We independently developed coordination-level matching combined with TFxIDF when I worked at Apple. This is documented in: http://www.informatik.uni-trier.de/~
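For readers following the coord discussion, a simplified sketch of how Lucene's DefaultSimilarity combines coordination-level matching with TFxIDF: tf = sqrt(freq), idf = ln(numDocs / (docFreq + 1)) + 1, idf enters each term's contribution squared (as in the Similarity scoring formula), and the summed per-term scores are multiplied by coord = overlap / maxOverlap. Field norms, boosts and queryNorm are deliberately left out, so the absolute values will not match a real Lucene score; only the relative effect of coord is meant to carry over.

```java
// Simplified model of DefaultSimilarity's tf, idf and coord factors.
// Norms, boosts and queryNorm are deliberately omitted.
class CoordTfIdf {

    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    static double coord(int overlap, int maxOverlap) {
        return overlap / (double) maxOverlap;
    }

    /**
     * freqs[i]    = occurrences of query term i in the document (0 if absent)
     * docFreqs[i] = number of documents containing query term i
     */
    static double score(int[] freqs, int[] docFreqs, int numDocs, boolean useCoord) {
        double sum = 0.0;
        int overlap = 0;                            // how many query terms the doc contains
        for (int i = 0; i < freqs.length; i++) {
            if (freqs[i] > 0) {
                overlap++;
                double termIdf = idf(docFreqs[i], numDocs);
                sum += tf(freqs[i]) * termIdf * termIdf; // idf appears squared
            }
        }
        return useCoord ? coord(overlap, freqs.length) * sum : sum;
    }

    public static void main(String[] args) {
        int[] docFreqs = {9, 9};
        int[] oneTermFourTimes = {4, 0};
        int[] bothTermsOnce = {1, 1};
        System.out.println(score(oneTermFourTimes, docFreqs, 100, true));
        System.out.println(score(bothTermsOnce, docFreqs, 100, true));
    }
}
```

With numDocs = 100 and docFreq = 9 for both terms, the document containing each query term once scores exactly twice as high as the one containing only the first term four times: sqrt(4) = 2 makes their summed tf times idf squared equal, and coord is 1 versus 1/2. Without coord the two would tie.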

how to define default fields

2006-12-19 Thread John Song
Hi, How do I define default fields? Is it done at index time or at search time? Strangely, I can't find any information on how default fields are defined. Thanks, John

Re: Lucene 2.0.1 release date

2006-12-19 Thread Doug Cutting
Steven Rowe wrote: "2.1" is much more likely to be the label used for the next release than "2.0.1". The roadmap in Jira shows 21 issues scheduled for 2.0.1. If there is in fact no intent to merge these into the 2.0 branch, these should probably be retargeted for 2.1.0, and the 2.0.1 versio

Fwd: Lucene id generation

2006-12-19 Thread Antonio Bruno
The real problem is that there is a single term dictionary shared by all the fields. If the term dictionary is very large, search performance degrades, even when the search is on a single term. So one would like to be able to index the fields of

Re: Lucene id generation

2006-12-19 Thread Steven Rowe
Antonio Bruno wrote:
> Using the docId directly would make searches much more efficient and
> faster. Think of being able to apply a first CachingWrapperFilter F1 on
> one index and a second CachingWrapperFilter F2 on another index, and
> then computing (F1 AND F2) and

Re: sorting by per doc hit count

2006-12-19 Thread Doron Cohen
"Mark Miller" <[EMAIL PROTECTED]> wrote on 19/12/2006 09:21:00:
> LIA mentioned something about needing to rebuild the
> index if you change Similarities. That does not make
> sense to me yet. It would seem you could alternate them.
> What does scoring have to do with indexing?
For this part of yo

Re: sorting by per doc hit count

2006-12-19 Thread Mark Miller
Could I use another Similarity that returned 1 for most of the scoring terms and the actual term frequency (rig the equation)? Could I then alternate the DefaultSimilarity and HitsPerDocSimilarity per search? LIA mentioned something about needing to rebuild the index if you change Similarities. Th
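A hypothetical sketch of the per-search switch asked about here, modeled without Lucene: each strategy receives the term frequency plus a norm that was fixed when the document was indexed, and the strategy is picked at search time. The "hit count" strategy simply ignores the stored norm. In Lucene 2.0 the equivalent hook is harder to reach because, as noted elsewhere in this thread, decodeNorm is a static method and cannot be overridden. The interface and class names are illustrative only and are not Lucene's Similarity API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-search scoring strategy; NOT Lucene's Similarity class.
interface ScoringModel {
    /** Contribution of one matching query term to one document's score. */
    double termScore(int freq, float storedNorm);
}

// Rough stand-in for default behavior: dampened tf scaled by the index-time norm.
class DefaultLikeModel implements ScoringModel {
    public double termScore(int freq, float storedNorm) {
        return Math.sqrt(freq) * storedNorm;
    }
}

// "Hits per doc": the raw number of occurrences; the stored norm is ignored.
class HitCountModel implements ScoringModel {
    public double termScore(int freq, float storedNorm) {
        return freq;
    }
}

class PerSearchScoring {
    /**
     * freqByTerm: query term -> occurrences in this document.
     * storedNorm: value fixed when the document was indexed; the model decides
     * at search time whether to use it.
     */
    static double score(Map<String, Integer> freqByTerm, float storedNorm, ScoringModel model) {
        double total = 0.0;
        for (int freq : freqByTerm.values()) {
            if (freq > 0) {
                total += model.termScore(freq, storedNorm);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs = new HashMap<>();
        freqs.put("lucene", 4);
        freqs.put("index", 1);
        float norm = 0.25f; // pretend this value was written at index time
        System.out.println(score(freqs, norm, new DefaultLikeModel())); // 0.75
        System.out.println(score(freqs, norm, new HitCountModel()));    // 5.0
    }
}
```

The design point is that anything computed at index time (here, the norm) is an input the search-time strategy can use or ignore, but cannot recompute; changing how it is computed is what would require re-indexing.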

Re: Lucene 2.0.1 release date

2006-12-19 Thread Mark Diggory
We'd primarily like to see a release of the LockFactory implementation. This functionality will help us better control our locking, but we want to depend on actual releases, not interim builds/snapshots. Any news on this, now that this thread is a couple of months old? -Mark Diggory George Ar

Re[2]: WRITE_LOCK_TIMEOUT

2006-12-19 Thread Maxim Patramanskij
Hello Guido,
Wednesday, April 5, 2006, 5:23:37 PM, you wrote:
GN> On 05.04.2006, at 17:15, Bill Janssen wrote:
>> Or, as I suggested a couple of days ago, a 1.9.2 release could be
>> offered.
GN> Would be a good idea, because the current nightly builds have a lot
GN> of deprecated metho

Re: Lucene id generation

2006-12-19 Thread Erick Erickson
But you can do something very similar and very quickly using a unique ID (not the Lucene ID) that's shared across the indexes (assuming I'm reading your issue correctly). Then use TermDocs/TermEnum and create your filters that way. I predict endless problems with user (programmer) errors if Lucen
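A minimal sketch of the unique-ID approach, assuming each document carries an application-level key stored in both indexes. The filter results (in Lucene 2.0, Filter.bits(IndexReader) returns a java.util.BitSet) are translated into keys and intersected by key, rather than ANDing raw doc IDs, which are assigned independently per index and can change when segments with deletions are merged. The keysByDocId arrays are hypothetical; with a real index they could be built with TermEnum/TermDocs over the key field.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Intersect filter results from two indexes via a shared unique key
// instead of Lucene's internal, per-index doc IDs.
class CrossIndexFilter {

    /** Collects the unique keys of documents whose bit is set; keysByDocId[d] is doc d's key. */
    static Set<String> keysOf(BitSet bits, String[] keysByDocId) {
        Set<String> keys = new HashSet<>();
        for (int d = bits.nextSetBit(0); d >= 0; d = bits.nextSetBit(d + 1)) {
            keys.add(keysByDocId[d]);
        }
        return keys;
    }

    /** Returns a BitSet over index A's doc IDs: docs matching F1 in A whose key matches F2 in B. */
    static BitSet intersect(BitSet f1OnA, String[] keysA, BitSet f2OnB, String[] keysB) {
        Set<String> keysInB = keysOf(f2OnB, keysB);
        BitSet result = new BitSet(keysA.length);
        for (int d = f1OnA.nextSetBit(0); d >= 0; d = f1OnA.nextSetBit(d + 1)) {
            if (keysInB.contains(keysA[d])) {
                result.set(d);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] keysA = {"k1", "k2", "k3"}; // doc IDs 0..2 in index A
        String[] keysB = {"k9", "k3"};       // doc IDs 0..1 in index B
        BitSet f1OnA = new BitSet();
        f1OnA.set(0);
        f1OnA.set(2);
        BitSet f2OnB = new BitSet();
        f2OnB.set(1);
        System.out.println(intersect(f1OnA, keysA, f2OnB, keysB)); // {2}
    }
}
```

In this example a naive f1OnA.and(f2OnB) would come back empty, because the shared document "k3" is doc 2 in index A but doc 1 in index B.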

Help with jump from 1.4.3 to 2.0.0

2006-12-19 Thread JT Kimbell
Hi, I'm working on learning Lucene for my job, and the book one of my professors purchased for her and me is Lucene in Action, which is a good book, but it is based on version 1.4.3 (I believe). I am beginning to grasp a lot of the basic concepts behind Lucene and have a basic searching and i

RE: Lucene id generation

2006-12-19 Thread Antonio Bruno
Using the docId directly would make searches much more efficient and faster. Think of being able to apply a first CachingWrapperFilter F1 on one index and a second CachingWrapperFilter F2 on another index, and then computing (F1 AND F2), and even extracting the info of