Re: Indexing TREC GOV2 data in Lucene

2012-04-12 Thread Dr. Hany Azzam
Hi, I am not sure if there's something in the contrib for GOV2, but it really depends on what you want to parse. If you are just interested in full-text search, then it should be similar to parsing a regular document while being conscious of the TREC-specific delimiters. It's something like . Howeve

Re: How disabling norms on a field affects other fields

2012-03-06 Thread Hany Azzam
i.e. field length :) A trivial question, maybe: if one uses these flags, does that mean one doesn't need to override the computeNorm method as shown in Simon's article on searchworkings? I am referring to the case when one doesn't want to use norms. h. -Original Message- From: Paul Taylor

Re: Filter and IndexSearcher in Lucene 4.0 (trunk)

2012-02-10 Thread Hany Azzam
Schindler wrote: > Whats the problem? > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > >> -Original Message- >> From: Hany Azzam [mailto:h...@eecs.qmul.ac.uk] >

Re: Filter and IndexSearcher in Lucene 4.0 (trunk)

2012-02-10 Thread Hany Azzam
Hi, I apologise upfront for the trivial question. I have an IndexSearcher and I am applying a FieldCacheTermsFilter filter on it to only retrieve documents whose single docId is in a provided set of allowed docIds. I am particularly interested in the stats being estimated over the accepted set
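
As a rough illustration of what "stats estimated over the accepted set" could mean, the sketch below recomputes the document count and document frequencies over only the allowed docIds. It does not use Lucene at all: the toy `docs` mapping, the `allowed` set, and the function name `filtered_stats` are hypothetical and only stand in for an index and a filter.

```python
from collections import Counter

def filtered_stats(docs, allowed):
    """Return (num_docs, doc_freq) computed only over docs whose id is in `allowed`.

    docs:    dict mapping doc id -> list of tokens (toy stand-in for an index)
    allowed: set of doc ids that pass the filter
    """
    doc_freq = Counter()
    num_docs = 0
    for doc_id, tokens in docs.items():
        if doc_id not in allowed:
            continue  # filtered-out documents contribute nothing to the stats
        num_docs += 1
        doc_freq.update(set(tokens))  # count each term once per document
    return num_docs, doc_freq

# Hypothetical three-document collection; only d1 and d2 are accepted.
docs = {
    "d1": ["lucene", "index", "lucene"],
    "d2": ["index", "filter"],
    "d3": ["lucene", "filter"],
}
num_docs, doc_freq = filtered_stats(docs, allowed={"d1", "d2"})
```

Whether a searcher computes its statistics this way, or over the whole collection regardless of the filter, is exactly the kind of behaviour the question above is asking about.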

Re: IndexSearcher with two Indexes

2012-01-27 Thread Hany Azzam
, the retrieval function is more complex than that and a simple combination using product or summation won't be feasible. Any ideas on how to resolve this problem (if possible :))? Thanks again, h. On 27 Jan 2012, at 20:29, Robert Muir wrote: > On Fri, Jan 27, 2012 at 3:21 PM, Hany Azza

Re: IndexSearcher with two Indexes

2012-01-27 Thread Hany Azzam
Hi, I have two indexes. One that contains all the documents in the collection and the other contains only the relevant documents. I am using Lucene 4.0 and the new SimilarityBase class to build my retrieval models (similarity functions). One of the retrieval models requires statistics to be comp

Re: IndexDocValues and storing Stats

2012-01-04 Thread Hany Azzam
sed (encoded/decoded) DL is not statistically significant. However, I still prefer to use the raw DL, and that's why I use the sum of the TF's in a document to cache it. h. On 4 Jan 2012, at 14:37, Simon Willnauer wrote: > Hey, > > On Wed, Jan 4, 2012 at 1:15 PM, Hany Azzam

Re: IndexDocValues and storing Stats

2012-01-04 Thread Hany Azzam
Hi, I am experimenting with the Lucene trunk (aka 4.0), especially with the new IndexDocValues feature. I am trying to store some query-independent statistics such as PageRank, etc. One stat that I am trying to store is the sum of all the term frequencies in a document. This can be seen as the
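
For readers unfamiliar with the statistic mentioned above, here is a minimal sketch of the sum of all term frequencies in a document, computed per document so that it could be cached alongside other query-independent values. It is plain Python, not the IndexDocValues API; the function name `sum_of_term_frequencies` and the sample tokens are hypothetical.

```python
from collections import Counter

def sum_of_term_frequencies(tokens):
    """Sum the per-term frequencies of one tokenized document."""
    term_freqs = Counter(tokens)     # term -> frequency within this document
    return sum(term_freqs.values())  # total number of term occurrences

# Hypothetical tokenized documents, keyed by doc id.
collection = {
    "d1": ["page", "rank", "page"],
    "d2": ["rank"],
}

# One query-independent number per document, ready to be stored or cached.
doc_stats = {doc_id: sum_of_term_frequencies(toks) for doc_id, toks in collection.items()}
```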