Re: Aggregating category hits

2006-05-16 Thread Marvin Humphrey
Thanks, all. The field cache and the bitsets both seem like good options until the collection grows too large, provided that the index does not need to be updated very frequently. Then for large collections, there's statistical sampling. Any of those options seems preferable to retriev

Re: Position of a word in a document?

2006-05-16 Thread Chris Hostetter
: There is a "TermPositions pos = reader.termPositions();" [reader is an : instance of IndexReader] - but I have no clue, how to get a position of : a hit in a document. What can I do with TermPosition? : : So, I have all hits of my query with "Hits hits = : searcher.search(query);" - with the hel

Re: Theoretical Lucene Performance

2006-05-16 Thread gekkokid
http://lucenebook.com http://www.amazon.com/exec/obidos/asin/1932394281 :) - Original Message - From: "Andreas Harth" <[EMAIL PROTECTED]> To: Sent: Tuesday, May 16, 2006 10:51 PM Subject: Theoretical Lucene Performance Hello, I'd like to learn a bit more about the index organizati

Re: Position of a word in a document?

2006-05-16 Thread Franz Coriand
But how can I retrieve this information during my search process? I retrieve an object of type Document ... but this object doesn't have a "getPosition()" or "getTermVector()" method?! IndexReader has the appropriate get... methods. There is a "TermPositions pos = reader.termPosit

Theoretical Lucene Performance

2006-05-16 Thread Andreas Harth
Hello, I'd like to learn a bit more about the index organization of Lucene (ideally without sifting through source code). Are there any publications that explain the Lucene indexing structure in detail? Or is it possible to say in a few sentences how Lucene works and I can look up the details in

Zilverline Search Engine version 1.5.0 released

2006-05-16 Thread Michael Franken
All, I've just released Zilverline version 1.5.0. This version adds security and upload functionality, as well as some minor fixes and enhancements. The source will be made available as well very soon. Zilverline is protected by a Collaborative Source License. You can read more on this type

Re: Position of a word in a document?

2006-05-16 Thread Daniel Naber
On Tuesday 16 May 2006 18:42, Franz Coriand wrote: > "private boolean storeTermVector = true;" > "private boolean storePositionWithTermVector = true;" Use the optional Field.TermVector parameter in the Field constructor. > But how can I retrieve this information during my search process??? > I

Re: Search precondition: matching area

2006-05-16 Thread karl wettin
On Tue, 2006-05-16 at 17:51 +0200, David Trattnig wrote: > Is it possible to set more than one default-field at the > QueryParser's constructor? Actually I've set it to "contents" but I'd > like to search "contents" AND "title" and matches in title should have > a higher rating. I've posted a pat

Changing the scoring (newest doc date first)

2006-05-16 Thread Marcus Falck
Hello, I'm working on a very large implementation of a search engine based on the Lucene API (1.4.3). We have also been investigating enterprise search companies such as FAST and Verity but have come to the conclusion that we might as well save ourselves 1 million dollars by doing our own implem

Re: Position of a word in a document?

2006-05-16 Thread Franz Coriand
Daniel Naber wrote: On Monday 15 May 2006 14:54, Franz Coriand wrote: is it possible not only to get the document which contains the words of a query, but also get the position in the text of the query word? Yes, by using the term vectors with positions that were added in Lucene 1.9

Re: Search precondition: matching area

2006-05-16 Thread David Trattnig
Hi Mike, Hi Eks-Dev, first of all: Thank you so much! Both of you helped me a lot & it works fine! > Additionally: If I submit no area > > query-string: "hello" > > the query should be applied as if it had a matching area. I'm not sure exactly what you mean. This simple query will only re

Re: Search precondition: matching area

2006-05-16 Thread eks dev
try: 1. query-string: "hello +area:home" to get a filtering effect 2. to minimize the area clause's effect on scoring, use boosts: "(hello)^HIGH_BOOST +(area:home)^LOW_BOOST" 3. If scoring via boosts does not work well enough for you, or is slow, use the Filter interface from your code... search this list for Filter - Or

Search precondition: matching area

2006-05-16 Thread David Trattnig
Hello LuceneList, I've got at least the following fields in my index: AREA = "home news business" CONTENTS = "... hello world ..." If I submit the query query-string: "hello area:home" Lucene should only search those documents which have the matching area. Actually Lucene searches the area, but

Re: Search precondition: matching area

2006-05-16 Thread Michael D. Curtin
David Trattnig wrote: Hello LuceneList, I've got at least the following fields in my index: AREA = "home news business" CONTENTS = "... hello world ..." If I submit the query query-string: "hello area:home" Lucene should only search those documents which have the matching area. Actually Lucene s

Re: IndexReader seems loading the full index

2006-05-16 Thread Michael D. Curtin
Sharad Agarwal wrote: I am a newbie in the Lucene space and am trying to understand Lucene search result caching; I am facing a weird issue. After creating the IndexReader from a file system directory, I rename/remove the index directory; but still I am able to search the index and able to get the

IndexReader seems loading the full index

2006-05-16 Thread Sharad Agarwal
I am a newbie in the Lucene space and am trying to understand Lucene search result caching; I am facing a weird issue. After creating the IndexReader from a file system directory, I rename/remove the index directory; but still I am able to search the index and able to get the documents from Hits. Th

Extracting citation graph of ACM digital library and others DLs.

2006-05-16 Thread Trung
Hello everybody, I have a question. It's not related to Lucene, I know, but I post it here because many of you have excellent knowledge in computer science and I hope that you can help me. My question is how I can extract the citation graph of the ACM digital library (or any important digital library i

Search precondition: matching area

2006-05-16 Thread David Trattnig
Hello LuceneList, I've got at least the following fields in my index: AREA = "home news business" CONTENTS = "... hello world ..." If I submit the query query-string: "hello area:home" Lucene should only search those documents which have the matching area. Actually Lucene searches the area, but it

Re: Aggregating category hits

2006-05-16 Thread Kapil Chhabra
Thanks a lot Jelda. I'll try this and get back with the performance comparison chart. Regards, kapilChhabra Ramana Jelda wrote: Hi Kapil, As I remember, FieldCache has been in the Lucene API since 1.4. OK. Anyhow, here is pseudocode that can help. //1. initialize reader on opening documentId to the category

RE: Aggregating category hits

2006-05-16 Thread Ramana Jelda
Hi Kapil, As I remember, FieldCache has been in the Lucene API since 1.4. OK. Anyhow, here is pseudocode that can help. //1. initialize reader on opening documentId to the categoryid relation as below. Depending on your requirement you can either getStringIndex().. I get StringIndex in //my project. String
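The docId-to-category relation mentioned in this message can be illustrated without Lucene at all. What follows is only a rough sketch under stated assumptions: a plain int array stands in for whatever the field cache would supply, each document is assumed to belong to exactly one category, and the names FieldCacheStyleCounts, countByCategory, docToCategory and hitDocIds are hypothetical, not taken from the thread or from the Lucene API.

```java
// Hypothetical stand-in for a field-cache lookup: docToCategory[docId] holds
// the category ordinal of that document (assumption: one category per document).
class FieldCacheStyleCounts {

    /** Returns, for each category ordinal, how many of the hit documents fall into it. */
    static int[] countByCategory(int[] docToCategory, int[] hitDocIds, int numCategories) {
        int[] counts = new int[numCategories];
        for (int docId : hitDocIds) {
            // One array lookup per hit; no per-category structure is needed.
            counts[docToCategory[docId]]++;
        }
        return counts;
    }
}
```

Under these assumptions the lookup array costs one int per document regardless of the number of categories, whereas one bit set per category costs roughly one bit per document per category.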

Re: Aggregating category hits

2006-05-16 Thread Kapil Chhabra
Hi Jelda, I have not yet migrated to Lucene 1.9 and I guess FieldCache has been introduced in this release. Can you please give me a pointer to your strategy of FieldCache? Thanks & Regards, Kapil Chhabra Ramana Jelda wrote: But this BitSet strategy is more memory consuming mainly if you hav

Re: Analyzer which distributes tokens to many fields

2006-05-16 Thread Erik Hatcher
On May 16, 2006, at 3:02 AM, Mathias Keilbach wrote: I'm going to create a small application with Lucene, which analyzes different strings. While analyzing the strings, patterns (like emails or URLs) shall be sorted out and saved in a separate index field. I'm not sure if I can handle this with

Re: Aggregating category hits

2006-05-16 Thread Erik Hatcher
On May 16, 2006, at 1:37 AM, Kapil Chhabra wrote: I am doing the same in my application. Once a day, all the filters [for different categories] are initialized. Each time a query is fired, the Query BitSet is ANDed with the BitSet of each filter. The cardinality obtained is the des
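The AND-and-count step described in this message can be sketched with java.util.BitSet alone. This is a minimal illustration, not the poster's code: it assumes the query hits and each category filter are already available as bit sets indexed by document number, and the names CategoryCounts and count are hypothetical.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: bit i is set when document i matches the query or category.
class CategoryCounts {

    /** Returns, per category name, the number of query hits that also lie in that category. */
    static Map<String, Integer> count(BitSet queryBits, Map<String, BitSet> categoryBits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> entry : categoryBits.entrySet()) {
            // BitSet.and() modifies the receiver, so work on a copy of the query bits.
            BitSet overlap = (BitSet) queryBits.clone();
            overlap.and(entry.getValue());
            counts.put(entry.getKey(), overlap.cardinality());
        }
        return counts;
    }
}
```

The category bit sets can be built once and reused across queries; only the clone and the AND are repeated per query and per category.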

RE: Aggregating category hits

2006-05-16 Thread Ramana Jelda
But this BitSet strategy consumes more memory, mainly if you have millions of documents and thousands of categories. So I preferred the FieldCache strategy in my project. Jelda > -Original Message- > From: Kapil Chhabra [mailto:[EMAIL PROTECTED] > Sent: Tuesday, May 16, 2006 7:38 A

Analyzer which distributes tokens to many fields

2006-05-16 Thread Mathias Keilbach
Hi! I'm going to create a small application with Lucene, which analyzes different strings. While analyzing the strings, patterns (like emails or URLs) shall be sorted out and saved in a separate index field. I'm not sure if I can handle this with a self-implemented Analyzer class. Afaik you can't