Re: Index types

2008-08-27 Thread John Patterson
index > year, month and day or B) generate your own filter which used the > (cached) timestamp from a stored field or C) use solr (which contains > timestamp-range implementation out of the box). > > Best regards >Karsten > > > John Patterson wrote: >>

Index types

2008-08-27 Thread John Patterson
Hi, I know that Lucene uses an inverted index, which makes range queries and greater-than/less-than queries very slow for continuous data types like times, latitude, etc. Last time I looked, they were converted into huge OR queries and so had a maximum clause limit. I was wondering if any wor

Re: Deleted document terms

2008-08-26 Thread John Patterson
r the deletion as > below. > > document.add(new Field("id", id, Field.Store.YES, > *Field.Index.TOKENIZED*)); > > > > Thanks > > > > On Tue, Aug 26, 2008 at 2:15 PM, Michael McCandless < > [EMAIL PROTECTED]> wrote: > >> >> >>

Deleted document terms

2008-08-26 Thread John Patterson
Hi, I just discovered some strange behaviour with deleted documents. I do a search for documents with a certain query and delete one using IndexWriter.deleteDocuments(Term) using a key for the term. Then I repeat the search and the document is still there because I use a custom HitCollector whi

Re: Listing fields in an index

2008-08-13 Thread John Patterson
Thanks! I was looking in IndexReader for a good couple of minutes and didn't see that! Erik Hatcher wrote: > > > On Aug 13, 2008, at 5:02 AM, John Patterson wrote: >> How do I list all the fields in an index? Some documents do not >> contain all >> fields.

Listing fields in an index

2008-08-13 Thread John Patterson
Hi, How do I list all the fields in an index? Some documents do not contain all fields. Thanks, John -- View this message in context: http://www.nabble.com/Listing-fields-in-an-index-tp18959436p18959436.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. -

Re: Mixing non scored an scored queries

2008-07-16 Thread John Patterson
Karl Wettin wrote: > > > After sleeping on this it hit me that you might be able to save a bit > of CPU ticks by decorating queries and bypassing the scorer rather > than evaluating the score and then multiply it with 0. Probably not > too much though. Not much but might be worth mention

Re: Mixing non scored an scored queries

2008-07-15 Thread John Patterson
Karl Wettin wrote: > > >> Or just set the boost to zero on the individual filter fields, or on >> the whole filter expression. >> >> +(my query) +(filter1 OR filter2 AND filter3)^0 > > That sounds perfect! I thought that boosts would be multiplied together to give 0 for the whole expressio

Re: Mixing non scored an scored queries

2008-07-15 Thread John Patterson
Karl Wettin wrote: > > > Feel free to post it as an issue in the Jira when it's implemented. > > Thanks a lot! Will do John

Re: Mixing non scored an scored queries

2008-07-15 Thread John Patterson
Karl Wettin wrote: > > > I think all you need to do is to create a custom query (sounds like > you want a clone of TermQuery) that uses a Scorer that always return 1f. > > Actually, I just thought that it would probably be better to create an adapter Query that always returns a constant s

Re: Mixing non scored an scored queries

2008-07-15 Thread John Patterson
John Patterson wrote: > > > I don't think filters are the way to go here because I need to use boolean > style logic e.g. > > Search for free text "open fire" restricted to "London" OR "Brighton" in > category "Pubs and bars"

Re: Mixing non scored an scored queries

2008-07-15 Thread John Patterson
eks dev wrote: > > do not forget that Filter does not have to be loaded in memory, not any > more since LUCENE-584 commit! Now it is only skipping iterator what you > need. > > > translated, you could use: > ConstantScoreQuery created with Filter made from TermDocs (you need to > implement on

Re: Mixing non scored an scored queries

2008-07-15 Thread John Patterson
Karl Wettin wrote: > > I think all you need to do is to create a custom query (sounds like > you want a clone of TermQuery) that uses a Scorer that always return 1f. > That sounds exactly like what is required. I imagine that would be quite useful to have in the core project?

Re: Mixing non scored an scored queries

2008-07-15 Thread John Patterson
Erick Erickson wrote: > > No, you create the filter via TermDocs/TermEnum. You can also cache > them. Creating filters is *much* faster than you think . > But I can have many terms in the query. With over 10 million documents and many concurrent searches, creating a filter for every search w

Re: Mixing non scored an scored queries

2008-07-15 Thread John Patterson
Erick Erickson wrote: > > One way would be to create Filters and add them in with > I could possibly wrap the standard BooleanQuery in an adapter which also wraps its Weight and Scorer to return a constant value. But that seems like a hell of a lot of internal jiggery pokery for something th

Re: Mixing non scored an scored queries

2008-07-15 Thread John Patterson
Erick Erickson wrote: > > One way would be to create Filters and add them in with > ConstantScoreRangeQuery > Would that mean running the query twice? i.e. once to create the filter and once to rank the results?

Mixing non scored an scored queries

2008-07-15 Thread John Patterson
Hi, I have a number of fields that are used to filter documents from a search. They should not contribute to the score of the document but merely decide which documents are valid. i.e. it doesn't matter how rare they are in the index. I also have a single "combined" field that is used for free

Re: Sorted Index

2007-10-26 Thread John Patterson
Yonik Seeley wrote: > > On 10/26/07, John Patterson <[EMAIL PROTECTED]> wrote: > Most things in an inverted index are sorted (terms, matching document > ids, term positions within a field, etc). Can you be more specific > about what you are trying to accomplish? > S

Sorted Index

2007-10-26 Thread John Patterson
Hi, What's the best way to maintain an index that is sorted?

Re: Cache BitSet or doc number?

2007-10-26 Thread John Patterson
Thom Nelson wrote: > > Check out the HashDocSet from Solr, this is the best way to cache small > sets of search results. In general, the Solr BitSet/DocSet classes are > more efficient than using the standard java.util.BitSet. You can use > these independent of the rest of Solr (though I r

Re: Exit a search when have enough results

2007-10-26 Thread John Patterson
Yonik Seeley wrote: > > The easiest way would be to throw an exception from a custom hit > collector (and then catch it yourself and continue). > Cheers, I wonder if the performance penalty from throwing an exception is worth it?
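The exception-based early exit suggested above could look roughly like the sketch below. It does not use the Lucene API: `DocCollector`, `search` and `EnoughHitsException` are hypothetical stand-ins for a hit collector, the search loop and the abort signal. Disabling stack-trace capture in the exception constructor is one way to reduce the cost the reply worries about; how much it matters in practice would need measuring.

```java
import java.util.ArrayList;
import java.util.List;

public class EarlyExitSketch {
    /** Hypothetical stand-in for a hit collector callback; not the Lucene class. */
    interface DocCollector {
        void collect(int doc);
    }

    /** Abort signal. Suppression and stack-trace capture are disabled so the throw stays cheap. */
    static final class EnoughHitsException extends RuntimeException {
        EnoughHitsException() {
            super("enough hits", null, false, false);
        }
    }

    /** Hypothetical search loop: passes every matching doc id to the collector. */
    static void search(int[] matchingDocs, DocCollector collector) {
        for (int doc : matchingDocs) {
            collector.collect(doc);
        }
    }

    /** Returns at most {@code limit} doc ids, aborting the search once the limit is reached. */
    static List<Integer> firstHits(int[] matchingDocs, int limit) {
        List<Integer> hits = new ArrayList<>();
        if (limit <= 0) {
            return hits;
        }
        try {
            search(matchingDocs, doc -> {
                hits.add(doc);
                if (hits.size() >= limit) {
                    throw new EnoughHitsException();
                }
            });
        } catch (EnoughHitsException expected) {
            // Expected when the limit is reached; the partial result is what we want.
        }
        return hits;
    }
}
```

The exception is thrown at most once per search, so its cost is paid once rather than per hit.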

Cache BitSet or doc number?

2007-10-26 Thread John Patterson
Hi, I am thinking about caching search results for common queries and just want to check that for small numbers of results it would be better to store the doc numbers as ints or shorts than to store a Filter with a BitSet. I guess if your results contain fewer than 1/32 or 1/16 of the number of doc
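The 1/32 and 1/16 figures follow from simple size arithmetic: a bit set needs one bit per document in the index, while a list of doc numbers needs 32 bits per hit as ints or 16 bits per hit as shorts. A rough sketch of that comparison follows; the class and method names are illustrative only, and object headers and array overhead are ignored.

```java
public class DocSetSizeSketch {
    /** Bytes for a bit set with one bit per document in the index. */
    static long bitSetBytes(long maxDoc) {
        return (maxDoc + 7) / 8;
    }

    /** Bytes for storing each hit as a 4-byte int. */
    static long intArrayBytes(long hits) {
        return 4 * hits;
    }

    /** Bytes for storing each hit as a 2-byte short (only usable if doc numbers fit in 16 bits). */
    static long shortArrayBytes(long hits) {
        return 2 * hits;
    }

    /** True if the int list is strictly smaller than the bit set, i.e. hits < maxDoc / 32. */
    static boolean intArrayIsSmaller(long hits, long maxDoc) {
        return intArrayBytes(hits) < bitSetBytes(maxDoc);
    }
}
```

For an index of 10,000,000 documents the bit set takes 1,250,000 bytes, so the int list wins only below 312,500 hits.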

Exit a search when have enough results

2007-10-26 Thread John Patterson
Hi, I am doing a simple conjunction search for documents that do not need to be scored or sorted and was wondering if there is a way to stop the search from a hit collector when I have enough hits? I guess I am after a hit collector that can return a boolean determining if the search should cont

Non scoring search

2005-12-06 Thread John Patterson
Hi, I was wondering if there is a standard way to retrieve documents WITHOUT scoring and sorting them. I need a list of documents that contain certain terms but I do not need them sorted or scored. Looking at the source it appears that I can use the TermDocs directly and write a method similar

Re: NumberTools

2005-03-22 Thread John Patterson
Doug Cutting apache.org> writes: > I'd like to see benchmarks that demonstrate the improvement before we > consider including such a patch. You're making a lot of assumptions > about where time is spent performing numeric searching and sorting. > Sort and RangeFilter are already pretty effici

Re: NumberTools

2005-03-22 Thread John Patterson
Chris Hostetter fucit.org> writes: > I haven't worked through the math to prove to myself that your algorithm > is a viable way of expressing any Integer as a 4 byte String; such that > any two Integers sort lexicographically correct as strings ... but let's > assume that i have, and that it works

Re: NumberTools

2005-03-21 Thread John Patterson
Chris Hostetter fucit.org> writes: > > So why couldn't a user specified NumberFormat object be used to > convert that string into an Integer? Allowing people to format > their numbers in a way that sorts lexicographically for Range Filters, > but still get the good Numeric Sot

Re: NumberTools

2005-03-18 Thread John Patterson
Erik Hatcher ehatchersolutions.com> writes: > Lucene's index works with any String. But, when dealing with numbers > and dates such that range queries work, they need to be formatted in a > way that makes them orderable. What I am suggesting here is storing numeric values as unsigned binary v
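The message above is cut off, so the exact scheme being proposed is not shown here. As one illustration of the general idea (a fixed-width binary string whose lexicographic order matches numeric order), the sketch below flips the sign bit and packs the 32 bits into five chars of 7 bits each. The 7-bit packing and all names are assumptions made for this example; they keep every char in the ASCII range and are not the 4-byte form discussed in the thread.

```java
public class SortableIntSketch {
    /** Encodes an int as a 5-char string whose char-by-char order matches numeric order. */
    static String encode(int value) {
        // Flipping the sign bit maps Integer.MIN_VALUE..MAX_VALUE onto 0..2^32-1 in order.
        long unsigned = (value ^ 0x80000000) & 0xFFFFFFFFL;
        char[] out = new char[5];
        for (int i = 0; i < 5; i++) {
            int shift = 7 * (4 - i); // most significant 7-bit group first
            out[i] = (char) ((unsigned >>> shift) & 0x7F);
        }
        return new String(out);
    }

    /** Reverses {@link #encode(int)}. */
    static int decode(String encoded) {
        long unsigned = 0;
        for (int i = 0; i < 5; i++) {
            unsigned = (unsigned << 7) | encoded.charAt(i);
        }
        return ((int) unsigned) ^ 0x80000000;
    }
}
```

The resulting strings contain control characters, so they suit indexed-only terms rather than anything meant to be displayed.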

Re: NumberTools

2005-03-18 Thread John Patterson
> Because Lucene deals with String's lexicographically ordered. I thought lexicographical ordering simply used the Unicode value of the chars and so would also work with non alpha-numeric strings. > Is there an issue you're encountering? No issue - I will soon need to add a lot of unstored numeric

NumberTools

2005-03-18 Thread John Patterson
Hi all, I was wondering why NumberTools and DateTools create strings restricted to alpha-numeric values? John.