My Category Search Problem
Hi Lucene Users!

I've been playing around with dotLucene on a few projects for about 4 months, and I've found Lucene to be exceptionally powerful, fast and, thanks to LIA (Lucene in Action), really easy to use. But I've hit a requirement that I fear will cause performance problems for our architecture and Lucene installation.

We have an index of about 100,000 documents with about 30 fields, built from our database. Each document has a tokenized string field of Category Names, so a document can belong to many categories.

We have a new requirement: not only to allow searches across the whole index, but also to return the number of matching documents in each of the 150 possible categories. This is like an Amazon search (http://amazon.com/s/ref=nb_ss_gw/105-0072880-3737226?url=search-alias%3Daps&field-keywords=diamond&Go.x=0&Go.y=0&Go=Go), where a category list with the number of results in each category is shown on the left.

So far, I can think of two possible ways to implement this:

1. Create a QueryFilter for the user-entered query, and run a category field search for each category using that filter.
2. Create a separate index for each category, and search across all the indexes sequentially (or concurrently).

Does anyone know which solution is better? Both seem taxing to me because they both involve "number of categories + 1" searches.

Regards,
-V
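The following is a minimal sketch of the counting step behind option 1. It is written in plain Python, not against the dotLucene API. If the user's query and each category are both available as sets of matching document ids (which is what a filter's bitset represents), then each category count is the size of an intersection. The category bitsets depend only on the index, not on the query. The sketch therefore assumes they are computed once and cached, so each request costs one search plus 150 bitset ANDs rather than 151 searches. The function names and toy data are hypothetical.

```python
# Sketch: per-category hit counts by intersecting doc-id bitsets.
# Python ints stand in for bitsets (bit i set means doc i matches).
# Assumption: category bitsets are built once per index snapshot and cached.

def to_bits(doc_ids):
    """Build an int bitset from an iterable of document ids."""
    bits = 0
    for doc_id in doc_ids:
        bits |= 1 << doc_id
    return bits


def build_category_bits(doc_categories):
    """Map each category name to the bitset of documents tagged with it.

    doc_categories: list where the index is the doc id and the value is
    that document's list of category names.
    """
    cat_bits = {}
    for doc_id, cats in enumerate(doc_categories):
        for cat in cats:
            cat_bits[cat] = cat_bits.get(cat, 0) | (1 << doc_id)
    return cat_bits


def category_counts(query_bits, cat_bits):
    """Count query hits per category as popcount(query AND category).

    Categories with zero hits are omitted.
    """
    counts = {}
    for cat, bits in cat_bits.items():
        n = bin(query_bits & bits).count("1")
        if n:
            counts[cat] = n
    return counts


if __name__ == "__main__":
    docs = [["Rings", "Gold"], ["Rings"], ["Watches"], ["Gold", "Watches"]]
    cat_bits = build_category_bits(docs)
    hits = to_bits([0, 1, 3])  # docs matched by the user's query
    print(category_counts(hits, cat_bits))  # {'Rings': 2, 'Gold': 2, 'Watches': 1}
```

The per-category step is a single AND plus a bit count over about 100,000 bits. That is the main reason caching the category sets could be cheaper than re-running a search per category.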
Multiple time ranges in a document
Hello,

I'm using a RangeFilter to find "Event" documents (with Start and End date fields in a Lucene-friendly format) that match a user's time-range query. This works perfectly, with sub-second response times under decent load, but I'm having trouble searching multiple performances within a single document. Indexing them is no problem, because I can add extra terms to the start and end fields.

Here's a situation that doesn't work too well with the RangeFilter. Let's say a comedian has a regular gig every Monday for the next 3 weeks, from 7pm to 9pm. So the start field will be 200702191900, 200702261900, 200703051900, and the end field will be 200702192100, 200702262100, 200703052100.

If someone searches for an event on a Thursday, at any time during his 3-week stint, the comedian's event will show up. This is because the RangeFilter will consider the lowest term of the start field and the highest term of the end field. Sorting by the start or end fields will also break, but I could write my own SortComparatorSource to fix that.

How could I get around the filter problem? I could write my own filter, but it would need to keep track of both fields and of the position of each term within its field.

Thanks for your help,
-Vijay
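Here is a minimal sketch in plain Python (not the Lucene Filter API). It shows why independent range checks on the two fields misfire, and what a filter that pairs the fields by term position would check instead. It assumes that the i-th start term and the i-th end term belong to the same performance. It also compares the yyyyMMddHHmm timestamps as integers, which preserves chronological order. The Thursday query below is 22 February 2007.

```python
# Sketch: matching multi-performance events against a time-range query.
# Assumption: starts[i] and ends[i] describe the same performance.

STARTS = [200702191900, 200702261900, 200703051900]
ENDS = [200702192100, 200702262100, 200703052100]


def matches_independent(starts, ends, q_start, q_end):
    """Each field checked on its own, as two separate range filters would:
    some start <= q_end AND some (possibly different) end >= q_start."""
    return any(s <= q_end for s in starts) and any(e >= q_start for e in ends)


def matches_paired(starts, ends, q_start, q_end):
    """Pair the i-th start with the i-th end and require that a single
    performance [s, e] overlaps the query range [q_start, q_end]."""
    return any(s <= q_end and e >= q_start for s, e in zip(starts, ends))


if __name__ == "__main__":
    thursday = (200702220000, 200702222359)
    print(matches_independent(STARTS, ENDS, *thursday))  # True (false positive)
    print(matches_paired(STARTS, ENDS, *thursday))       # False
```

The paired check is the per-document test that a custom filter tracking term positions would need to apply. The independent check reproduces the problem described above.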
Exact field searches
Hi Guys,

Currently I construct a PrefixQuery to perform exact searches over an index of documents that represent compact discs, something like www.discogs.com. On the search page, we offer a suggestion list as the user types, like Google Suggest. When a user selects an item from this list, we mark the search as an "exact" search, because they know what they want. An exact search wraps the name of the disc in a PrefixQuery and performs the search.

But I'm getting some unwanted results, and I'm not sure which approach to take. In our dataset, there are hundreds of CDs with single-word English titles, like "Pink", "Dust" and "Walk". If the user selects "Pink" from the suggestion list, then CDs with titles like "Pink Sunset", "A Pink lady", "Pink McPinkington" and "Tomorrow the Pink" appear in the results (along with the CDs titled just "Pink").

Obviously, the PhraseQuery finds instances of that phrase in the title field, but I need to somehow exclude titles that have a different number of tokens from the query. How do I search for a specific number of tokens in a field?

Thanks for your help,
Vijay Santhanam B.Eng.(Soft.)
Spectrum Wired - Software Engineer
T: +61 2 4925 3266
F: +61 2 4925 3255
M: +61 407 525 087
W: www.spectrumwired.com
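One possible approach is sketched below in plain Python. It is not taken from any Lucene API. The idea is to store each title's token count alongside the title (for example, in an extra un-tokenized field, hypothetically named `title_len` here). An exact search then requires both the phrase match and an equal count. The lowercase-and-whitespace tokenizer is a stand-in for whatever analyzer the index actually uses. For the counts to agree, the same analyzer must produce them at index time and at query time.

```python
# Sketch: "exact" title search as phrase match + equal token count.
# Assumptions: tokenize() stands in for the real analyzer, and title_len
# is a hypothetical extra field stored with each document.

def tokenize(text):
    """Stand-in analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def contains_phrase(tokens, phrase):
    """True if the phrase tokens appear consecutively (PhraseQuery-like)."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def index_title(title):
    """Build a toy index record holding the tokens and their count."""
    tokens = tokenize(title)
    return {"title": title, "tokens": tokens, "title_len": len(tokens)}


def exact_search(records, query):
    """Phrase match restricted to titles with the same number of tokens."""
    q = tokenize(query)
    return [r["title"] for r in records
            if contains_phrase(r["tokens"], q) and r["title_len"] == len(q)]


TITLES = ["Pink", "Pink Sunset", "A Pink lady", "Pink McPinkington",
          "Tomorrow the Pink"]

if __name__ == "__main__":
    records = [index_title(t) for t in TITLES]
    print(exact_search(records, "Pink"))  # ['Pink']
```

In index terms, the count check would become one extra required clause (a term on the count field) combined with the existing phrase query. This is an assumption about how one might wire it up, not a built-in Lucene feature.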
Exact searches with PhraseQuery
Hi Guys,

For some reason, I said I was using "PrefixQuery" for exact queries. What I meant to say was PhraseQuery... but the editor between my brain and my fingers had gone home.

The TermQuery idea may be the simplest solution, because I already store the name un-tokenized for sorting purposes. Otherwise, between SpanFirstQuery, RegexQuery and the many other solutions at http://www.nabble.com/Search-for-docs-containing-only-a-certain-word-in-a-specified-field--tf3655925.html, I should have a good solution.

Thanks for your help, Guys!
Vijay Santhanam
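The sketch below, again in plain Python with hypothetical names, shows why a TermQuery against the un-tokenized name gives exact matches. The whole stored value is a single term, so "Pink" can only match a title that is exactly "Pink". Because that value is not analyzed, the lookup is also case- and whitespace-sensitive. The query string therefore has to be the stored title verbatim (presumably what the suggestion list supplies).

```python
# Sketch: exact lookup on an un-tokenized (whole-value) title field.
# Each full title is one term, mirroring a TermQuery on that field.
from collections import defaultdict


def build_keyword_index(titles):
    """Map each whole, unanalyzed title to the doc ids that carry it."""
    index = defaultdict(list)
    for doc_id, title in enumerate(titles):
        index[title].append(doc_id)
    return index


def term_lookup(index, value):
    """Exact whole-value match; .get() avoids creating empty entries."""
    return index.get(value, [])


if __name__ == "__main__":
    idx = build_keyword_index(["Pink", "Pink Sunset", "Dust", "Pink"])
    print(term_lookup(idx, "Pink"))  # [0, 3]
    print(term_lookup(idx, "pink"))  # [] (no analysis, so case matters)
```

If case-insensitive exact matching were needed, one option would be a second un-tokenized field holding a lowercased copy of the title, queried with a lowercased term.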