For queries like these, which use keywords rather than analyzed tokens, I would create the appropriate TermQuery instances directly and combine them with a BooleanQuery. QueryParser is normally not meant for program-internal queries; it is intended for queries that a user has entered. For your use case, it seems better to construct the right Query objects directly in your Java code.
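To make the idea concrete, here is a rough, library-free sketch of what "one exact term per field, all of them required" means. FieldTerm and AllRequired are hypothetical stand-ins I made up for this mail, not Lucene classes; in real code their roles would be played by TermQuery and by a BooleanQuery whose clauses are added with BooleanClause.Occur.MUST. The field names and values are taken from the examples in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a TermQuery: one exact, un-analyzed value in one field. */
final class FieldTerm {
    final String field;
    final String value;

    FieldTerm(String field, String value) {
        this.field = field;
        this.value = value;
    }

    /** Exact comparison: no lowercasing or stemming is applied to either side. */
    boolean matches(Map<String, String> doc) {
        return value.equals(doc.get(field));
    }
}

/** Hypothetical stand-in for a BooleanQuery in which every clause is required. */
final class AllRequired {
    private final List<FieldTerm> clauses = new ArrayList<>();

    /** Adds a required clause and returns this object so calls can be chained. */
    AllRequired must(String field, String value) {
        clauses.add(new FieldTerm(field, value));
        return this;
    }

    /** A document matches only if every clause matches; an empty query matches nothing here. */
    boolean matches(Map<String, String> doc) {
        if (clauses.isEmpty()) {
            return false;
        }
        for (FieldTerm clause : clauses) {
            if (!clause.matches(doc)) {
                return false;
            }
        }
        return true;
    }

    /** Renders the clauses in the "+field:value" form used elsewhere in this thread. */
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (FieldTerm clause : clauses) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('+').append(clause.field).append(':').append(clause.value);
        }
        return sb.toString();
    }
}
```

Building new AllRequired().must("material", "leather").must("gender", "female") corresponds to +material:leather +gender:female. Because the comparison is exact, a stored value of "Leather" would not match a query value of "leather", so keyword values need to be normalized the same way at index time and at query time.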
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Monday, May 04, 2009 9:51 PM
> To: java-user@lucene.apache.org
> Subject: Re: multi-field index and search (Not MultiFieldQuery). Help
> setting up index and search
>
> MultiFieldQuery essentially (if I have this right) forms a "cross
> product". I.e. it is NOT required to specify specific values for
> discrete fields. MFQ helps form queries expressing something like
> "does any term appear in any field in a hit" or "Does every term
> appear in some field of a hit, regardless of which field and not
> necessarily the same field" (depending upon whether the default
> operator is OR or AND). You can get something of the same effect by
> creating a special field that is the concatenation of all the other
> fields and searching that concatenated field with and/or (except that
> MFQ does interesting things with boosting).
>
> But if you know exactly what terms you require in which field, the
> standard query parser is fine. i.e. +material:leather +gender:female
> will look for "leather" ONLY in material and "female" ONLY in gender.
>
> HTH
> Erick.
>
> P.S. A tip for you: If anything I say contradicts something Paul says,
> listen to Paul <G>...
>
>
> On Mon, May 4, 2009 at 3:33 PM, Christian Bongiorno
> <christ...@bongiorno.org> wrote:
>
> > Yeah, you definitely got the idea. You're the second person to
> > recommend putting each item in its own document and just store the
> > HTS code (which is easy for me). The HTS code actually comes with no
> > extra info. I mean, there is info, but we don't store any of it.
> >
> > I will try as you and Paul have recommended. Once done, then I would
> > need a MultiFieldQuery? Forgive me but the queries confuse me.
> >
> > Rebuilding my index will take some time, but I appreciate everyone's
> > help
> >
> > Christian
> >
> > On Mon, May 4, 2009 at 11:40 AM, Erick Erickson
> > <erickerick...@gmail.com> wrote:
> >
> > > Hmmmm, tricky. Let's see if I understand your problem.
> > >
> > > Basically, you have a bunch of HTSs that have had
> > > some number of items arbitrarily assigned to them, and
> > > you want to see if you can make Lucene behave as a kind
> > > of expert system to help you classify the next item.
> > >
> > > I *think* you'd get better results by indexing each item
> > > along with its HTS code as a separate document. Because
> > > what you really want to ask is "given the attributes of my
> > > new item, what other item is 'most similar' to it" and then
> > > present the HTSs from these items to the classifier
> > > (perhaps a person?).
> > >
> > > I'm going to assume further that the HTS code has
> > > some data associated with it that describes the
> > > class, and that these need to be available to
> > > the user to see if your suggestions are appropriate.
> > > You could either index the HTSs in another index
> > > OR index them in the same index but simply store
> > > the data (don't index it) and the HTS documents won't
> > > interfere with your searches on "similar items".
> > >
> > > Mostly, this is just trying to see if I understand what
> > > you're trying to accomplish. This may be gibberish, but
> > > it's a start <G>.
> > >
> > > Best
> > > Erick
> > >
> > >
> > > On Mon, May 4, 2009 at 1:16 PM, Christian Bongiorno
> > > <christ...@bongiorno.org> wrote:
> > >
> > > > I am trying to build a search (have been experimenting with using
> > > > Lucene) and someone suggested contacting your team
> > > >
> > > > Background:
> > > > Currently the service I am working on applies taxing/duties to
> > > > products for international shipping by looking up something called
> > > > an HTS code (a universally recognized taxation code for
> > > > duty/tariff). We already have almost a million items classified by
> > > > HTS code. As many as 50k items fall into the same HTS code.
> > > >
> > > > For purposes of HTS classification, Description is only important
> > > > if no other field exists. But taxation is based on things like
> > > > material (leather, cloth, etc) and product (shoes/bags/toys). Color
> > > > is of fair relevancy as well (to a customs official black boots or
> > > > brown make no difference; it wasn't made here so it must be taxed)
> > > >
> > > > The idea is to turn our entire existing knowledge base into an
> > > > index, then when we get a new item that needs classification, we
> > > > "search" for the "Document(hts)" that best matches by using the new
> > > > item attributes for the item to be classified as the search query.
> > > >
> > > > The document structure, as I see it, should be:
> > > >
> > > > Document(HTS) -> {{ASIN1: {Key,value},{Key,value},.}, {ASIN2:
> > > > {Key,value},{Key,value},.} .}
> > > >
> > > > There are 1788 documents. Up to 50k ASINs and their attributes may
> > > > fall into a single document.
> > > >
> > > > On some fields, they are straightforward and very good indicators
> > > > of match. Such as
> > > >
> > > > Material -> "leather"
> > > > Gender -> "women"
> > > >
> > > > Others are fuzzier
> > > >
> > > > Description -> "Stylish full calf leather boots.
> > > > Sleek Italian leather, designer"
> > > >
> > > > So for a query of:
> > > > "Material" -> "Leather"
> > > > "Gender" -> "womAn"
> > > > "Description" -> "Short leather shoes, Made in Denmark"
> > > >
> > > > I would expect a very high match here since the first 2 fields,
> > > > which don't vary much, are good indicators for HTS.
> > > >
> > > > I have searched through the archives and I don't see anything like
> > > > what I am looking for.
> > > >
> > > > Basically, every item will have attributes which I am treating as
> > > > "Field(item.key, item.value)". I think that's the right approach
> > > > but multi-field query queries your terms across all fields in the
> > > > search. That isn't what I need. I very clearly know my fields and
> > > > values and that should give me enormous leverage when querying if
> > > > I could build a query to do that
> > > >
> > > >
> > > > Christian
> > > >
> > > > --
> > > > Christian Bongiorno
> > > >
> > >
> >
> > --
> > Christian Bongiorno
> >