MultiFieldQuery essentially (if I have this right) forms a "cross product". I.e. it is NOT required to specify specific values for discrete fields. MFQ helps form queries expressing something like "does any term appear in any field in a hit" or "Does every term appear in some field of a hit, regardless of which field and not necessarily the same field" (depending upon whether the default operator is OR or AND). You can get something of the same effect by creating a special field that is the concatenation of all the other fields and searching that concatenated field with and/or (except that MFQ does interesting things with boosting).
But if you know exactly what terms you require in which field, the standard query parser is fine. i.e. +material:leather +gender:female will look for "leather" ONLY in material and "female" ONLY in gender. HTH Erick. P.S. A tip for you: If anything I say contradicts something Paul says, listen to Paul <G>... On Mon, May 4, 2009 at 3:33 PM, Christian Bongiorno <christ...@bongiorno.org > wrote: > Yeah, you definitely got the idea. You're the second person to recommend > putting each item in it's own document and just store the HTS code (which > is > easy for me). The HTS code actually comes with no extra info. I mean, there > is info, but we don't store any of it. > > I will try as you and Paul have recommended. Once done, then I would need a > MultiFieldQuery? Forgive me but the queries confuse me. > > Rebuilding my index will take some time, but I appreciate everyone's help > > Christian > > On Mon, May 4, 2009 at 11:40 AM, Erick Erickson <erickerick...@gmail.com > >wrote: > > > Hmmmm, tricky. Let's see if I understand your problem. > > > > Basically, you have a bunch of HSTs that have had > > some number of items arbitrarily assigned to them, and > > you want to see if you can make Lucene behave as a kind > > of expert system to help you classify the next item. > > > > I *think* you'd get better results by indexing each item > > along with its HST code as a separate document. Because > > what you really want to ask is "given the attributes of my > > new item, what other item is "most similar" to it and then > > present the HSTs from these items to the classifier > > (perhaps a person?). > > > > I'm going to assume further that the HST code has > > some data associated with it that describes the > > class, and that these need to be available to > > the user to see if your suggestions are appropriate. > > You could either index the HSTs in another index > > OR index them in the same index but simply store > > the data (don't index it) and the HST documents won't > > interfere with your searches on "similar items". > > > > Mostly, this is just trying to see if I understand what > > you're trying to accomplish. This may be gibberish, but > > it's a start <G>. > > > > Best > > Erick > > > > > > On Mon, May 4, 2009 at 1:16 PM, Christian Bongiorno < > > christ...@bongiorno.org > > > wrote: > > > > > I am trying to build a search (have been experimenting with using > Lucene) > > > and someone suggested contacting your team > > > > > > Background: > > > Currently the service I am working on applies taxing/duties to products > > for > > > international shipping by looking up something called an HTS code (a > > > universally recognized taxation code for duty/tariff). We already have > > > almost a million items classified by HTS code. As many as 50k items > fall > > > into the same HTS code. > > > > > > For purposes of HTS classification > > > Description is only important if no other field exists. But taxation is > > > based on things like material (leather, cloth, etc) and product > > > (shoes/bags/toys). Color is of fair relevancy as well (to a customs > > > official > > > black boots or brown make no difference; it wasn’t made here so it must > > be > > > taxed) > > > > > > The idea is to turn our entire existing knowledge base into an index, > > then > > > when we get a new item that needs classification, we “search” for the > > > “Document(hts)” that best matches by using the new item attributes for > > the > > > item to be classified as the search query. > > > > > > The document structure, as I see it, should be: > > > > > > Document(HTS) -> {{ASIN1: {Key,value},{Key,value},…}, {ASIN2: > > > {Key,value},{Key,value},…} …} > > > > > > There are 1788 documents. Up to 50k ASINs and their attributes may fall > > > into > > > a single document. > > > > > > On some fields, they are straightforward and very good indicators of > > match. > > > Such as > > > > > > Material -> “leather” > > > Gender -> “women” > > > > > > Others are fuzzier > > > > > > Description -> “Stylish full calf leather boots. Sleek Italian leather, > > > designer” > > > > > > So for a query of: > > > “Material” -> ”Leather” > > > “Gender” -> ”womAn” > > > “Description” -> ”Short leather shoes, Made in Denmark” > > > > > > I would expect a very high match here since the first 2 fields, which > > don’t > > > vary much, are good indicators for HTS. > > > > > > I have searched through the archives and I don't see anything like what > I > > > am > > > looking for. > > > > > > Basically, every item will have attributes which I am treating as > > > "Field(item.key, item.value)". I think that's the right approach but > > > multi-field query queries your terms across all fields in the search. > > That > > > isn't what I need. I very clearly know my fields and values and that > > should > > > give me enormous leverage when querying if I could build a query to do > > that > > > > > > > > > Christian > > > > > > -- > > > Christian Bongiorno > > > > > > > > > -- > Christian Bongiorno >