I added this under Use Cases. Thanks for the suggestion. Peter
On 8/13/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > > There is also a Use Cases item on the Wiki... > > On Aug 13, 2007, at 3:26 PM, Peter Keegan wrote: > > > I suppose it could go under performance or HowTo/Interesting uses of > > SpanQuery. > > > > Peter > > > > On 8/13/07, Erick Erickson <[EMAIL PROTECTED]> wrote: > >> > >> Thanks for writing this up. Do you think this is an appropriate > >> subject > >> for the Wiki performance page? > >> > >> Erick > >> > >> On 8/13/07, Peter Keegan <[EMAIL PROTECTED]> wrote: > >>> > >>> I've been experimenting with using SpanQuery to perform what is > >>> essentially > >>> a limited type of database 'join'. Each document in the index > >>> contains 1 > >>> or > >>> more 'rows' of meta data from another 'table'. The meta data are > >>> simple > >>> tokens representing a column name/value pair ( e.g. color$red or > >>> location$123). Each row is represented by a span with a maximum > >>> token > >>> length equal to the maximum number of meta data columns. If a > >>> column has > >>> multiple values, they are all indexed at the same position ( e.g. > >>> color$red, > >>> color$blue). All rows are added to a single field. The spans are > >>> 'separated' > >>> from each other by introducing a position gap between them via ' > >>> Analyzer.getPositionIncrementGap'. This gap should be greater > >>> than the > >>> number of columns in each span. > >>> > >>> At query time, a SpanNearQuery is constructed to represent the > >>> meta data > >>> to > >>> join. The 'slop' value is set to the maximum number of meta data > >>> columns > >>> (minus 1). Using a simple Antlr parser, boolean span queries with > >>> AND, > >> OR, > >>> NOT can be constructed fairly easily. The SpanQuery is And'd to > >>> the main > >>> query to build the final query. > >>> > >>> This approach is flexible and pretty efficient because no stored > >>> fields > >> or > >>> external data are accessed at query time. Span queries are more > >> expensive > >>> compared than other queries, though. We measure performance via > >> throughput > >>> (as opposed to the response time for a single query), and the > >>> addition > >> of > >>> a > >>> SpanQuery reduced throughput by 5X for ordered spans and 10X for > >> unordered > >>> spans. Still, this may be acceptable for some applications, > >>> especially > >> if > >>> spans are not used on every query. > >>> > >>> I thought this might interest some of you. > >>> > >>> Peter > >>> > >> > > -------------------------- > Grant Ingersoll > http://lucene.grantingersoll.com > > Lucene Helpful Hints: > http://wiki.apache.org/lucene-java/BasicsOfPerformance > http://wiki.apache.org/lucene-java/LuceneFAQ > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > >