Hi Rob,

While the demo code uses a fixed number of 3 values, you don't need to encode the number of values up front. Since you read the byte[] of a document up front, you can read in a while loop as long as in.position() < in.length().
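Here is a minimal sketch of the no-count idea. It uses only JDK classes so it runs standalone, and it assumes non-negative values. The values are written back to back as variable-length longs (7 payload bits per byte, with the high bit meaning "more bytes follow", the same scheme Lucene's DataOutput.writeVLong uses). Decoding loops until the read position reaches the end of the array. MultiValueCodec is a hypothetical name. In Lucene code you would more likely use ByteArrayDataOutput and ByteArrayDataInput, where eof() or getPosition() gives you the loop condition, and index the result with a BinaryDocValuesField.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: packs a variable number of non-negative longs into a
 * byte[] (e.g. the payload of a BinaryDocValues field) without a count prefix.
 */
final class MultiValueCodec {

  private MultiValueCodec() {}

  /** Writes each value as a variable-length long: 7 payload bits per byte, high bit = "more bytes follow". */
  static byte[] encode(long... values) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (long v : values) {
      if (v < 0) {
        // Assumption of this sketch (Lucene's writeVLong has the same restriction).
        throw new IllegalArgumentException("only non-negative values are supported: " + v);
      }
      while ((v & ~0x7FL) != 0) {
        out.write((int) ((v & 0x7FL) | 0x80L)); // low 7 bits, continuation bit set
        v >>>= 7;
      }
      out.write((int) v); // last byte, continuation bit clear
    }
    return out.toByteArray();
  }

  /** Reads values until the position reaches the end of the array; no count is needed. */
  static long[] decode(byte[] bytes) {
    List<Long> values = new ArrayList<>();
    int pos = 0;
    while (pos < bytes.length) {
      long value = 0;
      int shift = 0;
      byte b;
      do {
        b = bytes[pos++];
        value |= (long) (b & 0x7F) << shift;
        shift += 7;
      } while ((b & 0x80) != 0);
      values.add(value);
    }
    long[] result = new long[values.size()];
    for (int i = 0; i < result.length; i++) {
      result[i] = values.get(i);
    }
    return result;
  }
}
```

A document with no values encodes to an empty byte[] and decodes to an empty array. The 0-or-10-values case therefore needs neither a count prefix nor special casing.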
Shai

On Tue, Apr 29, 2014 at 10:04 AM, Rob Audenaerde <rob.audenae...@gmail.com> wrote:
> Hi Shai,
>
> I read the article on your blog, thanks for it! It seems to be a natural
> fit to do multi-values like this, and it is helpful indeed. For my specific
> problem, I have multiple values that do not have a fixed number, so there can
> be 0 values, or 10. I think the best way to solve this is to encode
> the number of values as the first entry in the BDV. This is not that hard, so I
> will take this road.
>
> -Rob
>
> On 27 Apr 2014, at 21:27, Shai Erera <ser...@gmail.com> wrote:
>> Hi Rob,
>>
>> Your question got me interested, so I wrote a quick prototype of what I
>> think solves your problem (and if not, I hope it solves someone else's!
>> :)). The idea is to write a special ValueSource, e.g. MaxValueSource, which
>> reads a BinaryDocValues, decodes the values and returns the maximum one. It
>> can then be embedded in an expression quite easily.
>>
>> I published a post on Lucene expressions and included some prototype code
>> which demonstrates how to do it. Hope it's still helpful to you:
>> http://shaierera.blogspot.com/2014/04/expressions-with-lucene.html.
>>
>> Shai
>>
>> On Thu, Apr 24, 2014 at 1:20 PM, Shai Erera <ser...@gmail.com> wrote:
>>> I don't think that you should use the facet module. If all you want is to
>>> encode a bunch of numbers under a 'foo' field, you can encode them into a
>>> byte[] and index them as a BDV. Then at search time you get the BDV and
>>> decode the numbers back. The facet module adds complexity here: yes, you
>>> get the encoding/decoding for free, but at the cost of adding mock
>>> categories to the taxonomy, or using associations, for no good reason IMO.
>>>
>>> Once you do that, you need to figure out how to extend the expressions
>>> module to support a function like maxValues(fieldName) (cannot use 'max'
>>> since it's reserved).
>>> I read about it some, and still haven't figured out
>>> exactly how to do it. The JavascriptCompiler can take custom functions to
>>> compile expressions, but the methods should take only double values. So I
>>> think it should be some sort of binding, but I'm not sure yet how to do it.
>>> Perhaps it should be a name like max_fieldName, to which you add a custom
>>> Expression as a binding ... I will try to look into it later.
>>>
>>> Shai
>>>
>>> On Wed, Apr 23, 2014 at 6:49 PM, Rob Audenaerde <rob.audenae...@gmail.com> wrote:
>>>> Thanks for all the questions, gives me an opportunity to clarify it :)
>>>>
>>>> I want the user to be able to give a (simple) formula (so I don't know it
>>>> beforehand) and use that formula in the search. The Javascript
>>>> expressions are really powerful in this use case, but have the single-value
>>>> limitation. Ideally, I would like to make it really flexible by, for example,
>>>> allowing (in-document aggregating) expressions like:
>>>> max(fieldA) - fieldB > fieldC.
>>>>
>>>> Currently, using single values, I can handle expressions in the form of
>>>> "fieldA - fieldB - fieldC > 0" and evaluate the long value that I receive
>>>> from the FunctionValues and the ValueSource. I also optimize the query by
>>>> ensuring the field exists and has a value, etc., so the search is still fast
>>>> enough. This works well, but for single values only.
>>>>
>>>> I also looked into the facets Association Fields, as they somewhat look
>>>> like the thing that I want. However, in the faceting module, all ordinals and
>>>> values are stored in one field, so there is no easy way to extract the fields
>>>> that are used in the expression.
>>>>
>>>> I like the solution you suggested, to add all the numeric fields as an
>>>> encoded byte[] like the facets do, but on a per-field basis, so that
>>>> each numeric field has a BDV field that contains all the values of
>>>> that field for that document.
>>>>
>>>> Now that I am typing this, I think there is another way. I could use the
>>>> faceting module and add a different facet field ($facetFIELDA,
>>>> $facetFIELDB) in the FacetsConfig for each field. That way it would be
>>>> relatively straightforward to get all the values for a field, as they are
>>>> exactly the values in the BDV of that document's facet field. However,
>>>> aggregating all facets will be harder, as the TaxonomyFacetSum*Associations
>>>> would need to do this for all fields that I need facet counts/sums for.
>>>>
>>>> What do you think?
>>>>
>>>> -Rob
>>>>
>>>> On Wed, Apr 23, 2014 at 5:13 PM, Shai Erera <ser...@gmail.com> wrote:
>>>>> A NumericDocValues field can only hold one value. Have you thought about
>>>>> encoding the values in a BinaryDocValues field? Or are you talking about
>>>>> multiple fields (different names), each with its own single value, where at
>>>>> search time you sum the values from a different set of fields?
>>>>>
>>>>> If it's one field, multiple values, then why do you need to separate the
>>>>> values? Is it because you sometimes sum and sometimes e.g. avg? Do you
>>>>> always include all values of a document in the formula, but the formula
>>>>> changes between searches, or do you sometimes use only a subset of the
>>>>> values?
>>>>>
>>>>> If you always use all values, but change the formula between queries, then
>>>>> perhaps you can just encode the pre-computed value under different NDV
>>>>> fields? If you only use a handful of functions (and they are known in
>>>>> advance), it may not be too heavy on the index, and it will definitely
>>>>> perform better during search.
>>>>>
>>>>> Otherwise, I believe I'd consider indexing them as a BDV field. For facets,
>>>>> we basically need the same multi-valued numeric field, and given that NDV
>>>>> is single-valued, we went w/ BDV.
>>>>>
>>>>> If I misunderstood the scenario, I'd appreciate it if you could clarify it :)
>>>>>
>>>>> Shai
>>>>>
>>>>> On Wed, Apr 23, 2014 at 5:49 PM, Rob Audenaerde <rob.audenae...@gmail.com> wrote:
>>>>>> Hi Shai, all,
>>>>>>
>>>>>> I am trying to write that Filter :). But I'm a bit at a loss as to how to
>>>>>> efficiently grab the multi-values. I can access
>>>>>> context.reader().document(), which accesses the stored fields, but that
>>>>>> seems slow.
>>>>>>
>>>>>> For single-value fields I use a compiled JavaScript Expression with
>>>>>> SimpleBindings as ValueSource, which seems to work quite well. The downside
>>>>>> is that I cannot find a way to implement multi-value through that solution.
>>>>>>
>>>>>> These create, for example, a LongFieldSource, which uses the
>>>>>> FieldCache.LongParser. These parsers only seem to parse one field.
>>>>>>
>>>>>> Is there an efficient way to get -all- of the (numeric) values for a field
>>>>>> in a document?
>>>>>>
>>>>>> On Wed, Apr 23, 2014 at 4:38 PM, Shai Erera <ser...@gmail.com> wrote:
>>>>>>> You can do that by writing a Filter which returns matching documents based
>>>>>>> on a sum of the field's values. However, I suspect that is going to be slow,
>>>>>>> unless you know that you will need several such filters and can cache them.
>>>>>>>
>>>>>>> Another approach would be to write a Collector which serves as a Filter,
>>>>>>> but computes the sum only for documents that match the query. Hopefully
>>>>>>> that would mean you compute the sum for fewer documents than you would
>>>>>>> have w/ the Filter approach.
>>>>>>>
>>>>>>> Shai
>>>>>>>
>>>>>>> On Wed, Apr 23, 2014 at 5:11 PM, Michael Sokolov <msoko...@safaribooksonline.com> wrote:
>>>>>>>> This isn't really a good use case for an index like Lucene. The most
>>>>>>>> essential property of an index is that it lets you look up documents very
>>>>>>>> quickly based on *precomputed* values.
>>>>>>>>
>>>>>>>> -Mike
>>>>>>>>
>>>>>>>> On 04/23/2014 06:56 AM, Rob Audenaerde wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I'm looking for a way to use multi-values in a filter.
>>>>>>>>>
>>>>>>>>> I want to be able to search on sum(field)=100, where field has these
>>>>>>>>> values in one document:
>>>>>>>>>
>>>>>>>>> field=60
>>>>>>>>> field=40
>>>>>>>>>
>>>>>>>>> In this case 'field' is a LongField. I examined the code in the
>>>>>>>>> FieldCache, but that seems to focus on single-valued fields only, or
>>>>>>>>>
>>>>>>>>> Is this something that can be done in Lucene? And what would be a good
>>>>>>>>> approach?
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>>
>>>>>>>>> -Rob
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
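Applied to Rob's original sum(field)=100 question, the Collector approach from the 4:38 PM message comes down to one check per matching document. The sketch below simulates that check using only JDK classes, so it is not Lucene code. docValues[doc] is a hypothetical stand-in for the per-document bytes of a BinaryDocValues field. Those bytes are assumed to hold back-to-back non-negative variable-length longs with no count prefix, as suggested at the top of the thread. matchedDocs stands in for the document IDs a Collector would be handed. SumMatcher and its methods are illustrative names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the per-document check for "sum(field) = target".
 * docValues[doc] stands in for the bytes a BinaryDocValues field would return
 * for that document; each entry holds back-to-back variable-length longs.
 */
final class SumMatcher {

  private SumMatcher() {}

  /** Sums one document's encoded values while decoding, without allocating. */
  static long sumValues(byte[] bytes) {
    long sum = 0;
    int pos = 0;
    while (pos < bytes.length) { // no count prefix: stop at the end of the bytes
      long value = 0;
      int shift = 0;
      byte b;
      do {
        b = bytes[pos++];
        value |= (long) (b & 0x7F) << shift; // 7 payload bits per byte
        shift += 7;
      } while ((b & 0x80) != 0); // high bit set means more bytes follow
      sum += value;
    }
    return sum;
  }

  /**
   * Visits only the documents the query matched (as a Collector's collect(doc)
   * would) and keeps those whose values add up to the target.
   */
  static List<Integer> collectMatching(int[] matchedDocs, byte[][] docValues, long target) {
    List<Integer> hits = new ArrayList<>();
    for (int doc : matchedDocs) {
      if (sumValues(docValues[doc]) == target) {
        hits.add(doc);
      }
    }
    return hits;
  }
}
```

This sketch sums while decoding instead of materializing a long[] for each document, because the check runs once per matching document. Restricting it to the documents the query already matched is the reason Shai expects the Collector to be cheaper than a standalone Filter. Whether that holds depends on how selective the query is.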