On Thu, Nov 6, 2014 at 3:19 PM, Robert Muir <[email protected]> wrote:
> Do the concatenation yourself with your own TokenStream. You can index
> a field with a tokenstream for expert cases (the individual stored
> values can be added separately)

Yes, but that’s quite awkward and requires a fair amount of surrounding
code when, in the end, it could be so much simpler if somehow the
TokenStream could be notified. I’d feel a little better about it if Lucene
included the TokenStream-concatenating code (I’ve done a prototype for
this; I could work on it more and contribute it) and if the Solr layer had
a nice way of presenting all the values to the Solr FieldType at once
instead of separately (SOLR-4329).

> No need to make the tokenstream API more complicated: it’s already very
> complicated.

Ehh, that’s arguable. Steve’s suggestion amounts to one line of production
code (javadoc and test are separate). If that’s too much, then adding a
boolean argument to reset() would feel cleaner and be 0 lines of new code,
but would be backwards-incompatible. Shrug. Another idea is for
Field.tokenStream(Analyzer analyzer, TokenStream reuse) to take another
boolean indicating whether this is the first value or not. I think I like
the other ideas better, though.

>
> On Thu, Nov 6, 2014 at 3:13 PM, [email protected]
> <[email protected]> wrote:
> > Are you suggesting that DefaultIndexingChain.PerField.invert(boolean
> > firstValue) would, prior to calling reset(), call
> > setPositionIncrement(Integer.MAX_VALUE), but only when ‘firstValue’ is
> > false? Hmmmm. I guess that would work, although it seems a bit hacky and
> > it’s tying this to a specific attribute when ideally we notify the chain as
> > a whole what’s going on. But it doesn’t require any new API, save for some
> > javadocs. And it’s extremely unlikely there would be a
> > backwards-incompatible problem, so that’s good. And I find this use is
> > related to positions so it’s not so bad to abuse the position increment for
> > this. Nice idea Steve; this works for me.
> >
> > Does anyone else have an opinion before I create an issue?
> >
> > ~ David Smiley
> > Freelance Apache Lucene/Solr Search Consultant/Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> > On Thu, Nov 6, 2014 at 2:13 PM, Steve Rowe <[email protected]> wrote:
> >>
> >> Maybe the position increment gap would be useful? If set to a value
> >> larger than the likely max position for any individual value, it could be
> >> used to infer (non-)first-value-ness.
> >>
> >> > On Nov 5, 2014, at 1:03 PM, [email protected] wrote:
> >> >
> >> > Several times now, I’ve had to come up with work-arounds for a
> >> > TokenStream not knowing whether it’s processing the first value or a
> >> > subsequent value of a multi-valued field. Two of these times, the use-case
> >> > was ensuring the first position of each value started at a multiple of 1000
> >> > (or some other configurable value), and the third was encoding sentence
> >> > paragraph counters (similar to a do-it-yourself position increment).
> >> >
> >> > The work-arounds are awkward and hacky. For example, if you’re in
> >> > control of your Tokenizer, you can prefix subsequent values with a special
> >> > flag, and then do the right thing in reset(). But then the highlighter, or
> >> > value retrieval in general, is impacted. It’s also possible to create the
> >> > fields with the constructor that accepts a TokenStream that you’ve told it’s
> >> > the first or subsequent value, but it’s awkward going that route, and
> >> > sometimes (e.g. Solr) it’s hard to know all the values you have up-front to
> >> > even do that.
> >> >
> >> > It would be nice if TokenStream.reset() took a boolean ‘first’ argument.
> >> > Such a change would obviously be backwards incompatible. Simply overloading
> >> > the method to call the no-arg version is problematic because TokenStreams
> >> > are a chain, and it would likely result in the chain getting doubly-reset.
> >> >
> >> > Any ideas?
> >> >
> >> > ~ David Smiley
> >> > Freelance Apache Lucene/Solr Search Consultant/Developer
> >> > http://www.linkedin.com/in/davidwsmiley
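
For readers following the thread, the "start each value at a multiple of 1000" use case from the original message comes down to a little position arithmetic, sketched below in plain Java. This is only an illustration under assumptions: the names ValuePositionSketch, roundUpToBlock and firstPositions are hypothetical, nothing here is Lucene API, and it assumes the token count of each value is known in advance, which a real TokenStream would not know.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Plain-Java model of position bookkeeping for a multi-valued field.
 * Hypothetical illustration only; it does not use any Lucene classes.
 */
public class ValuePositionSketch {

    /** Returns the smallest multiple of block that is >= position. */
    public static int roundUpToBlock(int position, int block) {
        return ((position + block - 1) / block) * block;
    }

    /**
     * Given the number of tokens in each value (assumed known here), returns
     * the position of the first token of each value so that every value
     * starts on a multiple of block. Tokens within a value are assumed to
     * occupy consecutive positions.
     */
    public static List<Integer> firstPositions(int[] tokensPerValue, int block) {
        List<Integer> starts = new ArrayList<>();
        int nextFree = 0; // first position not yet used by an earlier value
        for (int count : tokensPerValue) {
            int start = roundUpToBlock(nextFree, block);
            starts.add(start);
            nextFree = start + count;
        }
        return starts;
    }

    public static void main(String[] args) {
        // Three values with 3, 1200 and 5 tokens, aligned to blocks of 1000.
        System.out.println(firstPositions(new int[] {3, 1200, 5}, 1000));
        // prints [0, 1000, 3000]
    }
}
```

The second value overflows its block, so the third value is pushed to 3000 rather than 2000. That is the kind of decision a TokenStream can only make if it knows it is handling a subsequent value, which is the notification problem discussed above.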
