Re: Iterating BinaryDocValues

Robert Muir Fri, 10 Jan 2014 11:55:25 -0800

Yeah, i dont think its from newer docvalues-using code like yours shai.

instead the problems i had doing this are historical, because e.g.
fieldcache pointed to large arrays and consumers were lazy about it,
knowing that there reference pointed to bytes that would remain valid
across invocations.


we just have to remove these assumptions. I don't apologize for not doing
this, as you show, its some small % improvement (which we should go and get
back!), but i went with safety first initially rather than bugs.


On Fri, Jan 10, 2014 at 2:50 PM, Shai Erera <[email protected]> wrote:

> I agree with Robert. We should leave cloning BytesRefs to whoever needs
> that, and not penalize everyone else who don't need it. I must say I didn't
> know I can "own" those BytesRefs and I clone them whenever I need to. I
> think I was bitten by one of the other APIs, so I assumed returned
> BytesRefs are not "mine" across all the APIs.
>
> Shai
>
>
> On Fri, Jan 10, 2014 at 9:46 PM, Robert Muir <[email protected]> wrote:
>
>> the problem is really simpler to solve actually.
>>
>> Look at the comments in the code, it tells you why it is this way:
>>
>>           // NOTE: we could have one buffer, but various consumers (e.g.
>> FieldComparatorSource)
>>           // assume "they" own the bytes after calling this!
>>
>> That is what we should fix. There is no need to make bulk APIs or even
>> change the public api in any way (other than javadocs).
>>
>> We just move the clone'ing out of the codec, and require the consumer to
>> do it, same as termsenum or other apis. The codec part is extremely simple
>> here, its even the way i had it initially.
>>
>> But at the time (and even still now) this comes with some risk of bugs.
>> So initially I removed the reuse and went with a more conservative approach
>> to start with.
>>
>>
>> On Fri, Jan 10, 2014 at 2:36 PM, Mikhail Khludnev <
>> [email protected]> wrote:
>>
>>> Adrian,
>>>
>>> Please find bulkGet() scratch. It's ugly copy-paste, just reuses ByteRef
>>> that provides 10% gain.
>>> ...
>>> bulkGet took:101630 ms
>>> ...
>>> get took:114422 ms
>>>
>>>
>>>
>>> On Fri, Jan 10, 2014 at 8:58 PM, Adrien Grand <[email protected]> wrote:
>>>
>>>> I don't think we should add such a method. Doc values are commonly
>>>> read from collectors, so why do we need a method that works on top of
>>>> a DocIdSetIterator? I'm also curious how specialized implementations
>>>> could make this method faster than the default implementation?
>>>>
>>>> --
>>>> Adrien
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
>>>>
>>>>
>>>
>>>
>>> --
>>> Sincerely yours
>>> Mikhail Khludnev
>>> Principal Engineer,
>>> Grid Dynamics
>>>
>>> <http://www.griddynamics.com>
>>>  <[email protected]>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>>
>>
>>
>

Re: Iterating BinaryDocValues

Reply via email to