[
https://issues.apache.org/jira/browse/LUCENE-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014954#comment-14014954
]
Uwe Schindler edited comment on LUCENE-5722 at 6/1/14 11:21 AM:
----------------------------------------------------------------
bq. Sorry Uwe, we shouldn't add extra conditionals to seek here, its the entire
point of the issue!!!!!!!!
Can you please quote, what you are referring to? Thanks.
Do you mean that one:
bq. There is one thing, we might want to add an assert: In the single buffer
case, there is the slight chance to not catch an exception, if the cast from
the seek offset to int luckily gets into the valid slice area. Maybe we should
not add a hard check, but for our own safety while writing the code, we should
maybe check that the long offset is <= Integer.MAX_VALUE.
(this is just an assert to catch bugs while testing, I dont want to add a hard
check.)
or that one:
bq. In any case, we might improve the multi-buffer seek, too: if we already
know before that we will land in the same buffer - we could maybe do this in a
small check at the begining of the seek method: If we hit the same buffer, just
do curBuffer.position() and spare out the whole other stuff (which does many
assignments and additional checks). I will think about this a bit more...
Don't get me wrong, I was just adding ideas. I know what this issue is about,
but sometimes one additional check that triggers the "common case" is better
than having crazy code executed on every call. My idea is basically, we can
throw it away:
Under the assumption that we seek in most cases not accross byte buffers, the
overhead of seek could be completely optimized away with one additional check:
we may add another "state" variable with the (long) start offset of the current
buffer (curBufOffset - unfortunately, this is one more state field that we need
to maintain). On seek we maybe check that seek's position parameter minus
current buffers start offset is positive and less than Integer.MAX_VALUE away -
this is one check: {{(difference&Integer.MAX_VALUE)==difference}}. If this is
the case, we can directly optimize to only call position() on the current
buffer. If any Exception happens we fall-through to our dumb seek (because it
is not always the EOF case).
Instead of throwing that away because you don't like it, I wanted to do a
benchmark. Don't just say: "this is bullshit" I try hard to think about an
optimization of the cross-buffer seek, too. There must be a way to do this!
Maybe you have another idea instead of mine, but please don't tell me that I
don't know what I am talking about! In the ideal case I would like to do no
checks at all and let ByteBuffer throw exceptions if we seek outside the
current buffer. But this is, as far as I know, impossible because of the cast
from long to int (it would be nice to have something like the x86 carry-flag
(http://en.wikipedia.org/wiki/Carry_flag) in Java, too, so you could detect
overflow while casting long->int)..
We can also maybe move the negative check on the beginning of seek down or do
it more intelligent?
was (Author: thetaphi):
bq. Sorry Uwe, we shouldn't add extra conditionals to seek here, its the entire
point of the issue!!!!!!!!
Can you please quote, what you are referring to? Thanks.
Do you mean that one:
bq. There is one thing, we might want to add an assert: In the single buffer
case, there is the slight chance to not catch an exception, if the cast from
the seek offset to int luckily gets into the valid slice area. Maybe we should
not add a hard check, but for our own safety while writing the code, we should
maybe check that the long offset is <= Integer.MAX_VALUE.
(this is just an assert to catch bugs while testing, I dont want to add a hard
check.)
or that one:
bq. In any case, we might improve the multi-buffer seek, too: if we already
know before that we will land in the same buffer - we could maybe do this in a
small check at the begining of the seek method: If we hit the same buffer, just
do curBuffer.position() and spare out the whole other stuff (which does many
assignments and additional checks). I will think about this a bit more...
Don't get me wrong, I was just adding ideas. I know what this issue is about,
but sometimes one additional check that triggers the "common case" is better
than having crazy code executed on every call. My idea is basically, we can
throw it away:
Under the assumption that we seek in most cases not accross byte buffers, the
overhead of seek could be completely optimized away with one additional check:
we may add another "state" variable with the (long) start offset of the current
buffer (curBufOffset - unfortunately, this is one more state field that we need
to maintain). On seek we maybe check that seek's position parameter minus
current buffers start offset is positive and less than Integer.MAX_VALUE away -
this is one check: {{(difference&Integer.MAX_VALUE)==(long)Integer.MAX_VALUE}}.
If this is the case, we can directly optimize to only call position() on the
current buffer. If any Exception happens we fall-through to our dumb seek
(because it is not always the EOF case).
Instead of throwing that away because you don't like it, I wanted to do a
benchmark. Don't just say: "this is bullshit" I try hard to think about an
optimization of the cross-buffer seek, too. There must be a way to do this!
Maybe you have another idea instead of mine, but please don't tell me that I
don't know what I am talking about! In the ideal case I would like to do no
checks at all and let ByteBuffer throw exceptions if we seek outside the
current buffer. But this is, as far as I know, impossible because of the cast
from long to int (it would be nice to have something like the x86 carry-flag
(http://en.wikipedia.org/wiki/Carry_flag) in Java, too, so you could detect
overflow while casting long->int)..
We can also maybe move the negative check on the beginning of seek down or do
it more intelligent?
> Speed up MMapDirectory.seek()
> -----------------------------
>
> Key: LUCENE-5722
> URL: https://issues.apache.org/jira/browse/LUCENE-5722
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Robert Muir
> Attachments: LUCENE-5722.patch
>
>
> For traditional lucene access which is mostly sequential, occasional
> advance(), I think this method gets drowned out in noise.
> But for access like docvalues, its important. Unfortunately seek() is complex
> today because of mapping multiple buffers.
> However, the very common case is that only one map is used for a given clone
> or slice.
> When there is the possibility to use only a single mapped buffer, we should
> instead take advantage of ByteBuffer.slice(), which will adjust the internal
> mmap address and remove the offset calculation. furthermore we don't need the
> shift/mask or even the negative check, as they are then all handled with the
> ByteBuffer api: seek is a one-liner (with try/catch of course to convert
> exceptions).
> This makes docvalues access 20% faster, I havent tested conjunctions or
> anyhting like that.
--
This message was sent by Atlassian JIRA
(v6.2#6252)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]