[
https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915713#action_12915713
]
Simon Willnauer commented on LUCENE-2662:
-----------------------------------------
{quote}
How about renaming key back to ord? And then maybe rename values to
bytesStart? And in their decls add comments saying they are indexed
by hash code? And maybe rename addByOffset -> addByBytesStart?
{quote}
I don't like addByBytesStart; I would like to keep "offset", since it really is
an offset into the pool. addByPoolOffset?
The names ord and bytesStart are a good compromise :) Let's shoot for that.
{quote}
On the nocommit in ByteBlockPool - I think that's fine? It's an
internal class....
{quote}
You are referring to this: "// nocommit - public arrays are not nice!"?
Yeah, that's more of a style thing, but if somebody changes them it's their
fault for being stupid, I guess.
{quote}
The nocommit in BytesRefHash seems wrong? (Ie, compact is used
internally)... though maybe we make it private if it's not used
externally?
{quote}
Ah yeah, that's bogus - it's from a previous iteration, which was wrong as
well. I will remove it.
{quote}
On the "nocommit factor this out!" in THPF.java... I agree, the
postingsArray.textStarts should go away right? Ie, it's a
[wasteful] copy of what the BytesRefHash is already storing?
{quote}
Yeah, that is the reason for that nocommit. I thought about this a little,
and I see two options:
* We could factor out a superclass from ParallelPostingsArray which only has
the textStart int array and the grow and copy methods, and let
ParallelPostingsArray subclass it.
BytesRefHash would accept this class (I don't have a good name for it, but
let's call it TextStartArray for now) and use it internally. BytesRefHash would
call grow() when needed, and all the other code would be unchanged since PPA is
a subclass.
* The other way would be to bind the BytesRefHash to the postings array, which
seems odd to me though.
More ideas?
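To make the first option concrete, here is a rough sketch. TextStartArray is the placeholder name from above, PostingsArraySketch is a hypothetical stand-in for ParallelPostingsArray, and the doubling growth policy is an assumption for illustration, not what the patch does:

```java
// Hypothetical base class: only the textStarts array plus grow/copy.
class TextStartArray {
    int[] textStarts;

    TextStartArray(int size) {
        textStarts = new int[size];
    }

    int size() {
        return textStarts.length;
    }

    // Subclasses override this so grow() returns their own type.
    TextStartArray newInstance(int size) {
        return new TextStartArray(size);
    }

    // Copies the first numToCopy entries into target.
    void copyTo(TextStartArray target, int numToCopy) {
        System.arraycopy(textStarts, 0, target.textStarts, 0, numToCopy);
    }

    // Assumed growth policy: double the size (at least +1).
    TextStartArray grow() {
        int oldSize = size();
        TextStartArray bigger = newInstance(Math.max(oldSize + 1, oldSize * 2));
        copyTo(bigger, oldSize);
        return bigger;
    }
}

// Stand-in for the postings array: adds its parallel arrays on top.
class PostingsArraySketch extends TextStartArray {
    int[] intStarts;
    int[] byteStarts;

    PostingsArraySketch(int size) {
        super(size);
        intStarts = new int[size];
        byteStarts = new int[size];
    }

    @Override
    TextStartArray newInstance(int size) {
        return new PostingsArraySketch(size);
    }

    @Override
    void copyTo(TextStartArray target, int numToCopy) {
        super.copyTo(target, numToCopy);
        PostingsArraySketch t = (PostingsArraySketch) target;
        System.arraycopy(intStarts, 0, t.intStarts, 0, numToCopy);
        System.arraycopy(byteStarts, 0, t.byteStarts, 0, numToCopy);
    }
}
```

With this shape the hash only ever sees a TextStartArray and calls grow() on it; the parallel arrays of the subclass grow along with it because newInstance() and copyTo() are overridden.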
{quote}
Can we impl BytesRefHash.bytesUsed as an AtomicLong (hmm maybe
AtomicInt - none of these classes can address > 2GB)? Then the
pool would add in blockSize every time it binds a new block. That
method (DW.bytesUsed) is called a lot - at least once on every
addDoc.
{quote}
I did exactly that in the not yet uploaded patch. But I figured that it would
maybe make more sense to use that AtomicInt in the allocator as well as in
THPF. Or is that what you mean?
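Roughly what I have in mind, as a sketch only: CountingBlockPool is a hypothetical name, the block size is an assumed value, and the counter is an AtomicLong here just to keep the example simple. The point is that one counter instance is handed to every component that allocates blocks, so reading bytesUsed is a single get():

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical pool that reports its allocations to a shared counter.
class CountingBlockPool {
    static final int BLOCK_SIZE = 1 << 15; // assumed block size

    private final AtomicLong bytesUsed; // shared with other components
    private final List<byte[]> buffers = new ArrayList<>();

    CountingBlockPool(AtomicLong bytesUsed) {
        this.bytesUsed = bytesUsed;
    }

    // Binds a new block and adds its size to the shared counter.
    byte[] nextBuffer() {
        byte[] block = new byte[BLOCK_SIZE];
        buffers.add(block);
        bytesUsed.addAndGet(BLOCK_SIZE);
        return block;
    }

    int numBuffers() {
        return buffers.size();
    }
}
```

Two pools constructed with the same AtomicLong both add to it, so the caller never has to walk the pools to sum up memory on every addDoc.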
{quote}
I'm confused again - when do we use RecyclingByteBlockAllocator
from a single thread...? Ie, why did the sync need to be
conditional for this class, again....? It seems like we always
need it sync'd (both the main pool & per-doc pool need this)? If
so we can simplify and make these methods sync'd?
{quote}
Man, I am sorry - I thought I would use this in LUCENE-2186 in a single-threaded
env, but if so I should change it there if needed. I was one step ahead
though.
I will change it and maybe add a second one if needed. Agree?
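For reference, the always-synchronized variant would look something like this. It is a sketch only: SyncRecyclingAllocator, its method names, and the maxBuffered cap are assumptions for illustration and not the actual RecyclingByteBlockAllocator code:

```java
import java.util.ArrayDeque;

// Hypothetical recycling allocator where every method is synchronized.
class SyncRecyclingAllocator {
    private final int blockSize;
    private final int maxBuffered; // assumed cap on recycled blocks kept
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();

    SyncRecyclingAllocator(int blockSize, int maxBuffered) {
        this.blockSize = blockSize;
        this.maxBuffered = maxBuffered;
    }

    // Reuses a recycled block if one is available, else allocates.
    synchronized byte[] getByteBlock() {
        byte[] block = free.pollFirst();
        return block != null ? block : new byte[blockSize];
    }

    // Takes back blocks[start..end) and clears the caller's references;
    // blocks beyond the cap are simply dropped for the GC.
    synchronized void recycleByteBlocks(byte[][] blocks, int start, int end) {
        for (int i = start; i < end; i++) {
            if (free.size() < maxBuffered) {
                free.addFirst(blocks[i]);
            }
            blocks[i] = null;
        }
    }

    synchronized int numBufferedBlocks() {
        return free.size();
    }
}
```

Since both the main pool and the per-doc pool hit the allocator from multiple threads, plain synchronized methods keep it simple; a separate unsynchronized class could still be added later if a single-threaded caller shows up.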
simon
> BytesHash
> ---------
>
> Key: LUCENE-2662
> URL: https://issues.apache.org/jira/browse/LUCENE-2662
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Affects Versions: Realtime Branch, 4.0
> Reporter: Jason Rutherglen
> Assignee: Simon Willnauer
> Priority: Minor
> Fix For: Realtime Branch, 4.0
>
> Attachments: LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch,
> LUCENE-2662.patch
>
>
> This issue will have the BytesHash separated out from LUCENE-2186