[ https://issues.apache.org/jira/browse/SOLR-12723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17897805#comment-17897805 ]
David Smiley commented on SOLR-12723:
-------------------------------------

Exposing a plain array out of DocCollection is dangerous due to its mutability. While working on some code recently, I almost ended up modifying this array in a spot far removed from where it was obtained. It has me wondering whether an innocent mistake like this already exists somewhere; it would be a nasty, subtle bug if one slipped through! Using an unmodifiable List wrapper would be sufficient and would address the goals of this JIRA issue.

I admit, though, that I'm skeptical there is actually a performance problem with an Iterator, and only somewhat less skeptical that there was one when this was done 6 years ago. Iterators are often locally scoped, not escaping, and thus not burdening GC in modern Java. And in the context of search and indexing, this is one of the smallest things I can imagine worrying about. It was uncovered in a simulation framework that magnifies things that are normally trivial.

Proposal: change the return type of org.apache.solr.common.cloud.DocCollection#getActiveSlices from Collection<Slice> to List<Slice>, with an implementation that's unmodifiable. Remove getActiveSlicesArr.

Furthermore, consider ordering the slices by range and documenting that order, or leave that out of scope for another time. HashBasedRouter does a linear scan, but it could do the lookup in {{O(log N)}}. I'm not sure whether there should be an ordering consistency guarantee between the shard ID/name and the hash.
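
To make the proposal above concrete, here is a rough sketch of the idea, not a patch against Solr. The names SliceLookupSketch, HashSlice, sortedUnmodifiable and find are hypothetical stand-ins rather than Solr classes, and the sketch assumes that slice hash ranges are inclusive, non-overlapping int ranges. It shows an unmodifiable, range-ordered List and a binary search over it in place of a linear scan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class SliceLookupSketch {

  /** Hypothetical stand-in for a slice: a name plus an inclusive hash range. */
  static final class HashSlice {
    final String name;
    final int min;
    final int max;

    HashSlice(String name, int min, int max) {
      this.name = name;
      this.min = min;
      this.max = max;
    }
  }

  /** Returns an unmodifiable copy of the slices, ordered by range start. */
  static List<HashSlice> sortedUnmodifiable(List<HashSlice> slices) {
    List<HashSlice> copy = new ArrayList<>(slices);
    copy.sort(Comparator.comparingInt((HashSlice s) -> s.min));
    // Callers can read and iterate, but any mutation attempt throws.
    return Collections.unmodifiableList(copy);
  }

  /**
   * Binary search over non-overlapping ranges sorted by range start.
   * Returns the slice whose range contains the hash, or null if none does.
   */
  static HashSlice find(List<HashSlice> sorted, int hash) {
    int lo = 0;
    int hi = sorted.size() - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      HashSlice s = sorted.get(mid);
      if (hash < s.min) {
        hi = mid - 1;
      } else if (hash > s.max) {
        lo = mid + 1;
      } else {
        return s;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // Four slices covering the whole int range, deliberately given out of order.
    List<HashSlice> slices = sortedUnmodifiable(Arrays.asList(
        new HashSlice("shard3", 0, 1073741823),
        new HashSlice("shard1", Integer.MIN_VALUE, -1073741825),
        new HashSlice("shard4", 1073741824, Integer.MAX_VALUE),
        new HashSlice("shard2", -1073741824, -1)));

    System.out.println(find(slices, 42).name);
    System.out.println(find(slices, -5).name);

    try {
      slices.add(new HashSlice("rogue", 0, 0));
    } catch (UnsupportedOperationException e) {
      System.out.println("list is unmodifiable");
    }
  }
}
```

Because the backing ArrayList is random-access, the unmodifiable wrapper keeps indexed access cheap, so callers that prefer an index loop over an Iterator can still use one.
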

> Reduce object creation in HashBasedRouter
> -----------------------------------------
>
>                 Key: SOLR-12723
>                 URL: https://issues.apache.org/jira/browse/SOLR-12723
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>            Priority: Major
>             Fix For: 7.5
>
>         Attachments: SOLR-12723.patch
>
>
> When the default {{CompositeIdRouter}} is used, it calls the
> {{HashBasedRouter.hashToSlice}} method for every update, which obtains a
> collection of active slices from the current {{DocCollection}} and then
> iterates over it, checking which range contains the document's id hash.
> Each time this creates a new iterator, which is wasteful - a much more
> lightweight approach would be to construct a {{Slice[]}} when
> {{DocCollection}} is constructed and use indexed access to this array.
> This change has an especially visible impact on simulator performance for
> large scale tests, where other costs are not present.