[ https://issues.apache.org/jira/browse/SOLR-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926290#action_12926290 ]

Yonik Seeley commented on SOLR-2068:
------------------------------------

Going back over my old notes on how to efficiently group by a string field 
per-segment:

Phase 1:
 - Hash groups by ord (or use a direct index lookup if the number of ords is 
small enough).  We don't look up the string value at this point.
 - When a segment changes, we need to convert the ords from the old segment to 
the new segment (i.e. look up its value in the old segment, and find the ord 
of that value in the new segment).
   - If the group value is not found in the new segment, then remove it from the 
hash.  Keep it in the ordered map, since it can still be pushed out by other 
insertions.
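A minimal sketch of Phase 1, assuming each segment's term dictionary can be modeled as a sorted String[] whose index is the ord. The class, its Group record, and the methods setNextSegment, collect and finish are hypothetical names for illustration, not Solr or Lucene APIs. A plain list stands in for the ordered map, and the group sort order and top-N eviction are omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Phase 1 sketch: groups are hashed by per-segment ord while
// collecting; string values are only resolved when the segment changes.
// A segment's terms are modeled (assumption) as a sorted String[] (index == ord).
final class OrdGroupPhase1 {
    static final class Group {
        String value; // resolved lazily: at a segment change or in finish()
        int ord;      // ord in the current segment, or -1 if absent there
        int count;

        Group(int ord) { this.ord = ord; }
    }

    private String[] segTerms = new String[0];                 // current segment: ord -> value
    private final Map<Integer, Group> byOrd = new HashMap<>(); // the hash, keyed on ord
    // Stand-in for the ordered (top-N) map; sort order and eviction are omitted.
    private final List<Group> ordered = new ArrayList<>();

    void setNextSegment(String[] newTerms) {
        byOrd.clear();
        for (Group g : ordered) {
            if (g.ord >= 0) {
                g.value = segTerms[g.ord]; // look up the value in the old segment
            }
            int newOrd = Arrays.binarySearch(newTerms, g.value); // find its ord in the new one
            if (newOrd >= 0) {
                g.ord = newOrd;
                byOrd.put(newOrd, g);
            } else {
                g.ord = -1; // not in this segment: out of the hash, still in the ordered list
            }
        }
        segTerms = newTerms;
    }

    void collect(int ord) {
        Group g = byOrd.get(ord);
        if (g == null) {
            g = new Group(ord); // no string lookup while collecting
            byOrd.put(ord, g);
            ordered.add(g);
        }
        g.count++;
    }

    Map<String, Integer> finish() {
        Map<String, Integer> counts = new HashMap<>();
        for (Group g : ordered) {
            if (g.ord >= 0) {
                g.value = segTerms[g.ord]; // resolve groups first seen in the last segment
            }
            counts.put(g.value, g.count);
        }
        return counts;
    }
}
```

When a segment has few enough ords, the HashMap could be swapped for a Group[] sized to the segment's ord count. That is the "direct index lookup" variant mentioned above.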

Phase 2:
 - At the start of each segment, look up the ords for the group values and hash 
the group based on that ord (or leave it out of the hash if it didn't exist in 
that segment).
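A minimal sketch of Phase 2, under the same assumption that a segment's terms are a sorted String[] indexed by ord. The group values are taken as a fixed input, for example the top groups found by an earlier pass. OrdGroupPhase2 and its methods are hypothetical names. Each group value is looked up once per segment, so per-document work is a single ord lookup.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical Phase 2 sketch: the group values are known up front; at the
// start of each segment their ords are looked up once and hashed.
final class OrdGroupPhase2 {
    private final String[] groupValues; // fixed group values (assumed input)
    private final int[] counts;
    // ord in the current segment -> index into groupValues; absent groups are never hashed
    private final Map<Integer, Integer> groupByOrd = new HashMap<>();

    OrdGroupPhase2(String... groupValues) {
        this.groupValues = groupValues.clone();
        this.counts = new int[groupValues.length];
    }

    // At the start of each segment, look up each group value's ord once.
    void setNextSegment(String[] segTerms) {
        groupByOrd.clear();
        for (int i = 0; i < groupValues.length; i++) {
            int ord = Arrays.binarySearch(segTerms, groupValues[i]);
            if (ord >= 0) {
                groupByOrd.put(ord, i);
            }
        }
    }

    // Per-document work is one ord lookup; docs outside the groups are skipped.
    void collect(int ord) {
        Integer idx = groupByOrd.get(ord);
        if (idx != null) {
            counts[idx]++;
        }
    }

    Map<String, Integer> result() {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (int i = 0; i < groupValues.length; i++) {
            out.put(groupValues[i], counts[i]);
        }
        return out;
    }
}
```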

Martijn's optimization in SOLR-2205 probably made Phase 1 less important (except 
when there are very few unique groups), so perhaps we should implement Phase 2 
first.


> Search Grouping: collapse by string specialization
> --------------------------------------------------
>
>                 Key: SOLR-2068
>                 URL: https://issues.apache.org/jira/browse/SOLR-2068
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: Yonik Seeley
>
> Create specialized implementations for collapsing by an indexed string field.
