Re: How to choose BinId for Document partitioned index

Josh Elser Sat, 06 Feb 2016 11:20:46 -0800

You can get *really* fancy if you have lots of ingesters and lots of servers: include some attribute in the data you're hashing to control how many servers a given client will need to write to for some batch of documents. This is probably overkill for most setups, though.
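
One possible reading of the idea above, as a rough sketch rather than anything taken from Accumulo itself: hash a client attribute to pick a starting bin, then let the document hash choose an offset inside a small window, so each client only touches a bounded number of bins. The names client_bin_id, client_id, and bins_per_client are hypothetical, and hashlib.md5 is only a stand-in because Python's standard library has no murmur3.

```python
import hashlib


def _hash64(value: str) -> int:
    """Stable 64-bit integer derived from the MD5 digest of a string."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def client_bin_id(client_id: str, doc_id: str, num_bins: int, bins_per_client: int) -> int:
    """Pick a bin so one client writes to at most bins_per_client bins.

    Assumes 0 < bins_per_client <= num_bins (illustrative parameters).
    """
    start = _hash64(client_id) % num_bins        # where this client's window begins
    offset = _hash64(doc_id) % bins_per_client   # position inside the window
    return (start + offset) % num_bins           # wrap around the end of the bin range


# Example: 1000 documents from one ingester, 64 bins, window of 4 bins.
bins_used = {client_bin_id("ingester-7", f"doc-{i}", 64, 4) for i in range(1000)}
```

With this layout a batch from a single client lands in at most bins_per_client bins, at the cost of a less even spread across the whole table.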

Guava provides a decent murmur3 implementation which will be much faster than your run-of-the-mill MD5 for generating the hash (which you'll mod by the max number of bins).


William Slacum wrote:

Often it'll be a hash of the document mod the number of bins you're
using. The hash should be "good" in the sense that it uniquely
identifies the document. It can be as simple as some unique field in the
document or just a hash (like murmur) of the whole document.
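
As a minimal sketch of "hash of the document mod the number of bins", assuming a unique document id is available: NUM_BINS, bin_id, and bin_row are hypothetical names, the zero-padded row format is just one illustrative choice, and hashlib.md5 stands in for murmur because Python's standard library does not ship a murmur hash.

```python
import hashlib

NUM_BINS = 32  # hypothetical; pick this to suit your cluster


def bin_id(doc_id: str, num_bins: int = NUM_BINS) -> int:
    """Map a document id to a bin number in [0, num_bins)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer, then mod by the bin count.
    return int.from_bytes(digest[:8], "big") % num_bins


def bin_row(doc_id: str, num_bins: int = NUM_BINS) -> str:
    """Zero-padded bin id so that bins sort lexicographically as row keys."""
    return f"{bin_id(doc_id, num_bins):04d}"


row = bin_row("doc-12345")
```

The same document id always maps to the same bin, so a document and its index entries end up in the same partition.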

On Saturday, February 6, 2016, Jamie Johnson <[email protected]> wrote:

    Just found this excellent write up that explains a bit.

    https://www.slideshare.net/mobile/acordova00/text-indexing-in-accumulo

    On Feb 6, 2016 8:52 AM, "Jamie Johnson" <[email protected]> wrote:

        Reading the examples for table design, I've come across a
        question about the document partitioned index: what is
        typically chosen as the BinId? Or, maybe more appropriately,
        what factors should influence the choice of BinId, and what
        impact do they have?
