Hi Gordon, I have limited knowledge of configuring Innostore but can help answer some of your merge_index questions.
The most important merge_index setting in terms of memory usage is
'buffer_rollover_size'. This controls how large the buffer is allowed to
grow, in bytes, before it is converted to an on-disk segment. Each
partition maintains a separate buffer, so any increase to this number is
multiplied by the number of partitions in your system. The higher this
number, the less frequently merge_index will need to perform compactions.

The second most important settings for memory usage are a combination of
'segment_full_read_size' and 'max_compact_segments'. During compaction,
the system will completely page any segments smaller than the
'segment_full_read_size' value into memory. This should generally be as
large as or larger than 'buffer_rollover_size'. The higher this number,
the quicker each compaction will be. 'max_compact_segments' is the maximum
number of segments to compact at one time. The higher this number, the
more segments merge_index can involve in each compaction. In the worst
case, a compaction could take
('segment_full_read_size' * 'max_compact_segments') bytes of RAM.

The rest of the settings have a much smaller impact on performance and
memory usage, and exist mainly for tweaking and special cases.

This is a completely unscientific estimate based on observing other Riak
Search applications, but I'd set buffer_rollover_size so that
(# Partitions * buffer_rollover_size) is about one-half the memory you
wish for merge_index to consume, hopefully somewhere between 1M and 10M.
The rest of the memory will be used by in-memory offset tables, compaction
processes, and query operations.

Hope that helps.

Best,
Rusty

On Mon, May 23, 2011 at 2:05 PM, Gordon Tillman <gtill...@mezeo.com> wrote:
> Greetings!
>
> We are working with a riaksearch cluster that uses innostore as the primary
> backend in tandem with merge_index that is required by search.
> From reading the Basho wiki it looks like the following are the most
> important factors affecting memory and performance:
>
> • innostore
>   • put data_home_dir and log_group_home_dir on different spindles
>   • noatime
>   • buffer_pool_size
>   • flush_method
> • merge_index
>   • data_root
>   • buffer_rollover_size
>   • max_compact_segments
>   • segment_file_buffer_size
>   • segment_full_read_size
>   • segment_block_size
>
> Ideally, data_home_dir, log_group_home_dir, and data_root would all be on
> different spindles, but if you had just 2 disks available what would you
> recommend? Would it be best to have data_home_dir and data_root on one and
> then log_group_home_dir on the other?
>
> In calculating the proper setting for buffer_pool_size you are directed to
> allocate 60-80 percent of available RAM. So let's assume you want to take
> the remaining 20-40% of available RAM and split it up between innostore and
> merge_index.
>
> Would it be best to give each of them half of that value?
>
> Determining the approximate memory requirements for merge_index isn't (to
> me) really obvious. It looks like the following all have an effect:
>
> * buffer_rollover_size
> * buffer_delayed_write_size
> * max_compact_segments
> * segment_query_read_ahead_size
> * segment_compaction_read_ahead_size
> * segment_full_read_size
> * segment_block_size
> * segment_values_staging_size
>
> Is there a formula for determining the (approximate) proper values to use
> given a certain amount of available RAM?
>
> Thanks in advance for any advice. Sorry for all the questions!
>
> --gordon
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
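
As a rough illustration of the rule of thumb in the reply above, the arithmetic can be sketched in a few lines of Python. This is only a sketch: the function name and every number in the example (512 MiB budget, 64 partitions, 20 segments) are hypothetical and are not Riak defaults or recommendations from the thread. It also assumes that the "1M to 10M" range refers to buffer_rollover_size itself, which is one reading of the reply.

```python
def merge_index_estimate(memory_budget, partitions,
                         segment_full_read_size, max_compact_segments):
    """Apply the rule of thumb from the reply (all sizes in bytes).

    Hypothetical helper, not part of Riak or merge_index.
    """
    # About half of the merge_index budget goes to the per-partition
    # buffers, so (partitions * buffer_rollover_size) ~= budget / 2.
    buffer_rollover_size = (memory_budget // 2) // partitions

    # Worst case stated in the reply for a compaction.
    worst_case_compaction = segment_full_read_size * max_compact_segments

    return {
        "buffer_rollover_size": buffer_rollover_size,
        "worst_case_compaction_bytes": worst_case_compaction,
        # Assumption: the 1M-10M range applies to buffer_rollover_size.
        "rollover_in_suggested_range":
            1 * 1024 * 1024 <= buffer_rollover_size <= 10 * 1024 * 1024,
        # The reply suggests segment_full_read_size >= buffer_rollover_size.
        "full_read_size_ok": segment_full_read_size >= buffer_rollover_size,
    }


MIB = 1024 * 1024

# Hypothetical inputs: 512 MiB for merge_index, 64 partitions on the node.
est = merge_index_estimate(
    memory_budget=512 * MIB,
    partitions=64,
    segment_full_read_size=4 * MIB,
    max_compact_segments=20,
)
for name, value in est.items():
    print(name, value)
```

With these made-up inputs the buffers come to 4 MiB per partition, and a single worst-case compaction could need about 80 MiB, which is worth comparing against the half of the budget left over for offset tables, compactions, and queries.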