Morning Rusty,

Thanks very much for your time and trouble. Great info, very helpful, and very timely!

Regards,
--gordon

On May 24, 2011, at 16:32, Rusty Klophaus wrote:

Hi Gordon,

I have limited knowledge of configuring Innostore but can help answer some of your merge_index questions.

The most important merge_index setting in terms of memory usage is 'buffer_rollover_size'. This affects how large the buffer is allowed to grow, in bytes, before getting converted to an on-disk segment. Each partition maintains a separate buffer, so any increases to this number will be multiplied by the number of partitions in your system. The higher this number, the less frequently merge_index will need to perform compactions.

The second most important settings for memory usage are a combination of 'segment_full_read_size' and 'max_compact_segments'. During compaction, the system will completely page any segments smaller than the 'segment_full_read_size' value into memory. This should generally be as large or larger than the 'buffer_rollover_size'. The higher this number, the quicker each compaction will be.

'max_compact_segments' is the maximum number of segments to compact at one time. The higher this number, the more segments merge_index can involve in each compaction. In the worst case, a compaction could take ('segment_full_read_size' * 'max_compact_segments') bytes of RAM.

The rest of the settings have a much smaller impact on performance and memory usage, and exist mainly for tweaking and special cases.

This is a completely unscientific estimate based on observing other Riak Search applications, but I'd set buffer_rollover_size so that (# Partitions * buffer_rollover_size) is about one-half the memory you wish for merge_index to consume, hopefully somewhere between 1M and 10M. The rest of the memory will be used by in-memory offset tables, compaction processes, and during query operations.

Hope that helps.

Best,
Rusty

On Mon, May 23, 2011 at 2:05 PM, Gordon Tillman <gtill...@mezeo.com> wrote:

Greetings!
We are working with a riaksearch cluster that uses innostore as the primary backend, in tandem with merge_index, which is required by search. From reading the Basho wiki it looks like the following are the most important factors affecting memory and performance:

• innostore
  • put data_home_dir and log_group_home_dir on different spindles
  • noatime
  • buffer_pool_size
  • flush_method
• merge_index
  • data_root
  • buffer_rollover_size
  • max_compact_segments
  • segment_file_buffer_size
  • segment_full_read_size
  • segment_block_size

Ideally, data_home_dir, log_group_home_dir, and data_root would all be on different spindles, but if you had just 2 disks available what would you recommend? Would it be best to have data_home_dir and data_root on one and then log_group_home_dir on the other?

In calculating the proper setting for buffer_pool_size you are directed to allocate 60-80 percent of available RAM. So let's assume you want to take the remaining 20-40% of available RAM and split it up between innostore and merge_index. Would it be best to give each of them half of that value?

Determining the approximate memory requirements for merge_index isn't (to me) real obvious. It looks like the following all have an effect:

* buffer_rollover_size
* buffer_delayed_write_size
* max_compact_segments
* segment_query_read_ahead_size
* segment_compaction_read_ahead_size
* segment_full_read_size
* segment_block_size
* segment_values_staging_size

Is there a formula for determining the (approximate) proper values to use given a certain amount of available RAM?

Thanks in advance for any advice. Sorry for all the questions!

--gordon

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
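Rusty's rule of thumb for buffer_rollover_size and his worst-case compaction figure can be worked through as a small calculation. The Python below is only a sketch of that arithmetic: the function name, the 64-partition count, the 512 MiB budget, and the segment_full_read_size and max_compact_segments values are hypothetical inputs chosen for illustration, not values from this thread and not Riak defaults.

```python
def merge_index_memory_estimate(num_partitions, memory_budget_bytes,
                                segment_full_read_size, max_compact_segments):
    """Rough merge_index sizing based on the rule of thumb in this thread.

    Hypothetical helper for illustration only; it is not part of Riak.
    """
    # Each partition keeps its own buffer, and the suggestion is that
    # (# partitions * buffer_rollover_size) be about half of the memory
    # you want merge_index to consume.
    buffer_rollover_size = memory_budget_bytes // (2 * num_partitions)
    total_buffer_bytes = buffer_rollover_size * num_partitions

    # Worst case for one compaction: every segment involved is paged
    # fully into memory.
    worst_case_compaction_bytes = segment_full_read_size * max_compact_segments

    return {
        "buffer_rollover_size": buffer_rollover_size,
        "total_buffer_bytes": total_buffer_bytes,
        "worst_case_compaction_bytes": worst_case_compaction_bytes,
    }


# Assumed example inputs: 64 partitions, a 512 MiB budget for merge_index,
# segment_full_read_size equal to the resulting buffer size, and
# 20 segments per compaction.
MIB = 1024 * 1024
estimate = merge_index_memory_estimate(
    num_partitions=64,
    memory_budget_bytes=512 * MIB,
    segment_full_read_size=4 * MIB,
    max_compact_segments=20,
)
for name, value in estimate.items():
    print(f"{name}: {value} bytes ({value / MIB:.1f} MiB)")
```

With these assumed inputs the suggested buffer_rollover_size comes out at 4 MiB, inside the 1M to 10M range mentioned above, and a single worst-case compaction would need 80 MiB. The remaining half of the budget is what is left for offset tables, compaction, and queries, so the estimate is a starting point to tune from, not a guarantee.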