Re: Upgrade from 1.3.1 to 1.4.2 => high IO

Matthew Von-Maszewski Wed, 11 Dec 2013 04:35:16 -0800

OK, I now suspect that your servers are either using swap space (which is 
slow) or that your leveldb file cache is thrashing (opening and closing multiple 
files per request).
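One quick way to test the swap hypothesis on a Linux node is to compare `SwapTotal` and `SwapFree` in `/proc/meminfo` (both reported in kB). The following is a minimal sketch that assumes a Linux host. The function name is illustrative, and the sample text is made up rather than taken from the nodes in this thread. A non-zero result only shows that something was swapped out at some point. The `si`/`so` columns of `vmstat` show whether swapping is happening right now.

```python
def swap_used_kib(meminfo_text: str) -> int:
    """Return swap currently in use (kB), given the text of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # first token is the number (kB)
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)


# Illustrative sample only; on a real node pass open("/proc/meminfo").read().
SAMPLE = """\
MemTotal:       24575996 kB
SwapTotal:       8388604 kB
SwapFree:        7340028 kB
"""
print(swap_used_kib(SAMPLE))
```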


How many servers do you have, and do you use Riak's active anti-entropy feature? 
I am going to plug all of this into a spreadsheet.
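The spreadsheet arithmetic starts from the numbers already in this thread: `ring_creation_size` 256, `max_open_files` 100, and 12 nodes. The following is a rough sketch that assumes `max_open_files` applies to each vnode's own leveldb instance. That is how the eleveldb settings are understood here, but it should be checked against your version's documentation. Active anti-entropy, if enabled, is not counted. The per-file memory cost is deliberately left out because its accounting changed in 1.4.

```python
import math


def leveldb_open_file_budget(ring_size: int, nodes: int, max_open_files: int) -> tuple[int, int]:
    """Worst-case vnodes on one node and the leveldb file handles they may keep open."""
    # Partitions are spread across nodes; the busiest node gets the ceiling.
    vnodes_per_node = math.ceil(ring_size / nodes)
    # Assumption: every vnode has its own leveldb with its own max_open_files.
    return vnodes_per_node, vnodes_per_node * max_open_files


vnodes, handles = leveldb_open_file_budget(ring_size=256, nodes=12, max_open_files=100)
print(vnodes, handles)  # 22 2200
```

If each vnode's working set of .sst files is larger than its `max_open_files`, files are closed and reopened constantly. That is the "thrashing" case mentioned above.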

Matthew Von-Maszewski


On Dec 11, 2013, at 7:09, Simon Effenberg <seffenb...@team.mobile.de> wrote:

> Hi Matthew
> 
> Memory: 23999 MB
> 
> ring_creation_size, 256
> max_open_files, 100
> 
> riak-admin status:
> 
> memory_total : 276001360
> memory_processes : 191506322
> memory_processes_used : 191439568
> memory_system : 84495038
> memory_atom : 686993
> memory_atom_used : 686560
> memory_binary : 21965352
> memory_code : 11332732
> memory_ets : 10823528
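The `riak-admin status` memory figures above are in bytes. The following is a small sketch that parses the integer-valued lines and converts them to MiB. The helper name is hypothetical. These counters describe the Erlang VM. As far as I understand, they do not include memory that the eleveldb NIF allocates in C++ for leveldb's caches, so a small total here does not rule out leveldb memory pressure (treat this as an assumption to verify).

```python
# Lines copied from the `riak-admin status` output quoted above.
STATUS = """\
memory_total : 276001360
memory_processes : 191506322
memory_binary : 21965352
memory_ets : 10823528
"""


def parse_status(text: str) -> dict[str, int]:
    """Parse integer-valued 'name : value' lines; other lines are skipped."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" : ")
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats


stats = parse_status(STATUS)
for name, value in stats.items():
    print(f"{name:18} {value / 2**20:8.1f} MiB")
```

By this measure `memory_total` comes to roughly 263 MiB on a node with about 24 GB of RAM.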
> 
> Thanks for looking!
> 
> Cheers
> Simon
> 
> 
> 
> On Wed, 11 Dec 2013 06:44:42 -0500
> Matthew Von-Maszewski <matth...@basho.com> wrote:
> 
>> I need to ask other developers as they arrive for the new day.  This does not 
>> make sense to me.
>> 
>> How many nodes do you have?  How much RAM do you have in each node?  What 
>> are your settings for max_open_files and cache_size in the app.config file?  
>> Maybe this is as simple as leveldb using too much RAM in 1.4.  The memory 
>> accounting for max_open_files changed in 1.4.
>> 
>> Matthew Von-Maszewski
>> 
>> 
>> On Dec 11, 2013, at 6:28, Simon Effenberg <seffenb...@team.mobile.de> wrote:
>> 
>>> Hi Matthew,
>>> 
>>> It took around 11 hours for the first node to finish the compaction. The
>>> second node has already been running for 12 hours and is still compacting.
>>> 
>>> Besides that, I am wondering why the put FSM time on the new 1.4.2 host
>>> is much higher (after the compaction) than on an old 1.3.1 host. Both are
>>> running in the cluster right now; a third node is doing the
>>> compaction/upgrade while it is in the cluster, but it is not directly
>>> accessible because it is out of the load balancer:
>>> 
>>> 1.4.2:
>>> 
>>> node_put_fsm_time_mean : 2208050
>>> node_put_fsm_time_median : 39231
>>> node_put_fsm_time_95 : 17400382
>>> node_put_fsm_time_99 : 50965752
>>> node_put_fsm_time_100 : 59537762
>>> node_put_fsm_active : 5
>>> node_put_fsm_active_60s : 364
>>> node_put_fsm_in_rate : 5
>>> node_put_fsm_out_rate : 3
>>> node_put_fsm_rejected : 0
>>> node_put_fsm_rejected_60s : 0
>>> node_put_fsm_rejected_total : 0
>>> 
>>> 
>>> 1.3.1:
>>> 
>>> node_put_fsm_time_mean : 5036
>>> node_put_fsm_time_median : 1614
>>> node_put_fsm_time_95 : 8789
>>> node_put_fsm_time_99 : 38258
>>> node_put_fsm_time_100 : 384372
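To put the two sets of numbers side by side, here is a small comparison sketch. It assumes the values are microseconds, which is how Riak reports its FSM timing stats to my knowledge. On that assumption, the 1.4.2 mean is about 2.2 s against about 5 ms on 1.3.1. The dictionary keys and the helper name are illustrative.

```python
# node_put_fsm_time_* values quoted above (assumed to be microseconds).
V142 = {"mean": 2208050, "median": 39231, "95": 17400382, "99": 50965752, "100": 59537762}
V131 = {"mean": 5036, "median": 1614, "95": 8789, "99": 38258, "100": 384372}


def compare(new: dict[str, int], old: dict[str, int]) -> dict[str, tuple[float, float, float]]:
    """Map each shared stat to (new_ms, old_ms, new/old ratio)."""
    return {
        stat: (new[stat] / 1000, old[stat] / 1000, new[stat] / old[stat])
        for stat in new
        if stat in old
    }


for stat, (new_ms, old_ms, ratio) in compare(V142, V131).items():
    print(f"{stat:>6}: 1.4.2 {new_ms:10.1f} ms   1.3.1 {old_ms:7.1f} ms   x{ratio:,.0f}")
```

The ratio is largest at the 95th and 99th percentiles. That pattern fits a node that is occasionally stalled (for example by compaction or file-cache misses) better than one that is uniformly slower.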
>>> 
>>> 
>>> Any idea why this could be happening?
>>> 
>>> Cheers
>>> Simon
>>> 
>>> On Tue, 10 Dec 2013 17:21:07 +0100
>>> Simon Effenberg <seffenb...@team.mobile.de> wrote:
>>> 
>>>> Hi Matthew,
>>>> 
>>>> Thanks! That answers my questions.
>>>> 
>>>> Cheers
>>>> Simon
>>>> 
>>>> On Tue, 10 Dec 2013 11:08:32 -0500
>>>> Matthew Von-Maszewski <matth...@basho.com> wrote:
>>>> 
>>>>> 2i is not my expertise, so I had to discuss your concerns with another 
>>>>> Basho developer.  He says:
>>>>> 
>>>>> Between 1.3 and 1.4, the 2i query did change but not the 2i on-disk 
>>>>> format.  You must wait for all nodes to update if you desire to use the 
>>>>> new 2i query.  The 2i data will properly write/update on both 1.3 and 1.4 
>>>>> machines during the migration.
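The rule above amounts to a cluster-wide version gate. Writes, including 2i writes, work in the mixed 1.3/1.4 cluster, but the new 2i query should only be used once every node is on 1.4. The following is a hypothetical sketch of that check. How you collect the per-node versions is left open, and the function and node names are invented for illustration.

```python
def new_2i_query_available(node_versions: dict[str, str]) -> bool:
    """True only when there is at least one node and every node is on 1.4 or later."""

    def major_minor(version: str) -> tuple[int, int]:
        major, minor, *_ = (int(part) for part in version.split("."))
        return major, minor

    return bool(node_versions) and all(
        major_minor(v) >= (1, 4) for v in node_versions.values()
    )


print(new_2i_query_available({"node1": "1.4.2", "node2": "1.3.1"}))  # False
```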
>>>>> 
>>>>> Does that answer your question?
>>>>> 
>>>>> 
>>>>> And yes, you might see available disk space increase during the upgrade 
>>>>> compactions if your dataset contains numerous delete "tombstones".  The 
>>>>> Riak 2.0 code includes a new feature called "aggressive delete" for 
>>>>> leveldb.  This feature is more proactive in pushing delete tombstones 
>>>>> through the levels to free up disk space much more quickly (especially if 
>>>>> you perform block deletes every now and then).
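The reason tombstones hold on to disk space is that, in leveldb, a delete is itself a record. Compaction can only discard that record once no deeper level might still contain an older value for the key. The following is a simplified Python model of that check, loosely following upstream leveldb's `IsBaseLevelForKey` idea. Snapshots are ignored. This is not Basho's code, and it does not model how the 2.0 "aggressive delete" feature changes the behaviour.

```python
from dataclasses import dataclass


@dataclass
class Level:
    # (smallest key, largest key) of each file in this level.
    ranges: list[tuple[str, str]]


def may_drop_tombstone(key: str, output_level: int, levels: list[Level]) -> bool:
    """A delete marker compacted into `output_level` can be discarded only if
    no deeper level has a file whose key range could hold an older value."""
    for level in levels[output_level + 1:]:
        for lo, hi in level.ranges:
            if lo <= key <= hi:
                return False  # an older version may exist below; keep the marker
    return True
```

Until a tombstone reaches a level with nothing beneath it for that key, it occupies space, and so does the value it shadows. That is why upgrade compactions can free noticeable disk space on delete-heavy datasets.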
>>>>> 
>>>>> Matthew
>>>>> 
>>>>> 
>>>>> On Dec 10, 2013, at 10:44 AM, Simon Effenberg <seffenb...@team.mobile.de> 
>>>>> wrote:
>>>>> 
>>>>>> Hi Matthew,
>>>>>> 
>>>>>> see inline..
>>>>>> 
>>>>>> On Tue, 10 Dec 2013 10:38:03 -0500
>>>>>> Matthew Von-Maszewski <matth...@basho.com> wrote:
>>>>>> 
>>>>>>> The sad truth is that you are not the first to see this problem.  And 
>>>>>>> yes, it has to do with your 950GB-per-node dataset.  And no, there is 
>>>>>>> nothing to do but sit through it at this time.
>>>>>>> 
>>>>>>> While I did extensive testing around upgrade times before shipping 1.4, 
>>>>>>> apparently there are data configurations I did not anticipate.  You are 
>>>>>>> likely seeing a cascade where a shift of one file from level-1 to 
>>>>>>> level-2 is causing a shift of another file from level-2 to level-3, 
>>>>>>> which causes a level-3 file to shift to level-4, etc … then the next 
>>>>>>> file shifts from level-1.
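The cascade can be pictured with a deliberately crude toy model. Each level has a file limit, and one file arriving at level-1 can push a file down at every level below it. The following is a hypothetical Python sketch, not Basho's leveldb. Real leveldb sizes levels by bytes and merges overlapping key ranges rather than moving whole files. The limits used here are invented.

```python
def push_file(levels: list[int], limits: list[int], level: int = 1) -> int:
    """Add one file to `level`, then push any overflow downward.

    levels[i] is the file count at level i (index 0 is unused in this toy);
    the last level is treated as unbounded. Returns how many file moves
    (stand-ins for compactions) the single insert triggered.
    """
    levels[level] += 1
    moves = 0
    while level < len(levels) - 1 and levels[level] > limits[level]:
        levels[level] -= 1      # one file leaves this level...
        levels[level + 1] += 1  # ...and lands one level deeper
        moves += 1
        level += 1
    return moves


levels = [0, 8, 10, 100, 1000, 0]
limits = [0, 8, 10, 100, 1000, 0]
print(push_file(levels, limits))  # 4: every full level passes one file down
```

When every level sits right at its limit, as a large dataset converted in bulk might, each new file pays for a move at every level. That is consistent with the long, IO-heavy compactions reported in this thread.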
>>>>>>> 
>>>>>>> The bright side of this pain is that you will end up with better write 
>>>>>>> throughput once all the compaction ends.
>>>>>> 
>>>>>> I have to deal with that, but my problem now is this: if I upgrade
>>>>>> node by node, it looks like 2i searches aren't possible while 1.3 and
>>>>>> 1.4 nodes coexist in the cluster. Is there any problem that would lead
>>>>>> me into a 2i repair marathon, or can I simply wait some hours for each
>>>>>> node until all merges are done before I upgrade the next one? (2i
>>>>>> searches can fail for a while; the app can cope with that. But are new
>>>>>> inserts with 2i indexes processed successfully, or do I have to run a
>>>>>> 2i repair?)
>>>>>> 
>>>>>> /s
>>>>>> 
>>>>>> One other good thing: saving disk space is an advantage ;)
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> Riak 2.0's leveldb has code to prevent/reduce compaction cascades, but 
>>>>>>> that is not going to help you today.
>>>>>>> 
>>>>>>> Matthew
>>>>>>> 
>>>>>>> On Dec 10, 2013, at 10:26 AM, Simon Effenberg 
>>>>>>> <seffenb...@team.mobile.de> wrote:
>>>>>>> 
>>>>>>>> Hi @list,
>>>>>>>> 
>>>>>>>> I'm trying to upgrade our Riak cluster from 1.3.1 to 1.4.2. After
>>>>>>>> upgrading the first node (out of 12), this node seems to be doing many
>>>>>>>> merges. The sst_* directories change in size "rapidly" and the node's
>>>>>>>> disk utilization is at 100% all the time.
>>>>>>>> 
>>>>>>>> I know about the following note:
>>>>>>>> 
>>>>>>>> "The first execution of 1.4.0 leveldb using a 1.3.x or 1.2.x dataset
>>>>>>>> will initiate an automatic conversion that could pause the startup of
>>>>>>>> each node by 3 to 7 minutes. The leveldb data in "level #1" is being
>>>>>>>> adjusted such that "level #1" can operate as an overlapped data level
>>>>>>>> instead of as a sorted data level. The conversion is simply the
>>>>>>>> reduction of the number of files in "level #1" to being less than eight
>>>>>>>> via normal compaction of data from "level #1" into "level #2". This is
>>>>>>>> a one time conversion."
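Why the file count in an overlapped level matters: in a sorted level the files' key ranges are disjoint, so a read checks at most one file. In an overlapped level, every file whose range covers the key may have to be checked. The following is a toy Python sketch of both that lookup cost and the one-time conversion described in the note. The helper names are hypothetical, and files are moved whole rather than merged, unlike real leveldb compaction.

```python
Range = tuple[str, str]  # (smallest key, largest key) of one file


def files_to_probe(key: str, files: list[Range]) -> int:
    """Number of files in one level whose key range covers `key`.

    Sorted level: ranges are disjoint, so the result is at most 1.
    Overlapped level: it can equal the file count, so keeping level #1
    small bounds the cost of a read.
    """
    return sum(1 for lo, hi in files if lo <= key <= hi)


def convert_level1(level1: list[Range], level2: list[Range], limit: int = 8) -> int:
    """Compact files out of level #1 until fewer than `limit` remain.

    Toy: a file is moved whole; real compaction merges it with the
    overlapping files of level #2 and writes new files there.
    """
    moved = 0
    while len(level1) >= limit:
        level2.append(level1.pop(0))
        moved += 1
    return moved


print(files_to_probe("k", [("a", "f"), ("g", "m"), ("n", "z")]))  # sorted: 1
print(files_to_probe("k", [("a", "z")] * 5))                      # overlapped: 5
```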
>>>>>>>> 
>>>>>>>> but it looks much more invasive than explained here, or it has nothing
>>>>>>>> to do with the merges I am (probably) seeing.
>>>>>>>> 
>>>>>>>> Is this "normal" behavior, or is there anything I can do about it?
>>>>>>>> 
>>>>>>>> At the moment I'm stuck in the upgrade procedure because this high
>>>>>>>> IO load would probably lead to high response times.
>>>>>>>> 
>>>>>>>> Also we have a lot of data (per node ~950 GB).
>>>>>>>> 
>>>>>>>> Cheers
>>>>>>>> Simon
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> riak-users mailing list
>>>>>>>> riak-users@lists.basho.com
>>>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -- 
>>>>>> Simon Effenberg | Site Ops Engineer | mobile.international GmbH
>>>>>> Fon:     + 49-(0)30-8109 - 7173
>>>>>> Fax:     + 49-(0)30-8109 - 7131
>>>>>> 
>>>>>> Mail:     seffenb...@team.mobile.de
>>>>>> Web:    www.mobile.de
>>>>>> 
>>>>>> Marktplatz 1 | 14532 Europarc Dreilinden | Germany
>>>>>> 
>>>>>> 
>>>>>> Geschäftsführer: Malte Krüger
>>>>>> HRB Nr.: 18517 P, Amtsgericht Potsdam
>>>>>> Sitz der Gesellschaft: Kleinmachnow 
>>>>> 
>>>> 
>>> 
> 
