On 5/25/23 22:12, Igor Fedotov wrote:

On 25/05/2023 20:36, Stefan Kooman wrote:
On 5/25/23 18:17, Igor Fedotov wrote:
Perhaps...

I don't like the idea of using the fragmentation score as a real index. IMO it's more of a very imprecise first-pass marker to alert that something might be wrong, not a real high-quality quantitative estimate.

Chiming in on the high fragmentation issue. We started collecting the "fragmentation_rating" of each OSD this afternoon. All OSDs that were provisioned a year ago have a fragmentation rating of ~0.9. Not sure how long they have been at this level.

Could you please collect allocation probes from existing OSD logs? Just a few samples from different OSDs...
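A minimal sketch of pulling such probes out of a log, for anyone wanting to script this. The line shape in PROBE_RE ("allocation stats probe N: cnt: ... frags: ... size: ...") is an assumption and is not quoted anywhere in this thread, so check it against your own OSD logs first. parse_probes and the sample line are hypothetical and only for illustration.

```python
import re

# ASSUMPTION: probe lines look like
#   "... allocation stats probe 0: cnt: <n> frags: <n> size: <n>"
# Adjust the pattern if your OSD logs differ.
PROBE_RE = re.compile(
    r"allocation stats probe (\d+): cnt: (\d+) frags: (\d+) size: (\d+)"
)

def parse_probes(lines):
    """Return a list of (cnt, frags, size) tuples, one per matching line."""
    probes = []
    for line in lines:
        match = PROBE_RE.search(line)
        if match:
            # group(1) is the probe index; the counters follow it.
            probes.append(tuple(int(match.group(i)) for i in (2, 3, 4)))
    return probes

# Hypothetical sample input, not a real log excerpt.
sample = [
    "osd.0 allocation stats probe 0: cnt: 21350923 frags: 37146899 size: 317040259072",
    "some unrelated log line",
]
print(parse_probes(sample))
```

Lines that do not match are skipped silently, so the same function can be fed a whole log file.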

10 OSDs from one host, but I have checked other nodes and they are similar:

CNT       FRAG      Size          Ratio (FRAG/CNT)  Avg frag size (Size/FRAG)
21350923  37146899  317040259072  1.73982637659271  8534.77053554322
20951932  38122769  317841477632  1.8195347808498   8337.31352599283
21188454  37298950  278389411840  1.76034315670223  7463.73321072041
21605451  39369462  270427185152  1.82220042525379  6868.95810646333
19215230  36063713  290967818240  1.87682962941375  8068.16032059705
19293599  35464928  269238423552  1.83817068033807  7591.68109835159
19963538  36088151  315796836352  1.80770317365589  8750.70702159277
18030613  31753098  297826177024  1.76106591606176  9379.43683554909
17889602  31718012  299550142464  1.77298589426417  9444.16511551859
18475332  33264944  266053271552  1.80050588536109  7998.0074985847
18618154  31914219  254801883136  1.71414518324427  7983.96110323113
16437108  29421873  275350355968  1.78996651965784  9358.69568766067
17164338  28605353  249404649472  1.66655731202683  8718.81040838755
17895480  29658102  309047177216  1.65729569701399  10420.3288941416
19546560  34588509  301368737792  1.76954456436324  8712.97279081905
18525784  34806856  314875801600  1.87883309014075  9046.37297893266
18550989  35236438  273069948928  1.89943716747393  7749.64679823767
19085807  34605572  255512043520  1.81315738967705  7383.55209155335
17203820  31205542  277097357312  1.81387284916954  8879.74826112618
18003801  33723670  269696761856  1.87314167713807  7997.25420916525
18655425  33227176  306511810560  1.78109992133655  9224.7325069094
26380965  45627920  335281111040  1.72957736762093  7348.15680925188
24923956  44721109  328790982656  1.79430219664968  7352.03106559813
25312482  43035393  287792226304  1.70016488308021  6687.33817079351
25841471  46276699  288168476672  1.79079197929561  6227.07502693742
25618384  43785917  321591488512  1.70915999229303  7344.63294469772
26006097  45056206  298747666432  1.73252472295247  6630.55532088077
26684805  45196730  351100243968  1.69372532420604  7768.26650883814
24025872  42450135  353265467392  1.76685095966548  8321.89267223768
24080466  45510525  371726323712  1.88993539410741  8167.91991988666
23195936  45095051  326473826304  1.94409274969546  7239.68193990955
23653302  43312705  307549573120  1.83114835298683  7100.67803707942
21589455  40034670  322982109184  1.85436223378497  8067.56017182107
22469039  42042723  314323701760  1.87114023879704  7476.29266924504
23647633  43486098  370003841024  1.83891969230071  8508.55464254346
23750561  37387139  320471453696  1.57415814304344  8571.70305799542
23142315  38640274  329341046784  1.66968058294946  8523.25857689312
23539469  39573256  292528910336  1.68114480407353  7392.08596674481
23810938  37968499  277270380544  1.59458224619291  7302.64266027477
19361754  33610252  286391676928  1.73590946357443  8520.96190555191
20331818  34119736  256076865536  1.67814486633709  7505.24170339419
21017537  35862221  318755282944  1.70629988661374  8888.33078531305
21660731  42648077  329217507328  1.96891217567865  7719.39863380007
20708620  42285124  344562262016  2.04190931119505  8148.54562129225
21371937  43158447  312754188288  2.01939800777066  7246.65065654471
21447150  40034134  283613331456  1.86664120873869  7084.28790931259
18906469  36598724  302526169088  1.93577785465916  8266.03050663734
20086704  36824872  280208515072  1.83329589563325  7609.21898308296
20912511  40116356  340019290112  1.91829455582833  8475.82691987278
17728197  30717152  270751887360  1.73267208165613  8814.35516417668
16778676  30875765  267493560320  1.84017886751017  8663.54437922429
17700395  31528725  239652761600  1.78124414737637  7601.09270514428
17727766  31338207  232399462400  1.76774710361136  7415.85063880649
15488369  27225173  246367821824  1.75778179096844  9049.26561252705
16332731  29287976  227973730304  1.7932075168568   7783.86769724204
17043318  31659676  274151649280  1.85760049774346  8659.33211950748
21627836  34504152  279215091712  1.59535850003671  8092.2171833697
21244729  35619286  303324131328  1.67661757417569  8515.72744405938
22132156  38534232  281272401920  1.74109707160929  7299.28656473548
22035014  34627308  246920048640  1.57146748352418  7130.78962534425
20277457  33126067  265162657792  1.63364010585746  8004.65258347754
20669142  34587911  254815776768  1.67340816566067  7367.1918714027
21648239  34364823  292156514304  1.58741886580243  8501.61557078295
21117643  34737044  292367892480  1.64492997632359  8416.60253186771
20531946  37038043  292538568704  1.8039226773731   7898.32682855301
21393711  35682241  257189515264  1.66788459468299  7207.77361668512
21738966  34753281  252140285952  1.59866301828707  7255.1505554828
19197606  32922066  269381632000  1.71490476468785  8182.40361950553
20044574  33864896  245486792704  1.68947945713389  7249.00477190304
20601681  35851902  305202065408  1.74024158514055  8512.85561943129
Average                           1.76995040322111  8014.69622126768


So the average fragment size is around 8 KiB, and the ratio of fragments to requests (FRAG / CNT) is a bit lower than two.
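For reference, the two derived columns in the table can be reproduced as follows. This is only a sketch: derive() is a hypothetical helper name, the two input rows are copied from the top of the table, and Size is assumed to be in bytes (which is what makes the ~8 KiB figure work out).

```python
# (cnt, frags, size) copied from the first two rows of the table above.
rows = [
    (21350923, 37146899, 317040259072),
    (20951932, 38122769, 317841477632),
]

def derive(cnt, frags, size):
    """Return (fragments per allocation request, average fragment size).

    ASSUMPTION: size is in bytes, so the second value is bytes per fragment.
    """
    return frags / cnt, size / frags

for cnt, frags, size in rows:
    ratio, avg_frag = derive(cnt, frags, size)
    print(f"cnt={cnt} frags={frags} ratio={ratio:.4f} "
          f"avg_frag={avg_frag:.1f} B ({avg_frag / 1024:.2f} KiB)")
```

A ratio of 1.0 would mean every allocation request was satisfied with a single fragment; values approaching 2 mean that requests are on average split into almost two fragments.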





And after reading your mails it might not be a problem at all. But we will start collecting this information in the coming weeks.

We will be re-provisioning all our OSDs, so that might be a good time to look at the behavior and development of the "cnt versus frags" ratio.

After we completely emptied a host, and even after letting the OSDs run idle for a couple of hours, the fragmentation rating would not drop below 0.27 for some OSDs, and stayed as high as 0.62 for others. Is it expected that it does not go to ~zero?

You might be facing the issue fixed by https://github.com/ceph/ceph/pull/49885

Possibly.


I have read some of the tracker tickets that are mentioned in the PRs [1,2]. The problem seems to reveal itself in the Pacific release. I wonder if this has something to do with the change of the default allocator (bitmap -> hybrid) in Pacific.

The BlueFS 4K allocation unit will not be backported to Pacific [3]. Would it make sense to skip re-provisioning OSDs in Pacific altogether and do the re-provisioning in the Quincy release, with BlueFS 4K alloc size support [4]?

Gr. Stefan

[1]: https://tracker.ceph.com/issues/58022
[2]: https://tracker.ceph.com/issues/57672
[3]: https://tracker.ceph.com/issues/58589
[4]: https://tracker.ceph.com/issues/58588
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
