Dear Lustre-Users,

Recently we had an issue with file data distribution over our Lustre OSTs. We
have a Lustre storage cluster here, consisting of two OSS servers in
active-active failover mode. The Lustre version is 1.8, possibly with DDN
patches.

The cluster has 12 OSTs, 7.3 TB each. Normally they are about 60% full
(4.5 TB or so), but recently one of them filled up almost completely (99%),
with two others close behind (80%). The rest of the OSTs stayed at the usual
60%.

Why would that happen? Shouldn't Lustre try to distribute the space evenly? I
have checked the filled OSTs for large files; there were no files large
enough to explain the difference (i.e. of the order of the gap between 99%
and 60% occupancy, 2-3 TB). Some users did have large directories, but the
individual files were about 5-10 GB each.

I have checked our Lustre parameters: qos_prio_free seems to be at its
default of 90%, qos_threshold_rr is 16%, and the stripe count is 1.
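
To put rough numbers on this, here is a back-of-the-envelope sketch in Python
using only the figures quoted above. The imbalance measure (the spread between
the most and the least free OST, relative to the most free one) is my own
assumption for illustration; it is not necessarily the formula the Lustre
allocator uses internally.

```python
# Figures taken from this message; the imbalance formula is an
# illustrative assumption, not Lustre's actual internal calculation.
OST_SIZE_TB = 7.3
NUM_OSTS = 12
QOS_THRESHOLD_RR = 0.16  # value reported above

# Fill fraction per OST: one at 99%, two at 80%, the remaining nine at 60%.
fill = [0.99] + [0.80] * 2 + [0.60] * (NUM_OSTS - 3)
free_tb = [OST_SIZE_TB * (1.0 - f) for f in fill]


def imbalance(free):
    """Spread between the most and least free OST, relative to the most free."""
    return (max(free) - min(free)) / max(free)


# Extra data on the full OST compared with an OST at the usual 60%.
excess_tb = OST_SIZE_TB * (0.99 - 0.60)

# How many 10 GB files (the largest we saw) it would take to account for it.
files_needed = excess_tb * 1000 / 10

print(f"excess on the full OST: {excess_tb:.2f} TB")
print(f"10 GB files needed to explain it: {files_needed:.0f}")
print(f"free-space imbalance: {imbalance(free_tb):.1%}")
print(f"exceeds qos_threshold_rr: {imbalance(free_tb) > QOS_THRESHOLD_RR}")
```

By this estimate it would take a few hundred of the largest files all landing
on the same OST to explain the gap, and the free-space spread is far beyond
the 16% threshold.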

Could you please suggest what might have caused this behavior? Are there any
tunables, or better threshold values, that we could change to avoid such
imbalances?

Thank you very much in advance!

--
Grigory Shamov
HPC Analyst,
University of Manitoba
Winnipeg MB Canada

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
