Hsieh,

This sounds similar to a bug with pre-2.5 servers and 2.7 (or newer) clients: the client and server disagree about which one performs the delete, and the delete doesn't happen. Since you're running 2.5 I don't think you should see this, but the symptoms are the same.

You can temporarily fix things by restarting/remounting your OST(s), which will trigger orphan cleanup. If that works, though, the only long-term fix is to upgrade your servers to a version that is expected to work with your clients. (The 2.10 maintenance release is nice if you are not interested in the newest features; otherwise, 2.12 is also an option.)
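For reference, the restart/remount step could look something like the sketch below. This is only an illustration, not an exact recipe: the device and mount point (/dev/ost0028, /mnt/ost0028) are placeholders for whatever your OSS actually uses, and the commands must be run on the OSS that serves the affected OST.

```shell
# Unmount the OST's backing device, then mount it again.  When the OST
# reconnects to the MDT, orphan recovery runs and unreferenced objects
# (such as those left behind by the failed deletes) are freed.
umount /mnt/ost0028
mount -t lustre /dev/ost0028 /mnt/ost0028
```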
I would also recommend, where possible, that you keep clients and servers in sync - we do interop testing, but the same version on both is much more widely used.

- Patrick

________________________________
From: lustre-discuss <[email protected]> on behalf of Tung-Han Hsieh <[email protected]>
Sent: Sunday, March 3, 2019 4:00:17 AM
To: [email protected]
Subject: [lustre-discuss] Data migration from one OST to another

Dear All,

We have a problem with data migration from one OST to another. We have installed Lustre-2.5.3 on the MDS and OSS servers, and Lustre-2.8 on the clients. We want to migrate some data from one OST to another in order to re-balance data occupancy among the OSTs.

To begin, we followed the old method (i.e., the method found in the Lustre-1.8.X manuals) for data migration. Suppose we have two OSTs:

root@client# /opt/lustre/bin/lfs df
UUID                  1K-blocks        Used   Available Use% Mounted on
chome-OST0028_UUID   7692938224  7246709148    55450156  99% /work[OST:40]
chome-OST002a_UUID  14640306852  7094037956  6813847024  51% /work[OST:42]

and we want to migrate data from chome-OST0028_UUID to chome-OST002a_UUID. Our procedure was:

1. Deactivate chome-OST0028_UUID:

   root@mds# echo 0 > /opt/lustre/fs/osc/chome-OST0028-osc-MDT0000/active

2. Find all files located on chome-OST0028_UUID:

   root@client# /opt/lustre/bin/lfs find --obd chome-OST0028_UUID /work > list

3. For each file listed in the file "list", run:

   cp -a <file> <file>.tmp
   mv <file>.tmp <file>

During the migration we did see more and more data being written to chome-OST002a_UUID, but we did not see any disk space released on chome-OST0028_UUID. In Lustre-1.8.X, doing it this way, we saw chome-OST002a_UUID receive more and more data while chome-OST0028_UUID gained more and more free space. It looks as if the data files referenced by the MDT have been copied to chome-OST002a_UUID, but the stale objects still remain on chome-OST0028_UUID.
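The per-file copy-and-rename in step 3 can be scripted. Below is a minimal sketch that loops over the paths in "list" (one absolute path per line, as produced by `lfs find`); the helper name migrate_list is mine for illustration, not a Lustre tool, and it does not handle paths containing newlines.

```shell
# migrate_list FILE
# For every path listed in FILE (one per line), copy it to a temporary
# name with `cp -a` -- on Lustre the new copy allocates its objects on
# the currently active OSTs -- then rename the copy over the original.
migrate_list() {
    while IFS= read -r f; do
        [ -f "$f" ] || continue            # skip anything that vanished
        cp -a -- "$f" "$f.tmp" || return 1 # preserve mode/owner/times
        mv -- "$f.tmp" "$f" || return 1    # rename replaces the original
    done < "$1"
}
```

Note that, as the thread discusses, this only re-creates the file data on active OSTs; whether the old objects are actually freed on the deactivated OST is a separate matter.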
Even after we reactivate chome-OST0028_UUID following the migration, the situation stays the same:

root@mds# echo 1 > /opt/lustre/fs/osc/chome-OST0028-osc-MDT0000/active

Is there any way to cure this problem? Thanks very much.

T.H.Hsieh

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
