Ted, I cc'd you. Could you please have a look at the save_records() function in the middle of my mail and tell us whether it's safe to use, at least on Ext4? I understand there might be a problem when using it on XFS, as XFS doesn't cover the rename case. Thanks.
Hi! It ate it about 13 days ago, on my ThinkPad T42:

shambhala:~> uprecords | cut -c1-66
     #               Uptime | System
----------------------------+-------------------------------------
     1    10 days, 21:01:41 | Linux 2.6.37-rc3-tp42     Fri Nov 26
     2     2 days, 02:09:03 | Linux 2.6.37-rc3-tp42     Wed Nov 24
     3     0 days, 13:59:05 | Linux 2.6.37-rc3-tp42     Tue Nov 23
     4     0 days, 06:40:23 | Linux 2.6.36-tp42-gtt-vr  Tue Nov 23
->   5     0 days, 02:04:05 | Linux 2.6.37-rc3-tp42
     6     0 days, 00:41:55 | Linux 2.6.37-rc3-tp42     Tue Nov 23
----------------------------+-------------------------------------
1up in     0 days, 04:36:19 | at    Tue Dec  7
no1 in    10 days, 18:57:37 | at    Sat Dec 18
    up    13 days, 22:36:12 | since Tue Nov 23
  down     0 days, 00:06:49 | since Tue Nov 23
   %up               99.966 | since Tue Nov 23

I don't remember what might have happened at that time. It's not the first time, either: I already restored it from a backup in October:

shambhala:~> ls -l /var/spool/uptimed
total 28
-rw-r--r-- 1 daemon daemon   11 Dec  7 10:50 bootid
-rw-r--r-- 1 root   root    254 Dec  7 12:35 records
-rw-r--r-- 1 daemon daemon 9806 Mar  3  2010 records-2010-03-03-aus-dem-rsync-backup
-rw-r--r-- 1 daemon daemon 1450 Mar  9  2010 records-2010-03-09-unvollstaendig
-rw-r--r-- 1 daemon daemon  254 Dec  7 12:30 records.old

As you can see, the last working backup here is 9806 bytes, way bigger than the current file. This is on

shambhala:~> df -hT /var/spool/uptimed
Filesystem                   Type  Size  Used Avail Use% Mounted on
/dev/mapper/shambhala-debian ext4   20G   14G  5,5G  72% /

with a quite recent kernel (2.6.36 / 2.6.37-rc3), which has the Ext4 safeguard for the rename and truncate cases introduced in 2.6.30, I believe: written data is flushed *before* the file is renamed. But according to libuptimed/urec.c:

void save_records(int max, time_t log_threshold) {
	FILE *f;
	Urec *u;
	int i = 0;

	f = fopen(FILE_RECORDS".tmp", "w");
	if (!f) {
		printf("uptimed: cannot write to %s\n", FILE_RECORDS);
		return;
	}

	for (u = urec_list; u; u = u->next) {
		/* Ignore everything below the threshold */
		if (u->utime >= log_threshold) {
			fprintf(f, "%lu:%lu:%s\n", (unsigned long)u->utime,
			        (unsigned long)u->btime, u->sys);
			/* Stop processing when we've logged the max number specified. */
			if ((max > 0) && (++i >= max)) break;
		}
	}
	fclose(f);
	rename(FILE_RECORDS, FILE_RECORDS".old");
	rename(FILE_RECORDS".tmp", FILE_RECORDS);
}

uptimed does use the rename pattern. Thus I do not get *why* it ate my old records again. Nonetheless, I think there should be a safeguard, like falling back to the old file if the current one is empty. I would also keep more than one backup, given the small size of this file. Maybe logrotate can do this while keeping the original file instead of truncating it.

I have the following configuration:

shambhala:~> cat /etc/uptimed.conf
# Uptimed configuration file.

# Interval to write the logfile with in seconds.
UPDATE_INTERVAL=300

# Maximum number of entries in logfile. Set to 0 for unlimited.
LOG_MAXIMUM_ENTRIES=0

# Minimum uptime that must be reached for it to be considered a record.
LOG_MINIMUM_UPTIMED=1h
[...]

An option to fsync() would be fine, so that people here can easily test whether fsync helps in that case.
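To make that easy to try, here is a minimal sketch of what such an fsync() step could look like. It is not a patch against the actual uptimed sources: save_records_fsync() is a hypothetical name, and FILE_RECORDS, Urec and urec_list are assumed to be the same declarations as in the snippet quoted above. The idea is simply to flush and fsync() the temporary file, and to leave the existing records untouched if that fails, before rename() makes the new file the only copy.

/*
 * Minimal sketch (not a real patch): an fsync-before-rename variant of
 * save_records(). FILE_RECORDS, Urec and urec_list are assumed to come
 * from uptimed's headers, as in the snippet quoted above.
 */
#include <stdio.h>
#include <time.h>
#include <unistd.h>   /* fsync() */

void save_records_fsync(int max, time_t log_threshold)
{
	FILE *f;
	Urec *u;
	int i = 0;

	f = fopen(FILE_RECORDS".tmp", "w");
	if (!f) {
		printf("uptimed: cannot write to %s\n", FILE_RECORDS);
		return;
	}

	for (u = urec_list; u; u = u->next) {
		/* Ignore everything below the threshold. */
		if (u->utime >= log_threshold) {
			fprintf(f, "%lu:%lu:%s\n", (unsigned long)u->utime,
			        (unsigned long)u->btime, u->sys);
			if ((max > 0) && (++i >= max))
				break;
		}
	}

	/*
	 * Push the data out of the stdio buffer and ask the kernel to write
	 * it to disk *before* the temporary file replaces the real one. If
	 * any step fails, keep the old records file untouched.
	 */
	if (fflush(f) != 0 || fsync(fileno(f)) != 0) {
		printf("uptimed: cannot sync %s.tmp\n", FILE_RECORDS);
		fclose(f);
		return;
	}
	if (fclose(f) != 0) {
		printf("uptimed: cannot close %s.tmp\n", FILE_RECORDS);
		return;
	}

	rename(FILE_RECORDS, FILE_RECORDS".old");
	rename(FILE_RECORDS".tmp", FILE_RECORDS);
}

That way the data is known to be on disk before the rename, and a failed write can no longer replace a good records file with an empty one.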
Then there is the slight chance that uptimed gets confused during runtime and writes out an empty records file by accident, but I find this highly unlikely.

I will restore as much as possible from my backup. It is easy to combine the contents of a backup and a new records file.

I also lost the records on a Lenny => Squeeze update on my Dell workstation at work. So this is three losses within just a few months. In its current state, uptimed is hardly usable for me. For now I have set up a backup for myself as fcrontab jobs:

# Backup of the uptimed database
@ 1d  cp -p /var/spool/uptimed/records ~/Backup/uptimed/records-$(date +%Y-%m-%d)
@ 30d find ~/Backup/uptimed/ -name "records-*" -and -mtime +30 -delete

Something like that should go into uptimed, or into a cron job that comes with the package. It could be a cron.daily or at least a cron.weekly job (using some directory in /var for backups).

So, I hope this was enough constructive feedback to show what can be done about it. I can craft up a cron job for the uptimed package that does the backup, if you want. I am not that much into C programming currently, but eventually I could come up with a patch for uptimed as well. But I think this bug needs acknowledgment as being serious, because data loss is involved. Just denying that there is a problem doesn't help us proceed further. A user of uptimed IMHO rightly does not care whether it's a problem in the kernel, the filesystem, or the userspace program.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7