I have far too many Debian bugs that amount to problems with Linux device names being fundamentally unstable: whether a disk is /dev/sda or /dev/sdb depends on bus probing order, and whether a disk is /dev/sda or /dev/hda depends on whether the Linux kernel was built with or without libata. Of course, GRUB has already committed to the idea that it should be as independent of this as reasonably possible; that's why we use UUIDs everywhere.
There is still one significant place in GRUB where we use traditional device names, though: device.map. UUIDs don't apply to disk devices, only filesystems, and so the mechanism for finding stable device names here is inevitably kernel-specific. Some almost certainly have simpler device architectures and don't need any of this. But on Linux, it's vitally important. We can actually mostly work without a device.map nowadays, but there are still some warts (e.g. LVM/RAID probing requires nasty teardown/setup of the upper disk layers if you don't have a device.map, and we haven't yet got round to removing 'set root=' statements in grub.cfg if they're going to be meaningless at boot time due to including OS device names). I don't really want to reopen that discussion just now, but I would like to make sure that *if* you have a device.map then it uses stable device names if possible. This is important because if you have a device.map but it doesn't include the disk you're trying to boot from (e.g. because your kernel just started using libata, so device.map lists /dev/hda but you only have /dev/sda), then grub-probe will fail and you're screwed unless you know which files to edit by hand. I'm OK with the odd user who's doing something weird having to edit files by hand, but right now most Debian users upgrading from lenny to squeeze will have to do this which is really too much. On Linux, the sane choices for stable names for disk devices are those in /dev/disk/by-id/. They're symlinks to the real disk devices, and they're constructed using information such as the serial number of the disk so they only change if the physical disk changes. They're built in userspace by udev rules, so are not necessarily available for all devices (I hear there are problems with virtual disks under VMware, for instance), but they're pretty widely available. There may be multiple symlinks for any one disk - for example, my laptop disk has both ata-* and scsi-SATA_* by-id names - but this can be handled pretty stably by just sorting the list of by-id names and picking the first. So, I propose the following. On Linux, use by-id names if we have them, but if we don't, fall back to what we did before. Call canonicalize_file_name on every device name and make sure we never emit the same one twice. 'grub-probe --target=device' still emits traditional device names (i.e. the symlink target), but that's OK and in fact it's probably best to leave this alone. This means that a Debian package can convert people's device.map files to stable device names on upgrade, and that it only needs to do this once. Thereafter, people only need to change it if they're adding or removing disks that are relevant for booting, but other than that they can leave it alone. The same Debian upgrade can also start remembering the disk on which grub-install should be run in terms of by-id names, which should sort out the vast bulk of upgrade bugs we have. The result looks something like this: $ sudo ./grub-mkdevicemap -m - (hd0) /dev/disk/by-id/ata-FUJITSU_MHW2080BJ_G2_K30TT772603R (hd1) /dev/disk/by-id/usb-WD_My_Book_AV_1020_574341563535393535373637202020-0:0 $ sudo ./grub-mkdevicemap -m - | sudo ./grub-probe --device-map=/proc/self/fd/0 --target=device / /dev/sda5 $ sudo ./grub-mkdevicemap -m - | sudo ./grub-probe --device-map=/proc/self/fd/0 --target=drive / (hd0,msdos5) $ sudo ./grub-mkdevicemap -m - | sudo ./grub-probe --device-map=/proc/self/fd/0 --target=fs / ext2 What do people think? 2010-06-21 Colin Watson <cjwat...@ubuntu.com> Change grub-mkdevicemap to emit /dev/disk/by-id/ names where possible on Linux. * util/deviceiter.c (check_device): Maintain and check a list of which devices (by canonicalized name) have already been seen. (clear_seen_devices): New function. (compare_file_names) [__linux__]: New function. (grub_util_iterate_devices): Clear the list of seen devices on exit and (just in case) on entry. (grub_util_iterate_devices) [__linux__]: Iterate over non-partition devices in /dev/disk/by-id/, in sorted order. Remove DM-RAID seen-devices list, superseded by general code in check_device. === modified file 'util/deviceiter.c' --- util/deviceiter.c 2010-06-11 20:31:16 +0000 +++ util/deviceiter.c 2010-06-21 08:54:07 +0000 @@ -28,6 +28,7 @@ #include <errno.h> #include <fcntl.h> #include <limits.h> +#include <dirent.h> #include <grub/util/misc.h> #include <grub/util/deviceiter.h> @@ -345,18 +346,37 @@ get_xvd_disk_name (char *name, int unit) } #endif +static struct seen_device +{ + struct seen_device *next; + const char *name; +} *seen; + /* Check if DEVICE can be read. If an error occurs, return zero, otherwise return non-zero. */ static int check_device (const char *device) { + char *real_device; char buf[512]; FILE *fp; + struct seen_device *seen_elt; /* If DEVICE is empty, just return error. */ if (*device == 0) return 0; + /* Have we seen this device already? */ + real_device = canonicalize_file_name (device); + if (! real_device) + return 0; + if (grub_named_list_find (GRUB_AS_NAMED_LIST (seen), real_device)) + { + grub_dprintf ("deviceiter", "Already seen %s (%s)\n", + device, real_device); + goto fail; + } + fp = fopen (device, "r"); if (! fp) { @@ -379,7 +399,7 @@ check_device (const char *device) break; } /* Error opening the device. */ - return 0; + goto fail; } /* Make sure CD-ROMs don't get assigned a BIOS disk number @@ -387,7 +407,7 @@ check_device (const char *device) #ifdef __linux__ # ifdef CDROM_GET_CAPABILITY if (ioctl (fileno (fp), CDROM_GET_CAPABILITY, 0) >= 0) - return 0; + goto fail; # else /* ! CDROM_GET_CAPABILITY */ /* Check if DEVICE is a CD-ROM drive by the HDIO_GETGEO ioctl. */ { @@ -395,14 +415,14 @@ check_device (const char *device) struct stat st; if (fstat (fileno (fp), &st)) - return 0; + goto fail; /* If it is a block device and isn't a floppy, check if HDIO_GETGEO succeeds. */ if (S_ISBLK (st.st_mode) && MAJOR (st.st_rdev) != FLOPPY_MAJOR && ioctl (fileno (fp), HDIO_GETGEO, &hdg)) - return 0; + goto fail; } # endif /* ! CDROM_GET_CAPABILITY */ #endif /* __linux__ */ @@ -410,7 +430,7 @@ check_device (const char *device) #if defined(__FreeBSD_kernel__) || defined(__NetBSD__) || defined(__OpenBSD__) # ifdef CDIOCCLRDEBUG if (ioctl (fileno (fp), CDIOCCLRDEBUG, 0) >= 0) - return 0; + goto fail; # endif /* CDIOCCLRDEBUG */ #endif /* __FreeBSD_kernel__ || __NetBSD__ || __OpenBSD__ */ @@ -418,21 +438,43 @@ check_device (const char *device) if (fread (buf, 1, 512, fp) != 512) { fclose (fp); - return 0; + goto fail; } + /* Remember that we've seen this device. */ + seen_elt = xmalloc (sizeof *seen_elt); + seen_elt->name = real_device; /* steal memory */ + grub_list_push (GRUB_AS_LIST_P (&seen), GRUB_AS_LIST (seen_elt)); + fclose (fp); return 1; + +fail: + free (real_device); + return 0; +} + +static void +clear_seen_devices (void) +{ + while (seen) + { + struct seen_device *seen_elt = seen; + seen = seen->next; + free (seen_elt); + } + seen = NULL; } #ifdef __linux__ -# ifdef HAVE_DEVICE_MAPPER -struct dmraid_seen +/* Like strcmp, but doesn't require a cast for use as a qsort comparator. */ +static int +compare_file_names (const void *a, const void *b) { - struct dmraid_seen *next; - const char *name; -}; -# endif /* HAVE_DEVICE_MAPPER */ + const char *left = *(const char **) a; + const char *right = *(const char **) b; + return strcmp (left, right); +} #endif /* __linux__ */ void @@ -441,6 +483,8 @@ grub_util_iterate_devices (int NESTED_FU { int i; + clear_seen_devices (); + /* Floppies. */ for (i = 0; i < floppy_disks; i++) { @@ -453,10 +497,56 @@ grub_util_iterate_devices (int NESTED_FU /* In floppies, write the map, whether check_device succeeds or not, because the user just may not insert floppies. */ if (hook (name, 1)) - return; + goto out; } #ifdef __linux__ + { + DIR *dir = opendir ("/dev/disk/by-id"); + + if (dir) + { + struct dirent *entry; + char **names; + size_t names_len = 0, names_max = 1024, i; + + names = xmalloc (names_max * sizeof *names); + + /* Dump all the directory entries into names, resizing if + necessary. */ + for (entry = readdir (dir); entry; entry = readdir (dir)) + { + /* Skip partition entries. */ + if (strstr (entry->d_name, "-part")) + continue; + if (names_len >= names_max) + { + names_max *= 2; + names = xrealloc (names, names_max * sizeof *names); + } + names[names_len++] = xasprintf (entry->d_name); + } + + qsort (names, names_len, sizeof *names, &compare_file_names); + + closedir (dir); + + /* Now add all the devices in sorted order. */ + for (i = 0; i < names_len; ++i) + { + char *path = xasprintf ("/dev/disk/by-id/%s", names[i]); + if (check_device (path)) + { + if (hook (path, 0)) + goto out; + } + free (path); + free (names[i]); + } + free (names); + } + } + if (have_devfs ()) { i = 0; @@ -476,10 +566,10 @@ grub_util_iterate_devices (int NESTED_FU { strcat (name, "/disc"); if (hook (name, 0)) - return; + goto out; } } - return; + goto out; } #endif /* __linux__ */ @@ -492,7 +582,7 @@ grub_util_iterate_devices (int NESTED_FU if (check_device (name)) { if (hook (name, 0)) - return; + goto out; } } @@ -506,7 +596,7 @@ grub_util_iterate_devices (int NESTED_FU if (check_device (name)) { if (hook (name, 0)) - return; + goto out; } } @@ -519,7 +609,7 @@ grub_util_iterate_devices (int NESTED_FU if (check_device (name)) { if (hook (name, 0)) - return; + goto out; } } @@ -532,7 +622,7 @@ grub_util_iterate_devices (int NESTED_FU if (check_device (name)) { if (hook (name, 0)) - return; + goto out; } } #endif /* __linux__ */ @@ -546,7 +636,7 @@ grub_util_iterate_devices (int NESTED_FU if (check_device (name)) { if (hook (name, 0)) - return; + goto out; } } @@ -569,7 +659,7 @@ grub_util_iterate_devices (int NESTED_FU if (check_device (name)) { if (hook (name, 0)) - return; + goto out; } } } @@ -590,7 +680,7 @@ grub_util_iterate_devices (int NESTED_FU if (check_device (name)) { if (hook (name, 0)) - return; + goto out; } } } @@ -611,7 +701,7 @@ grub_util_iterate_devices (int NESTED_FU if (check_device (name)) { if (hook (name, 0)) - return; + goto out; } } } @@ -632,7 +722,7 @@ grub_util_iterate_devices (int NESTED_FU if (check_device (name)) { if (hook (name, 0)) - return; + goto out; } } } @@ -650,7 +740,7 @@ grub_util_iterate_devices (int NESTED_FU if (check_device (name)) { if (hook (name, 0)) - return; + goto out; } } } @@ -664,7 +754,7 @@ grub_util_iterate_devices (int NESTED_FU if (check_device (name)) { if (hook (name, 0)) - return; + goto out; } } @@ -685,7 +775,6 @@ grub_util_iterate_devices (int NESTED_FU unsigned int next = 0; void *top_handle, *second_handle; struct dm_tree_node *root, *top, *second; - struct dmraid_seen *seen = NULL; /* Build DM tree for all devices. */ tree = dm_tree_create (); @@ -721,7 +810,6 @@ grub_util_iterate_devices (int NESTED_FU { const char *node_name, *node_uuid; char *name; - struct dmraid_seen *seen_elt; node_name = dm_tree_node_get_name (second); dmraid_check (node_name, "dm_tree_node_get_name failed\n"); @@ -733,40 +821,21 @@ grub_util_iterate_devices (int NESTED_FU goto dmraid_next_child; } - /* Have we already seen this node? There are typically very few - DM-RAID disks, so a list should be fast enough. */ - if (grub_named_list_find (GRUB_AS_NAMED_LIST (seen), node_name)) - { - grub_dprintf ("deviceiter", "Already seen DM device %s\n", - node_name); - goto dmraid_next_child; - } - name = xasprintf ("/dev/mapper/%s", node_name); if (check_device (name)) { if (hook (name, 0)) { free (name); - while (seen) - { - struct dmraid_seen *seen_elt = seen; - seen = seen->next; - free (seen_elt); - } if (task) dm_task_destroy (task); if (tree) dm_tree_free (tree); - return; + goto out; } } free (name); - seen_elt = xmalloc (sizeof *seen_elt); - seen_elt->name = node_name; - grub_list_push (GRUB_AS_LIST_P (&seen), GRUB_AS_LIST (seen_elt)); - dmraid_next_child: second = dm_tree_next_child (&second_handle, top, 1); } @@ -774,12 +843,6 @@ dmraid_next_child: } dmraid_end: - while (seen) - { - struct dmraid_seen *seen_elt = seen; - seen = seen->next; - free (seen_elt); - } if (task) dm_task_destroy (task); if (tree) @@ -787,5 +850,8 @@ dmraid_end: } # endif /* HAVE_DEVICE_MAPPER */ #endif /* __linux__ */ + +out: + clear_seen_devices (); } Thanks, -- Colin Watson [cjwat...@ubuntu.com] _______________________________________________ Grub-devel mailing list Grub-devel@gnu.org http://lists.gnu.org/mailman/listinfo/grub-devel