On May 22, 2011, at 2:28 PM, Marcel Moolenaar wrote:

> 
> On May 22, 2011, at 9:26 AM, Warner Losh wrote:
> 
>> 
>> On May 22, 2011, at 7:03 AM, Adrian Chadd wrote:
>> 
>>> This also bit me on embedded platform stuff.
>>> 
>>> Is it possible to disable this by default for now and have it just warn 
>>> loudly?
>>> And/or hide the default value behind a kernel configuration variable
>>> so we can disable it
>>> (but still get the warnings) for now?
>> 
>> Or just delete it entirely as a bad idea?  We had this with Marcel's warning 
>> for a long time that turned out to be utterly bogus so we removed it.
> 
> Really?
> 
> The warning wasn't bogus. It was there to help us understand what was
> going on and we found over time that it was a harmless condition.
> That's why we removed the warning. The warning was good to raise
> awareness.

Except from the moment it went in we knew that the problem it was reporting was 
often due to the artificial geometries used throughout the system.  That was 
the root cause of that problem: the firmware reported fake geometry and the MBR 
described a different geometry.  The rest of the world ignored the firmware 
reported geometry and used what was described in the MBR.

> This is in no way similar to what we have now. Here we have to deal with
> broken and fundamentally invalid MBRs that has caused us harm before and
> we're trying to do something sensible. As it turns out, a whole bunch of
> people have invalid MBRs, probably caused by crappy tools. Now what do
> you suggest we should do? Accept it silently and suffer the consequences
> later,  should we raise awareness so that administrators can try and fix
> things or should we reject the MBR out of pedanticism?
> 
> Rather than just calling it a bad idea, why not come up with something
> constructive?


Looking at one of my flash drives that shows the problem:

da0: 15423MB (31588351 512 byte sectors: 255H 63S/T 1966C)
GEOM_PART: partition 1 has end offset beyond last LBA: 31588350 > 31588325

So why does gpart think the last LBA is 25 less than it really is.  Well, let's 
do some sanity checks first.  fdisk -s da0 tells us:

/dev/da0: 1966 cyl 255 hd 63 sec
Part        Start        Size Type Flags
   1:          63    31588288 0x0b 0x00

which puts the ending sector at 31588350, which should be fine.  Diskinfo 
reports the correct info:

sudo diskinfo da0
da0     512     16173235712     31588351        0       0       1966    255     
63

I'm pretty sure this problem is due to a bug in g_part_mbr.c:

        basetable->gpt_last = msize - (msize % basetable->gpt_sectors) - 1;

This is wrong, or at least it is a widely disregarded part of what makes up an 
MBR.  When I correct the size, the geom code is fine.  There's no requirement 
in MBR that a partition end of any particular boundary, although sometimes 
you'll find mistaken documentation that suggests this is the case.  Reading 
between the lines at http://www.boot-us.com/gloss03.htm suggests that this 
restriction was only for disks < 8GB in size (from the fact it said that all 
the partitions can be described with the CHS fields).  This is one of the 
things I learned when I tried to make fdisk enforce that: this isn't a 
requirement of MBR as it is implemented in the wild today.

Replacing the above with:
        basetable->gpt_last = msize - 1;
seems to fix the problem.  I'm unsure if this should be unconditional, or just 
for disks that are larger than 8GB in size.  Given that my thumb drive that 
triggers the bogus behavior is 16GB in size, I don't know for sure.  I have one 
or two 2GB drives that didn't trigger the problem.  They work either way.

Please note: this is with a USB drive that I've not repartitioned since I 
brought it home.  So this isn't a tool issue.  This is likely why 6/7 of phk's 
USB thumb drives have failed.

So please, let's change the code.  A similar change is necessary when creating 
the MBR as well, at least for media that's 8GB or larger.

I'll admit that I was wrong about WHERE the bogusity was (I thought it was the 
checks).  I was right that it was a bogisty in the FreeBSD code, and nothing 
wrong with my media.  MacOS, Linux and Windows handle it correctly, which 
strongly suggests there's a bug in FreeBSD's MBR handling.

Warner_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"

Reply via email to