Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread John Baldwin
On Wednesday, June 27, 2012 12:50:20 am Andrey V. Elsukov wrote:
> On 26.06.2012 21:37, John Baldwin wrote:
> >> 4. The gptboot now searches the backup GPT header in the previous sectors,
> >> when it finds the "GEOM::" signature in the last sector. PMBR code also
> >> tries to do the same:
> >> common/gpt.c
> >> i386/pmbr/pmbr.s
> > 
> > GPT really wants the backup header at the last LBA.  I know you can set it,
> > but I've interpreted that as a way to see if the primary header is correct
> > or not.  It seems to me that GPT tables created in this fashion (inside a
> > GEOM provider) will not work properly with partition editors for other OS's.
> > I'm hesitant to encourage the use of this as I do think putting GPT inside
> > of a gmirror violates the GPT spec.
> 
> The standard says:
> "The following test must be performed to determine if a GPT is valid:
> • Check the Signature
> • Check the Header CRC
> • Check that the MyLBA entry points to the LBA that contains the GUID 
> Partition Table
> • Check the CRC of the GUID Partition Entry Array
> If the GPT is the primary table, stored at LBA 1:
> • Check the AlternateLBA to see if it is a valid GPT
> If the primary GPT is corrupt, software must check the last LBA of the device 
> to see if it has a
> valid GPT Header and point to a valid GPT Partition Entry Array."

Right, we break the last rule.  If you want to use a partition editor
that doesn't grok gmirror (because you are using another OS's editor),
to repair a GPT, it will do the wrong thing.

> If a user wants to modify the GPT with a disk editor from another OS,
> he can do it, and it should work. The result depends only on the partition
> editor: it might overwrite the last sector, or it might not.

I would not assume it would work at all.  If it can't trust the
primary GPT, it has to assume the alternate is at the last LBA.
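
For readers following the argument, the validity test quoted from the standard can be sketched roughly as below. This is only an illustrative Python sketch, not the gptboot or GEOM_PART code: it assumes 512-byte sectors and a whole-disk image held in memory, and the names `parse_header`, `entries_ok` and `find_gpt` are made up for the example. It shows why a tool that follows the spec literally falls back to the last LBA only, which is where gmirror keeps its metadata.

```python
import struct
import zlib

SECTOR = 512                      # assumed logical sector size
HDR_FMT = "<8sIIIIQQQQ16sQIII"    # fixed 92-byte part of the GPT header
HDR_LEN = struct.calcsize(HDR_FMT)


def parse_header(sector, lba):
    """Return header fields if `sector`, read from `lba`, is a valid GPT header."""
    if len(sector) < HDR_LEN:
        return None
    (sig, _rev, size, crc, _rsvd, my_lba, alt_lba, _first, _last, _guid,
     ent_lba, ent_count, ent_size, ent_crc) = struct.unpack_from(HDR_FMT, sector)
    if sig != b"EFI PART":                        # check the Signature
        return None
    if not HDR_LEN <= size <= len(sector):
        return None
    # The header CRC is computed with its own field zeroed.
    zeroed = sector[:16] + bytes(4) + sector[20:size]
    if zlib.crc32(zeroed) & 0xFFFFFFFF != crc:    # check the Header CRC
        return None
    if my_lba != lba:                             # check MyLBA
        return None
    return {"alt_lba": alt_lba, "ent_lba": ent_lba,
            "ent_bytes": ent_count * ent_size, "ent_crc": ent_crc}


def entries_ok(image, hdr):
    """Check the CRC of the partition entry array the header points at."""
    start = hdr["ent_lba"] * SECTOR
    raw = image[start:start + hdr["ent_bytes"]]
    return (len(raw) == hdr["ent_bytes"]
            and zlib.crc32(raw) & 0xFFFFFFFF == hdr["ent_crc"])


def find_gpt(image):
    """Try LBA 1 first, then the last LBA, as the quoted rules require."""
    last_lba = len(image) // SECTOR - 1
    for lba in (1, last_lba):
        hdr = parse_header(image[lba * SECTOR:(lba + 1) * SECTOR], lba)
        if hdr is not None and entries_ok(image, hdr):
            return lba
    return None
```

With a damaged primary header and gmirror metadata in the last sector, `find_gpt` returns None even though an intact backup header sits one sector earlier.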

> >> 5. Also the pmbr image now contains one fake partition record.
> >> When the first several sectors are damaged the kernel can't detect the GPT
> >> (see the RECOVERING section in gpart(8)). We can restore the PMBR with the
> >> dd(1) command, but the old pmbr image has an empty partition table and the
> >> loader isn't able to boot from GPT when there is no partition record
> >> in the PMBR. Now it will be able to. When pmbr is installed via the
> >> 'gpart bootcode' command, the kernel correctly modifies this partition
> >> record. So, this is only for the first rescue step.
> > 
> > As I said earlier, I do not think this is appropriate and that instead
> > gpart should have an appropriate 'recover' command to install just the
> > pmbr on a disk and also create a correct entry in the MBR if needed while
> > doing so.
> 
> gpart(8) is only one of several geom(8) tools for managing objects of a
> GEOM class.
> It only sends control requests to the kernel. If the GPT is not detected,
> there are no geom objects to manage, and we can't write bootcode with
> gpart(8). I think that adding such functions to gpart(8) is not good. Maybe
> boot0cfg is the better tool for that. Also, we still don't have any tool to
> install zfsboot.

We can't write bootcode with gpart?  What do you think the 'bootcode' command
does?

Also, there is no reason we can't have a 'recover' command that attempts to
recover a corrupted table including repairing the PMBR.  gpart(8) already
generates a full PMBR when you use 'gpart create' to create a GPT even though
there isn't a GPT object yet.
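
To make the "partition record in the PMBR" concrete, here is a rough Python sketch of what a protective MBR with a single 0xEE record looks like on disk. It is an illustration only, not what gpart(8) or the pmbr image actually emit: the function name `protective_mbr` is invented, the CHS bytes follow the usual convention for a protective record, and whether the record is marked active is left out as an assumption.

```python
import struct


def protective_mbr(disk_sectors, bootcode=b""):
    """Build a 512-byte protective MBR with one 0xEE record covering the disk."""
    if len(bootcode) > 446:
        raise ValueError("boot code must fit in the first 446 bytes")
    # The record covers everything after LBA 0, capped at what 32 bits can hold.
    size = min(disk_sectors - 1, 0xFFFFFFFF)
    record = struct.pack(
        "<B3sB3sII",
        0x00,               # status: not marked active (assumption)
        b"\x00\x02\x00",    # start CHS corresponding to LBA 1
        0xEE,               # partition type: GPT protective
        b"\xff\xff\xff",    # end CHS: beyond what CHS can express
        1,                  # starting LBA
        size,               # size in sectors
    )
    # Boot code area, four 16-byte records (three empty), then the signature.
    return bootcode.ljust(446, b"\0") + record + bytes(16 * 3) + b"\x55\xaa"
```

A pmbr image shipped with an empty table would have zeros where `record` is, which is the case item 5 above is trying to address.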

-- 
John Baldwin
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread John Baldwin
On Tuesday, June 26, 2012 5:23:08 pm Pawel Jakub Dawidek wrote:
> On Tue, Jun 26, 2012 at 01:37:11PM -0400, John Baldwin wrote:
> > > 4. The gptboot now searches the backup GPT header in the previous sectors,
> > > when it finds the "GEOM::" signature in the last sector. PMBR code also
> > > tries to do the same:
> > > common/gpt.c
> > > i386/pmbr/pmbr.s
> > 
> > GPT really wants the backup header at the last LBA.  I know you can set it, 
> > > but I've interpreted that as a way to see if the primary header is correct
> > > or not. [...]
> 
> My interpretation is different: The way to verify if the header is valid
> is to check its checksum, not to check if the backup header location in
> the primary header points at the last LBA.
> 
> Of course if primary header's checksum is incorrect it is hard to trust
> that the backup header location is correct. And we need the backup
> header when the primary header is invalid...

Right, which is why this fails.

> > [...] It seems to me that GPT tables created in this fashion (inside a
> > GEOM provider) will not work properly with partition editors for other
> > OS's.  I'm hesitant to encourage the use of this as I do think putting GPT
> > inside of a gmirror violates the GPT spec.
> 
> I don't think so. Most common case is to configure partitions on top of
> a mirror. Mirroring partitions is less common. Mostly because of
> hardware RAIDs being popular. You don't expect hardware RAID vendor to
> mirror partitions. Partition editors for other OS's won't work, but only
> because they don't support gmirror. If they wouldn't recognize and
> support some hardware (or pseudo-hardware) RAIDs there will be the same
> problem.

Hardware RAIDs hide the metadata from the disk that the BIOS (and disk
editors) see.  Thus, putting a GPT on a hardware RAID volume works fine
as the logical volume is always seen by all OS's consistently.  The same
is even true of the "software" RAID that graid supports since the metadata
is defined by the vendor and thus the logical volume is always seen by other
OS's consistently.

My approach has been to only use gmirror with MBR so far, though I realize
that doesn't work above 2TB (until recently one had to have a hardware RAID
to get above 2TB anyway which made this last a moot point).

I won't object to patching our tools to handle this, but I think it is a
really bad idea, and one that users will have a hard time recovering from
when they are bitten by it.

-- 
John Baldwin


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Andrey V. Elsukov
On 27.06.2012 16:07, John Baldwin wrote:
>> • Check the Signature
>> • Check the Header CRC
>> • Check that the MyLBA entry points to the LBA that contains the GUID 
>> Partition Table
>> • Check the CRC of the GUID Partition Entry Array
>> If the GPT is the primary table, stored at LBA 1:
>> • Check the AlternateLBA to see if it is a valid GPT
>> If the primary GPT is corrupt, software must check the last LBA of the 
>> device to see if it has a
>> valid GPT Header and point to a valid GPT Partition Entry Array."
> 
> Right, we break the last rule.  If you want to use a partition editor
> that doesn't grok gmirror (because you are using another OS's editor),
> to repair a GPT, it will do the wrong thing.

When we are in FreeBSD, our loader can detect that the device is smaller
than what it sees, and it will work. When the primary header is OK,
other OSes should work with this GPT. When it isn't OK, you just can't
load the other OS :)
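
The loader-side search described in item 4 of the proposal might look roughly like the sketch below. This is a hedged Python illustration, not the code in common/gpt.c: `read_sector` is a hypothetical callback returning one sector as bytes, `max_skip` is an invented limit on how many metadata sectors to step over, and a real implementation would still validate the header CRCs after finding the signature.

```python
GPT_SIG = b"EFI PART"     # GPT header signature
GEOM_MAGIC = b"GEOM::"    # prefix mentioned in the thread for GEOM metadata


def backup_header_lba(read_sector, last_lba, max_skip=1):
    """Return the LBA that appears to hold the backup GPT header, or None."""
    lba = last_lba
    for _ in range(max_skip + 1):
        if lba <= 1:
            return None
        sector = read_sector(lba)
        if sector.startswith(GPT_SIG):
            return lba                # looks like a GPT header
        if not sector.startswith(GEOM_MAGIC):
            return None               # neither GPT nor GEOM metadata: give up
        lba -= 1                      # GEOM metadata: look one sector earlier
    return None
```

A standards-only tool corresponds to `max_skip=0`, which is the case John is worried about.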

>>> As I said earlier, I do not think this is appropriate and that instead
>>> gpart should have an appropriate 'recover' command to install just the
>>> pmbr on a disk and also create a correct entry in the MBR if needed while
>>> doing so.
>>
>> gpart(8) is only one of several geom(8) tools for managing objects of a
>> GEOM class.
>> It only sends control requests to the kernel. If the GPT is not detected,
>> there are no geom objects to manage, and we can't write bootcode with
>> gpart(8). I think that adding such functions to gpart(8) is not good. Maybe
>> boot0cfg is the better tool for that. Also, we still don't have any tool to
>> install zfsboot.
> 
> We can't write bootcode with gpart?  What do you think the 'bootcode' command
> does?

`gpart bootcode -b` reads the file, creates an ioctl request and sends this data to
the GEOM_PART class. GEOM_PART receives the control request, checks the data
and writes it to the provider.
`gpart bootcode -p` works like dd(1) and writes the bootcode to the given partition.
gpart(8) doesn't have any knowledge about the specific partitioning scheme.

> Also, there is no reason we can't have a 'recover' command that attempts to
> recover a corrupted table including repairing the PMBR.  gpart(8) already
> generates a full PMBR when you use 'gpart create' to create a GPT even though
> there isn't a GPT object yet.

`gpart create` only creates an ioctl control request to the GEOM_PART class.
The GEOM_PART class creates a new GPT geom object and this object writes the
PMBR and its metadata to the provider.

-- 
WBR, Andrey V. Elsukov




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Pawel Jakub Dawidek
On Wed, Jun 27, 2012 at 08:22:25AM -0400, John Baldwin wrote:
> > I don't think so. Most common case is to configure partitions on top of
> > a mirror. Mirroring partitions is less common. Mostly because of
> > hardware RAIDs being popular. You don't expect hardware RAID vendor to
> > mirror partitions. Partition editors for other OS's won't work, but only
> > because they don't support gmirror. If they wouldn't recognize and
> > support some hardware (or pseudo-hardware) RAIDs there will be the same
> > problem.
> 
> Hardware RAIDs hide the metadata from the disk that the BIOS (and disk
> editors) see.  Thus, putting a GPT on a hardware RAID volume works fine
> as the logical volume is always seen by all OS's consistently. [...]

Only if you won't connect this disk to a different controller.

> [...] The same
> is even true of the "software" RAID that graid supports since the metadata
> is defined by the vendor and thus the logical volume is always seen by other
> OS's consistently.

But is it seen without metadata by the boot loader?

What I'm trying to say is that it is fair to expect the user not to
use a gmirror-configured disk on a different OS. If the user wants to use
this disk in a different OS then he has to use a format that is recognized
by both.

Because gmirror is supported by FreeBSD we should improve the support by
teaching the boot loader about it. Pretending gmirror is special and
recommending mirroring partitions with it instead of raw disks is not
the solution.

I really can't see how gmirror is different in this regard from any
other software RAID or volume manager. If you try to use a disk that
contains unrecognized metadata the behaviour is undefined (but hopefully
not a panic).

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread John Baldwin
On Wednesday, June 27, 2012 10:08:17 am Pawel Jakub Dawidek wrote:
> On Wed, Jun 27, 2012 at 08:22:25AM -0400, John Baldwin wrote:
> > > I don't think so. Most common case is to configure partitions on top of
> > > a mirror. Mirroring partitions is less common. Mostly because of
> > > hardware RAIDs being popular. You don't expect hardware RAID vendor to
> > > mirror partitions. Partition editors for other OS's won't work, but only
> > > because they don't support gmirror. If they wouldn't recognize and
> > > support some hardware (or pseudo-hardware) RAIDs there will be the same
> > > problem.
> > 
> > Hardware RAIDs hide the metadata from the disk that the BIOS (and disk
> > editors) see.  Thus, putting a GPT on a hardware RAID volume works fine
> > as the logical volume is always seen by all OS's consistently. [...]
> 
> Only if you won't connect this disk to a different controller.

Yes, but people do not expect to be able to yank a hardware RAID drive out and 
hook it up to a "raw" disk controller and have it work.

> > [...] The same
> > is even true of the "software" RAID that graid supports since the metadata
> > is defined by the vendor and thus the logical volume is always seen by other
> > OS's consistently.
> 
> But is it seen without metadata by the boot loader?

Yes.  The logical volume shows up as a BIOS disk device.

> What I'm trying to say is that it is fair to expect the user not to
> use a gmirror-configured disk on a different OS. If the user wants to use
> this disk in a different OS then he has to use a format that is recognized
> by both.
> 
> Because gmirror is supported by FreeBSD we should improve the support by
> teaching the boot loader about it. Pretending gmirror is special and
> recommending mirroring partitions with it instead of raw disks is not
> the solution.
> 
> I really can't see how gmirror is different in this regard from any
> other software RAID or volume manager. If you try to use a disk that
> contains unrecognized metadata the behaviour is undefined (but hopefully
> not a panic).

It is not gmirror I am complaining about, it is the non-standard use of GPT.
Note that gmirror + MBR works fine without violating what little standard 
there is for the MBR.  Using a dedicated GPT partition to hold the gmirror
metadata would work with GPT (but be a good bit harder to work with in terms 
of GEOM I realize).

But as I said, I won't object to these patches.

-- 
John Baldwin


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread John Baldwin
On Wednesday, June 27, 2012 8:45:45 am Andrey V. Elsukov wrote:
> On 27.06.2012 16:07, John Baldwin wrote:
> >> • Check the Signature
> >> • Check the Header CRC
> >> • Check that the MyLBA entry points to the LBA that contains the GUID 
> >> Partition Table
> >> • Check the CRC of the GUID Partition Entry Array
> >> If the GPT is the primary table, stored at LBA 1:
> >> • Check the AlternateLBA to see if it is a valid GPT
> >> If the primary GPT is corrupt, software must check the last LBA of the 
> >> device to see if it has a
> >> valid GPT Header and point to a valid GPT Partition Entry Array."
> > 
> > Right, we break the last rule.  If you want to use a partition editor
> > that doesn't grok gmirror (because you are using another OS's editor),
> > to repair a GPT, it will do the wrong thing.
> 
> When we are in FreeBSD, our loader can detect that the device is smaller
> than what it sees, and it will work. When the primary header is OK,
> other OSes should work with this GPT. When it isn't OK, you just can't
> load the other OS :)

Ah, yes.  The solution to violating standards is to make sure you never
use standards-compliant software.  That's a great argument. :)

(Although not entirely uncommon.  Standards aren't always perfect, but if
we had a way to not gratuitously violate them it would be nice to avoid
doing so.)

> > We can't write bootcode with gpart?  What do you think the 'bootcode'
> > command does?
> 
> `gpart bootcode -b` reads the file, creates an ioctl request and sends this
> data to the GEOM_PART class. GEOM_PART receives the control request, checks
> the data and writes it to the provider.
> `gpart bootcode -p` works like dd(1) and writes the bootcode to the given
> partition.
> gpart(8) doesn't have any knowledge about the specific partitioning scheme.

Correct, but in both cases it writes "bootcode".

> > Also, there is no reason we can't have a 'recover' command that attempts to
> > recover a corrupted table including repairing the PMBR.  gpart(8) already
> > generates a full PMBR when you use 'gpart create' to create a GPT even
> > though there isn't a GPT object yet.
> 
> `gpart create` only creates an ioctl control request to the GEOM_PART class.
> The GEOM_PART class creates a new GPT geom object and this object writes the
> PMBR and its metadata to the provider.

You can't add a new ioctl?

-- 
John Baldwin


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Marcel Moolenaar

On Jun 26, 2012, at 10:37 AM, John Baldwin wrote:
> 
> GPT really wants the backup header at the last LBA.  I know you can set it, 
> but I've interpreted that as a way to see if the primary header is correct or 
> not.  It seems to me that GPT tables created in this fashion (inside a GEOM 
> provider) will not work properly with partition editors for other OS's.  I'm 
> hesitant to encourage the use of this as I do think putting GPT inside of a 
> gmirror violates the GPT spec.

Agreed.

While it is a nice trick to use the last sector for metadata, it does
create 2 problems. The first is mentioned above. The second is that when
there's different metadata in the first *and* the last sector, you can't
decide which is to take precedence without also looking at the other and
knowing how to interpret it. We have not solved this second problem at all.  We
do get reports about the problems though. At best we're handwaving or
kluging.

I think it's unwise to depend on FreeBSD-specific extensions or features
in industry-standard partitioning schemes and as such make the use of
"foreign" tools hard if not impossible.

A much more flexible approach is to support out-of-band configuration
data. This allows us to mirror GPT disks without having to become
non-standard as it removes the need to use the last sector for metadata.
The ability to construct GEOM hierarchies unambiguously is very
important and our current approach has proven to not deliver on that.
This is actually impacting existing FreeBSD consumers already, like
Juniper. So, we should not go deeper into this rabbit hole. We should
finally solve this problem for real...

-- 
Marcel Moolenaar
mar...@xcllnt.net




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Marcel Moolenaar

On Jun 26, 2012, at 2:43 PM, Pawel Jakub Dawidek wrote:
> 
> As for sharing disk with other OS. If you share the disk with OS that
> doesn't support gmirror, you shouldn't use gmirror in the first place.
> You probably want to use only formats that are recognized by all your
> OSes.

This statement is ridiculous by virtue of not being in touch with
reality and by making gmirror useless for such a wide range of cases
that one can question why we have it at all.

Put differently: a mirroring class is a fairly basic and useful thing
to have. Limiting its use is nothing but artificial and follows from
having to use the underlying provider to store metadata. This then
changes the view of the underlying provider to consumers above gmirror
in a way that makes the presence or absence of gmirror visible.
Solving the visibility problem makes gmirror useful all the time.
I see that as a better way of looking at it than simply blurting out
that you shouldn't use gmirror when certain awkward and artificial
conditions apply.

-- 
Marcel Moolenaar
mar...@xcllnt.net




Re: Freeze when running freebsd-update

2012-06-27 Thread Robert Simmons
On Wed, Jun 27, 2012 at 2:33 AM, Dieter BSD wrote:
>>> Robert writes:
>>>> 3) the box is responsive to hitting enter at the console (it produces
>>>> another login: prompt)
>>>
>>> Getty is in memory and can run.
>>>
>>>> 5) if I try to login to the console, it lets me enter a username then
>>>> locks up totally, it does not present me with a password: prompt.
>>>
>>> Login(1) is not in memory, and the kernel cannot read it from disk
>>> for some reason.
>>>
>>> I can get this symptom by writing a large file to a disk on a
>>> controller that FreeBSD doesn't support NCQ on. I assume there
>>> is a logjam in the buffer cache. Something trivial like reading
>>> login in from disk that would normally happen in well under a
>>> second can take many minutes.
>>>
>>> Perhaps geli is causing a similar logjam? Does it hang forever or
>>> is it just obscenely slow? If it truly hangs forever it is
>>> probably something else. Is there disk activity after it hangs?
>>> Can you try it without geli? systat -vmstat might provide a clue.
>>
>> Well, it is geli. I'm unable to reproduce the freeze on the same
>> exact system with everything else the same except for no geli. I'm
>> going to move this thread over to geom, and continue it there. Thanks
>> for your help!
>
> It occurs to me that it will need twice as much memory for disk i/o.
> 1 buffer for encrypted and 1 for unencrypted. I know nothing about geli,
> so I don't know if it uses the buffer cache for both, or what.
> Could it be that the kernel isn't keeping enough memory free and
> manages to paint itself into a corner and not have space to store
> the unencrypted version of disk reads, and can't page/swap anything
> out to make space because it doesn't have space to store the encrypted
> version to write?

I think that's probably about what is happening.  I'm still waiting
for an answer on the geom mailing list, but I will do some testing
with increasing memory sizes and see where the problem stops
occurring.


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Marcel Moolenaar

On Jun 26, 2012, at 9:50 PM, Andrey V. Elsukov wrote:

> If the primary GPT is corrupt, software must check the last LBA of the device 
> to see if it has a
> valid GPT Header and point to a valid GPT Partition Entry Array."
> 
> For FreeBSD, each GEOM provider can be treated as a disk device.
> So, I don't see anything criminal in adding some quirks to our loader
> to better support our technologies.

You can't just re-interpret standards to match a context you know very well
isn't applicable and consequently redefine what the word "device" means.
You're on a slippery slope and while you may not see it as a problem, you
do make it a problem for FreeBSD users. It's our users we should be keeping
in mind when we solve problems.

> If a user wants to modify the GPT with a disk editor from another OS,
> he can do it, and it should work. The result depends only on the partition
> editor: it might overwrite the last sector, or it might not.

Right. Another happy user that sees his/her FreeBSD installation destroyed
or degraded (no mirroring, warning messages about corrupted GPT, etc) for
no apparent reason and without any kind of warning that what he/she is doing
is potentially harmful... That's the spirit!

-- 
Marcel Moolenaar
mar...@xcllnt.net




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Pawel Jakub Dawidek
On Wed, Jun 27, 2012 at 10:37:11AM -0700, Marcel Moolenaar wrote:
> 
> On Jun 26, 2012, at 10:37 AM, John Baldwin wrote:
> > 
> > GPT really wants the backup header at the last LBA.  I know you can set it,
> > but I've interpreted that as a way to see if the primary header is correct
> > or not.  It seems to me that GPT tables created in this fashion (inside a
> > GEOM provider) will not work properly with partition editors for other OS's.
> > I'm hesitant to encourage the use of this as I do think putting GPT inside
> > of a gmirror violates the GPT spec.
> 
> Agreed.

Guys. This doesn't violate the GPT spec in any way. The spec is
narrow-minded if it talks only about raw disks, but you should think
about gmirror as a pseudo-hardware RAID. That's all. If putting GPT on top
of a RAID array is a spec violation, then I guess we just have to live with it.

> While it is a nice trick to use the last sector for meta data, it does
> create 2 problems. 1 is mentioned above. [...]

It doesn't really matter where gmirror puts its metadata. If gmirror
kept its metadata in the first sector, gpart/gpt would find its own
metadata in the last sector and would complain about a missing primary
header.

> [...] The second is that when there's
> different metadata in the first *and* the last sector, you can't decide
> which is to take precedence without also looking at the other and knowing
> how to interpret it. We have not solved this second problem at all.  We
> do get reports about the problems though. At best we're handwaving or
> kluging.

This is a different kind of problem. It took me a while to realize that,
but now I know :)

The real problem is that not all metadata formats are suitable for
autodetection. That's all.

The metadata I use in my GEOM classes plays nicely with autodetection.
The solution is very easy - keep the size of the disk device within the
metadata. This allows gmirror to figure out whether it is configured on the
raw disk, the last slice, the last partition within the last slice, etc.
If GPT kept the disk size in its metadata, the second problem you
mentioned would not exist. And to be honest GPT kinda does that by having
the backup header's LBA stored in the primary header. And this is fine as
long as the primary header is valid.
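
As an illustration of why a stored size makes autodetection unambiguous, here is a small Python sketch. It is not the gmirror taste code; the `Provider` type, the function names and the provider names in the usage note are all hypothetical. The point is that several providers can end on the same sector, so all of them "see" the same metadata sector, and only the recorded size tells them apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    start: int   # first sector, relative to the raw disk
    size: int    # length in sectors


def sharing_last_sector(providers, end):
    """Providers whose last sector is the same on-disk sector (end - 1)."""
    return [p for p in providers if p.start + p.size == end]


def owner_of_metadata(providers, end, stored_size):
    """Pick the one provider whose size matches the size kept in the metadata."""
    matches = [p for p in sharing_last_sector(providers, end)
               if p.size == stored_size]
    return matches[0] if len(matches) == 1 else None
```

For example, with a 1000-sector disk `ada0`, a slice `ada0s1` of 937 sectors ending at the same place, and a partition `ada0s1d` of 400 sectors also ending there, metadata recording a size of 937 identifies `ada0s1` and nothing else.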

The same problem exists with things like UFS labels. There is no way to
properly support them using GEOM autodetection, because there is no
provider size in the UFS superblock. The UFS superblock contains the file
system size, but that is not the same thing, as one can create a file system
smaller than the underlying disk device.

> I think it's unwise to depend on FreeBSD-specific extensions or features
> in industry-standard partitioning schemes and as such make the use of
> "foreign" tools hard if not impossible.

If you plan to use the given disk with FreeBSD only, what's the problem?
Partitioning is not the end of the world. Even if you use
"industry-standard partitioning schemes" what file system are you going
to use to actually access your data? FAT? Of course if you do share your
disk between various OSes then probably your best bet is to use MBR or
GPT on raw disk and FAT file system. But if you use your disk with
FreeBSD only, then I see no reason not to leverage FreeBSD-specific
features (be it gmirror, geli or zfs).

> A much more flexible approach is to support out-of-band configuration
> data. This allows us to mirror GPT disks without having to become
> non-standard as it removes the need to use the last sector for metadata.
> The ability to construct GEOM hierarchies unambiguously is very
> important and our current approach has proven to not deliver on that.
> This is actually impacting existing FreeBSD consumers already, like
> Juniper. So, we should not go deeper into this rabbit hole. We should
> finally solve this problem for real...

Marcel, nothing stops anyone from implementing a GEOM mirror class that
uses no on-disk metadata. GEOM is not a limiting factor here. GEOM does
provide a mechanism for autoconfiguration, but it is totally optional and
a GEOM class might choose not to use it.

As an example you can take a look at two other GEOM classes of mine:
gconcat(8) and gstripe(8). You can use the 'label' subcommand to store
metadata on component disks, which will take advantage of GEOM
autodetection and autoconfiguration. You can also use the 'create'
subcommand to create an ad hoc provider that stores no metadata and makes
use of entire disks, which also means it won't be automatically created
on next boot.

For Juniper it might be more handy to use out-of-band configuration as
you know the hardware you are running on, so you know exactly where the
disks are, etc. My company builds appliances too, so I have been there.
For most of our users automatic configuration is simply better, as they
can shuffle disks around and not wonder if the system will boot or not.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes

Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Pawel Jakub Dawidek
On Wed, Jun 27, 2012 at 10:45:35AM -0700, Marcel Moolenaar wrote:
> 
> On Jun 26, 2012, at 2:43 PM, Pawel Jakub Dawidek wrote:
> > 
> > As for sharing disk with other OS. If you share the disk with OS that
> > doesn't support gmirror, you shouldn't use gmirror in the first place.
> > You probably want to use only formats that are recognized by all your
> > OSes.
> 
> This statement is ridiculous by virtue of not being in touch with
> reality and by making gmirror useless for such a wide range of cases
> that one can question why we have it at all.
> 
> Put differently: a mirroring class is a fairly basic and useful thing
> to have. Limiting its use is nothing but artificial and follows from
> having to use the underlying provider to store metadata. This then
> changes the view of the underlying provider to consumers above gmirror
> in a way that makes the presence or absence of gmirror visible.
> Solving the visibility problem makes gmirror useful all the time.
> I see that as a better way of looking at it than simply blurting out
> that you shouldn't use gmirror when certain awkward and artificial
> conditions apply.

I'm sorry, Marcel, but what you describe here has nothing to do with
reality. To be able to implement reliable mirroring you have to use
on-disk metadata. There is no way around that. You can implement
non-redundant GEOM classes without using on-disk metadata, but
out-of-band configuration in the case of mirroring is simply naive. How do
you detect that components are out of sync, for example?

And when it comes to visibility: are you suggesting that gmirror should
present the entire underlying provider to upper layers, including its
metadata? I hope not, because we went through that hell already
(remember UFS skipping the first 16 sectors, as BSDlabel metadata might
be there? The same for swap?).
I think I did a pretty good job by making the metadata as simple as
possible - I use exactly one sector at the end of the target device.
I'm really having a hard time thinking of a simpler format.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Marcel Moolenaar

On Jun 27, 2012, at 11:34 AM, Pawel Jakub Dawidek wrote:
> 
> I'm sorry, Marcel, but what you describe here has nothing to do with
> reality. To be able to implement reliable mirroring you have to use
> on-disk metadata. There is no way around that. You can implement
> non-redundant GEOM classes without using on-disk metadata, but
> out-of-band configuration in case of mirroring is simply naive. How do
> you detect that components are out of sync, for example?

GEOM configuration and per-class runtime state are not to be
treated the same. Out-of-band configuration is trivial.
Per-class runtime state, like whether elements in a mirrored
configuration are in sync or not is more difficult, but does
not a priori require on-disk metadata as it's implemented now.
You can have the configuration tell the GEOM where that state
is being kept, so that you can put it in a partition on the
disks involved, or even keep it independent from the disks,
which then requires disks to be uniquely identifiable, for
sure. But that's what GPT gives you anyway.

But even without identification, you can invert the question
from "how do I detect that components are out of sync" to
"how do I prove they are in fact in sync". That question has
a very simple O(n) answer. So, if time isn't a concern or
your storage is small, you can always scan all sectors and
thereby prove that the disks are in sync.
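
As an illustration of the O(n) check described above, here is a minimal
Python sketch. It is not how gmirror works; the function name
first_mismatch, the 512-byte sector size and the in-memory "disks" are
assumptions made for the example.

```python
import io

SECTOR_SIZE = 512  # assumed sector size for this sketch


def first_mismatch(dev_a, dev_b, sector_size=SECTOR_SIZE):
    """Return the index of the first differing sector, or None if identical.

    dev_a and dev_b are any binary file-like objects positioned at offset 0.
    Every sector is read exactly once, so the cost is linear in device size.
    """
    lba = 0
    while True:
        a = dev_a.read(sector_size)
        b = dev_b.read(sector_size)
        if a != b:
            return lba      # contents (or lengths) diverge here
        if not a:
            return None     # both devices exhausted without a difference
        lba += 1


# Two in-memory "disks" stand in for the mirror components.
in_sync = first_mismatch(io.BytesIO(bytes(4 * SECTOR_SIZE)),
                         io.BytesIO(bytes(4 * SECTOR_SIZE)))

stale = bytearray(4 * SECTOR_SIZE)
stale[2 * SECTOR_SIZE] = 0xFF   # corrupt one byte in sector 2
out_of_sync = first_mismatch(io.BytesIO(bytes(4 * SECTOR_SIZE)),
                             io.BytesIO(bytes(stale)))
```

A full scan like this proves that the components are identical at the time
of the scan, but it does not say which component holds the newer data.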

The point being: the current implementation isn't the only
one. Granted, it can easily be the simplest one or even the
best one in some cases, but that's beside the point you were
making.

-- 
Marcel Moolenaar
mar...@xcllnt.net


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Christian Laursen

On 06/27/12 16:28, John Baldwin wrote:
> On Wednesday, June 27, 2012 8:45:45 am Andrey V. Elsukov wrote:
>
>> When we are in the FreeBSD, our loader can detect that device size
>> is lower than it see and it will work. When primary header is OK, then
>> other OSes should work with this GPT. When it isn't OK, you just can't
>> load other OS :)
>
> Ah, yes.  The solution to violating standards is to make sure you never
> use standards-compliant software.  That's a great argument. :)
>
> (Although not entirely uncommon.  Standards aren't always perfect, but if
> we had a way to not gratuitously violate them it would be nice to avoid
> doing so.)


To be standards compliant and allow whole-disk based mirroring to work
at the same time, wouldn't nested GPT work like this?


Whole disk (start)
| GPT header
| GPT partition of type freebsd-geom (start)
| | gmirror device (start)
| | | GPT header
| | | | freebsd-boot
| | | | freebsd-ufs
| | | | freebsd-swap
| | | GPT backup header
| | gmirror metadata
| | gmirror device (end)
| GPT partition of type freebsd-geom (end)
| GPT backup header
Whole disk (end)

Nothing but FreeBSD would understand the freebsd-geom partition type, so 
the inner GPT device should be valid and standards compliant.


The boot loader would of course need to understand this setup but that 
shouldn't be impossible.


Just a thought.

It might be too complicated, though, compared to the non-standards-compliant
way it works now, which works quite well in practice.


--
Christian Laursen




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Dimitry Andric
On 2012-06-26 14:50, Andrey V. Elsukov wrote:
> Some time ago i have started reading the code in the sys/boot.
> Especially i'm interested in the partition tables handling.
> I found several problems:
> 1. There are several copies of the same code in the libi386/biosdisk.c
> and common/disk.c, and partially libpc98/biosdisk.c.
> 2. ZFS probing is very slow, because the ZFS code doesn't know how many
> disks and partitions the system has:
>   http://www.freebsd.org/cgi/query-pr.cgi?pr=148296
>   http://www.freebsd.org/cgi/query-pr.cgi?pr=161897
> 3. The GPT support doesn't check CRC and even doesn't know anything
> about the secondary GPT header/table.
> 
> So, i have created the branch and committed the changes:
>   http://svnweb.freebsd.org/base/user/ae/bootcode/
> The patch is here:
>   http://people.freebsd.org/~ae/boot.diff

FWIW, I verified it compiles OK with clang, and especially boot2's size
isn't increased at all.

It would be nice if you could check it with clang now and again, before
you finally merge this project into head.


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Marcel Moolenaar

On Jun 27, 2012, at 11:20 AM, Pawel Jakub Dawidek wrote:

> On Wed, Jun 27, 2012 at 10:37:11AM -0700, Marcel Moolenaar wrote:
>> 
>> On Jun 26, 2012, at 10:37 AM, John Baldwin wrote:
>>> 
>>> GPT really wants the backup header at the last LBA.  I know you can set it,
>>> but I've interpreted that as a way to see if the primary header is correct
>>> or not.  It seems to me that GPT tables created in this fashion (inside a
>>> GEOM provider) will not work properly with partition editors for other
>>> OS's.  I'm hesitant to encourage the use of this as I do think putting GPT
>>> inside of a gmirror violates the GPT spec.
>> 
>> Agreed.
> 
> Guys. This doesn't violate the GPT spec in any way. The spec is
> narrow-minded if it talks only about raw disks, but you should think
> about gmirror as pseudo-hardware RAID.

I'm sorry, but this is a contradiction. If it doesn't violate the
spec, then the spec is not narrow-minded on the grounds of what
we're discussing. If the spec *is* narrow-minded then obviously
it doesn't capture our scenario, which means that we're violating
the spec.

Clearly we're not discussing anything that falls well within the
spec, or is undebatable. This makes the whole topic dangerous
anyway. When you're in the grey area (this is only for argument's
sake -- we're in violation for sure) you're opening yourself up to
compatibility problems. Should we deliberately go there?

-- 
Marcel Moolenaar
mar...@xcllnt.net




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread John Baldwin
On Wednesday, June 27, 2012 1:45:35 pm Marcel Moolenaar wrote:
> 
> On Jun 26, 2012, at 2:43 PM, Pawel Jakub Dawidek wrote:
> > 
> > As for sharing disk with other OS. If you share the disk with OS that
> > doesn't support gmirror, you shouldn't use gmirror in the first place.
> > You probably want to use only formats that are recognized by all your
> > OSes.
> 
> This statement is ridiculous by virtue of not being in touch with
> reality and by making gmirror useless for such a wide range of cases
> that one can question why we have it at all.
> 
> Put differently: a mirroring class is a fairly basic and useful thing
> to have. Limiting its use is nothing but artificial and follows from
> having to use the underlying provider to store metadata. This then
> changes the view of the underlying provider to consumers above gmirror
> in a way that makes the presence or absence of gmirror visible.
> Solving the visibility problem makes gmirror useful all the time.
> I see that as a better way of looking at it than simply blurting out
> that you shouldn't use gmirror when certain awkward and artificial
> conditions apply.

I'm not sure we can force gmirror to be anything except FreeBSD-specific,
but it would be nice to not make non-standard GPT tables while we are at it.

The reason the metadata for things like Intel's onboard SATA RAID does work
ok is because the metadata format is enforced by the vendor, so it is
reasonable to assume that metadata format will work across other OS's.

Anyway, I've said my piece and will let the matter drop from my end at this
point.

-- 
John Baldwin


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Marcel Moolenaar

On Jun 27, 2012, at 12:08 PM, Christian Laursen wrote:

> On 06/27/12 16:28, John Baldwin wrote:
>> On Wednesday, June 27, 2012 8:45:45 am Andrey V. Elsukov wrote:
>> 
>>> When we are in the FreeBSD, our loader can detect that device size
>>> is lower than it see and it will work. When primary header is OK, then
>>> other OSes should work with this GPT. When it isn't OK, you just can't
>>> load other OS :)
>> 
>> Ah, yes.  The solution to violating standards is to make sure you never
>> use standards-compliant software.  That's a great argument. :)
>> 
>> (Although not entirely uncommon.  Standards aren't always perfect, but if
>> we had a way to not gratuitously violate them it would be nice to avoid
>> doing so.)
> 
> To be standards compliant and allow whole-disk based mirroring to work at the 
> same time wouldn't nested GPT work like this?

GPTs don't nest.

> Nothing but FreeBSD would understand the freebsd-geom partition type, so the 
> inner GPT device should be valid and standards compliant.

If it were standards compliant, it would be discoverable by non-FreeBSD
systems. That clearly isn't the case -- hence it's not standards compliant.
What if, for example, someone wanted to share the swap partition between
Linux and FreeBSD?

-- 
Marcel Moolenaar
mar...@xcllnt.net




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Andrey V. Elsukov
On 27.06.2012 21:55, Marcel Moolenaar wrote:
> You can't just re-interpret standards to match a context you know very well
> isn't applicable and consequently redefine what the word "device" means.
> You're on a slippery slope and while you may not see it as a problem, you
> do make it a problem for FreeBSD users. It's our users we should be keeping
> in mind when we solve problems.
> 
>> If a user wants modify GPT in the disk editor from the another OS,
>> he can do it, and it should work. The result depends only from the
>> partition editor, it might overwrite the last sector and might don't.
> 
> Right. Another happy user that sees his/her FreeBSD installation destroyed
> or degraded (no mirroring, warning messages about corrupted GPT, etc) for
> no apparent reason and without any kind of warning that what he/she is doing
> is potentially harmful... That's the spirit!

Ok. Let's get back to my patches. They don't add any new ways to
shoot yourself in the foot. We are talking about the *FreeBSD loader*.
This is the program that starts the FreeBSD kernel. It doesn't start other
OSes. We already have many users who use FreeBSD as the only system on
the machine. Many of them use GPT inside of some GEOM provider.
You can just read the lists, articles about installing FreeBSD, forums,
etc. We already have these users and i hope they will keep using FreeBSD as
before. So, why can't we add a simple quirk to make their systems a bit
more reliable?

As i understand it, there are two points where we don't have consensus:

1. You are against the following:
Our loader detects that the primary GPT header is damaged. It tries to read
the backup GPT header from the last LBA and detects that there is a
"GEOM::" signature. It tries to read the previous sector and finds a
*valid* GPT header there. It is valid, because its CRC is valid and its
self_LBA is valid. In your view, for *FreeBSD* users it is better not to
use this GPT and just complain "i'm sorry, can't boot". The other OSes
can't, and we shouldn't.

2. You are against having one fake PMBR entry by default in the
/boot/pmbr image. Ok, I can propose several ways to resolve this:
 * remove from the loader's GPT probing code the requirement that the
MBR contain a PMBR partition record;
 * teach the boot0cfg command to write the PMBR properly;
 * add a new condition to mark the GPT as corrupt when it has an invalid PMBR.
Thus, when you write a PMBR with an empty partition table with dd(1), the
kernel will complain and you will be forced to run `gpart recover`.
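
The fallback described in point 1 can be sketched roughly as follows. This
is an illustrative Python sketch, not the code from the patch: the names
gpt_header_ok and find_backup_header, the read_sector callback and the
512-byte sector size are assumptions. The field offsets follow the GPT
header layout (signature at offset 0, header size at 12, header CRC32 at
16, MyLBA at 24). The partition entry array CRC check is left out for
brevity.

```python
import struct
import zlib

SECTOR = 512              # assumed sector size
GPT_SIG = b"EFI PART"     # GPT header signature
GEOM_SIG = b"GEOM::"      # start of the GEOM metadata sector, per the thread


def gpt_header_ok(sector, lba):
    """Check signature, header CRC32 and MyLBA of one candidate header."""
    if sector[:8] != GPT_SIG:
        return False
    size, crc = struct.unpack_from("<II", sector, 12)
    if not 92 <= size <= len(sector):
        return False
    # The CRC covers the header with its own CRC field set to zero.
    body = sector[:16] + b"\0\0\0\0" + sector[20:size]
    if zlib.crc32(body) & 0xFFFFFFFF != crc:
        return False
    (my_lba,) = struct.unpack_from("<Q", sector, 24)
    return my_lba == lba


def find_backup_header(read_sector, last_lba):
    """Return the LBA holding a usable backup header, or None.

    read_sector(lba) is a hypothetical callback returning one sector.
    """
    last = read_sector(last_lba)
    if gpt_header_ok(last, last_lba):
        return last_lba                 # the location the spec expects
    if last.startswith(GEOM_SIG):
        prev = last_lba - 1             # the quirk under discussion
        if gpt_header_ok(read_sector(prev), prev):
            return prev
    return None
```

Note that the second branch is only reached when the last sector starts
with the "GEOM::" signature; any other content in the last sector ends the
search.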

-- 
WBR, Andrey V. Elsukov





Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Mark Felder

On Tue, 26 Jun 2012 12:37:11 -0500, John Baldwin wrote:

> I'm hesitant to encourage the use of this as I do think putting GPT inside
> of a gmirror violates the GPT spec.


I personally think this use case is a bit ... odd, anyway.

I have only one request for those who manage GPT/GEOM/etc -- as I'm used to
doing multiple mdadm RAID components on Linux for maximum flexibility,
using gmirror upon multiple GPT partitions upon the same physical device
is OK with me. My only complaint is that recovery is very, very stupid. We
should by default detect and only rebuild ONE gmirror device at a time on  
the same physical provider. You get nothing but a smokin' angry head if  
you allow multiple to rebuild at the same time because it's fighting over  
sequential writes all the way across the platters. It would also be nice  
if gmirror rebuild could also be detected by fsck and fsck could either  
hold off or gmirror could be paused until a consistent filesystem state  
exists. It's probably best for the background fsck to go first so you can  
get the system up and running, but then when it's finished gmirror should  
continue.


Otherwise I have no issues with gmirror -- it does exactly the job I need  
it to.



Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Marcel Moolenaar

On Jun 27, 2012, at 12:27 PM, Andrey V. Elsukov wrote:

> On 27.06.2012 21:55, Marcel Moolenaar wrote:
>> You can't just re-interpret standards to match a context you know very well
>> isn't applicable and consequently redefine what the word "device" means.
>> You're on a slippery slope and while you may not see it as a problem, you
>> do make it a problem for FreeBSD users. It's our users we should be keeping
>> in mind when we solve problems.
>> 
>>> If a user wants modify GPT in the disk editor from the another OS,
>>> he can do it, and it should work. The result depends only from the
>>> partition editor, it might overwrite the last sector and might don't.
>> 
>> Right. Another happy user that sees his/her FreeBSD installation destroyed
>> or degraded (no mirroring, warning messages about corrupted GPT, etc) for
>> no apparent reason and without any kind of warning that what he/she is doing
>> is potentially harmful... That's the spirit!
> 
> Ok. Let's get back to my patches. They don't add any new ways to
> shoot yourself in the foot. We are talking about the *FreeBSD loader*.
> This is the program that starts the FreeBSD kernel. It doesn't start other
> OSes. We already have many users who use FreeBSD as the only system on
> the machine. Many of them use GPT inside of some GEOM provider.

Your patches continue down a path that, as we're discussing, isn't
necessarily the path we should be on. While you don't make things
worse from a compliance perspective, you do make them worse by adding the
non-compliant behaviour to more components.

> As i understand it, there are two points where we don't have consensus:
> 
> 1. You are against the following:
> Our loader detects that the primary GPT header is damaged. It tries to read
> the backup GPT header from the last LBA and detects that there is a
> "GEOM::" signature. It tries to read the previous sector and finds a
> *valid* GPT header there.

How do you know it's valid? It's in a location that is not valid
to begin with. Validity is based on rules and you're violating
the rules without defining exactly what we call valid given the
new rules. This may seem nitpicking, but having gone through the
hassle of dealing with the broken way we created the dangerously
dedicated disk, I appreciate the importance of being anal when it
comes to something that lives on non-volatile storage and gets to
be exposed to a world much larger than FreeBSD.

> 2. You are against having one fake PMBR entry by default in the
> /boot/pmbr image.

I don't understand what you're saying or what I'm being accused of
being against.

-- 
Marcel Moolenaar
mar...@xcllnt.net




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Andrey V. Elsukov
On 28.06.2012 00:14, Marcel Moolenaar wrote:
>> Our loader detects that primary GPT header is damaged. It tries to read
>> backup GPT header from the last LBA and it detects that there is
>> "GEOM::" signature. It tries to read one previous sector and there is
>> *valid* GPT header.
> 
> How do you know it's valid? It's in a location that is not valid
> to begin with. Validity is based on rules and you're violating
> the rules without defining exactly what we call valid given the
> new rules. This may seem nitpicking, but having gone through the
> hassle of dealing with the broken way we created the dangerously
> dedicated disk, I appreciate the importance of being anal when it
> comes to something that lives on non-volatile storage and gets to
> be exposed to a world much larger than FreeBSD.

So why don't you prevent GEOM_PART_GPT from attaching to any provider that
is not a disk drive? That would be the right solution to all our
problems. Just don't create an invalid GPT.

-- 
WBR, Andrey V. Elsukov


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Marcel Moolenaar

On Jun 27, 2012, at 1:48 PM, Andrey V. Elsukov wrote:

> On 28.06.2012 00:14, Marcel Moolenaar wrote:
>>> Our loader detects that primary GPT header is damaged. It tries to read
>>> backup GPT header from the last LBA and it detects that there is
>>> "GEOM::" signature. It tries to read one previous sector and there is
>>> *valid* GPT header.
>> 
>> How do you know it's valid? It's in a location that is not valid
>> to begin with. Validity is based on rules and you're violating
>> the rules without defining exactly what we call valid given the
>> new rules. This may seem nitpicking, but having gone through the
>> hassle of dealing with the broken way we created the dangerously
>> dedicated disk, I appreciate the importance of being anal when it
>> comes to something that lives on non-volatile storage and gets to
>> be exposed to a world much larger than FreeBSD.
> 
> So why don't you prevent GEOM_PART_GPT from attaching to any provider that
> is not a disk drive? That would be the right solution to all our
> problems. Just don't create an invalid GPT.

It's not even the right solution, as it prevents legit nesting
of gpart GEOMs *and* is fundamentally based on a flawed assumption
that any non-disk GEOM underneath gpart yields an invalid GPT.
Think gnop.

-- 
Marcel Moolenaar
mar...@xcllnt.net




"magic" crashes - mostly solved but

2012-06-27 Thread Wojciech Puchar

the reason was most probably out-of-date vbox and fuse kernel modules.

after bringing everything in sync, the system boots successfully with
WITNESS, INVARIANT etc. options enabled.


STILL - mostly at boot i'm getting a few messages.

the first comes when executing /etc/rc.d/named (at mounting devfs IMHO):

Jun 27 18:32:23 foo kernel: lock order reversal:
Jun 27 18:32:23 foo kernel: 1st 0xff80f5859800 bufwait (bufwait) @/usr/src/sys/kern/vfs_bio.c:2636
Jun 27 18:32:24 foo kernel:
Jun 27 18:32:24 foo kernel: 2nd 0xff0005c82200 dirhash (dirhash) @/usr/src/sys/ufs/ufs/ufs_dirhash.c:285
Jun 27 18:32:24 foo kernel: KDB: stack backtrace:
Jun 27 18:32:24 foo kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x27
Jun 27 18:32:24 foo kernel: em0: link state changed to UP
Jun 27 18:32:24 foo kernel: kdb_backtrace() at kdb_backtrace+0x3e
Jun 27 18:32:24 foo kernel: _witness_debugger() at _witness_debugger+0x24
Jun 27 18:32:24 foo kernel: witness_checkorder() at witness_checkorder+0xae7
Jun 27 18:32:24 foo kernel: _sx_xlock() at _sx_xlock+0xbf
Jun 27 18:32:24 foo kernel: ufsdirhash_acquire() at ufsdirhash_acquire+0x4f
Jun 27 18:32:24 foo kernel: ufsdirhash_remove() at ufsdirhash_remove+0x1c
Jun 27 18:32:24 foo kernel: ufs_dirremove() at ufs_dirremove+0x12c
Jun 27 18:32:24 foo kernel: ufs_remove() at ufs_remove+0x8f
Jun 27 18:32:24 foo kernel: VOP_REMOVE_APV() at VOP_REMOVE_APV+0xf4
Jun 27 18:32:24 foo kernel: VOP_REMOVE() at VOP_REMOVE+0x45
Jun 27 18:32:24 foo kernel: kern_unlinkat() at kern_unlinkat+0x1ce
Jun 27 18:32:24 foo kernel: kern_unlink() at kern_unlink+0x28
Jun 27 18:32:24 foo kernel: unlink() at unlink+0x25
Jun 27 18:32:24 foo kernel: syscallenter() at syscallenter+0x2e3
Jun 27 18:32:24 foo kernel: amd64_syscall() at amd64_syscall+0x58
Jun 27 18:32:24 foo kernel:
Jun 27 18:32:24 foo kernel: Xfast_syscall() at Xfast_syscall+0xfc
Jun 27 18:32:24 foo kernel: --- syscall (10, FreeBSD ELF64, unlink), rip = 0xeede070c, rsp = 0x7fffdb08, rbp = 0x7fffef58 ---
Jun 27 18:32:24 foo kernel: lock order reversal:
Jun 27 18:32:24 foo kernel: 1st 0xff00080a8270 ufs (ufs) @/usr/src/sys/kern/vfs_mount.c:1081
Jun 27 18:32:24 foo kernel: 2nd 0xff00085397f8 devfs (devfs) @ /usr/src/sys/kern/vfs_subr.c:2169
Jun 27 18:32:24 foo kernel: KDB: stack backtrace:
Jun 27 18:32:24 foo kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x27
Jun 27 18:32:24 foo kernel: kdb_backtrace() at kdb_backtrace+0x3e
Jun 27 18:32:24 foo kernel: _witness_debugger() at _witness_debugger+0x24
Jun 27 18:32:24 foo kernel: witness_checkorder() at witness_checkorder+0xae7
Jun 27 18:32:24 foo kernel: __lockmgr_args() at __lockmgr_args+0x68d
Jun 27 18:32:24 foo kernel: _lockmgr_args() at _lockmgr_args+0x6f
Jun 27 18:32:24 foo kernel: vop_stdlock() at vop_stdlock+0x67
Jun 27 18:32:24 foo kernel: VOP_LOCK1_APV() at VOP_LOCK1_APV+0xfd
Jun 27 18:32:24 foo kernel: VOP_LOCK1() at VOP_LOCK1+0x4b
Jun 27 18:32:24 foo kernel: _vn_lock() at _vn_lock+0x64
Jun 27 18:32:24 foo kernel: vget() at vget+0xe9
Jun 27 18:32:24 foo kernel: devfs_allocv() at devfs_allocv+0x125
Jun 27 18:32:24 foo kernel: devfs_root() at devfs_root+0x5a
Jun 27 18:32:24 foo kernel: vfs_domount() at vfs_domount+0xcdb
Jun 27 18:32:24 foo kernel: vfs_donmount() at vfs_donmount+0x78e
Jun 27 18:32:24 foo kernel: nmount() at nmount+0x7e
Jun 27 18:32:24 foo kernel: syscallenter() at syscallenter+0x2e3
Jun 27 18:32:24 foo kernel: amd64_syscall() at amd64_syscall+0x58
Jun 27 18:32:24 foo kernel: Xfast_syscall() at Xfast_syscall+0xfc
Jun 27 18:32:24 foo kernel: --- syscall (378, FreeBSD ELF64, nmount), rip= 0xeee6535c, rsp = 0x7fffdd18, rbp = 0xef206048 ---
Jun 27 18:32:24 foo named[1071]: starting BIND 9.6.-ESV-R7-P1 -t/var/named -u bind
Jun 27 18:32:24 foo kernel: Starting named.



a few more appear when mounting or unmounting (i'm not sure which) a pendrive.

Jun 27 18:57:09 foo kernel: lock order reversal:
Jun 27 18:57:09 foo kernel: 1st 0xff011ec78098 ufs (ufs) @ /usr/src/sys/kern/vfs_lookup.c:504
Jun 27 18:57:09 foo kernel: 2nd 0xff80f5e1bb80 bufwait (bufwait) @ /usr/src/sys/ufs/ffs/ffs_softdep.c:6193
Jun 27 18:57:09 foo kernel: 3rd 0xff011ead3d80 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2169
Jun 27 18:57:09 foo kernel: KDB: stack backtrace:
Jun 27 18:57:09 foo kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x27
Jun 27 18:57:09 foo kernel: kdb_backtrace() at kdb_backtrace+0x3e
Jun 27 18:57:09 foo kernel: _witness_debugger() at _witness_debugger+0x24
Jun 27 18:57:09 foo kernel: witness_checkorder() at witness_checkorder+0xae7
Jun 27 18:57:09 foo kernel: __lockmgr_args() at __lockmgr_args+0x68d
Jun 27 18:57:09 foo kernel: _lockmgr_args() at _lockmgr_args+0x6f
Jun 27 18:57:09 foo kernel: ffs_lock() at ffs_lock+0xaa
Jun 27 18:57:09 foo kernel: VOP_LOCK1_APV() at VOP_LOCK1_APV+0xfd
Jun 27 18:57:09 foo kernel: VOP_LOCK1() at VOP_LOCK1+0x4b
Jun 27 18:57:09 foo kernel: _vn_lock() at _vn_lock+0x64
Jun 27 18:57:09 foo kernel: vget() at vget+0xe9
Jun 27 18:57:

Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Poul-Henning Kamp

I would like to point out that all other operating systems which have
had this precise problem have solved it by adding a bootfs partition
to hold the kernel+modules required to truly understand the disk layout.

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Re: /etc/resolv.conf getting over written with dhcp

2012-06-27 Thread Ian Lepore
On Wed, 2012-06-20 at 13:39 +0530, Varuna wrote:
> Ian Lepore wrote:
> > 
> > Using the 'prepend' or 'supersede' keywords in /etc/dhclient.conf is
> > pretty much the standard way of handling a mix of static and dhcp
> > interfaces where the static config needs to take precedence.  I'm not
> > sure why you dismiss it as essentially good, but somehow not good
> > enough.  It's been working for me for years.
> > 
> > -- Ian
> > 
> I had indicated that the issue with /etc/resolv.conf is being caused by
> an error in /sbin/dhclient-script; hence, I am definitely not looking at
> solving the issue with either the /etc/dhclient.conf or the
> /etc/dhclient-exit-hooks configuration file.
> 
> BTW, resolver(5) / resolv.conf(5) does not mention the usage of the
> /etc/dhclient-exit-hooks file to protect the earlier contents of the
> /etc/resolv.conf file.  Will put this issue on the freebsd-doc mailing list.
> 
> With regards,
> Varuna
> Eudaemonic Systems
> Simple, Specific & Insightful

I have re-read your original message and I think the confusion is here:


> 2***# When resolv.conf is not changed actually, we don't
>  # need to update it.
>  # If /usr is not mounted yet, we cannot use cmp, then
>  # the following test fails.  In such case, we simply
>  # ignore an error and do update resolv.conf.
> 3***if cmp -s $tmpres /etc/resolv.conf; then
>  rm -f $tmpres
>  return 0
>  fi 2>/dev/null
> [...]
> I guess 1***, 3*** and 4*** are causing the recreation of
> /etc/resolv.conf.  Is this correct? I did a small modification to 3***,
> which is:
>  if !(cmp -s $tmpres /etc/resolv.conf); then
>  rm -f $tmpres
>  return 0
>  fi 2>/dev/null
> This seems to have solved the issue of /etc/resolv.conf getting overwritten
> with just: nameserver 192.168.98.4.  This ensures that: If there is a
> difference between $tmpres and /etc/resolv.conf, then it exits post removal
> of $tmpres.  If the execution of 3*** returns a 0, a new file gets created.
> I guess the modification gets the intent of 3*** working.
> 
> Have I barked up the wrong tree?

I think yes, you have barked up the wrong tree.  The intent of the code
at 3*** is not to exit if there is a difference, it is to exit if there
is NO difference.  In other words, if the old and new files are
identical then there is no need to rewrite the file; just clean up and
exit.  If the files are different then replace the existing file with
the new one.
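
The compare-then-replace logic described above can be sketched as follows.
This is a Python illustration rather than the sh code from dhclient-script;
the function name install_if_changed is hypothetical, and using
shutil.move for the final replacement step is an assumption of the sketch.

```python
import filecmp
import os
import shutil


def install_if_changed(tmpres, target):
    """Replace target with tmpres unless their contents are identical.

    Returns True when target was rewritten, False when it was left alone.
    """
    try:
        # shallow=False forces a byte-by-byte comparison, like cmp -s.
        same = filecmp.cmp(tmpres, target, shallow=False)
    except OSError:
        # The comparison could not be done (e.g. target is missing):
        # ignore the error and go ahead with the update.
        same = False
    if same:
        os.remove(tmpres)        # no difference: clean up and stop
        return False
    shutil.move(tmpres, target)  # difference: install the new contents
    return True
```

Negating the comparison, as in the modified 3*** test, would invert both
branches: identical files would be rewritten and changed ones discarded.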

This is just the (sometimes annoying) way dhcp works.  If the dhcp
server provides new resolver info it completely replaces any existing
resolver info unless you've configured your dhclient.conf to prevent it.
It only does so if the interface being configured is the current
default-route interface, or there is no current default-route interface.

-- Ian




Re: Freeze when running freebsd-update

2012-06-27 Thread Dieter BSD
 Robert writes:
> 3) the box is responsive to hitting enter at the console (it produces
> another login: prompt)

 Getty is in memory and can run.

> 5) if I try to login to the console, it lets me enter a username then
> locks up totally, it does not present me with a password: prompt.

 Login(1) is not in memory, and the kernel cannot read it from disk
 for some reason.

 I can get this symptom by writing a large file to a disk on a
 controller that FreeBSD doesn't support NCQ on. I assume there
 is a logjam in the buffer cache. Something trivial like reading
 login in from disk that would normally happen in well under a
 second can take many minutes.

 Perhaps geli is causing a similar logjam? Does it hang forever or
 is it just obscenely slow? If it truly hangs forever it is
 probably something else. Is there disk activity after it hangs?
 Can you try it without geli? systat -vmstat might provide a clue.
>>>
>>> Well, it is geli. I'm unable to reproduce the freeze on the same
>>> exact system with everything else the same except for no geli. I'm
>>> going to move this thread over to geom, and continue it there. Thanks
>>> for your help!
>>
>> It occurs to me that it will need twice as much memory for disk i/o.
>> 1 buffer for encrypted and 1 for unencrypted. I know nothing about geli,
>> so I don't know if it uses the buffer cache for both, or what.
>> Could it be that the kernel isn't keeping enough memory free and
>> manages to paint itself into a corner and not have space to store
>> the unencrypted version of disk reads, and can't page/swap anything
>> out to make space because it doesn't have space to store the encrypted
>> version to write?
>
> I think that's probably about what is happening. I'm still waiting
> for an answer on the geom mailing list, but I will do some testing
> with increasing memory sizes and see where the problem stops
> occurring.

Some of the vfs.*buf sysctls might be useful?


Re: "magic" crashes - mostly solved but

2012-06-27 Thread Benjamin Kaduk

On Wed, 27 Jun 2012, Wojciech Puchar wrote:

> the reason was most probably out of date vbox and fuse kernel modules.
>
> after making everything in sync system boots successfully with WITNESS,
> INVARIANT etc. options enabled.
>
> STILL - mostly at booting i'm getting few messages.
>
> first comes when executing /etc/rc.d/named (at mounting devfs IMHO):
>
> Jun 27 18:32:23 foo kernel: lock order reversal:
> Jun 27 18:32:23 foo kernel: 1st 0xff80f5859800 bufwait (bufwait) @/usr/src/sys/kern/vfs_bio.c:2636
> Jun 27 18:32:24 foo kernel:
> Jun 27 18:32:24 foo kernel: 2nd 0xff0005c82200 dirhash (dirhash) @/usr/src/sys/ufs/ufs/ufs_dirhash.c:285

http://ipv4.sources.zabbadoz.net/freebsd/lor/261.html

> few more when mounting or unmounting (i'm not sure) pendrive.
>
> Jun 27 18:57:09 foo kernel: lock order reversal:
> Jun 27 18:57:09 foo kernel: 1st 0xff011ec78098 ufs (ufs) @ /usr/src/sys/kern/vfs_lookup.c:504
> Jun 27 18:57:09 foo kernel: 2nd 0xff80f5e1bb80 bufwait (bufwait) @ /usr/src/sys/ufs/ffs/ffs_softdep.c:6193
> Jun 27 18:57:09 foo kernel: 3rd 0xff011ead3d80 ufs (ufs) @

http://ipv4.sources.zabbadoz.net/freebsd/lor/285.html