from:"Kay Sievers"

Re: [PATCH RESEND 1/3] printk: convert byte-buffer to variable-length record buffer

2012-11-28 Thread Kay Sievers

On Wed, Nov 28, 2012 at 2:33 PM, Michael Kerrisk  wrote:
> On Thu, May 3, 2012 at 2:29 AM, Kay Sievers  wrote:
>> From: Kay Sievers 
> [...]
>> case SYSLOG_ACTION_SIZE_UNREAD:
>> -   error = log_end - log_start;
>> +   raw_spin_lock_irq(&logbuf_lock);
>> +   if (syslog_seq < log_first_seq) {
>> +   /* messages are gone, move to first one */
>> +   syslog_seq = log_first_seq;
>> +   syslog_idx = log_first_idx;
>> +   }
>> +   if (from_file) {
>> +   /*
>> +* Short-cut for poll(/"proc/kmsg") which simply 
>> checks
>> +* for pending data, not the size; return the count 
>> of
>> +* records, not the length.
>> +*/
>> +   error = log_next_idx - syslog_idx;
>> +   } else {
>> +   u64 seq;
>> +   u32 idx;
>> +
>> +   error = 0;
>> +   seq = syslog_seq;
>> +   idx = syslog_idx;
>> +   while (seq < log_next_seq) {
>> +   error += syslog_print_line(idx, NULL, 0);
>> +   idx = log_next(idx);
>> +   seq++;
>> +   }
>> +   }
>> +   raw_spin_unlock_irq(&logbuf_lock);
>> break;
> [...]
>
> It looks as though the changes here have broken SYSLOG_ACTION_SIZE_UNREAD.

Any specifics that it causes actual problems we need to address?

> On a 2.6.31 system, immediately after SYSLOG_ACTION_READ_CLEAR, a
> SYSLOG_ACTION_SIZE_UNREAD returns 0.

Hmm, sounds like the right thing to do.

We have read everything, even cleared the buffer for later queries. So
there is nothing to read anymore for later calls, and they will
actually never return anything if they are called, so returning 0 here
sounds like the right thing. The current SYSLOG_ACTION_SIZE_UNREAD
seems to match properly the expectations one can make for
SYSLOG_ACTION_READ_ALL.

> On 3.5, immediately after SYSLOG_ACTION_READ_CLEAR, the value returned
> by SYSLOG_ACTION_SIZE_UNREAD is unchanged
>
> (i.e., assuming that the
> value returned was non-zero before SYSLOG_ACTION_SIZE_UNREAD, it is
> still nonzero afterward), even though a subsequent
> SYSLOG_ACTION_READ_CLEAR indicates that there are zero bytes to read.

Which sounds at least like weird behaviour, if not "broken".

Any indication that we need to restore the old behaviour to fix some
weird assumptions? To me the current one sounds like the better and
more correct option, and what one would expect from it. But maybe we
cannot get away with it ...

(I hope I understood what you explained correctly, I'm a bit confused
by the issue.)


Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RESEND 1/3] printk: convert byte-buffer to variable-length record buffer

2012-11-28 Thread Kay Sievers

On Wed, Nov 28, 2012 at 5:37 PM, Linus Torvalds
 wrote:
> On Wed, Nov 28, 2012 at 8:22 AM, Kay Sievers  wrote:
>> On Wed, Nov 28, 2012 at 2:33 PM, Michael Kerrisk  
>> wrote:
>>
>>> On a 2.6.31 system, immediately after SYSLOG_ACTION_READ_CLEAR, a
>>> SYSLOG_ACTION_SIZE_UNREAD returns 0.
>>
>> Hmm, sounds like the right thing to do.
>
> Right.
>
> And that's the *OLD* behavior (2.6.31).

Ah, hmm, I read 2.6... as 3.6... :)

> So the new behavior is insane and different. Let's fix it.

Yeah.

> It looks like it is because the new SYSLOG_ACTION_SIZE_UNREAD code
> does not take the new clear_seq code into account. Hmm?

Right, something like that. I'll take a look now ...

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RESEND 1/3] printk: convert byte-buffer to variable-length record buffer

2012-11-28 Thread Kay Sievers

On Wed, 2012-11-28 at 17:49 +0100, Kay Sievers wrote:
> On Wed, Nov 28, 2012 at 5:37 PM, Linus Torvalds
>  wrote:
> > On Wed, Nov 28, 2012 at 8:22 AM, Kay Sievers  wrote:
> >> On Wed, Nov 28, 2012 at 2:33 PM, Michael Kerrisk  
> >> wrote:
> >>
> >>> On a 2.6.31 system, immediately after SYSLOG_ACTION_READ_CLEAR, a
> >>> SYSLOG_ACTION_SIZE_UNREAD returns 0.
> >>
> >> Hmm, sounds like the right thing to do.
> >
> > Right.
> >
> > And that's the *OLD* behavior (2.6.31).
> 
> Ah, hmm, I read 2.6... as 3.6... :)
> 
> > So the new behavior is insane and different. Let's fix it.
> 
> Yeah.
> 
> > It looks like it is because the new SYSLOG_ACTION_SIZE_UNREAD code
> > does not take the new clear_seq code into account. Hmm?
> 
> Right, something like that. I'll take a look now ...

From: Kay Sievers 
Subject: printk: respect SYSLOG_ACTION_READ_CLEAR for SYSLOG_ACTION_SIZE_UNREAD

On Wed, Nov 28, 2012 at 2:33 PM, Michael Kerrisk  wrote:
> It looks as though the changes here have broken SYSLOG_ACTION_SIZE_UNREAD.
> 
> On a 2.6.31 system, immediately after SYSLOG_ACTION_READ_CLEAR, a
> SYSLOG_ACTION_SIZE_UNREAD returns 0.
> 
> On 3.5, immediately after SYSLOG_ACTION_READ_CLEAR, the value returned
> by SYSLOG_ACTION_SIZE_UNREAD is unchanged (i.e., assuming that the
> value returned was non-zero before SYSLOG_ACTION_SIZE_UNREAD, it is
> still nonzero afterward), even though a subsequent
> SYSLOG_ACTION_READ_CLEAR indicates that there are zero bytes to read.

Fix SYSLOG_ACTION_SIZE_UNREAD to return the amount of available
characters by starting to count at the first available record after
the last SYSLOG_ACTION_READ_CLEAR, instead of the first message
record for SYSLOG_ACTION_READ.

Before:
  syslog(SYSLOG_ACTION_SIZE_UNREAD, 0, 0) = 286965
  syslog(SYSLOG_ACTION_READ_CLEAR, "<12>"..., 100) = 24000
  syslog(SYSLOG_ACTION_SIZE_UNREAD, 0, 0) = 286965

After:
  syslog(SYSLOG_ACTION_SIZE_UNREAD, 0, 0) = 90402
  syslog(SYSLOG_ACTION_READ_CLEAR, "<5>"..., 100) = 90402
  syslog(SYSLOG_ACTION_SIZE_UNREAD, 0, 0) = 0

Reported-By: Michael Kerrisk 
Signed-Off-By: Kay Sievers 
---
 printk.c |   15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index 2d607f4..35a7f4f 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -1183,12 +1183,10 @@ int do_syslog(int type, char __user *buf, int len, bool 
from_file)
/* Number of chars in the log buffer */
case SYSLOG_ACTION_SIZE_UNREAD:
raw_spin_lock_irq(&logbuf_lock);
-   if (syslog_seq < log_first_seq) {
+   if (clear_seq < log_first_seq) {
/* messages are gone, move to first one */
-   syslog_seq = log_first_seq;
-   syslog_idx = log_first_idx;
-   syslog_prev = 0;
-   syslog_partial = 0;
+   clear_seq = log_first_seq;
+   clear_idx = log_first_idx;
}
if (from_file) {
/*
@@ -1198,9 +1196,9 @@ int do_syslog(int type, char __user *buf, int len, bool 
from_file)
 */
error = log_next_idx - syslog_idx;
} else {
-   u64 seq = syslog_seq;
-   u32 idx = syslog_idx;
-   enum log_flags prev = syslog_prev;
+   u64 seq = clear_seq;
+   u32 idx = clear_idx;
+   enum log_flags prev = 0;
 
error = 0;
while (seq < log_next_seq) {
@@ -1211,7 +1209,6 @@ int do_syslog(int type, char __user *buf, int len, bool 
from_file)
seq++;
prev = msg->flags;
}
-   error -= syslog_partial;
}
raw_spin_unlock_irq(&logbuf_lock);
break;


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RESEND 1/3] printk: convert byte-buffer to variable-length record buffer

2012-11-29 Thread Kay Sievers

On Thu, Nov 29, 2012 at 2:18 PM, Michael Kerrisk (man-pages)
 wrote:
> On Wed, Nov 28, 2012 at 6:51 PM, Kay Sievers  wrote:

>> Before:
>>   syslog(SYSLOG_ACTION_SIZE_UNREAD, 0, 0) = 286965
>>   syslog(SYSLOG_ACTION_READ_CLEAR, "<12>"..., 100) = 24000
>>   syslog(SYSLOG_ACTION_SIZE_UNREAD, 0, 0) = 286965
>>
>> After:
>>   syslog(SYSLOG_ACTION_SIZE_UNREAD, 0, 0) = 90402
>>   syslog(SYSLOG_ACTION_READ_CLEAR, "<5>"..., 100) = 90402
>>   syslog(SYSLOG_ACTION_SIZE_UNREAD, 0, 0) = 0

> I'm going to call my report yesterday bogus. Somewhere along the way,
> I got confused while testing something, and my statement about 2.6.31
> behavior is wrong: the 2.6.31 and 3.5 behaviors are the same. As such,
> your patch is unneeded. Sorry for wasting your time.

I think you have been right with your report. The above pasted
before/after from the patch commit text is actually a result of real
testing with current git. And your initial description sounds right,
and the patch seems to produce the expected results here. I just
confused the numbers in your report and wrongly parsed 2.6 > 3.6.

Hmm, at least do far we did not blame anybody else than ourselves as
confused. One of us at least is right, and it looks you have been, and
I also think the patch is at least intended to be right. :)

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RESEND 1/3] printk: convert byte-buffer to variable-length record buffer

2012-11-29 Thread Kay Sievers

On Thu, Nov 29, 2012 at 2:37 PM, Michael Kerrisk (man-pages)
 wrote:
> On Thu, Nov 29, 2012 at 2:28 PM, Kay Sievers  wrote:
>> On Thu, Nov 29, 2012 at 2:18 PM, Michael Kerrisk (man-pages)
>>  wrote:
>>> On Wed, Nov 28, 2012 at 6:51 PM, Kay Sievers  wrote:
>>
>>>> Before:
>>>>   syslog(SYSLOG_ACTION_SIZE_UNREAD, 0, 0) = 286965
>>>>   syslog(SYSLOG_ACTION_READ_CLEAR, "<12>"..., 100) = 24000
>>>>   syslog(SYSLOG_ACTION_SIZE_UNREAD, 0, 0) = 286965
>>>>
>>>> After:
>>>>   syslog(SYSLOG_ACTION_SIZE_UNREAD, 0, 0) = 90402
>>>>   syslog(SYSLOG_ACTION_READ_CLEAR, "<5>"..., 100) = 90402
>>>>   syslog(SYSLOG_ACTION_SIZE_UNREAD, 0, 0) = 0
>>
>>> I'm going to call my report yesterday bogus. Somewhere along the way,
>>> I got confused while testing something, and my statement about 2.6.31
>>> behavior is wrong: the 2.6.31 and 3.5 behaviors are the same. As such,
>>> your patch is unneeded. Sorry for wasting your time.
>>
>> I think you have been right with your report. The above pasted
>> before/after from the patch commit text is actually a result of real
>> testing with current git. And your initial description sounds right,
>> and the patch seems to produce the expected results here. I just
>> confused the numbers in your report and wrongly parsed 2.6 > 3.6.
>>
>> Hmm, at least do far we did not blame anybody else than ourselves as
>> confused. One of us at least is right, and it looks you have been, and
>> I also think the patch is at least intended to be right. :)
>
> Okay -- I'm pretty sure I am right about being wrong ;-).
>
> Could you do some comparative testing please between 3.5 and pre-3.5.
> I have a little test program below. When I rechecked 2.6.31 and 3.5
> using this program I found the behavior was the same, which is why I
> conclude my report is wrong. (And also, your proposed patch in
> response to my bogus report produces different behavior from 2.6.31).

Oh, seems you are right.

The old kernel does not return 0, while it probably should. The
current kernel seems to do the same thing.

But the behaviour with the patch stills seems like the better and the
obvious and expected behaviour to me. :)

Thanks,
Kay

# old fedora kernel (pre-new-logging) ###
# uname -a
Linux nja 3.4.4-3.fc17.x86_64

# echo 1234567890 > /dev/kmsg

# ./syslog-test 9
SYSLOG_ACTION_SIZE_UNREAD
Number of unread bytes: 95299

# ./syslog-test 4
SYSLOG_ACTION_READ_CLEAR
Return value: 29
<4>[   54.933282] 1234567890
Ring buffer cleared

# ./syslog-test 9
SYSLOG_ACTION_SIZE_UNREAD
Number of unread bytes: 95299

# current fedora kernel (new logging) 

# uname -a
Linux nja 3.6.6-3.fc18.x86_64

# echo 1234567890 > /dev/kmsg

# ./syslog-test 9
SYSLOG_ACTION_SIZE_UNREAD
Number of unread bytes: 286822

# ./syslog-test 4
SYSLOG_ACTION_READ_CLEAR
Return value: 31
<12>[259301.067699] 1234567890
Ring buffer cleared

# ./syslog-test 9
SYSLOG_ACTION_SIZE_UNREAD
Number of unread bytes: 286822

# git kernel with the above patch applied #

# uname -a
Linux nja 3.7.0-rc7+

# echo 1234567890 > /dev/kmsg

# ./syslog-test 9
SYSLOG_ACTION_SIZE_UNREAD
Number of unread bytes: 30

# ./syslog-test 4
SYSLOG_ACTION_READ_CLEAR
Return value: 30
<12>[   69.591745] 1234567890
Ring buffer cleared

# ./syslog-test 9
SYSLOG_ACTION_SIZE_UNREAD
Number of unread bytes: 0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RESEND 1/3] printk: convert byte-buffer to variable-length record buffer

2012-11-29 Thread Kay Sievers

On Thu, Nov 29, 2012 at 3:18 PM, Michael Kerrisk (man-pages)
 wrote:
> So, just to be clear: you better not apply your patch; it might break
> something ;-).

Sounds good to me. :)

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V2 00/23] printk: refactoring

2012-11-29 Thread Kay Sievers

On Thu, Nov 29, 2012 at 11:46 PM, Joe Perches  wrote:

> It's still a scheduling issue with Kay and Jan's patches.
> Does anyone have any idea if/when those patches are going in?

I was expecting that Jan rebases the patches incorporating
the latest fix. Jan?

It should not hold back anything else that is already ready to merge.
We will manage it to rebase, I think.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] init: fix name of root device in /proc/mounts

2013-03-20 Thread Kay Sievers

On Wed, Mar 20, 2013 at 10:11 PM, William Hubbs  wrote:
> On Wed, Mar 20, 2013 at 02:03:20AM -0500, Rob Landley wrote:
>> On 03/19/2013 07:20:17 PM, William Hubbs wrote:
>> > On Tue, Mar 19, 2013 at 04:17:11PM -0700, H. Peter Anvin wrote:
>> > > On 03/19/2013 03:28 PM, William Hubbs wrote:
>> > > > The issue is that /dev/root appears in /proc/mounts if you do not
>> > > > boot with an initramfs, but /dev/root is not a device node. In the
>> > > > past, udev created a symbolic link from /dev/root to the
>> > > > appropriate block device, but it does not do this any longer.
>> > Also,
>> > > > devtmpfs does not create this symbolic link.
>> > > >
>> > > > This is causing bugs with software that depends on the existence
>> > > > of /dev/root [2] for example.
>> > >
>> > > Seems okay to me, although even better would be to use the udev name
>> > > of the device in question.
>> >
>> > I'm not following what you mean.
>> >
>> > The problem is that "/dev/root" should not be in /proc/mounts,
>> > since there is always another entry that points to the root
>> > file system.
>>
>> What gave you that idea?
>>
>> wget http://landley.net/aboriginal/bin/system-image-i686.tar.bz2
>> extract it and ./run-emulator.sh and in there:
>>
>> (i686:1) /home # cat /proc/mounts
>> rootfs / rootfs rw 0 0
>> /dev/root / squashfs ro,relatime 0 0
>> proc /proc proc rw,relatime 0 0
>> sys /sys sysfs rw,relatime 0 0
>> dev /dev devtmpfs rw,relatime,size=63072k,nr_inodes=15768,mode=755 0 0
>> dev/pts /dev/pts devpts rw,relatime,mode=600 0 0
>> /tmp /tmp tmpfs rw,relatime 0 0
>> /home /home tmpfs rw,relatime 0 0
>>
>> Userspace can totally determine what /dev/root points to, I made mdev
>> do it in 2006 (udev started doing so shortly thereafter). Busybox git
>> commit a7e3d052.:4
>
> There are situations where it doesn't work though -- suppose that root
> is btrfs for example.
>
> Also, the other message that answered you is correct, the udev
> maintainers say we should not be relying on /dev/root at all so to make
> it work distro packagers have to add a rule themselves.
>
> Kay,
>
> if you are reading, can you jump in and explain why /dev/root is a bad
> idea?

stat("/") is just the better approach for tools to find out what
"root" is, there is not much point in doing symlinks here just because
the kernel uses that name to mount internally.

/dev/root was never part of the default udev setup, it happened in the
distros init scripts, and only some distributions added that.

Newer filesystems like btrfs do not have an 1:1 fs:device relation
anyway, there cannot be a /dev/root symlink anymore, so tools need to
catch up here anyway, and the sooner the better. /dev/root is a
concept that will probably no longer exist in the future, because
filesystems will work differently than they used to.

As Peter said, the kernel should internally just use the name of the
kernel block device instead of inventing and exporting the name of an
artificial /dev/root node, which does not exist later in the real
system.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] virtio-blk: emit udev event when device is resized

2013-02-25 Thread Kay Sievers

On Mon, Feb 25, 2013 at 11:12 PM, Greg KH  wrote:

> Hm, I thought we were frowning apon running binaries from udev rules
> these days, especially ones that might have big consequences (like
> resizing a disk image) like this.
>
> Kay, am I right?

We removed most of them from the default setups, yes. But there is
nothing wrong if people want to ship that in some package or as custom
rules.

It looks fine to me, we would just not add such things to the default
set of of rules these days.

> We already emit KOBJECT_CHANGE events when block devices change, from
> within the block core code.  Why is the patch below needed instead of
> using these events that are already generated?  How are virtio block
> devices special?

I think we only do that for dm and md and a couple of special cases
like loop and read-only settings.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] virtio-blk: emit udev event when device is resized

2013-02-25 Thread Kay Sievers

On Mon, Feb 25, 2013 at 11:43 PM, Greg KH  wrote:
> On Mon, Feb 25, 2013 at 11:39:52PM +0100, Kay Sievers wrote:
>> On Mon, Feb 25, 2013 at 11:12 PM, Greg KH  wrote:
>>
>> > Hm, I thought we were frowning apon running binaries from udev rules
>> > these days, especially ones that might have big consequences (like
>> > resizing a disk image) like this.
>> >
>> > Kay, am I right?
>>
>> We removed most of them from the default setups, yes. But there is
>> nothing wrong if people want to ship that in some package or as custom
>> rules.
>>
>> It looks fine to me, we would just not add such things to the default
>> set of of rules these days.
>>
>> > We already emit KOBJECT_CHANGE events when block devices change, from
>> > within the block core code.  Why is the patch below needed instead of
>> > using these events that are already generated?  How are virtio block
>> > devices special?
>>
>> I think we only do that for dm and md and a couple of special cases
>> like loop and read-only settings.
>
> What about when we repartition a block device?  I've seen the events
> happen then.

Right, from the common block code we send events for removable media
changes like cdroms, sd cards, when a device is switched to read-only,
and when we re-scan a partition table like on re-partitioning. Most of
the other events are block subsystem-specific like this one. For
things like device-mapper they are used pretty heavily.

> Anyway, if you are ok with this, no objection from my side then Rusty.

Looks fine to me, it should not do any harm if there are not heavy
programs hooked up -- which is nothing the kernel could fix if people
do that. :)

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] pstore: Create a convenient mount point for pstore

2013-02-27 Thread Kay Sievers

On Tue, Feb 12, 2013 at 12:15 AM, Josh Boyer  wrote:
> Using /dev/pstore as a mount point for the pstore filesystem is slightly
> awkward.  We don't normally mount filesystems in /dev/ and the /dev/pstore
> file isn't created automatically by anything.  While this method will
> still work, we can create a persistent mount point in sysfs.  This will
> put pstore on par with things like cgroups and efivarfs.

Mounted by default now, if available:
  
http://cgit.freedesktop.org/systemd/systemd/commit/?id=c06bf414042cd1bf94e0af63e9e2a0c291bfc546

Thanks everybody,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] udevadm-info: Don't access sysfs 'resource' files

2013-03-17 Thread Kay Sievers

On Sun, Mar 17, 2013 at 2:38 PM, Alex Williamson
 wrote:
> I'm assuming that the device only breaks because udevadm is dumping the
> full I/O port register space of the device and that if an actual driver
> was interacting with it through this interface that it would work.  Who
> knows how many devices will have read side-effects by udevadm blindly
> dumping these files.  Thanks,

Sysfs is a too public interface to export things there which make
devices/driver choke on a simple read() of an attribute.

This is nothing specific to udevadm, any tool can do that. Udevadm
will never read any of the files during normal operation. The admin
explicitly asked udevadm with a specific command to dump all the stuff
the device offers.

The kernel driver needs to be fixed to allow that, in the worst case,
the attributes not exported at all. People should take more care what
they export in /sys, it's not a hidden and private ioctl what's
exported there, stuff is very visible and will be looked at.

Telling userspace not to use specific stuff in /sys I would not expect
to work as a strategy; there is too much weird stuff out there that
will always try to do that ...

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] udevadm-info: Don't access sysfs 'resource' files

2013-03-17 Thread Kay Sievers

On Sun, Mar 17, 2013 at 3:20 PM, Myron Stowe  wrote:
> On Sun, 2013-03-17 at 15:00 +0100, Kay Sievers wrote:
>> On Sun, Mar 17, 2013 at 2:38 PM, Alex Williamson
>>  wrote:
>> > I'm assuming that the device only breaks because udevadm is dumping the
>> > full I/O port register space of the device and that if an actual driver
>> > was interacting with it through this interface that it would work.  Who
>> > knows how many devices will have read side-effects by udevadm blindly
>> > dumping these files.  Thanks,
>>
>> Sysfs is a too public interface to export things there which make
>> devices/driver choke on a simple read() of an attribute.
>>
>> This is nothing specific to udevadm, any tool can do that. Udevadm
>> will never read any of the files during normal operation. The admin
>> explicitly asked udevadm with a specific command to dump all the stuff
>> the device offers.
>>
>> The kernel driver needs to be fixed to allow that, in the worst case,
>> the attributes not exported at all. People should take more care what
>> they export in /sys, it's not a hidden and private ioctl what's
>> exported there, stuff is very visible and will be looked at.
>>
>> Telling userspace not to use specific stuff in /sys I would not expect
>> to work as a strategy; there is too much weird stuff out there that
>> will always try to do that ...
>
> Kay - could you comment on Foot Note 3 in
> https://lkml.org/lkml/2013/3/16/168
>
> With respect to 'udev', you are working on the assumption that all files
> in sysfs must be readable with no consequences which may be implied by
> the Documentation's sysfs.txt file's mentioning ASCII.  If we are to
> interpret that as strictly as you seem to want to then why is there
> sysfs support for creating binary files?

They cannot be distinguished from outside, so there is nothing I know
that could make a difference to userspace tools.

Tools -- no matter how useful they are not not, it's that they do that
for many years already -- need to be able to read() the stuff in
there, without causing any damage to the system.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] udevadm-info: Don't access sysfs 'resource' files

2013-03-17 Thread Kay Sievers

On Sun, Mar 17, 2013 at 3:36 PM, Myron Stowe  wrote:
> On Sun, 2013-03-17 at 15:29 +0100, Kay Sievers wrote:
>> On Sun, Mar 17, 2013 at 3:20 PM, Myron Stowe  wrote:
>> > On Sun, 2013-03-17 at 15:00 +0100, Kay Sievers wrote:
>> >> On Sun, Mar 17, 2013 at 2:38 PM, Alex Williamson
>> >>  wrote:
>> >> > I'm assuming that the device only breaks because udevadm is dumping the
>> >> > full I/O port register space of the device and that if an actual driver
>> >> > was interacting with it through this interface that it would work.  Who
>> >> > knows how many devices will have read side-effects by udevadm blindly
>> >> > dumping these files.  Thanks,
>> >>
>> >> Sysfs is a too public interface to export things there which make
>> >> devices/driver choke on a simple read() of an attribute.
>> >>
>> >> This is nothing specific to udevadm, any tool can do that. Udevadm
>> >> will never read any of the files during normal operation. The admin
>> >> explicitly asked udevadm with a specific command to dump all the stuff
>> >> the device offers.
>> >>
>> >> The kernel driver needs to be fixed to allow that, in the worst case,
>> >> the attributes not exported at all. People should take more care what
>> >> they export in /sys, it's not a hidden and private ioctl what's
>> >> exported there, stuff is very visible and will be looked at.
>> >>
>> >> Telling userspace not to use specific stuff in /sys I would not expect
>> >> to work as a strategy; there is too much weird stuff out there that
>> >> will always try to do that ...
>> >
>> > Kay - could you comment on Foot Note 3 in
>> > https://lkml.org/lkml/2013/3/16/168
>> >
>> > With respect to 'udev', you are working on the assumption that all files
>> > in sysfs must be readable with no consequences which may be implied by
>> > the Documentation's sysfs.txt file's mentioning ASCII.  If we are to
>> > interpret that as strictly as you seem to want to then why is there
>> > sysfs support for creating binary files?
>>
>> They cannot be distinguished from outside, so there is nothing I know
>> that could make a difference to userspace tools.
>
> Agreed
>>
>> Tools -- no matter how useful they are not not, it's that they do that
>> for many years already -- need to be able to read() the stuff in
>> there, without causing any damage to the system.
>
> So then, why are certain sysfs files skipped in udevadm-info's parsing
> (./src/udevadm-info.c::skip_attribute())?

Because they are not useful to use in udev rules, or are just not
recommended to use in rules because they break other assumptions and
would encode specific settings, which can rightfully change at
runtime, into rules.

The list is in no way a list to ensure a system/driver/device is not
choking on read().

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC] cgroup TODOs

2012-09-14 Thread Kay Sievers

On Fri, Sep 14, 2012 at 9:29 PM, Tejun Heo  wrote:
> On Fri, Sep 14, 2012 at 09:58:30AM -0400, Vivek Goyal wrote:
>> I am little concerned about above and wondering how systemd and libvirt
>> will interact and behave out of the box.
>>
>> Currently systemd does not create its own hierarchy under blkio and
>> libvirt does. So putting all together means there is no way to avoid
>> the overhead of systemd created hierarchy.
>>
>> \
>> |
>> +- system
>>  |
>>  +- libvirtd.service
>>   |
>>   +- virt-machine1
>>   +- virt-machine2
>>
>> So there is now way to avoid the overhead of two levels of hierarchy
>> created by systemd. I really wish that systemd gets rid of "system"
>> cgroup and puts services directly in top level group. Creating deeper
>> hieararchices is expensive.

The idea here is to split equally between the "system" and the "user"s
at that level.

That all can be re-considered and changed if really needed, but it's
not an unintentionally created directory.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

2012-10-03 Thread Kay Sievers

On Wed, Oct 3, 2012 at 12:12 AM, Greg KH  wrote:

> Mauro, what version of udev are you using that is still showing this
> issue?
>
> Kay, didn't you resolve this already?  If not, what was the reason why?

It's the same in the current release, we still haven't wrapped our
head around how to fix it/work around it.

Unlike what the heated and pretty uncivilized and rude emails here
claim, udev does not dead-lock or "break" things, it's just "slow".
The modprobe event handling runs into a ~30 second event timeout.
Everything is still fully functional though, there's only this delay.

Udev ensures full dependency resolution between parent and child
events. Parent events have to finish the event handling and have to
return, before child event handlers are started. We need to ensure
such things so that (among other things) disk events have finished
their operations before the partition events are started, so they can
rely and access their fully set up parent devices.

What happens here is that the module_init() call blocks in a userspace
transaction, creating a child event that is not started until the
parent event has finished. The event handler for modprobe times out
then the child event loads the firmware.

Having kernel module relying on a running and fully functional
userspace to return from module_init() is surely a broken driver
model, at least it's not how things should work. If userspace does not
respond to firmware requests, module_init() locks up until the
firmware timeout happens.

This all is not so much about how probe() should behave, it's about a
fragile dependency on a specific userspace transaction to link a
loadable module into the kernel. Drivers should avoid such loops for
many reasons. Also, it's unclear in many cases how such a model should
work at all if the module is compiled in and initialized when no
userspace is running.

If that unfortunate module_init() lockup can't be solved properly in
the kernel, we need to find out if we need to make the modprobe
handling in udev async, or let firmware events bypass dependency
resolving. As mentioned, we haven't decided as of now which road to
take here.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

2012-10-03 Thread Kay Sievers

On Wed, Oct 3, 2012 at 6:57 PM, Greg KH  wrote:

>> It's the same in the current release, we still haven't wrapped our
>> head around how to fix it/work around it.
>
> Ick, as this is breaking people's previously-working machines, shouldn't
> this be resolved quickly?

Nothing really "breaks", It's "slow" and it will surely be fixed when
we know what's the right fix, which we haven't sorted out at this
moment.

> module_init() can do lots of "bad" things, sleeping, asking for
> firmware, and lots of other things.  To have userspace block because of
> this doesn't seem very wise.

Not saying that it is right or nice, but it's the kernel itself that
blocks. Run init=/bin/sh and do a modprobe of one of these drivers and
it hangs un-interruptible until the kernel's internal firmware loading
request times out, just because userspace is not there.

> But previously this all "just worked" as we ran 'modprobe' in a new
> thread/process right?

No, we used to un-queue events which had a timeout specified in the
environment, that code caused other issues and was removed.

> it can do without worrying about stopping anything else in the system that 
> might
> want to happen at the same time (like load multiple modules in a row).

It should not be an issue, the serialization is strictly parent <->
child, everything else runs in parallel.

>> If that unfortunate module_init() lockup can't be solved properly in
>> the kernel, we need to find out if we need to make the modprobe
>> handling in udev async, or let firmware events bypass dependency
>> resolving. As mentioned, we haven't decided as of now which road to
>> take here.
>
> It's not a lockup, there have never been rules about what a driver could
> and could not do in its module_init() function.  Sure, there are some
> not-nice drivers out there, but don't halt the whole system just because
> of them.

It is a kind of lock up, just try modprobe with the init=/bin/sh boot.

> I recommend making module loading async, like it used to be, and then
> all should be fine, right?

That's the current idea, and Tom is looking into it how it could look like.

I also have no issues at all if the kernel does load the firmware from
the filesystem on its own; it sounds like the simplest and most robust
solution from a general look at the problem. It would also make the
difference between in-kernel firmware and out-of-kernel firmware less
visible, which sounds good.
Honestly, requiring firmware-loading userspace-transactions to
successfully link a module into the kernel sounds like a pretty bad
idea to start with. Unlike module loading which needs the depmod alias
database and userspace configuration; with firmware loading, there is
no policy involved where userspace would add any single additional
value to that step.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

2012-10-03 Thread Kay Sievers

On Wed, Oct 3, 2012 at 10:39 PM, Linus Torvalds
 wrote:
> On Wed, Oct 3, 2012 at 12:50 PM, Greg KH  wrote:
>>>
>>> Ok, like this?
>>
>> This looks good to me.  Having udev do firmware loading and tieing it to
>> the driver model may have not been such a good idea so many years ago.
>> Doing it this way makes more sense.
>
> Ok, I wish this had been getting more testing in Linux-next or
> something, but I suspect that what I'll do is to commit this patch
> asap, and then commit another patch that turns off udev firmware
> loading entirely for the synchronous firmware loading case.
>
> Why? Just to get more testing, and seeing if there are reports of
> breakage. Maybe some udev out there has a different search path (or
> because udev runs in a different filesystem namespace or whatever), in
> which case running udev as a fallback would otherwise hide the fact
> that he direct kernel firmware loading isn't working.

> Ok? Comments?

The current udev directory search order is:
  /lib/firmware/updates/$(uname -r)/
  /lib/firmware/updates/
  /lib/firmware/$(uname -r)/
  /lib/firmware/

There is no commonly known /firmware directory.

http://cgit.freedesktop.org/systemd/systemd/tree/src/udev/udev-builtin-firmware.c#n100
http://cgit.freedesktop.org/systemd/systemd/tree/configure.ac#n548

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

2012-10-03 Thread Kay Sievers

On Wed, Oct 3, 2012 at 11:05 PM, Greg KH  wrote:

> As for the firmware path, maybe we should
> change that to be modified by userspace (much like /sbin/hotplug was) in
> a proc file so that distros can override the location if they need to.

If that's needed, a CONFIG_FIRMWARE_PATH= with the array of locations
would probably be sufficient.

Like udev's defaults here:
  http://cgit.freedesktop.org/systemd/systemd/tree/configure.ac#n550

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

2012-10-03 Thread Kay Sievers

On Thu, Oct 4, 2012 at 12:58 AM, Linus Torvalds
 wrote:
> That said, there's clearly enough variation here that I think that for
> now I won't take the step to disable the udev part. I'll do the patch
> to support "direct filesystem firmware loading" using the udev default
> paths, and that hopefully fixes the particular case people see with
> media modules.

If that approach looks like it works out, please aim for full
in-kernel-*only* support. I would absolutely like to get udev entirely
out of the sick game of firmware loading here. I would welcome if we
are not falling back to the blocking timeouted behaviour again.

The whole story would be contained entirely in the kernel, and we get
rid of the rather fragile "userspace transaction" to execute
module_init(), where the kernel has no idea if userspace is even up to
ever responding to its requests.

There would be no coordination with userspace tools needed, which
sounds like a better fit in the way we develop things with the loosely
coupled kernel <-> udev requirements.

If that works out, it would a bit like devtmpfs which turned out to be
very simple, reliable and absolutely the right thing we could do to
primarily mange /dev content.

The whole dance with the fake firmware struct device, which has a 60
second timeout to wait for userspace, is a long story of weird
failures at various aspects.

It would not only solve the unfortunate modprobe lockup with
init=/bin/sh we see here, also big servers with an insane amount of
devices happen to run into the 60 sec timeout, because udev, which
runs with 4000-8000 threads in parallel handling things like 30.000
disks is not scheduled in time to fulfill network card firmware
requests. It would be nice if we don't have that arbitrary timeout at
all.

Having any timeout at all to answer the simple question if a file
stored in the rootfs exists, should be a hint that there is something
really wrong with the model that stuff is done.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] vlan: set sysfs device_type to 'vlan'

2012-10-23 Thread Kay Sievers

On Tue, Oct 23, 2012 at 8:36 AM, David Miller  wrote:
> From: Doug Goldstein 
> Date: Mon, 22 Oct 2012 00:53:57 -0500
>
>> Sets the sysfs device_type to 'vlan' for udev. This makes it easier for
>> applications that query network information via udev to identify vlans
>> instead of using strrchr().
>>
>> Signed-off-by: Doug Goldstein 
>
> You're extremely misguided.  This change, in fact, makes it ten times
> harder for such applications to query such devices.

That makes not much sense, really. Every new interface would fall into
that category. At least I can't see any mis-guidance here. The other
devtypes for the major netif types are not that much older.

> Because now the application has to decide whether it wants to support
> EVERY EXISTING SYSTEM OUT THERE or not.  Hundreds of millions of Linux
> systems do not provide this attribute.
>
> Applications have to handle the case of not having the 'vlan' device
> type available attribute essentially forever.

Which is an entirely separate issue, and not a technical reason not to
add new interfaces which are already in use for most other types of
netifs.

> So providing it in new kernels provides zero value whatsoever.

It sure does provide a value. The kernel can efficiently filter
uevents in the socket with this available. All other major types of
netdevs support that too, it's just a matter of completeness. For that
reason, it looks useful to me.

> I'm not applying this patch, sorry.

That's just sad. Not that I really care about that functionality, but
your reasoning is absolutely not transparent.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] printk: drop ambiguous LOG_CONT flag

2012-11-01 Thread Kay Sievers

On Mon, Oct 8, 2012 at 9:56 PM, Kay Sievers  wrote:
> On Mon, Oct 8, 2012 at 9:54 PM, "Jan H. Schönherr"

>>> Jan,
>>> any updates, did you try something else?
>>> Or should we merge the first version for now?
>>
>> I'm working on it, though I cannot spend as much time as I want. :)
>>
>> My current version does mostly well for race-printk()s, now. But
>> there's still one issue to resolve and some polishing to do.
>>
>> If we can afford to wait a little longer, we might get a nicer
>> solution (and avoid a possible mostly-revert later).
>
> Sure, no hurry, I was just going through my TODO emails. :)

My TODO is nagging me again. :)

Any updates? No problems if not, but we should merge the current
version then, I think.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG?] staging: zram: Add auto loading of module if user opens /dev/zram.

2013-09-09 Thread Kay Sievers

On Mon, Sep 9, 2013 at 5:58 PM, Tom Gundersen  wrote:
> Hi Konrad,
>
> The above commit (c70bda9 in mainline) doesn't appear to work for me.
> I.e., depmod does not create an entry in modules.devname and hence no
> device node is created on boot.
>
> If I understand correctly, you'd also need to create the correct
> "char-major--" alias. But I don't really see how this is
> meant to work as I thought zram only created /dev/zramX type device
> nodes, and not /dev/zram, or am I missing something?

Please just remove it. "devname" is meant to be used for
single-instance devices with a static dev_t, never for things like
zramX.

It will not do anything useful here, it does nothing really without a
statically assigned dev_t, and it should not be used for devices of
this kind anyway.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] usb: ohci/uhci - add soft dependencies on ehci_hcd

2013-09-10 Thread Kay Sievers

On Tue, Sep 10, 2013 at 7:02 PM, Alan Stern  wrote:

> Where is MODULE_SOFTDEP defined?  It isn't mentioned in any .h files in
> my kernel tree.

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7cb14ba75d57910cc4b62115dd5db7bd83c93684

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 3/3] Send cgroup_path in SCM_CGROUP

2013-09-02 Thread Kay Sievers

On Thu, Aug 29, 2013 at 4:13 PM, Jan Kaluza  wrote:
> Add new SCM type called SCM_CGROUP to send "cgroup_path" in SCM.
> This is useful for journald (systemd logging daemon) to get additional context
> with each log line received using UNIX socket.
>
> Signed-off-by: Jan Kaluza 

In many cases it's generally more useful to explain *why* something is
done, not *where* it is used. It makes it easier for people to match
the described problem to their own use-cases, where it possibly occurs
too. The problem this patch solves is very generic and not so much
specific to logging or the journal. Maybe something like this:

"Server-like processes in many cases need credentials and other
metadata of the peer, to decide if the calling process is allowed to
request a specific action, or the server just wants to log away this
type of information for auditing tasks.

The current practice to retrieve such process metadata is to look that
information up in procfs with the $PID received over SCM_CREDENTIALS.
This is sufficient for long-running tasks, but introduces a race which
cannot be worked around for short-living processes; the calling
process and all the information in /proc/$PID/ is gone before the
receiver of the socket message can look it up.

This introduces a new SCM type called SCM_CGROUP to allow the direct
attaching of "cgroup_path" to SCM, which is significantly more
efficient and will reliably avoid the race with the round-trip over
procfs."

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] printk: drop ambiguous LOG_CONT flag

2012-10-08 Thread Kay Sievers

On Fri, Sep 28, 2012 at 4:56 PM, Kay Sievers  wrote:
> On Fri, Sep 28, 2012 at 4:49 PM, "Jan H. Schönherr"

>> Given that I'm able to fix the racing case, would you be in favor of
>> this approach, or should we stick to the earlier version?
>
> I'm open to everything that makes sense. Let's see how it looks and we
> can decide when we have something that passes the tests.

Jan,
any updates, did you try something else?
Or should we merge the first version for now?

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] printk: drop ambiguous LOG_CONT flag

2012-10-08 Thread Kay Sievers

On Mon, Oct 8, 2012 at 9:54 PM, "Jan H. Schönherr"
 wrote:
> Am 08.10.2012 21:24, schrieb Kay Sievers:
>> On Fri, Sep 28, 2012 at 4:56 PM, Kay Sievers  wrote:
>>> On Fri, Sep 28, 2012 at 4:49 PM, "Jan H. Schönherr"
>>
>>>> Given that I'm able to fix the racing case, would you be in favor of
>>>> this approach, or should we stick to the earlier version?
>>>
>>> I'm open to everything that makes sense. Let's see how it looks and we
>>> can decide when we have something that passes the tests.
>>
>> Jan,
>> any updates, did you try something else?
>> Or should we merge the first version for now?
>
> I'm working on it, though I cannot spend as much time as I want. :)
>
> My current version does mostly well for race-printk()s, now. But
> there's still one issue to resolve and some polishing to do.
>
> If we can afford to wait a little longer, we might get a nicer
> solution (and avoid a possible mostly-revert later).

Sure, no hurry, I was just going through my TODO emails. :)

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 00/12] printk() fixes, optimizations, and clean ups

2012-11-15 Thread Kay Sievers

On Wed, 2012-11-14 at 00:12 +0100, Jan H. Schönherr wrote:
> Hi Greg, hi Kay, and all other interested people.
> 
> This series aims at cleaning up and fixing some bugs around printk().
> Patches 9 and 11 might require some discussion, see below.

This is how current git looks like:
[1.062953] input: ImExPS/2 Generic Explorer Mouse as 
/devices/platform/i8042/serio1/input/input2
[1.090595] List of all partitions:
[1.091703] 0800   117220824 sda  driver: sd
[1.093054]   0801 1048576 sda1 1fcbc57f-4bfc-4c2b-91a3-9c84fbcd9af1
[1.094662]   08023072 sda2 084917b7-8be2-4e86-838d-f771a9902e08
[1.096624]   08031536 sda3 180053b6-6697-4f4c-b331-4925773197ff
[1.098624]   080454730752 sda4 b67dfc6e-d06f-4b11-bd52-96c09163aca9
[1.100304]   08051536 sda5 6d0d537c-3107-4057-a57b-5379a0cd8e31
[1.101918] No filesystem could mount root, tried:  btrfs
[1.103633] Kernel panic - not syncing: VFS: Unable to mount root fs on 
unknown-block(8,1)


This with all your patches applied:
[1.032804] input: ImExPS/2 Generic Explorer Mouse as 
/devices/platform/i8042/serio1/input/input2
[1.063915] List of all partitions:
[1.065521] 0800   117220824 sda  driver: sd
[1.067444]   0801 1048576 sda1 1fcbc57f-4bfc-4c2b-91a3-9c84fbcd9af1 
 08023072 sda2 084917b7-8be2-4e86-838d-f771a9902e08  0803
1536 sda3 180053b6-6697-4f4c-b331-4925773197ff  080454730752 sda4 
b67dfc6e-d06f-4b11-bd52-96c09163aca9  08051536 sda5 
6d0d537c-3107-4057-a57b-5379a0cd8e31No filesystem could mount root, tried:  
btrfs
[1.077120] Kernel panic - not syncing: VFS: Unable to mount root fs on 
unknown-block(8,1)


Something seems broken in the patches regarding the console or newline logic.

Thanks,
Kay

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 00/12] printk() fixes, optimizations, and clean ups

2012-11-15 Thread Kay Sievers

On Thu, Nov 15, 2012 at 10:22 PM, "Jan H. Schönherr"
 wrote:
> Am 15.11.2012 17:05, schrieb Kay Sievers:
>> This with all your patches applied:
>> [1.032804] input: ImExPS/2 Generic Explorer Mouse as 
>> /devices/platform/i8042/serio1/input/input2
>> [1.063915] List of all partitions:
>> [1.065521] 0800   117220824 sda  driver: sd
>> [1.067444]   0801 1048576 sda1 
>> 1fcbc57f-4bfc-4c2b-91a3-9c84fbcd9af1  08023072 sda2 
>> 084917b7-8be2-4e86-838d-f771a9902e08  08031536 sda3 
>> 180053b6-6697-4f4c-b331-4925773197ff  080454730752 sda4 
>> b67dfc6e-d06f-4b11-bd52-96c09163aca9  08051536 sda5 
>> 6d0d537c-3107-4057-a57b-5379a0cd8e31No filesystem could mount root, tried:  
>> btrfs
>> [1.077120] Kernel panic - not syncing: VFS: Unable to mount root fs on 
>> unknown-block(8,1)
>>
>> Something seems broken in the patches regarding the console or newline logic.
>
> (Just to mention it: I did do quite some testing, but this case must have
> escaped me.)

Yeah, don't tell me, it was not really fun to discover all the weird
edges here because nobody wants to have rules. :)

> Please, try the patch below on top of everything. If this works, I'll prepare
> a -v2 where everything is sorted into the correct patches.

Looks all fine now.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] devtmpfs: mount with noexec and nosuid

2012-11-16 Thread Kay Sievers

On Sat, Nov 17, 2012 at 1:27 AM, Greg Kroah-Hartman
 wrote:
> On Fri, Nov 16, 2012 at 04:20:16PM -0800, Kees Cook wrote:
>> Since devtmpfs is writable, make the default noexec nosuid as well. This
>> protects from the case of a privileged process having an arbitrary file
>> write flaw and an argumentless arbitrary execution (i.e. it would lack
>> the ability to run "mount -o remount,exec,suid /dev"), with a system
>> that already has nosuid,noexec on all other writable mounts.
>>
>> Cc: ellyjo...@chromium.org
>> Signed-off-by: Kees Cook 
>> ---
>>  drivers/base/devtmpfs.c |6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> Have you tested this to verify that it doesn't break anything?
>
> Kay, could this cause any problems that you could think of?

It breaks all sorts of old, possibly outdated, stuff, that does things
like mapping /dev/mem executable. It for sure used to break X drivers,
that fiddle with the BIOS of cards.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: cgroup: status-quo and userland efforts

2013-07-02 Thread Kay Sievers

On Wed, Jul 3, 2013 at 1:57 AM, Thomas Gleixner  wrote:
> On Sun, 30 Jun 2013, Lennart Poettering wrote:
>> On 29.06.2013 05:05, Tim Hockin wrote:
>> > But that's not my point.  It seems pretty easy to make this cgroup
>> > management (in "native mode") a library that can have either a thin
>> > veneer of a main() function, while also being usable by systemd.  The
>> > point is to solve all of the problems ONCE.  I'm trying to make the
>> > case that systemd itself should be focusing on features and policies
>> > and awesome APIs.
>>
>> You know, getting this all right isn't easy. If you want to do things
>> properly, then you need to propagate attribute changes between the units you
>> manage. You also need something like a scheduler, since a number of
>> controllers can only be configured under certain external conditions (for
>> example: the blkio or devices controller use major/minor parameters for
>> configuring per-device limits. Since major/minor assignments are pretty much
>> unpredictable these days -- and users probably want to configure things with
>> friendly and stable /dev/disk/by-id/* symlinks anyway -- this requires us to
>> wait for devices to show up before we can configure the parameters.) Soo...
>> you need a graph of units, where you can propagate things, and schedule 
>> things
>> based on some execution/event queue. And the propagation and scheduling are
>> closely intermingled.
>
> you are confusing policy and mechanisms.
>
> The access to cgroupfs is mechanism.
>
> The propagation of changes, the scheduling of cgroupfs access and
> the correlation to external conditions are policy.
>
> What Tim is asking for is to have a common interface, i.e. a library
> which implements the low level access to the cgroupfs mechanism
> without imposing systemd defined policies to it (It might implement a
> set of common useful policies, but that's a different discussion).
>
> That's definitely not an unreasonable request, because he wants to
> implement his own set of policies which are not necessarily the same
> as those which are implemented by systemd.
>
> You are simply ignoring the fact, that Linux is used in other ways
> than those which you are focussed on. That's true for Google's way to
> manage its gazillion machines and that's equally true for the other
> end of the spectrum which is deep embedded or any other specialized
> use case. Just face it: running Linux on your laptop and on some RHT
> lab machines is covering about 1% of the use cases.
>
> Nevertheless you repeatedly claim, that systemd is the only way to
> deal with system startup and system management, is covering _ALL_ use
> cases and the interfaces you expose are sufficient.
>
> Did you ever work on specialized embedded or big data use cases? I
> really doubt that, but I might be wrong as usual.
>
> So I invite you to prove that you can beat an existing setup for an
> automotive use case with your magic systemd foo. I refund you fully,
> if you can beat the mark of a functional system less than 800ms after
> reset release on a 200MHz ARM machine. Functional is defined by the
> use case requirements and means:
>
> - Basic cgroups management working
> - GUI up and running
> - Main communication interface (CAN bus) up and running
>
> The rest of the system is starting up after that including a more
> complex cgroup management.
>
> According to your claim that systemd is covering everything and some
> more, this should take you a few hours. I grant you a full week to
> work on that.
>
> The use case Tim is talking about is different, but has similar
> constraints which are completely driven by his particular use case
> scenario. I'm sure, that Tim can persuade his management to setup a
> similar contest to prove your expertise on the other extreme of the
> Linux world.
>
> Before answering please think about the relevance of your statements
> "getting this all right isn't easy", "something like a scheduler",
> "users probably want ..."  and "stable /dev/disk/by-id/* symlinks" in
> those contexts.

I don't think anybody needs your money.

But it's sure an improvement over last time when you wanted to use a
"Kantholz" to make your statement.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] proc: make high precision system boot time available

2013-06-30 Thread Kay Sievers

On Sat, Jun 29, 2013 at 10:53 PM, Sami Kerola  wrote:

> BTW having a way to measure effect of suspend/resume could lead to a
> way to fix time time distortion.

> Perhaps there is better alternative to fix user space programs.
> Unfortunately I do not have either knowledge, or imagination, to come
> up with any. Fix hints, ideas, and other proposal are most welcome.

Stuff should just use:
  clock_gettime(CLOCK_BOOTTIME, &ts);

to get the time in better resolution, and with suspend time taken into account?

As a general note, the kmsg/printk() time in the raw kernel log is the
time of the CPU it runs on, especially during early init this time
might not be in sync across CPUs and stuff jumps event in the very
same stream on events from the kernel itself:
  
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/clock.c#n329

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Re: [RFC PATCH 0/5] Add a hash value for each line in /dev/kmsg

2013-07-29 Thread Kay Sievers

On Mon, Jul 29, 2013 at 1:54 PM, Hidehiro Kawai
 wrote:

> Also, I heard about the discussion
> at the kernel summit 2 years ago.  According to the article of LWN,
> it seems that Linus objected your approach (i.e. adding random bit as
> message ID).  Were there some agreements on this issue at the kernel summit?

No, there are no further discussions about this.

Pre-allocated, static, randomly created 128-bit IDs are just the
simplest and most robust option to identify messages. It's an
unmanaged namespace that needs no coordination, the IDs are always
stable, never change and are guaranteed to be unique. None of the
hashing-of-the strings solutions can provide that by default.

I would expect that over time, the automatic hashes would end up
becoming static numbers explicitly add to the messages anyway, because
changing the message text will change the hash, which is nothing we
really want to deal with. For that reason, I think that we can add the
ID right away, without any of the hashing; and do that only for a very
tiny fraction of the messages where such IDs make sense and add value.

Message IDs is how userspace logging works today; so from the
userspace side this would fit into the already existing
infrastructure, while possibly changing hashes which require another
type of translation catalog would not.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [systemd-devel] Slow firmware timeouts again (Re: [3.11 regression?] iwlwifi firmware takes two minutes to load)

2013-08-05 Thread Kay Sievers

On Fri, Aug 2, 2013 at 6:28 PM, Zbigniew Jędrzejewski-Szmek
 wrote:
> On Fri, Aug 02, 2013 at 09:04:44AM -0700, Andy Lutomirski wrote:
>> CONFIG_FW_LOADER_USER_HELPER=y
> Do you need this? Unsetting this should help.
>
> """This option enables / disables the invocation of user-helper
> (e.g. udev) for loading firmware files as a fallback after the
> direct file loading in kernel fails. The user-mode helper is
> no longer required unless you have a special firmware file that
> resides in a non-standard path."""

On recent systems, if the kernel configures
CONFIG_FW_LOADER_USER_HELPER=y and a firmware is not found by the
kernel, the kernel will issue a request which is ignored by userspace
and will block in that for 60 seconds.

Udev is no longer in the game of firmware loading, not even as a
fallback, it will just completely ignore all kernel firmware class
events.

The source code in udev to handle firmware requests is disabled by
default, currently still kept around for old kernels without the
in-kernel firmware loader, but it will be deleted in the near future.

Any issues with firmware timeouts should be addressed in the kernel by
disabling CONFIG_FW_LOADER_USER_HELPER or by removing the fallback
code from the in-kernel loader.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: udev: New default rule for autoloading kernel modules matching CPU modalias

2013-07-20 Thread Kay Sievers

On Sat, Jul 20, 2013 at 12:56 PM, Rafael J. Wysocki  wrote:

> After a recent change present in 3.11-rc1 there is a driver, called processor,
> that can be bound to the CPU devices whose sysfs directories are located under
> /sys/devices/system/cpu/.  A side effect of this is that, after the driver has
> been bound to those devices, the kernel adds DRIVER=processor to ENV for CPU
> uevents and they don't match the default rule for autoloading modules matching
> MODALIAS:
>
> DRIVER!="?*", ENV{MODALIAS}=="?*", IMPORT{builtin}="kmod load $env{MODALIAS}"
>
> any more.  However, there are some modules whose module aliases match specific
> CPU features through the modalias string and those modules should be loaded
> automatically if a compatible CPU is present.  Yet, with the processor driver
> bound to the CPU devices the above rule is not sufficient for that, so we need
> a new default udev rule allowing those modules to be autoloaded even if the
> CPU devices have drivers.
>
> On my test systems I added the following rule for that:
>
> ACTION="add", SUBSYSTEM=="cpu", ENV{MODALIAS}=="?*", IMPORT{builtin}="kmod 
> load $env{MODALIAS}"
>
> in a separate file, but I'm not a udev expert, so I guess it may be done in a
> better way.
>
> Can you please consider adding such a rule to the default set of udev rules?

The DRIVER!="?*" is an optimization which made a huge difference at
the time we called out to /sbin/modprobe from udev rules and all the
devices which already had a driver would not cause needless execution
of the rather expensive modprobe binary.

This obviously can't do the right thing for the completely generic,
not bound to a driver-state, CPU modaliases when they have a driver
now. :)

These days we load the entire kmod modalias database into the udev
process, and lookup what we need to load and load the module from
within the udev worker process. It could be, that the optimization is
not measurable on today's systems, if that's the case I'll remove it,
which would be the simplest and most future proof option. Otherwise
I'll add the CPU specific rule.

I'll find that out and let you know.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: udev: New default rule for autoloading kernel modules matching CPU modalias

2013-07-20 Thread Kay Sievers

On Sat, Jul 20, 2013 at 1:47 PM, Kay Sievers  wrote:
> On Sat, Jul 20, 2013 at 12:56 PM, Rafael J. Wysocki  wrote:
>
>> After a recent change present in 3.11-rc1 there is a driver, called 
>> processor,
>> that can be bound to the CPU devices whose sysfs directories are located 
>> under
>> /sys/devices/system/cpu/.  A side effect of this is that, after the driver 
>> has
>> been bound to those devices, the kernel adds DRIVER=processor to ENV for CPU
>> uevents and they don't match the default rule for autoloading modules 
>> matching
>> MODALIAS:
>>
>> DRIVER!="?*", ENV{MODALIAS}=="?*", IMPORT{builtin}="kmod load $env{MODALIAS}"
>>
>> any more.  However, there are some modules whose module aliases match 
>> specific
>> CPU features through the modalias string and those modules should be loaded
>> automatically if a compatible CPU is present.  Yet, with the processor driver
>> bound to the CPU devices the above rule is not sufficient for that, so we 
>> need
>> a new default udev rule allowing those modules to be autoloaded even if the
>> CPU devices have drivers.
>>
>> On my test systems I added the following rule for that:
>>
>> ACTION="add", SUBSYSTEM=="cpu", ENV{MODALIAS}=="?*", IMPORT{builtin}="kmod 
>> load $env{MODALIAS}"
>>
>> in a separate file, but I'm not a udev expert, so I guess it may be done in a
>> better way.
>>
>> Can you please consider adding such a rule to the default set of udev rules?
>
> The DRIVER!="?*" is an optimization which made a huge difference at
> the time we called out to /sbin/modprobe from udev rules and all the
> devices which already had a driver would not cause needless execution
> of the rather expensive modprobe binary.
>
> This obviously can't do the right thing for the completely generic,
> not bound to a driver-state, CPU modaliases when they have a driver
> now. :)
>
> These days we load the entire kmod modalias database into the udev
> process, and lookup what we need to load and load the module from
> within the udev worker process. It could be, that the optimization is
> not measurable on today's systems, if that's the case I'll remove it,
> which would be the simplest and most future proof option. Otherwise
> I'll add the CPU specific rule.
>
> I'll find that out and let you know.

I cannot see any measurable difference here that justifies that
optimization, so I removed it:
  
http://cgit.freedesktop.org/systemd/systemd/commit/?id=bf7f800f2b3e93ccd1229d4717166f3a4d3af72f

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/1] Drivers: base: memory: Export symbols for onlining memory blocks

2013-07-25 Thread Kay Sievers

On Thu, Jul 25, 2013 at 5:03 PM, Dave Hansen  wrote:
> On 07/25/2013 04:14 AM, KY Srinivasan wrote:
>> As promised, I have sent out the patches for (a) an implementation of an 
>> in-kernel API
>> for onlining  and a consumer for this API. While I don't know the exact 
>> reason why the
>> user mode code is delayed (under some low memory conditions), what is the 
>> harm in having
>> a mechanism to online memory that has been hot added without involving user 
>> space code.
>
> KY, your potential problem, not being able to online more memory because
> of a shortage of memory, is a serious one.
>
> However, this potential issue exists in long-standing code, and
> potentially affects all users of memory hotplug.  The problem has not
> been described in sufficient detail for the rest of the developers to
> tell if you are facing a new problem, or whether *any* proposed solution
> will help the problem you face.
>
> Your propsed solution changes the semantics of existing user/kernel
> interfaces, duplicates existing functionality, and adds code complexity
> to the kernel.

Complexity, well, it's just a bit of code which belongs in the kernel.
The mentioned unconditional hotplug loop through userspace is
absolutely pointless. Such defaults never belong in userspace tools if
they do not involve data that is only available in userspace and
something would make a decision about that. Saying "hello" to
userspace and usrspace has a hardcoded "yes" in return makes no sense
at all. The kernel can just go ahead and do its job, like it does for
all other devices it finds too.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH 0/5] Add a hash value for each line in /dev/kmsg

2013-07-26 Thread Kay Sievers

On Wed, Jul 3, 2013 at 3:46 AM, Hidehiro Kawai
 wrote:

> This patch series adds hash values of printk format strings into
> each line of /dev/kmsg outputs as follows:
>
> 6,154,325061,-,b7db707c@kernel/smp.c:554;Brought up 4 CPUs

/dev/kmsg is to a certain degree a kernel ABI. Having source code
locations in exported log records might cause people / userspace tools
to rely on these strings and expect stability here. The kernel though
cannot make any promises of its source code layout.

The hash is supposed to identify the content of a message, but what if
someone fixes the string? Maybe someone just fixes a one char typo,
the hash will change and the message will not be recognizable any
more.

As much as "automated" hash creation sounds simple; I really think
adding explicit "manually" created random message ids to the bunch of
messages that are interesting is the better option long-term. It
shouldn't be that many messages, most of the printk output is not
really useful for automated inspection or to trigger specific actions.

Messages ideally should not have any direct context to the code that
emits them, they should just identify the message and add a few
structured properties to the message.

This is how userspace identifies log messages and maintains abstract
descriptions of the specific messages to act when they appear:
  http://www.freedesktop.org/wiki/Software/systemd/catalog/

Connecting kernel log message texts and source code locations with
message identifiers looks quite dangerous, hard to keep stable if
things are evolving. It might cause serious problems over time.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] udev: fail firmware loading immediately if no search path is defined

2013-08-10 Thread Kay Sievers

On Sat, Aug 10, 2013 at 11:00 PM, Tom Gundersen  wrote:
> It would be simple enough to add an udev rule to just print 'ignoring
> firmware event' to the logs.

This and I guess:
  SUBSYSTEM=="firmware", ACTION=="add", ATTR{loading}="-1"
would also just cancel the request at the same time without any other
code needed.

The udev firmware support just a configure option, just like the
kernel has them. So distributions should enable it in udev and the
kernel if they need it.

We simply cannot coordinate the defaults of systemd and the kernel
because the rules of the kernel are different. The kernel does "keep
defaults like stuff has been in the past" and udev does "make default
what makes the most sense on current systems".

> We should really ignore the event though, and
> not cancel it. Not sure if this is something we want upstream (after all,
> there are plenty of situations where we don't warn if the recommendations in
> the README file are not followed), or if distros and whoever wants it should
> ship that themselves. I'll leave that for Kay to decide.

The proper fix is that userspace firmware should be disabled in the
kernel for new systems, and kept enabled only for old systems. Old
systems need to enable a new udev version to support firmware loading.

There are currently broken in-kernel mis-users of the firmware
interface that use the firmware interface but disable uevents, they
still pull-in the user interface of the firmware loader. If nobody
wants to fix them, the code for the common users of the firmware
loader should at least get rid of the userspace fallback to call out
to userspace. At that point the udev configure option would not matter
any more.

> Lastly, note that the plan is to drop all the firmware code from udev in the
> not too distant future, so it doesn't really maker much sense to add new
> functionality to that at this point.

Right, I think all is fine. It's something that people can control
with the kernel and udev configuration options. It's just that the
defaults of the kernel and udev don't match at the moment, because
they have different policies of setting default values.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 30/31] driver/base: implement subsys_virtual_register()

2013-03-02 Thread Kay Sievers

On Sat, Mar 2, 2013 at 7:17 PM, Greg Kroah-Hartman
 wrote:
> On Fri, Mar 01, 2013 at 07:24:21PM -0800, Tejun Heo wrote:
>> Kay tells me the most appropriate place to expose workqueues to
>> userland would be /sys/devices/virtual/workqueues/WQ_NAME which is
>> symlinked to /sys/bus/workqueue/devices/WQ_NAME and that we're lacking
>> a way to do that outside of driver core as virtual_device_parent()
>> isn't exported and there's no inteface to conveniently create a
>> virtual subsystem.
>
> I'm almost afraid to ask what you want to export to userspace for a
> workqueue that userspace would care about...
>
> If you create a subsystem, the devices will show up under the virtual
> "bus" if you don't give them a parent, so this patch shouldn't be
> needed, unless you are abusing the driver model.  What am I missing
> here?

Unfortunately, the parent == NULL --> /sys/devices/virtual//
we have only implemented for classes, and not for buses. We should fix
that.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] fs: Limit sys_mount to only request filesystem modules.

2013-03-05 Thread Kay Sievers

On Mon, Mar 4, 2013 at 8:51 AM, Eric W. Biederman  wrote:
>
> Modify the request_module to prefix the file system type with "fs-"
> and add aliases to all of the filesystems that can be built as modules
> to match.
>
> A common practice is to build all of the kernel code and leave code
> that is not commonly needed as modules, with the result that many
> users are exposed to any bug anywhere in the kernel.
>
> Looking for filesystems with a fs- prefix limits the pool of possible
> modules that can be loaded by mount to just filesystems trivially
> making things safer with no real cost.

'-' is a commonly used part of a module name, and does not mix well
with ramdom user provided names.

We usually use ':' as the prefix separator for modaliases, when
user-supplied strings are prefixed with the subsystem.

I think it would be nicer to change that, and I'm sure some creative
guy calls the next filesystem of the month fs-$something :)

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] driver core: add uid and gid to devtmpfs

2013-04-07 Thread Kay Sievers

On Sat, Apr 6, 2013 at 7:58 PM, Greg KH  wrote:
> On Sat, Apr 06, 2013 at 06:45:12PM +0100, Al Viro wrote:
>> On Sat, Apr 06, 2013 at 10:26:12AM -0700, Greg KH wrote:
>>
>> > Why not?  "closed" systems, like Android and other embedded systems,
>> > have "assigned" uid and gid values that never change.  Right now they
>> > have to have a horrible shell-script to set these values in devtmpfs
>> > when the device shows up to the needed numbers.  This tiny patch gets
>> > rid of that shell script entirely, allowing them to specify it from the
>> > kernel as needed.
>>
>> What's to stop them from using static /dev?  Has an extra benefit of
>> getting rid of devtmpfs shite completely...
>
> True, it would, but they like using devtmpfs :)
>
> This change also allows systems that have "control" devices to properly
> be able to pass in the uid for the device they are creating, like rawctl
> (which I know is "obsolete"), and probably dm and lvm.  I thought loop
> devices might also want this, as they can now be created by normal
> users, but I don't think that's needed for them.

It is generally useful to be able to control the uid/gid of
userspace-initiated device nodes, instead of racy post-adjusting this
setting from udev rules with an unpredictable long window between the
request and the adjustment.

The created device node can inherit the ownership of the calling
process, in a similar way like we do for devpts.

We have the same for the mode already, and want to be able to do the
same for the other permissions properties.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] firmware: Fix usermodehelper deadlock at shutdown

2013-05-08 Thread Kay Sievers

On Wed, May 8, 2013 at 6:26 PM, Takashi Iwai  wrote:
> At Thu, 9 May 2013 00:07:17 +0800,
> Ming Lei wrote:

>> On Wed, May 8, 2013 at 2:56 PM, Takashi Iwai  wrote:
>> > Hi,
>> >
>> > this is a series of patches for the issue we faced in the firmware
>> > loader code during debugging the problem with dell_rbu driver with
>> > 3.9 kernel.
>> >
>> > The original problem was that the shutdown gets stuck when DELL BIOS
>> > update is performed.  This turned out to be a problem in the firmware
>> > loader.  Although the reason of dell_rbu driver breakage is still
>>
>> Sorry, from these patchset, I can't see why it is a problem in firmware.
>>
>> > unclear, we should fix the firmware loader side, at least, not to
>> > stall during shutdown.
>>
>> Firstly you need to describe what/why is the stall? In fact, firmware
>> loading can't stall forever and it will timeout, but the current 60sec
>> timeout might be too long.
>
> The timeout check is activated only when uevent flag is set, and
> dell_rbu driver doesn't set it explicitly (because it's not supposed
> to be handled via udev or whatever).

These use the firmware loader not in the way the interface was intended:
  drivers/misc/lattice-ecp3-config.c
  drivers/firmware/dell_rbu.c

They just use the mechanism without any of the usual userspace setup.
It's really a nasty hack to hijack the interface that way.

The commit 6e3eaab02028c4087a92711b20abb9e72cc803a7 is a pretty broken
idea to start with. If something triggers uevents during runtime which
is not uncommon, these on-demand silently created firmware devices
would get really confused and race against the udev firmware loader
which cancels the events.

As if the userspace firmware loading in general wasn't a bad enough
idea already. :)

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

CONFIG_RTC_HCTOSYS lost on x86 with ALWAYS_USE_PERSISTENT_CLOCK changes?

2013-04-23 Thread Kay Sievers

Hey,

what's the intention of:
  e90c83f757fffdacec8b3c5eee5617dcc038338f ?
  x86: Select HAS_PERSISTENT_CLOCK on x86

It unconditionally sets:
  HAS_PERSISTENT_CLOCK
but:
  RTC_SYSTOHC
got a depends on !HAS_PERSISTENT_CLOCK

This makes it impossible to sync the system time from the RTC on x86.
What's going on here?

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: CONFIG_RTC_HCTOSYS lost on x86 with ALWAYS_USE_PERSISTENT_CLOCK changes?

2013-04-23 Thread Kay Sievers

On Wed, Apr 24, 2013 at 4:43 AM, John Stultz  wrote:
> On 04/23/2013 06:34 PM, Kay Sievers wrote:
>>
>> Hey,
>>
>> what's the intention of:
>>e90c83f757fffdacec8b3c5eee5617dcc038338f ?
>>x86: Select HAS_PERSISTENT_CLOCK on x86
>>
>> It unconditionally sets:
>>HAS_PERSISTENT_CLOCK
>> but:
>>RTC_SYSTOHC
>> got a depends on !HAS_PERSISTENT_CLOCK
>>
>> This makes it impossible to sync the system time from the RTC on x86.
>> What's going on here?
>
>
> So I suspect just some confusion, but let me know if thats wrong and you're
> actually seeing an issue.
>
> SYSTOHC is what *sets the RTC* to the system time when we're synced with
> NTP.
>
> HCTOSYS is what sets the system time from the RTC.

Right, and RTC_HCTOSYS is not NTP related. It just reads the time from
the RTC_HCTOSYS_DEVICE at bootup so we do not boot in 1970 time mode.
We need that it in all cases, at every bootup on x86. But it's no
longer there with the above commits. :)

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: CONFIG_RTC_HCTOSYS lost on x86 with ALWAYS_USE_PERSISTENT_CLOCK changes?

2013-04-23 Thread Kay Sievers

On Wed, Apr 24, 2013 at 5:19 AM, John Stultz  wrote:
> On 04/23/2013 08:05 PM, Kay Sievers wrote:
>>
>> On Wed, Apr 24, 2013 at 4:43 AM, John Stultz 
>> wrote:
>>>
>>> On 04/23/2013 06:34 PM, Kay Sievers wrote:
>>>>
>>>> Hey,
>>>>
>>>> what's the intention of:
>>>> e90c83f757fffdacec8b3c5eee5617dcc038338f ?
>>>> x86: Select HAS_PERSISTENT_CLOCK on x86
>>>>
>>>> It unconditionally sets:
>>>> HAS_PERSISTENT_CLOCK
>>>> but:
>>>> RTC_SYSTOHC
>>>> got a depends on !HAS_PERSISTENT_CLOCK
>>>>
>>>> This makes it impossible to sync the system time from the RTC on x86.
>>>> What's going on here?
>>>
>>>
>>> So I suspect just some confusion, but let me know if thats wrong and
>>> you're
>>> actually seeing an issue.
>>>
>>> SYSTOHC is what *sets the RTC* to the system time when we're synced with
>>> NTP.
>>>
>>> HCTOSYS is what sets the system time from the RTC.
>>
>> Right, and RTC_HCTOSYS is not NTP related. It just reads the time from
>> the RTC_HCTOSYS_DEVICE at bootup so we do not boot in 1970 time mode.
>> We need that it in all cases, at every bootup on x86. But it's no
>> longer there with the above commits. :)
>
> On x86 the persistent_clock() is backed by the CMOS/EFI/kvm-wall/xen/vrtc
> clock (all via x86_platform.get_wallclock) should be present and we'll
> initialize the time in timekeeping_init() there.
>
> Its only systems where there isn't a persistent_clock is where the RTC layer
> and the HCTOSYS is helpful.
>
> Again, if you're having a problem where an x86 system isn't getting its time
> initialized correctly, please let me know the details of the system.

Until the above commits we always needed:
  CONFIG_RTC_HCTOSYS=y
  CONFIG_RTC_HCTOSYS_DEVICE="rtc0"
to get the system time correctly initialized at bootup on x86.

These options are gone now and cannot be selected anymore. You are
saying that this is all fine, that they are gone, but all initial
clock syncing should still work?

Also:
  $ cat /sys/class/rtc/rtc0/hctosys
  0
always carried "1", and this now breaks setups which expect an
automatically created symlink /dev/rtc to the actual "system rtc".

There was also always a line in dmesg that told the rtc_cmos time it
wrote to the system clock. This is also gone?

I haven't looked what goes wrong, I expected the make(1) errors with
"time in the future" after bootup before the network is up, which I
see since a week or two, to be a fallout of that.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: CONFIG_RTC_HCTOSYS lost on x86 with ALWAYS_USE_PERSISTENT_CLOCK changes?

2013-04-23 Thread Kay Sievers

On Wed, Apr 24, 2013 at 5:33 AM, Kay Sievers  wrote:

> Also:
>   $ cat /sys/class/rtc/rtc0/hctosys
>   0
> always carried "1", and this now breaks setups which expect an
> automatically created symlink /dev/rtc to the actual "system rtc".

We used to do this in upstream standard udev rules:
  SUBSYSTEM=="rtc", ATTR{hctosys}=="1", MODE="0644"
  
http://cgit.freedesktop.org/systemd/systemd/tree/rules/50-udev-default.rules#n18

If that information is expected to be gone now, we need to update some
tools. Whats' the proper way now to find the "system rtc" to use?

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: CONFIG_RTC_HCTOSYS lost on x86 with ALWAYS_USE_PERSISTENT_CLOCK changes?

2013-04-24 Thread Kay Sievers

On Wed, Apr 24, 2013 at 6:07 PM, John Stultz  wrote:

> So summarizing the above, because as much as I'm aware, its always been
> redundant and unnecessary on x86.  Thus being able at build time to mark it
> as unnecessary was attractive, since it reduced the code paths running at
> suspend/resume.
>
> That said, Kay's concerns about userland implications (basically the
> userland side effects of SYSTOHC being enabled) give me pause, so I may
> revert the HAS_PERSISTENT_CLOCK on x86 change.

Thanks a lot for all the missing details!

No, I think that all makes too much sense to revert it. Let's just
find a way to solve it properly. I don't think it is of any pressing
importance to keep the old behaviour, if we can still provide the
functionality today.

I'll continue replying in the later mail ...

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: CONFIG_RTC_HCTOSYS lost on x86 with ALWAYS_USE_PERSISTENT_CLOCK changes?

2013-04-24 Thread Kay Sievers

On Wed, Apr 24, 2013 at 6:30 PM, John Stultz  wrote:

>> Until the above commits we always needed:
>>CONFIG_RTC_HCTOSYS=y
>>CONFIG_RTC_HCTOSYS_DEVICE="rtc0"
>> to get the system time correctly initialized at bootup on x86.

> So... always needed to get system time correctly initialized? I'm not sure I
> agree with this. On non-x86 (mostly embedded) platforms that do not have
> persistent_clock support, yes, the above is needed.

Yeah, right, that's an "always" like the "forever" in "we support that
for forever" :)

> But I'm unaware of the above actually being necessary on x86, as its always
> had persistent_clock support.

Maybe it goes back longer, I remember that we needed to run hwclock in
userspace otherwise we had the 1970 state.

Here is the Fedora bug from 2009 to enable it:
  https://bugzilla.redhat.com/show_bug.cgi?id=489494

>> These options are gone now and cannot be selected anymore. You are
>> saying that this is all fine, that they are gone, but all initial
>> clock syncing should still work?
>
> Yes, we're just removing a duplicative initialization of time, and compiling
> out code in the suspend/resume path that would never trigger when
> persistent_clocks were present.

I see, makes sense.

>> Also:
>>$ cat /sys/class/rtc/rtc0/hctosys
>>0
>> always carried "1", and this now breaks setups which expect an
>> automatically created symlink /dev/rtc to the actual "system rtc".
>
>
> Sigh. So just turning off HCTOSYS on those systems causes them to break?

Well, the symlink is no longer there, which is visible. People asked
where it is gone now. That's the "breakage" which might not deserve
that word, if nothing really breaks that way. Stuff we looked at so
far, falls back to /dev/rtc0 which covers that.

> That is sort of obnoxiously fragile.  I've always been somewhat skeptical of
> the multi-rtc configs - as they're all the "system's" RTCs after all. 99%
> probably only have one rtc device, so checking the hctosys in that case is a
> little silly.

Yeah, ARM is as a mess, they often have rtc1 as the "system rtc", that
is why all this symlink game was "invented".

> But the terribly annoying interface breakage when /dev/rtc went to /dev/rtcN
> with the generic rtc layer landing shouldn't have happened, so I won't
> begrudge too much the userland hacks needed to fix that up.

Right.

> So I'll send Thomas a revert for the HCTOSYS optimization. In the kernel
> we'll still avoid using HCTOSYS when the persistent_clock is there, but at
> least userland will still have some /sys/class/rtc/rtcN that has the
> "offical" flag.

So in case you really revert it, x86 should not enable any of that RTC
stuff by default, right?

>> There was also always a line in dmesg that told the rtc_cmos time it
>> wrote to the system clock. This is also gone?
>
> More worrisome, I'll turn the question around: is that an expected interface
> never to break?

No, sure not. I was just noticing that, when looking what was going
on, and I couldn't make sense out of it before you explained the
details.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2] kmsg: Honor dmesg_restrict sysctl on /dev/kmsg

2013-04-24 Thread Kay Sievers

On Tue, Apr 9, 2013 at 6:33 PM, Kees Cook  wrote:
> On Tue, Apr 9, 2013 at 8:48 AM, Josh Boyer  wrote:
>> The dmesg_restrict sysctl currently covers the syslog method for access
>> dmesg, however /dev/kmsg isn't covered by the same protections.  Most
>> people haven't noticed because util-linux dmesg(1) defaults to using the
>> syslog method for access in older versions.  With util-linux dmesg(1)
>> defaults to reading directly from /dev/kmsg.
>>
>> Fix this by reworking all of the access methods to use the
>> check_syslog_permissions function and adding checks to devkmsg_open and
>> devkmsg_read.
>>
>> This fixes https://bugzilla.redhat.com/show_bug.cgi?id=903192
>>
>> Reported-by: Christian Kujau 
>> CC: sta...@vger.kernel.org
>> Signed-off-by: Eric Paris 
>> Signed-off-by: Josh Boyer 
>
> Thanks!
>
> Acked-by: Kees Cook 

If that's the version currently in Fedora, we just cannot do this.
   https://bugzilla.redhat.com/show_bug.cgi?id=952655

/dev/kmsg is supposed, and was added, to be a sane alternative to
syslog(). It is already used in dmesg(1) which is now broken with this
patch.

The access rules for /dev/kmsg should follow the access rules of
syslog(), and not be any stricter.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitons

2013-04-24 Thread Kay Sievers

On Wed, Apr 24, 2013 at 8:32 PM, John Stultz  wrote:
> Kay Sievers noted that the ALWAYS_USE_PERSISTENT_CLOCK config,
> which enables some minor compile time optimization to avoid
> uncessary code in mostly the suspend/resume path could cause
> problems for userland.
>
> In particular, the dependency for RTC_HCTOSYS on
> !ALWAYS_USE_PERSISTENT_CLOCK, which avoids setting the time
> twice and simplifies suspend/resume, has the side effect
> of causing the /sys/class/rtc/rtcN/hctosys flag to always be
> zero, and this flag is commonly used by udev to setup the
> /dev/rtc symlink to /dev/rtcN, which can cause pain for
> older applications.

FWIW, in the light of the original change, I've just removed the
/dev/rtc creation from the default udev rules now, so that thing will
be phased out in the future.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitons

2013-04-24 Thread Kay Sievers

On Wed, Apr 24, 2013 at 8:55 PM, John Stultz  wrote:
> On 04/24/2013 11:41 AM, Kay Sievers wrote:
>>
>> On Wed, Apr 24, 2013 at 8:32 PM, John Stultz 
>> wrote:
>>>
>>> Kay Sievers noted that the ALWAYS_USE_PERSISTENT_CLOCK config,
>>> which enables some minor compile time optimization to avoid
>>> uncessary code in mostly the suspend/resume path could cause
>>> problems for userland.
>>>
>>> In particular, the dependency for RTC_HCTOSYS on
>>> !ALWAYS_USE_PERSISTENT_CLOCK, which avoids setting the time
>>> twice and simplifies suspend/resume, has the side effect
>>> of causing the /sys/class/rtc/rtcN/hctosys flag to always be
>>> zero, and this flag is commonly used by udev to setup the
>>> /dev/rtc symlink to /dev/rtcN, which can cause pain for
>>> older applications.
>>
>> FWIW, in the light of the original change, I've just removed the
>> /dev/rtc creation from the default udev rules now, so that thing will
>> be phased out in the future.
>
> Is that actually wanted? What happens to applications that use /dev/rtc?
>
> I think setting up the /dev/rtc link is important. Its just that setting it
> up exclusively by the hctosys flag is maybe more fragile then we'd like.
> Instead the hctosys flag maybe should only be used as a hint if there is
> more then one RTC available.

So far we only created the symlink for an rtc with the hctosys flag
set. It was added as a workaround for ARM, which sometimes has
multiple RTCs. But there is now also logic in udev/systemd anyway to
search for the system's rtc, which does not rely on the symlink. Other
commonly used tools we checked just use /dev/rtc0 directly.

As mentioned in the mail yesterday, I expected that
CONFIG_RTC_HCTOSYS=y was also needed on x86. The current udev logic
would need updating anyway, if we disable CONFIG_RTC_HCTOSYS=y now. So
let's just find out what's really needed here and add it back properly
if something really needs it.

While we are at it, the interface to read and set the persistent clock
from userspace, the clock the kernel internally uses, is still to just
use the /dev/rtc0 device with the ioctls? There is no other native
kernel timer interface or something else for that, right?

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitons

2013-04-24 Thread Kay Sievers

On Wed, Apr 24, 2013 at 8:32 PM, John Stultz  wrote:
> Kay Sievers noted that the ALWAYS_USE_PERSISTENT_CLOCK config,
> which enables some minor compile time optimization to avoid
> uncessary code in mostly the suspend/resume path could cause
> problems for userland.
>
> In particular, the dependency for RTC_HCTOSYS on
> !ALWAYS_USE_PERSISTENT_CLOCK, which avoids setting the time
> twice and simplifies suspend/resume, has the side effect
> of causing the /sys/class/rtc/rtcN/hctosys flag to always be
> zero, and this flag is commonly used by udev to setup the
> /dev/rtc symlink to /dev/rtcN, which can cause pain for
> older applications.

An alternative to reverting this could be to set that flag
unconditionally on the rtc that matches the persistent clock the
kernel uses internally?

I mean regardless of the pretty weird config option
CONFIG_RTC_HCTOSYS_DEVICE, which internally just does a strcmp() with
the given device name when the flag is queried. :)

Can't we just have the same interface and your original optimization
on x86, even without CONFIG_RTC_HCTOSYS_DEVICE enabled at all?

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2] kmsg: Honor dmesg_restrict sysctl on /dev/kmsg

2013-04-24 Thread Kay Sievers

On Wed, Apr 24, 2013 at 11:51 PM, Josh Boyer  wrote:

>> In the daemon case, it's nice to be able to drop privileges after
>> setting up resources. The past was open /proc/kmsg with CAP_SYS_ADMIN,
>> then drop CAP_SYS_ADMIN and keep reading. Then later CAP_SYS_LOG was
>> introduced. So if a daemon switched from /proc/kmsg to /dev/kmsg they
>> wouldn't be able to drop the capability. But, it's much saner to carry
>> CAP_SYS_LOG than CAP_SYS_ADMIN on a long-running daemon.
>
> I have no idea on this front.  I'll let Kay speak to that.

The original code checks once at open() only, which would allow to do
do all that privilege dropping. It is how I would expect it to work,
instead of checking the permissions at every read().

> On my
> currently running Fedora 18 system, I actually have systemd-journald
> using /dev/kmsg

That's the recent structured logging stuff.

> and rsyslog using /proc/kmsg.

That's the old plain text syslog daemon stuff.

> Why I have both, I have no friggin idea.

Nobody removed the old syslog dameon by default from the distro. If
you don't want or need the plain text files in /var/log/ anymore, just
uninstall it and use journalctl(1) to see the system logs from then
on.

>> Is there an intention to use /dev/kmsg for the syslog management daemon?

Not that I know.

> Maybe?  I mean, systemd-journald seems to be using it for something.
> Kay?

I doubt that old syslog implementations will be ported to a new kernel
interface. They work just fine the way they are, and the structured
data that is additionally put out on the new interface, they cannot
really store away anyway in their plain text files, so they do not
gain anything really.

What we can probably expect though, is that in the future the default
systems will not install any old syslog daemon, which uses that
interface anymore.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: CONFIG_RTC_HCTOSYS lost on x86 with ALWAYS_USE_PERSISTENT_CLOCK changes?

2013-04-25 Thread Kay Sievers

On Thu, Apr 25, 2013 at 9:11 AM, Alexander Holler  wrote:

> Hmm, I thought RTC_SYSTOHC was there to update the used RTC clock with the
> time from NTP (and liked that).

That seems to have the nice self-explaining name CONFIG_GENERIC_CMOS_UPDATE. :)

> Therefor I don't understand why it is
> redundant and unnecessary on x86.

Because the x86 native RTC/cmos is updated with platform code, not
generic rtc code:
  arch/x86/kernel/rtc.c

> Of course, most systems do have something
> in userspace to set the RTC on shutdown, so it isn't really needed.

That is gone on most systems today. Systems without NTP or something
else running have no clue about the time, and should not touch the
hardware clock with a boot cycle. Only if a reliable time source like
NTP is available, it should update the hardware clock accordingly.

> Anyway, thanks a lot for the great overview.

Yeah, thanks John, from my side too.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: CONFIG_RTC_HCTOSYS lost on x86 with ALWAYS_USE_PERSISTENT_CLOCK changes?

2013-04-25 Thread Kay Sievers

On Thu, Apr 25, 2013 at 8:33 PM, Jason Gunthorpe
 wrote:
> So, my conclusion is nobody with a RTC looking for space savings, will
> disable CONFIG_RTC, which means we can safely rely on
> CONFIG_RTC_SYSTOHC to do this work. To that end, I would encourage
> everyone who sets CONFIG_GENERIC_CMOS_UPDATE to also set
> CONFIG_RTC_SYSTOHC - that will let update_persistent_clock to be
> ripped out over time without impacting users.

John's original patch forcefully disabled CONFIG_RTC_SYSTOHC on x86,
which is quite the opposite of what you recommend here. :)

Could you guys both sort that out and give an idea what the
recommended setup should look like today, ignoring all space saving
and possible hctosys API changes caused by this. Should
CONFIG_RTC_SYSTOHC be enabled or not?

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitons

2013-04-25 Thread Kay Sievers

On Wed, Apr 24, 2013 at 8:55 PM, John Stultz  wrote:

>> FWIW, in the light of the original change, I've just removed the
>> /dev/rtc creation from the default udev rules now, so that thing will
>> be phased out in the future.
>
> Is that actually wanted? What happens to applications that use /dev/rtc?
>
> I think setting up the /dev/rtc link is important. Its just that setting it
> up exclusively by the hctosys flag is maybe more fragile then we'd like.
> Instead the hctosys flag maybe should only be used as a hint if there is
> more then one RTC available.

Ok, convinced.

I've changed the udev rules now to first "search" for the rtc with
"hctosys" flag set, and if none is found, just fall back to /dev/rtc0.

It should work reliably on most boxes, and still do the right thing in
most cases if none of the rtcs has that flag.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 30/31] driver/base: implement subsys_virtual_register()

2013-03-07 Thread Kay Sievers

On Fri, Mar 8, 2013 at 12:31 AM, Greg Kroah-Hartman
 wrote:
> On Tue, Mar 05, 2013 at 12:43:27PM -0800, Tejun Heo wrote:
>> On Sun, Mar 03, 2013 at 07:42:31AM +0100, Kay Sievers wrote:
>> > On Sat, Mar 2, 2013 at 7:17 PM, Greg Kroah-Hartman
>> >  wrote:
>> > > On Fri, Mar 01, 2013 at 07:24:21PM -0800, Tejun Heo wrote:
>> > >> Kay tells me the most appropriate place to expose workqueues to
>> > >> userland would be /sys/devices/virtual/workqueues/WQ_NAME which is
>> > >> symlinked to /sys/bus/workqueue/devices/WQ_NAME and that we're lacking
>> > >> a way to do that outside of driver core as virtual_device_parent()
>> > >> isn't exported and there's no inteface to conveniently create a
>> > >> virtual subsystem.
>> > >
>> > > I'm almost afraid to ask what you want to export to userspace for a
>> > > workqueue that userspace would care about...
>> > >
>> > > If you create a subsystem, the devices will show up under the virtual
>> > > "bus" if you don't give them a parent, so this patch shouldn't be
>> > > needed, unless you are abusing the driver model.  What am I missing
>> > > here?
>> >
>> > Unfortunately, the parent == NULL --> /sys/devices/virtual//
>> > we have only implemented for classes, and not for buses. We should fix
>> > that.
>>
>> Greg, how should I proceed on this?  As I wrote before, I don't really
>> care about where or how.  As long as I can make workqueues visible to
>> userland, I'm happy.
>
> Sorry for the delay, I'm at a conference all this week, and haven't had
> much time to think about this.
>
> If Kay says this is ok for now, that's good enough for me.

Yes, it looks fine to me. If we provide the unified handling of
classes and buses some day, this can probably go away, but until that
it looks fine and is straight forward to do it that way,

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 30/31] driver/base: implement subsys_virtual_register()

2013-03-10 Thread Kay Sievers

On Sun, Mar 10, 2013 at 5:45 PM, Greg Kroah-Hartman
 wrote:
> On Sun, Mar 10, 2013 at 04:57:02AM -0700, Tejun Heo wrote:
>> Hey, guys.
>>
>> On Fri, Mar 08, 2013 at 01:04:25AM +0100, Kay Sievers wrote:
>> > > Sorry for the delay, I'm at a conference all this week, and haven't had
>> > > much time to think about this.
>> > >
>> > > If Kay says this is ok for now, that's good enough for me.
>> >
>> > Yes, it looks fine to me. If we provide the unified handling of
>> > classes and buses some day, this can probably go away, but until that
>> > it looks fine and is straight forward to do it that way,
>>
>> How should this be routed?  I can take it but Kay needs it too so
>> workqueue tree probably isn't the best fit although I can set up a
>> separate branch if needed.
>
> What patch set does Kay need it for?  I have no objection for you to
> take it through the workqueue tree:

The dbus bus has the same issues and needs the devices put under
virtual/ and not the devices/ root.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 30/31] driver/base: implement subsys_virtual_register()

2013-03-10 Thread Kay Sievers

On Sun, Mar 10, 2013 at 6:24 PM, Greg Kroah-Hartman
 wrote:
> On Sun, Mar 10, 2013 at 06:00:18PM +0100, Kay Sievers wrote:
>> On Sun, Mar 10, 2013 at 5:45 PM, Greg Kroah-Hartman
>>  wrote:
>> > On Sun, Mar 10, 2013 at 04:57:02AM -0700, Tejun Heo wrote:
>> >> Hey, guys.
>> >>
>> >> On Fri, Mar 08, 2013 at 01:04:25AM +0100, Kay Sievers wrote:
>> >> > > Sorry for the delay, I'm at a conference all this week, and haven't 
>> >> > > had
>> >> > > much time to think about this.
>> >> > >
>> >> > > If Kay says this is ok for now, that's good enough for me.
>> >> >
>> >> > Yes, it looks fine to me. If we provide the unified handling of
>> >> > classes and buses some day, this can probably go away, but until that
>> >> > it looks fine and is straight forward to do it that way,
>> >>
>> >> How should this be routed?  I can take it but Kay needs it too so
>> >> workqueue tree probably isn't the best fit although I can set up a
>> >> separate branch if needed.
>> >
>> > What patch set does Kay need it for?  I have no objection for you to
>> > take it through the workqueue tree:
>>
>> The dbus bus has the same issues and needs the devices put under
>> virtual/ and not the devices/ root.
>
> Yes, but I can keep Tejun's patch in my local queue for now, dbus is
> going to not make 3.10, right?

No, sure not. It's just something we will need there too, but there is
no hurry, it's only a cosmetic issue anyway and nothing that matters
functionality-wise.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] printk: add option to print cpu id

2012-08-02 Thread Kay Sievers

On Thu, Aug 2, 2012 at 9:46 PM, Vikram Pandita  wrote:
> From: Vikram Pandita 
>
> Introduce config option to enable CPU id reporting for printk() calls.
>
> Its sometimes very useful to have printk also print the CPU Identifier
> that executed the call. This has helped to debug various SMP issues on 
> shipping
> products.
>
> Known limitation is, if the system gets preempted between function call and
> actual printk, the reported cpu-id might not be accurate. But most of the
> times its seen to give a good feel of how the N cpu's in the system are
> getting loaded.
>
> Signed-off-by: Vikram Pandita 
> Cc: Mike Turquette 
> Cc: Vimarsh Zutshi 
> ---
>  kernel/printk.c   |   20 
>  lib/Kconfig.debug |   13 +
>  2 files changed, 33 insertions(+)
>
> diff --git a/kernel/printk.c b/kernel/printk.c
> index 6a76ab9..50feb82 100644
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
> @@ -855,6 +855,25 @@ static size_t print_time(u64 ts, char *buf)
>(unsigned long)ts, rem_nsec / 1000);
>  }
>
> +#if defined(CONFIG_PRINTK_CPUID)
> +static bool printk_cpuid = 1;
> +#else
> +static bool printk_cpuid;
> +#endif
> +module_param_named(cpuid, printk_cpuid, bool, S_IRUGO | S_IWUSR);
> +
> +static size_t print_cpuid(char *buf)
> +{
> +
> +   if (!printk_cpuid)
> +   return 0;
> +
> +   if (!buf)
> +   return 4;
> +
> +   return sprintf(buf, "[%1d] ", smp_processor_id());
> +}
> +
>  static size_t print_prefix(const struct log *msg, bool syslog, char *buf)
>  {
> size_t len = 0;
> @@ -874,6 +893,7 @@ static size_t print_prefix(const struct log *msg, bool 
> syslog, char *buf)
> }
> }
>
> +   len += print_cpuid(buf ? buf + len : NULL);
> len += print_time(msg->ts_nsec, buf ? buf + len : NULL);
> return len;
>  }

How is that supposed to be useful?

The prefix is added while exporting data from the kmsg buffer, which
is just the CPU that *reads* the data from the buffer, not the one
that has *written* the data it into it.

Do I miss anything here?

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] printk: add option to print cpu id

2012-08-02 Thread Kay Sievers

On Fri, Aug 3, 2012 at 1:50 AM, Pandita, Vikram  wrote:
> On Thu, Aug 2, 2012 at 1:08 PM, Kay Sievers  wrote:

>> How is that supposed to be useful?
>>
>> The prefix is added while exporting data from the kmsg buffer, which
>> is just the CPU that *reads* the data from the buffer, not the one
>> that has *written* the data it into it.
>
> I don't think so.
> I can see the backtrace of the printk() call looks like follows:
>
> print_cpuid
>  print_prefix
>   msg_print_text
>console_unlock
> vprint_emit
>  printk
>
> Now this is a synchronous path, where in the buffer is getting filled
> with cpuid and timer info from the printk() calling context.
> So you should get the right CPU id with the trace - with the exception
> that i pointed out earlier for preemption.
>
>>
>> Do I miss anything here?
>
> This is my understanding of the printk framework.
> At least the print_time and print_cpuid seems to be happening
> synchronously wrt printk

Printk is a store-and-forward-model, and it always was. There is no
guarantee at all, that the data is immediately flushed to the console
by the same CPU, it just happens to be in most cases. It's pretty
common though, that a different task is doing that work when it gets
the console lock, and that is not a matter of preemption, it's normal
and expected operation. The data which CPU has emitted the text is
just not available. It would need to be stored in the records, for
this to work.

Your patch just prints the CPU that writes to the console, not
necessarily the one that has stored the data. I think the second one
is which is what you are looking for, but that is not what the patch
does.

Also dmesg and syslog uses the same logic and would put-out all
entirely wrong CPU information with it, because the original
information is long gone.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2] printk: add option to print cpu id

2012-08-03 Thread Kay Sievers

On Fri, Aug 3, 2012 at 11:36 AM, Pandita, Vikram  wrote:
> On Fri, Aug 3, 2012 at 2:32 AM, Venu Byravarasu  
> wrote:

>> As having Macro locally in the file of interest would serve the purpose,
>> Why to change the printk code?
>
> As stated, the logic followed is exactly similar to well proven and
> approved way to handle printk time stamp: CONFIG_PRINTK_TIME
> Its an overhead on the system that enables the config option:
> CONFIG_PRINTK_CPUID
> Otherwise the system remains as is.
>
> To gain insight on SMP system logging behavior, the price to pay is
> the extra 4 chars per printk line,
> just like printk-time adds 15 chars to each line. Both options can be
> disabled as desired.
>
> So i am not sure what kind of option you are proposing?
> Do u imply PRINTK_TIME is not right then?

It's 8 bytes per message for storing the timestamp in the records.
There is never 15 bytes storage space needed, the prefix is
constructed on-the-fly only while exporting the data.

The CPU-ID would need at least two additional bytes (2^16 CPUS) in
every record, unless it's a compile-time option. I can't tell, if
everybody wants to pay the storage space for the the CPU-ID feature.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2] printk: add option to print cpu id

2012-08-03 Thread Kay Sievers

On Fri, Aug 3, 2012 at 11:43 AM, Nikunj A Dadhania
 wrote:
> On Fri, 3 Aug 2012 02:16:18 -0700, Vikram Pandita  
> wrote:

>> +static size_t print_cpuid(u8 cpuid, char *buf)
>> +{
>> +
>> + if (!printk_cpuid)
>> + return 0;
>> +
>> + if (!buf)
>> + return 4;
>> +

> Firstly, why this magic number?
> Secondly, if buf is NULL, why should you increment?

The !buffer logic is used when we only calculate the size of the
output buffer, but do not print to it. It's just an optimization. And
yes, the number returned should match the number of chars that would
have been printed, so this does not look right.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2] printk: add option to print cpu id

2012-08-03 Thread Kay Sievers

On Fri, Aug 3, 2012 at 11:56 AM, Pandita, Vikram  wrote:
> On Fri, Aug 3, 2012 at 2:48 AM, Kay Sievers  wrote:
>> On Fri, Aug 3, 2012 at 11:36 AM, Pandita, Vikram  
>> wrote:
>>> On Fri, Aug 3, 2012 at 2:32 AM, Venu Byravarasu  
>>> wrote:
>>
>>>> As having Macro locally in the file of interest would serve the purpose,
>>>> Why to change the printk code?
>>>
>>> As stated, the logic followed is exactly similar to well proven and
>>> approved way to handle printk time stamp: CONFIG_PRINTK_TIME
>>> Its an overhead on the system that enables the config option:
>>> CONFIG_PRINTK_CPUID
>>> Otherwise the system remains as is.
>>>
>>> To gain insight on SMP system logging behavior, the price to pay is
>>> the extra 4 chars per printk line,
>>> just like printk-time adds 15 chars to each line. Both options can be
>>> disabled as desired.
>>>
>>> So i am not sure what kind of option you are proposing?
>>> Do u imply PRINTK_TIME is not right then?
>>
>> It's 8 bytes per message for storing the timestamp in the records.
>> There is never 15 bytes storage space needed, the prefix is
>> constructed on-the-fly only while exporting the data.
>
> When i was referring to 15 chars, its coming from here:
> Its the space reserved in each line of output. Corresponding space for
> cpuid is 4 chars: "[x] ":

Just saying, that's just the length of the printed line to the console
or syslog, there is no reservation or space used for that internally.

>> The CPU-ID would need at least two additional bytes (2^16 CPUS) in
>> every record, unless it's a compile-time option.
>
> are u proposing:
> a) to make cpuid a u16?

That would be needed, I guess. We easily have server systems with more
than 255 CPUs. It will only be a matter of time, that the number of
CPUs will increase for everybody, I guess.

> b) to put cpuid in struct cont and struct log - under the #ifdef macro?

As said, I really can't tell how generally useful it is, and if people
think that it should be there unconditionally, should not be there at
all, or as a compile time option. Others might have an opinion on
that.

I personally never missed the CPU-ID in the logs. I personally would
find the PID/task ID more interesting, and even that I never really
missed. :)

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: udev 182: response timeout for request_firmware in module_probe path

2012-08-23 Thread Kay Sievers

On Thu, Aug 23, 2012 at 5:16 PM, Ming Lei  wrote:
> On Tue, Aug 21, 2012 at 1:34 PM, Ming Lei  wrote:

>> I found that udev 182 doesn't response for the request_firmware in
>> module_probe path until 30sec later after the 'ADD' event of firmware
>> device, but no such problem in udev175, sounds like a regression of udev?
>
> Looks udevd is capable of handling the firmware ADD event even though
> the device ADD event is being processed( modprobe is triggered by device
> ADD event).

Calling out from inside the kernel and blocking in a firmware loading
userspace transaction during module_init() is kind of weird.

The firmware loading call should not rely on a fully functional
userspace, and modprobe should not hang and block until the firmware
request is handled.

The firmware should be requested asynchronously or from a different
context as module_init(). It depends on the type of driver/hardware
what's the best approach here.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 2.6.25-rc1-sha1: WARNING: at lib/kref.c:43 kref_get+0x20/0x30()

2008-02-19 Thread Kay Sievers

On Feb 18, 2008 1:59 PM, Andrew Morton <[EMAIL PROTECTED]> wrote:
> On Fri, 15 Feb 2008 14:08:53 +0300 Alexey Dobriyan <[EMAIL PROTECTED]> wrote:
>
> > Booting without SYSFS fills dmesg like this
>
> Does the system normally boot without sysfs?  Surprised.
>
>
> > [ cut here ]
> > WARNING: at lib/kref.c:43 kref_get+0x20/0x30()
> > Modules linked in: loop(+)
> > Pid: 994, comm: modprobe Tainted: G   M 2.6.25-rc1 #6
> >
> > Call Trace:
> >  [] warn_on_slowpath+0x64/0x90
> >  [] init_object+0x88/0xa0
> >  [] __slab_alloc+0xbe/0x550
> >  [] kvasprintf+0x58/0x90
> >  [] vsnprintf+0x33c/0x6e0
> >  [] kvasprintf+0x71/0x90
> >  [] hrtick_set+0x77/0x140
> >  [] kref_get+0x20/0x30
> >  [] kobject_get+0x12/0x20
> >  [] kobject_add_internal+0x49/0xf0
> >  [] kobject_add_varg+0x73/0x80
> >  [] kobject_add+0x54/0x80
> >  [] check_object+0x103/0x230
> >  [] init_object+0x88/0xa0
> >  [] kobject_create+0x14/0x40
> >  [] kmem_cache_alloc+0xd3/0xe0
> >  [] kobject_init+0x34/0xb0
> >  [] kobject_create+0x2b/0x40
> >  [] kobject_create_and_add+0x3c/0x80
> >  [] load_module+0x1847/0x1970
> >  [] alloc_pages_current+0x0/0x90
> >  [] sys_init_module+0x62/0x1a0
> >
>
> Is this a regression?

Alexey,
do you always see these messages with every modprobe of a module?

Can you enable CONFIG_DEBUG_KOBJECT=y? It might give a hint where we
miss to initialization of a kobject when sysfs is not compiled in.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 2.6.24.1 - kernel does not boot; IRQ trouble?

2008-02-19 Thread Kay Sievers

On Feb 18, 2008 9:06 PM, Stephen Hemminger
<[EMAIL PROTECTED]> wrote:
> On Mon, 18 Feb 2008 19:42:25 + (GMT)
> Chris Rankin <[EMAIL PROTECTED]> wrote:
>
> > --- Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> > > > > sysfs: duplicate filename 'bridge' can not be created
> > > > > WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
> > > > > Pid: 1, comm: swapper Not tainted 2.6.24.1 #1
> > > > >  [] show_trace_log_lvl+0x1a/0x2f
> > > > >  [] show_trace+0x12/0x14
> > > > >  [] dump_stack+0x6c/0x72
> > > > >  [] sysfs_add_one+0x57/0xbc
> > > > >  [] sysfs_create_link+0xc2/0x10d
> > > > >  [] pci_bus_add_devices+0xbd/0x103
> > > > >  [] pci_legacy_init+0x56/0xe3
> > > > >  [] kernel_init+0x157/0x2c3
> > > > >  [] kernel_thread_helper+0x7/0x10
> > > > >  ===
> > > > > pci :00:01.0: Error creating sysfs bridge symlink, continuing...
> > > > > sysfs: duplicate filename 'bridge' can not be created
> > > > > WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
> > > > > Pid: 1, comm: swapper Not tainted 2.6.24.1 #1
> > > > >  [] show_trace_log_lvl+0x1a/0x2f
> > > > >  [] show_trace+0x12/0x14
> > > > >  [] dump_stack+0x6c/0x72
> > > > >  [] sysfs_add_one+0x57/0xbc
> > > > >  [] sysfs_create_link+0xc2/0x10d
> > > > >  [] pci_bus_add_devices+0xbd/0x103
> > > > >  [] pci_bus_add_devices+0xa5/0x103
> > > > >  [] pci_legacy_init+0x56/0xe3
> > > > >  [] kernel_init+0x157/0x2c3
> > > > >  [] kernel_thread_helper+0x7/0x10
> > > > >  ===
> > > >
> > > > I have a vague feeling that this was fixed, perhaps in 2.6.24.x?
> > >
> > > Never heard of this, what is the initialization script that causes this?
> > > Also do you have the SYSFS_DEPRECATED option configured? that caused 
> > > issues
> > > with regular network drivers.
> >
> > Yes, SYSFS_DEPRECATED is enabled. And the init scripts are from Fedora 8.
>
> There was a bug (fixed in 2.6.24) that had to do with sysfs_create_link
> and SYSFS_DEPRECATED probably there is a similar problem with directories.

Chris, could you enable CONFIG_DEBUG_KOBJECT=y, it might show what
objects try to claim the same name.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 2.6.25-rc1-sha1: WARNING: at lib/kref.c:43 kref_get+0x20/0x30()

2008-02-19 Thread Kay Sievers

On Tue, 2008-02-19 at 15:03 +0300, Alexey Dobriyan wrote:
> On Tue, Feb 19, 2008 at 09:19:25AM +0100, Kay Sievers wrote:
> > On Feb 18, 2008 1:59 PM, Andrew Morton <[EMAIL PROTECTED]> wrote:
> > > On Fri, 15 Feb 2008 14:08:53 +0300 Alexey Dobriyan <[EMAIL PROTECTED]> 
> > > wrote:
> > >
> > > > Booting without SYSFS fills dmesg like this

> > Can you enable CONFIG_DEBUG_KOBJECT=y? It might give a hint where we
> > miss to initialization of a kobject when sysfs is not compiled in.
> 
> It looks like this:

> [ cut here ]
> WARNING: at lib/kref.c:43 kref_get+0x2d/0x30()
> Modules linked in: battery(+) button dock thermal processor sbs ac sbshc 
> af_packet loop
> Pid: 1642, comm: modprobe Not tainted 2.6.25-rc2 #2
> 
> Call Trace:
>  [] warn_on_slowpath+0x5f/0x80
>  [] ? mark_held_locks+0x56/0xa0
>  [] ? __slab_alloc+0xc0/0x4b0
>  [] ? trace_hardirqs_on+0xbf/0x150
>  [] ? kvasprintf+0x57/0x90
>  [] ? vsnprintf+0x328/0x6e0
>  [] ? kvasprintf+0x70/0x90
>  [] kref_get+0x2d/0x30
>  [] kobject_get+0x1a/0x30
>  [] kobject_add_internal+0x59/0x160
>  [] kobject_add_varg+0x6e/0x80
>  [] kobject_add+0x69/0x90
>  [] ? trace_hardirqs_on+0xbf/0x150
>  [] ? kref_init+0xe/0x10
>  [] ? kobject_init+0x37/0xa0
>  [] ? kobject_create+0x33/0x40
>  [] kobject_create_and_add+0x3e/0x80
>  [] sys_init_module+0x19df/0x1b30
>  [] ? __lock_acquire+0x748/0x10b0
>  [] ? acpi_bus_register_driver+0x0/0x40
>  [] ? trace_hardirqs_on+0xbf/0x150
>  [] ? trace_hardirqs_on_thunk+0x35/0x3a
>  [] system_call_after_swapgs+0x7b/0x80
> 
> ---[ end trace 033c2b23880cf02a ]---
> kobject: 'notes' (81017ec370a8): kobject_add_internal: parent: '', 
> set: ''

Ok, seems the "notes" directory should not be created if SYSFS is not
configured. The "notes" kobject tries to reference the module kobject as
a parent, which isn't initialized in that case.

Does that fix it?

Thanks,
Kay


From: Kay Sievers <[EMAIL PROTECTED]>
Subject: modules: do not try to add sysfs attributes if !CONFIG_SYSFS 

Signed-off-by: Kay Sievers <[EMAIL PROTECTED]>
---
 module.c |8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 92595ba..2dd60c8 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -987,6 +987,7 @@ static unsigned long resolve_symbol(Elf_Shdr *sechdrs,
return ret;
 }
 
+#ifdef CONFIG_SYSFS
 
 /*
  * /sys/module/foo/sections stuff
@@ -1169,7 +1170,7 @@ static void remove_notes_attrs(struct module *mod)
free_notes_attrs(mod->notes_attrs, mod->notes_attrs->notes);
 }
 
-#else
+#else /* CONFIG_KALLSYMS */
 
 static inline void add_sect_attrs(struct module *mod, unsigned int nsect,
char *sectstrings, Elf_Shdr *sechdrs)
@@ -1190,7 +1191,6 @@ static inline void remove_notes_attrs(struct module *mod)
 }
 #endif /* CONFIG_KALLSYMS */
 
-#ifdef CONFIG_SYSFS
 int module_add_modinfo_attrs(struct module *mod)
 {
struct module_attribute *attr;
@@ -1231,9 +1231,7 @@ void module_remove_modinfo_attrs(struct module *mod)
}
kfree(mod->modinfo_attrs);
 }
-#endif
 
-#ifdef CONFIG_SYSFS
 int mod_sysfs_init(struct module *mod)
 {
int err;
@@ -1299,7 +1297,7 @@ out_unreg:
kobject_put(&mod->mkobj.kobj);
return err;
 }
-#endif
+#endif /* CONFIG_SYSFS */
 
 static void mod_kobject_remove(struct module *mod)
 {


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 2.6.24.1 - kernel does not boot; IRQ trouble?

2008-02-22 Thread Kay Sievers

On Tue, Feb 19, 2008 at 9:47 AM, Kay Sievers <[EMAIL PROTECTED]> wrote:
> On Feb 18, 2008 9:06 PM, Stephen Hemminger <[EMAIL PROTECTED]> wrote:
>  > On Mon, 18 Feb 2008 19:42:25 + (GMT)
>  > Chris Rankin <[EMAIL PROTECTED]> wrote:
>  >
>  > > --- Stephen Hemminger <[EMAIL PROTECTED]> wrote:
>  > > > > > sysfs: duplicate filename 'bridge' can not be created
>  > > > > > WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
>  > > > > > Pid: 1, comm: swapper Not tainted 2.6.24.1 #1
>  > > > > >  [] show_trace_log_lvl+0x1a/0x2f
>  > > > > >  [] show_trace+0x12/0x14
>  > > > > >  [] dump_stack+0x6c/0x72
>  > > > > >  [] sysfs_add_one+0x57/0xbc
>  > > > > >  [] sysfs_create_link+0xc2/0x10d
>  > > > > >  [] pci_bus_add_devices+0xbd/0x103
>  > > > > >  [] pci_legacy_init+0x56/0xe3
>  > > > > >  [] kernel_init+0x157/0x2c3
>  > > > > >  [] kernel_thread_helper+0x7/0x10
>  > > > > >  ===
>  > > > > > pci :00:01.0: Error creating sysfs bridge symlink, 
> continuing...
>  > > > > > sysfs: duplicate filename 'bridge' can not be created
>  > > > > > WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()

Greg,
it seems that:
  arch/x86/pci/legacy.c :: pci_legacy_init()

tries to create already created "bridge" symlinks in 2.6.24. So we
discover the same devices twice? Can this be a reason for the hang?

I guess in 2.6.25, the warning is gone with:
  commit fd7d1ced29e5beb88c9068801da7a362606d8273
  Author: Greg Kroah-Hartman <[EMAIL PROTECTED]>
  Date:   Tue May 22 22:47:54 2007 -0400

  PCI: make pci_bus a struct device

  This moves the pci_bus class device to be a real struct device and at
  the same time, place it in the device tree in the correct location.
  Note, the old "bridge" symlink is now gone.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 3.5-rc6 printk formatting problem during oom-kill.

2012-07-09 Thread Kay Sievers

On Mon, Jul 9, 2012 at 8:03 PM, Dave Jones  wrote:
> I noticed that the format of the oom-killer output seems to have changed, and
> now it spews stuff like..
>
> [49461.758070] lowmem_reserve[]:
> [49461.758071]  0
> [49461.758071]  2643
> [49461.758071]  3878
> [49461.758072]  3878
> [49461.758072]
> [49461.758072] Node 0

> Does the oom-killer code need modifying, or the printk code ?
> I know there's been some regressions in this area recently, but this is still
> happening on the current tree (8c84bf4166a4698296342841a549bbee03860ac0)

This likely fixes it:
  
http://git.kernel.org/?p=linux/kernel/git/kay/patches.git;a=blob;f=kmsg-merge-cont.patch;hb=HEAD

Let me check if it does, and if I can reproduce it.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 3.5-rc6 printk formatting problem during oom-kill.

2012-07-09 Thread Kay Sievers

On Mon, 2012-07-09 at 20:27 +0200, Kay Sievers wrote:
> On Mon, Jul 9, 2012 at 8:03 PM, Dave Jones  wrote:
> > I noticed that the format of the oom-killer output seems to have changed, 
> > and
> > now it spews stuff like..
> >
> > [49461.758070] lowmem_reserve[]:
> > [49461.758071]  0
> > [49461.758071]  2643
> > [49461.758071]  3878
> > [49461.758072]  3878
> > [49461.758072]
> > [49461.758072] Node 0
> 
> > Does the oom-killer code need modifying, or the printk code ?
> > I know there's been some regressions in this area recently, but this is 
> > still
> > happening on the current tree (8c84bf4166a4698296342841a549bbee03860ac0)
> 
> This likely fixes it:
>   
> http://git.kernel.org/?p=linux/kernel/git/kay/patches.git;a=blob;f=kmsg-merge-cont.patch;hb=HEAD
> 
> Let me check if it does, and if I can reproduce it.

It looks fine here with the above mentioned patch:
[0.00] lowmem_reserve[]:
[0.00]  0
[0.00]  0
[0.00]  0
[0.00]  0
[0.00] 
[0.00] DMA: 
[0.00] 1*4kB 
[0.00] 0*8kB 
[0.00] 0*16kB 
[0.00] 1*32kB 
[0.00] 2*64kB 
[0.00] 1*128kB 
[0.00] 1*256kB 
[0.00] 0*512kB 
[0.00] 1*1024kB 
[0.00] 1*2048kB 
[0.00] 3*4096kB 
[0.00] = 15908kB

becomes:
[0.00] lowmem_reserve[]: 0 0 0 0
[0.00] DMA: 1*4kB 0*8kB 0*16kB 1*32kB 2*64kB 1*128kB 1*256kB 0*512kB 
1*1024kB 1*2048kB 3*4096kB = 15908kB

Thanks,
Kay


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 3.5-rc6 printk formatting problem during oom-kill.

2012-07-09 Thread Kay Sievers

On Mon, Jul 9, 2012 at 10:44 PM, Joe Perches  wrote:

>> > That single patch doesn't apply cleanly to Linus'
>> > 8c84bf4166a4698296342841a549bbee03860ac0
>> >
>> > What else is necessary?
>> >
>> > Your tree seems to have a collection of random patches.
>> >
>> > It might be useful to clone Linus' tree and produce a
>> > branch with all the necessary printk patches in it so
>> > someone else could pull it.

I just worked on top of Greg's tree with the pending stuff for 3.5.

>> They should all now be in my driver-core-next branch that will show up
>> in the next linux-next release, so having a separate tree isn't
>> necessary.
>
> I don't think so.
>
> There are real defects in the existing code.
>
> These are patches that are necessary _now_.
> not for a -next 3.6 future.

Wrong conclusion. They are not in the 3.6 branch, but still go into -next:
  
http://git.kernel.org/?p=linux/kernel/git/gregkh/driver-core.git;a=shortlog;h=refs/heads/driver-core-linus

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3] printk: Have printk() never buffer its data

2012-07-09 Thread Kay Sievers

On Mon, Jul 9, 2012 at 11:42 PM, Joe Perches  wrote:
> On Sun, 2012-07-08 at 19:55 +0200, Kay Sievers wrote:
>
>> At the same time the CPU#2 prints the same warning with a continuation
>> line, but the buffer from CPU#1 can not be flushed to the console, nor
>> can the continuation line printk()s from CPU#2 be merged at this point.
>> The consoles are still locked and busy with replaying the old log
>> messages, so the new continuation data is just stored away in the record
>> buffer as it is coming in.
>> If the console would be registered a bit earlier, or the warning would
>> happen a bit later, we would probably not see any of this.
>>
>> I can fake something like this just by holding the console semaphore
>> over a longer time and printing continuation lines with different CPUs
>> in a row.
>>
>> The patch below seems to work for me. It is also here:
>>   
>> http://git.kernel.org/?p=linux/kernel/git/kay/patches.git;a=blob;f=kmsg-merge-cont.patch;hb=HEAD
>>
>> It only applies cleanly on top of this patch:
>>   
>> http://git.kernel.org/?p=linux/kernel/git/kay/patches.git;a=blob;f=kmsg-syslog-1-byte-read.patch;hb=HEAD
>>
>
> Hi Kay.
>
> I just ran a test with what's in Greg's driver-core -for-linus branch.
>
> One of the differences in dmesg is timestamping of consecutive
> pr_("foo...)
> followed directly by
> pr_cont("bar...")
>
> For instance: (dmesg is 3.4, dmesg.0 is 3.5-rc6+)
>
> # grep MAP /var/log/dm* -A1
> dmesg:[0.781687] ata_piix :00:1f.2: MAP [ P0 P2 P1 P3 ]
> dmesg-[0.781707] ata2: port disabled--ignoring
> --
> dmesg.0:[0.948881] ata_piix :00:1f.2: MAP [
> dmesg.0-[0.948883]  P0 P2 P1 P3 ]
>
> These messages originate starting at
> drivers/ata/ata_piix.c:1354
>
> All the continuations are emitted with pr_cont.
>
> I think this output should still be coalesced without
> timestamp deltas.  Perhaps the timestamping code can
> still be reworked to avoid too small a delta producing
> a new timestamp and another dmesg line.

Hmm, I don't see that.

If I do:
  pr_info("[");
  for (i = 0; i < 4; i++)
   pr_cont("%i ", i);
  pr_cont("]\n");

I get:
  6,173,0;[0 1 2 3 ]

And if I fill the cont buffer and forcefully hold the console sem
during all that, and we can't merge anymore, I get:
  6,167,0;[
  4,168,0;0
  4,169,0;1
  4,170,0;2
  4,171,0;3
  4,172,0;]

But the output is still all fine for both lines:
  [0.00] [0 1 2 3 ]
  [0.00] [0 1 2 3 ]

What do I miss?

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3] printk: Have printk() never buffer its data

2012-07-09 Thread Kay Sievers

On Tue, Jul 10, 2012 at 12:29 AM, Joe Perches  wrote:
> On Tue, 2012-07-10 at 00:10 +0200, Kay Sievers wrote:
>> On Mon, Jul 9, 2012 at 11:42 PM, Joe Perches  wrote:
>> > On Sun, 2012-07-08 at 19:55 +0200, Kay Sievers wrote:
>> >
>> >> At the same time the CPU#2 prints the same warning with a continuation
>> >> line, but the buffer from CPU#1 can not be flushed to the console, nor
>> >> can the continuation line printk()s from CPU#2 be merged at this point.
>> >> The consoles are still locked and busy with replaying the old log
>> >> messages, so the new continuation data is just stored away in the record
>> >> buffer as it is coming in.
>> >> If the console would be registered a bit earlier, or the warning would
>> >> happen a bit later, we would probably not see any of this.
>> >>
>> >> I can fake something like this just by holding the console semaphore
>> >> over a longer time and printing continuation lines with different CPUs
>> >> in a row.
>> >>
>> >> The patch below seems to work for me. It is also here:
>> >>   
>> >> http://git.kernel.org/?p=linux/kernel/git/kay/patches.git;a=blob;f=kmsg-merge-cont.patch;hb=HEAD
>> >>
>> >> It only applies cleanly on top of this patch:
>> >>   
>> >> http://git.kernel.org/?p=linux/kernel/git/kay/patches.git;a=blob;f=kmsg-syslog-1-byte-read.patch;hb=HEAD
>> >>
>> >
>> > Hi Kay.
>> >
>> > I just ran a test with what's in Greg's driver-core -for-linus branch.
>> >
>> > One of the differences in dmesg is timestamping of consecutive
>> > pr_("foo...)
>> > followed directly by
>> > pr_cont("bar...")
>> >
>> > For instance: (dmesg is 3.4, dmesg.0 is 3.5-rc6+)
>> >
>> > # grep MAP /var/log/dm* -A1
>> > dmesg:[0.781687] ata_piix :00:1f.2: MAP [ P0 P2 P1 P3 ]
>> > dmesg-[0.781707] ata2: port disabled--ignoring
>> > --
>> > dmesg.0:[0.948881] ata_piix :00:1f.2: MAP [
>> > dmesg.0-[0.948883]  P0 P2 P1 P3 ]
>> >
>> > These messages originate starting at
>> > drivers/ata/ata_piix.c:1354
>> >
>> > All the continuations are emitted with pr_cont.
>> >
>> > I think this output should still be coalesced without
>> > timestamp deltas.  Perhaps the timestamping code can
>> > still be reworked to avoid too small a delta producing
>> > a new timestamp and another dmesg line.
>>
>> Hmm, I don't see that.
>>
>> If I do:
>>   pr_info("[");
>>   for (i = 0; i < 4; i++)
>>pr_cont("%i ", i);
>>   pr_cont("]\n");
>>
>> I get:
>>   6,173,0;[0 1 2 3 ]
>>
>> And if I fill the cont buffer and forcefully hold the console sem
>> during all that, and we can't merge anymore, I get:
>>   6,167,0;[
>>   4,168,0;0
>>   4,169,0;1
>>   4,170,0;2
>>   4,171,0;3
>>   4,172,0;]
>>
>> But the output is still all fine for both lines:
>>   [0.00] [0 1 2 3 ]
>>   [0.00] [0 1 2 3 ]
>>
>> What do I miss?
>
> In this case the initial line is dev_info not pr_info
> so there are the additional dict descriptors output to
> /dev/kmsg as well.
>
> Maybe that interferes with continuations.  Dunno.

Yes, it does. Annotated records dev_printk() must be reliable in the
data storage and for the consumers. We can not expose them to the racy
continuation printk()s. We need to be able to trust the data they
print and not possibly merge unrelated things into it.

If it's needed, we can try to set the flags accordingly, that they
*look* like a line in the classic byte-stream output, but the
interface in /dev/kmsg must not export them that way, because
continuation lines can never be trusted to be correct.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

pr_cat() + CATSTR(name, size)?

2012-07-11 Thread Kay Sievers

Hey Joe,

what do you think this?

It would make composing continuation lines at the caller side entirely
race-free, and it might fit into the usual pattern.

The more interesting thing, this would allow us to completely race-free
use the dev_printk() calls with continuation content, which we should
avoid otherwise for integrity reasons.

The patch below is just hacked it into the printk.c file to illustrate
the idea. It prints:
  [0.00] Kernel command line: root=/dev/sda2 ...
  [0.00] 12 34 56 78
  [0.00] PID hash table entries: 2048 (order: 2, 16384 bytes)

  5,96,0;Kernel command line: root=/dev/sda2 ...
  4,97,0;12 34 56 78
  6,98,0;PID hash table entries: 2048 (order: 2, 16384 bytes)

Thanks,
Kay


diff --git a/kernel/printk.c b/kernel/printk.c
index dba1821..1fd00b0 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -48,6 +48,32 @@
 #define CREATE_TRACE_POINTS
 #include 
 
+#define CATSTR(name, max)  \
+   char name[max]; \
+   size_t _##name_len = 0; \
+   size_t _##name_max = max;
+
+#define pr_cat(name, fmt, ...) \
+   _catstr(name, &_##name_len, _##name_max, fmt, ##__VA_ARGS__)
+
+ssize_t _catstr(char *s, size_t *len, size_t size, const char *fmt, ...)
+{
+   va_list args;
+   size_t l = *len;
+   size_t r;
+
+   va_start(args, fmt);
+   r = vsnprintf(s + l, size - l, fmt, args);
+   va_end(args);
+
+   if (r >= size - l)
+   return -EINVAL;
+
+   *len += r;
+   s[*len] = '\0';
+   return r;
+}
+
 /*
  * Architectures can override it:
  */
@@ -668,6 +694,12 @@ void __init setup_log_buf(int early)
char *new_log_buf;
int free;
 
+   CATSTR(line, 80);
+   pr_cat(line, "%i ", 12);
+   pr_cat(line, "%i ", 34);
+   pr_cat(line, "%i ", 56);
+   pr_warn("%s%i\n", line, 78);
+
if (!new_log_buf_len)
return;
 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: pr_cat() + CATSTR(name, size)?

2012-07-11 Thread Kay Sievers

On Wed, 2012-07-11 at 12:33 +0200, Kay Sievers wrote:
> Hey Joe,
> 
> what do you think of this?
> 
> It would make composing continuation lines at the caller side entirely
> race-free, and it might fit into the usual pattern.
> 
> The more interesting thing, this would allow us to completely race-free
> use the dev_printk() calls with continuation content, which we should
> avoid otherwise for integrity reasons.

Better version with better range checking and _INIT() to reset the
string for re-use. It prints:
  [0.00] Kernel command line: root=/dev/sda2 ...
  [0.00] 1:-12 -34 -56 -78 -90
  [0.00] 2:-12 -34 -56 --90
  [0.00] 3:-12 -34 --90
  [0.00] 4:+12 +34 +-90
  [0.00] 5:~12 ~34 ~-90
  [0.00] PID hash table entries: 2048 (order: 2, 16384 bytes)

Thanks,
Kay


diff --git a/kernel/printk.c b/kernel/printk.c
index dba1821..1490153 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -48,6 +48,39 @@
 #define CREATE_TRACE_POINTS
 #include 
 
+#define CATSTR_INIT(name)  \
+   name##_len = 0;
+
+#define CATSTR_DEFINE(name, max)   \
+   char name[max]; \
+   size_t name##_len = 0;  \
+   size_t name##_max = max;
+
+#define pr_cat(name, fmt, ...) \
+   _catstr(name, &name##_len, name##_max, fmt, ##__VA_ARGS__)
+
+ssize_t _catstr(char *s, size_t *len, size_t size, const char *fmt, ...)
+{
+   va_list args;
+   size_t r;
+
+   if (*len == size)
+   return -EINVAL;
+
+   va_start(args, fmt);
+   r = vsnprintf(s + *len, size - *len, fmt, args);
+   va_end(args);
+
+   if (r >= size - *len) {
+   *len = size;
+   return -EINVAL;
+   }
+
+   *len += r;
+   s[*len] = '\0';
+   return r;
+}
+
 /*
  * Architectures can override it:
  */
@@ -668,6 +701,47 @@ void __init setup_log_buf(int early)
char *new_log_buf;
int free;
 
+   CATSTR_DEFINE(line, 24)
+   CATSTR_DEFINE(line2, 16)
+   CATSTR_DEFINE(line3, 12)
+
+   pr_cat(line, "1:");
+   pr_cat(line, "-%i ", 12);
+   pr_cat(line, "-%i ", 34);
+   pr_cat(line, "-%i ", 56);
+   pr_cat(line, "-%i ", 78);
+   pr_warn("%s-%i\n", line, 90);
+
+   pr_cat(line2, "2:");
+   pr_cat(line2, "-%i ", 12);
+   pr_cat(line2, "-%i ", 34);
+   pr_cat(line2, "-%i ", 56);
+   pr_cat(line2, "-%i ", 78);
+   pr_warn("%s-%i\n", line2, 90);
+
+   pr_cat(line3, "3:");
+   pr_cat(line3, "-%i ", 12);
+   pr_cat(line3, "-%i ", 34);
+   pr_cat(line3, "-%i ", 56);
+   pr_cat(line3, "-%i ", 78);
+   pr_warn("%s-%i\n", line3, 90);
+
+   CATSTR_INIT(line3)
+   pr_cat(line3, "4:");
+   pr_cat(line3, "+%i ", 12);
+   pr_cat(line3, "+%i ", 34);
+   pr_cat(line3, "+%i ", 56);
+   pr_cat(line3, "+%i ", 78);
+   pr_warn("%s-%i\n", line3, 90);
+
+   CATSTR_INIT(line3)
+   pr_cat(line3, "5:");
+   pr_cat(line3, "~%i ", 12);
+   pr_cat(line3, "~%i ", 34);
+   pr_cat(line3, "~%i ", 56);
+   pr_cat(line3, "~%i ", 78);
+   pr_warn("%s-%i\n", line3, 90);
+
if (!new_log_buf_len)
return;
 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: pr_cat() + CATSTR(name, size)?

2012-07-11 Thread Kay Sievers

On Wed, Jul 11, 2012 at 5:01 PM, Joe Perches  wrote:

> On Wed, 2012-07-11 at 12:33 +0200, Kay Sievers wrote:
> Interesting idea, perhaps workable, but it has
> a few defects I can see.
>
> It works for most uses, but it doesn't work for
> when there are multiple function sites like
>
> void dump_info(struct foo *bar)
> {
> if (bar->baz)
> pr_cont("baz...");
> }
>
> ---
>
> pr_info("Some initiator: ")
> dump_info(&descriptor);

Yeah, it's just the common case, not for the "creative" ones. :)

> Another negative is that is uses a local stack
> variable for the entire line which increases
> stack pressure.

The thing is to avoid malloc(), and ~100 bytes are kind of OK for the
time we do the printk, I guess

> It also fails to immediately output after some
> defect unlike your change to output directly to
> console.

Which is really only that important for stuff that causes crashes
between the outputting fragments.

There are _many_ cases the console lock is held, and we don't print
stuff immediately out to the console, and we never ensured that in the
past. There was never a guarantee that stuff ended up on the console,
kmsg was always and needs to be a store+forward model.

> It would require all sites with continuation lines
> be modified.  Because it requires in-situ code
> modifications, I'd prefer a cookie based approach.

Well, it would be mostly for the dev_printk() stuff, which should
ideally never be merged with stuff that could go wrong.

> I think it's more flexible, allows the cookie to be
> passed into extending functions and doesn't demand
> (much) extra stack.
>
> Something like:
> https://lkml.org/lkml/2012/4/3/231
> https://lkml.org/lkml/2011/11/14/349

Hmm, how do we manage memory allocations here? We can get around that
somehow? It's something the common printk() must really avoid.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: pr_cat() + CATSTR(name, size)?

2012-07-11 Thread Kay Sievers

On Wed, Jul 11, 2012 at 5:30 PM, Joe Perches  wrote:
> On Wed, 2012-07-11 at 17:14 +0200, Kay Sievers wrote:
>> On Wed, Jul 11, 2012 at 5:01 PM, Joe Perches  wrote:
> []
>> There are _many_ cases the console lock is held, and we don't print
>> stuff immediately out to the console, and we never ensured that in the
>> past. There was never a guarantee that stuff ended up on the console,
>> kmsg was always and needs to be a store+forward model.
>
> I'm less concerned with kmsg than you.
> I think it's more a nicety than anything.

Sure, just saying, that there is never a guarantee that stuff lands
directly on the console  at print time, we never had that. It just
usually gets there directly. There is trylock(console_sem), if we can
not get that, we always leave the data in the buffer and let someone
else push it with the next unlock, and that can be later. That
behaviour was always the case.

>> > It would require all sites with continuation lines
>> > be modified.  Because it requires in-situ code
>> > modifications, I'd prefer a cookie based approach.
>>
>> Well, it would be mostly for the dev_printk() stuff, which should
>> ideally never be merged with stuff that could go wrong.
>
> Perhaps that's ideal, but not practical.
> printk continuations are more prevalent.

We can't have everything. We can not merge the data at least, for
integrity reasons. We can only make it *look* like it belonged
together, like we do when we run into a race or the console is locked
and we can not merge continuation fragments into one record.

>> > I think it's more flexible, allows the cookie to be
>> > passed into extending functions and doesn't demand
>> > (much) extra stack.
>> >
>> > Something like:
>> > https://lkml.org/lkml/2012/4/3/231
>> > https://lkml.org/lkml/2011/11/14/349
>>
>> Hmm, how do we manage memory allocations here? We can get around that
>> somehow? It's something the common printk() must really avoid.
>
> Well, I think the malloc costs are pretty low
> and could devolve pretty easily when OOM.

We need to avoid allocating memory in situations where we want to
printk(), it's just not possible. That's why all the kmsg/printk can
not really do any plain malloc. All printk memory needs to be static,
on the stack or somehow pre-allocated.

> Anyway, interesting idea, keep at it, see what
> comes out of it.

Just depends on us, I guess. :)

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: pr_cat() + CATSTR(name, size)?

2012-07-11 Thread Kay Sievers

On Wed, Jul 11, 2012 at 6:55 PM, Joe Perches  wrote:
> On Wed, 2012-07-11 at 17:48 +0200, Kay Sievers wrote:
>> On Wed, Jul 11, 2012 at 5:30 PM, Joe Perches  wrote:
>> > Well, I think the malloc costs are pretty low
>> > and could devolve pretty easily when OOM.
>>
>> We need to avoid allocating memory in situations where we want to
>> printk(), it's just not possible.
>
> "it's just not possible???"  Kay, them's fightin' words. :)

Nah, I meant it. :) It limits the usefulness of these functions. We
can not safely allocate memory, or do not get any memory in some
situations where we want to use printk(). Hey, it might be used to say
printk("out of memory\n").

>> That's why all the kmsg/printk can
>> not really do any plain malloc. All printk memory needs to be static,
>> on the stack or somehow pre-allocated.
>
> Maybe, I was planning to play with it after
> refactoring printk in the next couple releases.

Sounds good.

>> > Anyway, interesting idea, keep at it, see what
>> > comes out of it.
>>
>> Just depends on us, I guess. :)

> If your solution is just for the dev_ messages
> (ie: with vprintk_emit descriptors), then it's not
> too ugly.

Yeah, I thought only about these. But there might be more users where
it makes sense to do that in a more reliable manner, don't know. It
was surely no meant to replace the remaining 99.9% of the other cont
users. :)

> Did you look at the remaining dev_ and printk
> continuations grep pattern?  There really aren't too
> many to fix up.

Yeah, it looks fine to fix these few.

> Maybe in 3.6.  None of them appear particularly urgent.

Right.

> One trivial style note:
>
> Maybe CATSTR could use a struct and a DECLARE_ macro?
>
> struct printk_continuation_buffer {
> size_t length;
> size_t pos;
> char buf[];
> }

Yeah, but then we lose the simplicity of passing the normal string
around, and we need accessor macros to get to the string when we pass
it around later. Maybe it's still OK, but it's surely not so intuitive
anymore.

> It's a pity gcc doesn't allow non-static declarations like:
>
> #define DECLARE_PRINTK_BUF(name, size)  \
> struct printk_continuation_buffer name = {  \
> .length = size; \
> .pos = 0;   \
> .buf[size] = {0};   \
> }

Yeah, when the size changes, we have different type of struct. So we
can not name them all "printk_continuation_buffer", every different
size would conflict with each other.

Also  = {0} on an array forces a memset() of the entire array, nothing
wrong, but it's not really needed.

> So maybe a DECLARE/DESTROY thing could work
> with the appropriate malloc/free.

Hmm, I really don't think we can teach the people, or expect them to
know, that these printk() functions are fragile if used in some
critical code paths. It would at least need the GFP flags and in many
cases GFP_ATOMIC which can easily fail, and we would also need to do
error checking then, and printk() should just never fail, because it
is used to tell that something went wrong. We have the entire kmsg
buffer pre-allocated at bootup for that reason.

I think the only really sane option here is to use the (usually
~50-100 bytes) stack. Or did you have another idea here which I
missed?

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: pr_cat() + CATSTR(name, size)?

2012-07-12 Thread Kay Sievers


On Wed, Jul 11, 2012 at 7:51 PM, Joe Perches  wrote:
> On Wed, 2012-07-11 at 19:25 +0200, Kay Sievers wrote:

>> > If your solution is just for the dev_ messages
>> > (ie: with vprintk_emit descriptors), then it's not
>> > too ugly.
>>
>> Yeah, I thought only about these. But there might be more users where
>> it makes sense to do that in a more reliable manner, don't know. It
>> was surely no meant to replace the remaining 99.9% of the other cont
>> users. :)
>
> I believe your current reassembly code only works
> on a maximum of 2 interleaved threads.  Did that change?

No, two threads doing continuation printk at the same time, will still
race against each other, and we can not reconstruct full and correct
lines later. But it's much less likely, because multiple different
continuation prints at the same time are very rare.

>> Yeah, when the size changes, we have different type of struct. So we
>> can not name them all "printk_continuation_buffer", every different
>> size would conflict with each other.
>
> It doesn't work so it doesn't matter no?

Right, it can't work.

>> Hmm, I really don't think we can teach the people, or expect them to
>> know, that these printk() functions are fragile if used in some
>> critical code paths.
>
> Vigilance. (and maybe a checkpatch test :).
> There just aren't many critical code paths.

>> printk() should just never fail, because it
>> is used to tell that something went wrong. We have the entire kmsg
>> buffer pre-allocated at bootup for that reason.
>
> I still think devolving to direct printks when OOM works
> as a fallback just fine.

I don't think we should ever try to call malloc(), it would get us into
too much trouble.

I just played a bit more with the pr_cat() idea. We can completely race
free, and limited to pretty-looking line lengths, put out large lists of
items.

It would usually just use an 80 char buffer for the line on the
stack.

[0.00] Kernel command line: root=/dev/sda2 ...
[0.00] 2:-12 -34 -56 -90
[0.00] 3:-12 -34 -90
[0.00] 4:+12 +34 -90
[0.00] 5:~12 ~34 -90
[0.00] foo: #0 #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 
#17
[0.00] foo+: #18 #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31 #32
[0.00] foo+: #33 #34 #35 #36 #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47
[0.00] foo+: #48 #49
[0.00] PID hash table entries: 2048 (order: 2, 16384 bytes)

diff --git a/kernel/printk.c b/kernel/printk.c
index 177fa49..0f4df08 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -48,6 +48,40 @@
 #define CREATE_TRACE_POINTS
 #include 
 
+#define CATSTR_INIT(name)  \
+   name##_len = 0
+
+#define CATSTR_DEFINE(name, max)   \
+   char name[max]; \
+   size_t name##_len = 0;  \
+   size_t name##_max = max
+
+#define pr_cat(name, fmt, ...) \
+   _pr_cat(name, &name##_len, name##_max, fmt, ##__VA_ARGS__)
+
+bool _pr_cat(char *s, size_t *len, size_t size, const char *fmt, ...)
+{
+   va_list args;
+   size_t r;
+
+   if (*len == size)
+   return false;
+
+   va_start(args, fmt);
+   r = vsnprintf(s + *len, size - *len, fmt, args);
+   va_end(args);
+
+   if (r + 1 >= size - *len) {
+   s[*len] = '\0';
+   *len = size;
+   return false;
+   }
+
+   *len += r;
+   s[*len] = '\0';
+   return true;
+}
+
 /*
  * Architectures can override it:
  */
@@ -587,6 +621,11 @@ static int devkmsg_open(struct inode *inode, struct file 
*file)
struct devkmsg_user *user;
int err;
 
+   console_lock();
+   print_modules();
+   print_modules();
+   console_unlock();
+
/* write-only does not need any file context */
if ((file->f_flags & O_ACCMODE) == O_WRONLY)
return 0;
@@ -671,6 +710,59 @@ void __init setup_log_buf(int early)
unsigned long flags;
char *new_log_buf;
int free;
+   int i;
+
+   CATSTR_DEFINE(line, 64);
+   CATSTR_DEFINE(line2, 16);
+   CATSTR_DEFINE(line3, 12);
+
+   pr_cat(line, "1:");
+   pr_cat(line, "-%i ", 12);
+   pr_cat(line, "-%i ", 34);
+   pr_cat(line, "-%i ", 56);
+   pr_cat(line, "-%i ", 78);
+   pr_info("%s-%i\n", line, 90);
+
+   pr_cat(line2, "2:");
+   pr_cat(line2, "-%i ", 12);
+   pr_cat(line2, "-%i ", 34);
+   pr_cat(line2, "-%i ", 56);
+   pr_cat(line2, "-%i ", 78);
+   pr_info("%s-%i\n", line2, 90);
+
+   pr_cat(line3, "3:");
+   pr_cat(line3, "-%i ", 12);
+

Re: 3.5-rc6 printk formatting problem during oom-kill.

2012-07-12 Thread Kay Sievers

On Thu, Jul 12, 2012 at 2:54 AM, Dave Jones  wrote:
> On Mon, Jul 09, 2012 at 08:48:51PM +0200, Kay Sievers wrote:

>  > It looks fine here with the above mentioned patch:
>
> Now that that patch is in Linus tree, I've hit what's probably a different 
> case.
> Look at the modules list in this oops..
>
> [10016.460020] BUG: soft lockup - CPU#1 stuck for 22s! [trinity-child1:24295]
> [10016.470008]  rose<4>[10016.470008]  ip_set_bitmap_ipmac<4>[10016.470008]

> Also, I have no idea how the hell the 'Modules linked in:' line (9th line) 
> ended up being printed /after/ the
> module listing began (2nd line).

It's the output of 'dmesg' you pasted, right?

I tried to force all sorts of racy print_modules() calls, and kept
your trinity tool from git for hours, it looks all fine here:

[   54.664301] NET: Registered protocol family 24
[   54.684721] Netfilter messages via NETLINK v0.30.
[   63.103847] [ cut here ]
[   63.103857] WARNING: at kernel/futex.c:2452 sys_get_robust_list+0x28e/0x2a0()
[   63.103859] Hardware name: 4174NEG
[   63.103861] deprecated: get_robust_list will be deleted in 2013.
[   63.103862] Modules linked in: binfmt_misc nfnetlink pppoe pppox
ppp_generic slhc atm fuse arc4 i915 hid_generic i2c_algo_bit
drm_kms_helper drm snd_usb_audio snd_usbmidi_lib snd_rawmidi
snd_seq_device bnep bluetooth snd_hda_codec_hdmi
snd_hda_codec_conexant uvcvideo videobuf2_vmalloc videobuf2_memops
videobuf2_core videodev media coretemp kvm_intel kvm crc32c_intel
microcode i2c_i801 i2c_core iwlwifi snd_hda_intel snd_hda_codec
mac80211 snd_hwdep lpc_ich snd_pcm mfd_core sdhci_pci sdhci mmc_core
snd_page_alloc cfg80211 snd_timer thinkpad_acpi snd e1000e soundcore
rfkill video cdc_ncm wmi usbnet mii cdc_wdm cdc_acm
[   63.103921] Pid: 837, comm: trinity-child1 Not tainted
3.5.0-0.rc6.git2.1.fc18.x86_64 #1
[   63.103923] Call Trace:
[   63.103930]  [] warn_slowpath_common+0x7f/0xc0
[   63.103934]  [] warn_slowpath_fmt+0x46/0x50
[   63.103938]  [] ? trace_hardirqs_on_caller+0x10d/0x1a0
[   63.103941]  [] sys_get_robust_list+0x28e/0x2a0
[   63.103946]  [] system_call_fastpath+0x16/0x1b
[   63.103948] ---[ end trace 796993e76a8397be ]---
[   63.126701] NET: Registered protocol family 23
[   63.397208] trinity-child3 (839): Using mlock ulimits for
SHM_HUGETLB is deprecated

Can you easily reproduce the issue you pasted?  If, could you give me
the /dev/kmsg output?

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 3.5-rc6 printk formatting problem during oom-kill.

2012-07-12 Thread Kay Sievers

On Thu, Jul 12, 2012 at 4:05 PM, Dave Jones  wrote:
> On Thu, Jul 12, 2012 at 03:52:17PM +0200, Kay Sievers wrote:
>  > On Thu, Jul 12, 2012 at 2:54 AM, Dave Jones  wrote:
>  > > On Mon, Jul 09, 2012 at 08:48:51PM +0200, Kay Sievers wrote:
>  >
>  > >  > It looks fine here with the above mentioned patch:
>  > >
>  > > Now that that patch is in Linus tree, I've hit what's probably a 
> different case.
>  > > Look at the modules list in this oops..
>  > >
>  > > [10016.460020] BUG: soft lockup - CPU#1 stuck for 22s! 
> [trinity-child1:24295]
>  > > [10016.470008]  rose<4>[10016.470008]  
> ip_set_bitmap_ipmac<4>[10016.470008]
>  >
>  > > Also, I have no idea how the hell the 'Modules linked in:' line (9th 
> line) ended up being printed /after/ the
>  > > module listing began (2nd line).

They do not belong together. The second line is just another call to
the same print_modules() done from:
arch/x86/kernel/dumpstack_64.c :: show_regs()

While we already called print_modules() a few cycles earlier from:
   kernel/watchdog :: watchdog_timer_fn()

>  > I tried to force all sorts of racy print_modules() calls, and kept
>  > your trinity tool from git for hours, it looks all fine here:
>  >
>  > Can you easily reproduce the issue you pasted?  If, could you give me
>  > the /dev/kmsg output?
>
> I've seen it a few times, always with the soft lockup trace.
>
> You might be able to trigger it using scripts/load-all-modules.sh
> from trinity.git. (Assuming you have a lot of modules built, I'm
> still trying to track down which one seems to be responsible).

Hmm, it does not trigger your pattern. I tried adding an rmmod in that
loop, but that crashes after a few seconds. Some modules are just not
meant to be removed. :)

I forced the watchdog to trigger by setting the timeout to 1s, but all
looks still fine:

[   20.854010] BUG: soft lockup - CPU#0 stuck for 1000s! [trinity:247]
[   20.854010] Modules linked in: nfnetlink xfrm_user xfrm_algo pppoe
pppox ppp_generic slhc atm bluetooth rfkill microcode cirrus ttm
sr_mod cdrom pcspkr drm_kms_helper drm syscopyarea sysfillrect
sysimgblt floppy evdev ipv6
[   20.854010] irq event stamp: 980768
[   20.854010] hardirqs last  enabled at (980767):
[] __slab_alloc.constprop.65+0x3c9/0x408
[   20.854010] hardirqs last disabled at (980768):
[] apic_timer_interrupt+0x6a/0x80
[   20.854010] softirqs last  enabled at (978462):
[] __do_softirq+0x11f/0x170
[   20.854010] softirqs last disabled at (978457):
[] call_softirq+0x1c/0x30
[   20.854010] CPU 0
[   20.854010] Modules linked in: nfnetlink xfrm_user xfrm_algo pppoe
pppox ppp_generic slhc atm bluetooth rfkill microcode cirrus ttm
sr_mod cdrom pcspkr drm_kms_helper drm syscopyarea sysfillrect
sysimgblt floppy evdev ipv6

Could it possibly be that we get some sort of corruption somewhere
else while running trinity and load modules?

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 3.5-rc6 printk formatting problem during oom-kill.

2012-07-12 Thread Kay Sievers

On Thu, Jul 12, 2012 at 7:11 PM, Linus Torvalds
 wrote:
> On Thu, Jul 12, 2012 at 7:05 AM, Dave Jones  wrote:
>>
>> I've seen it a few times, always with the soft lockup trace.
>
> I bet it's because you have tons of modules, and the line ends up
> being *really* long. And overflows LOG_LINE_MAX. I suspect something
> odd happens.

Straight to the point. That's the issue, combined with the later
safety range checks, we produce somehow "ugly" output.

We already flush the line out of the buffer when we can not add stuff
anymore. The line length is then close to LOG_LINE_MAX, and we want to
add a prefix with the timestamp during output and we reach the limit
of LOG_LINE_MAX.

> There are tons of odd special cases for LOG_LINE_MAX, and I bet Kay
> doesn't see it for the simple reason that he's not totally insane, and
> hasn't loaded hundreds of modules.

Yeah, I just added a loop now that prints over-long continuation
lines, and I can see parts of the effect Dave sees.

> Kay, I suspect the "continuation line" logic could easily have a rule like
>
>   "If the old line is already > 80 characters, do a line break here
> and add TAB to the beginning of the new line"
>
> In fact, that could be really nice for things like stack dumps etc -
> we wouldn't have to worry about line breaks and crap, if the printk
> logic just makes "KERN_CONT" do a line break automatically if it
> doesn't fit on the screen.
>
> People who use KERN_CONT don't do it because they *need* things to be
> on one line (it's not guaranteed anyway), they do it because they want
> the output to be dense and readable. Doing auto-line-break would
> actually *help*. And would mean that you never hit the odd
> LOG_LINE_MAX cases just because somebody is printing lots of modules.

That sounds good. I'll look into that.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 3.5-rc6 printk formatting problem during oom-kill.

2012-07-12 Thread Kay Sievers

On Thu, 2012-07-12 at 20:25 +0200, Kay Sievers wrote:
> On Thu, Jul 12, 2012 at 7:11 PM, Linus Torvalds
>  wrote:
> > On Thu, Jul 12, 2012 at 7:05 AM, Dave Jones  wrote:
> >>
> >> I've seen it a few times, always with the soft lockup trace.
> >
> > I bet it's because you have tons of modules, and the line ends up
> > being *really* long. And overflows LOG_LINE_MAX. I suspect something
> > odd happens.
> 
> Straight to the point. That's the issue, combined with the later
> safety range checks, we produce somehow "ugly" output.
> 
> We already flush the line out of the buffer when we can not add stuff
> anymore. The line length is then close to LOG_LINE_MAX, and we want to
> add a prefix with the timestamp during output and we reach the limit
> of LOG_LINE_MAX.
> 
> > There are tons of odd special cases for LOG_LINE_MAX, and I bet Kay
> > doesn't see it for the simple reason that he's not totally insane, and
> > hasn't loaded hundreds of modules.
> 
> Yeah, I just added a loop now that prints over-long continuation
> lines, and I can see parts of the effect Dave sees.
> 
> > Kay, I suspect the "continuation line" logic could easily have a rule like
> >
> >   "If the old line is already > 80 characters, do a line break here
> > and add TAB to the beginning of the new line"

This seems to fix the issue for me, which Dave has posted. I was able to
partly reproduce it by printing continuation lines larger than 1000
bytes. The overlong lines did not leave enough room for the later syslog
and timestamp prefixing.

I'll look into the automatic split of lines later, it is not as trivial
as this fix.

While we are at the over-long log line topic, Harald just pointed me to
this unrelated mail, sent an hour ago. :)

  "The "Modules linked in:" lines say only ,
  because the lines become too long and cause netconsole to eat most of
  the message."

  https://lkml.org/lkml/2012/7/12/484


Thanks,
Kay


From: Kay Sievers 
Subject: kmsg - properly print over-long continuation lines

Reserve PREFIX_MAX bytes in the LOG_LINE_MAX line when buffering a
continuation line, to be able to properly prefix the LOG_LINE_MAX
line with the syslog prefix and timestamp when printing it.

Reported-By: Dave Jones 
Signed-off-by: Kay Sievers 
---

 kernel/printk.c |   33 +++--
 1 file changed, 19 insertions(+), 14 deletions(-)

--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -235,7 +235,8 @@ static u32 log_next_idx;
 static u64 clear_seq;
 static u32 clear_idx;
 
-#define LOG_LINE_MAX 1024
+#define PREFIX_MAX 32
+#define LOG_LINE_MAX   1024 - PREFIX_MAX
 
 /* record buffer */
 #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
@@ -876,7 +877,7 @@ static size_t msg_print_text(const struc
 
if (buf) {
if (print_prefix(msg, syslog, NULL) +
-   text_len + 1>= size - len)
+   text_len + 1 >= size - len)
break;
 
if (prefix)
@@ -907,7 +908,7 @@ static int syslog_print(char __user *buf
struct log *msg;
int len = 0;
 
-   text = kmalloc(LOG_LINE_MAX, GFP_KERNEL);
+   text = kmalloc(LOG_LINE_MAX + PREFIX_MAX, GFP_KERNEL);
if (!text)
return -ENOMEM;
 
@@ -930,7 +931,8 @@ static int syslog_print(char __user *buf
 
skip = syslog_partial;
msg = log_from_idx(syslog_idx);
-   n = msg_print_text(msg, syslog_prev, true, text, LOG_LINE_MAX);
+   n = msg_print_text(msg, syslog_prev, true, text,
+  LOG_LINE_MAX + PREFIX_MAX);
if (n - syslog_partial <= size) {
/* message fits into buffer, move forward */
syslog_idx = log_next(syslog_idx);
@@ -969,7 +971,7 @@ static int syslog_print_all(char __user
char *text;
int len = 0;
 
-   text = kmalloc(LOG_LINE_MAX, GFP_KERNEL);
+   text = kmalloc(LOG_LINE_MAX + PREFIX_MAX, GFP_KERNEL);
if (!text)
return -ENOMEM;
 
@@ -1022,7 +1024,8 @@ static int syslog_print_all(char __user
struct log *msg = log_from_idx(idx);
int textlen;
 
-   textlen = msg_print_text(msg, prev, true, text, 
LOG_LINE_MAX);
+   textlen = msg_print_text(msg, prev, true, text,
+LOG_LINE_MAX + PREFIX_MAX);
if (textlen < 0) {
len = textlen;
break;
@@ -1367,15 +1370,15 @@ static struct cont {
bool flushed:1; /* buffer sealed and committed */

Re: [PATCH v2] devtmpfs: mount with noexec and nosuid

2012-11-20 Thread Kay Sievers

On Tue, Nov 20, 2012 at 9:42 PM, Kees Cook  wrote:
> Since devtmpfs is writable, make the default noexec,nosuid as well. This
> protects from the case of a privileged process having an arbitrary file
> write flaw and an argumentless arbitrary execution (i.e. it would lack
> the ability to run "mount -o remount,exec,suid /dev").

This only really applies to systems without an initramfs when the
kernel mounts /dev over the rootfs it has mounted; with an initramfs,
/dev is always mounted by user code.

Just checking, that is the use case you are doing that for?

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC] driver-core: Remove dummy 'platform_bus'

2012-11-22 Thread Kay Sievers

On Wed, Nov 21, 2012 at 3:52 PM, Greg Kroah-Hartman
 wrote:
> On Wed, Nov 21, 2012 at 02:44:31PM +, Grant Likely wrote:
>> The "platform_bus" (note: not platform_bus_type) only exists as an empty
>> directory to put platform devices into. However, it really doesn't make
>> sense to segregate all the platform devices into a sub directory when
>> typically they are memory mapped devices that doen't go through any
>> particular bus. Particularly on embedded type platforms the platform_bus
>> directory doesn't add anything.
>>
>> However, this will probably just end up breaking some userspace that
>> depends on the /sys/devices/platform/ path to be present (no matter how
>> much we protest that userspace must not depend on paths in sysfs). So
>> while I'm seriously proposing this change, it may just be unacceptable
>> ABI breakage
>
> If the devices don't show up under platform/ where are they going to be
> at now, virtual/ ?  That doesn't sound like a good plan, they should be
> somewhere "useful".

Just a note to keep in mind: We usually need and want devices to have
a bus or class. Devices without a "subsystem" are invisible to udev,
and do not get proper coldplug support at bootup.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: /proc/kmsg giving eof on blocking read

2012-11-22 Thread Kay Sievers

On Thu, Nov 22, 2012 at 11:44 AM, Jan Schmidt  wrote:
> I'm currently debugging something in btrfs in good old printk style, 
> generating
> around 10MB/min. I'm seeing /proc/kmsg returning eof on a blocking read (and,
> side note, syslog-ng won't reopen it, effectively stopping logging kernel
> messages silently).

Are you sure there is not something else that opens the same file?
Even once might be enough to return 0. The too simple locking logic in
/proc/kmsg cannot support multiple readers properly, it never did.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC] driver-core: Remove dummy 'platform_bus'

2012-11-23 Thread Kay Sievers

On Thu, Nov 22, 2012 at 10:20 PM, Grant Likely
 wrote:
> On Thu, Nov 22, 2012 at 7:17 PM, Kay Sievers  wrote:
>> On Wed, Nov 21, 2012 at 3:52 PM, Greg Kroah-Hartman 
>>  wrote:

>>> If the devices don't show up under platform/ where are they going to be
>>> at now, virtual/ ?  That doesn't sound like a good plan, they should be
>>> somewhere "useful".
>>
>> Just a note to keep in mind: We usually need and want devices to have
>> a bus or class. Devices without a "subsystem" are invisible to udev,
>> and do not get proper coldplug support at bootup.
>
> Note: this patch is only about the "platform_bus" dummy device. It has
> nothing to do with platform_bus_type.

Ah, I see now.

Why do you want to remove the "platform_bus" fake-parent, entirely? I
understand and agree that drivers should not fiddle with that
directly. But I don't see a real reason why it should not be private
to the platform_bus_type. It's nothing really wrong with it, I guess.

I have on x86:
  coretemp.0 -> ../../../devices/platform/coretemp.0
  dock.0 -> ../../../devices/platform/dock.0
  dock.1 -> ../../../devices/platform/dock.1
  efifb.0 -> ../../../devices/platform/efifb.0
  i8042 -> ../../../devices/platform/i8042
  iTCO_wdt -> ../../../devices/pci:00/:00:1f.0/iTCO_wdt
  pcspkr -> ../../../devices/platform/pcspkr
  serial8250 -> ../../../devices/platform/serial8250
  thinkpad_acpi -> ../../../devices/platform/thinkpad_acpi
  thinkpad_hwmon -> ../../../devices/platform/thinkpad_hwmon

which look pretty reasonable in a "platform parent" instead of ending
up in virtual/.

We should probably remove the explicit assignment in the drivers like
your patch does, unexport "platform_bus" from the core, so nobody can
use it anymore in the future. The /sys/bus/platform/devices/ directory
can then be created on-demand only, with the first registration of a
device without a parent, we do that in other parts of the core
already.

Wouldn't that solve your problem and still do not touch any other
stuff visible in /sys?

Also, it seems there are devices without any subsystem in the list of
things your patch touches? That is really not how things should work.
All devices should have a bus or class, so userspace receives proper
events, and can enumerate them.
The /sys/devices/ hierarchy is not really meant to be used directly,
it can for some devices even change at runtime. It's not considered a
stable or reasonable predictable ABI, all lookups should start in the
flat device symlink lists of the class/ and bus/, and not in the
hierarchical tree in devices/.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] printk: drop ambiguous LOG_CONT flag

2012-09-27 Thread Kay Sievers

On Thu, Sep 27, 2012 at 12:33 AM, "Jan H. Schönherr"
 wrote:
> Am 26.09.2012 23:15, schrieb Greg Kroah-Hartman:
>> On Wed, Sep 26, 2012 at 07:58:45PM +0200, Jan H. Schönherr wrote:
>>> Against v3.6-rc7, only lightly tested.
>>
>> Well, against linux-next and highly tested would be best.  It's a bit
>> late to get this into linux-next for 3.7, how important is it really?
>
> There are no conflicting commits in linux-next, so it should apply there
> as well.
>
> "Tested" as in: it fixes my use case: multiple printk()s shortly after each
> other -- with KERN_prefix but without a newline at the end. Those were
> sometimes concatenated since that printk-rewrite.

Please provide the output of /dev/kmsg so we can see what's going on here.

> All other printk()s that I come across more often look as usual, before and
> after the patch. (Mostly singular printk()s, but I also checked the output
> from the oom-killer.)
>
> There is no need to include this hastily -- at least not from my point of view
> -- as it is already broken in 3.5 and nobody else seems to notice it
> (... and I have now a fix for my development printk()s). Should I resend the
> patch later?
>
> I was also hoping that Kay might share his opinion, as the LOG_CONT
> flag is rather young, and he might have some different plans for it.

It is a flag that we have not been able to merge a continuation line
in the buffer, because we had a race with another thread, or the
console lock was taken for a long time and we couldn't use the merge
buffer.

LOG_CONT is used to merge (not intended to be) separate records at
time we read them from the record buffer, and print them, it is also
exported as a flag in /dev/kmsg.

I don't think we can just remove it, we can not get that information
from the prefix+newlines, they are not sufficient.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] printk: drop ambiguous LOG_CONT flag

2012-09-27 Thread Kay Sievers

On Thu, Sep 27, 2012 at 5:46 PM, "Jan H. Schönherr"
 wrote:
> Am 27.09.2012 15:39, schrieb Kay Sievers:

>> It is a flag that we have not been able to merge a continuation line
>> in the buffer, because we had a race with another thread, or the
>> console lock was taken for a long time and we couldn't use the merge
>> buffer.
>
> But it is also set, if we don't know yet, whether there is going to
> be a continuation (printk.c, line 1583):
>
> log_store(facility, level, lflags | LOG_CONT, 0,
>   dict, dictlen, text, text_len);
>
> This confuses devkmsg_read() and msg_print_text() later on.

Yeah, I can see that here too. Tested your patch and seems to behave
well with a bunch of other tests I still have available. Looks good
and worth to try:
  Tested-By: Kay Sievers 

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] printk: drop ambiguous LOG_CONT flag

2012-09-28 Thread Kay Sievers

On Fri, Sep 28, 2012 at 10:25 AM, Jan H. Schönherr
 wrote:

>> If really, really everything passes through vprintk_emit()
>> then we could keep all info about the previous message
>> there and definitely decide whether the current message continues
>> the previous one.
>>
>> Then, we wouldn't need to track the previous flags everywhere.
>
> Here is a patch that does just that.
>
> Seems to work. And it makes the code easier to understand.
> A more detailed description is below.

> -   if (!(lflags & LOG_NEWLINE)) {
> -   /*
> -* Flush the conflicting buffer. An earlier newline was 
> missing,
> -* or another task also prints continuation lines.
> -*/
> -   if (cont.len && (lflags & LOG_PREFIX || cont.owner != 
> current))
> -   cont_flush(LOG_NEWLINE);
> +   cont_add(facility, level, lflags, dict, dictlen, text, text_len);

That fails the racing task test, and a cont user that was nicely
merged before is now all in separate records.

It seems, unconditionally using the cont buffer like in your patch,
for all incoming messages just makes the entire cont merge buffer
dance useless when it comes to races.

The current behaviour has the advantage, that non-cont users will not
race against a cont user (which is like 99.x% of the races I expect).
The cont buffer is currently only used when we expect a cont user,
non-cont users happening in the middle of a cont-print will not flush
the and disturb the cont buffer.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] printk: drop ambiguous LOG_CONT flag

2012-09-28 Thread Kay Sievers

On Fri, Sep 28, 2012 at 4:49 PM, "Jan H. Schönherr"
 wrote:
> Am 28.09.2012 16:34, schrieb Kay Sievers:
>> On Fri, Sep 28, 2012 at 10:25 AM, Jan H. Schönherr

>> The current behaviour has the advantage, that non-cont users will not
>> race against a cont user (which is like 99.x% of the races I expect).
>> The cont buffer is currently only used when we expect a cont user,
>> non-cont users happening in the middle of a cont-print will not flush
>> the and disturb the cont buffer.
>
> That should be fixable by using a second set of flags, owner, and so on
> within vprintk... I still think, that getting rid of of remotely tracking
> the flags is worth something.

Sure, sounds fine in theory.

> (Ideally, we should also be able to correctly reassemble multiple
> simultaneous cont users. But that it still a bit out of scope I think.)

Yeah, that's messy, and not really worth it, I guess. We enter per-cpu
data and in_interrupt() territory with that, which is not really worth
the complexity for cont printing. The one separated cont buffer works
pretty well already.

> Given that I'm able to fix the racing case, would you be in favor of
> this approach, or should we stick to the earlier version?

I'm open to everything that makes sense. Let's see how it looks and we
can decide when we have something that passes the tests.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3] printk: Have printk() never buffer its data

2012-07-06 Thread Kay Sievers

On Fri, Jul 6, 2012 at 12:46 PM, Kay Sievers  wrote:
> On Fri, Jul 6, 2012 at 5:47 AM, Michael Neuling  wrote:
>
>>> 4,89,24561;NIP: c0048164 LR: c0048160 CTR: 
>>> 4,90,24576;REGS: c0007e59fb50 TRAP: 0700   Tainted: GW 
>>> (3.5.0-rc4-mikey)
>>> 4,91,24583;MSR: 90021032
>>> 4,92,24586;<
>>> 4,93,24591;SF
>>> 4,94,24596;,HV
>>> 4,95,24601;,ME
>>> 4,96,24606;,IR
>>> 4,97,24611;,DR
>>> 4,98,24616;,RI
>>> 4,99,24619;>
>>> 4,100,24628;  CR: 2842  XER: 2200
>>
>> FWIW, compiling with the parent commit gives this:
>>
>> 4,89,1712;NIP: c0048164 LR: c0048160 CTR: 
>> 4,90,1713;REGS: c0007e59fb50 TRAP: 0700   Tainted: GW 
>> (3.5.0-rc4-mikey)
>> 4,91,1716;MSR: 90021032   CR: 2282  XER: 
>> 0200
>
> Hmm, I don't understand, which parent commit do you mean? You maybe
> mean without 084681d?
>
> I think it's a race of the two CPUs printing continuation lines, and
> the continuation buffer is still occupied with data from one CPU and
> not available to the other one at the same time.
>
> What you see is likely not the direct output to the console (that
> would work) but the replay of the stored buffer when the console is
> registered. Because the cont buffer was still busy with one CPU, the
> other thread needs to store the continuation line prints in individual
> records, which leads to the (unwanted) printed newlines when
> replaying.
>
> The data we store looks all fine, it just looks needlessly separated
> when we replay fromt he buffer on a newly registered boot console. We
> need to merge the lines in the output, so they *look* like they are
> all in one line. I'll work on a fix for that now.

It could be that the console semaphore is still help by the other CPU,
for whatever reason, when your box runs into this situation.

Mind pasting more context (/dev/kmsg) of the log when this happens,
not only the one line that get split-up?

Is this possibly during an oops or backtrace going on when you see
this? Which code calls show_regs() here?

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Regression - /proc/kmsg does not (always) block for 1-byte reads

2012-07-07 Thread Kay Sievers

On Fri, Jul 6, 2012 at 10:30 PM, Linus Torvalds
 wrote:
> Kay, this needs to be fixed.
>
> Suggested fix: just use the 'seq_printf()' interfaces, which do the
> proper buffering, and allow any size reads of various packetized data.

I'll have a look.

> Of course, I'd also suggest that whoever was the genius who thought it
> was a good idea to read things ONE F*CKING BYTE AT A TIME with system
> calls for each byte should be retroactively aborted. Who the f*ck does
> idiotic things like that? How did they noty die as babies, considering
> that they were likely too stupid to find a tit to suck on?

Maybe the bs=1 in the dd call stands for bullshit. :)

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Regression - /proc/kmsg does not (always) block for 1-byte reads

2012-07-07 Thread Kay Sievers

On Sat, Jul 7, 2012 at 12:05 AM, Jukka Ollila  wrote:
> And I did a little digging. According to the Debian package tracking
> system[1] it would seem that the _stable_ distro carries a version
> that doesn't do the dd shuffling at all and probably runs its klogd as
> root, reading /proc/kmsg directly. That may or may not work with
> 3.5-rc kernels, depending on how big its reads are. I'm CCing the
> listed maintainer just in case.
>
> The unstable version does the problematic dd bs=1 trick. Also the
> Ubuntu diff in the PTS has the dd. But I have no idea how Ubuntu does
> it's release management. Not to mention other derivatives.

Just a note, unrelated to fixing the issue:

/proc/kmsg can be opened many times, but it never behaved too well
when this was done.

The processes might wake up at the same time, but only one of them
gets the data. If someone does a "cat /proc/kmsg' in parallel, the
read() in dd might return 0 and dd will exit. And this is not a recent
change, it was always the case.

Using dd here is a pretty silly idea for many reasons.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Regression - /proc/kmsg does not (always) block for 1-byte reads

2012-07-07 Thread Kay Sievers

On Sat, 2012-07-07 at 23:19 +0200, Kay Sievers wrote:
> On Fri, Jul 6, 2012 at 10:30 PM, Linus Torvalds
>  wrote:
> > Kay, this needs to be fixed.
> >
> > Suggested fix: just use the 'seq_printf()' interfaces, which do the
> > proper buffering, and allow any size reads of various packetized data.
> 
> I'll have a look.

Hmm, we need to block in the read() when we have no data, and we need to
support concurrent readers where only one of them sees the data, and we
need O_NONBLOCK support. Maybe I miss something but the seq_file stuff
seems to get complicated, as it takes a mutex internally which gets in
the way of the O_NONBLOCK stuff.

Here is what seems to work for me. If the buffer is to small to fit the
first record, we deliver a partial record, and start from that offset
again with the next read().

I'll need to do more testing tomorrow.

Thanks,
Kay

---
 kernel/printk.c |   28 
 1 file changed, 20 insertions(+), 8 deletions(-)

--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -217,6 +217,7 @@ static DEFINE_RAW_SPINLOCK(logbuf_lock);
 /* the next printk record to read by syslog(READ) or /proc/kmsg */
 static u64 syslog_seq;
 static u32 syslog_idx;
+static size_t syslog_partial;
 
 /* index and sequence number of the first record stored in the buffer */
 static u64 log_first_seq;
@@ -890,22 +891,33 @@ static int syslog_print(char __user *buf
 
while (size > 0) {
size_t n;
+   size_t skip;
 
raw_spin_lock_irq(&logbuf_lock);
if (syslog_seq < log_first_seq) {
/* messages are gone, move to first one */
syslog_seq = log_first_seq;
syslog_idx = log_first_idx;
+   syslog_partial = 0;
}
if (syslog_seq == log_next_seq) {
raw_spin_unlock_irq(&logbuf_lock);
break;
}
+
+   skip = syslog_partial;
msg = log_from_idx(syslog_idx);
n = msg_print_text(msg, true, text, LOG_LINE_MAX);
-   if (n <= size) {
+   if (n - syslog_partial <= size) {
+   /* message fits into buffer, move forward */
syslog_idx = log_next(syslog_idx);
syslog_seq++;
+   n -= syslog_partial;
+   syslog_partial = 0;
+   } else if (!len){
+   /* partial read(), remember position */
+   n = size;
+   syslog_partial += n;
} else
n = 0;
raw_spin_unlock_irq(&logbuf_lock);
@@ -913,17 +925,15 @@ static int syslog_print(char __user *buf
if (!n)
break;
 
-   len += n;
-   size -= n;
-   buf += n;
-   n = copy_to_user(buf - n, text, n);
-
-   if (n) {
-   len -= n;
+   if (copy_to_user(buf, text + skip, n)) {
if (!len)
len = -EFAULT;
break;
}
+
+   len += n;
+   size -= n;
+   buf += n;
}
 
kfree(text);
@@ -1107,6 +1117,7 @@ int do_syslog(int type, char __user *buf
/* messages are gone, move to first one */
syslog_seq = log_first_seq;
syslog_idx = log_first_idx;
+   syslog_partial = 0;
}
if (from_file) {
/*
@@ -1129,6 +1140,7 @@ int do_syslog(int type, char __user *buf
idx = log_next(idx);
seq++;
}
+   error -= syslog_partial;
}
raw_spin_unlock_irq(&logbuf_lock);
break;


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Bug 44211 - /proc/kmsg does not (always) block for 1-byte reads

2012-07-08 Thread Kay Sievers

On Fri, Jul 6, 2012 at 7:55 PM, Greg KH  wrote:
> On Fri, Jul 06, 2012 at 08:45:44PM +0300, Jukka Ollila wrote:
>> Hello,
>>
>> A few days ago I filed a kernel regression report concerning a change
>> in /proc/kmsg behaviour with short reads:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=44211
>>
>> The comments suggest that this is probably intentional, but that it
>> would be best make sure that the current semantics wrt short reads are
>> as intended.
>>
>> The problem appears on a Debian (unstable) system that drains
>> /proc/kmsg into a separate fifo read by klogd(8):
>>
>> /bin/dd bs=1 if=/proc/kmsg  of=/var/run/klogd/kmsg
>>
>> With the recent kernel logging changes this /bin/dd exits immediately,
>> as 1-byte reads are shorter than any log message could possibly be and
>> read() returns 0. No dd feeding the fifo results in no logging and a
>> rather unhappy klogd on the reading end of /var/run/klogd/kmsg.
>>
>> I suppose a safe solution is to only do reads that are big enough for
>> any single kernel message, but this is still a change that affects
>> user space being shipped, so some might find it surprising.
>>
>> I don't know what other distros do. Is it just Debian being the odd one out?
>
> I think we just fixed this, what kernel version are you seeing this
> problem on?
>
> Kay did your other patches that I just accepted resolve this?

No, it's not fixed. We avoided the delivery of partial messages to
userspace, which obviously does not got too well with 1-byte reads. :)

I posted a patch in the other thread:
  https://lkml.org/lkml/2012/7/7/118

Side note:
The patch will not fix the underlying problem, but just make it behave
more like it was and allow partial message reads. This is a years old
problem, the net is full of bugreports of stuff going wrong with
running dd bs=1 on /proc/kmsg. It is a really stupid idea, and can not
work for many other reasons too. The interface can not safely be used
that way, it does not have the usual semantics, it always returned 0
for read() whenever it needed to.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Bug 44211 - /proc/kmsg does not (always) block for 1-byte reads

2012-07-08 Thread Kay Sievers

On Sun, Jul 8, 2012 at 2:59 PM, Alan Cox  wrote:
>> The patch will not fix the underlying problem, but just make it behave
>> more like it was and allow partial message reads. This is a years old
>> problem, the net is full of bugreports of stuff going wrong with
>> running dd bs=1 on /proc/kmsg. It is a really stupid idea, and can not
>> work for many other reasons too. The interface can not safely be used
>> that way, it does not have the usual semantics, it always returned 0
>> for read() whenever it needed to.
>
> If you are breaking the semantics perhaps that should also get fixed ?

I hopefully just restored the old semantics now.

Fixing it properly would be a bigger code change, and it can't use the
far-too-simple tunneling through the syslog() syscall to feed
/proc/kmsg.

If the seq_file interface could be used, that would probably be the
best option, but I have no good idea how to make blocking reads, and
concurrent non-blocks work with the seq_file stuff.

This is how read() in /proc/kmsg works and it is not protected by a
lock or anything, and there is a not too small window between the
check and the action. Things just go wrong if there is more than a
single reader, but that was the case since forever.

static ssize_t kmsg_read(struct file *file, char __user *buf,
 size_t count, loff_t *ppos)
{
if ((file->f_flags & O_NONBLOCK) &&
!do_syslog(SYSLOG_ACTION_SIZE_UNREAD, NULL, 0, SYSLOG_FROM_FILE))
return -EAGAIN;
return do_syslog(SYSLOG_ACTION_READ, buf, count, SYSLOG_FROM_FILE);
}

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3] printk: Have printk() never buffer its data

2012-07-08 Thread Kay Sievers

On Sat, 2012-07-07 at 07:04 +1000, Michael Neuling wrote:
> Whole kmsg below.

I guess I have an idea now what's going on.

> 4,47,0;WARNING: at 
> /scratch/mikey/src/linux-ozlabs/arch/powerpc/sysdev/xics/xics-common.c:105
> 4,51,0;MSR: 90021032   CR: 2442  XER: 2200
> 4,54,0;TASK = c0b2dd80[0] 'swapper/0' THREAD: c0c24000 CPU: 0

This is the warning on CPU#1, all fine, all in one line.

> 6,74,0;console [tty0] enabled
> 6,75,0;console [hvc0] enabled

Now the boot consoles are registered, which replays the whole buffer
that was collected up to this point. During the entire time the console
semaphore needs to be held, and this can be quite a while.

> 4,87,24545;WARNING: at 
> /scratch/mikey/src/linux-ozlabs/arch/powerpc/sysdev/xics/xics-common.c:105
> \4,91,24586;MSR: 90021032 
> 4,92,24590;<
> 4,93,24594;SF
> 4,94,24599;,HV
> 4,95,24604;,ME
> 4,96,24609;,IR
> 4,97,24614;,DR
> 4,98,24619;,RI
> 4,99,24623;>
> 4,104,24661; CPU: 1

At the same time the CPU#2 prints the same warning with a continuation
line, but the buffer from CPU#1 can not be flushed to the console, nor
can the continuation line printk()s from CPU#2 be merged at this point.
The consoles are still locked and busy with replaying the old log
messages, so the new continuation data is just stored away in the record
buffer as it is coming in.
If the console would be registered a bit earlier, or the warning would
happen a bit later, we would probably not see any of this.

I can fake something like this just by holding the console semaphore
over a longer time and printing continuation lines with different CPUs
in a row.

The patch below seems to work for me. It is also here:

http://git.kernel.org/?p=linux/kernel/git/kay/patches.git;a=blob;f=kmsg-merge-cont.patch;hb=HEAD

It only applies cleanly on top of this patch:

http://git.kernel.org/?p=linux/kernel/git/kay/patches.git;a=blob;f=kmsg-syslog-1-byte-read.patch;hb=HEAD

Thanks,
Kay

Subject: kmsg: merge continuation records while printing

In (the unlikely) case our continuation merge buffer is busy, we unfortunately
can not merge further continuation printk()s into a single record and have to
store them separately, which leads to split-up output of these lines when they
are printed.

Add some flags about newlines and prefix existence to these records and try to
reconstruct the full line again, when the separated records are printed.
---
 kernel/printk.c |  119 
 1 file changed, 77 insertions(+), 42 deletions(-)

--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -194,8 +194,10 @@ static int console_may_schedule;
  */

 enum log_flags {
-   LOG_DEFAULT = 0,
-   LOG_NOCONS = 1, /* already flushed, do not print to console */
+   LOG_NOCONS  = 1,/* already flushed, do not print to console */
+   LOG_NEWLINE = 2,/* text ended with a newline */
+   LOG_PREFIX  = 4,/* text started with a prefix */
+   LOG_CONT= 8,/* text is a fragment of a continuation line */
 };

 struct log {
@@ -217,6 +219,7 @@ static DEFINE_RAW_SPINLOCK(logbuf_lock);
 /* the next printk record to read by syslog(READ) or /proc/kmsg */
 static u64 syslog_seq;
 static u32 syslog_idx;
+static enum log_flags syslog_prev;
 static size_t syslog_partial;

 /* index and sequence number of the first record stored in the buffer */
@@ -839,8 +842,8 @@ static size_t print_prefix(const struct
return len;
 }

-static size_t msg_print_text(const struct log *msg, bool syslog,
-char *buf, size_t size)
+static size_t msg_print_text(const struct log *msg, enum log_flags prev,
+bool syslog, char *buf, size_t size)
 {
const char *text = log_text(msg);
size_t text_size = msg->text_len;
@@ -849,6 +852,8 @@ static size_t msg_print_text(const struc
do {
const char *next = memchr(text, '\n', text_size);
size_t text_len;
+   bool prefix = true;
+   bool newline = true;

if (next) {
text_len = next - text;
@@ -858,19 +863,35 @@ static size_t msg_print_text(const struc
text_len = text_size;
}

+   if ((prev & LOG_CONT) && !(msg->flags & LOG_PREFIX))
+   prefix = false;
+
+   if (msg->flags & LOG_CONT) {
+   if ((prev & LOG_CONT) && !(prev & LOG_NEWLINE))
+   prefix = false;
+
+   if (!(msg->flags & LOG_NEWLINE))
+   newline = false;
+   }
+
if (buf) {
if (print_prefix(msg, syslog, NULL) +
text_len + 1>= size - len)
break;

-   len += print_prefix(msg, syslog, buf + len);
+   if (prefix)

1 2 3 4 5 >

1 - 100 of 428 matches

Mail list logo