Hi all, This patch set adds context information to kernel messages from /dev/kmsg.
Two printk messages connected with KERN_CONT can be divided in multiple lines by a different process context message. This induces two problems. a) The owner problem If the different context message seems like the 1st divided message, it is difficult to understand which the 2nd divided message belongs to. This problem can also occur for the situation where multiple message lines without KERN_CONT are broken into by similar messages. When user tools handle printk messages, those mixed messages will create some inconveniences. - Example of the owner problem [110781.736171] sd 2:0:0:0: [sdb] [110781.736170] sd 3:0:0:0: [sdc] Unhandled sense code [110781.736172] sd 3:0:0:0: [sdc] [110781.736175] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE [110781.736177] sd 3:0:0:0: [sdc] [110781.736178] Sense Key : Medium Error [current] [110781.736187] Sense Key : Recovered Error b) The reconstruction problem Even if we solve the owner problem, we don't know the consecutive lines owned by a particular context are whether divided messages with KERN_CONT or simply separate lines from the beginning. - Example of the reconstruction problem [110781.736198] Read(10): 28 00 ...[snip]... [110781.736204] 00 [110781.736205] 00 11 e0 00 01 00 00 Which ones should be merged into on line? You may say the right answer, but it's difficult to automate. The expected result is as follows: [110781.736198] Read(10): 28 00 00 00 11 e0 00 01 00 00 We can consider three solutions for these problems: 1. Store continuous messages in a local buffer, and execute printk This does not induce fragment of messages. There are some related work about error messages in a SCSI layer. http://marc.info/?l=linux-scsi&m=135003083808290&w=2 http://comments.gmane.org/gmane.linux.scsi/88768 This approach will take time for acceptance activities because continuous printk is used in a lot of places. Moreover, this approach cannot solve the fragment message problem fundamentally, so kernel developers must always take care about continuous printk. Therefore, this solution takes a maintenance cost. 2. Add context information to all kernel messages This solves a) the owner problem because user tools can separate each owner message from mixed messages by the context information. If multiple message lines without KERN_CONT are broken into by similar messages, this is a good solution. However, this does not indicate which messages with KERN_CONT are fragmented, so this does not solve b) the reconstruction problem for those. 3. Apply 2. and add fragment flag to all fragmented kernel messages This solves a) and b) problems for multiple messages connected with KERN_CONT. The fragmented flag shows which messages with KERN_CONT are fragmented, so user tools can reconstruct to its original state by it after separating each owner message from mixed messages by the context information. This solves the fragment problem fundamentally by merging those fragment messages on userland, so this solution seems to be a right way. We choose this solution. Following description makes mention of only fragmented messages with KERN_CONT. This is because if the problems for fragmented messages with KERN_CONT are solved, the owner problem for the mixed messages without KERN_CONT is also solved. Fragment is induced by different process context messages, so we want to add process context information to each message in order to connect fragmented messages. CPU number is not sufficient because it can change while calling a series of printks in the preemptive kernel. On the other hand, we don't want to change results of dmesg and syslog messages. How do we tackle this problem? The key idea is to change the header of /dev/kmsg. According to the documentation of /dev/kmsg, we can add comma separated values before ';' as the header information of /dev/kmsg. So, we can change the header information based on the rule. This patch set solves this problem by following two approaches: 1. Add process context information (PID/interrupt) Fragment can occur when the process context is different from the previous context stored in the cont buffer. By applying this 1st patch, the context information is indicated to connect fragment messages. Moreover, if the message is output in an interrupt context, the information is also output as s(softIRQ)/h(hardIRQ)/n(NMI). 2. Introduce an exact fragment flag Current kernel tries to add 'c' flag to the 1st fragmented line and '+' to the following fragmented lines. However, these flags are just a hint; it doesn't guarantee that they are always correct because the kernel can't know the right answer without context information. Current kernel adds fragment flags by guess work, but a user tool can connect fragmented lines exactly by utilizing context information. So, current fragment flag is not appropriate after applying this patch, and we introduced new fragment flag 'f' instead of 'c' and '+'. To understand how to connect fragmented messages by changing the header of /dev/kmsg, we assume that each two kernel threads executes the following 3 printk(): pr_info("A1"); printk("A2"); printk("A3\n"); In current implementation, these threads can output as follows: 6,3321,31629088,-;A1 <== Who output this message? 6,3322,31629091,c;A1 | 4,3323,31629094,+;A2 <----+ 4,3324,31629095,+;A3 | Which? 4,3325,31629097,c;A2 <----+ 4,3326,31629099,+;A3 A1, A2, and A3 messages of two threads are clearly fragmented in this result, but we cannot understand who output the 1st line message and which messages were output by whom. So, the relation of divided messages is unclear from this test. Moreover, although the 4th fields of the header indicate fragment flags, the flag in the 1st line shows not 'c' (fragment) but '-' (no fragment). By applying this patch set, we can get a following result from /dev/kmsg: 6,7868,2070756113,f,1185/-;A1 6,7869,2070756116,f,1186/-;A1 4,7870,2070756118,f,1186/-;A2 4,7871,2070756119,-,1186/-;A3 4,7872,2070756120,f,1185/-;A2 4,7873,2070756122,-,1185/-;A3 1st line indicates fragmented line, PID=1185, and no interrupt context. 2nd line indicates fragmented line, PID=1186, and no interrupt context. So, 1st line will connect to 6th line and 7th line by fragment flags and PID. On the other hand, 2nd line will connect to 3rd line and 4th line. A message merging tool can convert these messages as follows: 6,7868,2070756113,-,1185/-;A1A2A3 /* thread1 */ 6,7869,2070756116,-,1186/-;A1A2A3 /* thread2 */ Thanks! --- Yoshihiro YUNOMAE (2): printk: Add context information to the header of /dev/kmsg printk: Simplify fragmented line information of the header of /dev/kmsg Documentation/ABI/testing/dev-kmsg | 34 ++++++++---- kernel/printk/printk.c | 104 +++++++++++++++++++++++++----------- 2 files changed, 95 insertions(+), 43 deletions(-) -- Yoshihiro YUNOMAE Software Platform Research Dept. Linux Technology Center Hitachi, Ltd., Yokohama Research Laboratory E-mail: yoshihiro.yunomae...@hitachi.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/