Folks,

after mulling this over for quite some time, I'm going to postpone the whole thing to 3.20.
That said, I have to say that I'm really happy with the outcome of this massive overhaul. I really want to thank everyone involved, especially Jiang, for their great work and help so far!!! The hierarchical irq domains really improve the code by disentangling the various subsystems, and the arm[64] use cases prove that it was the right decision. We're almost there with x86, but my gut feeling tells me that pushing it now is too risky. I'd rather have quiet holidays for all of us than the nagging fear that the post-holiday inbox will be full of obscure bug reports, followed by a chase-and-bandaid race which would kill the well-earned recreation in an instant. This will block other things in that area for a while, but it's the only sane decision at the moment, unless Linus insists on pulling the lot and promises to deal with the fallout. :)

The reasons why I decided to do so are:

- The bugs we found in the last week. That tells me that there is more stuff lurking.

- The already existing mess in some areas which got unearthed by this work in the last week. That definitely needs a thorough cleanup and not more bandaids.

- Lack of proper debugging features. Sending out per-issue debug patches simply does not scale.

- It's not bisectable, and unfortunately there are too many fixes to various places to make manual bisection feasible.

For 3.20 I want to proceed in the following way:

- Apply all bug fixes to x86/apic

- Address the issues with the resource management (and elsewhere) properly on top

- Add a proper debugging mechanism (the existing irqdomain debugfs interface is completely useless)

For the hierarchical domains we really want two things:

1) A debugfs interface which lets us introspect the hierarchy.

   I was working on that before I got dragged into bug chasing and merge window frenzy. For proper introspection down to the hardware level this requires either domain/irq_chip specific callbacks or some unified way to track the current state. The latter is painful as it requires storing information redundantly, so having domain/chip callbacks to retrieve the state is the right solution. Most chip/domain implementations cache their [hardware] state already, so providing an accessor which converts that into a common data format is the best way. If the callback is not implemented, then the information is either not available or not relevant. (A rough sketch of such a callback follows below, after the tracepoint list.)

   I'm not going to have a per domain/chip seqfile print function, as that is just a complete waste. Pretty printing obscure hardware information does not help the general user much. I'd rather have the raw data plus proper post-processing tools which provide the pretty printed information than bloat the kernel binary with random and possibly useless seq_print functions. Another reason why I want just raw binary data is that I want to use exactly the same mechanism for tracing. See below.

   After looking at the various new domain/chip implementations, 16 bytes of storage space are sufficient for this, but that's a minor detail.

   To provide a proper translation into pretty printed values we can do the following: create a new section for storing such data and put a data structure there which describes the content of the buffer. That section goes into a separate file and is not linked into the kernel binary. Simple enough for tools to pick up and for bug reporters to use/provide. If the stupid file is not available, we can still recreate it from source and translate the hex dump. And in most cases the pure hexdump will be sufficient for the people who actually need to look at this.

2) Proper tracepoint support, so we can actually track the allocations and the hardware accesses at the various domain levels, because some of these issues cannot be decoded by looking at a state snapshot in debugfs. With some of them we can't even access debugfs at all.

   One issue with that is that for the early boot process there is no way to store that information, as the tracer gets enabled way after init_IRQ(). But there is no reason why the tracer could not be enabled before that. All it needs is a working memory allocator. Steven?

   Now there is another class of problems which might be hard to debug: when the machine just boots into a hang, so we get no ftrace output, neither from an oops nor from a console. It would be nice if we could have a command line option which prints enabled tracepoints via (early_)printk. That would avoid sending out ad hoc printk debug patches which basically provide the same information as the tracepoints. It would be useful for other hard to debug boot hangs as well. Steven?

I think the above can be solved, so we need to agree on a proper set of tracepoints. I came up with the following list:

- trace_irqdomain_create(domain->id, domain->name, ...)

- trace_irqdomain_destroy(domain->id)

- trace_irqdomain_alloc(irq_data)

  struct irq_data contains all the relevant information for assigning the tracepoint data:

	__entry->virq	  = irq_data->irq;
	__entry->domainid = irq_data->domain->id;
	__entry->hwirq	  = irq_data->hwirq;
	TP_STORE_DATA(__entry->data, irq_data);

  where TP_STORE_DATA checks for the above callback and uses it if available; otherwise we just clear the data field. So this reuses the callback which we want for debugfs anyway. The print format is just a hexdump. See my rationale above.

- trace_irqdomain_free(virq, domain->id)

- trace_irqdomain_hw_access(irq_data)

  Same "data" and pretty printing argument as for trace_irqdomain_alloc(). The obvious place to put such a tracepoint is e.g. irq_chip_write_msi_msg(), where the callback records the currently written msi msg. (See the tracepoint sketch below.)
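To make 1) a bit more concrete, here is a minimal sketch of what the retrieval callback could look like. All the names (irq_get_state_data, IRQ_STATE_DATA_SIZE, irq_snapshot_state) are made up for illustration; none of this exists yet:

	#define IRQ_STATE_DATA_SIZE	16

	/*
	 * New optional callback in struct irq_chip: copies the cached
	 * hardware state into @buf (IRQ_STATE_DATA_SIZE bytes) in a
	 * chip defined raw format. Returns the number of valid bytes,
	 * 0 if no state is available.
	 */
	int (*irq_get_state_data)(struct irq_data *data, void *buf);

	/* Common helper, shared by the debugfs code and the tracepoints */
	static int irq_snapshot_state(struct irq_data *d, void *buf)
	{
		struct irq_chip *chip = irq_data_get_irq_chip(d);

		memset(buf, 0, IRQ_STATE_DATA_SIZE);
		if (!chip || !chip->irq_get_state_data)
			return 0;
		return chip->irq_get_state_data(d, buf);
	}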
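The descriptor for translating the raw buffer into pretty printed values could be as simple as the sketch below. Again, struct irq_state_desc, the format string and the section name are made up:

	/*
	 * Describes the layout of the raw state buffer. Emitted into a
	 * section which is not linked into the kernel binary but shipped
	 * as a separate file for the post-processing tools.
	 */
	struct irq_state_desc {
		const char	*chip_name;
		const char	*fmt;	/* buffer layout, e.g. "rte:u64 dest:u32" */
	};

	static const struct irq_state_desc ioapic_state_desc
	__attribute__((section(".irq_state_desc"))) = {
		.chip_name	= "IO-APIC",
		.fmt		= "rte:u64 dest:u32",
	};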
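And a sketch of how trace_irqdomain_alloc() could look, reusing the helper and buffer size from the first sketch. domain->id is the proposed, not yet existing, field:

	#include <linux/tracepoint.h>

	/* Proposed helper: reuse the state callback from above */
	#define TP_STORE_DATA(dst, d)	irq_snapshot_state((d), (dst))

	TRACE_EVENT(irqdomain_alloc,

		TP_PROTO(struct irq_data *d),

		TP_ARGS(d),

		TP_STRUCT__entry(
			__field(unsigned int,	virq)
			__field(int,		domainid)
			__field(unsigned long,	hwirq)
			__array(u8, data, IRQ_STATE_DATA_SIZE)
		),

		TP_fast_assign(
			__entry->virq	  = d->irq;
			__entry->domainid = d->domain->id;	/* proposed field */
			__entry->hwirq	  = d->hwirq;
			TP_STORE_DATA(__entry->data, d);
		),

		/* Raw hexdump only; pretty printing is done by tools */
		TP_printk("virq=%u domain=%d hwirq=0x%lx data=%s",
			  __entry->virq, __entry->domainid, __entry->hwirq,
			  __print_hex(__entry->data, IRQ_STATE_DATA_SIZE))
	);

trace_irqdomain_free() and trace_irqdomain_hw_access() would look the same modulo the arguments.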
Once we have sorted that out, I'll push x86/apic into a separate git repository so the history is preserved. After that I'll redo x86/apic from scratch with proper ordering and all fixes folded into the right places, so the whole thing becomes bisectable.

Thoughts?

Thanks,

	Thomas