Re: sctp: hang in sctp_remaddr_seq_show
On Fri, Mar 15, 2013 at 12:34:00PM -0400, Sasha Levin wrote: > Hi all, > > While fuzzing with trinity inside a KVM tools guest running latest -next > kernel, > I've stumbled on an interesting hang related to sctp. > > After some fuzzing, I get a hang message about procfs not able to grab a hold > of a mutex for one of the files: > > [ 375.900309] INFO: task trinity-child21:7178 blocked for more than 120 > seconds. > [ 375.901767] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 375.903022] trinity-child21 D 88009b9f74a8 5336 7178 7121 > 0x0004 > [ 375.904211] 88009ba79cb8 0002 8800abf48690 > 8800abf48690 > [ 375.905972] 880075308000 88009c798000 88009ba79cb8 > 001d7140 > [ 375.907263] 88009c798000 88009ba79fd8 001d7140 > 001d7140 > [ 375.908987] Call Trace: > [ 375.909415] [] __schedule+0x2e9/0x3b0 > [ 375.910795] [] schedule+0x55/0x60 > [ 375.911611] [] schedule_preempt_disabled+0x13/0x20 > [ 375.912644] [] __mutex_lock_common+0x36d/0x5a0 > [ 375.913986] [] ? seq_read+0x3a/0x3d0 > [ 375.914838] [] ? seq_read+0x3a/0x3d0 > [ 375.916187] [] ? seq_lseek+0x110/0x110 > [ 375.917075] [] mutex_lock_nested+0x3f/0x50 > [ 375.918005] [] seq_read+0x3a/0x3d0 > [ 375.919124] [] ? delay_tsc+0xdd/0x110 > [ 375.920916] [] ? seq_lseek+0x110/0x110 > [ 375.921794] [] ? seq_lseek+0x110/0x110 > [ 375.922966] [] proc_reg_read+0x201/0x230 > [ 375.923870] [] ? proc_reg_write+0x230/0x230 > [ 375.924820] [] vfs_read+0xb5/0x180 > [ 375.925668] [] SyS_read+0x50/0xa0 > [ 375.926476] [] tracesys+0xe1/0xe6 > [ 375.940223] 1 lock held by trinity-child21/7178: > [ 375.940985] #0: (&p->lock){+.+.+.}, at: [] > seq_read+0x3a/0x3d > > Digging deeper, there's another thread that's stuck holding that lock: > > [ 381.880804] trinity-child97 R running task 4856 9490 7121 > 0x0004 > [ 381.882064] 880084cad708 0002 8800bbbd71f8 > 8800bbbd71f8 > [ 381.883341] 8800b949 88009a5e8000 880084cad708 > 001d7140 > [ 381.884652] 88009a5e8000 880084cadfd8 001d7140 > 001d7140 > [ 381.885977] Call Trace: > [ 381.886392] [] __schedule+0x2e9/0x3b0 > [ 381.887238] [] preempt_schedule+0x44/0x70 > [ 381.888175] [] ? sctp_remaddr_seq_show+0x2da/0x2f0 > [ 381.889314] [] local_bh_enable+0x128/0x140 > [ 381.890292] [] sctp_remaddr_seq_show+0x2da/0x2f0 > [ 381.891397] [] ? sctp_remaddr_seq_show+0x27/0x2f0 > [ 381.892448] [] ? seq_read+0x3a/0x3d0 > [ 381.893308] [] traverse+0xe0/0x1f0 > [ 381.894135] [] ? seq_lseek+0x110/0x110 > [ 381.895100] [] ? seq_lseek+0x110/0x110 > [ 381.896074] [] seq_read+0x5c/0x3d0 > [ 381.896912] [] ? delay_tsc+0xdd/0x110 > [ 381.897796] [] ? seq_lseek+0x110/0x110 > [ 381.898734] [] ? seq_lseek+0x110/0x110 > [ 381.899629] [] proc_reg_read+0x201/0x230 > [ 381.900592] [] ? proc_reg_write+0x230/0x230 > [ 381.901614] [] ? proc_reg_write+0x230/0x230 > [ 381.902543] [] do_loop_readv_writev+0x4b/0x90 > [ 381.903480] [] do_readv_writev+0xf6/0x1d0 > [ 381.904456] [] ? kvm_clock_read+0x38/0x70 > [ 381.905460] [] vfs_readv+0x3e/0x60 > [ 381.906306] [] default_file_splice_read+0x1e1/0x320 > [ 381.907365] [] ? deactivate_slab+0x7d6/0x820 > [ 381.908412] [] ? deactivate_slab+0x7d6/0x820 > [ 381.909404] [] ? __lock_release+0xf1/0x110 > [ 381.910435] [] ? deactivate_slab+0x7d6/0x820 > [ 381.911483] [] ? do_raw_spin_unlock+0xc8/0xe0 > [ 381.912478] [] ? _raw_spin_unlock+0x30/0x60 > [ 381.913438] [] ? deactivate_slab+0x7d6/0x820 > [ 381.914478] [] ? alloc_pipe_info+0x3e/0xa0 > [ 381.915504] [] ? alloc_pipe_info+0x3e/0xa0 > [ 381.916461] [] ? alloc_pipe_info+0x3e/0xa0 > [ 381.917418] [] ? __slab_alloc.isra.34+0x2ed/0x31f > [ 381.918533] [] ? trace_hardirqs_on_caller+0x168/0x1a0 > [ 381.919637] [] ? debug_check_no_locks_freed+0xf5/0x130 > [ 381.920799] [] ? page_cache_pipe_buf_release+0x20/0x20 > [ 381.921963] [] do_splice_to+0x83/0xb0 > [ 381.922849] [] splice_direct_to_actor+0xce/0x1c0 > [ 381.923874] [] ? do_splice_from+0xb0/0xb0 > [ 381.924855] [] do_splice_direct+0x48/0x60 > [ 381.925865] [] do_sendfile+0x165/0x310 > [ 381.926763] [] ? trace_hardirqs_on+0xd/0x10 > [ 381.927722] [] SyS_sendfile64+0x8a/0xc0 > [ 381.928702] [] ? tracesys+0x7e/0xe6 > [ 381.929556] [] tracesys+0xe1/0xe6 > > It has gone to sleep while holding the proc mutex we're trying to acquire and > never woke up! > > Looking at the code, we see: > > rcu_read_unlock(); > read_unlock(&head->lock); > sctp_local_bh_enable(); <--- here > > It hangs on local_bh_enable(). > > It seems that local_bh_enable() calls preempt_check_resched() which never > gets woken up. Why? I have no clue. > > It's also pretty reproducible with sctp. > I'm not sure why the process would never get back to the schedule, but looking at the sct
Re: sctp: hang in sctp_remaddr_seq_show
On Mon, Mar 18, 2013 at 11:31:06AM -0400, Vlad Yasevich wrote: > On 03/18/2013 11:25 AM, Eric Dumazet wrote: > >On Mon, 2013-03-18 at 07:04 -0400, Neil Horman wrote: > > > >>I'm not sure why the process would never get back to the schedule, but > >>looking > >>at the sctp_remaddr_seq_show function, I think that we should convert this > >>sequence: > >>sctp_local_bh_disable(); > >>read_lock(&head->lock); > >>rcu_read_lock(); > >> > >>to this: > >>read_lock(&head->lock); > >>rcu_read_lock_bh(); > >> > >>Neil > > > >I dont think so. > > > >BH needs to be disabled before read_lock(&head->lock); > > > >or else, write_lock() could deadlock (assuming it can be called from BH) > > > > > > If anything, this should probably be done like this: > > rcu_read_lock(); > read_lock_bh(&head->lock) > ... > > read_unlock_bh(&head->lock) > rcu_read_unlock(); > Vlads, right. We need to grab the rcu lock before the read lock, but we should probably use the rcu_read_lock_bh variant, since we're going to disable bottom halves anyway. Neil > -vlad > -- > To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] [char] random: fix priming of last_data
On Tue, Mar 19, 2013 at 12:18:09PM -0400, Jarod Wilson wrote: > Commit ec8f02da9e added priming of last_data per fips requirements. > Unfortuantely, it did so in a way that can lead to multiple threads all > incrementing nbytes, but only one actually doing anything with the extra > data, which leads to some fun random corruption and panics. > > The fix is to simply do everything needed to prime last_data in a single > shot, so there's no window for multiple cpus to increment nbytes -- in > fact, we won't even increment or decrement nbytes anymore, we'll just > extract the needed EXTRACT_BYTES one time per pool and then carry on with > the normal routine. > > All these changes have been tested across multiple hosts and architectures > where panics were previously encoutered. The code changes are are strictly > limited to areas only touched when when booted in fips mode. > > This change should also go into 3.8-stable, to make the myriads of fips > users on 3.8.x happy. > > Tested-by: Jan Stancek > Tested-by: Jan Stodola > CC: Herbert Xu > CC: Neil Horman > CC: "David S. Miller" > CC: Matt Mackall > CC: "Theodore Ts'o" > CC: linux-cry...@vger.kernel.org > CC: sta...@vger.kernel.org > Signed-off-by: Jarod Wilson > --- > drivers/char/random.c | 30 +++--- > 1 files changed, 15 insertions(+), 15 deletions(-) > > diff --git a/drivers/char/random.c b/drivers/char/random.c > index 85e81ec..15e5a2b 100644 > --- a/drivers/char/random.c > +++ b/drivers/char/random.c > @@ -953,10 +953,23 @@ static ssize_t extract_entropy(struct entropy_store *r, > void *buf, > { > ssize_t ret = 0, i; > __u8 tmp[EXTRACT_SIZE]; > + unsigned long flags; > > /* if last_data isn't primed, we need EXTRACT_SIZE extra bytes */ > - if (fips_enabled && !r->last_data_init) > - nbytes += EXTRACT_SIZE; > + if (fips_enabled) { > + spin_lock_irqsave(&r->lock, flags); > + if (!r->last_data_init) { > + r->last_data_init = true; > + spin_unlock_irqrestore(&r->lock, flags); > + trace_extract_entropy(r->name, EXTRACT_SIZE, > + r->entropy_count, _RET_IP_); > + xfer_secondary_pool(r, EXTRACT_SIZE); > + extract_buf(r, tmp); > + spin_lock_irqsave(&r->lock, flags); > + memcpy(r->last_data, tmp, EXTRACT_SIZE); > + } > + spin_unlock_irqrestore(&r->lock, flags); > + } > > trace_extract_entropy(r->name, nbytes, r->entropy_count, _RET_IP_); > xfer_secondary_pool(r, nbytes); > @@ -966,19 +979,6 @@ static ssize_t extract_entropy(struct entropy_store *r, > void *buf, > extract_buf(r, tmp); > > if (fips_enabled) { > - unsigned long flags; > - > - > - /* prime last_data value if need be, per fips 140-2 */ > - if (!r->last_data_init) { > - spin_lock_irqsave(&r->lock, flags); > - memcpy(r->last_data, tmp, EXTRACT_SIZE); > - r->last_data_init = true; > - nbytes -= EXTRACT_SIZE; > - spin_unlock_irqrestore(&r->lock, flags); > - extract_buf(r, tmp); > - } > - > spin_lock_irqsave(&r->lock, flags); > if (!memcmp(tmp, r->last_data, EXTRACT_SIZE)) > panic("Hardware RNG duplicated output!\n"); > -- > 1.7.1 > > Acked-by: Neil Horman -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] net: remove redundant ifdef CONFIG_CGROUPS
On Thu, Mar 21, 2013 at 10:54:51AM +0800, Li Zefan wrote: > The cgroup code has been surrounded by ifdef CONFIG_NET_CLS_CGROUP > and CONFIG_NETPRIO_CGROUP. > > Signed-off-by: Li Zefan > --- > net/core/sock.c | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/net/core/sock.c b/net/core/sock.c > index b261a79..a19e728 100644 > --- a/net/core/sock.c > +++ b/net/core/sock.c > @@ -1298,7 +1298,6 @@ static void sk_prot_free(struct proto *prot, struct > sock *sk) > module_put(owner); > } > > -#ifdef CONFIG_CGROUPS > #if IS_ENABLED(CONFIG_NET_CLS_CGROUP) > void sock_update_classid(struct sock *sk, struct task_struct *task) > { > @@ -1321,7 +1320,6 @@ void sock_update_netprioidx(struct sock *sk, struct > task_struct *task) > } > EXPORT_SYMBOL_GPL(sock_update_netprioidx); > #endif > -#endif > > /** > * sk_alloc - All socket objects are allocated here > -- > 1.8.0.2 > > Acked-by: Neil Horman -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 60/62] sctp: convert to idr_alloc()
On Sat, Feb 02, 2013 at 05:21:01PM -0800, Tejun Heo wrote: > Convert to the much saner new idr interface. > > Only compile tested. > > Signed-off-by: Tejun Heo > Cc: Vlad Yasevich > Cc: Sridhar Samudrala > Cc: Neil Horman > Cc: linux-s...@vger.kernel.org > --- > This patch depends on an earlier idr changes and I think it would be > best to route these together through -mm. Please holler if there's > any objection. Thanks. > Makes sense to me Acked-by: Neil Horman > net/sctp/associola.c | 27 +++ > 1 file changed, 11 insertions(+), 16 deletions(-) > > diff --git a/net/sctp/associola.c b/net/sctp/associola.c > index b45ed1f..a9962e4 100644 > --- a/net/sctp/associola.c > +++ b/net/sctp/associola.c > @@ -1592,32 +1592,27 @@ int sctp_assoc_lookup_laddr(struct sctp_association > *asoc, > /* Set an association id for a given association */ > int sctp_assoc_set_id(struct sctp_association *asoc, gfp_t gfp) > { > - int assoc_id; > - int error = 0; > + int ret; > > /* If the id is already assigned, keep it. */ > if (asoc->assoc_id) > - return error; > -retry: > - if (unlikely(!idr_pre_get(&sctp_assocs_id, gfp))) > - return -ENOMEM; > + return 0; > > + idr_preload(gfp); > spin_lock_bh(&sctp_assocs_id_lock); > - error = idr_get_new_above(&sctp_assocs_id, (void *)asoc, > - idr_low, &assoc_id); > - if (!error) { > - idr_low = assoc_id + 1; > + ret = idr_alloc(&sctp_assocs_id, asoc, idr_low, 0, GFP_NOWAIT); > + if (ret >= 0) { > + idr_low = ret + 1; > if (idr_low == INT_MAX) > idr_low = 1; > } > spin_unlock_bh(&sctp_assocs_id_lock); > - if (error == -EAGAIN) > - goto retry; > - else if (error) > - return error; > + idr_preload_end(); > + if (ret < 0) > + return ret; > > - asoc->assoc_id = (sctp_assoc_t) assoc_id; > - return error; > + asoc->assoc_id = (sctp_assoc_t)ret; > + return 0; > } > > /* Free the ASCONF queue */ > -- > 1.8.1 > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 60/62] sctp: convert to idr_alloc()
On Mon, Feb 04, 2013 at 08:42:38AM -0800, Tejun Heo wrote: > Convert to the much saner new idr interface. > > Only compile tested. > > v2: Don't preload if @gfp doesn't contain __GFP_WAIT as the function > may be being called from non-process ocntext. Also, add a comment > explaining @idr_low never becomes zero. > > Signed-off-by: Tejun Heo > Acked-by: Neil Horman > Cc: Vlad Yasevich > Cc: Sridhar Samudrala > Cc: linux-s...@vger.kernel.org > --- > net/sctp/associola.c | 31 +++ > 1 file changed, 15 insertions(+), 16 deletions(-) > I dont' think the documentation was really needed, but the interrupt context fix was a good catch Acked-by: Neil Horman -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 76/77] sctp: convert to idr_alloc()
On Wed, Feb 06, 2013 at 11:40:48AM -0800, Tejun Heo wrote: > Convert to the much saner new idr interface. > > Only compile tested. > > v2: Don't preload if @gfp doesn't contain __GFP_WAIT as the function > may be being called from non-process ocntext. Also, add a comment > explaining @idr_low never becomes zero. > > Signed-off-by: Tejun Heo > Acked-by: Neil Horman > Cc: Vlad Yasevich > Cc: Sridhar Samudrala > Cc: linux-s...@vger.kernel.org > --- > net/sctp/associola.c | 31 +++ > 1 file changed, 15 insertions(+), 16 deletions(-) > > diff --git a/net/sctp/associola.c b/net/sctp/associola.c > index b45ed1f..0c9a791 100644 > --- a/net/sctp/associola.c > +++ b/net/sctp/associola.c > @@ -1592,32 +1592,31 @@ int sctp_assoc_lookup_laddr(struct sctp_association > *asoc, > /* Set an association id for a given association */ > int sctp_assoc_set_id(struct sctp_association *asoc, gfp_t gfp) > { > - int assoc_id; > - int error = 0; > + bool preload = gfp & __GFP_WAIT; > + int ret; > > /* If the id is already assigned, keep it. */ > if (asoc->assoc_id) > - return error; > -retry: > - if (unlikely(!idr_pre_get(&sctp_assocs_id, gfp))) > - return -ENOMEM; > + return 0; > > + if (preload) > + idr_preload(gfp); > spin_lock_bh(&sctp_assocs_id_lock); > - error = idr_get_new_above(&sctp_assocs_id, (void *)asoc, > - idr_low, &assoc_id); > - if (!error) { > - idr_low = assoc_id + 1; > + /* 0 is not a valid id, idr_low is always >= 1 */ > + ret = idr_alloc(&sctp_assocs_id, asoc, idr_low, 0, GFP_NOWAIT); > + if (ret >= 0) { > + idr_low = ret + 1; > if (idr_low == INT_MAX) > idr_low = 1; > } > spin_unlock_bh(&sctp_assocs_id_lock); > - if (error == -EAGAIN) > - goto retry; > - else if (error) > - return error; > + if (preload) > + idr_preload_end(); > + if (ret < 0) > + return ret; > > - asoc->assoc_id = (sctp_assoc_t) assoc_id; > - return error; > + asoc->assoc_id = (sctp_assoc_t)ret; > + return 0; > } > > /* Free the ASCONF queue */ > -- > 1.8.1 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Acked-by: Neil Horman -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] irq: add quirk for broken interrupt remapping on 55XX chipsets
On Wed, Apr 03, 2013 at 05:53:27PM -0600, Bjorn Helgaas wrote: > [+cc David and iommu list, Yinghai, Jiang] > > On Mon, Mar 4, 2013 at 12:04 PM, Neil Horman wrote: > > A few years back intel published a spec update: > > http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf > > > > For the 5520 and 5500 chipsets which contained an errata (specificially > > errata > > 53), which noted that these chipsets can't properly do interrupt remapping, > > and > > as a result the recommend that interrupt remapping be disabled in bios. > > While > > many vendors have a bios update to do exactly that, not all do, and of > > course > > not all users update their bios to a level that corrects the problem. As a > > result, occasionally interrupts can arrive at a cpu even after affinity for > > that > > interrupt has be moved, leading to lost or spurrious interrupts (usually > > characterized by the message: > > kernel: do_IRQ: 7.71 No irq handler for vector (irq -1) > > > > There have been several incidents recently of people seeing this error, and > > investigation has shown that they have system for which their BIOS level is > > such > > that this feature was not properly turned off. As such, it would be good to > > give them a reminder that their systems are vulnurable to this problem. > > > > Signed-off-by: Neil Horman > > CC: Prarit Bhargava > > CC: Don Zickus > > CC: Don Dutile > > CC: Bjorn Helgaas > > CC: Asit Mallick > > CC: linux-...@vger.kernel.org > > > > --- > > > > Change notes: > > > > v2) > > > > * Moved the quirk to the x86 arch, since consensus seems to be that the 55XX > > chipset series is x86 only. I decided however to keep the quirk as a > > regular > > quirk, not an early_quirk. Early quirks have no way currently to determine > > if > > BIOS has properly disabled the feature in the iommu, at least not without > > significant hacking, and since its quite possible this will be a short lived > > quirk, should Don Z's workaround code prove successful (and it looks like > > it may > > well), I don't think that necessecary. > > > > * Removed the WARNING banner from the quirk, and added the HW_ERR token to > > the > > string, I opted to leave the newlines in place however, as I really couldnt > > find a way to keep the text on a single line is still legible from a code > > perspective. I think theres enough language in there that using cscope on > > just > > about any substring however will turn it up, and again, this may be a short > > lived quirk. > > --- > > arch/x86/kernel/quirks.c | 18 ++ > > include/linux/pci_ids.h | 2 ++ > > 2 files changed, 20 insertions(+) > > > > diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c > > index 26ee48a..a718ea2 100644 > > --- a/arch/x86/kernel/quirks.c > > +++ b/arch/x86/kernel/quirks.c > > @@ -5,6 +5,7 @@ > > #include > > > > #include > > +#include "../../../drivers/iommu/irq_remapping.h" > > > > #if defined(CONFIG_X86_IO_APIC) && defined(CONFIG_SMP) && > > defined(CONFIG_PCI) > > > > @@ -567,3 +568,20 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, > > PCI_DEVICE_ID_AMD_15H_NB_F5, > > quirk_amd_nb_node); > > > > #endif > > + > > +static void intel_remapping_check(struct pci_dev *dev) > > +{ > > + u8 revision; > > + > > + pci_read_config_byte(dev, PCI_REVISION_ID, &revision); > > + > > + if ((revision == 0x13) && irq_remapping_enabled) { > > +pr_warn(HW_ERR "This system BIOS has enabled interrupt > > remapping\n" > > +"on a chipset that contains an errata making > > that\n" > > +"feature unstable. Please reboot with > > nointremap\n" > > +"added to the kernel command line and contact\n" > > +"your BIOS vendor for an update"); > > + } > > +} > > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, > > PCI_DEVICE_ID_INTEL_5520_IOHUB, intel_remapping_check); > > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, > > PCI_DEVICE_ID_INTEL_5500_IOHUB, intel_remapping_check); > > This started as an IOMMU change, and I'm not an expert in that area, > so I added David and the IOM
[PATCH v3] irq: add quirk for broken interrupt remapping on 55XX chipsets
A few years back intel published a spec update: http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf For the 5520 and 5500 chipsets which contained an errata (specificially errata 53), which noted that these chipsets can't properly do interrupt remapping, and as a result the recommend that interrupt remapping be disabled in bios. While many vendors have a bios update to do exactly that, not all do, and of course not all users update their bios to a level that corrects the problem. As a result, occasionally interrupts can arrive at a cpu even after affinity for that interrupt has be moved, leading to lost or spurrious interrupts (usually characterized by the message: kernel: do_IRQ: 7.71 No irq handler for vector (irq -1) There have been several incidents recently of people seeing this error, and investigation has shown that they have system for which their BIOS level is such that this feature was not properly turned off. As such, it would be good to give them a reminder that their systems are vulnurable to this problem. Signed-off-by: Neil Horman CC: Prarit Bhargava CC: Don Zickus CC: Don Dutile CC: Bjorn Helgaas CC: Asit Mallick CC: David Woodhouse CC: linux-...@vger.kernel.org --- Change notes: v2) * Moved the quirk to the x86 arch, since consensus seems to be that the 55XX chipset series is x86 only. I decided however to keep the quirk as a regular quirk, not an early_quirk. Early quirks have no way currently to determine if BIOS has properly disabled the feature in the iommu, at least not without significant hacking, and since its quite possible this will be a short lived quirk, should Don Z's workaround code prove successful (and it looks like it may well), I don't think that necessecary. * Removed the WARNING banner from the quirk, and added the HW_ERR token to the string, I opted to leave the newlines in place however, as I really couldnt find a way to keep the text on a single line is still legible from a code perspective. I think theres enough language in there that using cscope on just about any substring however will turn it up, and again, this may be a short lived quirk. v3) * removed defines from pci_ids.h, and used direct id values as per request from Bjorn. --- arch/x86/kernel/quirks.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c index 26ee48a..7c99675 100644 --- a/arch/x86/kernel/quirks.c +++ b/arch/x86/kernel/quirks.c @@ -5,6 +5,7 @@ #include #include +#include "../../../drivers/iommu/irq_remapping.h" #if defined(CONFIG_X86_IO_APIC) && defined(CONFIG_SMP) && defined(CONFIG_PCI) @@ -567,3 +568,20 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_15H_NB_F5, quirk_amd_nb_node); #endif + +static void intel_remapping_check(struct pci_dev *dev) +{ + u8 revision; + + pci_read_config_byte(dev, PCI_REVISION_ID, &revision); + + if ((revision == 0x13) && irq_remapping_enabled) { +pr_warn(HW_ERR "This system BIOS has enabled interrupt remapping\n" +"on a chipset that contains an errata making that\n" +"feature unstable. Please reboot with nointremap\n" +"added to the kernel command line and contact\n" +"your BIOS vendor for an update"); + } +} +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x3406, intel_remapping_check); +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x3403, intel_remapping_check); -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] irq: add quirk for broken interrupt remapping on 55XX chipsets
On Thu, Apr 04, 2013 at 03:27:29PM +0100, David Woodhouse wrote: > On Wed, 2013-04-03 at 17:53 -0600, Bjorn Helgaas wrote: > > ); > > > + > > > + if ((revision == 0x13) && irq_remapping_enabled) { > > > +pr_warn(HW_ERR "This system BIOS has enabled interrupt > > > remapping\n" > > > +"on a chipset that contains an errata making > > > that\n" > > > +"feature unstable. Please reboot with > > > nointremap\n" > > > +"added to the kernel command line and contact\n" > > > +"your BIOS vendor for an update"); > > This should be WARN_TAINT(TAINT_FIRMWARE_WORKAROUND). And 'an erratum'. > Ok, copy that. I'll repost shortly > -- > dwmw2 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4] irq: add quirk for broken interrupt remapping on 55XX chipsets
A few years back intel published a spec update: http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf For the 5520 and 5500 chipsets which contained an errata (specificially errata 53), which noted that these chipsets can't properly do interrupt remapping, and as a result the recommend that interrupt remapping be disabled in bios. While many vendors have a bios update to do exactly that, not all do, and of course not all users update their bios to a level that corrects the problem. As a result, occasionally interrupts can arrive at a cpu even after affinity for that interrupt has be moved, leading to lost or spurrious interrupts (usually characterized by the message: kernel: do_IRQ: 7.71 No irq handler for vector (irq -1) There have been several incidents recently of people seeing this error, and investigation has shown that they have system for which their BIOS level is such that this feature was not properly turned off. As such, it would be good to give them a reminder that their systems are vulnurable to this problem. Signed-off-by: Neil Horman CC: Prarit Bhargava CC: Don Zickus CC: Don Dutile CC: Bjorn Helgaas CC: Asit Mallick CC: David Woodhouse CC: linux-...@vger.kernel.org --- Change notes: v2) * Moved the quirk to the x86 arch, since consensus seems to be that the 55XX chipset series is x86 only. I decided however to keep the quirk as a regular quirk, not an early_quirk. Early quirks have no way currently to determine if BIOS has properly disabled the feature in the iommu, at least not without significant hacking, and since its quite possible this will be a short lived quirk, should Don Z's workaround code prove successful (and it looks like it may well), I don't think that necessecary. * Removed the WARNING banner from the quirk, and added the HW_ERR token to the string, I opted to leave the newlines in place however, as I really couldnt find a way to keep the text on a single line is still legible from a code perspective. I think theres enough language in there that using cscope on just about any substring however will turn it up, and again, this may be a short lived quirk. v3) * Removed defines from pci_ids.h, and used direct id values as per request from Bjorn. v4) * Converted pr_warn to WARN_TAINT(TAINT_FIRMWARE_WORKAROUND) as per David Woodhouse --- arch/x86/kernel/quirks.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c index 26ee48a..eb0785d 100644 --- a/arch/x86/kernel/quirks.c +++ b/arch/x86/kernel/quirks.c @@ -5,6 +5,7 @@ #include #include +#include "../../../drivers/iommu/irq_remapping.h" #if defined(CONFIG_X86_IO_APIC) && defined(CONFIG_SMP) && defined(CONFIG_PCI) @@ -567,3 +568,20 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_15H_NB_F5, quirk_amd_nb_node); #endif + +static void intel_remapping_check(struct pci_dev *dev) +{ + u8 revision; + + pci_read_config_byte(dev, PCI_REVISION_ID, &revision); + + WARN_TAINT(((revision == 0x13) && irq_remapping_enabled), + TAINT_FIRMWARE_WORKAROUND, + "This system BIOS has enabled interrupt remapping\n" + "on a chipset that contains an erratum making that\n" + "feature unstable. Please reboot with nointremap\n" + "added to the kernel command line and contact\n" + "your BIOS vendor for an update"); +} +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x3406, intel_remapping_check); +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x3403, intel_remapping_check); -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] irq: add quirk for broken interrupt remapping on 55XX chipsets
On Thu, Apr 04, 2013 at 08:57:06AM -0600, Bjorn Helgaas wrote: > On Thu, Apr 4, 2013 at 8:50 AM, Neil Horman wrote: > > On Thu, Apr 04, 2013 at 03:27:29PM +0100, David Woodhouse wrote: > >> On Wed, 2013-04-03 at 17:53 -0600, Bjorn Helgaas wrote: > >> > ); > >> > > + > >> > > + if ((revision == 0x13) && irq_remapping_enabled) { > >> > > +pr_warn(HW_ERR "This system BIOS has enabled > >> > > interrupt remapping\n" > >> > > +"on a chipset that contains an errata making > >> > > that\n" > >> > > +"feature unstable. Please reboot with > >> > > nointremap\n" > >> > > +"added to the kernel command line and > >> > > contact\n" > >> > > +"your BIOS vendor for an update"); > >> > >> This should be WARN_TAINT(TAINT_FIRMWARE_WORKAROUND). And 'an erratum'. > >> > > Ok, copy that. I'll repost shortly > > When you do, please include URLs for any problem reports or bugzillas you > have. > Well, those are going to be vendor specific, so I'm not sure we can really do that, at least not in any meaningful way. > I assume Windows "just works" in this situation? No more or less than linux does in this case. The Intel provided errata indicates that the only acceptable workaround is to disable remapping in the BIOS, so I would presume that if a windows system has a BIOS that doesn't implement this fix, its just as exposed as we are. Neil -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] irq: add quirk for broken interrupt remapping on 55XX chipsets
On Thu, Apr 04, 2013 at 11:14:30AM -0600, Bjorn Helgaas wrote: > On Thu, Apr 4, 2013 at 9:39 AM, Neil Horman wrote: > > On Thu, Apr 04, 2013 at 08:57:06AM -0600, Bjorn Helgaas wrote: > >> On Thu, Apr 4, 2013 at 8:50 AM, Neil Horman wrote: > >> > On Thu, Apr 04, 2013 at 03:27:29PM +0100, David Woodhouse wrote: > >> >> On Wed, 2013-04-03 at 17:53 -0600, Bjorn Helgaas wrote: > >> >> > ); > >> >> > > + > >> >> > > + if ((revision == 0x13) && irq_remapping_enabled) { > >> >> > > +pr_warn(HW_ERR "This system BIOS has enabled > >> >> > > interrupt remapping\n" > >> >> > > +"on a chipset that contains an errata > >> >> > > making that\n" > >> >> > > +"feature unstable. Please reboot with > >> >> > > nointremap\n" > >> >> > > +"added to the kernel command line and > >> >> > > contact\n" > >> >> > > +"your BIOS vendor for an update"); > >> >> > >> >> This should be WARN_TAINT(TAINT_FIRMWARE_WORKAROUND). And 'an erratum'. > >> >> > >> > Ok, copy that. I'll repost shortly > >> > >> When you do, please include URLs for any problem reports or bugzillas you > >> have. > >> > > Well, those are going to be vendor specific, so I'm not sure we can really > > do > > that, at least not in any meaningful way. > > Sorry, I don't understand your point. It's useful to know who > reported it (e.g., for future testers) and what happened and what > bugzillas it solved. Of course it applies only to machines with this > chipset and certain BIOS revisions. > Oh, you want the bug report that I'm fixing this against? Sure, I can do that. I thought you wanted me to include a url in the WARN_TAINT, with which user could report occurances of this bug. Yeah, the bug that this is reported in is: https://bugzilla.redhat.com/show_bug.cgi?id=887006 Its standing in for about a dozen or so variants of this issue we've seen > >> I assume Windows "just works" in this situation? > > No more or less than linux does in this case. The Intel provided errata > > indicates that the only acceptable workaround is to disable remapping in the > > BIOS, so I would presume that if a windows system has a BIOS that doesn't > > implement this fix, its just as exposed as we are. > > It sounds like the effect of this bug is that on Linux, devices may > not work at all because of lost interrupts. Either Windows must never > enable remapping (so it never sees the bug), or it must be designed in > a way that tolerates the problem. I can't believe these machines > shipped with Windows certification if devices didn't work correctly. > I don't know what to tell you here. We explicitly asked intel about this, and there was an effort to recode the interrupt migration code to properly manage this situation, and intels response was "No, just disable it in BIOS", so we're telling people to disable it in BIOS. You'd have to ask somebody at Microsoft what Windows did or did not do about this problem. > Either way, I don't understand why we can't make the quirk just fix > this. Booting with "nointremap" only sets disable_irq_remap, which is > only used by irq_remapping_supported(). Early quirks are run before > irq_remapping_supported () is ever called, so an early quirk ought to > be just as effective as the command line option. Here's the relevant > call tree I see: > > start_kernel > setup_arch > parse_early_param > early_quirks > rest_init > ... > > > The x86 setup_arch() does call generic_apic_probe(), but as far as I > can tell, none of the APIC .probe() methods reference > disable_irq_remap, so that doesn't look like a problem. > Well, I can give it a try, but I'm sure something went wrong last time I did. Regardless, theres also the security issue to consider here - namely that disabling irq remapping opens up users of virt to a possible security bug (potential irq injection). Some users may wish to live with the remapping error, given that error typically leads to devices that need to be restarted/reset to start working again, rather than live with the security hole. I rather like the warning, that gives users a choice, but I'll spin up a version that just disables it if you would rather. Neil -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] irq: add quirk for broken interrupt remapping on 55XX chipsets
On Thu, Apr 04, 2013 at 12:41:27PM -0600, Bjorn Helgaas wrote: > On Thu, Apr 4, 2013 at 11:51 AM, Neil Horman wrote: > > > Oh, you want the bug report that I'm fixing this against? Sure, I can do > > that. > > I thought you wanted me to include a url in the WARN_TAINT, with which user > > could report occurances of this bug. Yeah, the bug that this is reported > > in is: > > https://bugzilla.redhat.com/show_bug.cgi?id=887006 > > > > Its standing in for about a dozen or so variants of this issue we've seen > > Exactly -- I'm just hoping for something in the changelog. BTW, this > particular bugzilla is not public. > Ok, that I can do, I'll get the bz fixed up to be public in a bit. > > Regardless, theres also the security issue to consider here - namely that > > disabling irq remapping opens up users of virt to a possible security bug > > (potential irq injection). Some users may wish to live with the remapping > > error, given that error typically leads to devices that need to be > > restarted/reset to start working again, rather than live with the security > > hole. > > I rather like the warning, that gives users a choice, but I'll spin up a > > version > > that just disables it if you would rather. > > I don't believe users will want to make a choice like that or even be > sophisticated enough to do it, at least not based on something in > dmesg. I'm pretty sure I'm not :) > > The only supportable thing I can imagine doing would be: > > - Disable interrupt remapping if this chipset defect is present, so > devices work reliably (they don't need whatever restart/reset you > referred to above). > - Disable virt functionality when interrupt remapping is disabled to > avoid the security problem (I don't know the details of this.) > - Add a command-line option to enable interrupt remapping (I think > "intremap=on" is currently parsed too early, but maybe this could be > reworked so the option could override the quirk disable). > - Add release notes saying "boot with 'intremap=on' if you want the > virt functionality and can accept unreliable devices." > > That way the default behavior is safe and reliable (though perhaps > lacking some functionality), and you have told the user a way to get > safe and unreliable operation if he's willing to accept that. At > least, that's what I think I would want if I were in RH's shoes. > Theres a long argument behind this, and I'm with you. At the least I don't see a problem with disabling it upstream, at least not a policy-oriented reason. That said, having looked back at my notes, I do now recall a technical impediment behind disabling interrupt remapping. If we were to do this in an early quirk, we would need to determine the status of the interrupt remapping capability flag in the iommu in question. Looking at the interrupt remapping code, it appears that the capability flag isn't directly contained in the pci config space, but rather in a memory mapped address range who's base address is parsed out of an acpi table. Since we check the irq_remapping_enabled flag in the current quirk (which is set after the early quirks run), to do this in an early quirk, we would need to somehow parse that base address register out of the available acpi tables ourselves, and I'm not at all sure how to do that. Do you know if theres some available parsing mechanism that early in boot? If so, I can probably make this happen. Neil > Bjorn > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4] irq: add quirk for broken interrupt remapping on 55XX chipsets
On Thu, Apr 04, 2013 at 10:40:07AM -0700, Yinghai Lu wrote: > On Thu, Apr 4, 2013 at 10:27 AM, Don Dutile wrote: > >> You need to move the quirk to early_quirk to append nointremap to > >> avoid extra rebooting. > >> > > The pci-dev's of all the (minimally, root, 5500-chipset) pci-dev's are > > known/scanned/created by that time? > > in arch/x86/kernel/early-quirk.c > > and on top of > https://git.kernel.org/cgit/linux/kernel/git/yinghai/linux-yinghai.git/commit/?h=for-x86-early-quirk-usb&id=de38757e964cfee20e6da1977572a2191d7f4aa0 > > You could add one entry in early_qrk[]. > > Some one already try to use that path to disable x2apic on some thinkpad. > > So it should work on nointrremap too. > See my last email to Bjorn. Doing this in early-quirks in such a way that we can detect an iommu that has interrupt remapping enabled (so we don't just unilaterally print this quirk all the time) requires that we be able to parse acpi tables very early in the boot. If you know of how to do that, I can make this happen. If not, I suppose another alternative would be to have the early quirk set a flag that tells us this is a bogus chip, and if we try to enable irq remapping with that flag set, we should fail, and report the error at that time, but I'm not sure I like that solution. Neil > Thanks > > Yinghai > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4] irq: add quirk for broken interrupt remapping on 55XX chipsets
On Thu, Apr 04, 2013 at 02:11:54PM -0700, Yinghai Lu wrote: > On Thu, Apr 4, 2013 at 1:33 PM, Bjorn Helgaas wrote: > >> See my last email to Bjorn. Doing this in early-quirks in such a way that > >> we > >> can detect an iommu that has interrupt remapping enabled (so we don't just > >> unilaterally print this quirk all the time) requires that we be able to > >> parse > >> acpi tables very early in the boot. If you know of how to do that, I can > >> make > >> this happen. If not, I suppose another alternative would be to have the > >> early > >> quirk set a flag that tells us this is a bogus chip, and if we try to > >> enable irq > >> remapping with that flag set, we should fail, and report the error at that > >> time, > >> but I'm not sure I like that solution. > > > > I like that solution :) It seems very simple -- you don't have to > > parse any tables or anything. > > You are right, we don't need to parse any acpi tables. > > just add one quirk in early-quirk.c to set >disable_irq_remap = 1; > Well, I can't just do that. We need to issue a warning to the user as well, and to do so conditionally (we don't want to warn users who have prorperly updated BIOSes), I would need to know if irq remapping is actually on or not, which would require parsing ACPI tables But, as noted above, I can just set a flag, and defer the printing of the warning until later in the boot process, when we know that information already. Bjorn seems on board with that idea, so I'll spin up a patch for it in the AM. Thanks! Neil > Thanks > > Yinghai > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v5] irq: add quirk for broken interrupt remapping on 55XX chipsets
A few years back intel published a spec update: http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf For the 5520 and 5500 chipsets which contained an errata (specificially errata 53), which noted that these chipsets can't properly do interrupt remapping, and as a result the recommend that interrupt remapping be disabled in bios. While many vendors have a bios update to do exactly that, not all do, and of course not all users update their bios to a level that corrects the problem. As a result, occasionally interrupts can arrive at a cpu even after affinity for that interrupt has be moved, leading to lost or spurrious interrupts (usually characterized by the message: kernel: do_IRQ: 7.71 No irq handler for vector (irq -1) There have been several incidents recently of people seeing this error, and investigation has shown that they have system for which their BIOS level is such that this feature was not properly turned off. As such, it would be good to give them a reminder that their systems are vulnurable to this problem. Signed-off-by: Neil Horman CC: Prarit Bhargava CC: Don Zickus CC: Don Dutile CC: Bjorn Helgaas CC: Asit Mallick CC: David Woodhouse CC: linux-...@vger.kernel.org --- Change notes: v2) * Moved the quirk to the x86 arch, since consensus seems to be that the 55XX chipset series is x86 only. I decided however to keep the quirk as a regular quirk, not an early_quirk. Early quirks have no way currently to determine if BIOS has properly disabled the feature in the iommu, at least not without significant hacking, and since its quite possible this will be a short lived quirk, should Don Z's workaround code prove successful (and it looks like it may well), I don't think that necessecary. * Removed the WARNING banner from the quirk, and added the HW_ERR token to the string, I opted to leave the newlines in place however, as I really couldnt find a way to keep the text on a single line is still legible from a code perspective. I think theres enough language in there that using cscope on just about any substring however will turn it up, and again, this may be a short lived quirk. v3) * Removed defines from pci_ids.h, and used direct id values as per request from Bjorn. v4) * Converted pr_warn to WARN_TAINT(TAINT_FIRMWARE_WORKAROUND) as per David Woodhouse v5) * Moved check to an early quirk, and flagged the broken chip, so we could reasonably disable irq remapping during bootup. --- arch/x86/kernel/early-quirks.c | 25 + arch/x86/kernel/quirks.c | 2 ++ drivers/iommu/irq_remapping.c | 12 drivers/iommu/irq_remapping.h | 1 + 4 files changed, 40 insertions(+) diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c index 3755ef4..bfa3139 100644 --- a/arch/x86/kernel/early-quirks.c +++ b/arch/x86/kernel/early-quirks.c @@ -192,6 +192,27 @@ static void __init ati_bugs_contd(int num, int slot, int func) } #endif +#ifdef CONFIG_IRQ_REMAP +static void __init intel_remapping_check(int num, int slot, int func) +{ + u8 revision; + + revision = pci_read_config_byte(num, slot, func , PCI_REVISION_ID); + + /* +* Revision 0x13 of this chipset supports irq remapping +* but has an erratum that breaks its behavior, flag it as such +*/ + if (revision == 0x13) + irq_remap_broken = 1; + +} +#else +static void __init intel_remapping_check(int num, int slot, int func) +{ +} +#endif + #define QFLAG_APPLY_ONCE 0x1 #define QFLAG_APPLIED 0x2 #define QFLAG_DONE (QFLAG_APPLY_ONCE|QFLAG_APPLIED) @@ -221,6 +242,10 @@ static struct chipset early_qrk[] __initdata = { PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs }, { PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_SBX00_SMBUS, PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs_contd }, + { PCI_VENDOR_ID_INTEL, 0x3403, PCI_CLASS_BRIDGE_HOST, + PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check }, + { PCI_VENDOR_ID_INTEL, 0x3406, PCI_CLASS_BRIDGE_HOST, + PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check }, {} }; diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c index 26ee48a..2136844 100644 --- a/arch/x86/kernel/quirks.c +++ b/arch/x86/kernel/quirks.c @@ -5,6 +5,7 @@ #include #include +#include "../../../drivers/iommu/irq_remapping.h" #if defined(CONFIG_X86_IO_APIC) && defined(CONFIG_SMP) && defined(CONFIG_PCI) @@ -567,3 +568,4 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_15H_NB_F5, quirk_amd_nb_node); #endif + diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c index d56f8c1..2b56e92 100644 --- a/drivers/iommu/irq_remapping.c +++ b/drivers/iommu/irq_remapping.c @@ -19,6 +19,7 @@ int irq_remapping_enabled; int disable_irq_remap; +int irq_remap_bro
Re: [PATCH v5] irq: add quirk for broken interrupt remapping on 55XX chipsets
On Fri, Apr 05, 2013 at 03:25:54PM -0400, Neil Horman wrote: > A few years back intel published a spec update: > http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf > > For the 5520 and 5500 chipsets which contained an errata (specificially errata > 53), which noted that these chipsets can't properly do interrupt remapping, > and > as a result the recommend that interrupt remapping be disabled in bios. While > many vendors have a bios update to do exactly that, not all do, and of course > not all users update their bios to a level that corrects the problem. As a > result, occasionally interrupts can arrive at a cpu even after affinity for > that > interrupt has be moved, leading to lost or spurrious interrupts (usually > characterized by the message: > kernel: do_IRQ: 7.71 No irq handler for vector (irq -1) > > There have been several incidents recently of people seeing this error, and > investigation has shown that they have system for which their BIOS level is > such > that this feature was not properly turned off. As such, it would be good to > give them a reminder that their systems are vulnurable to this problem. > > Signed-off-by: Neil Horman > CC: Prarit Bhargava > CC: Don Zickus > CC: Don Dutile > CC: Bjorn Helgaas > CC: Asit Mallick > CC: David Woodhouse > CC: linux-...@vger.kernel.org > --- > > Change notes: > > v2) > > * Moved the quirk to the x86 arch, since consensus seems to be that the 55XX > chipset series is x86 only. I decided however to keep the quirk as a regular > quirk, not an early_quirk. Early quirks have no way currently to determine if > BIOS has properly disabled the feature in the iommu, at least not without > significant hacking, and since its quite possible this will be a short lived > quirk, should Don Z's workaround code prove successful (and it looks like it > may > well), I don't think that necessecary. > > * Removed the WARNING banner from the quirk, and added the HW_ERR token to the > string, I opted to leave the newlines in place however, as I really couldnt > find a way to keep the text on a single line is still legible from a code > perspective. I think theres enough language in there that using cscope on > just > about any substring however will turn it up, and again, this may be a short > lived quirk. > > v3) > > * Removed defines from pci_ids.h, and used direct id values as per request > from > Bjorn. > > v4) > > * Converted pr_warn to WARN_TAINT(TAINT_FIRMWARE_WORKAROUND) as per David > Woodhouse > > v5) > > * Moved check to an early quirk, and flagged the broken chip, so we could > reasonably disable irq remapping during bootup. Self NAK, sorry, I've got extra water in the quirks file left over from my move. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v6] irq: add quirk for broken interrupt remapping on 55XX chipsets
A few years back intel published a spec update: http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf For the 5520 and 5500 chipsets which contained an errata (specificially errata 53), which noted that these chipsets can't properly do interrupt remapping, and as a result the recommend that interrupt remapping be disabled in bios. While many vendors have a bios update to do exactly that, not all do, and of course not all users update their bios to a level that corrects the problem. As a result, occasionally interrupts can arrive at a cpu even after affinity for that interrupt has be moved, leading to lost or spurrious interrupts (usually characterized by the message: kernel: do_IRQ: 7.71 No irq handler for vector (irq -1) There have been several incidents recently of people seeing this error, and investigation has shown that they have system for which their BIOS level is such that this feature was not properly turned off. As such, it would be good to give them a reminder that their systems are vulnurable to this problem. Signed-off-by: Neil Horman CC: Prarit Bhargava CC: Don Zickus CC: Don Dutile CC: Bjorn Helgaas CC: Asit Mallick CC: David Woodhouse CC: linux-...@vger.kernel.org --- Change notes: v2) * Moved the quirk to the x86 arch, since consensus seems to be that the 55XX chipset series is x86 only. I decided however to keep the quirk as a regular quirk, not an early_quirk. Early quirks have no way currently to determine if BIOS has properly disabled the feature in the iommu, at least not without significant hacking, and since its quite possible this will be a short lived quirk, should Don Z's workaround code prove successful (and it looks like it may well), I don't think that necessecary. * Removed the WARNING banner from the quirk, and added the HW_ERR token to the string, I opted to leave the newlines in place however, as I really couldnt find a way to keep the text on a single line is still legible from a code perspective. I think theres enough language in there that using cscope on just about any substring however will turn it up, and again, this may be a short lived quirk. v3) * Removed defines from pci_ids.h, and used direct id values as per request from Bjorn. v4) * Converted pr_warn to WARN_TAINT(TAINT_FIRMWARE_WORKAROUND) as per David Woodhouse v5) * Moved check to an early quirk, and flagged the broken chip, so we could reasonably disable irq remapping during bootup. v6) * Clean up of stupid extra thrash in quirks.c --- arch/x86/kernel/early-quirks.c | 25 + drivers/iommu/irq_remapping.c | 12 drivers/iommu/irq_remapping.h | 1 + 3 files changed, 38 insertions(+) diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c index 3755ef4..bfa3139 100644 --- a/arch/x86/kernel/early-quirks.c +++ b/arch/x86/kernel/early-quirks.c @@ -192,6 +192,27 @@ static void __init ati_bugs_contd(int num, int slot, int func) } #endif +#ifdef CONFIG_IRQ_REMAP +static void __init intel_remapping_check(int num, int slot, int func) +{ + u8 revision; + + revision = pci_read_config_byte(num, slot, func , PCI_REVISION_ID); + + /* +* Revision 0x13 of this chipset supports irq remapping +* but has an erratum that breaks its behavior, flag it as such +*/ + if (revision == 0x13) + irq_remap_broken = 1; + +} +#else +static void __init intel_remapping_check(int num, int slot, int func) +{ +} +#endif + #define QFLAG_APPLY_ONCE 0x1 #define QFLAG_APPLIED 0x2 #define QFLAG_DONE (QFLAG_APPLY_ONCE|QFLAG_APPLIED) @@ -221,6 +242,10 @@ static struct chipset early_qrk[] __initdata = { PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs }, { PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_SBX00_SMBUS, PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs_contd }, + { PCI_VENDOR_ID_INTEL, 0x3403, PCI_CLASS_BRIDGE_HOST, + PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check }, + { PCI_VENDOR_ID_INTEL, 0x3406, PCI_CLASS_BRIDGE_HOST, + PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check }, {} }; diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c index d56f8c1..2b56e92 100644 --- a/drivers/iommu/irq_remapping.c +++ b/drivers/iommu/irq_remapping.c @@ -19,6 +19,7 @@ int irq_remapping_enabled; int disable_irq_remap; +int irq_remap_broken; int disable_sourceid_checking; int no_x2apic_optout; @@ -216,6 +217,17 @@ int irq_remapping_supported(void) if (disable_irq_remap) return 0; + if (irq_remap_broken) { + WARN_TAINT(1, TAIN_FIRMWARE_WORKAROUND, + "This system BIOS has enabled interrupt remapping\n" + "on a chipset that contains an erratum making that\n" + "feature un
Re: [PATCH v6] irq: add quirk for broken interrupt remapping on 55XX chipsets
I'm sorry. Forgot to change the wording of the error for the new model that I'm following here. Although the message is mostly right as bios is responsible for setting and clearing the IRQ remapping feature bit in the chips capabilities register. I'll fix and repost Monday Neil Yinghai Lu wrote: >On Fri, Apr 5, 2013 at 12:31 PM, Neil Horman wrote: >> A few years back intel published a spec update: >> http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf >> >> For the 5520 and 5500 chipsets which contained an errata (specificially >> errata >> 53), which noted that these chipsets can't properly do interrupt remapping, >> and >> as a result the recommend that interrupt remapping be disabled in bios. >> While >> many vendors have a bios update to do exactly that, not all do, and of course >> not all users update their bios to a level that corrects the problem. As a >> result, occasionally interrupts can arrive at a cpu even after affinity for >> that >> interrupt has be moved, leading to lost or spurrious interrupts (usually >> characterized by the message: >> kernel: do_IRQ: 7.71 No irq handler for vector (irq -1) >> >> There have been several incidents recently of people seeing this error, and >> investigation has shown that they have system for which their BIOS level is >> such >> that this feature was not properly turned off. As such, it would be good to >> give them a reminder that their systems are vulnurable to this problem. >> >> Signed-off-by: Neil Horman >> CC: Prarit Bhargava >> CC: Don Zickus >> CC: Don Dutile >> CC: Bjorn Helgaas >> CC: Asit Mallick >> CC: David Woodhouse >> CC: linux-...@vger.kernel.org >> --- >> >> Change notes: >> >> v2) >> >> * Moved the quirk to the x86 arch, since consensus seems to be that the 55XX >> chipset series is x86 only. I decided however to keep the quirk as a regular >> quirk, not an early_quirk. Early quirks have no way currently to determine >> if >> BIOS has properly disabled the feature in the iommu, at least not without >> significant hacking, and since its quite possible this will be a short lived >> quirk, should Don Z's workaround code prove successful (and it looks like it >> may >> well), I don't think that necessecary. >> >> * Removed the WARNING banner from the quirk, and added the HW_ERR token to >> the >> string, I opted to leave the newlines in place however, as I really couldnt >> find a way to keep the text on a single line is still legible from a code >> perspective. I think theres enough language in there that using cscope on >> just >> about any substring however will turn it up, and again, this may be a short >> lived quirk. >> >> v3) >> >> * Removed defines from pci_ids.h, and used direct id values as per request >> from >> Bjorn. >> >> v4) >> >> * Converted pr_warn to WARN_TAINT(TAINT_FIRMWARE_WORKAROUND) as per David >> Woodhouse >> >> v5) >> >> * Moved check to an early quirk, and flagged the broken chip, so we could >> reasonably disable irq remapping during bootup. >> >> v6) >> * Clean up of stupid extra thrash in quirks.c >> --- >> arch/x86/kernel/early-quirks.c | 25 + >> drivers/iommu/irq_remapping.c | 12 >> drivers/iommu/irq_remapping.h | 1 + >> 3 files changed, 38 insertions(+) >> >> diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c >> index 3755ef4..bfa3139 100644 >> --- a/arch/x86/kernel/early-quirks.c >> +++ b/arch/x86/kernel/early-quirks.c >> @@ -192,6 +192,27 @@ static void __init ati_bugs_contd(int num, int slot, >> int func) >> } >> #endif >> >> +#ifdef CONFIG_IRQ_REMAP >> +static void __init intel_remapping_check(int num, int slot, int func) >> +{ >> + u8 revision; >> + >> + revision = pci_read_config_byte(num, slot, func , PCI_REVISION_ID); >> + >> + /* >> +* Revision 0x13 of this chipset supports irq remapping >> +* but has an erratum that breaks its behavior, flag it as such >> +*/ >> + if (revision == 0x13) >> + irq_remap_broken = 1; >> + >> +} >> +#else >> +static void __init intel_remapping_check(int num, int slot, int func) >> +{ >> +} >> +#endif >> + >> #define QFLAG_APPLY_ONCE 0x1 >> #define QFL
Re: [PATCH v2] sctp: set association state to established in dupcook_a handler
On Thu, Jan 24, 2013 at 10:44:34AM +0800, xufengzhang.m...@gmail.com wrote: > From: Xufeng Zhang > > While sctp handling a duplicate COOKIE-ECHO and the action is > 'Association restart', sctp_sf_do_dupcook_a() will processing > the unexpected COOKIE-ECHO for peer restart, but it does not set > the association state to SCTP_STATE_ESTABLISHED, so the association > could stuck in SCTP_STATE_SHUTDOWN_PENDING state forever. > This violates the sctp specification: > RFC 4960 5.2.4. Handle a COOKIE ECHO when a TCB Exists > Action > A) In this case, the peer may have restarted. . > After this, the endpoint shall enter the ESTABLISHED state. > > To resolve this problem, adding a SCTP_CMD_NEW_STATE cmd to the > command list before SCTP_CMD_REPLY cmd, this will set the restart > association to SCTP_STATE_ESTABLISHED state properly and also avoid > I-bit being set in the DATA chunk header when COOKIE_ACK is bundled > with DATA chunks. > > Signed-off-by: Xufeng Zhang > --- > v2: > - Put the SCTP_CMD_NEW_STATE command before SCTP_CMD_REPLY and after > SCTP_CMD_EVENT_ULP > suggested by Vlad and Neil > - Improve the last paragraph of the commit header > > net/sctp/sm_statefuns.c |4 +++- > 1 files changed, 3 insertions(+), 1 deletions(-) > > diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c > index 618ec7e..5131fcf 100644 > --- a/net/sctp/sm_statefuns.c > +++ b/net/sctp/sm_statefuns.c > @@ -1779,8 +1779,10 @@ static sctp_disposition_t sctp_sf_do_dupcook_a(struct > net *net, > > /* Update the content of current association. */ > sctp_add_cmd_sf(commands, SCTP_CMD_UPDATE_ASSOC, SCTP_ASOC(new_asoc)); > - sctp_add_cmd_sf(commands, SCTP_CMD_REPLY, SCTP_CHUNK(repl)); > sctp_add_cmd_sf(commands, SCTP_CMD_EVENT_ULP, SCTP_ULPEVENT(ev)); > + sctp_add_cmd_sf(commands, SCTP_CMD_NEW_STATE, > + SCTP_STATE(SCTP_STATE_ESTABLISHED)); > + sctp_add_cmd_sf(commands, SCTP_CMD_REPLY, SCTP_CHUNK(repl)); > return SCTP_DISPOSITION_CONSUME; > > nomem_ev: > -- > 1.7.0.2 > > Acked-by: Neil Horman -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] VSOCK: Introduce VM Sockets
On Fri, Jan 25, 2013 at 09:37:50AM -0800, ack...@vmware.com wrote: > From: Andy King > > VM Sockets allows communication between virtual machines and the hypervisor. > User level applications both in a virtual machine and on the host can use the > VM Sockets API, which facilitates fast and efficient communication between > guest virtual machines and their host. A socket address family, designed to > be > compatible with UDP and TCP at the interface level, is provided. > > Today, VM Sockets is used by various VMware Tools components inside the guest > for zero-config, network-less access to VMware host services. In addition to > this, VMware's users are using VM Sockets for various applications, where > network access of the virtual machine is restricted or non-existent. Examples > of this are VMs communicating with device proxies for proprietary hardware > running as host applications and automated testing of applications running > within virtual machines. > > The VMware VM Sockets are similar to other socket types, like Berkeley UNIX > socket interface. The VM Sockets module supports both connection-oriented > stream sockets like TCP, and connectionless datagram sockets like UDP. The VM > Sockets protocol family is defined as "AF_VSOCK" and the socket operations > split for SOCK_DGRAM and SOCK_STREAM. > > For additional information about the use of VM Sockets, please refer to the VM > Sockets Programming Guide available at: > > https://www.vmware.com/support/developer/vmci-sdk/ > > Signed-off-by: George Zhang > Signed-off-by: Dmitry Torokhov > Signed-off-by: Andy king > index 000..95e2568 > --- /dev/null > +++ b/net/vmw_vsock/Kconfig > @@ -0,0 +1,14 @@ > +# > +# Vsock protocol > +# > + > +config VMWARE_VSOCK > + tristate "Virtual Socket protocol" > + depends on VMWARE_VMCI What is CONFIG_VMWARE_VMCI? I don't find that in any Kconfig in the tree? I''m still looking over the rest, but I get build issues if I just remove the dependency. Neil -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Pv-drivers] [PATCH 1/1] VSOCK: Introduce VM Sockets
On Fri, Jan 25, 2013 at 04:15:19PM -0800, Dmitry Torokhov wrote: > Hi Neil, > > On Friday, January 25, 2013 06:59:53 PM Neil Horman wrote: > > On Fri, Jan 25, 2013 at 09:37:50AM -0800, ack...@vmware.com wrote: > > > + > > > +config VMWARE_VSOCK > > > + tristate "Virtual Socket protocol" > > > + depends on VMWARE_VMCI > > > > What is CONFIG_VMWARE_VMCI? I don't find that in any Kconfig in the tree? > > > > I''m still looking over the rest, but I get build issues if I just remove > > the dependency. > > VMCI is in linux-next at the moment. > > Thanks, > Dmitry > Thanks, I'll pull that in. Neil -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] i2c: iSMT SMBus patch for Intel Avoton DeviceIDs
On Tue, Feb 26, 2013 at 02:08:35AM +, Heasley, Seth wrote: > >> The difference here is that the S1200 name is already publicly released, > >whereas the naming for Avoton hasn't been announced. > >> > >Do you know when the announcement will be? Is it sufficiently soon that we > >can just wait for it? > > I don't, and actually couldn't tell you if I did. The codename usage is > consistent with how we deal with all products since the patches have to go in > well in advance of product launch. > Meh, ok, I guess we cal always fix it up when the product has an actual name Acked-by: Neil Horman > -Seth > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] i2c: iSMT SMBus patch for Intel Avoton DeviceIDs
On Thu, Feb 21, 2013 at 02:30:43PM -0800, Seth Heasley wrote: > This patch adds the iSMT SMBus Controller DeviceIDs for the Intel Avoton SOC. > > Signed-off-by: Seth Heasley Is there a part number (I.e. S1200) that we can use in the name, rather than the project code name? That would seem to be more in keeping with the previous naming scheeme directly above it Neil -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/1] i2c: iSMT SMBus patch for Intel Avoton DeviceIDs
On Thu, Feb 21, 2013 at 10:19:48PM +, Heasley, Seth wrote: > Hi Neil, > > >> This patch adds the iSMT SMBus Controller DeviceIDs for the Intel Avoton > >SOC. > >> > >> Signed-off-by: Seth Heasley > >Is there a part number (I.e. S1200) that we can use in the name, rather than > >the project code name? That would seem to be more in keeping with the > >previous naming scheeme directly above it Neil > > The difference here is that the S1200 name is already publicly released, > whereas the naming for Avoton hasn't been announced. > Do you know when the announcement will be? Is it sufficiently soon that we can just wait for it? Neil > Regards, > -Seth > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched transport
On Tue, Mar 12, 2013 at 10:24:02AM +0800, Xufeng Zhang wrote: > >> > >> Thanks for your review, Neil! > >> > >> I know what you mean, yes, it's most probably that the searched TSN was > >> transmitted in the currently active_path, but what should we do if not. > >> > >> Check the comment in sctp_assoc_lookup_tsn() function: > >> /* Let's be hopeful and check the active_path first. */ > >> /* If not found, go search all the other transports. */ > >> > >> It has checked the active_path first and then traverse all the other > >> transports in > >> the transport_addr_list except the active_path. > >> > >> So what I want to fix here is the inconsistency between the function > >> should do and > >> the code actually does. > >> > > I understand what you're doing, and I agree that the fix is functional > > (Hence > > my "This works" statement in my last note). What I'm suggesting is that, > > since > > you're messing about in that code anyway that you clean it up while your at > > it, > > so that we don't need to have the if (transport == active) check at all. > > We > > trade in some extra work in a non-critical path (sctp_assoc_set_primary), > > for > > the removal of an extra for loop operation and a conditional check in a > > much > > hotter path. Something like this (completely untested), is what I was > > thinking > > Aha, seems I have some misunderstanding previously, now I got your point. > Yeah, it's better to do the clean up by this way, and this fix looks fine to > me, > but I didn't have a test case to test this, actually this problem was detected > by code review, so I would like to leave the rest of this work to > determine by you. > > Thank you very much for your clarification! > Ok, I'll try set up a test for this today Neil > > Thanks, > Xufeng > > > > > > > diff --git a/net/sctp/associola.c b/net/sctp/associola.c > > index 43cd0dd..8ae873c 100644 > > --- a/net/sctp/associola.c > > +++ b/net/sctp/associola.c > > @@ -505,6 +505,9 @@ void sctp_assoc_set_primary(struct sctp_association > > *asoc, > > > > asoc->peer.primary_path = transport; > > > > + list_del_rcu(&transport->transports); > > + list_add_rcu(&transport->transports, &asoc->peer.transport_addr_list); > > + > > /* Set a default msg_name for events. */ > > memcpy(&asoc->peer.primary_addr, &transport->ipaddr, > >sizeof(union sctp_addr)); > > @@ -1040,7 +1043,6 @@ struct sctp_chunk *sctp_get_ecne_prepend(struct > > sctp_association *asoc) > > struct sctp_transport *sctp_assoc_lookup_tsn(struct sctp_association > > *asoc, > > __u32 tsn) > > { > > - struct sctp_transport *active; > > struct sctp_transport *match; > > struct sctp_transport *transport; > > struct sctp_chunk *chunk; > > @@ -1057,29 +1059,16 @@ struct sctp_transport *sctp_assoc_lookup_tsn(struct > > sctp_association *asoc, > > * The general strategy is to search each transport's transmitted > > * list. Return which transport this TSN lives on. > > * > > -* Let's be hopeful and check the active_path first. > > -* Another optimization would be to know if there is only one > > -* outbound path and not have to look for the TSN at all. > > +* Note, that sctp_assoc_set_primary does a move to front operation > > +* on the active_path transport, so this code implicitly checks > > +* the active_path first, as we most commonly expect to find our TSN > > +* there. > > * > > */ > > > > - active = asoc->peer.active_path; > > - > > - list_for_each_entry(chunk, &active->transmitted, > > - transmitted_list) { > > - > > - if (key == chunk->subh.data_hdr->tsn) { > > - match = active; > > - goto out; > > - } > > - } > > - > > - /* If not found, go search all the other transports. */ > > list_for_each_entry(transport, &asoc->peer.transport_addr_list, > > transports) { > > > > - if (transport == active) > > - break; > > list_for_each_entry(chunk, &transport->transmitted, > > transmitted_list) { > > if (key == chunk->subh.data_hdr->tsn) { > > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 18/18] net: sctp: remove cast for kmalloc/kzalloc return value
On Tue, Mar 12, 2013 at 01:39:47PM +0800, Zhang Yanfei wrote: > remove cast for kmalloc/kzalloc return value. > > Signed-off-by: Zhang Yanfei > Cc: Vlad Yasevich > Cc: Sridhar Samudrala > Cc: Neil Horman > Cc: Andrew Morton > Cc: linux-s...@vger.kernel.org > --- > include/net/sctp/sctp.h |2 +- > net/sctp/protocol.c |2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h > index df85a0c..cd89510 100644 > --- a/include/net/sctp/sctp.h > +++ b/include/net/sctp/sctp.h > @@ -576,7 +576,7 @@ for (pos = chunk->subh.fwdtsn_hdr->skip;\ > #define WORD_ROUND(s) (((s)+3)&~3) > > /* Make a new instance of type. */ > -#define t_new(type, flags) (type *)kzalloc(sizeof(type), flags) > +#define t_new(type, flags) kzalloc(sizeof(type), flags) > > /* Compare two timevals. */ > #define tv_lt(s, t) \ > diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c > index 1c2e46c..eaee00c 100644 > --- a/net/sctp/protocol.c > +++ b/net/sctp/protocol.c > @@ -1403,7 +1403,7 @@ SCTP_STATIC __init int sctp_init(void) > > /* Allocate and initialize the endpoint hash table. */ > sctp_ep_hashsize = 64; > - sctp_ep_hashtable = (struct sctp_hashbucket *) > + sctp_ep_hashtable = > kmalloc(64 * sizeof(struct sctp_hashbucket), GFP_KERNEL); > if (!sctp_ep_hashtable) { > pr_err("Failed endpoint_hash alloc\n"); > -- > 1.7.1 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Acked-by: Neil Horman -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched transport
On Tue, Mar 12, 2013 at 08:11:34AM -0400, Vlad Yasevich wrote: > On 03/11/2013 09:31 AM, Neil Horman wrote: > >On Mon, Mar 11, 2013 at 10:14:50AM +0800, Xufeng Zhang wrote: > >>On 3/8/13, Neil Horman wrote: > >>>On Fri, Mar 08, 2013 at 03:39:37PM +0800, Xufeng Zhang wrote: > >>>>sctp_assoc_lookup_tsn() function searchs which transport a certain TSN > >>>>was sent on, if not found in the active_path transport, then go search > >>>>all the other transports in the peer's transport_addr_list, however, we > >>>>should continue to the next entry rather than break the loop when meet > >>>>the active_path transport. > >>>> > >>>>Signed-off-by: Xufeng Zhang > >>>>--- > >>>> net/sctp/associola.c |2 +- > >>>> 1 files changed, 1 insertions(+), 1 deletions(-) > >>>> > >>>>diff --git a/net/sctp/associola.c b/net/sctp/associola.c > >>>>index 43cd0dd..d2709e2 100644 > >>>>--- a/net/sctp/associola.c > >>>>+++ b/net/sctp/associola.c > >>>>@@ -1079,7 +1079,7 @@ struct sctp_transport *sctp_assoc_lookup_tsn(struct > >>>>sctp_association *asoc, > >>>> transports) { > >>>> > >>>> if (transport == active) > >>>>- break; > >>>>+ continue; > >>>> list_for_each_entry(chunk, &transport->transmitted, > >>>> transmitted_list) { > >>>> if (key == chunk->subh.data_hdr->tsn) { > >>>>-- > >>>>1.7.0.2 > >>>> > >>>> > >>> > >>>This works, but what might be better would be if we did a move to front > >>>heuristic in sctp_assoc_set_primary. E.g. when we set the active_path, > >>>move > >>>the > >>>requisite transport to the front of the transport_addr_list. If we did > >>>that, > >>>then we could just do one for loop in sctp_assoc_lookup_tsn and wind up > >>>implicitly check the active path first without having to check it seprately > >>>and > >>>skip it in the second for loop. > >> > >>Thanks for your review, Neil! > >> > >>I know what you mean, yes, it's most probably that the searched TSN was > >>transmitted in the currently active_path, but what should we do if not. > >> > >>Check the comment in sctp_assoc_lookup_tsn() function: > >>/* Let's be hopeful and check the active_path first. */ > >>/* If not found, go search all the other transports. */ > >> > >>It has checked the active_path first and then traverse all the other > >>transports in > >>the transport_addr_list except the active_path. > >> > >>So what I want to fix here is the inconsistency between the function > >>should do and > >>the code actually does. > >> > >I understand what you're doing, and I agree that the fix is functional (Hence > >my "This works" statement in my last note). What I'm suggesting is that, > >since > >you're messing about in that code anyway that you clean it up while your at > >it, > >so that we don't need to have the if (transport == active) check at all. We > >trade in some extra work in a non-critical path (sctp_assoc_set_primary), for > >the removal of an extra for loop operation and a conditional check in a much > >hotter path. Something like this (completely untested), is what I was > >thinking > > > > > >diff --git a/net/sctp/associola.c b/net/sctp/associola.c > >index 43cd0dd..8ae873c 100644 > >--- a/net/sctp/associola.c > >+++ b/net/sctp/associola.c > >@@ -505,6 +505,9 @@ void sctp_assoc_set_primary(struct sctp_association > >*asoc, > > > > asoc->peer.primary_path = transport; > > > >+list_del_rcu(&transport->transports); > >+list_add_rcu(&transport->transports, &asoc->peer.transport_addr_list); > >+ > > /* Set a default msg_name for events. */ > > memcpy(&asoc->peer.primary_addr, &transport->ipaddr, > >sizeof(union sctp_addr)); > >@@ -1040,7 +1043,6 @@ struct sctp_chunk *sctp_get_ecne_prepend(struct > >sctp_association *asoc) > > struct sctp_transport *sctp_assoc_lookup_tsn(struct sctp_association *asoc, > >
Re: [PATCH] sctp: don't break the loop while meeting the active_path so as to find the matched transport
On Fri, Mar 08, 2013 at 03:39:37PM +0800, Xufeng Zhang wrote: > sctp_assoc_lookup_tsn() function searchs which transport a certain TSN > was sent on, if not found in the active_path transport, then go search > all the other transports in the peer's transport_addr_list, however, we > should continue to the next entry rather than break the loop when meet > the active_path transport. > > Signed-off-by: Xufeng Zhang > --- > net/sctp/associola.c |2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/net/sctp/associola.c b/net/sctp/associola.c > index 43cd0dd..d2709e2 100644 > --- a/net/sctp/associola.c > +++ b/net/sctp/associola.c > @@ -1079,7 +1079,7 @@ struct sctp_transport *sctp_assoc_lookup_tsn(struct > sctp_association *asoc, > transports) { > > if (transport == active) > - break; > + continue; > list_for_each_entry(chunk, &transport->transmitted, > transmitted_list) { > if (key == chunk->subh.data_hdr->tsn) { > -- > 1.7.0.2 > Based on discussion with Vlad, it seems theres arguably some work to do on access to the transport_addr_list before my solution is viable, so until I get to that I'm acking this patch. Acked-by: Neil Horman -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] netprio_cgroup: Optimize the priomap copy loop slightly
On Tue, Sep 11, 2012 at 12:42:23PM +0100, David Laight wrote: > > - for (i = 0; > > -old_priomap && (i < old_priomap->priomap_len); > > -i++) > > - new_priomap->priomap[i] = old_priomap->priomap[i]; > > + if (old_priomap) { > > + old_len = old_priomap->priomap_len; > > + > > + for (i = 0; i < old_len; i++) > > + new_priomap->priomap[i] = old_priomap->priomap[i]; > > + } > > Or: > memcpy(new_priomap->priomap, old_priomap->priomap, > old_priomap->priomap_len * sizeof old_priomap->priomap[0]); > > David > Yes, the memcpy would be better here. Neil -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH -mm] coredump: add support for %d=__get_dumpable() in core name
On Thu, Sep 13, 2012 at 07:28:17PM +0200, Oleg Nesterov wrote: > https://bugzilla.redhat.com/show_bug.cgi?id=787135 > > Some coredump handlers want to create a core file in a way compatible > with standard behavior. Standard behavior with fs.suid_dumpable = 2 > is to create core file with uid=gid=0. However, there was no way for > coredump handler to know that the process being dumped was suid'ed. > > This patch adds the new %d specifier for format_corename() which > simply reports __get_dumpable(mm->flags), this is compatible with > /proc/sys/fs/suid_dumpable we already have. > > By-discussion-with: Denys Vlasenko > Signed-off-by: Oleg Nesterov > --- > fs/coredump.c | 10 +++--- > 1 files changed, 7 insertions(+), 3 deletions(-) > > diff --git a/fs/coredump.c b/fs/coredump.c > index 1935b4d..aad8715 100644 > --- a/fs/coredump.c > +++ b/fs/coredump.c > @@ -149,7 +149,7 @@ put_exe_file: > * name into corename, which must have space for at least > * CORENAME_MAX_SIZE bytes plus one byte for the zero terminator. > */ > -static int format_corename(struct core_name *cn, long signr) > +static int format_corename(struct core_name *cn, struct coredump_params > *cprm) > { > const struct cred *cred = current_cred(); > const char *pat_ptr = core_pattern; > @@ -194,9 +194,13 @@ static int format_corename(struct core_name *cn, long > signr) > case 'g': > err = cn_printf(cn, "%d", cred->gid); > break; > + case 'd': > + err = cn_printf(cn, "%d", > + __get_dumpable(cprm->mm_flags)); > + break; > /* signal that caused the coredump */ > case 's': > - err = cn_printf(cn, "%ld", signr); > + err = cn_printf(cn, "%ld", cprm->signr); > break; > /* UNIX time of coredump */ > case 't': { > @@ -524,7 +528,7 @@ void do_coredump(long signr, int exit_code, struct > pt_regs *regs) >*/ > clear_thread_flag(TIF_SIGPENDING); > > - ispipe = format_corename(&cn, signr); > + ispipe = format_corename(&cn, &cprm); > > if (ispipe) { > int dump_count; > -- > 1.5.5.1 > > > Looks reasonable Acked-by: Neil Horman -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH], issue EOI to APIC prior to calling crash_kexec in die_nmi path
> > Neil, is it possible to do some serial console debugging to find out > where exactly we are hanging? Beats me, what's that operation which can > not be executed while being in NMI handler and makes system to hang. I am > also curious to know if it is nested NMI case. > > Thanks > Vivek > Hey- Some intermediate results: I've instrumented head.S in the kernel with the following code: #define SEROUT(z) \ mov $0x3F8,%dx;\ movb z,%al;\ outb %dx And peppered different ascii characters throughout the startup code from startup_32 to right before the jump to start_kernel. When I panic the system via an: echo c > /proc/sysrq_trigger I see an appropriate sequence of characters on the serial console When I panic the box by forcing an NMI watchdog timeout however, I see nothing. The machine will either hang, or reset into the bios. I think this is reasonably conclusive in its indication that we're not getting into the second kernel when this problem occurs. Next I'll instrument the purgatory code in a simmilar way. Regards Neil > ___ > kexec mailing list > [EMAIL PROTECTED] > http://lists.infradead.org/mailman/listinfo/kexec -- /**** * Neil Horman <[EMAIL PROTECTED]> * Software Engineer, Red Hat / -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/7] netpoll: make __netpoll_cleanup non-block
On Fri, Jul 27, 2012 at 11:37:59PM +0800, Cong Wang wrote: > Like the previous patch, slave_disable_netpoll() and __netpoll_cleanup() > may be called with read_lock() held too, so we should make them > non-block, by moving the cleanup and kfree() to call_rcu_bh() callbacks. > > Cc: "David S. Miller" > Signed-off-by: Cong Wang > --- > drivers/net/bonding/bond_main.c |4 +-- > include/linux/netpoll.h |3 ++ > net/8021q/vlan_dev.c|6 + > net/bridge/br_device.c |6 + > net/core/netpoll.c | 42 +- > 5 files changed, 38 insertions(+), 23 deletions(-) > > struct netpoll_info *npinfo; > @@ -903,20 +921,24 @@ void __netpoll_cleanup(struct netpoll *np) > ops->ndo_netpoll_cleanup(np->dev); > > RCU_INIT_POINTER(np->dev->npinfo, NULL); > + call_rcu_bh(&npinfo->rcu, rcu_cleanup_netpoll_info); > + } > +} > +EXPORT_SYMBOL_GPL(__netpoll_cleanup); > > - /* avoid racing with NAPI reading npinfo */ > - synchronize_rcu_bh(); > +static void rcu_cleanup_netpoll(struct rcu_head *rcu_head) > +{ > + struct netpoll *np = container_of(rcu_head, struct netpoll, rcu); > > - skb_queue_purge(&npinfo->arp_tx); > - skb_queue_purge(&npinfo->txq); > - cancel_delayed_work_sync(&npinfo->tx_work); > + __netpoll_cleanup(np); > + kfree(np); > +} > > - /* clean after last, unfinished work */ > - __skb_queue_purge(&npinfo->txq); > - kfree(npinfo); > - } > +void __netpoll_free_rcu(struct netpoll *np) > +{ > + call_rcu_bh(&np->rcu, rcu_cleanup_netpoll); Here, and above I see you using an rcu_head to defer cleanup, until after all pointer uses are dropped, but I don't see any modification of code points that dereference any struct netpoll pointers to include rcu_read_lock()/rcu_read_unlock(). Without those using rcu to defer cleanup is pointless, as the rcu code won't know when its safe to run. You're no better off that you would be just calling __netpoll_cleanup directly. Neil > } > -EXPORT_SYMBOL_GPL(__netpoll_cleanup); > +EXPORT_SYMBOL_GPL(__netpoll_free_rcu); > > void netpoll_cleanup(struct netpoll *np) > { > -- > 1.7.7.6 > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2] ACPI: Fix osc flag setup ordering to allow pcie hotplug use when available
From: root Somewhere between 3.9 and 3.10 it seems the order in which pcie and acpi probed slots for hotplug capabilites got reversed. While this isn't a big deal, it did uncover a bug in the ACPI bus setup path. Specifically, acpi_pci_root_add calls pci_acpi_scan_root before setting the osc flags for the device handle. pci_acpi_scan_root, among other things uses device_is_managed_by_native_pciehp() to determine if a given slot has pcie hotplug capabilties, whcih checks the devices OSC_PCI_EXPRESS_NATIVE_HP_CONTROL flag. Since that flag is not set until after pci_acpi_scan_root_completes, the acpi code never sees that pcie slots are hotplug capable and configures them all to use acpi instead. Fix is pretty simple, just defer the scan until after the osc flags have been set on the device. Tested by myself and it seems to work well. Signed-off-by: Neil Horman CC: Len Brown CC: "Rafael J. Wysocki" CC: Bjorn Helgaas CC: linux-a...@vger.kernel.org CC: linux-...@vger.kernel.org --- Change notes: v2) eferred the disabling of aspm until after the acpi scan of the pci bus is complete. This was done to allow proper handling of pcie 1.1 devices, as per: commit b8178f130e25c1bdac1c33e0996f1ff6e20ec08e Author: Bjorn Helgaas Date: Mon Apr 1 15:47:39 2013 -0600 Revert "PCI/ACPI: Request _OSC control before scanning PCI root bus" As discussed previously in the thread the disable logic for aspm needs to be untangled and refactored, which is not something I'm sufficently versed in teh hotplug code to do right now, but this fixes the problem above, and prevents the problem that necessitated the revert without adding any visible complexity to the user, so I think its ok. --- drivers/acpi/pci_root.c | 54 + 1 file changed, 32 insertions(+), 22 deletions(-) diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c index 5917839..1e80a90 100644 --- a/drivers/acpi/pci_root.c +++ b/drivers/acpi/pci_root.c @@ -378,6 +378,7 @@ static int acpi_pci_root_add(struct acpi_device *device, struct acpi_pci_root *root; u32 flags, base_flags; acpi_handle handle = device->handle; + bool no_aspm = false; root = kzalloc(sizeof(struct acpi_pci_root), GFP_KERNEL); if (!root) @@ -437,27 +438,6 @@ static int acpi_pci_root_add(struct acpi_device *device, flags = base_flags = OSC_PCI_SEGMENT_GROUPS_SUPPORT; acpi_pci_osc_support(root, flags); - /* -* TBD: Need PCI interface for enumeration/configuration of roots. -*/ - - /* -* Scan the Root Bridge -* -* Must do this prior to any attempt to bind the root device, as the -* PCI namespace does not get created until this call is made (and -* thus the root bridge's pci_dev does not exist). -*/ - root->bus = pci_acpi_scan_root(root); - if (!root->bus) { - dev_err(&device->dev, - "Bus %04x:%02x not present in PCI namespace\n", - root->segment, (unsigned int)root->secondary.start); - result = -ENODEV; - goto end; - } - - /* Indicate support for various _OSC capabilities. */ if (pci_ext_cfg_avail()) flags |= OSC_EXT_PCI_CONFIG_SUPPORT; if (pcie_aspm_support_enabled()) { @@ -512,7 +492,14 @@ static int acpi_pci_root_add(struct acpi_device *device, acpi_format_exception(status), flags); dev_info(&device->dev, "ACPI _OSC control for PCIe not granted, disabling ASPM\n"); - pcie_no_aspm(); + /* +* We want to disable aspm here, but aspm_disabled +* needs to remain in its state from boot so that we +* properly handle pcie 1.1 devices. So we set this +* flag here, to defer the action until after the acpi +* root scan +*/ + no_aspm = true; } } else { dev_info(&device->dev, @@ -520,6 +507,29 @@ static int acpi_pci_root_add(struct acpi_device *device, "(_OSC support mask: 0x%02x)\n", flags); } + /* +* TBD: Need PCI interface for enumeration/configuration of roots. +*/ + + /* +* Scan the Root Bridge +* +* Must do this prior to any attempt to bind the root device, as the +* PCI namespace does not get created until this call is made (and +* thus the root bridge's pci_dev does not exist). +*/ + root->bus = pci_acpi_scan_root(root); + if (!root->bu
Re: [PATCH v2] ACPI: Fix osc flag setup ordering to allow pcie hotplug use when available
On Mon, Aug 26, 2013 at 11:36:21AM -0400, Neil Horman wrote: > From: root > > Somewhere between 3.9 and 3.10 it seems the order in which pcie and acpi > probed > slots for hotplug capabilites got reversed. While this isn't a big deal, it > did > uncover a bug in the ACPI bus setup path. Specifically, acpi_pci_root_add > calls > pci_acpi_scan_root before setting the osc flags for the device handle. > pci_acpi_scan_root, among other things uses > device_is_managed_by_native_pciehp() > to determine if a given slot has pcie hotplug capabilties, whcih checks the > devices OSC_PCI_EXPRESS_NATIVE_HP_CONTROL flag. Since that flag is not set > until after pci_acpi_scan_root_completes, the acpi code never sees that pcie > slots are hotplug capable and configures them all to use acpi instead. > > Fix is pretty simple, just defer the scan until after the osc flags have been > set on the device. Tested by myself and it seems to work well. > > Signed-off-by: Neil Horman > CC: Len Brown > CC: "Rafael J. Wysocki" > CC: Bjorn Helgaas > CC: linux-a...@vger.kernel.org > CC: linux-...@vger.kernel.org > Sorry, self NAK on this, sorry, I forgot to fix up authorship from my development machine. I'll resend in a second. Neil -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v3] ACPI: Fix osc flag setup ordering to allow pcie hotplug use when available
Somewhere between 3.9 and 3.10 it seems the order in which pcie and acpi probed slots for hotplug capabilites got reversed. While this isn't a big deal, it did uncover a bug in the ACPI bus setup path. Specifically, acpi_pci_root_add calls pci_acpi_scan_root before setting the osc flags for the device handle. pci_acpi_scan_root, among other things uses device_is_managed_by_native_pciehp() to determine if a given slot has pcie hotplug capabilties, whcih checks the devices OSC_PCI_EXPRESS_NATIVE_HP_CONTROL flag. Since that flag is not set until after pci_acpi_scan_root_completes, the acpi code never sees that pcie slots are hotplug capable and configures them all to use acpi instead. Fix is pretty simple, just defer the scan until after the osc flags have been set on the device. Tested by myself and it seems to work well. Signed-off-by: Neil Horman CC: Len Brown CC: "Rafael J. Wysocki" CC: Bjorn Helgaas CC: linux-a...@vger.kernel.org CC: linux-...@vger.kernel.org --- Change notes: v2) eferred the disabling of aspm until after the acpi scan of the pci bus is complete. This was done to allow proper handling of pcie 1.1 devices, as per: commit b8178f130e25c1bdac1c33e0996f1ff6e20ec08e Author: Bjorn Helgaas Date: Mon Apr 1 15:47:39 2013 -0600 Revert "PCI/ACPI: Request _OSC control before scanning PCI root bus" As discussed previously in the thread the disable logic for aspm needs to be untangled and refactored, which is not something I'm sufficently versed in teh hotplug code to do right now, but this fixes the problem above, and prevents the problem that necessitated the revert without adding any visible complexity to the user, so I think its ok. v3) Fixup stupid authorship error --- drivers/acpi/pci_root.c | 54 + 1 file changed, 32 insertions(+), 22 deletions(-) diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c index 5917839..1e80a90 100644 --- a/drivers/acpi/pci_root.c +++ b/drivers/acpi/pci_root.c @@ -378,6 +378,7 @@ static int acpi_pci_root_add(struct acpi_device *device, struct acpi_pci_root *root; u32 flags, base_flags; acpi_handle handle = device->handle; + bool no_aspm = false; root = kzalloc(sizeof(struct acpi_pci_root), GFP_KERNEL); if (!root) @@ -437,27 +438,6 @@ static int acpi_pci_root_add(struct acpi_device *device, flags = base_flags = OSC_PCI_SEGMENT_GROUPS_SUPPORT; acpi_pci_osc_support(root, flags); - /* -* TBD: Need PCI interface for enumeration/configuration of roots. -*/ - - /* -* Scan the Root Bridge -* -* Must do this prior to any attempt to bind the root device, as the -* PCI namespace does not get created until this call is made (and -* thus the root bridge's pci_dev does not exist). -*/ - root->bus = pci_acpi_scan_root(root); - if (!root->bus) { - dev_err(&device->dev, - "Bus %04x:%02x not present in PCI namespace\n", - root->segment, (unsigned int)root->secondary.start); - result = -ENODEV; - goto end; - } - - /* Indicate support for various _OSC capabilities. */ if (pci_ext_cfg_avail()) flags |= OSC_EXT_PCI_CONFIG_SUPPORT; if (pcie_aspm_support_enabled()) { @@ -512,7 +492,14 @@ static int acpi_pci_root_add(struct acpi_device *device, acpi_format_exception(status), flags); dev_info(&device->dev, "ACPI _OSC control for PCIe not granted, disabling ASPM\n"); - pcie_no_aspm(); + /* +* We want to disable aspm here, but aspm_disabled +* needs to remain in its state from boot so that we +* properly handle pcie 1.1 devices. So we set this +* flag here, to defer the action until after the acpi +* root scan +*/ + no_aspm = true; } } else { dev_info(&device->dev, @@ -520,6 +507,29 @@ static int acpi_pci_root_add(struct acpi_device *device, "(_OSC support mask: 0x%02x)\n", flags); } + /* +* TBD: Need PCI interface for enumeration/configuration of roots. +*/ + + /* +* Scan the Root Bridge +* +* Must do this prior to any attempt to bind the root device, as the +* PCI namespace does not get created until this call is made (and +* thus the root bridge's pci_dev does not exist). +*/ + root->bus = pci_acpi_scan_root(root); + if (
Re: [PATCH net-next] drivers:net: Convert dma_alloc_coherent(...__GFP_ZERO) to dma_zalloc_coherent
On Mon, Aug 26, 2013 at 10:45:23PM -0700, Joe Perches wrote: > __GFP_ZERO is an uncommon flag and perhaps is better > not used. static inline dma_zalloc_coherent exists > so convert the uses of dma_alloc_coherent with __GFP_ZERO > to the more common kernel style with zalloc. > > Remove memset from the static inline dma_zalloc_coherent > and add just one use of __GFP_ZERO instead. > > Trivially reduces the size of the existing uses of > dma_zalloc_coherent. > > Realign arguments as appropriate. > > Signed-off-by: Joe Perches Acked-by: Neil Horman -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3] ACPI: Fix osc flag setup ordering to allow pcie hotplug use when available
On Tue, Aug 27, 2013 at 03:34:11PM -0600, Bjorn Helgaas wrote: > [+cc Stefan] > > On Mon, Aug 26, 2013 at 9:39 AM, Neil Horman wrote: > > Somewhere between 3.9 and 3.10 it seems the order in which pcie and acpi > > probed > > slots for hotplug capabilites got reversed. While this isn't a big deal, > > it did > > uncover a bug in the ACPI bus setup path. Specifically, acpi_pci_root_add > > calls > > pci_acpi_scan_root before setting the osc flags for the device handle. > > pci_acpi_scan_root, among other things uses > > device_is_managed_by_native_pciehp() > > to determine if a given slot has pcie hotplug capabilties, whcih checks the > > devices OSC_PCI_EXPRESS_NATIVE_HP_CONTROL flag. Since that flag is not set > > until after pci_acpi_scan_root_completes, the acpi code never sees that pcie > > slots are hotplug capable and configures them all to use acpi instead. > > I'd like to make it more explicit what we're fixing. Apparently this > is a regression between v3.9 and v3.10. > > Is this a fix for problems like > https://bugzilla.kernel.org/show_bug.cgi?id=60736 ? That bug is that > an ExpressCard slot doesn't work unless we boot with > "acpiphp.disable=1". I think what happens there is that acpiphp > claims the slot before we run _OSC, so pciehp doesn't claim it, even > though _OSC did grant us control over native PCIe hotplug. > Yes, that bug is precisely what I am trying to fix (although your wording above is inverted from the problem in the bug). What happens is that acpiphp claims the pcie ports, because acpiphp init runs first, and we haven't yet parsed _OSC, which would have told acpiphp to yield control to pciehp. The initial fix I proposed fixed this by parsing _OSC prior to running the acpiphp bus scan, but that reintroduced the problem of not handling pcie 1.1 aspm properly. That has been fixed in v2/v3 by deferring the setting of aspm_disable until after the bus scan. Shall I submit a v4 with an updated changelog? Best Neil -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3] ACPI: Fix osc flag setup ordering to allow pcie hotplug use when available
On Wed, Aug 28, 2013 at 07:04:47AM -0600, Bjorn Helgaas wrote: > On Tue, Aug 27, 2013 at 5:43 PM, Neil Horman wrote: > > On Tue, Aug 27, 2013 at 03:34:11PM -0600, Bjorn Helgaas wrote: > >> [+cc Stefan] > >> > >> On Mon, Aug 26, 2013 at 9:39 AM, Neil Horman wrote: > >> > Somewhere between 3.9 and 3.10 it seems the order in which pcie and acpi > >> > probed > >> > slots for hotplug capabilites got reversed. While this isn't a big > >> > deal, it did > >> > uncover a bug in the ACPI bus setup path. Specifically, > >> > acpi_pci_root_add calls > >> > pci_acpi_scan_root before setting the osc flags for the device handle. > >> > pci_acpi_scan_root, among other things uses > >> > device_is_managed_by_native_pciehp() > >> > to determine if a given slot has pcie hotplug capabilties, whcih checks > >> > the > >> > devices OSC_PCI_EXPRESS_NATIVE_HP_CONTROL flag. Since that flag is not > >> > set > >> > until after pci_acpi_scan_root_completes, the acpi code never sees that > >> > pcie > >> > slots are hotplug capable and configures them all to use acpi instead. > >> > >> I'd like to make it more explicit what we're fixing. Apparently this > >> is a regression between v3.9 and v3.10. > >> > >> Is this a fix for problems like > >> https://bugzilla.kernel.org/show_bug.cgi?id=60736 ? That bug is that > >> an ExpressCard slot doesn't work unless we boot with > >> "acpiphp.disable=1". I think what happens there is that acpiphp > >> claims the slot before we run _OSC, so pciehp doesn't claim it, even > >> though _OSC did grant us control over native PCIe hotplug. > >> > > Yes, that bug is precisely what I am trying to fix (although your wording > > above is > > inverted from the problem in the bug). What happens is that acpiphp claims > > the > > pcie ports, because acpiphp init runs first, and we haven't yet parsed _OSC, > > which would have told acpiphp to yield control to pciehp. The initial fix I > > proposed fixed this by parsing _OSC prior to running the acpiphp bus scan, > > but > > that reintroduced the problem of not handling pcie 1.1 aspm properly. That > > has > > been fixed in v2/v3 by deferring the setting of aspm_disable until after > > the bus > > scan. > > > > Shall I submit a v4 with an updated changelog? > > No need, I'll update the changelog and include the bugzilla reference > and post it for your approval. Thanks! > > Bjorn > Great, thanks! Neil -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] iommu: Remove stack trace from broken irq remapping warning
On Fri, Sep 27, 2013 at 12:53:35PM -0400, Neil Horman wrote: > The warning for the irq remapping broken check in intel_irq_remapping.c is > pretty pointless. We need the warning, but we know where its comming from, > the > stack trace will always be the same, and it needlessly triggers things like > Abrt. This changes the warning to just print a text warning about BIOS being > broken, without the stack trace, then sets the appropriate taint bit. Since > we > automatically disable irq remapping, theres no need to contiue making Abrt > jump > at this problem > > Signed-off-by: Neil Horman > CC: Joerg Roedel > CC: Bjorn Helgaas > CC: Andy Lutomirski > CC: Konrad Rzeszutek Wilk > CC: Sebastian Andrzej Siewior > --- > drivers/iommu/intel_irq_remapping.c | 13 +++-- > 1 file changed, 7 insertions(+), 6 deletions(-) > > diff --git a/drivers/iommu/intel_irq_remapping.c > b/drivers/iommu/intel_irq_remapping.c > index f71673d..b97d70b 100644 > --- a/drivers/iommu/intel_irq_remapping.c > +++ b/drivers/iommu/intel_irq_remapping.c > @@ -525,12 +525,13 @@ static int __init intel_irq_remapping_supported(void) > if (disable_irq_remap) > return 0; > if (irq_remap_broken) { > - WARN_TAINT(1, TAINT_FIRMWARE_WORKAROUND, > -"This system BIOS has enabled interrupt remapping\n" > -"on a chipset that contains an erratum making that\n" > -"feature unstable. To maintain system stability\n" > -"interrupt remapping is being disabled. Please\n" > -"contact your BIOS vendor for an update\n"); > + printk(KERN_WARNING > + "This system BIOS has enabled interrupt remapping\n" > + "on a chipset that contains an erratum making that\n" > + "feature unstable. To maintain system stability\n" > + "interrupt remapping is being disabled. Please\n" > + "contact your BIOS vendor for an update\n"); > + add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK); > disable_irq_remap = 1; > return 0; > } > -- > 1.8.3.1 > > Ping Bjorn, Jeorg, any thoughts here? Neil -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] iommu: Remove stack trace from broken irq remapping warning
On Thu, Oct 03, 2013 at 10:08:24PM +0200, Joerg Roedel wrote: > On Thu, Oct 03, 2013 at 01:21:42PM -0400, Neil Horman wrote: > > On Fri, Sep 27, 2013 at 12:53:35PM -0400, Neil Horman wrote: > > > The warning for the irq remapping broken check in intel_irq_remapping.c is > > > pretty pointless. We need the warning, but we know where its comming > > > from, the > > > stack trace will always be the same, and it needlessly triggers things > > > like > > > Abrt. This changes the warning to just print a text warning about BIOS > > > being > > > broken, without the stack trace, then sets the appropriate taint bit. > > > Since we > > > automatically disable irq remapping, theres no need to contiue making > > > Abrt jump > > > at this problem > > > > > > Signed-off-by: Neil Horman > > > CC: Joerg Roedel > > > CC: Bjorn Helgaas > > > CC: Andy Lutomirski > > > CC: Konrad Rzeszutek Wilk > > > CC: Sebastian Andrzej Siewior > > > > Ping Bjorn, Jeorg, any thoughts here? > > Yes, the patch is doing the right thing. I have it already on my list > and will merge it soon. > Awesome, thanks guys. Regarding the taint, I'll propose something for that early next week. Regards Neil > > Joerg > > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] iommu/vt-d: Expand interrupt remapping quirk to cover x58 chipset
Recently we added an early quirk to detect 5500/5520 chipsets with early revisions that had problems with irq draining with interrupt remapping enabled: commit 03bbcb2e7e292838bb0244f5a7816d194c911d62 Author: Neil Horman Date: Tue Apr 16 16:38:32 2013 -0400 iommu/vt-d: add quirk for broken interrupt remapping on 55XX chipsets It turns out this same problem is present in the intel X58 chipset as well. See errata 69 here: http://www.intel.com/content/www/us/en/chipsets/x58-express-specification-update.html This patch extends the pci early quirk so that the chip devices/revisions specified in the above update are also covered in the same way: Signed-off-by: Neil Horman Reviewed-by: Jan Beulich CC: Jan Beulich CC: Joerg Roedel CC: Andrew Cooper CC: Malcolm Crossley CC: Prarit Bhargava CC: Don Zickus CC: Don Dutile CC: Thomas Gleixner CC: Ingo Molnar CC: "H. Peter Anvin" CC: x...@kernel.org (maintainer:X86 ARCHITECTURE...) CC: sta...@vger.kernel.org --- arch/x86/kernel/early-quirks.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c index 94ab6b9..743d583 100644 --- a/arch/x86/kernel/early-quirks.c +++ b/arch/x86/kernel/early-quirks.c @@ -196,15 +196,23 @@ static void __init ati_bugs_contd(int num, int slot, int func) static void __init intel_remapping_check(int num, int slot, int func) { u8 revision; + u16 device; + device = read_pci_config_16(num, slot, func, PCI_DEVICE_ID); revision = read_pci_config_byte(num, slot, func, PCI_REVISION_ID); /* -* Revision 0x13 of this chipset supports irq remapping -* but has an erratum that breaks its behavior, flag it as such +* Revision 13 of all triggering devices id in this quirk have +* a problem draining interrupts when irq remapping is enabled, +* and should be flagged as broken. Additionally revisions 0x12 +* and 0x22 of device id 0x3405 has this problem. */ if (revision == 0x13) set_irq_remapping_broken(); + else if ((device == 0x3405) && + ((revision == 0x12) || +(revision == 0x22))) + set_irq_remapping_broken(); } @@ -239,8 +247,11 @@ static struct chipset early_qrk[] __initdata = { PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs_contd }, { PCI_VENDOR_ID_INTEL, 0x3403, PCI_CLASS_BRIDGE_HOST, PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check }, + { PCI_VENDOR_ID_INTEL, 0x3405, PCI_CLASS_BRIDGE_HOST, + PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check }, { PCI_VENDOR_ID_INTEL, 0x3406, PCI_CLASS_BRIDGE_HOST, PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check }, + {} }; -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] iommu/vt-d: Expand interrupt remapping quirk to cover x58 chipset
On Wed, Jul 10, 2013 at 01:31:36PM -0400, Don Dutile wrote: > On 07/09/2013 03:11 PM, Neil Horman wrote: > >Recently we added an early quirk to detect 5500/5520 chipsets with early > >revisions that had problems with irq draining with interrupt remapping > >enabled: > > > >commit 03bbcb2e7e292838bb0244f5a7816d194c911d62 > >Author: Neil Horman > >Date: Tue Apr 16 16:38:32 2013 -0400 > > > > iommu/vt-d: add quirk for broken interrupt remapping on 55XX chipsets > > > >It turns out this same problem is present in the intel X58 chipset as well. > >See > >errata 69 here: > >http://www.intel.com/content/www/us/en/chipsets/x58-express-specification-update.html > > > >This patch extends the pci early quirk so that the chip devices/revisions > >specified in the above update are also covered in the same way: > > > >Signed-off-by: Neil Horman > >Reviewed-by: Jan Beulich > >CC: Jan Beulich > >CC: Joerg Roedel > >CC: Andrew Cooper > >CC: Malcolm Crossley > >CC: Prarit Bhargava > >CC: Don Zickus > >CC: Don Dutile > >CC: Thomas Gleixner > >CC: Ingo Molnar > >CC: "H. Peter Anvin" > >CC: x...@kernel.org (maintainer:X86 ARCHITECTURE...) > >CC: sta...@vger.kernel.org > >--- > > arch/x86/kernel/early-quirks.c | 15 +-- > > 1 file changed, 13 insertions(+), 2 deletions(-) > > > >diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c > >index 94ab6b9..743d583 100644 > >--- a/arch/x86/kernel/early-quirks.c > >+++ b/arch/x86/kernel/early-quirks.c > >@@ -196,15 +196,23 @@ static void __init ati_bugs_contd(int num, int slot, > >int func) > > static void __init intel_remapping_check(int num, int slot, int func) > > { > > u8 revision; > >+u16 device; > > > >+device = read_pci_config_16(num, slot, func, PCI_DEVICE_ID); > > revision = read_pci_config_byte(num, slot, func, PCI_REVISION_ID); > > > > /* > >- * Revision 0x13 of this chipset supports irq remapping > >- * but has an erratum that breaks its behavior, flag it as such > >+ * Revision 13 of all triggering devices id in this quirk have > >+ * a problem draining interrupts when irq remapping is enabled, > >+ * and should be flagged as broken. Additionally revisions 0x12 > >+ * and 0x22 of device id 0x3405 has this problem. > > */ > > if (revision == 0x13) > > set_irq_remapping_broken(); > >+else if ((device == 0x3405)&& > >+((revision == 0x12) || > >+ (revision == 0x22))) > >+set_irq_remapping_broken(); > > > > } > > > When discussing the original-seen errata w/Intel on 55xx chips, the > statements made were any chip with rev C1(revision = 0x21) or > greater had the correct > hw implementation for the intr-pending flush. > We knew the bug existed in the A3 (rev=0x13) rev of the chip, but the > true check should be: > revision < 0x21 > > I suspect there were multiple revs of the x58, of which B2(0x12) & C2(0x22) > were shipped to oem's, system vendors, etc. > But, in case there were any chip revisions in between these well-known values > out there, I suggest the 0x3405 check be changed to: > revision < 0x22 > > Since it's unlikely that hw degressed in design over revisions, it seems > more correct to check for revs less than a rev-value having an errata, > or conversely, a chip value >= rev-value do not have the errata. > IOW, an equal check may not provide sufficient. > Don and I discussed this offline. Given that his comments make good sense to me, I'm hesitant to apply the quirk to anything other than what the spec update says, given that its clear. Don is attempting to contact people at Intel who will be able (we hope) to give us a definitive answer on this, please hold on this patch until we have resolution on the issue. Neil -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] firmware: Be a bit more verbose about direct firmware loading failure
The direct firmware loading interface is a bit quiet about failures. Failures that occur during loading are masked if firmware exists in multiple locations, and may be masked entirely in the event that we fall back to the user mode helper code. It would be nice to see some of the more unexpected errors get logged, so in the event that you expect the direct firmware loader to work (like if CONFIG_FW_LOADER_USER_HELPER is enabled), and something goes wrong, you can figure out what happened. Signed-off-by: Neil Horman CC: Ming Lei CC: Greg Kroah-Hartman --- drivers/base/firmware_class.c | 38 +- 1 file changed, 25 insertions(+), 13 deletions(-) diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c index 10a4467..eb8fb94 100644 --- a/drivers/base/firmware_class.c +++ b/drivers/base/firmware_class.c @@ -282,31 +282,35 @@ static noinline_for_stack long fw_file_size(struct file *file) return st.size; } -static bool fw_read_file_contents(struct file *file, struct firmware_buf *fw_buf) +static int fw_read_file_contents(struct file *file, struct firmware_buf *fw_buf) { long size; char *buf; + int rc; size = fw_file_size(file); if (size <= 0) - return false; + return -EINVAL; buf = vmalloc(size); if (!buf) - return false; - if (kernel_read(file, 0, buf, size) != size) { + return -ENOMEM; + rc = kernel_read(file, 0, buf, size); + if (rc != size) { + if (rc > 0) + rc = -EIO; vfree(buf); - return false; + return rc; } fw_buf->data = buf; fw_buf->size = size; - return true; + return 0; } -static bool fw_get_filesystem_firmware(struct device *device, +static int fw_get_filesystem_firmware(struct device *device, struct firmware_buf *buf) { int i; - bool success = false; + int rc = -ENOENT; char *path = __getname(); for (i = 0; i < ARRAY_SIZE(fw_path); i++) { @@ -321,14 +325,17 @@ static bool fw_get_filesystem_firmware(struct device *device, file = filp_open(path, O_RDONLY, 0); if (IS_ERR(file)) continue; - success = fw_read_file_contents(file, buf); + rc = fw_read_file_contents(file, buf); fput(file); - if (success) + if (rc) + dev_warn(device, "firmware, attempted to load %s, but failed with error %d\n", + path, rc); + else break; } __putname(path); - if (success) { + if (!rc) { dev_dbg(device, "firmware: direct-loading firmware %s\n", buf->fw_id); mutex_lock(&fw_lock); @@ -337,7 +344,7 @@ static bool fw_get_filesystem_firmware(struct device *device, mutex_unlock(&fw_lock); } - return success; + return rc; } /* firmware holds the ownership of pages */ @@ -1086,9 +1093,14 @@ _request_firmware(const struct firmware **firmware_p, const char *name, } } - if (!fw_get_filesystem_firmware(device, fw->priv)) + ret = fw_get_filesystem_firmware(device, fw->priv); + if (ret) { + dev_warn(device, "Direct firmware load failed with error %d\n", +ret); + dev_warn(device, "Falling back to user helper\n"); ret = fw_load_from_user_helper(fw, name, device, uevent, nowait, timeout); + } /* don't cache firmware handled without uevent */ if (!ret) -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: race condition in crypto larval handling
On Fri, Sep 06, 2013 at 04:20:50PM -0700, Kees Cook wrote: > Hi, > > I've tracked down a race condition and ref counting problem in the > crypto API internals. We've been seeing it under Chrome OS, but it > seems it's not isolated to just us: > > https://code.google.com/p/chromium/issues/detail?id=244581 > http://marc.info/?l=linux-crypto-vger&m=135429403609373&w=2 > https://bugzilla.redhat.com/show_bug.cgi?id=983682 > http://www.mail-archive.com/linux-cifs@vger.kernel.org/msg07933.html > > I think I've found the basic origin of the problem. > crypto_larval_add() has synchronization to make sure only a single > larval can ever be created. That logic seems totally fine. However, > this means that crypto_larval_lookup() from two threads can return the > same larval, which means that crypto_alg_mod_lookup() runs the risk of > calling crypto_larval_kill() on the same larval twice, which does not > appear to be handled sanely. > > I can easily crash the kernel by forcing a synchronization point just > before the "return crypt_larval_add(...)" call in > crypto_larval_lookup(). Basically, I added this sloppy sync code (I > feel like there must be a better way to do this): > > +static atomic_t global_sync = ATOMIC_INIT(0); > ... > struct crypto_alg *crypto_larval_lookup(const char *name, u32 type, u32 mask) > ... > + if (strncmp(name, "asdf", 4) == 0) { > + int value; > + > + value = atomic_add_return(1, &global_sync); > + if (value == 1) { > + /* I was first to stall here, wait for inc. */ > + while (atomic_read(&global_sync) != 2) > + cpu_relax(); > + } else if (value == 2) { > + /* I was second to stall here, wait for dec. */ > + while (atomic_read(&global_sync) != 1) > + cpu_relax(); > + } else { > + BUG(); > + } > + atomic_dec(&global_sync); > + } > > return crypto_larval_add(name, type, mask); > } > > And then ran code from userspace that did, effectively: > > struct sockaddr_alg sa = { > .salg_family = AF_ALG, > .salg_type = "hash", > }; > strcpy(sa.salg_name, argv[1]); > > fork(); > ... > sds[0] = socket(AF_ALG, SOCK_SEQPACKET, 0); > bind(sds[0], (struct sockaddr *) &sa, sizeof(sa)); > > And ran this as "./tickle asdf1" to generate the two threads trying to > find an invalid alg. The race looks possible even with valid algs, but > this was the fastest path I could see to tickle the issue. > > With added printks to the kernel, it was clear that crypto_larval_kill > was being called twice on the same larval, and the list_del() call was > doing bad things. When I fixed that, the refcnt bug became very > obvious. Here's the change I made for crypto_larval_kill: > > @@ -161,7 +166,8 @@ void crypto_larval_kill(struct crypto_alg *alg) > struct crypto_larval *larval = (void *)alg; > > down_write(&crypto_alg_sem); > - list_del(&alg->cra_list); > + if (!list_empty(&alg->cra_list)) > + list_del_init(&alg->cra_list); > up_write(&crypto_alg_sem); > complete_all(&larval->completion); > crypto_alg_put(alg); > > It seems that there are too many "put" calls (mixed between > crypto_alg_put() and crypto_mod_put(), which also calls > crypto_alg_put()). See this annotated portion of > crypto_alg_mod_lookup: > > /* This can (correctly) return the same larval for two threads. */ > larval = crypto_larval_lookup(name, type, mask); > if (IS_ERR(larval) || !crypto_is_larval(larval)) > return larval; > > ok = crypto_probing_notify(CRYPTO_MSG_ALG_REQUEST, larval); > > if (ok == NOTIFY_STOP) > /* This calls crypto_mod_put(), but sometimes also get?? */ > alg = crypto_larval_wait(larval); > else { > /* This directly calls crypto_mod_put */ > crypto_mod_put(larval); > alg = ERR_PTR(-ENOENT); > } > /* This calls crypto_alg_put */ > crypto_larval_kill(larval); > return alg; > > In the two-thread situation, the first thread gets a larval with > refcnt 2 via crypto_larval_add. (Why 2?) The next thread finds the > larval via crypto_larval_add's call to __crypto_alg_lookup() and sees > the ref bump to 3. While exiting crypto_alg_mod_lookup, each thread > decrements the ref count twice. > > It seems to me like either each call to crypto_larval_lookup() should > result in a refcount bumped by two, or crypto_alg_mod_lookup() should > decrement only once, and the initial refcount should be 1 not 2 from > crypto_larval_add. But it's not clear to me which is sensible here. > > What's the right solution here? > > Thanks, > > -Kees > > -- > Kees Cook > Chrome OS
Re: [PATCH] firmware: Be a bit more verbose about direct firmware loading failure
On Wed, Sep 11, 2013 at 07:54:28PM +0800, Ming Lei wrote: > On Sat, Sep 7, 2013 at 3:36 AM, Neil Horman wrote: > > The direct firmware loading interface is a bit quiet about failures. > > Failures > > Because there are several pre-defined search paths, and generally the > requested firmware only exists in one of these paths. > This is true, but you'll note this patch doesn't make any noise in the event that a firmware isn't found until all the search paths are exhausted. I didn't consider this "unexpected". > > that occur during loading are masked if firmware exists in multiple > > locations, > > and may be masked entirely in the event that we fall back to the user mode > > You still can figure out the request falls back to user mode loading since we > have the "firmware: direct-loading firmware %s" log. > Yes, but you're looking at it backwards, that only prints out if the direct load works. If it doesn't, you get silence, which is bad. > > helper code. It would be nice to see some of the more unexpected errors get > > What are the unexpected errors? > If you get a short read in the direct load path for example, or if someone mounts an nfs share over the firmware search path and you get an ESTALE. Alternatively, if the vmalloc fails during the direct load path, these would be "unexpected" errors > > logged, so in the event that you expect the direct firmware loader to work > > (like > > if CONFIG_FW_LOADER_USER_HELPER is enabled), and something goes wrong, you > > can > > figure out what happened. > > Looks we didn't meet this case, do you have real examples? > Yeah, we had a vmalloc failure in the direct load path, and unknowingly had forgot to configure CONFIG_FW_LOADER_USER_HELPER, so the module load failed with an ENOENT, even though the firmware was clearly present on the filesystem. This patch helped us track that down. Neil > > > > Signed-off-by: Neil Horman > > CC: Ming Lei > > CC: Greg Kroah-Hartman > > --- > > drivers/base/firmware_class.c | 38 +- > > 1 file changed, 25 insertions(+), 13 deletions(-) > > > > diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c > > index 10a4467..eb8fb94 100644 > > --- a/drivers/base/firmware_class.c > > +++ b/drivers/base/firmware_class.c > > @@ -282,31 +282,35 @@ static noinline_for_stack long fw_file_size(struct > > file *file) > > return st.size; > > } > > > > -static bool fw_read_file_contents(struct file *file, struct firmware_buf > > *fw_buf) > > +static int fw_read_file_contents(struct file *file, struct firmware_buf > > *fw_buf) > > { > > long size; > > char *buf; > > + int rc; > > > > size = fw_file_size(file); > > if (size <= 0) > > - return false; > > + return -EINVAL; > > buf = vmalloc(size); > > if (!buf) > > - return false; > > - if (kernel_read(file, 0, buf, size) != size) { > > + return -ENOMEM; > > + rc = kernel_read(file, 0, buf, size); > > + if (rc != size) { > > + if (rc > 0) > > + rc = -EIO; > > vfree(buf); > > - return false; > > + return rc; > > } > > fw_buf->data = buf; > > fw_buf->size = size; > > - return true; > > + return 0; > > } > > > > -static bool fw_get_filesystem_firmware(struct device *device, > > +static int fw_get_filesystem_firmware(struct device *device, > >struct firmware_buf *buf) > > { > > int i; > > - bool success = false; > > + int rc = -ENOENT; > > char *path = __getname(); > > > > for (i = 0; i < ARRAY_SIZE(fw_path); i++) { > > @@ -321,14 +325,17 @@ static bool fw_get_filesystem_firmware(struct device > > *device, > > file = filp_open(path, O_RDONLY, 0); > > if (IS_ERR(file)) > > continue; > > - success = fw_read_file_contents(file, buf); > > + rc = fw_read_file_contents(file, buf); > > fput(file); > > - if (success) > > + if (rc) > > + dev_warn(device, "firmware, attempted to load %s, > > but failed with error %d\n", > > +
Re: [PATCH] firmware: Be a bit more verbose about direct firmware loading failure
On Thu, Sep 12, 2013 at 10:25:58AM +0800, Ming Lei wrote: > On Wed, Sep 11, 2013 at 10:19 PM, Neil Horman wrote: > > On Wed, Sep 11, 2013 at 07:54:28PM +0800, Ming Lei wrote: > >> On Sat, Sep 7, 2013 at 3:36 AM, Neil Horman wrote: > >> > The direct firmware loading interface is a bit quiet about failures. > >> > Failures > >> > >> Because there are several pre-defined search paths, and generally the > >> requested firmware only exists in one of these paths. > >> > > This is true, but you'll note this patch doesn't make any noise in the event > > that a firmware isn't found until all the search paths are exhausted. I > > didn't > > consider this "unexpected". > > Yes, it will cause noise, suppose the firmware is in the last search path, and > we may always get the warning during the first three searches, and it > is certainly > annoying, isn't it? > Please re-read the patch, then point out to me which printk the above action will trigger, because its not happening in my testing. If you'll take a look at fw_get_filesystem_firmware, you'll see that if the filp_open on a firmware file fails, we continue the for loop through the list of available search paths. No error is generated in the case you describe above. The exceptions to that rule are: 1) If no file is found in _any_ of the search paths, in which case fw_get_filesystem_firmware will return -ENOENT, causing _request_firmware to print "Direct firmware load failed with error -ENOENT. Falling back to user helper". 2) If a file is successfully opened by fw_get_filesystem_firmware, but something goes wrong during the read in fw_read_file_contents, we print "firmware attempted to load , but failed with error ", followed by the printk from (1) indicating that we are falling back to the user mode helper path. Both of these execptions should be rare, and are something the administrator will want to know about, so as not to confuse the real error with the mystery -ENOENT you would get if you fell back to the user mode helepr and it wansn't configured on in the running kernel. > > > >> > that occur during loading are masked if firmware exists in multiple > >> > locations, > >> > and may be masked entirely in the event that we fall back to the user > >> > mode > >> > >> You still can figure out the request falls back to user mode loading since > >> we > >> have the "firmware: direct-loading firmware %s" log. > >> > > Yes, but you're looking at it backwards, that only prints out if the direct > > load > > works. If it doesn't, you get silence, which is bad. > > OK, you can change to only log the failure. > That is exactly what this patch does. > > > >> > helper code. It would be nice to see some of the more unexpected errors > >> > get > >> > >> What are the unexpected errors? > >> > > If you get a short read in the direct load path for example, or if someone > > mounts an nfs share over the firmware search path and you get an ESTALE. > > That is easy to find, and no one should mount one fs on firmware path. > Thats really rather the problem isn't it? this patch is an assertion that its not that easy to find the root cause of a firmware load problem currently, even if you haven't done something dumb, like mount a network FS on your firmware path. And I agree you shouldn't do such things, but that doesnt' mean that people won't or have good reason to try (I can certainly see mounting the firmware path on an NFS mount for embedded system development with limited storage). Regardless, when people do do something silly, we should tell them so, not mask the problem behind a mystery -ENOENT error. > > Alternatively, if the vmalloc fails during the direct load path, these > > would be > > "unexpected" errors > > This one might make sense since size of some firmwares may be several mega > bytes, and vmalloc space is a bit limited on 32bit arch, so how about just log > this failure in fw_read_file_contents()? > It does exactly that, and a little more. Why just catch the vmalloc error? What if kernel_read fails for some reason? I see no reason to catch the vmalloc error specifcally, when we can catch any error generated by direct read path (that doesn't include simple file not found errors for any single entry in the search path). > > > >> > logged, so in the event that you expect the direct firmware loader to > >> > work (like > >> > if CONFIG_FW_LOADER_USER_HELPER is enabled), and something goes wrong, > >&
Re: [PATCH] firmware: Be a bit more verbose about direct firmware loading failure
On Thu, Sep 12, 2013 at 10:30:59PM +0800, Ming Lei wrote: > On Thu, Sep 12, 2013 at 9:16 PM, Neil Horman wrote: > >> > > Please re-read the patch, then point out to me which printk the above action > > will trigger, because its not happening in my testing. If you'll take a > > look at > > fw_get_filesystem_firmware, you'll see that if the filp_open on a firmware > > file > > fails, we continue the for loop through the list of available search paths. > > No > > error is generated in the case you describe above. > > You are right, sorry for missing "if (IS_ERR(file)) continue;", and looks > the > patch is good. > > Acked-by: Ming Lei > > Thanks, Awesome, thanks! Neil > -- > Ming Lei > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] firmware: Be a bit more verbose about direct firmware loading failure
On Thu, Sep 12, 2013 at 03:46:30PM -0300, Henrique de Moraes Holschuh wrote: > On Thu, 12 Sep 2013, Neil Horman wrote: > > Both of these execptions should be rare, and are something the administrator > > will want to know about, so as not to confuse the real error with the > > mystery > > -ENOENT you would get if you fell back to the user mode helepr and it > > wansn't > > configured on in the running kernel. > > Except, of course, for Intel processor microcode updates, which are going to > cause ENOENT on a large number of systems. > > This will generate a large number of questions by users on the distro MLs. > > However, IMHO this is *not* a reason to refuse this patch series. If > anything, at least for Debian I will use it as an opportunity to educate > people about the existence of microcode update packages in "non-free". > I agree. If people are running with downlevel microcode, they shold know about it. You can't expect request_firmware to fail silently. If people complain, I think the right solution would be to add a test to the microcode_request_fw function to check for the existence of the file before requesting it. Neil > -- > "One disk to rule them all, One disk to find them. One disk to bring > them all and in the darkness grind them. In the Land of Redmond > where the shadows lie." -- The Silicon Valley Tarot > Henrique Holschuh > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] firmware: Be a bit more verbose about direct firmware loading failure
On Fri, Sep 13, 2013 at 10:56:39AM -0300, Henrique de Moraes Holschuh wrote: > On Thu, 12 Sep 2013, Neil Horman wrote: > > On Thu, Sep 12, 2013 at 03:46:30PM -0300, Henrique de Moraes Holschuh wrote: > > > On Thu, 12 Sep 2013, Neil Horman wrote: > > > > Both of these execptions should be rare, and are something the > > > > administrator > > > > will want to know about, so as not to confuse the real error with the > > > > mystery > > > > -ENOENT you would get if you fell back to the user mode helepr and it > > > > wansn't > > > > configured on in the running kernel. > > > > > > Except, of course, for Intel processor microcode updates, which are going > > > to > > > cause ENOENT on a large number of systems. > > > > > > This will generate a large number of questions by users on the distro MLs. > > > > > > However, IMHO this is *not* a reason to refuse this patch series. If > > > anything, at least for Debian I will use it as an opportunity to educate > > > people about the existence of microcode update packages in "non-free". > > > > > I agree. If people are running with downlevel microcode, they shold know > > about > > it. You can't expect request_firmware to fail silently. If people > > complain, I > > think the right solution would be to add a test to the microcode_request_fw > > function to check for the existence of the file before requesting it. > > Make it a firmware loader API for "optional" firmware (i.e. which doesn't > complain on ENOENT), leaving for the caller the detail of whether a > firmware-not-there failure should be logged or not. > That makes sense, I'll subit that in a separate patch this afternoon Thanks! Neil > -- > "One disk to rule them all, One disk to find them. One disk to bring > them all and in the darkness grind them. In the Land of Redmond > where the shadows lie." -- The Silicon Valley Tarot > Henrique Holschuh > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3] ACPI: Fix osc flag setup ordering to allow pcie hotplug use when available
On Thu, Aug 29, 2013 at 11:47:13AM -0600, Bjorn Helgaas wrote: > On Mon, Aug 26, 2013 at 11:39:13AM -0400, Neil Horman wrote: > > Somewhere between 3.9 and 3.10 it seems the order in which pcie and acpi > > probed > > slots for hotplug capabilites got reversed. While this isn't a big deal, > > it did > > uncover a bug in the ACPI bus setup path. Specifically, acpi_pci_root_add > > calls > > pci_acpi_scan_root before setting the osc flags for the device handle. > > pci_acpi_scan_root, among other things uses > > device_is_managed_by_native_pciehp() > > to determine if a given slot has pcie hotplug capabilties, whcih checks the > > devices OSC_PCI_EXPRESS_NATIVE_HP_CONTROL flag. Since that flag is not set > > until after pci_acpi_scan_root_completes, the acpi code never sees that pcie > > slots are hotplug capable and configures them all to use acpi instead. > > > > Fix is pretty simple, just defer the scan until after the osc flags have > > been > > set on the device. Tested by myself and it seems to work well. > > > > Signed-off-by: Neil Horman > > CC: Len Brown > > CC: "Rafael J. Wysocki" > > CC: Bjorn Helgaas > > CC: linux-a...@vger.kernel.org > > CC: linux-...@vger.kernel.org > > > > --- > > Change notes: > > v2) eferred the disabling of aspm until after the acpi scan of the pci bus > > is > > complete. This was done to allow proper handling of pcie 1.1 devices, as > > per: > > > > commit b8178f130e25c1bdac1c33e0996f1ff6e20ec08e > > Author: Bjorn Helgaas > > Date: Mon Apr 1 15:47:39 2013 -0600 > > > > Revert "PCI/ACPI: Request _OSC control before scanning PCI root bus" > > > > As discussed previously in the thread the disable logic for aspm needs to be > > untangled and refactored, which is not something I'm sufficently versed in > > teh > > hotplug code to do right now, but this fixes the problem above, and > > prevents the > > problem that necessitated the revert without adding any visible complexity > > to > > the user, so I think its ok. > > > > v3) Fixup stupid authorship error > > --- > > drivers/acpi/pci_root.c | 54 > > + > > 1 file changed, 32 insertions(+), 22 deletions(-) > > > > diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c > > index 5917839..1e80a90 100644 > > --- a/drivers/acpi/pci_root.c > > +++ b/drivers/acpi/pci_root.c > > @@ -378,6 +378,7 @@ static int acpi_pci_root_add(struct acpi_device *device, > > struct acpi_pci_root *root; > > u32 flags, base_flags; > > acpi_handle handle = device->handle; > > + bool no_aspm = false; > > > > root = kzalloc(sizeof(struct acpi_pci_root), GFP_KERNEL); > > if (!root) > > @@ -437,27 +438,6 @@ static int acpi_pci_root_add(struct acpi_device > > *device, > > flags = base_flags = OSC_PCI_SEGMENT_GROUPS_SUPPORT; > > acpi_pci_osc_support(root, flags); > > > > - /* > > -* TBD: Need PCI interface for enumeration/configuration of roots. > > -*/ > > - > > - /* > > -* Scan the Root Bridge > > -* > > -* Must do this prior to any attempt to bind the root device, as the > > -* PCI namespace does not get created until this call is made (and > > -* thus the root bridge's pci_dev does not exist). > > -*/ > > - root->bus = pci_acpi_scan_root(root); > > - if (!root->bus) { > > - dev_err(&device->dev, > > - "Bus %04x:%02x not present in PCI namespace\n", > > - root->segment, (unsigned int)root->secondary.start); > > - result = -ENODEV; > > - goto end; > > - } > > - > > - /* Indicate support for various _OSC capabilities. */ > > if (pci_ext_cfg_avail()) > > flags |= OSC_EXT_PCI_CONFIG_SUPPORT; > > if (pcie_aspm_support_enabled()) { > > @@ -512,7 +492,14 @@ static int acpi_pci_root_add(struct acpi_device > > *device, > > acpi_format_exception(status), flags); > > dev_info(&device->dev, > > "ACPI _OSC control for PCIe not granted, > > disabling ASPM\n"); > > - pcie_no_aspm(); > > + /* > > +* We want to disable aspm here, but aspm_disabled > > +
[PATCH v4] ACPI: Fix osc flag setup ordering to allow pcie hotplug use when available
This is a fix for: https://bugzilla.kernel.org/show_bug.cgi?id=60736 During the 3.8 devel cycle: commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6 Author: Taku Izumi Date: Tue Oct 30 15:27:13 2012 +0900 PCI/ACPI: Request _OSC control before scanning PCI root bus went in to allow us to query the pcie hotplug flags during the acpi bus scan. It however caused problems with the disabling of pcie aspm, and so: commit b8178f130e25c1bdac1c33e0996f1ff6e20ec08e Author: Bjorn Helgaas Date: Mon Apr 1 15:47:39 2013 -0600 Revert "PCI/ACPI: Request _OSC control before scanning PCI root bus" Backed it out. This of course brought back the problem in which acpi took over hotplug ports that were meant to be controlled by pcie. This patch gives us both items. It lets us request _OSC control before scanning the pci root bus, but defers any disabling of aspm until after the scan is complete, allowing us to properly handle old pcie 1.1 devices aspm settings properly, as b8178f130e documents. Tested successfully by myself. Signed-off-by: Neil Horman CC: Len Brown CC: "Rafael J. Wysocki" CC: linux-a...@vger.kernel.org CC: linux-...@vger.kernel.org --- drivers/acpi/pci_root.c | 62 ++--- 1 file changed, 38 insertions(+), 24 deletions(-) diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c index 5917839..6110fd2 100644 --- a/drivers/acpi/pci_root.c +++ b/drivers/acpi/pci_root.c @@ -378,6 +378,7 @@ static int acpi_pci_root_add(struct acpi_device *device, struct acpi_pci_root *root; u32 flags, base_flags; acpi_handle handle = device->handle; + bool no_aspm = false, clear_aspm = false; root = kzalloc(sizeof(struct acpi_pci_root), GFP_KERNEL); if (!root) @@ -437,27 +438,6 @@ static int acpi_pci_root_add(struct acpi_device *device, flags = base_flags = OSC_PCI_SEGMENT_GROUPS_SUPPORT; acpi_pci_osc_support(root, flags); - /* -* TBD: Need PCI interface for enumeration/configuration of roots. -*/ - - /* -* Scan the Root Bridge -* -* Must do this prior to any attempt to bind the root device, as the -* PCI namespace does not get created until this call is made (and -* thus the root bridge's pci_dev does not exist). -*/ - root->bus = pci_acpi_scan_root(root); - if (!root->bus) { - dev_err(&device->dev, - "Bus %04x:%02x not present in PCI namespace\n", - root->segment, (unsigned int)root->secondary.start); - result = -ENODEV; - goto end; - } - - /* Indicate support for various _OSC capabilities. */ if (pci_ext_cfg_avail()) flags |= OSC_EXT_PCI_CONFIG_SUPPORT; if (pcie_aspm_support_enabled()) { @@ -471,7 +451,7 @@ static int acpi_pci_root_add(struct acpi_device *device, if (ACPI_FAILURE(status)) { dev_info(&device->dev, "ACPI _OSC support " "notification failed, disabling PCIe ASPM\n"); - pcie_no_aspm(); + no_aspm = true; flags = base_flags; } } @@ -503,7 +483,7 @@ static int acpi_pci_root_add(struct acpi_device *device, * We have ASPM control, but the FADT indicates * that it's unsupported. Clear it. */ - pcie_clear_aspm(root->bus); + clear_aspm = true; } } else { dev_info(&device->dev, @@ -512,7 +492,14 @@ static int acpi_pci_root_add(struct acpi_device *device, acpi_format_exception(status), flags); dev_info(&device->dev, "ACPI _OSC control for PCIe not granted, disabling ASPM\n"); - pcie_no_aspm(); + /* +* We want to disable aspm here, but aspm_disabled +* needs to remain in its state from boot so that we +* properly handle pcie 1.1 devices. So we set this +* flag here, to defer the action until after the acpi +* root scan +*/ + no_aspm = true; } } else { dev_info(&device->dev, @@ -520,6 +507,33 @@ static int acpi_pci_root_add(struct acpi_device *device, "(_OSC support mask: 0x%02x)\n", flags); } + /* +* TBD: Need PCI interface for enumeration/configuration of roots.
Re: [PATCH v4] ACPI: Fix osc flag setup ordering to allow pcie hotplug use when available
On Thu, Aug 29, 2013 at 05:40:52PM -0600, Bjorn Helgaas wrote: > On Thu, Aug 29, 2013 at 04:17:05PM -0400, Neil Horman wrote: > > This is a fix for: > > https://bugzilla.kernel.org/show_bug.cgi?id=60736 > > > > During the 3.8 devel cycle: > > > > commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6 > > Author: Taku Izumi > > Date: Tue Oct 30 15:27:13 2012 +0900 > > > > PCI/ACPI: Request _OSC control before scanning PCI root bus > > > > went in to allow us to query the pcie hotplug flags during the acpi bus > > scan. > > It however caused problems with the disabling of pcie aspm, and so: > > commit b8178f130e25c1bdac1c33e0996f1ff6e20ec08e > > Author: Bjorn Helgaas > > Date: Mon Apr 1 15:47:39 2013 -0600 > > > > Revert "PCI/ACPI: Request _OSC control before scanning PCI root bus" > > > > Backed it out. This of course brought back the problem in which acpi took > > over > > hotplug ports that were meant to be controlled by pcie. > > > > This patch gives us both items. It lets us request _OSC control before > > scanning > > the pci root bus, but defers any disabling of aspm until after the scan is > > complete, allowing us to properly handle old pcie 1.1 devices aspm settings > > properly, as b8178f130e documents. > > > > Tested successfully by myself. > > > > Signed-off-by: Neil Horman > > CC: Len Brown > > CC: "Rafael J. Wysocki" > > CC: linux-a...@vger.kernel.org > > CC: linux-...@vger.kernel.org > > I added Yinghai's ack and a stable tag and put this in pci/misc > for v3.12. Thanks! > > I reworked the changelog because I think the actual cause of the > regression was 3b63aaa70e, not the _OSC commits you mentioned: > > 8c33f51df4 ("PCI/ACPI: Request _OSC control before scanning PCI root bus") > appeared in v3.8 and broke ASPM but not acpiphp. > > b8178f130e ('Revert "PCI/ACPI: Request _OSC control before scanning PCI > root bus"') appeared in v3.9 and fixed ASPM, leaving acpiphp working. > > 3b63aaa70e ("PCI: acpiphp: Do not use ACPI PCI subdriver mechanism") > appeared in v3.10, and I believe this is what actually broke acpiphp > because it moved the acpiphp initialization earlier, into the bus scan. > > in v3.8: > acpi_pci_root_add > acpi_pci_osc_control_set # request OS control > pci_acpi_scan_root# scan bus > acpi_pci_root_start > add_bridge# acpi_pci_driver .add method > ... > device_is_managed_by_native_pciehp # OK > > in v3.9: > acpi_pci_root_add > pci_acpi_scan_root# scan bus > acpi_pci_osc_control_set # request OS control > add_bridge# acpi_pci_driver .add method > ... > device_is_managed_by_native_pciehp # OK > > in v3.10: > acpi_pci_root_add > pci_acpi_scan_root# scan bus > ... > acpiphp_enumerate_slots > ... > device_is_managed_by_native_pciehp # PROBLEM > acpi_pci_osc_control_set # request OS control > > So I added a stable tag for v3.10+ only. I'll ask Linus to merge it > directly during the merge window for v3.12, and hopefully it will be > backported to the v3.10 and v3.11 stable trees soon after that. > > Bjorn > Copy that, thanks! Neil -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] msi: add forgotten pci_dev_put(pdev) to populate_msi_sysfs()
On Thu, Sep 26, 2013 at 02:25:52PM +0200, Veaceslav Falico wrote: > On Wed, Sep 25, 2013 at 05:35:54PM -0600, Bjorn Helgaas wrote: > >On Wed, Sep 25, 2013 at 5:23 PM, Neil Horman wrote: > >>On Wed, Sep 25, 2013 at 03:08:05PM -0600, Bjorn Helgaas wrote: > >>>[+cc Neil (he added this code in da8d1c8ba4), Greg] > >>> > >>>On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico > >>>wrote: > >>>> Before trying to kobject_init_and_add(), we add a reference to pdev via > >>>> pci_dev_get(pdev). However, if it fails to init and/or add the kobject, > >>>> we > >>>> don't return it back - even on out_unroll. > >>>> > >>>> Fix this by adding pci_dev_put(pdev) before going to unrolling section. > >>>> > >>>> CC: Bjorn Helgaas > >>>> CC: linux-...@vger.kernel.org > >>>> CC: linux-kernel@vger.kernel.org > >>>> Signed-off-by: Veaceslav Falico > >>>> --- > >>>> drivers/pci/msi.c | 4 +++- > >>>> 1 file changed, 3 insertions(+), 1 deletion(-) > >>>> > >>>> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c > >>>> index d5f90d6..14bf578 100644 > >>>> --- a/drivers/pci/msi.c > >>>> +++ b/drivers/pci/msi.c > >>>> @@ -534,8 +534,10 @@ static int populate_msi_sysfs(struct pci_dev *pdev) > >>>> pci_dev_get(pdev); > >>>> ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL, > >>>> "%u", entry->irq); > >>>> - if (ret) > >>>> + if (ret) { > >>>> + pci_dev_put(pdev); > >>>> goto out_unroll; > >>>> + } > >>>> > >>>> count++; > >>>> } > >>> > >>>I don't understand why this code does the pci_dev_get() in the first > >>>place. The pdev->msi_list of msi_desc structs is private to the > >>>pci_dev, and even without bumping the refcount, there should be no way > >>>for the pci_dev to be freed before the msi_desc. > >>> > >>Its been a few years now, but IIRC I did the pci_dev_get/put here to ensure > >>that > >>people didn't try to remove the device prior to freeing all their interrupts > >>(i.e I didn't want a broken driver to go through its remove routine without > >>freeing all its irqs). That might have been the wrong thing to do, but > >>thats > >>what bubbles to the front of my head when looking at this. > > > >That sounds plausible, but I think I'd rather deal with that by having > >the PCI core remove logic free all the interrupts. I *think* that's > >already in place, i.e., pci_free_resources() calls > >msi_remove_pci_irq_vectors(). So I propose that we remove the > >pci_dev_get()/put() unless we come up with a more compelling reason > >for it. > > As an update - I've found an interesting case why exactly that > kobject_del() might be needed: > > in kobject_del() it removes instantly the link to kset - via > kobj_kset_leave(), so that our kset remains without links and, thus, might > be instantly removed. > > So, with kobject_del(), our kset (msi_irqs sysfs dir) remains instantly > without any links (i.e. other kobjects) and, when we call kset_unregister() > - it exits instantly (if it's not being hold somewhere elsewhere...). > > Without it, kset_unregister() will wait till all the kobjects will be gone. > > Now, the fun part starts - if we quickly call pci_disable_msi() and, > afterwards, pci_enable_msi() - we might fail because the msi_irqs kset is > still there, waiting to unregister, and the sysfs dir is still active. > > It's used, for example, in tg3_open/tg3_close, which are ndo_open/close, > and are called on enslave/deslave in bonding. > > What I get: > [ 60.458319] WARNING: CPU: 0 PID: 5552 at fs/sysfs/dir.c:526 > sysfs_add_one+0xbb/0xe0() > [ 60.458350] sysfs: cannot create duplicate filename > '/devices/pci:00/:00:1c.5/:3f:00.0/msi_irqs' > > I'll take a deeper look at the issue, though any feedback/advise is > welcome. And I'll hold on with the patchset that removes pci_dev_get/put > and kobject_del. > > The origional post may offer some guidance here: https://lkml.org/lkml/2011/9/29/220 In particular the v3 update I think is rele
Re: [PATCH 1/1] i2c-ismt: add Initialize DMA buffer code ismt_access()
On Thu, Sep 26, 2013 at 10:40:32AM +0200, Wolfram Sang wrote: > On Tue, Sep 24, 2013 at 04:47:55PM -0700, James Ralston wrote: > > This patch adds code to Initialize the DMA buffer to compensate for > > possible hardware data corruption. > > Take care of the line width here. > > > Signed-off-by: James Ralston > > --- > > drivers/i2c/busses/i2c-ismt.c |3 +++ > > 1 files changed, 3 insertions(+), 0 deletions(-) > > > > diff --git a/drivers/i2c/busses/i2c-ismt.c b/drivers/i2c/busses/i2c-ismt.c > > index 8ed79a0..d3f30ea 100644 > > --- a/drivers/i2c/busses/i2c-ismt.c > > +++ b/drivers/i2c/busses/i2c-ismt.c > > @@ -393,6 +393,9 @@ static int ismt_access(struct i2c_adapter *adap, u16 > > addr, > > > > desc = &priv->hw[priv->head]; > > > > + /* Initialize the DMA buffer */ > > + memset(priv->dma_buffer, 0, (I2C_SMBUS_BLOCK_MAX+1)); > > sizeof()? > Agreed. Neil > > + > > /* Initialize the descriptor */ > > memset(desc, 0, sizeof(struct ismt_desc)); > > desc->tgtaddr_rw = ISMT_DESC_ADDR_RW(addr, read_write); > > -- > > 1.7.7.6 > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] msi: free msi_desc entry only after we've released the kobject
On Thu, Sep 26, 2013 at 11:59:51AM +0200, Veaceslav Falico wrote: > Currently, we first do kobject_put(&entry->kobj) and the kfree(entry), > however kobject_put() doesn't guarantee us that it was the last reference > and that the kobj isn't used currently by someone else, so after we > kfree(entry) with the struct kobject - other users will begin using the > freed memory, instead of the actual kobject. > > Fix this by using the kobject->release callback, which is called last when > the kobject is indeed not used and is cleaned up - it's msi_kobj_release(), > which can do the kfree(entry) safely (kobject_put/cleanup doesn't use the > kobj itself after ->release() was called, so we're safe). > > In case we've failed to create the sysfs directories - just kfree() > it - cause we don't have the kobjects attached. > > Also, remove the same functionality from populate_msi_sysfs(), cause on > failure we anyway call free_msi_irqs(), which will take care of all the > kobjects properly. > > And add the forgotten pci_dev_put(pdev) in case of failure to register the > kobject in populate_msi_sysfs(). > > CC: Bjorn Helgaas > CC: Neil Horman > CC: Greg Kroah-Hartman > CC: linux-...@vger.kernel.org > CC: linux-kernel@vger.kernel.org > Signed-off-by: Veaceslav Falico > --- > Acked-by: Neil Horman -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 1/3] pci: remove redundant pci_dev_get/put() on kobject (un)register
On Thu, Sep 26, 2013 at 04:11:13PM +0200, Veaceslav Falico wrote: > Currently we're pci_dev_get/put()-ing pci device on every kobject > creation/removal. It's useless per se - the kobject is part of the device > itself, so it should be removed when there are no users of the pdev, and is > not a user of it. > > CC: Bjorn Helgaas > CC: Neil Horman > CC: Greg Kroah-Hartman > CC: linux-...@vger.kernel.org > CC: linux-kernel@vger.kernel.org > Signed-off-by: Veaceslav Falico Acked-by: Neil Horman -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 2/3] msi: always unregister ->msi_kset within free_msi_irqs()
On Thu, Sep 26, 2013 at 04:11:14PM +0200, Veaceslav Falico wrote: > Currently we create and populate ->msi_kset while allocating irqs in > populate_msi_sysfs(), however if it fails and/or we want to free the > entries - we don't always remove it, and we might have problems if we've > failed to allocate irqs and try it again. > > To fix that, move the unregister part to free_msi_irqs() and remove already > existing ones. Also, verify if it was actually created - we don't always > call free_msi_irqs() after populate_msi_sysfs(). > > CC: Bjorn Helgaas > CC: Neil Horman > CC: Greg Kroah-Hartman > CC: linux-...@vger.kernel.org > CC: linux-kernel@vger.kernel.org > Signed-off-by: Veaceslav Falico Acked-by: Neil Horman -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] iommu: Remove stack trace from broken irq remapping warning
The warning for the irq remapping broken check in intel_irq_remapping.c is pretty pointless. We need the warning, but we know where its comming from, the stack trace will always be the same, and it needlessly triggers things like Abrt. This changes the warning to just print a text warning about BIOS being broken, without the stack trace, then sets the appropriate taint bit. Since we automatically disable irq remapping, theres no need to contiue making Abrt jump at this problem Signed-off-by: Neil Horman CC: Joerg Roedel CC: Bjorn Helgaas CC: Andy Lutomirski CC: Konrad Rzeszutek Wilk CC: Sebastian Andrzej Siewior --- drivers/iommu/intel_irq_remapping.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c index f71673d..b97d70b 100644 --- a/drivers/iommu/intel_irq_remapping.c +++ b/drivers/iommu/intel_irq_remapping.c @@ -525,12 +525,13 @@ static int __init intel_irq_remapping_supported(void) if (disable_irq_remap) return 0; if (irq_remap_broken) { - WARN_TAINT(1, TAINT_FIRMWARE_WORKAROUND, - "This system BIOS has enabled interrupt remapping\n" - "on a chipset that contains an erratum making that\n" - "feature unstable. To maintain system stability\n" - "interrupt remapping is being disabled. Please\n" - "contact your BIOS vendor for an update\n"); + printk(KERN_WARNING + "This system BIOS has enabled interrupt remapping\n" + "on a chipset that contains an erratum making that\n" + "feature unstable. To maintain system stability\n" + "interrupt remapping is being disabled. Please\n" + "contact your BIOS vendor for an update\n"); + add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK); disable_irq_remap = 1; return 0; } -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] iommu: Remove stack trace from broken irq remapping warning
On Fri, Sep 27, 2013 at 12:24:10PM -0700, Andy Lutomirski wrote: > On Fri, Sep 27, 2013 at 9:53 AM, Neil Horman wrote: > > The warning for the irq remapping broken check in intel_irq_remapping.c is > > pretty pointless. We need the warning, but we know where its comming from, > > the > > stack trace will always be the same, and it needlessly triggers things like > > Abrt. This changes the warning to just print a text warning about BIOS > > being > > broken, without the stack trace, then sets the appropriate taint bit. > > Since we > > automatically disable irq remapping, theres no need to contiue making Abrt > > jump > > at this problem > > > > Signed-off-by: Neil Horman > > CC: Joerg Roedel > > CC: Bjorn Helgaas > > CC: Andy Lutomirski > > CC: Konrad Rzeszutek Wilk > > CC: Sebastian Andrzej Siewior > > --- > > drivers/iommu/intel_irq_remapping.c | 13 +++-- > > 1 file changed, 7 insertions(+), 6 deletions(-) > > > > diff --git a/drivers/iommu/intel_irq_remapping.c > > b/drivers/iommu/intel_irq_remapping.c > > index f71673d..b97d70b 100644 > > --- a/drivers/iommu/intel_irq_remapping.c > > +++ b/drivers/iommu/intel_irq_remapping.c > > @@ -525,12 +525,13 @@ static int __init intel_irq_remapping_supported(void) > > if (disable_irq_remap) > > return 0; > > if (irq_remap_broken) { > > - WARN_TAINT(1, TAINT_FIRMWARE_WORKAROUND, > > - "This system BIOS has enabled interrupt > > remapping\n" > > - "on a chipset that contains an erratum making > > that\n" > > - "feature unstable. To maintain system > > stability\n" > > - "interrupt remapping is being disabled. > > Please\n" > > - "contact your BIOS vendor for an update\n"); > > + printk(KERN_WARNING > > + "This system BIOS has enabled interrupt remapping\n" > > + "on a chipset that contains an erratum making > > that\n" > > + "feature unstable. To maintain system stability\n" > > + "interrupt remapping is being disabled. Please\n" > > + "contact your BIOS vendor for an update\n"); > > + add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK); > > Is the taint bit actually useful? It seems like functionality will be > missing if this workaround happens, but everything should be stable. > I think its useful yes. The system will be stable, an in fact should run exactly as it did before, but since the errata indicates this should be fixed in BIOS, its a reminder to the admin that you should investigate an update, or take action in manually disabling iommu on the command line Its also in keeping with the way this was structured previously. Neil > --Andy > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 12/12] sctp: Remove extern from function prototypes
On Mon, Sep 23, 2013 at 11:37:59AM -0700, Joe Perches wrote: > There are a mix of function prototypes with and without extern > in the kernel sources. Standardize on not using extern for > function prototypes. > > Function prototypes don't need to be written with extern. > extern is assumed by the compiler. Its use is as unnecessary as > using auto to declare automatic/local variables in a block. > > Signed-off-by: Joe Perches > --- > include/net/sctp/sctp.h | 13 ++--- > 1 file changed, 6 insertions(+), 7 deletions(-) > > diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h > index 3794c5a..c5fe806 100644 > --- a/include/net/sctp/sctp.h > +++ b/include/net/sctp/sctp.h > @@ -90,12 +90,11 @@ > /* > * sctp/protocol.c > */ > -extern int sctp_copy_local_addr_list(struct net *, struct sctp_bind_addr *, > - sctp_scope_t, gfp_t gfp, > - int flags); > -extern struct sctp_pf *sctp_get_pf_specific(sa_family_t family); > -extern int sctp_register_pf(struct sctp_pf *, sa_family_t); > -extern void sctp_addr_wq_mgmt(struct net *, struct sctp_sockaddr_entry *, > int); > +int sctp_copy_local_addr_list(struct net *, struct sctp_bind_addr *, > + sctp_scope_t, gfp_t gfp, int flags); > +struct sctp_pf *sctp_get_pf_specific(sa_family_t family); > +int sctp_register_pf(struct sctp_pf *, sa_family_t); > +void sctp_addr_wq_mgmt(struct net *, struct sctp_sockaddr_entry *, int); > > /* > * sctp/socket.c > @@ -110,7 +109,7 @@ void sctp_sock_rfree(struct sk_buff *skb); > void sctp_copy_sock(struct sock *newsk, struct sock *sk, > struct sctp_association *asoc); > extern struct percpu_counter sctp_sockets_allocated; > -extern int sctp_asconf_mgmt(struct sctp_sock *, struct sctp_sockaddr_entry > *); > +int sctp_asconf_mgmt(struct sctp_sock *, struct sctp_sockaddr_entry *); > > /* > * sctp/primitive.c > -- > 1.8.1.2.459.gbcd45b4.dirty > > Acked-by: Neil Horman -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] msi: add forgotten pci_dev_put(pdev) to populate_msi_sysfs()
On Wed, Sep 25, 2013 at 03:08:05PM -0600, Bjorn Helgaas wrote: > [+cc Neil (he added this code in da8d1c8ba4), Greg] > > On Mon, Sep 16, 2013 at 7:47 PM, Veaceslav Falico wrote: > > Before trying to kobject_init_and_add(), we add a reference to pdev via > > pci_dev_get(pdev). However, if it fails to init and/or add the kobject, we > > don't return it back - even on out_unroll. > > > > Fix this by adding pci_dev_put(pdev) before going to unrolling section. > > > > CC: Bjorn Helgaas > > CC: linux-...@vger.kernel.org > > CC: linux-kernel@vger.kernel.org > > Signed-off-by: Veaceslav Falico > > --- > > drivers/pci/msi.c | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c > > index d5f90d6..14bf578 100644 > > --- a/drivers/pci/msi.c > > +++ b/drivers/pci/msi.c > > @@ -534,8 +534,10 @@ static int populate_msi_sysfs(struct pci_dev *pdev) > > pci_dev_get(pdev); > > ret = kobject_init_and_add(kobj, &msi_irq_ktype, NULL, > > "%u", entry->irq); > > - if (ret) > > + if (ret) { > > + pci_dev_put(pdev); > > goto out_unroll; > > + } > > > > count++; > > } > > I don't understand why this code does the pci_dev_get() in the first > place. The pdev->msi_list of msi_desc structs is private to the > pci_dev, and even without bumping the refcount, there should be no way > for the pci_dev to be freed before the msi_desc. > Its been a few years now, but IIRC I did the pci_dev_get/put here to ensure that people didn't try to remove the device prior to freeing all their interrupts (i.e I didn't want a broken driver to go through its remove routine without freeing all its irqs). That might have been the wrong thing to do, but thats what bubbles to the front of my head when looking at this. > I also don't understand this nearby code (the same pattern appears in > free_msi_irqs()): > > out_unroll: > list_for_each_entry(entry, &pdev->msi_list, list) { > if (!count) > break; > kobject_del(&entry->kobj); > kobject_put(&entry->kobj); > count--; > } > > Why do we call kobject_del() here? The kobject_put() will call > kobject_del() anyway, so it looks redundant. > Documentation/kobject.txt says kobject_del() must be called explicitly > to break a circular reference, but I don't think we have that here. > I think thats exactly why I did it, because of the documentation. I agree however, it does look redundant. Harmless, but redundant. > Also, I think it is incorrect that free_msi_irqs() does this: > > if (entry->kobj.parent) { > kobject_del(&entry->kobj); > kobject_put(&entry->kobj); > } > > list_del(&entry->list); > kfree(entry); > > I think the "kfree(entry)" should be in msi_kobj_release() instead. > As far as this change goes, I think it looks ok Acked-by: Neil Horman > Bjorn > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] net, cgroup: Fix boot failure due to iteration of uninitialized list
On Mon, Sep 10, 2012 at 02:59:18PM +0530, Srivatsa S. Bhat wrote: > On 07/23/2012 05:10 PM, Neil Horman wrote: > > On Mon, Jul 23, 2012 at 09:15:05AM +0800, Gao feng wrote: > >> 于 2012年07月20日 00:27, Srivatsa S. Bhat 写道: > >>> After commit ef209f15 (net: cgroup: fix access the unallocated memory in > >>> netprio cgroup), boot fails with the following NULL pointer dereference: > >>> > [...] > >>> Call Trace: > >>> [] cgroup_init_subsys+0x83/0x169 > >>> [] cgroup_init+0x36/0x119 > >>> [] start_kernel+0x3ba/0x3ef > >>> [] ? kernel_init+0x27b/0x27b > >>> [] x86_64_start_reservations+0x131/0x136 > >>> [] x86_64_start_kernel+0x103/0x112 > >>> RIP [] cgrp_create+0xf6/0x190 > >>> RSP > >>> CR2: 0698 > >>> ---[ end trace a7919e7f17c0a725 ]--- > >>> Kernel panic - not syncing: Attempted to kill the idle task! > >>> > >>> The code corresponds to: > >>> > >>> update_netdev_tables(): > >>> for_each_netdev(&init_net, dev) { > >>> map = rtnl_dereference(dev->priomap); < HERE > >>> > >>> > >>> The list head is initialized in netdev_init(), which is called much > >>> later than cgrp_create(). So the problem is that we are calling > >>> update_netdev_tables() way too early (in cgrp_create()), which will > >>> end up traversing the not-yet-circular linked list. So at some point, > >>> the dev pointer will become NULL and hence dev->priomap becomes an > >>> invalid access. > >>> > >>> To fix this, just remove the update_netdev_tables() function entirely, > >>> since it appears that write_update_netdev_table() will handle things > >>> just fine. > >> > >> The reason I add update_netdev_tables in cgrp_create is to avoid additional > >> bound checkings when we accessing the dev->priomap.priomap. > >> > >> Eric,can we revert this commit 91c68ce2b26319248a32d7baa1226f819d283758 > >> now? > >> I think it's safe enough to access priomap without bound check. > >> > > > > I think its probably safe, yes, but lets leave it there for just a bit. > > Its not > > hurting anything, and I'd like to look into getting Srivatsa' patch in > > first. > > Hi Neil, > > Did you get around to look into this again? > I haven't looked at it specifically no, I apologize. That said I think the other changes that went in back in that time frame have had time to soak, and looking at the way we current update the priomap table, I think its safe for us to remove the update_netdev_table call and definition. If you repost your patch, I'll ack it. Thanks! Neil > Regards, > Srivatsa S. Bhat > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC PATCH] core_pattern: set core helpers root and namespace to crashing process
As its currently implemented, redirection of core dumps to a pipe reader should be executed such that the reader runs in the namespace of the crashing process, and it currently does not. This is the only sane way to deal with namespaces properly it seems to me, and this patch implements that functionality. Theres one problem I currently see with it, and that is that I'm not sure we can change the current behavior of how the root fs is set for the pipe reader, lest we break some user space expectations. As such, I've added a sysctl in this patch to allow administrators to globally select if a core reader specified via /proc/sys/kernel/core_pattern should use the global rootfs, or the (possibly) chrooted fs of the crashing process. I've tested this using both settings for the new sysctl, and it works well. For the sake of history, this was discussed before: http://www.spinics.net/lists/linux-containers/msg21531.html It seems there was some modicum of consensus as to how this should work, but there was no action taken on it. Signed-off-by: Neil Horman Reported-by: Daniel Berrange CC: Alexander Viro CC: Andrew Morton CC: Serge Hallyn CC: contain...@lists.linux-foundation.org --- fs/coredump.c | 11 +++ include/linux/binfmts.h | 1 + kernel/sysctl.c | 8 3 files changed, 20 insertions(+) diff --git a/fs/coredump.c b/fs/coredump.c index ce47379..e7d8ca2 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -47,6 +47,7 @@ int core_uses_pid; char core_pattern[CORENAME_MAX_SIZE] = "core"; unsigned int core_pipe_limit; +unsigned int core_pipe_global_root; struct core_name { char *corename; @@ -443,6 +444,7 @@ static void wait_for_dump_helpers(struct file *file) static int umh_pipe_setup(struct subprocess_info *info, struct cred *new) { struct file *files[2]; + struct path root; struct coredump_params *cp = (struct coredump_params *)info->data; int err = create_pipe_files(files, 0); if (err) @@ -455,6 +457,14 @@ static int umh_pipe_setup(struct subprocess_info *info, struct cred *new) /* and disallow core files too */ current->signal->rlim[RLIMIT_CORE] = (struct rlimit){1, 1}; + + if (!core_pipe_global_root) { + get_fs_root(cp->cprocess->fs, &root); + set_fs_root(current->fs, &root); + } + + switch_task_namespaces(current, cp->cprocess->nsproxy); + return err; } @@ -476,6 +486,7 @@ void do_coredump(siginfo_t *siginfo, struct pt_regs *regs) .siginfo = siginfo, .regs = regs, .limit = rlimit(RLIMIT_CORE), + .cprocess = current, /* * We must use the same mm->flags while dumping core to avoid * inconsistency of bit flags, since this flag is not protected diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h index cfcc6bf..08e7bcb 100644 --- a/include/linux/binfmts.h +++ b/include/linux/binfmts.h @@ -61,6 +61,7 @@ struct coredump_params { siginfo_t *siginfo; struct pt_regs *regs; struct file *file; + struct task_struct *cprocess; unsigned long limit; unsigned long mm_flags; }; diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 26f65ea..be7b69d 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -102,6 +102,7 @@ extern int suid_dumpable; extern int core_uses_pid; extern char core_pattern[]; extern unsigned int core_pipe_limit; +extern unsigned int core_pipe_global_root; #endif extern int pid_max; extern int min_free_kbytes; @@ -430,6 +431,13 @@ static struct ctl_table kern_table[] = { .mode = 0644, .proc_handler = proc_dointvec, }, + { + .procname = "core_pipe_global_root", + .data = &core_pipe_global_root, + .maxlen = sizeof(unsigned int), + .mode = 0644, + .proc_handler = proc_dointvec, + }, #endif #ifdef CONFIG_PROC_SYSCTL { -- 1.7.11.7 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: net,sctp: oops in sctp_do_sm
On Thu, Oct 18, 2012 at 10:33:29PM -0400, Sasha Levin wrote: > Hi all, > > While fuzzing with trinity inside a KVM tools (lkvm) guest running today's > linux-next, I've > stumbled on the following: > > [ 439.574039] BUG: unable to handle kernel paging request at 88001b9f40c8 > [ 439.576486] IP: [] sctp_do_sm+0x293/0x310 > [ 439.578128] PGD 4e27063 PUD 4e2b063 PMD 1fa57067 PTE 1b9f4160 > [ 439.580796] Oops: [#1] PREEMPT SMP DEBUG_PAGEALLOC > [ 439.581635] Dumping ftrace buffer: > [ 439.582171](ftrace buffer empty) > [ 439.582673] CPU 3 > [ 439.582957] Pid: 7101, comm: trinity-child16 Tainted: GW > 3.7.0-rc1-next-20121018-sasha-2-g60a870d-dirty #62 > [ 439.582986] RIP: 0010:[] [] > sctp_do_sm+0x293/0x310 > [ 439.582986] RSP: 0018:880010c57988 EFLAGS: 00010286 > [ 439.582986] RAX: 0003 RBX: 0001 RCX: > 0006 > [ 439.582986] RDX: 0003 RSI: 0001 RDI: > 880010c579d0 > [ 439.582986] RBP: 880010c57ae8 R08: R09: > > [ 439.582986] R10: 0001 R11: R12: > 0004 > [ 439.582986] R13: 88001b9f4000 R14: 880065d22600 R15: > 0003 > [ 439.582986] FS: 7f9a949c3700() GS:88006760() > knlGS: > [ 439.582986] CS: 0010 DS: ES: CR0: 80050033 > [ 439.582986] CR2: 88001b9f40c8 CR3: 1585 CR4: > 000406e0 > [ 439.582986] DR0: DR1: DR2: > > [ 439.582986] DR3: DR6: 0ff0 DR7: > 0400 > [ 439.582986] Process trinity-child16 (pid: 7101, threadinfo > 880010c56000, task 880010a98000) > [ 439.582986] Stack: > [ 439.582986] 00d0 84c92d36 > 84cc4b50 > [ 439.582986] 83763b30 0004 842c0370 > 000181152f15 > [ 439.582986] 880010c579f8 0002 0015 > > [ 439.582986] Call Trace: > [ 439.582986] [] ? sctp_cname+0x70/0x70 > [ 439.582986] [] sctp_primitive_SHUTDOWN+0x43/0x50 > [ 439.582986] [] sctp_close+0x150/0x310 > [ 439.606533] [] inet_release+0x1b2/0x1c0 > [ 439.606533] [] ? inet_release+0x1d/0x1c0 > [ 439.606533] [] inet6_release+0x34/0x60 > [ 439.606533] [] sock_release+0x18/0x80 > [ 439.610261] [] sock_close+0x29/0x30 > [ 439.610261] [] __fput+0x122/0x2d0 > [ 439.610261] [] fput+0x9/0x10 > [ 439.610261] [] task_work_run+0xbe/0x100 > [ 439.610261] [] do_exit+0x432/0xbd0 > [ 439.610261] [] ? get_signal_to_deliver+0x899/0x910 > [ 439.610261] [] ? get_lock_stats+0x22/0x70 > [ 439.610261] [] ? put_lock_stats.isra.16+0xe/0x40 > [ 439.610261] [] ? _raw_spin_unlock_irq+0x2b/0x80 > [ 439.610261] [] do_group_exit+0x84/0xd0 > [ 439.610261] [] get_signal_to_deliver+0x7fd/0x910 > [ 439.610261] [] ? trace_hardirqs_off+0xd/0x10 > [ 439.620391] [] ? debug_object_assert_init+0xbb/0x110 > [ 439.620391] [] do_signal+0x3a/0x950 > [ 439.620391] [] ? rcu_cleanup_after_idle+0x23/0x170 > [ 439.620391] [] ? rcu_eqs_exit_common+0x64/0x270 > [ 439.620391] [] ? rcu_user_enter+0x10d/0x140 > [ 439.620391] [] ? rcu_user_exit+0xc5/0xf0 > [ 439.620391] [] do_notify_resume+0x4f/0xa0 > [ 439.620391] [] int_signal+0x12/0x17 > [ 439.620391] Code: e8 eb 48 2c 00 0f 0b 90 41 b8 f4 ff ff ff 66 2e 0f 1f 84 > 00 00 00 00 00 8b 35 5a 0a 06 02 85 f6 74 66 4d 85 > ed 75 04 31 c0 eb 2a <41> 8b b5 c8 00 00 00 44 89 85 b8 fe ff ff 49 8b 7e 20 > e8 f6 51 > [ 439.630251] RIP [] sctp_do_sm+0x293/0x310 > [ 439.630251] RSP > [ 439.630251] CR2: 88001b9f40c8 > [ 439.630251] ---[ end trace aa5ad9f036ee09dd ]--- > > This points to the DEBUG_POST_SFX macro in sctp_do_sm(). > > > Thanks, > Sasha You don't have any of the logs right before this oops available do you? It might be helpful in determining what went wrong here Thanks Neil > -- > To unsubscribe from this list: send the line "unsubscribe linux-sctp" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/4] net: remove obsolete simple_strto
On Fri, Dec 07, 2012 at 05:19:58PM +0530, Abhijit Pawar wrote: > This patch replace the obsolete simple_strto with kstrto > > Signed-off-by: Abhijit Pawar > --- > net/core/netpoll.c |9 +++-- > net/ipv4/netfilter/ipt_CLUSTERIP.c |9 +++-- > net/mac80211/debugfs_sta.c |4 +++- > net/netfilter/nf_conntrack_core.c |6 -- > 4 files changed, 21 insertions(+), 7 deletions(-) > > diff --git a/net/core/netpoll.c b/net/core/netpoll.c > index 77a0388..596b127 100644 > --- a/net/core/netpoll.c > +++ b/net/core/netpoll.c > @@ -668,13 +668,16 @@ EXPORT_SYMBOL(netpoll_print_options); > > int netpoll_parse_options(struct netpoll *np, char *opt) > { > + int rc; > char *cur=opt, *delim; > > if (*cur != '@') { > if ((delim = strchr(cur, '@')) == NULL) > goto parse_failed; > *delim = 0; > - np->local_port = simple_strtol(cur, NULL, 10); > + rc = kstrtol(cur, 10, &np->local_port); > + if (rc) > + goto parse_failed; Perhaps consolidate this to: if (kstrtol(cur, 10, &np->local_port) goto parse_failed Then you don't have to declare the new stack variable > cur = delim; > } > cur++; > @@ -705,7 +708,9 @@ int netpoll_parse_options(struct netpoll *np, char *opt) > *delim = 0; > if (*cur == ' ' || *cur == '\t') > np_info(np, "warning: whitespace is not allowed\n"); > - np->remote_port = simple_strtol(cur, NULL, 10); > + rc = kstrtol(cur, 10, &np->remote_port); > + if (rc) > + goto parse_failed; > cur = delim; Ditto > } > cur++; > diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c > b/net/ipv4/netfilter/ipt_CLUSTERIP.c > index fe5daea..55e7b73 100644 > --- a/net/ipv4/netfilter/ipt_CLUSTERIP.c > +++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c > @@ -661,6 +661,7 @@ static ssize_t clusterip_proc_write(struct file *file, > const char __user *input, > #define PROC_WRITELEN10 > char buffer[PROC_WRITELEN+1]; > unsigned long nodenum; > + int rc; > > if (size > PROC_WRITELEN) > return -EIO; > @@ -669,11 +670,15 @@ static ssize_t clusterip_proc_write(struct file *file, > const char __user *input, > buffer[size] = 0; > > if (*buffer == '+') { > - nodenum = simple_strtoul(buffer+1, NULL, 10); > + rc = kstrtoul(buffer+1, 10, &nodenum); > + if (rc) > + return -EINVAL; > if (clusterip_add_node(c, nodenum)) > return -ENOMEM; > } else if (*buffer == '-') { > - nodenum = simple_strtoul(buffer+1, NULL,10); > + rc = kstrtoul(buffer+1, 10, &nodenum); > + if (rc) > + return -EINVAL; > if (clusterip_del_node(c, nodenum)) > return -ENOENT; Same deal with the rc variable, although in this case it might make sense to return rc if kstrtoul fails, instead of just filtering it all down to -EINVAL. > } else > diff --git a/net/mac80211/debugfs_sta.c b/net/mac80211/debugfs_sta.c > index 89281d2..18754fd 100644 > --- a/net/mac80211/debugfs_sta.c > +++ b/net/mac80211/debugfs_sta.c > @@ -219,7 +219,9 @@ static ssize_t sta_agg_status_write(struct file *file, > const char __user *userbu > } else > return -EINVAL; > > - tid = simple_strtoul(buf, NULL, 0); > + ret = kstrtoul(buf, 0, &tid); > + if (ret) > + return -EINVAL; > > if (tid >= IEEE80211_NUM_TIDS) > return -EINVAL; > diff --git a/net/netfilter/nf_conntrack_core.c > b/net/netfilter/nf_conntrack_core.c > index af17516..18ce24b 100644 > --- a/net/netfilter/nf_conntrack_core.c > +++ b/net/netfilter/nf_conntrack_core.c > @@ -1409,7 +1409,7 @@ EXPORT_SYMBOL_GPL(nf_ct_alloc_hashtable); > > int nf_conntrack_set_hashsize(const char *val, struct kernel_param *kp) > { > - int i, bucket; > + int i, bucket, rc; > unsigned int hashsize, old_size; > struct hlist_nulls_head *hash, *old_hash; > struct nf_conntrack_tuple_hash *h; > @@ -1422,7 +1422,9 @@ int nf_conntrack_set_hashsize(const char *val, struct > kernel_param *kp) > if (!nf_conntrack_htable_size) > return param_set_uint(val, kp); > > - hashsize = simple_strtoul(val, NULL, 0); > + rc = kstrtouint(val, 0, &hashsize); > + if (rc) > + return -EINVAL; > if (!hashsize) > return -EINVAL; > As above, these call points might benefit from returning rc rather than just EINVAL. Neil > -- > 1.7.7.6 > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at
Re: [PATCH RESEND] net: remove obsolete simple_strto
On Mon, Dec 10, 2012 at 02:42:28PM +0530, Abhijit Pawar wrote: > This patch replace the obsolete simple_strto with kstrto > > Signed-off-by: Abhijit Pawar Acked-by: Neil Horman -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] core_pattern: set core helpers root and namespace to crashing process
As its currently implemented, redirection of core dumps to a pipe reader should be executed such that the reader runs in the namespace of the crashing process, and it currently does not. This is the only sane way to deal with namespaces properly it seems to me, and this patch implements that functionality. Theres one problem I currently see with it, and that is that I'm not sure we can change the current behavior of how the root fs is set for the pipe reader, lest we break some user space expectations. As such, I've added a sysctl in this patch to allow administrators to globally select if a core reader specified via /proc/sys/kernel/core_pattern should use the global rootfs, or the (possibly) chrooted fs of the crashing process. I've tested this using both settings for the new sysctl, and it works well. For the sake of history, this was discussed before: http://www.spinics.net/lists/linux-containers/msg21531.html It seems there was some modicum of consensus as to how this should work, but there was no action taken on it. Signed-off-by: Neil Horman Reported-by: Daniel Berrange CC: Daniel Berrange CC: Alexander Viro CC: Andrew Morton CC: Serge Hallyn CC: contain...@lists.linux-foundation.org --- fs/coredump.c | 11 +++ include/linux/binfmts.h | 1 + kernel/sysctl.c | 8 3 files changed, 20 insertions(+) diff --git a/fs/coredump.c b/fs/coredump.c index ce47379..e7d8ca2 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -47,6 +47,7 @@ int core_uses_pid; char core_pattern[CORENAME_MAX_SIZE] = "core"; unsigned int core_pipe_limit; +unsigned int core_pipe_global_root; struct core_name { char *corename; @@ -443,6 +444,7 @@ static void wait_for_dump_helpers(struct file *file) static int umh_pipe_setup(struct subprocess_info *info, struct cred *new) { struct file *files[2]; + struct path root; struct coredump_params *cp = (struct coredump_params *)info->data; int err = create_pipe_files(files, 0); if (err) @@ -455,6 +457,14 @@ static int umh_pipe_setup(struct subprocess_info *info, struct cred *new) /* and disallow core files too */ current->signal->rlim[RLIMIT_CORE] = (struct rlimit){1, 1}; + + if (!core_pipe_global_root) { + get_fs_root(cp->cprocess->fs, &root); + set_fs_root(current->fs, &root); + } + + switch_task_namespaces(current, cp->cprocess->nsproxy); + return err; } @@ -476,6 +486,7 @@ void do_coredump(siginfo_t *siginfo, struct pt_regs *regs) .siginfo = siginfo, .regs = regs, .limit = rlimit(RLIMIT_CORE), + .cprocess = current, /* * We must use the same mm->flags while dumping core to avoid * inconsistency of bit flags, since this flag is not protected diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h index cfcc6bf..08e7bcb 100644 --- a/include/linux/binfmts.h +++ b/include/linux/binfmts.h @@ -61,6 +61,7 @@ struct coredump_params { siginfo_t *siginfo; struct pt_regs *regs; struct file *file; + struct task_struct *cprocess; unsigned long limit; unsigned long mm_flags; }; diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 26f65ea..be7b69d 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -102,6 +102,7 @@ extern int suid_dumpable; extern int core_uses_pid; extern char core_pattern[]; extern unsigned int core_pipe_limit; +extern unsigned int core_pipe_global_root; #endif extern int pid_max; extern int min_free_kbytes; @@ -430,6 +431,13 @@ static struct ctl_table kern_table[] = { .mode = 0644, .proc_handler = proc_dointvec, }, + { + .procname = "core_pipe_global_root", + .data = &core_pipe_global_root, + .maxlen = sizeof(unsigned int), + .mode = 0644, + .proc_handler = proc_dointvec, + }, #endif #ifdef CONFIG_PROC_SYSCTL { -- 1.7.11.7 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] random: prime last_data value per fips requirements
On Mon, Nov 05, 2012 at 04:00:10PM -0500, Jarod Wilson wrote: > The value stored in last_data must be primed for FIPS 140-2 purposes. Upon > first use, either on system startup or after an RNDCLEARPOOL ioctl, we > need to take an initial random sample, store it internally in last_data, > then pass along the value after that to the requester, so that consistency > checks aren't being run against stale and possibly known data. > > CC: Herbert Xu > CC: "David S. Miller" > CC: Neil Horman > CC: Matt Mackall > CC: linux-cry...@vger.kernel.org > Signed-off-by: Jarod Wilson > --- > drivers/char/random.c | 11 +++ > 1 files changed, 11 insertions(+), 0 deletions(-) > > diff --git a/drivers/char/random.c b/drivers/char/random.c > index b86eae9..24d17b8 100644 > --- a/drivers/char/random.c > +++ b/drivers/char/random.c > @@ -437,6 +437,7 @@ struct entropy_store { > int entropy_count; > int entropy_total; > unsigned int initialized:1; > + bool last_data_init; > __u8 last_data[EXTRACT_SIZE]; > }; > > @@ -967,6 +968,15 @@ static ssize_t extract_entropy(struct entropy_store *r, > void *buf, > if (fips_enabled) { > unsigned long flags; > > + /* prime last_data value if need be, per fips 140-2 */ > + if (!r->last_data_init) { > + spin_lock_irqsave(&r->lock, flags); > + memcpy(r->last_data, tmp, EXTRACT_SIZE); > + r->last_data_init = true; > + spin_unlock_irqrestore(&r->lock, flags); > + continue; Continue? Is that left over from earlier work? Or did you have some other purpose in mind for it? Also, not that its in a hot path or anything, but it might be nice to consolodate this code such that you only lock and unlock r->flags once instead of twice here. Best Neil > + } > + > spin_lock_irqsave(&r->lock, flags); > if (!memcmp(tmp, r->last_data, EXTRACT_SIZE)) > panic("Hardware RNG duplicated output!\n"); > @@ -1086,6 +1096,7 @@ static void init_std_data(struct entropy_store *r) > > r->entropy_count = 0; > r->entropy_total = 0; > + r->last_data_init = false; > mix_pool_bytes(r, &now, sizeof(now), NULL); > for (i = r->poolinfo->POOLBYTES; i > 0; i -= sizeof(rv)) { > if (!arch_get_random_long(&rv)) > -- > 1.7.1 > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3] random: prime last_data value per fips requirements
On Tue, Nov 06, 2012 at 10:42:42AM -0500, Jarod Wilson wrote: > The value stored in last_data must be primed for FIPS 140-2 purposes. Upon > first use, either on system startup or after an RNDCLEARPOOL ioctl, we > need to take an initial random sample, store it internally in last_data, > then pass along the value after that to the requester, so that consistency > checks aren't being run against stale and possibly known data. > > v2: streamline code flow a bit, eliminating extra loop and spinlock in the > case where we need to prime, and account for the extra primer bits. > > v3: extract_buf() can't be called with spinlock already held, so bring > back some extra lock/unlock calls. > > CC: Herbert Xu > CC: "David S. Miller" > CC: Neil Horman > CC: Matt Mackall > CC: linux-cry...@vger.kernel.org > Signed-off-by: Jarod Wilson > --- > drivers/char/random.c | 17 + > 1 files changed, 17 insertions(+), 0 deletions(-) > > diff --git a/drivers/char/random.c b/drivers/char/random.c > index b86eae9..d0139df 100644 > --- a/drivers/char/random.c > +++ b/drivers/char/random.c > @@ -437,6 +437,7 @@ struct entropy_store { > int entropy_count; > int entropy_total; > unsigned int initialized:1; > + bool last_data_init; > __u8 last_data[EXTRACT_SIZE]; > }; > > @@ -957,6 +958,10 @@ static ssize_t extract_entropy(struct entropy_store *r, > void *buf, > ssize_t ret = 0, i; > __u8 tmp[EXTRACT_SIZE]; > > + /* if last_data isn't primed, we need EXTRACT_SIZE extra bytes */ > + if (fips_enabled && !r->last_data_init) > + nbytes += EXTRACT_SIZE; > + > trace_extract_entropy(r->name, nbytes, r->entropy_count, _RET_IP_); > xfer_secondary_pool(r, nbytes); > nbytes = account(r, nbytes, min, reserved); > @@ -967,6 +972,17 @@ static ssize_t extract_entropy(struct entropy_store *r, > void *buf, > if (fips_enabled) { > unsigned long flags; > > + > + /* prime last_data value if need be, per fips 140-2 */ > + if (!r->last_data_init) { > + spin_lock_irqsave(&r->lock, flags); > + memcpy(r->last_data, tmp, EXTRACT_SIZE); > + r->last_data_init = true; > + nbytes -= EXTRACT_SIZE; > + spin_unlock_irqrestore(&r->lock, flags); > + extract_buf(r, tmp); > + } > + > spin_lock_irqsave(&r->lock, flags); > if (!memcmp(tmp, r->last_data, EXTRACT_SIZE)) > panic("Hardware RNG duplicated output!\n"); > @@ -1086,6 +1102,7 @@ static void init_std_data(struct entropy_store *r) > > r->entropy_count = 0; > r->entropy_total = 0; > + r->last_data_init = false; > mix_pool_bytes(r, &now, sizeof(now), NULL); > for (i = r->poolinfo->POOLBYTES; i > 0; i -= sizeof(rv)) { > if (!arch_get_random_long(&rv)) > -- > 1.7.1 > > Thanks Jarod. Acked-by: Neil Horman -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 28/29] net/: rename net_random() to prandom_u32()
On Tue, Dec 25, 2012 at 08:47:26PM +0900, Akinobu Mita wrote: > 2012/12/25 Neil Horman : > > On Mon, Dec 24, 2012 at 11:14:15AM +0900, Akinobu Mita wrote: > >> Use more preferable function name which implies using a pseudo-random > >> number generator. > >> > >> Signed-off-by: Akinobu Mita > >> Cc: Jesse Gross > >> Cc: Venkat Venkatsubra > >> Cc: Vlad Yasevich > >> Cc: Sridhar Samudrala > >> Cc: Neil Horman > >> Cc: Steffen Klassert > >> Cc: Herbert Xu > >> Cc: "David S. Miller" > >> Cc: linux-s...@vger.kernel.org > >> Cc: d...@openvswitch.org > >> Cc: net...@vger.kernel.org > >> --- > >> include/net/red.h | 2 +- > >> net/802/garp.c| 2 +- > >> net/openvswitch/actions.c | 2 +- > >> net/rds/bind.c| 2 +- > >> net/sctp/socket.c | 2 +- > >> net/xfrm/xfrm_state.c | 2 +- > >> 6 files changed, 6 insertions(+), 6 deletions(-) > >> > > I'm largely indifferent to this patch, but I kind of feel like its just > > churn. > > Whats the real advantage in making this change? I grant that it clearly > > indicates the type of random number generator we're using at a given call > > site, > > But for those using net_random, you probably don't care too much about > > the source of your random bits. If you did really want true random vs. > > pseudo-random data, you need to explicitly use the right call. You're > > previous > > patch series did good cleanup on differentiating the different random > > calls, but > > this just seems like its removing what is otherwise useful indirection. > > I overlooked the importance of net_random() indirection. > Thanks for the feedback. I'll leave all net_random() callers as-is in > the next version. Well, I guess I should qualify my opinion. I find it useful personally (the generation of nonces in many cases can be left to most any pseudo random generator that the system deems is a 'good enough' balance between a fast generator that doesn't block on low entropy and a reasonably secure one that doesn't allow for easy prediction. As those needs and factors change, its nice to have a set point to change them at. If you (or anyone else has a differing opinion, I'm happy to listen to it. Regards Neil > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] netprio_cgroup: define sk_cgrp_prioidx only if NETPRIO_CGROUP is enabled
On Wed, Dec 26, 2012 at 02:48:24PM +0800, Li Zefan wrote: > sock->sk_cgrp_prioidx won't be used at all if CONFIG_NETPRIO_CGROUP=n. > > Signed-off-by: Li Zefan Acked-by: Neil Horman > --- > include/net/sock.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/include/net/sock.h b/include/net/sock.h > index 93a6745..182ca99 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -367,7 +367,7 @@ struct sock { > unsigned short sk_ack_backlog; > unsigned short sk_max_ack_backlog; > __u32 sk_priority; > -#ifdef CONFIG_CGROUPS > +#if IS_ENABLED(CONFIG_NETPRIO_CGROUP) > __u32 sk_cgrp_prioidx; > #endif > struct pid *sk_peer_pid; > -- > 1.8.0.2 > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: + core_pattern-set-core-helpers-root-and-namespace-to-crashing-process .patch added to -mm tree
On Tue, Dec 18, 2012 at 09:06:04PM +0100, Oleg Nesterov wrote: > On 12/17, Neil Horman wrote: > > > > On Mon, Dec 17, 2012 at 05:04:08PM +0100, Oleg Nesterov wrote: > > > > > > > Is there a way to switch all namespaces, except for the pid > > > > namespace? > > > > > > Which exactly namespaces you want to change? > > > > > Ideally, I want the pipe reader process to execute in the same namespaces > > that > > the crashing process executed in (i.e. the pipe reader should execute as > > though > > the crashing process forked it). > > Yes, and we probably want to change pid_ns as well. But afaics currently > this is not possible, even setns can't do this. > > I am starting to think that in this case, perhaps, do_coredump() should > not use call_usermode_helper() at all. Perhaps we can do clone(CLONE_VM) + > commit_creds/restore_root/etc + kernel_execve. > Yeah, I was comming to this same conclusion last night as well. I'd rather keep using the call_usermode_helper solution if at all possible, but perhaps we can integrate a clone path into it. > > > To be honest, I do not understand this patch at all. It seems that > > > you need to do something like sys_setns(). But if we do this, then > > > why we can't make core_pattern per-namespace? > > > > > That actually would make sense, although we can't really use setns > > directly, as > > I don't think we want to open file descriptors to do this manipulation in > > the > > kernel. > > Yes, yes, sure. But this is solveable. We do not really need to open > the files in /proc, we could use proc_ns_operations->install() directly. > Although this is not pretty. > Its not pretty, no. If we use the above clone solution however, we can avoid this entirely. > > Perhaps its best just to restrict this patch to adjusting the root fs > > location > > for the chroot case. > > Probably... at least for the start. > > BTW. Of course this is subjective, but personally I think that "||" > looks strange. Perhaps it would be better to add something like > --croot argument? > The || is ambiguous with its simmilarity to a shell 'or' command, but I don't think the --croot argument is much better on that front, as that then becomes ambiguous with arguments supplied to the pipe reader directly. The token should be leading the pipe_reader string in core_pattern to indicate a change in environment independent of the executable path. Perhaps |^ or something simmilar? Either way, Andrew, could you please drop this patch? Olegs comments I think make it pretty clear I've got some more work to do on this. Thanks! Neil -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: + core_pattern-set-core-helpers-root-and-namespace-to-crashing-process .patch added to -mm tree
On Tue, Dec 18, 2012 at 12:45:18PM -0800, Eric W. Biederman wrote: > Neil Horman writes: > > > On Tue, Dec 18, 2012 at 09:06:04PM +0100, Oleg Nesterov wrote: > >> On 12/17, Neil Horman wrote: > >> > > >> > On Mon, Dec 17, 2012 at 05:04:08PM +0100, Oleg Nesterov wrote: > >> > > > >> > > > Is there a way to switch all namespaces, except for the pid > >> > > > namespace? > >> > > > >> > > Which exactly namespaces you want to change? > >> > > > >> > Ideally, I want the pipe reader process to execute in the same > >> > namespaces that > >> > the crashing process executed in (i.e. the pipe reader should execute as > >> > though > >> > the crashing process forked it). > >> > >> Yes, and we probably want to change pid_ns as well. But afaics currently > >> this is not possible, even setns can't do this. > > The code for setns to change the pid namespace just merged. > Can you post a link to the merge commit for reference so I can take a look at it? > Oleg I copied you on that code when I put it up for review. Did I use > the wrong email address? > > >> BTW. Of course this is subjective, but personally I think that "||" > >> looks strange. Perhaps it would be better to add something like > >> --croot argument? > >> > > The || is ambiguous with its simmilarity to a shell 'or' command, but I > > don't > > think the --croot argument is much better on that front, as that then > > becomes > > ambiguous with arguments supplied to the pipe reader directly. The token > > should > > be leading the pipe_reader string in core_pattern to indicate a change in > > environment independent of the executable path. Perhaps |^ or something > > simmilar? > > I failed to send my earlier reply but there is another problem with the > approach of only having one global core dump pattern. You can't set it > per container. Which means a special character to switch namespeces > while a reasonable solution (and arguably unnecessary solution) is not a > complete solution. > > > Either way, Andrew, could you please drop this patch? Olegs comments I > > think > > make it pretty clear I've got some more work to do on this. > > If we just want one pattern we should be able to to robustly implement > this in userspace with the existing functionality. With the caveat that > we need to get some pid namespace and user namespace bugs in the core > pattern generation fixed. But we need to fix those bugs anyway. > Then perhaps the right thing to do here is in fact just make core_pattern a per-namespace sysctl. I only took a brief look, but I was unable to find an example of such a per-namespace systctl. Do we already have the infrastructure to do such a thing? I didn't think we did. Thanks! Neil > Eric > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: + core_pattern-set-core-helpers-root-and-namespace-to-crashing-process .patch added to -mm tree
On Wed, Dec 19, 2012 at 05:22:30PM +0100, Oleg Nesterov wrote: > On 12/18, Neil Horman wrote: > > > > On Tue, Dec 18, 2012 at 09:06:04PM +0100, Oleg Nesterov wrote: > > > > Perhaps its best just to restrict this patch to adjusting the root fs > > > > location > > > > for the chroot case. > > > > > > Probably... at least for the start. > > > > > > BTW. Of course this is subjective, but personally I think that "||" > > > looks strange. Perhaps it would be better to add something like > > > --croot argument? > > > > > The || is ambiguous with its simmilarity to a shell 'or' command, > > Ah, I didn't mean this. > > I meant, this is not flexible. We can implement --croot, then we can > (may be) add --switch_ns. If you use "||" now for chroot, what will you > do for switch_ns? > > > but I don't > > think the --croot argument is much better on that front, as that then > > becomes > > ambiguous with arguments supplied to the pipe reader directly. > > Not sure I understand... why? > > > The token should > > be leading the pipe_reader string in core_pattern to indicate a change in > > environment independent of the executable path. > > Do you mean that || at the front is more "visible" ? True, but I am > not sure this is that important. > All I really meant was that placing the token for croot at the front of the pattern would be more readable as it disambiguates its meaning from the rest of the command. > But I won't insist. > I wouldn't worry about it. After looking over the changes Eric just had merged for 3.8 I'm becomming more convinced that this isn't really needed anymore. With his changes, we can migrate a process between all available namespaces dynamically in user space. With that functionality we can just write a setns administrative utility to make this all work. I've started that work here: http://marc.info/?l=util-linux-ng&m=135594402801895&w=2 all thats arguably left after that is to make core_pattern a per-container (possibly per mount namespace?). My patch doesn't do that. But its something I can look into. Neil > Oleg. > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 28/29] net/: rename net_random() to prandom_u32()
On Mon, Dec 24, 2012 at 11:14:15AM +0900, Akinobu Mita wrote: > Use more preferable function name which implies using a pseudo-random > number generator. > > Signed-off-by: Akinobu Mita > Cc: Jesse Gross > Cc: Venkat Venkatsubra > Cc: Vlad Yasevich > Cc: Sridhar Samudrala > Cc: Neil Horman > Cc: Steffen Klassert > Cc: Herbert Xu > Cc: "David S. Miller" > Cc: linux-s...@vger.kernel.org > Cc: d...@openvswitch.org > Cc: net...@vger.kernel.org > --- > include/net/red.h | 2 +- > net/802/garp.c| 2 +- > net/openvswitch/actions.c | 2 +- > net/rds/bind.c| 2 +- > net/sctp/socket.c | 2 +- > net/xfrm/xfrm_state.c | 2 +- > 6 files changed, 6 insertions(+), 6 deletions(-) > I'm largely indifferent to this patch, but I kind of feel like its just churn. Whats the real advantage in making this change? I grant that it clearly indicates the type of random number generator we're using at a given call site, But for those using net_random, you probably don't care too much about the source of your random bits. If you did really want true random vs. pseudo-random data, you need to explicitly use the right call. You're previous patch series did good cleanup on differentiating the different random calls, but this just seems like its removing what is otherwise useful indirection. Neil > diff --git a/include/net/red.h b/include/net/red.h > index ef46058..168bb2f 100644 > --- a/include/net/red.h > +++ b/include/net/red.h > @@ -303,7 +303,7 @@ static inline unsigned long red_calc_qavg(const struct > red_parms *p, > > static inline u32 red_random(const struct red_parms *p) > { > - return reciprocal_divide(net_random(), p->max_P_reciprocal); > + return reciprocal_divide(prandom_u32(), p->max_P_reciprocal); > } > > static inline int red_mark_probability(const struct red_parms *p, > diff --git a/net/802/garp.c b/net/802/garp.c > index 8456f5d..cf7410d 100644 > --- a/net/802/garp.c > +++ b/net/802/garp.c > @@ -397,7 +397,7 @@ static void garp_join_timer_arm(struct garp_applicant > *app) > { > unsigned long delay; > > - delay = (u64)msecs_to_jiffies(garp_join_time) * net_random() >> 32; > + delay = (u64)msecs_to_jiffies(garp_join_time) * prandom_u32() >> 32; > mod_timer(&app->join_timer, jiffies + delay); > } > > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c > index ac2defe..257bc36 100644 > --- a/net/openvswitch/actions.c > +++ b/net/openvswitch/actions.c > @@ -404,7 +404,7 @@ static int sample(struct datapath *dp, struct sk_buff > *skb, >a = nla_next(a, &rem)) { > switch (nla_type(a)) { > case OVS_SAMPLE_ATTR_PROBABILITY: > - if (net_random() >= nla_get_u32(a)) > + if (prandom_u32() >= nla_get_u32(a)) > return 0; > break; > > diff --git a/net/rds/bind.c b/net/rds/bind.c > index 637bde5..7f95f4b 100644 > --- a/net/rds/bind.c > +++ b/net/rds/bind.c > @@ -118,7 +118,7 @@ static int rds_add_bound(struct rds_sock *rs, __be32 > addr, __be16 *port) > rover = be16_to_cpu(*port); > last = rover; > } else { > - rover = max_t(u16, net_random(), 2); > + rover = max_t(u16, prandom_u32(), 2); > last = rover - 1; > } > > diff --git a/net/sctp/socket.c b/net/sctp/socket.c > index 9e65758..95860aa 100644 > --- a/net/sctp/socket.c > +++ b/net/sctp/socket.c > @@ -5899,7 +5899,7 @@ static long sctp_get_port_local(struct sock *sk, union > sctp_addr *addr) > > inet_get_local_port_range(&low, &high); > remaining = (high - low) + 1; > - rover = net_random() % remaining + low; > + rover = prandom_u32() % remaining + low; > > do { > rover++; > diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c > index 3459692..35ddaab 100644 > --- a/net/xfrm/xfrm_state.c > +++ b/net/xfrm/xfrm_state.c > @@ -1546,7 +1546,7 @@ int xfrm_alloc_spi(struct xfrm_state *x, u32 low, u32 > high) > } else { > u32 spi = 0; > for (h=0; h - spi = low + net_random()%(high-low+1); > + spi = low + prandom_u32() % (high - low + 1); > x0 = xfrm_state_lookup(net, mark, &x->id.daddr, > htonl(spi), x->id.proto, x->props.family); > if (x0 == NULL) { > x->id.spi = htonl(spi); > -- > 1.7.11.7 > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCHSET cgroup/for-3.8] netprio_cgroup: implement hierarchy support
On Fri, Nov 16, 2012 at 11:20:16AM -0800, Tejun Heo wrote: > Hello, guys. > > This patchset implements hierarchy support for netprio_cgroup. > netprio_cgroup along with netcls_cgroup is a rather weird in that it > really isn't about resource control. It just hitches on cgroup as a > convenient mechanism to do stuff to groups of tasks. The hierarchy > support reflects such nature. There's no limit being imposed from > ancestors. It simply propagates configuration downwards until there's > a node with its own config. IOW, any given cgroup inherits priorities > from its parent for all netdevs which it doesn't have its own config > for. > > As a parent isn't affected by child inheriting its config, the > hierarchy implementation is pretty simple. It's enough to inherit > config from ->css_online() and propagate new config downwards from > write_priomap(). As each node needs to know which config is its local > one and which is inherited, an extra config array is added - > netprio_map->aux[]. It's a separate array to avoid disturbing spatial > locality of ->priomap[]. While it currently contains single one-bit > flag, I still made it a struct so that adding more configuration > (e.g. max_prio) is easy. > > Note that this does change userland-visible behavior. Now, nesting is > allowed and cgroups at the first level inherit priorities from the > root cgroup. I can't think of any better way than just biting the > bullet here. :( > > 0001-cgroup-add-cgroup-id.patch > 0002-netprio-simplify-write_priomap.patch > 0003-netprio_cgroup-shorten-variable-names-in-extend_netd.patch > 0004-netprio_cgroup-reimplement-priomap-expansion.patch > 0005-netprio_cgroup-use-cgroup-id-instead-of-cgroup_netpr.patch > 0006-netprio_cgroup-implement-netprio-_set-_prio-helpers.patch > 0007-netprio_cgroup-keep-track-of-whether-prio-is-set-or-.patch > 0008-netprio_cgroup-implement-hierarchy-support.patch > > 0001 adds cgroup->id. This will eventually replace css_id. > > 0002-0006 are prep patches. > > 0007 implements is_local flag which tracks whether a cgroup has its > own config or should inherit from its parent. > > 0008 implements hierarchy support. > > This patchset is on top of > > cgroup/for-3.8 ef9fe980c6 ("cgroup_freezer: implement proper hierarchy > support") > + [1] "[PATCHSET cgroup/for-3.8] cgroup: allow ->post_create() to fail" > + [2] "[PATCH 1/2] cgroup: s/CGRP_CLONE_CHILDREN/CGRP_CPUSET_CLONE_CHILDREN/" > "[PATCH 2/2] cgroup, cpuset: remove cgroup_subsys->post_clone()" > > and available in the following git branch. > > git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git > review-netprio_cgroup-hierarchy > > diffstat follows. > > Documentation/cgroups/net_prio.txt | 21 +- > include/linux/cgroup.h |2 > include/net/netprio_cgroup.h | 21 +- > kernel/cgroup.c| 15 + > net/core/netprio_cgroup.c | 376 > +++-- > 5 files changed, 284 insertions(+), 151 deletions(-) > > Thanks. > > -- > tejun > > [1] http://thread.gmane.org/gmane.linux.kernel.cgroups/5047 > [2] http://thread.gmane.org/gmane.linux.kernel/1393151 > Thanks Tejun For the series: Acked-by: Neil Horman -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCHSET cgroup/for-3.8] netcls_cgroup: implement hierarchy support
On Fri, Nov 16, 2012 at 07:30:59PM -0800, Tejun Heo wrote: > Hello, guys. > > This patchset implements proper hierarchy support for netcls_cgroup. > Pretty simliar to the netprio one[3]. Simpler as each cgroup has > single config value instead of array of them. > > This patchset contains the following three patches. > > 0001-netcls_cgroup-introduce-netcls_mutex.patch > 0002-netcls_cgroup-introduce-cgroup_cls_state-is_local.patch > 0003-netcls_cgroup-implement-proper-hierarchy-support.patch > > This patchset is on top of > > cgroup/for-3.8 ef9fe980c6 ("cgroup_freezer: implement proper hierarchy > support") > + [1] "[PATCHSET cgroup/for-3.8] cgroup: allow ->post_create() to fail" > + [2] "[PATCH 1/2] cgroup: s/CGRP_CLONE_CHILDREN/CGRP_CPUSET_CLONE_CHILDREN/" > "[PATCH 2/2] cgroup, cpuset: remove cgroup_subsys->post_clone()" > + [3] "[PATCHSET cgroup/for-3.8] netprio_cgroup: implement hierarchy support" > > and available in the following git branch. > > git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git > review-netcls_cgroup-hierarchy > > diffstat follows. > > include/net/cls_cgroup.h |1 > net/sched/cls_cgroup.c | 102 > --- > 2 files changed, 88 insertions(+), 15 deletions(-) > > Thanks. > > -- > tejun > > [1] http://thread.gmane.org/gmane.linux.kernel.cgroups/5047 > [2] http://thread.gmane.org/gmane.linux.kernel/1393151 > [3] https://lkml.org/lkml/2012/11/16/514 > Acked-by: Neil Horman -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v10] irq: add quirk for broken interrupt remapping on 55XX chipsets
A few years back intel published a spec update: http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf For the 5520 and 5500 chipsets which contained an errata (specificially errata 53), which noted that these chipsets can't properly do interrupt remapping, and as a result the recommend that interrupt remapping be disabled in bios. While many vendors have a bios update to do exactly that, not all do, and of course not all users update their bios to a level that corrects the problem. As a result, occasionally interrupts can arrive at a cpu even after affinity for that interrupt has be moved, leading to lost or spurrious interrupts (usually characterized by the message: kernel: do_IRQ: 7.71 No irq handler for vector (irq -1) There have been several incidents recently of people seeing this error, and investigation has shown that they have system for which their BIOS level is such that this feature was not properly turned off. As such, it would be good to give them a reminder that their systems are vulnurable to this problem. For details of those that reported the problem, please see: https://bugzilla.redhat.com/show_bug.cgi?id=887006 Signed-off-by: Neil Horman CC: Prarit Bhargava CC: Don Zickus CC: Don Dutile CC: Bjorn Helgaas CC: Asit Mallick CC: David Woodhouse CC: linux-...@vger.kernel.org CC: Joerg Roedel CC: Konrad Rzeszutek Wilk CC: Arkadiusz Miśkiewicz --- Change notes: v2) * Moved the quirk to the x86 arch, since consensus seems to be that the 55XX chipset series is x86 only. I decided however to keep the quirk as a regular quirk, not an early_quirk. Early quirks have no way currently to determine if BIOS has properly disabled the feature in the iommu, at least not without significant hacking, and since its quite possible this will be a short lived quirk, should Don Z's workaround code prove successful (and it looks like it may well), I don't think that necessecary. * Removed the WARNING banner from the quirk, and added the HW_ERR token to the string, I opted to leave the newlines in place however, as I really couldnt find a way to keep the text on a single line is still legible from a code perspective. I think theres enough language in there that using cscope on just about any substring however will turn it up, and again, this may be a short lived quirk. v3) * Removed defines from pci_ids.h, and used direct id values as per request from Bjorn. v4) * Converted pr_warn to WARN_TAINT(TAINT_FIRMWARE_WORKAROUND) as per David Woodhouse v5) * Moved check to an early quirk, and flagged the broken chip, so we could reasonably disable irq remapping during bootup. v6) * Clean up of stupid extra thrash in quirks.c v7) * Move broken check to intel_irq_remapping.c * Fixed another typo * Finally made the reference bugzilla public v8) * Removed extraneous code from irq_remapping_enabled v9) * Fix stupid build break from rushing to shuffle simmilar header files about Thanks to Arkadiusz Miśkiewicz for pointing it out v10) * Rewrite to hide the irq_remap_broken variable so we don't need to pull in a private header file --- arch/x86/include/asm/irq_remapping.h | 1 + arch/x86/kernel/early-quirks.c | 26 ++ drivers/iommu/intel_irq_remapping.c | 10 ++ drivers/iommu/irq_remapping.c| 6 ++ drivers/iommu/irq_remapping.h| 2 ++ 5 files changed, 45 insertions(+) diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h index 95fd352..d740cb4 100644 --- a/arch/x86/include/asm/irq_remapping.h +++ b/arch/x86/include/asm/irq_remapping.h @@ -28,6 +28,7 @@ extern void setup_irq_remapping_ops(void); extern int irq_remapping_supported(void); +extern void set_irq_remapping_broken(void); extern int irq_remapping_prepare(void); extern int irq_remapping_enable(void); extern void irq_remapping_disable(void); diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c index 3755ef4..589092d 100644 --- a/arch/x86/kernel/early-quirks.c +++ b/arch/x86/kernel/early-quirks.c @@ -18,6 +18,7 @@ #include #include #include +#include static void __init fix_hypertransport_config(int num, int slot, int func) { @@ -192,6 +193,27 @@ static void __init ati_bugs_contd(int num, int slot, int func) } #endif +#ifdef CONFIG_IRQ_REMAP +static void __init intel_remapping_check(int num, int slot, int func) +{ + u8 revision; + + revision = read_pci_config_byte(num, slot, func, PCI_REVISION_ID); + + /* +* Revision 0x13 of this chipset supports irq remapping +* but has an erratum that breaks its behavior, flag it as such +*/ + if (revision == 0x13) + set_irq_remapping_broken(); + +} +#else +static void __init intel_remapping_check(int num, int slot, int func) +{ +} +#endif + #define QFLAG_APPLY_ONCE 0x1 #define QFLAG_APPLIED 0x2
Re: [PATCH v10] irq: add quirk for broken interrupt remapping on 55XX chipsets
On Thu, Apr 18, 2013 at 05:02:39PM +0200, Joerg Roedel wrote: > On Tue, Apr 16, 2013 at 04:38:32PM -0400, Neil Horman wrote: > > A few years back intel published a spec update: > > http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf > > > > For the 5520 and 5500 chipsets which contained an errata (specificially > > errata > > 53), which noted that these chipsets can't properly do interrupt remapping, > > and > > as a result the recommend that interrupt remapping be disabled in bios. > > While > > many vendors have a bios update to do exactly that, not all do, and of > > course > > not all users update their bios to a level that corrects the problem. As a > > result, occasionally interrupts can arrive at a cpu even after affinity for > > that > > interrupt has be moved, leading to lost or spurrious interrupts (usually > > characterized by the message: > > kernel: do_IRQ: 7.71 No irq handler for vector (irq -1) > > > > There have been several incidents recently of people seeing this error, and > > investigation has shown that they have system for which their BIOS level is > > such > > that this feature was not properly turned off. As such, it would be good to > > give them a reminder that their systems are vulnurable to this problem. For > > details of those that reported the problem, please see: > > https://bugzilla.redhat.com/show_bug.cgi?id=887006 > > > > Signed-off-by: Neil Horman > > CC: Prarit Bhargava > > CC: Don Zickus > > CC: Don Dutile > > CC: Bjorn Helgaas > > CC: Asit Mallick > > CC: David Woodhouse > > CC: linux-...@vger.kernel.org > > CC: Joerg Roedel > > CC: Konrad Rzeszutek Wilk > > CC: Arkadiusz Miśkiewicz > > Applied with some small changes to my x86/vt-d branch, thanks Neil. > > > Thanks! Neil -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] libcxgbi: supress warning when we request to much space from kmalloc
cxgbi_alloc_big_mem allocates large chunks of memory, and can occasionally request amounts from kmalloc that exceed the allocators capacity. This typically leads to a stack trace from the zoned buddy allocator in the message log. But if kmalloc fails, cxgbi_alloc_big_mem backs off and uses vmalloc instead. Given that, and the fact that the two calls sites have their own error messages if both kmalloc and vmalloc fail, I think the stack trace printing isn't really needed. Modify the call to kmalloc to pass __GFP_NOWARN in as well, so that internal kmalloc warnings are suppressed. Signed-off-by: Neil Horman Reported-by: Honggang LI CC: "James E.J. Bottomley" CC: linux-kernel@vger.kernel.org --- drivers/scsi/cxgbi/libcxgbi.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/cxgbi/libcxgbi.h b/drivers/scsi/cxgbi/libcxgbi.h index 80fa99b..3daf996 100644 --- a/drivers/scsi/cxgbi/libcxgbi.h +++ b/drivers/scsi/cxgbi/libcxgbi.h @@ -658,7 +658,7 @@ static inline u32 cxgbi_tag_nonrsvd_bits(struct cxgbi_tag_format *tformat, static inline void *cxgbi_alloc_big_mem(unsigned int size, gfp_t gfp) { - void *p = kmalloc(size, gfp); + void *p = kmalloc(size, gfp | __GFP_NOWARN); if (!p) p = vmalloc(size); if (p) -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH net] tun: fix recovery from gup errors
On Sun, Jun 23, 2013 at 05:19:03PM +0300, Michael S. Tsirkin wrote: > get user pages might fail partially in tun zero copy > mode. To recover we need to put all pages that we got, > but code used a wrong index resulting in double-free > errors. > > Reported-by: Brad Hubbard > Signed-off-by: Michael S. Tsirkin > --- > > I haven't figured out why do we get failures, > but recovery is clearly wrong. > > This is also -stable material. > > drivers/net/tun.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > Acked-by: Neil Horman -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] iommu/vt-d: Expand interrupt remapping quirk to cover x58 chipset
On Thu, Jul 11, 2013 at 06:59:52AM -0400, Neil Horman wrote: > On Wed, Jul 10, 2013 at 01:31:36PM -0400, Don Dutile wrote: > > On 07/09/2013 03:11 PM, Neil Horman wrote: > > >Recently we added an early quirk to detect 5500/5520 chipsets with early > > >revisions that had problems with irq draining with interrupt remapping > > >enabled: > > > > > >commit 03bbcb2e7e292838bb0244f5a7816d194c911d62 > > >Author: Neil Horman > > >Date: Tue Apr 16 16:38:32 2013 -0400 > > > > > > iommu/vt-d: add quirk for broken interrupt remapping on 55XX chipsets > > > > > >It turns out this same problem is present in the intel X58 chipset as > > >well. See > > >errata 69 here: > > >http://www.intel.com/content/www/us/en/chipsets/x58-express-specification-update.html > > > > > >This patch extends the pci early quirk so that the chip devices/revisions > > >specified in the above update are also covered in the same way: > > > > > >Signed-off-by: Neil Horman > > >Reviewed-by: Jan Beulich > > >CC: Jan Beulich > > >CC: Joerg Roedel > > >CC: Andrew Cooper > > >CC: Malcolm Crossley > > >CC: Prarit Bhargava > > >CC: Don Zickus > > >CC: Don Dutile > > >CC: Thomas Gleixner > > >CC: Ingo Molnar > > >CC: "H. Peter Anvin" > > >CC: x...@kernel.org (maintainer:X86 ARCHITECTURE...) > > >CC: sta...@vger.kernel.org > > >--- > > > arch/x86/kernel/early-quirks.c | 15 +-- > > > 1 file changed, 13 insertions(+), 2 deletions(-) > > > > > >diff --git a/arch/x86/kernel/early-quirks.c > > >b/arch/x86/kernel/early-quirks.c > > >index 94ab6b9..743d583 100644 > > >--- a/arch/x86/kernel/early-quirks.c > > >+++ b/arch/x86/kernel/early-quirks.c > > >@@ -196,15 +196,23 @@ static void __init ati_bugs_contd(int num, int slot, > > >int func) > > > static void __init intel_remapping_check(int num, int slot, int func) > > > { > > > u8 revision; > > >+ u16 device; > > > > > >+ device = read_pci_config_16(num, slot, func, PCI_DEVICE_ID); > > > revision = read_pci_config_byte(num, slot, func, PCI_REVISION_ID); > > > > > > /* > > >- * Revision 0x13 of this chipset supports irq remapping > > >- * but has an erratum that breaks its behavior, flag it as such > > >+ * Revision 13 of all triggering devices id in this quirk have > > >+ * a problem draining interrupts when irq remapping is enabled, > > >+ * and should be flagged as broken. Additionally revisions 0x12 > > >+ * and 0x22 of device id 0x3405 has this problem. > > >*/ > > > if (revision == 0x13) > > > set_irq_remapping_broken(); > > >+ else if ((device == 0x3405)&& > > >+ ((revision == 0x12) || > > >+ (revision == 0x22))) > > >+ set_irq_remapping_broken(); > > > > > > } > > > > > When discussing the original-seen errata w/Intel on 55xx chips, the > > statements made were any chip with rev C1(revision = 0x21) or > > greater had the correct > > hw implementation for the intr-pending flush. > > We knew the bug existed in the A3 (rev=0x13) rev of the chip, but the > > true check should be: > > revision < 0x21 > > > > I suspect there were multiple revs of the x58, of which B2(0x12) & C2(0x22) > > were shipped to oem's, system vendors, etc. > > But, in case there were any chip revisions in between these well-known > > values > > out there, I suggest the 0x3405 check be changed to: > > revision < 0x22 > > > > Since it's unlikely that hw degressed in design over revisions, it seems > > more correct to check for revs less than a rev-value having an errata, > > or conversely, a chip value >= rev-value do not have the errata. > > IOW, an equal check may not provide sufficient. > > > Don and I discussed this offline. Given that his comments make good sense > to me, I'm hesitant to apply the quirk to anything other than what the spec > update says, given that its clear. Don is attempting to contact people at > Intel > who will be able (we hope) to give us a definitive answer on this, please hold > on this patch until we have resolution on the issue. > Neil > Don, do you have any updates here from Intel? I'd like to get this put to bed. Neil -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] iommu/vt-d: Expand interrupt remapping quirk to cover x58 chipset
Recently we added an early quirk to detect 5500/5520 chipsets with early revisions that had problems with irq draining with interrupt remapping enabled: commit 03bbcb2e7e292838bb0244f5a7816d194c911d62 Author: Neil Horman Date: Tue Apr 16 16:38:32 2013 -0400 iommu/vt-d: add quirk for broken interrupt remapping on 55XX chipsets It turns out this same problem is present in the intel X58 chipset as well. See errata 69 here: http://www.intel.com/content/www/us/en/chipsets/x58-express-specification-update.html This patch extends the pci early quirk so that the chip devices/revisions specified in the above update are also covered in the same way: Signed-off-by: Neil Horman Reviewed-by: Jan Beulich CC: Jan Beulich CC: Joerg Roedel CC: Andrew Cooper CC: Malcolm Crossley CC: Prarit Bhargava CC: Don Zickus CC: Don Dutile CC: Thomas Gleixner CC: Ingo Molnar CC: "H. Peter Anvin" CC: x...@kernel.org CC: sta...@vger.kernel.org --- Note, this a repost of this patch. Don and I talked about this offline again, and neither of us have been able to gather any information from intel on the subject. While I understand his point that we should try to get confirmation about inclusive steppings that are affected by this errata, I feel like we should commit this patch based on the documentation we do have, and we can always ammend it later if Intel indicates other chips are affected. --- arch/x86/kernel/early-quirks.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c index 94ab6b9..743d583 100644 --- a/arch/x86/kernel/early-quirks.c +++ b/arch/x86/kernel/early-quirks.c @@ -196,15 +196,23 @@ static void __init ati_bugs_contd(int num, int slot, int func) static void __init intel_remapping_check(int num, int slot, int func) { u8 revision; + u16 device; + device = read_pci_config_16(num, slot, func, PCI_DEVICE_ID); revision = read_pci_config_byte(num, slot, func, PCI_REVISION_ID); /* -* Revision 0x13 of this chipset supports irq remapping -* but has an erratum that breaks its behavior, flag it as such +* Revision 13 of all triggering devices id in this quirk have +* a problem draining interrupts when irq remapping is enabled, +* and should be flagged as broken. Additionally revisions 0x12 +* and 0x22 of device id 0x3405 has this problem. */ if (revision == 0x13) set_irq_remapping_broken(); + else if ((device == 0x3405) && + ((revision == 0x12) || +(revision == 0x22))) + set_irq_remapping_broken(); } @@ -239,8 +247,11 @@ static struct chipset early_qrk[] __initdata = { PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs_contd }, { PCI_VENDOR_ID_INTEL, 0x3403, PCI_CLASS_BRIDGE_HOST, PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check }, + { PCI_VENDOR_ID_INTEL, 0x3405, PCI_CLASS_BRIDGE_HOST, + PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check }, { PCI_VENDOR_ID_INTEL, 0x3406, PCI_CLASS_BRIDGE_HOST, PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check }, + {} }; -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 089/145] iommu/vt-d: add quirk for broken interrupt remapping on 55XX chipsets
On Thu, Jul 18, 2013 at 11:02:00AM +0300, Thomas Backlund wrote: > 18.07.2013 01:47, Kamal Mostafa skrev: > >3.8.13.5 -stable review patch. If anyone has any objections, please let me > >know. > > > >------ > > > >From: Neil Horman > > > >commit 03bbcb2e7e292838bb0244f5a7816d194c911d62 upstream. > > > >A few years back intel published a spec update: > >http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf > > > >For the 5520 and 5500 chipsets which contained an errata (specificially > >errata > >53), which noted that these chipsets can't properly do interrupt remapping, > >and > >as a result the recommend that interrupt remapping be disabled in bios. > >While > >many vendors have a bios update to do exactly that, not all do, and of course > >not all users update their bios to a level that corrects the problem. As a > >result, occasionally interrupts can arrive at a cpu even after affinity for > >that > >interrupt has be moved, leading to lost or spurrious interrupts (usually > >characterized by the message: > >kernel: do_IRQ: 7.71 No irq handler for vector (irq -1) > > > >There have been several incidents recently of people seeing this error, and > >investigation has shown that they have system for which their BIOS level is > >such > >that this feature was not properly turned off. As such, it would be good to > >give them a reminder that their systems are vulnurable to this problem. For > >details of those that reported the problem, please see: > >https://bugzilla.redhat.com/show_bug.cgi?id=887006 > > > >[ Joerg: Removed CONFIG_IRQ_REMAP ifdef from early-quirks.c ] > > > >Signed-off-by: Neil Horman > >CC: Prarit Bhargava > >CC: Don Zickus > >CC: Don Dutile > >CC: Bjorn Helgaas > >CC: Asit Mallick > >CC: David Woodhouse > >CC: linux-...@vger.kernel.org > >CC: Joerg Roedel > >CC: Konrad Rzeszutek Wilk > >CC: Arkadiusz Miśkiewicz > >Signed-off-by: Joerg Roedel > >Signed-off-by: Luis Henriques > >--- > > arch/x86/include/asm/irq_remapping.h | 2 ++ > > arch/x86/kernel/early-quirks.c | 20 > > drivers/iommu/intel_irq_remapping.c | 10 ++ > > drivers/iommu/irq_remapping.c| 6 ++ > > drivers/iommu/irq_remapping.h| 2 ++ > > 5 files changed, 40 insertions(+) > > > > This patch introduces this warning on 3.8 series kernels: > > In file included from arch/x86/kernel/early-quirks.c:21:0: > /kernel/linux-3.8.13.5/arch/x86/include/asm/irq_remapping.h:46:10: > varning: ”struct irq_data” deklarerad inuti parameterlista > [aktiverat som standard] > /kernel/linux-3.8.13.5/arch/x86/include/asm/irq_remapping.h:46:10: > varning: dess scope-område är endast denna definition eller > deklaration, vilket troligen inte är vad du vill. [aktiverat som > standard] > /kernel/linux-3.8.13.5/arch/x86/include/asm/irq_remapping.h:50:17: > varning: ”struct msi_msg” deklarerad inuti parameterlista [aktiverat > som standard] > > > You need to add this upstream fix too: > > commit 35d3d814cbd46a85bed97cd74ba97fbbb51e0ccd > Author: Joerg Roedel > Date: Fri Apr 19 20:34:55 2013 +0200 > > iommu: Fix compile warnings with forward declarations > I submited a 3.9 backport that included that fix to -stable over a week ago, you should just be able to use that if you want. Neil > > -- > > Thomas > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 03/23] netprio_cgroup: pass around @css instead of @cgroup and kill struct cgroup_netprio_state
On Thu, Aug 01, 2013 at 05:49:41PM -0400, Tejun Heo wrote: > cgroup controller API will be converted to primarily use struct > cgroup_subsys_state instead of struct cgroup. In preparation, make > the internal functions of netprio_cgroup pass around @css instead of > @cgrp. > > While at it, kill struct cgroup_netprio_state which only contained > struct cgroup_subsys_state without serving any purpose. All functions > are converted to deal with @css directly. > > This patch shouldn't cause any behavior differences. > > Signed-off-by: Tejun Heo > Cc: Neil Horman > Cc: David S. Miller Acked-by: Neil Horman -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ACPI: Fix osc flag setup ordering to allow pcie hotplug use when available
Somewhere between 3.9 and 3.10 it seems the order in which pcie and acpi probed slots for hotplug capabilites got reversed. While this isn't a big deal, it did uncover a bug in the ACPI bus setup path. Specifically, acpi_pci_root_add calls pci_acpi_scan_root before setting the osc flags for the device handle. pci_acpi_scan_root, among other things uses device_is_managed_by_native_pciehp() to determine if a given slot has pcie hotplug capabilties, whcih checks the devices OSC_PCI_EXPRESS_NATIVE_HP_CONTROL flag. Since that flag is not set until after pci_acpi_scan_root_completes, the acpi code never sees that pcie slots are hotplug capable and configures them all to use acpi instead. Fix is pretty simple, just defer the scan until after the osc flags have been set on the device. Tested by myself and it seems to work well. Signed-off-by: Neil Horman CC: Len Brown CC: "Rafael J. Wysocki" CC: linux-a...@vger.kernel.org --- drivers/acpi/pci_root.c | 41 - 1 file changed, 20 insertions(+), 21 deletions(-) diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c index 5917839..a2c2661 100644 --- a/drivers/acpi/pci_root.c +++ b/drivers/acpi/pci_root.c @@ -437,27 +437,6 @@ static int acpi_pci_root_add(struct acpi_device *device, flags = base_flags = OSC_PCI_SEGMENT_GROUPS_SUPPORT; acpi_pci_osc_support(root, flags); - /* -* TBD: Need PCI interface for enumeration/configuration of roots. -*/ - - /* -* Scan the Root Bridge -* -* Must do this prior to any attempt to bind the root device, as the -* PCI namespace does not get created until this call is made (and -* thus the root bridge's pci_dev does not exist). -*/ - root->bus = pci_acpi_scan_root(root); - if (!root->bus) { - dev_err(&device->dev, - "Bus %04x:%02x not present in PCI namespace\n", - root->segment, (unsigned int)root->secondary.start); - result = -ENODEV; - goto end; - } - - /* Indicate support for various _OSC capabilities. */ if (pci_ext_cfg_avail()) flags |= OSC_EXT_PCI_CONFIG_SUPPORT; if (pcie_aspm_support_enabled()) { @@ -520,6 +499,26 @@ static int acpi_pci_root_add(struct acpi_device *device, "(_OSC support mask: 0x%02x)\n", flags); } + /* +* TBD: Need PCI interface for enumeration/configuration of roots. +*/ + + /* +* Scan the Root Bridge +* +* Must do this prior to any attempt to bind the root device, as the +* PCI namespace does not get created until this call is made (and +* thus the root bridge's pci_dev does not exist). +*/ + root->bus = pci_acpi_scan_root(root); + if (!root->bus) { + dev_err(&device->dev, + "Bus %04x:%02x not present in PCI namespace\n", + root->segment, (unsigned int)root->secondary.start); + result = -ENODEV; + goto end; + } + pci_acpi_add_bus_pm_notifier(device, root->bus); if (device->wakeup.flags.run_wake) device_set_run_wake(root->bus->bridge, true); -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] ACPI: Fix osc flag setup ordering to allow pcie hotplug use when available
On Fri, Aug 23, 2013 at 09:38:18PM +0200, Rafael J. Wysocki wrote: > [CCs added] > > Please always send PCI-related material to linux-pci in the first place. > Sorry, I ran get_maintainers and it seemed to think linux-acpi was sufficient. > The change that broke things for you was actually intentional: > > commit b8178f130e25c1bdac1c33e0996f1ff6e20ec08e > Author: Bjorn Helgaas > Date: Mon Apr 1 15:47:39 2013 -0600 > > Revert "PCI/ACPI: Request _OSC control before scanning PCI root bus" > > This reverts commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6. > > so I think we'll need to clean up the ASMP initialization after all. > Crud. Reading over the revert commit, it seems like the problem boils down to the odering of checking aspm_disabled. It seems to me that the simple fix is to track the desire for acpi to disable aspm separately from the users desire to disable aspm (aspm_disabled). Then we just turn the points where we check the aspm_disabled into the appropriate or of two variables, except for pcie_asmp_sanity_check, which remains sensitive to just the user disable option. Or is there more to this? Neil > On Friday, August 23, 2013 01:19:39 PM Neil Horman wrote: > > Somewhere between 3.9 and 3.10 it seems the order in which pcie and acpi > > probed > > slots for hotplug capabilites got reversed. While this isn't a big deal, > > it did > > uncover a bug in the ACPI bus setup path. Specifically, acpi_pci_root_add > > calls > > pci_acpi_scan_root before setting the osc flags for the device handle. > > pci_acpi_scan_root, among other things uses > > device_is_managed_by_native_pciehp() > > to determine if a given slot has pcie hotplug capabilties, whcih checks the > > devices OSC_PCI_EXPRESS_NATIVE_HP_CONTROL flag. Since that flag is not set > > until after pci_acpi_scan_root_completes, the acpi code never sees that pcie > > slots are hotplug capable and configures them all to use acpi instead. > > > > Fix is pretty simple, just defer the scan until after the osc flags have > > been > > set on the device. Tested by myself and it seems to work well. > > > > Signed-off-by: Neil Horman > > CC: Len Brown > > CC: "Rafael J. Wysocki" > > CC: linux-a...@vger.kernel.org > > --- > > drivers/acpi/pci_root.c | 41 - > > 1 file changed, 20 insertions(+), 21 deletions(-) > > > > diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c > > index 5917839..a2c2661 100644 > > --- a/drivers/acpi/pci_root.c > > +++ b/drivers/acpi/pci_root.c > > @@ -437,27 +437,6 @@ static int acpi_pci_root_add(struct acpi_device > > *device, > > flags = base_flags = OSC_PCI_SEGMENT_GROUPS_SUPPORT; > > acpi_pci_osc_support(root, flags); > > > > - /* > > -* TBD: Need PCI interface for enumeration/configuration of roots. > > -*/ > > - > > - /* > > -* Scan the Root Bridge > > -* > > -* Must do this prior to any attempt to bind the root device, as the > > -* PCI namespace does not get created until this call is made (and > > -* thus the root bridge's pci_dev does not exist). > > -*/ > > - root->bus = pci_acpi_scan_root(root); > > - if (!root->bus) { > > - dev_err(&device->dev, > > - "Bus %04x:%02x not present in PCI namespace\n", > > - root->segment, (unsigned int)root->secondary.start); > > - result = -ENODEV; > > - goto end; > > - } > > - > > - /* Indicate support for various _OSC capabilities. */ > > if (pci_ext_cfg_avail()) > > flags |= OSC_EXT_PCI_CONFIG_SUPPORT; > > if (pcie_aspm_support_enabled()) { > > @@ -520,6 +499,26 @@ static int acpi_pci_root_add(struct acpi_device > > *device, > > "(_OSC support mask: 0x%02x)\n", flags); > > } > > > > + /* > > +* TBD: Need PCI interface for enumeration/configuration of roots. > > +*/ > > + > > + /* > > +* Scan the Root Bridge > > +* > > +* Must do this prior to any attempt to bind the root device, as the > > +* PCI namespace does not get created until this call is made (and > > +* thus the root bridge's pci_dev does not exist). > > +*/ > > + root->bus = pci_acpi_scan_root(root); > > + if (!root->bus) { > > + dev_err(&device->dev, > > +
Re: [PATCH] ACPI: Fix osc flag setup ordering to allow pcie hotplug use when available
On Fri, Aug 23, 2013 at 04:04:59PM -0600, Bjorn Helgaas wrote: > On Fri, Aug 23, 2013 at 3:40 PM, Rafael J. Wysocki wrote: > > On Friday, August 23, 2013 02:46:23 PM Bjorn Helgaas wrote: > >> On Fri, Aug 23, 2013 at 2:53 PM, Rafael J. Wysocki wrote: > >> > On Friday, August 23, 2013 04:05:11 PM Neil Horman wrote: > >> >> On Fri, Aug 23, 2013 at 09:38:18PM +0200, Rafael J. Wysocki wrote: > >> >> > [CCs added] > >> >> > > >> >> > Please always send PCI-related material to linux-pci in the first > >> >> > place. > >> >> > > >> >> Sorry, I ran get_maintainers and it seemed to think linux-acpi was > >> >> sufficient. > >> >> > >> >> > The change that broke things for you was actually intentional: > >> >> > > >> >> > commit b8178f130e25c1bdac1c33e0996f1ff6e20ec08e > >> >> > Author: Bjorn Helgaas > >> >> > Date: Mon Apr 1 15:47:39 2013 -0600 > >> >> > > >> >> > Revert "PCI/ACPI: Request _OSC control before scanning PCI root > >> >> > bus" > >> >> > > >> >> > This reverts commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6. > >> >> > > >> >> > so I think we'll need to clean up the ASMP initialization after all. > >> >> > > >> >> Crud. Reading over the revert commit, it seems like the problem boils > >> >> down to > >> >> the odering of checking aspm_disabled. It seems to me that the simple > >> >> fix is to > >> >> track the desire for acpi to disable aspm separately from the users > >> >> desire to > >> >> disable aspm (aspm_disabled). Then we just turn the points where we > >> >> check the > >> >> aspm_disabled into the appropriate or of two variables, except for > >> >> pcie_asmp_sanity_check, which remains sensitive to just the user > >> >> disable option. > >> >> > >> >> Or is there more to this? > >> > > >> > No, I think you're right. > >> > > >> > Bjorn, what do you think? > >> > >> My opinion is that the _OSC/ASPM state management is ridiculously > >> complicated already, and this would make it slightly more complicated. > >> That's why my preference would be to attempt a more radical cleanup > >> and simplification instead of adding another wart. > > > > Well, do you have anything specific in mind? > > If I had a specific fix in mind, I would just post it, but I've never > had time to work through it all. What I mean by complicated is that > every time I have to look at ASPM, I have to start by trying to figure > out the relationships between these variables: > > aspm_policy # default 0 (POLICY_DEFAULT) > or POLICY_PERFORMANCE > or POLICY_POWERSAVE depending on config > aspm_disabled # default 0 > aspm_force # default 0 > aspm_support_enabled# default true > > plus the _OSC-related code in acpi_pci_root_add(), which honestly is a > rat's nest. > I agree, I've only looked at it for an hour, and it looks pretty hairy. > It sounds like Neil's fix (while probably correct) would tangle that > nest a little more. But I guess it would make sense to see the actual No argument, it would add complexity to something thats already very complex. That said, I think this needs to be fixed. Currently there are several systems that are reporting conflicts between ACPI and PCIE hotplug. While that means theres lots of griping, theres several people willing to test, so I think we have an opportunity to fix this. > patch and the justification (a regression fix, I suppose, but the > details weren't clear to me) before deciding. > Agreed. A larger cleanup would be preferable at this point, but I'm not sufficiently versed in the code to do that right now, so I'll try write an appropriate for this particular bug, and see what you think on monday. Regards Neil -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] netconsole: avoid a crash with multiple sysfs writers
On Wed, Aug 07, 2013 at 12:02:44PM +0300, Dan Aloni wrote: > When my 'ifup eth' script was fired multiple times and ran concurrent on > my laptop, for some obscure /etc scripting reason, it was revealed > that the store_enabled() function in netconsole doesn't handle it nicely, > as recorded by the Oops below (a syslog paste, but not mangled too much > to prevent from discerning the traceback). > > On Linux 3.10.4, this patch seeks to remedy the problem, and it has been > running stable on my laptop for a few days. > > [52608.609325] BUG: unable to handle kernel NULL pointer dereference at > 03e0 > [52608.609331] IP: [] __netpoll_cleanup+0x27/0xe0 > [52608.609339] PGD 15e51a067 PUD 15433e067 PMD 0 > [52608.609343] Oops: [#1] SMP re firewire_ohci firewire_core crc_itu_t > [last unloaded: kvm_intel] > [52608.609347] Modules linked in: kvm_intel tun vfat fat ppdev parport_pc > parport fuse ipt_MASQUERADE usb_storage nf_conntrack_netbios_ns nf_connAug 5 > 09:39:28 inception kernel: [52608.609419] CPU: 1 PID: 13467 Comm: > netconsole-osir Tainted: PF O 3.10.4-alonid #15ntrack_ipv4 bridge > nf_defrag_ipv4 xt_conntrack nf_Aug 5 09:39:28 inception kernel: > [52608.609422] Hardware name: LENOVO 24474KG/24474KG, BIOS G5ET90WW (2.50 ) > 12/21/2012vm snd_hda_codec_hdmi mac80211 snd_hda_codec_realtek vsock Aug 5 > 09:39:28 inception kernel: [52608.609424] task: 88015a4397a0 ti: > 8801ba446000 task.ti: 8801ba446000lloc nvidia(POF) videobuf2_memops > snd_hwdep snd_seq videobuf2Aug 5 09:39:28 inception kernel: [52608.609426] > RIP: 0010:[] [] > __netpoll_cleanup+0x27/0xe0imer media snd i2c_i801 ptp lpc_ich mfd_core > pps_cAug 5 09:39:28 inception kernel: [52608.609430] RSP: > 0018:8801ba447de8 EFLAGS: 00010286e_core crc_itu_t [last unloaded: > kvm_intel] > [52608.609433] RAX: RBX: 880210bbcc68 RCX: > > [52608.609435] RDX: RSI: 8801ba447da0 RDI: > 880210bbcc68 > [52608.609437] RBP: 8801ba447e18 R08: R09: > 0001 > [52608.609439] R10: 000a R11: f000 R12: > 880210bbcc68 > [52608.609441] R13: 88020bc41000 R14: 0002 R15: > 0002000 > [52608.609443] FS: 7f38d7bff740() GS:88021dc4() > knlGS: > [52608.609446] CS: 0010 DS: ES: CR0: > 80050033001427e0 > [52608.609448] CR2: 03e0 CR3: 000154103000 CR4: > 001427e0 > [52608.609450] DR0: DR1: DR2: > > [52608.609452] netpoll: netconsole: local port 6665ess 10.0.0.27 > [52608.609454] netpoll: netconsole: local IPv4 address 10.0.0.27 > [52608.609456] netpoll: netconsole: interface 'em1' > [52608.609457] netpoll: netconsole: remote port 514ress 10.0.0.15 > [52608.609459] netpoll: netconsole: remote IPv4 address 10.0.0.15:65:a8:9a:c7 > [52608.609461] netpoll: netconsole: remote ethernet address > 1c:6f:65:a8:9a:c7400 > [52608.609463] DR3: DR6: 0ff0 DR7: > 0400 > [52608.609464] Stack:801ba447e08 880210bbcc68 ffea > 88020bc41000 > [52608.609466] 8801ba447e08 880210bbcc68 ffea > 88020bc41000 > [52608.609471] 0002 0002 8801ba447e38 > 81532af4 > [52608.609475] 880210bbcc00 8801ba447e78 > 81420e7c > [52608.609479] Call Trace: > [52608.609484] [] netpoll_cleanup+0x24/0x50 > [52608.609489] [] store_enabled+0x5c/0xe0store+0x2e/0x40 > [52608.609492] [] netconsole_target_attr_store+0x2e/0x40 > [52608.609498] [] configfs_write_file+0xd2/0x130 > [52608.609503] [] vfs_write+0xc5/0x1f0 > [52608.609506] [] SyS_write+0x52/0xa0/0x10 > [52608.609511] [] ? do_page_fault+0xe/0x10 > [52608.609516] [] system_call_fastpath+0x16/0x1b > [52608.609517] Code: 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 48 83 ec 30 4c 89 > 65 e0 48 89 5d d8 49 89 fc 4c 89 6d e8 4c 89 75 f0 4c 89 7d f8 48 8 > [52608.609559] RIP [] __netpoll_cleanup+0x27/0xe0 > [52608.609563] RSP > [52608.609564] CR2: 03e0 > [52608.609567] ---[ end trace d25ec343349b61d2 ]--- > > Signed-off-by: Dan Aloni > CC: David S. Miller > CC: Neil Horman > --- > drivers/net/netconsole.c | 13 - > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c > index 4822aaf..dcb2134 100644 > --- a/drivers/net/netconsole.c > +++ b/drivers/net/netconsole.c > @@ -102,6 +102,7 @@ struct netconsole_target { > struct config_item item; >
Re: [PATCH] netconsole: avoid a crash with multiple sysfs writers
On Thu, Aug 08, 2013 at 09:14:31AM +0300, Dan Aloni wrote: > On Thu, Aug 8, 2013 at 8:50 AM, Neil Horman wrote: > > > > On Wed, Aug 07, 2013 at 12:02:44PM +0300, Dan Aloni wrote: > [..] > > > > > When my 'ifup eth' script was fired multiple times and ran concurrent o> > > > @@ -682,7 +689,11 @@ restart: > > >* we might sleep in __netpoll_cleanup() > > >*/ > > > spin_unlock_irqrestore(&target_list_lock, > > > flags); > > > + > > > + mutex_lock(&nt->mutex); > > > __netpoll_cleanup(&nt->np); > > > + mutex_unlock(&nt->mutex); > > > + > > NAK, you can't hold a mutex while calling __netpoll_cleanup. > > __netpoll_cleanup > > may sleep and its illegal to hold a mutex while doing so. > > Neil > > > > To my understanding, it mostly depends on locking order, and having > sleeplocks in the outer order and spinlocks in the inner order is > valid as long the locking order is not reversed. > > Also, drivers/net/team/team.c - another netpoll user, already does the > same thing I intended in this patch - it locks the outer team->lock > mutex in team_uninit() while calling team_port_del() and then > team_port_disable_netpoll() calls __netpoll_cleanup(). > Sorry, you're right, I was under the impression that you were already in atomic context, but as long as you don't have preempt disabled when you try to take the mutex in any path, you should be ok. Signed-off-by: Neil Horman -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] module: ppc64 module CRC relocation fix causes perf issues
On Thu, Jul 25, 2013 at 09:14:25AM +1000, Benjamin Herrenschmidt wrote: > On Thu, 2013-07-25 at 08:34 +1000, Anton Blanchard wrote: > > > Apart from the annoying colors, is there anything specific I should > > > be looking for? Some sort of error message, or output that actually > > > makes sense? > > > > Thanks for testing! Ben, I think the patch is good to go. > > Sent it yesterday to Linus, it's upstream already :-) > > Cheers, > Ben. > Sorry I'm a bit late to the thread, I've ben swamped. Has someone tested this with kexec/kdump? Thats why the origional patch was created, because when kexec loads the kernel at a different physical address, the relocations messed with the module crc's, and modules couldn't load during the kexec boot. Assuming that kernaddr_start gets set appropriately during boot, using PHYSICAL_START should be fine, but I wanted to check, and don't currently have access to a powerpc system to do so. Neil > > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] module: ppc64 module CRC relocation fix causes perf issues
On Fri, Jul 26, 2013 at 11:19:13AM +1000, Anton Blanchard wrote: > > Hi Neil, > > > Sorry I'm a bit late to the thread, I've ben swamped. Has someone > > tested this with kexec/kdump? Thats why the origional patch was > > created, because when kexec loads the kernel at a different physical > > address, the relocations messed with the module crc's, and modules > > couldn't load during the kexec boot. Assuming that kernaddr_start > > gets set appropriately during boot, using PHYSICAL_START should be > > fine, but I wanted to check, and don't currently have access to a > > powerpc system to do so. Neil > > I tested a relocatable kernel forced to run at a non zero physical > address (ie basically kdump). I verified CRCs were bad with your > original patch backed out, and were good with this patch applied. > > Anton > Perfect, sounds like a sufficient test to me. Acked-by: Neil Horman -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] irq: add quirk for broken interrupt remapping on 55XX chipsets
A few years back intel published a spec update: http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf For the 5520 and 5500 chipsets which contained an errata (specificially errata 53), which noted that these chipsets can't properly do interrupt remapping, and as a result the recommend that interrupt remapping be disabled in bios. While many vendors have a bios update to do exactly that, not all do, and of course not all users update their bios to a level that corrects the problem. As a result, occasionally interrupts can arrive at a cpu even after affinity for that interrupt has be moved, leading to lost or spurrious interrupts (usually characterized by the message: kernel: do_IRQ: 7.71 No irq handler for vector (irq -1) There have been several incidents recently of people seeing this error, and investigation has shown that they have system for which their BIOS level is such that this feature was not properly turned off. As such, it would be good to give them a reminder that their systems are vulnurable to this problem. Signed-off-by: Neil Horman CC: Prarit Bhargava CC: Don Zickus CC: Don Dutile CC: Bjorn Helgaas CC: Asit Mallick CC: linux-...@vger.kernel.org --- drivers/iommu/intel_irq_remapping.c | 20 include/linux/pci_ids.h | 2 ++ 2 files changed, 22 insertions(+) diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c index f3b8f23..9bfb6c2 100644 --- a/drivers/iommu/intel_irq_remapping.c +++ b/drivers/iommu/intel_irq_remapping.c @@ -1113,3 +1113,23 @@ struct irq_remap_ops intel_irq_remap_ops = { .msi_setup_irq = intel_msi_setup_irq, .setup_hpet_msi = intel_setup_hpet_msi, }; + + +static void intel_remapping_check(struct pci_dev *dev) +{ + u8 revision; + + pci_read_config_byte(dev, PCI_REVISION_ID, &revision); + + if ((revision == 0x13) && irq_remapping_enabled) { + pr_warn("WARNING WARNING WARNING WARNING WARNING WARNING\n" + "This system BIOS has enabled interrupt remapping\n" + "on a chipset that contains an errata making that\n" + "feature unstable. Please reboot with nointremap\n" + "added to the kernel command line and contact\n" + "your BIOS vendor for an update"); + } +} +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_5520_IOHUB, intel_remapping_check); +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_5500_IOHUB, intel_remapping_check); + diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h index 31717bd..54027a6 100644 --- a/include/linux/pci_ids.h +++ b/include/linux/pci_ids.h @@ -2732,6 +2732,8 @@ #define PCI_DEVICE_ID_INTEL_LYNNFIELD_MC_CH2_RANK_REV2 0x2db2 #define PCI_DEVICE_ID_INTEL_LYNNFIELD_MC_CH2_TC_REV20x2db3 #define PCI_DEVICE_ID_INTEL_82855PM_HB 0x3340 +#define PCI_DEVICE_ID_INTEL_5500_IOHUB 0x3403 +#define PCI_DEVICE_ID_INTEL_5520_IOHUB 0x3406 #define PCI_DEVICE_ID_INTEL_IOAT_TBG4 0x3429 #define PCI_DEVICE_ID_INTEL_IOAT_TBG5 0x342a #define PCI_DEVICE_ID_INTEL_IOAT_TBG6 0x342b -- 1.7.11.7 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] irq: add quirk for broken interrupt remapping on 55XX chipsets
On Fri, Mar 01, 2013 at 10:20:35AM -0800, Yinghai Lu wrote: > On Fri, Mar 1, 2013 at 9:17 AM, Neil Horman wrote: > > A few years back intel published a spec update: > > http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf > > > > For the 5520 and 5500 chipsets which contained an errata (specificially > > errata > > 53), which noted that these chipsets can't properly do interrupt remapping, > > and > > as a result the recommend that interrupt remapping be disabled in bios. > > While > > many vendors have a bios update to do exactly that, not all do, and of > > course > > not all users update their bios to a level that corrects the problem. As a > > result, occasionally interrupts can arrive at a cpu even after affinity for > > that > > interrupt has be moved, leading to lost or spurrious interrupts (usually > > characterized by the message: > > kernel: do_IRQ: 7.71 No irq handler for vector (irq -1) > > > > There have been several incidents recently of people seeing this error, and > > investigation has shown that they have system for which their BIOS level is > > such > > that this feature was not properly turned off. As such, it would be good to > > give them a reminder that their systems are vulnurable to this problem. > > > > Signed-off-by: Neil Horman > > CC: Prarit Bhargava > > CC: Don Zickus > > CC: Don Dutile > > CC: Bjorn Helgaas > > CC: Asit Mallick > > CC: linux-...@vger.kernel.org > > --- > > drivers/iommu/intel_irq_remapping.c | 20 > > include/linux/pci_ids.h | 2 ++ > > 2 files changed, 22 insertions(+) > > > > diff --git a/drivers/iommu/intel_irq_remapping.c > > b/drivers/iommu/intel_irq_remapping.c > > index f3b8f23..9bfb6c2 100644 > > --- a/drivers/iommu/intel_irq_remapping.c > > +++ b/drivers/iommu/intel_irq_remapping.c > > @@ -1113,3 +1113,23 @@ struct irq_remap_ops intel_irq_remap_ops = { > > .msi_setup_irq = intel_msi_setup_irq, > > .setup_hpet_msi = intel_setup_hpet_msi, > > }; > > + > > + > > +static void intel_remapping_check(struct pci_dev *dev) > > +{ > > + u8 revision; > > + > > + pci_read_config_byte(dev, PCI_REVISION_ID, &revision); > > + > > + if ((revision == 0x13) && irq_remapping_enabled) { > > + pr_warn("WARNING WARNING WARNING WARNING WARNING WARNING\n" > > + "This system BIOS has enabled interrupt remapping\n" > > + "on a chipset that contains an errata making that\n" > > + "feature unstable. Please reboot with nointremap\n" > > + "added to the kernel command line and contact\n" > > + "your BIOS vendor for an update"); > > + } > > +} > > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, > > PCI_DEVICE_ID_INTEL_5520_IOHUB, intel_remapping_check); > > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, > > PCI_DEVICE_ID_INTEL_5500_IOHUB, intel_remapping_check); > > only for x86 platform? > If so, you can check that in arch/x86/kernel/early-quirks.c::early_quirks() > and set one flag and later print warning and skip there if someone > need to enable intr-remap. > So users will not need to reboot the system... > > Thanks > I was under the impression that BIOS might have enabled irq remapping prior to the OS booting, at which point a reboot was requried anyway. Neil -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] irq: add quirk for broken interrupt remapping on 55XX chipsets
On Sat, Mar 02, 2013 at 11:21:29AM -0500, Prarit Bhargava wrote: > On 03/01/2013 12:17 PM, Neil Horman wrote: > > A few years back intel published a spec update: > > http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf > > > > For the 5520 and 5500 chipsets which contained an errata (specificially > > errata > > 53), which noted that these chipsets can't properly do interrupt remapping, > > and > > as a result the recommend that interrupt remapping be disabled in bios. > > While > > many vendors have a bios update to do exactly that, not all do, and of > > course > > not all users update their bios to a level that corrects the problem. As a > > result, occasionally interrupts can arrive at a cpu even after affinity for > > that > > interrupt has be moved, leading to lost or spurrious interrupts (usually > > characterized by the message: > > kernel: do_IRQ: 7.71 No irq handler for vector (irq -1) > > > > There have been several incidents recently of people seeing this error, and > > investigation has shown that they have system for which their BIOS level is > > such > > that this feature was not properly turned off. As such, it would be good to > > give them a reminder that their systems are vulnurable to this problem. > > > > Signed-off-by: Neil Horman > > CC: Prarit Bhargava > > CC: Don Zickus > > CC: Don Dutile > > CC: Bjorn Helgaas > > CC: Asit Mallick > > CC: linux-...@vger.kernel.org > > --- > > drivers/iommu/intel_irq_remapping.c | 20 > > include/linux/pci_ids.h | 2 ++ > > 2 files changed, 22 insertions(+) > > > > diff --git a/drivers/iommu/intel_irq_remapping.c > > b/drivers/iommu/intel_irq_remapping.c > > index f3b8f23..9bfb6c2 100644 > > --- a/drivers/iommu/intel_irq_remapping.c > > +++ b/drivers/iommu/intel_irq_remapping.c > > @@ -1113,3 +1113,23 @@ struct irq_remap_ops intel_irq_remap_ops = { > > .msi_setup_irq = intel_msi_setup_irq, > > .setup_hpet_msi = intel_setup_hpet_msi, > > }; > > + > > + > > +static void intel_remapping_check(struct pci_dev *dev) > > +{ > > + u8 revision; > > + > > + pci_read_config_byte(dev, PCI_REVISION_ID, &revision); > > + > > + if ((revision == 0x13) && irq_remapping_enabled) { > > + pr_warn("WARNING WARNING WARNING WARNING WARNING WARNING\n" > > + "This system BIOS has enabled interrupt remapping\n" > > + "on a chipset that contains an errata making that\n" > > + "feature unstable. Please reboot with nointremap\n" > > + "added to the kernel command line and contact\n" > > + "your BIOS vendor for an update"); > > Make this one line? Might be too long but I believe the preferred policy is > now > to keep the output on one line so that it is easy to find in the kernel > source. > > Also, IMO, remove the WARNING WARNING stuff. > > You also should probably use HW_ERR here too. > > P. > I can do that, I'll post version 2 on monday. Neil -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2] irq: add quirk for broken interrupt remapping on 55XX chipsets
A few years back intel published a spec update: http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf For the 5520 and 5500 chipsets which contained an errata (specificially errata 53), which noted that these chipsets can't properly do interrupt remapping, and as a result the recommend that interrupt remapping be disabled in bios. While many vendors have a bios update to do exactly that, not all do, and of course not all users update their bios to a level that corrects the problem. As a result, occasionally interrupts can arrive at a cpu even after affinity for that interrupt has be moved, leading to lost or spurrious interrupts (usually characterized by the message: kernel: do_IRQ: 7.71 No irq handler for vector (irq -1) There have been several incidents recently of people seeing this error, and investigation has shown that they have system for which their BIOS level is such that this feature was not properly turned off. As such, it would be good to give them a reminder that their systems are vulnurable to this problem. Signed-off-by: Neil Horman CC: Prarit Bhargava CC: Don Zickus CC: Don Dutile CC: Bjorn Helgaas CC: Asit Mallick CC: linux-...@vger.kernel.org --- Change notes: v2) * Moved the quirk to the x86 arch, since consensus seems to be that the 55XX chipset series is x86 only. I decided however to keep the quirk as a regular quirk, not an early_quirk. Early quirks have no way currently to determine if BIOS has properly disabled the feature in the iommu, at least not without significant hacking, and since its quite possible this will be a short lived quirk, should Don Z's workaround code prove successful (and it looks like it may well), I don't think that necessecary. * Removed the WARNING banner from the quirk, and added the HW_ERR token to the string, I opted to leave the newlines in place however, as I really couldnt find a way to keep the text on a single line is still legible from a code perspective. I think theres enough language in there that using cscope on just about any substring however will turn it up, and again, this may be a short lived quirk. --- arch/x86/kernel/quirks.c | 18 ++ include/linux/pci_ids.h | 2 ++ 2 files changed, 20 insertions(+) diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c index 26ee48a..a718ea2 100644 --- a/arch/x86/kernel/quirks.c +++ b/arch/x86/kernel/quirks.c @@ -5,6 +5,7 @@ #include #include +#include "../../../drivers/iommu/irq_remapping.h" #if defined(CONFIG_X86_IO_APIC) && defined(CONFIG_SMP) && defined(CONFIG_PCI) @@ -567,3 +568,20 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_15H_NB_F5, quirk_amd_nb_node); #endif + +static void intel_remapping_check(struct pci_dev *dev) +{ + u8 revision; + + pci_read_config_byte(dev, PCI_REVISION_ID, &revision); + + if ((revision == 0x13) && irq_remapping_enabled) { +pr_warn(HW_ERR "This system BIOS has enabled interrupt remapping\n" +"on a chipset that contains an errata making that\n" +"feature unstable. Please reboot with nointremap\n" +"added to the kernel command line and contact\n" +"your BIOS vendor for an update"); + } +} +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_5520_IOHUB, intel_remapping_check); +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_5500_IOHUB, intel_remapping_check); diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h index 31717bd..54027a6 100644 --- a/include/linux/pci_ids.h +++ b/include/linux/pci_ids.h @@ -2732,6 +2732,8 @@ #define PCI_DEVICE_ID_INTEL_LYNNFIELD_MC_CH2_RANK_REV2 0x2db2 #define PCI_DEVICE_ID_INTEL_LYNNFIELD_MC_CH2_TC_REV20x2db3 #define PCI_DEVICE_ID_INTEL_82855PM_HB 0x3340 +#define PCI_DEVICE_ID_INTEL_5500_IOHUB 0x3403 +#define PCI_DEVICE_ID_INTEL_5520_IOHUB 0x3406 #define PCI_DEVICE_ID_INTEL_IOAT_TBG4 0x3429 #define PCI_DEVICE_ID_INTEL_IOAT_TBG5 0x342a #define PCI_DEVICE_ID_INTEL_IOAT_TBG6 0x342b -- 1.7.11.7 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/6] coredump: format_corename() fixes/cleanups
On Wed, May 15, 2013 at 10:11:58PM +0200, Oleg Nesterov wrote: > Hello. > > On 05/13, Oleg Nesterov wrote: > > > > With the patch below we can trivially fix the problem, > > > > + char *fmt = ispipe ? "\e%s\e" : "%s"; > > ... > > - err = cn_printf(cn, "%s", current->comm); > > + err = cn_printf(cn, fmt, current->comm); > > > > Or this ESC hack is too ugly or can break something? > > OK, nobody really nacked "[PATCH] teach argv_split() to ignore the spaces > surrounded by \e", see http://marc.info/?l=linux-kernel&m=136845597401674 > > I am going to send this patch "officially" and fix format_corename/argv_split > interaction. > > But lets fix other format_corename() bugs first: leak and use-after-free. > Plus some cleanups. > > Oleg. > > fs/coredump.c | 120 +++- > 1 files changed, 58 insertions(+), 62 deletions(-) > > For the series Acked-by: Neil Horman -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: net_dropmon usage documentation/examples?
On Fri, Apr 05, 2013 at 07:38:55PM +, Eric Wong wrote: > Hi Neil, I'm wondering if you have or know of any public > documentation/examples for using net_dropmon. > > If not, I'll figure it out on my own at some point. > > Thanks in advance! > > (Not a very high priority project for me, my network connectivity > problems are sadly _very_ obvious at the moment :x) > I don't think there are any particular examples at the moment, but if you download the dropwatch utility, it uses the dropmon protocol to detect frame loss in the stack, and its use is pretty self explanitory, as is the perf script that uses the same tracepoints as the dropmon protocol. https://fedorahosted.org/dropwatch/ Regards Neil -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/