date:20190306

Re: [PATCH] vsock/virtio: fix kernel panic from virtio_transport_reset_no_sock

2019-03-06 Thread Stefano Garzarella

Hi Adalbert,
thanks for catching this issue. I have a comment below.

On Tue, Mar 05, 2019 at 08:01:45PM +0200, Adalbert Lazăr wrote:
> Previous to commit 22b5c0b63f32 ("vsock/virtio: fix kernel panic after device 
> hot-unplug"),
> vsock_core_init() was called from virtio_vsock_probe(). Now,
> virtio_transport_reset_no_sock() can be called before vsock_core_init()
> has the chance to run.
> 
> [Wed Feb 27 14:17:09 2019] BUG: unable to handle kernel NULL pointer 
> dereference at 0110
> [Wed Feb 27 14:17:09 2019] #PF error: [normal kernel read fault]
> [Wed Feb 27 14:17:09 2019] PGD 0 P4D 0
> [Wed Feb 27 14:17:09 2019] Oops:  [#1] SMP PTI
> [Wed Feb 27 14:17:09 2019] CPU: 3 PID: 59 Comm: kworker/3:1 Not tainted 
> 5.0.0-rc7-390-generic-hvi #390
> [Wed Feb 27 14:17:09 2019] Hardware name: QEMU Standard PC (i440FX + PIIX, 
> 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> [Wed Feb 27 14:17:09 2019] Workqueue: virtio_vsock virtio_transport_rx_work 
> [vmw_vsock_virtio_transport]
> [Wed Feb 27 14:17:09 2019] RIP: 0010:virtio_transport_reset_no_sock+0x8c/0xc0 
> [vmw_vsock_virtio_transport_common]
> [Wed Feb 27 14:17:09 2019] Code: 35 8b 4f 14 48 8b 57 08 31 f6 44 8b 4f 10 44 
> 8b 07 48 8d 7d c8 e8 84 f8 ff ff 48 85 c0 48 89 c3 74 2a e8 f7 31 03 00 48 89 
> df <48> 8b 80 10 01 00 00 e8 68 fb 69 ed 48 8b 75 f0 65 48 33 34 25 28
> [Wed Feb 27 14:17:09 2019] RSP: 0018:b42701ab7d40 EFLAGS: 00010282
> [Wed Feb 27 14:17:09 2019] RAX:  RBX: 9d79637ee080 RCX: 
> 0003
> [Wed Feb 27 14:17:09 2019] RDX: 0001 RSI: 0002 RDI: 
> 9d79637ee080
> [Wed Feb 27 14:17:09 2019] RBP: b42701ab7d78 R08: 9d796fae70e0 R09: 
> 9d796f403500
> [Wed Feb 27 14:17:09 2019] R10: b42701ab7d90 R11:  R12: 
> 9d7969d09240
> [Wed Feb 27 14:17:09 2019] R13: 9d79624e6840 R14: 9d7969d09318 R15: 
> 9d796d48ff80
> [Wed Feb 27 14:17:09 2019] FS:  () 
> GS:9d796fac() knlGS:
> [Wed Feb 27 14:17:09 2019] CS:  0010 DS:  ES:  CR0: 80050033
> [Wed Feb 27 14:17:09 2019] CR2: 0110 CR3: 000427f22000 CR4: 
> 06e0
> [Wed Feb 27 14:17:09 2019] DR0:  DR1:  DR2: 
> 
> [Wed Feb 27 14:17:09 2019] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [Wed Feb 27 14:17:09 2019] Call Trace:
> [Wed Feb 27 14:17:09 2019]  virtio_transport_recv_pkt+0x63/0x820 
> [vmw_vsock_virtio_transport_common]
> [Wed Feb 27 14:17:09 2019]  ? kfree+0x17e/0x190
> [Wed Feb 27 14:17:09 2019]  ? detach_buf_split+0x145/0x160
> [Wed Feb 27 14:17:09 2019]  ? __switch_to_asm+0x40/0x70
> [Wed Feb 27 14:17:09 2019]  virtio_transport_rx_work+0xa0/0x106 
> [vmw_vsock_virtio_transport]
> [Wed Feb 27 14:17:09 2019] NET: Registered protocol family 40
> [Wed Feb 27 14:17:09 2019]  process_one_work+0x167/0x410
> [Wed Feb 27 14:17:09 2019]  worker_thread+0x4d/0x460
> [Wed Feb 27 14:17:09 2019]  kthread+0x105/0x140
> [Wed Feb 27 14:17:09 2019]  ? rescuer_thread+0x360/0x360
> [Wed Feb 27 14:17:09 2019]  ? kthread_destroy_worker+0x50/0x50
> [Wed Feb 27 14:17:09 2019]  ret_from_fork+0x35/0x40
> [Wed Feb 27 14:17:09 2019] Modules linked in: vmw_vsock_virtio_transport 
> vmw_vsock_virtio_transport_common input_leds vsock serio_raw i2c_piix4 
> mac_hid qemu_fw_cfg autofs4 cirrus ttm drm_kms_helper syscopyarea sysfillrect 
> sysimgblt fb_sys_fops virtio_net psmouse drm net_failover pata_acpi 
> virtio_blk failover floppy
> [Wed Feb 27 14:17:09 2019] CR2: 0110
> [Wed Feb 27 14:17:09 2019] ---[ end trace baa35abd2e040fe5 ]---
> [Wed Feb 27 14:17:09 2019] RIP: 0010:virtio_transport_reset_no_sock+0x8c/0xc0 
> [vmw_vsock_virtio_transport_common]
> [Wed Feb 27 14:17:09 2019] Code: 35 8b 4f 14 48 8b 57 08 31 f6 44 8b 4f 10 44 
> 8b 07 48 8d 7d c8 e8 84 f8 ff ff 48 85 c0 48 89 c3 74 2a e8 f7 31 03 00 48 89 
> df <48> 8b 80 10 01 00 00 e8 68 fb 69 ed 48 8b 75 f0 65 48 33 34 25 28
> [Wed Feb 27 14:17:09 2019] RSP: 0018:b42701ab7d40 EFLAGS: 00010282
> [Wed Feb 27 14:17:09 2019] RAX:  RBX: 9d79637ee080 RCX: 
> 0003
> [Wed Feb 27 14:17:09 2019] RDX: 0001 RSI: 0002 RDI: 
> 9d79637ee080
> [Wed Feb 27 14:17:09 2019] RBP: b42701ab7d78 R08: 9d796fae70e0 R09: 
> 9d796f403500
> [Wed Feb 27 14:17:09 2019] R10: b42701ab7d90 R11:  R12: 
> 9d7969d09240
> [Wed Feb 27 14:17:09 2019] R13: 9d79624e6840 R14: 9d7969d09318 R15: 
> 9d796d48ff80
> [Wed Feb 27 14:17:09 2019] FS:  () 
> GS:9d796fac() knlGS:
> [Wed Feb 27 14:17:09 2019] CS:  0010 DS:  ES:  CR0: 80050033
> [Wed Feb 27 14:17:09 2019] CR2: 0110 CR3: 000427f22000 CR4: 
> 06e0
> [Wed Feb 27 14:17:09 2019] DR0:  DR1:  DR2: 
> 
> [Wed Feb 27 14:17:09 20
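
The quoted report describes an ordering problem: a reset reply can be attempted before core initialization has run, and the oops shows a read through a NULL pointer at a small offset. The following user-space C sketch illustrates the general shape of a guard against that kind of ordering. It is not the patch under discussion; every name in it (`demo_transport`, `demo_core_init`, `demo_reset_no_sock`) is hypothetical, and returning `-ENOTCONN` is an assumption made for illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a transport's operations table. */
struct demo_transport {
	int (*send_pkt)(int pkt_id);
};

/* Stays NULL until the (hypothetical) core init step registers a transport. */
static const struct demo_transport *demo_registered;

/* Pretend the packet was queued and echo its id back. */
static int demo_send(int pkt_id)
{
	return pkt_id;
}

static const struct demo_transport demo_ops = { .send_pkt = demo_send };

/* Models core init: after this call the transport pointer is valid. */
void demo_core_init(void)
{
	demo_registered = &demo_ops;
}

/* Models a reset reply that may be requested before demo_core_init(). */
int demo_reset_no_sock(int pkt_id)
{
	const struct demo_transport *t = demo_registered;

	/* Without this check, t->send_pkt would be read through NULL. */
	if (!t)
		return -ENOTCONN;

	return t->send_pkt(pkt_id);
}
```

Calling `demo_reset_no_sock()` before `demo_core_init()` returns an error instead of faulting; after initialization the call reaches the transport.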

[PATCH net] net: hns3: Fix a logical vs bitwise typo

2019-03-06 Thread Dan Carpenter

There were a couple logical ORs accidentally mixed in with the bitwise
ORs.

Fixes: e8149933b1fa ("net: hns3: remove hnae3_get_bit in data path")
Signed-off-by: Dan Carpenter 
---
Very recent bug.

 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 3cb43b1f1c2e..1e4efc47c7a5 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -2321,8 +2321,8 @@ static void hns3_rx_checksum(struct hns3_enet_ring *ring, struct sk_buff *skb,
 	if (!(bd_base_info & BIT(HNS3_RXD_L3L4P_B)))
 		return;
 
-	if (unlikely(l234info & (BIT(HNS3_RXD_L3E_B) | BIT(HNS3_RXD_L4E_B) ||
-				 BIT(HNS3_RXD_OL3E_B) ||
+	if (unlikely(l234info & (BIT(HNS3_RXD_L3E_B) | BIT(HNS3_RXD_L4E_B) |
+				 BIT(HNS3_RXD_OL3E_B) |
 				 BIT(HNS3_RXD_OL4E_B)))) {
 		u64_stats_update_begin(&ring->syncp);
 		ring->stats.l3l4_csum_err++;
-- 
2.17.1
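
To see why the mixed operators matter: in C, `|` binds tighter than `||`, so `a | b || c || d` is parsed as `(a | b) || c || d` and evaluates to 0 or 1 rather than to a bit mask. The stand-alone sketch below shows the effect. The function names and the bit positions used in the usage note are hypothetical and are not the real `HNS3_RXD_*` values.

```c
#include <stdint.h>

/* Mask built with the typo: the result is a truth value (0 or 1). */
uint32_t demo_mask_buggy(uint32_t l3e, uint32_t l4e,
			 uint32_t ol3e, uint32_t ol4e)
{
	return l3e | l4e || ol3e || ol4e;
}

/* Mask built with bitwise ORs only: every error bit is kept. */
uint32_t demo_mask_fixed(uint32_t l3e, uint32_t l4e,
			 uint32_t ol3e, uint32_t ol4e)
{
	return l3e | l4e | ol3e | ol4e;
}

/* Returns 1 if any bit selected by mask is set in l234info. */
int demo_has_csum_err(uint32_t l234info, uint32_t mask)
{
	return (l234info & mask) != 0;
}
```

With hypothetical error bits at positions 1 to 4, the buggy mask collapses to 1, so a descriptor with only bit 4 set is not reported as a checksum error, while a descriptor with only the unrelated bit 0 set is. The fixed mask is 0x1e and tests exactly the four error bits.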

Re: [RFC PATCH net-next] failover: allow name change on IFF_UP slave interfaces

2019-03-06 Thread si-wei liu

On 3/5/2019 11:23 PM, Michael S. Tsirkin wrote:

On Tue, Mar 05, 2019 at 11:15:06PM -0800, si-wei liu wrote:

On 3/5/2019 10:43 PM, Michael S. Tsirkin wrote:

On Tue, Mar 05, 2019 at 04:51:00PM -0800, si-wei liu wrote:

On 3/5/2019 4:36 PM, Michael S. Tsirkin wrote:

On Tue, Mar 05, 2019 at 04:20:50PM -0800, si-wei liu wrote:

On 3/5/2019 4:06 PM, Michael S. Tsirkin wrote:

On Tue, Mar 05, 2019 at 11:35:50AM -0800, si-wei liu wrote:

On 3/5/2019 11:24 AM, Stephen Hemminger wrote:

On Tue, 5 Mar 2019 11:19:32 -0800
si-wei liu wrote:

I have a vague idea: would it work to *not* set
IFF_UP on slave devices at all?

Hmm, I did think about this option, and it appears this solution is
more invasive than required to convert existing scripts, despite the
controversy of introducing internal netdev state to differentiate user
visible state. Whether we disallow the slave from being brought up by the
user, or leave the IFF_UP flag unset and use the internal one instead, we
could end up with a substantial behavioral change that breaks scripts.
Consider any admin script that runs `ip link set dev ... up' successfully
and then just assumes the link is up and that subsequent operations can be
done as usual.

How would it work when carrier is off?

While it *may* work for dracut (yet to be verified), I'm a bit concerned
that there are more scripts to be converted than those that don't follow
volatile failover slave names. It's technically doable, but may not be
worth the effort (in terms of porting existing scripts/apps).

Thanks
-Siwei

Won't work for most devices. Many devices turn off PHY and link layer
if not IFF_UP

True, that's what I said about introducing internal state for those drivers
and other kernel components. Very invasive change indeed.

-Siwei

Well I did say it's vague.
How about hiding IFF_UP from dev_get_flags (and probably
__dev_change_flags)?

Is that any different? The kernel change has a small footprint for sure,
but the discrepancy is still there. Anyone who writes code for IFF_UP will
not notice IFF_FAILOVER_SLAVE.

Not to mention more userspace "fixup" work has to be done due to this
change.

-Siwei

Point is it's ok since most userspace should just ignore slaves
- hopefully it will just ignore it since it already
ignores interfaces that are down.

An admin script may assume the interface can be brought up, and go on to do
further operations without checking the UP flag.

These scripts then would be broken on any box with multiple interfaces
since not all of these would have carrier.

Consider a script executing `ifconfig ... up' and, once it succeeds, running
tcpdump or some other command relying on the interface being UP. It's quite
common that those scripts don't check the UP flag but instead just rely on the
well-known fact that the command exiting with 0 means the interface should be
UP. This change might well break scripts of that kind.

I am sorry I don't get it. Could you give an example
of a script that works now but would be broken?

https://github.com/torvalds/linux/blob/master/tools/testing/selftests/net/netdevice.sh#L27
https://github.com/WPO-Foundation/wptagent/blob/master/internal/adb.py#L443
https://github.com/openstack/steth/blob/master/steth/agent/api.py#L134

There are more if you keep searching.

-Siwei

It doesn't look to be a reliable
way of prohibiting userspace from operating against slaves.

-Siwei

This does not mean we shouldn't make an effort to disable broken
configurations.

I am not arguing against your patch. Not at all. I see better
hiding of slaves as a separate enhancement.

I understand, but my point is we should try to minimize unnecessary side
effects on current usage in whatever "hiding" effort we make. It's
hard to find a tradeoff sometimes.

Yes if some userspace made an assumption and it worked, we should keep
it working I think. I don't necessarily agree we should worry too much
about theoretical issues. In half a year since the feature got merged
it's unlikely there are millions of slightly different scripts using it.
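
For reference, the check that the thread says such scripts skip can be sketched in C as follows: read the interface flags with the `SIOCGIFFLAGS` ioctl and test `IFF_UP` explicitly, rather than inferring the state from the exit status of the command that brought the interface up. The function names `demo_flags_say_up` and `demo_read_flags` are hypothetical, and the sketch assumes a Linux system with glibc headers.

```c
#define _DEFAULT_SOURCE
#include <net/if.h>      /* struct ifreq, IFF_UP, IFNAMSIZ */
#include <string.h>
#include <sys/ioctl.h>   /* ioctl, SIOCGIFFLAGS */
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the flag word has IFF_UP set, 0 otherwise. */
int demo_flags_say_up(unsigned int flags)
{
	return (flags & IFF_UP) != 0;
}

/*
 * Reads the flags of the named interface into *flags.
 * Returns 0 on success, -1 if the socket or the ioctl fails
 * (for example when no such interface exists).
 */
int demo_read_flags(const char *ifname, unsigned int *flags)
{
	struct ifreq ifr;
	int fd, ret;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);

	ret = ioctl(fd, SIOCGIFFLAGS, &ifr);
	close(fd);
	if (ret < 0)
		return -1;

	/* ifr_flags is a short; widen it without sign extension. */
	*flags = (unsigned short)ifr.ifr_flags;
	return 0;
}
```

A caller would run `demo_read_flags()` after the bring-up command and continue only if `demo_flags_say_up()` returns 1. A check of this kind reports whatever the kernel exposes in the flags, so it is the place where any change to the user-visible `IFF_UP` state of a slave would become visible to scripts.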
