Re: [PATCH net] vsock: forward all packets to the host when no H2G is registered

2020-11-19 Thread Alexander Graf


On 19.11.20 15:03, Stefan Hajnoczi wrote:

On Thu, Nov 12, 2020 at 02:38:37PM +0100, Stefano Garzarella wrote:

Before commit c0cfa2d8a788 ("vsock: add multi-transports support"),
if a G2H transport was loaded (e.g. virtio transport), every packets
was forwarded to the host, regardless of the destination CID.
The H2G transports implemented until then (vhost-vsock, VMCI) always
responded with an error, if the destination CID was not
VMADDR_CID_HOST.

 From that commit, we are using the remote CID to decide which
transport to use, so packets with remote CID > VMADDR_CID_HOST(2)
are sent only through H2G transport. If no H2G is available, packets
are discarded directly in the guest.

Some use cases (e.g. Nitro Enclaves [1]) rely on the old behaviour
to implement sibling VMs communication, so we restore the old
behavior when no H2G is registered.
It will be up to the host to discard packets if the destination is
not the right one. As it was already implemented before adding
multi-transport support.

Tested with nested QEMU/KVM by me and Nitro Enclaves by Andra.

[1] Documentation/virt/ne_overview.rst

Cc: Jorgen Hansen 
Cc: Dexuan Cui 
Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
Reported-by: Andra Paraschiv 
Tested-by: Andra Paraschiv 
Signed-off-by: Stefano Garzarella 
---
  net/vmw_vsock/af_vsock.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

Acked-by: Stefan Hajnoczi 



Is there anything we have to do to also get this into the affected 
stable trees? :)


Alex





Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




Re: How to implement message forwarding from one CID to another in vhost driver

2024-05-20 Thread Alexander Graf

Howdy,

On 20.05.24 14:44, Dorjoy Chowdhury wrote:

Hey Stefano,

Thanks for the reply.


On Mon, May 20, 2024, 2:55 PM Stefano Garzarella  wrote:

Hi Dorjoy,

On Sat, May 18, 2024 at 04:17:38PM GMT, Dorjoy Chowdhury wrote:

Hi,

Hope you are doing well. I am working on adding AWS Nitro Enclave[1]
emulation support in QEMU. Alexander Graf is mentoring me on this work. A v1
patch series has already been posted to the qemu-devel mailing list[2].

AWS nitro enclaves is an Amazon EC2[3] feature that allows creating isolated
execution environments, called enclaves, from Amazon EC2 instances, which are
used for processing highly sensitive data. Enclaves have no persistent storage
and no external networking. The enclave VMs are based on Firecracker microvm
and have a vhost-vsock device for communication with the parent EC2 instance
that spawned it and a Nitro Secure Module (NSM) device for cryptographic
attestation. The parent instance VM always has CID 3 while the enclave VM gets
a dynamic CID. The enclave VMs can communicate with the parent instance over
various ports to CID 3, for example, the init process inside an enclave sends a
heartbeat to port 9000 upon boot, expecting a heartbeat reply, letting the
parent instance know that the enclave VM has successfully booted.

The plan is to eventually make the nitro enclave emulation in QEMU standalone
i.e., without needing to run another VM with CID 3 with proper vsock

If you don't have to launch another VM, maybe we can avoid vhost-vsock
and emulate virtio-vsock in user-space, having complete control over the
behavior.

So we could use this opportunity to implement virtio-vsock in QEMU [4]
or use vhost-user-vsock [5] and customize it somehow.
(Note: vhost-user-vsock already supports sibling communication, so maybe
with a few modifications it fits your case perfectly)

[4] https://gitlab.com/qemu-project/qemu/-/issues/2095
[5] https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock



Thanks for letting me know. Right now I don't have a complete picture
but I will look into them. Thank you.




communication support. For this to work, one approach could be to teach the
vhost driver in kernel to forward CID 3 messages to another CID N

So in this case both CID 3 and N would be assigned to the same QEMU
process?



CID N is assigned to the enclave VM. CID 3 was supposed to be the
parent VM that spawns the enclave VM (this is how it is in AWS, where
an EC2 instance VM spawns the enclave VM from inside it and that
parent EC2 instance always has CID 3). But in the QEMU case as we
don't want a parent VM (we want to run enclave VMs standalone) we
would need to forward the CID 3 messages to host CID. I don't know if
it means CID 3 and CID N is assigned to the same QEMU process. Sorry.



There are 2 use cases here:

1) Enclave wants to treat host as parent (default). In this scenario, 
the "parent instance" that shows up as CID 3 in the Enclave doesn't 
really exist. Instead, when the Enclave attempts to talk to CID 3, it 
should really land on CID 0 (hypervisor). When the hypervisor tries to 
connect to the Enclave on port X, it should look as if it originates 
from CID 3, not CID 0.


2) Multiple parent VMs. Think of an actual cloud hosting scenario. Here, 
we have multiple "parent instances". Each of them thinks it's CID 3. 
Each can spawn an Enclave that talks to CID 3 and reach the parent. For 
this case, I think implementing all of virtio-vsock in user space is the 
best path forward. But in theory, you could also swizzle CIDs to make 
random "real" CIDs appear as CID 3.






Do you have to allocate 2 separate virtio-vsock devices, one for the
parent and one for the enclave?



If there is a parent VM, then I guess both parent and enclave VMs need
virtio-vsock devices.


(set to CID 2 for host) i.e., it patches CID from 3 to N on incoming messages
and from N to 3 on responses. This will enable users of the

Will these messages have the VMADDR_FLAG_TO_HOST flag set?

We don't support this in vhost-vsock yet, if supporting it helps, we
might, but we need to better understand how to avoid security issues, so
maybe each device needs to explicitly enable the feature and specify
from which CIDs it accepts packets.



I don't know about the flag. So I don't know if it will be set. Sorry.



From the guest's point of view, the parent (CID 3) is just another VM. 
Since Linux as of


 
https://patchwork.ozlabs.org/project/netdev/patch/20201204170235.84387-4-andra...@amazon.com/#2594117

always sets VMADDR_FLAG_TO_HOST when local_CID > 0 && remote_CID > 0, I 
would say the message has the flag set.


How would you envision the host to implement the flag? Would the host 
allow user space to listen on any CID and hence receive the respective 
target connections? And wouldn't listening on CID 0 then mean you're 
effectively listening to "any" othe

Re: How to implement message forwarding from one CID to another in vhost driver

2024-05-27 Thread Alexander Graf

Hey Stefano,

On 23.05.24 10:45, Stefano Garzarella wrote:

On Tue, May 21, 2024 at 08:50:22AM GMT, Alexander Graf wrote:

Howdy,

On 20.05.24 14:44, Dorjoy Chowdhury wrote:

Hey Stefano,

Thanks for the reply.


On Mon, May 20, 2024, 2:55 PM Stefano Garzarella 
 wrote:

Hi Dorjoy,

On Sat, May 18, 2024 at 04:17:38PM GMT, Dorjoy Chowdhury wrote:

Hi,

Hope you are doing well. I am working on adding AWS Nitro Enclave[1]
emulation support in QEMU. Alexander Graf is mentoring me on this 
work. A v1
patch series has already been posted to the qemu-devel mailing 
list[2].


AWS nitro enclaves is an Amazon EC2[3] feature that allows 
creating isolated
execution environments, called enclaves, from Amazon EC2 
instances, which are
used for processing highly sensitive data. Enclaves have no 
persistent storage
and no external networking. The enclave VMs are based on 
Firecracker microvm
and have a vhost-vsock device for communication with the parent 
EC2 instance
that spawned it and a Nitro Secure Module (NSM) device for 
cryptographic
attestation. The parent instance VM always has CID 3 while the 
enclave VM gets
a dynamic CID. The enclave VMs can communicate with the parent 
instance over
various ports to CID 3, for example, the init process inside an 
enclave sends a
heartbeat to port 9000 upon boot, expecting a heartbeat reply, 
letting the

parent instance know that the enclave VM has successfully booted.

The plan is to eventually make the nitro enclave emulation in QEMU 
standalone

i.e., without needing to run another VM with CID 3 with proper vsock

If you don't have to launch another VM, maybe we can avoid vhost-vsock
and emulate virtio-vsock in user-space, having complete control 
over the

behavior.

So we could use this opportunity to implement virtio-vsock in QEMU [4]
or use vhost-user-vsock [5] and customize it somehow.
(Note: vhost-user-vsock already supports sibling communication, so 
maybe

with a few modifications it fits your case perfectly)

[4] https://gitlab.com/qemu-project/qemu/-/issues/2095
[5] 
https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock



Thanks for letting me know. Right now I don't have a complete picture
but I will look into them. Thank you.



communication support. For this to work, one approach could be to 
teach the

vhost driver in kernel to forward CID 3 messages to another CID N

So in this case both CID 3 and N would be assigned to the same QEMU
process?



CID N is assigned to the enclave VM. CID 3 was supposed to be the
parent VM that spawns the enclave VM (this is how it is in AWS, where
an EC2 instance VM spawns the enclave VM from inside it and that
parent EC2 instance always has CID 3). But in the QEMU case as we
don't want a parent VM (we want to run enclave VMs standalone) we
would need to forward the CID 3 messages to host CID. I don't know if
it means CID 3 and CID N is assigned to the same QEMU process. Sorry.



There are 2 use cases here:

1) Enclave wants to treat host as parent (default). In this scenario,
the "parent instance" that shows up as CID 3 in the Enclave doesn't
really exist. Instead, when the Enclave attempts to talk to CID 3, it
should really land on CID 0 (hypervisor). When the hypervisor tries to
connect to the Enclave on port X, it should look as if it originates
from CID 3, not CID 0.

2) Multiple parent VMs. Think of an actual cloud hosting scenario.
Here, we have multiple "parent instances". Each of them thinks it's
CID 3. Each can spawn an Enclave that talks to CID 3 and reach the
parent. For this case, I think implementing all of virtio-vsock in
user space is the best path forward. But in theory, you could also
swizzle CIDs to make random "real" CIDs appear as CID 3.



Thank you for clarifying the use cases!

Also for case 1, vhost-vsock doesn't support CID 0, so in my opinion
it's easier to go into user-space with vhost-user-vsock or the built-in
device.



Sorry, I believe I meant CID 2. Effectively for case 1, when a process 
on the hypervisor listens on port 1234, it should be visible as 3:1234 
from the VM and when the hypervisor process connects to :1234, 
it should look as if that connection came from CID 3.




Maybe initially with vhost-user-vsock it's easier because we already
have some thing that works and supports sibling communication (for case
2).



The problem with vhost-user-vsock is that you don't get to use AF_VSOCK 
as a host process.


A typical Nitro Enclaves application is split into 2 parts: An 
in-Enclave component that listens/connects to vsock and a parent process 
that listens/connects to vsock. The experience of launching an Enclave 
is very similar to launching a QEMU VM: You run nitro-cli and tell it to 
pop up the Enclave based on an EIF file. Nitro-cli then tells you the 
CID that was allocated for the Enclave and you communicate to it using that.


What I would ideally like to have as development e

Re: How to implement message forwarding from one CID to another in vhost driver

2024-05-27 Thread Alexander Graf


On 27.05.24 09:08, Alexander Graf wrote:

Hey Stefano,

On 23.05.24 10:45, Stefano Garzarella wrote:

On Tue, May 21, 2024 at 08:50:22AM GMT, Alexander Graf wrote:

Howdy,

On 20.05.24 14:44, Dorjoy Chowdhury wrote:

Hey Stefano,

Thanks for the reply.


On Mon, May 20, 2024, 2:55 PM Stefano Garzarella 
 wrote:

Hi Dorjoy,

On Sat, May 18, 2024 at 04:17:38PM GMT, Dorjoy Chowdhury wrote:

Hi,

Hope you are doing well. I am working on adding AWS Nitro Enclave[1]
emulation support in QEMU. Alexander Graf is mentoring me on this 
work. A v1
patch series has already been posted to the qemu-devel mailing 
list[2].


AWS nitro enclaves is an Amazon EC2[3] feature that allows 
creating isolated
execution environments, called enclaves, from Amazon EC2 
instances, which are
used for processing highly sensitive data. Enclaves have no 
persistent storage
and no external networking. The enclave VMs are based on 
Firecracker microvm
and have a vhost-vsock device for communication with the parent 
EC2 instance
that spawned it and a Nitro Secure Module (NSM) device for 
cryptographic
attestation. The parent instance VM always has CID 3 while the 
enclave VM gets
a dynamic CID. The enclave VMs can communicate with the parent 
instance over
various ports to CID 3, for example, the init process inside an 
enclave sends a
heartbeat to port 9000 upon boot, expecting a heartbeat reply, 
letting the

parent instance know that the enclave VM has successfully booted.

The plan is to eventually make the nitro enclave emulation in 
QEMU standalone

i.e., without needing to run another VM with CID 3 with proper vsock
If you don't have to launch another VM, maybe we can avoid 
vhost-vsock
and emulate virtio-vsock in user-space, having complete control 
over the

behavior.

So we could use this opportunity to implement virtio-vsock in QEMU 
[4]

or use vhost-user-vsock [5] and customize it somehow.
(Note: vhost-user-vsock already supports sibling communication, so 
maybe

with a few modifications it fits your case perfectly)

[4] https://gitlab.com/qemu-project/qemu/-/issues/2095
[5] 
https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock



Thanks for letting me know. Right now I don't have a complete picture
but I will look into them. Thank you.



communication support. For this to work, one approach could be to 
teach the

vhost driver in kernel to forward CID 3 messages to another CID N

So in this case both CID 3 and N would be assigned to the same QEMU
process?



CID N is assigned to the enclave VM. CID 3 was supposed to be the
parent VM that spawns the enclave VM (this is how it is in AWS, where
an EC2 instance VM spawns the enclave VM from inside it and that
parent EC2 instance always has CID 3). But in the QEMU case as we
don't want a parent VM (we want to run enclave VMs standalone) we
would need to forward the CID 3 messages to host CID. I don't know if
it means CID 3 and CID N is assigned to the same QEMU process. Sorry.



There are 2 use cases here:

1) Enclave wants to treat host as parent (default). In this scenario,
the "parent instance" that shows up as CID 3 in the Enclave doesn't
really exist. Instead, when the Enclave attempts to talk to CID 3, it
should really land on CID 0 (hypervisor). When the hypervisor tries to
connect to the Enclave on port X, it should look as if it originates
from CID 3, not CID 0.

2) Multiple parent VMs. Think of an actual cloud hosting scenario.
Here, we have multiple "parent instances". Each of them thinks it's
CID 3. Each can spawn an Enclave that talks to CID 3 and reach the
parent. For this case, I think implementing all of virtio-vsock in
user space is the best path forward. But in theory, you could also
swizzle CIDs to make random "real" CIDs appear as CID 3.



Thank you for clarifying the use cases!

Also for case 1, vhost-vsock doesn't support CID 0, so in my opinion
it's easier to go into user-space with vhost-user-vsock or the built-in
device.



Sorry, I believe I meant CID 2. Effectively for case 1, when a process 
on the hypervisor listens on port 1234, it should be visible as 3:1234 
from the VM and when the hypervisor process connects to :1234, 
it should look as if that connection came from CID 3.



Now that I'm thinking about my message again: What if we just introduce 
a sysfs/sysctl file for vsock that indicates the "host CID" (default: 
2)? Users that want vhost-vsock to behave as if the host is CID 3 can 
just write 3 to it.


It means we'd need to change all references to VMADDR_CID_HOST to 
instead refer to a global variable that indicates the new "host CID". 
It'd need some more careful massaging to not break number namespace 
assumptions (<= CID_HOST no longer works), but the idea should fly.


That would give us all 3 options:

1) User sets vsock.host_cid = 3 to simulate that the host is in reality 
an enclave parent

2) User spa

Re: How to implement message forwarding from one CID to another in vhost driver

2024-05-29 Thread Alexander Graf


On 29.05.24 10:04, Stefano Garzarella wrote:


On Tue, May 28, 2024 at 06:38:24PM GMT, Paolo Bonzini wrote:
On Tue, May 28, 2024 at 5:53 PM Stefano Garzarella 
 wrote:


On Tue, May 28, 2024 at 05:49:32PM GMT, Paolo Bonzini wrote:
>On Tue, May 28, 2024 at 5:41 PM Stefano Garzarella 
 wrote:

>> >I think it's either that or implementing virtio-vsock in userspace
>> 
>(https://lore.kernel.org/qemu-devel/30baeb56-64d2-4ea3-8e53-6a5c50999...@redhat.com/,

>> >search for "To connect host<->guest").
>>
>> For in this case AF_VSOCK can't be used in the host, right?
>> So it's similar to vhost-user-vsock.
>
>Not sure if I understand but in this case QEMU knows which CIDs are
>forwarded to the host (either listen on vsock and connect to the host,
>or vice versa), so there is no kernel and no VMADDR_FLAG_TO_HOST
>involved.

I meant that the application in the host that wants to connect to the
guest cannot use AF_VSOCK in the host, but must use the one where QEMU
is listening (e.g. AF_INET, AF_UNIX), right?

I think one of Alex's requirements was that the application in the host
continue to use AF_VSOCK as in their environment.


Can the host use VMADDR_CID_LOCAL for host-to-host communication?


Yep!


If
so, the proposed "-object vsock-forward" syntax can connect to it and
it should work as long as the application on the host does not assume
that it is on CID 3.


Right, good point!
We can also support something similar in vhost-user-vsock, where instead
of using AF_UNIX and firecracker's hybrid vsock, we can redirect
everything to VMADDR_CID_LOCAL.

Alex what do you think? That would simplify things a lot to do.
The only difference is that the application in the host has to talk to
VMADDR_CID_LOCAL (1).



The application in the host would see an incoming connection from CID 1 
(which is probably fine) and would still be able to establish outgoing 
connections to the actual VM's CID as long as the Enclave doesn't check 
for the peer CID (I haven't seen anyone check yet). So yes, indeed, this 
should work.


The only case where I can see it breaking is when you run multiple 
Enclave VMs in parallel. In that case, each would try to listen to CID 3 
and the second that does would fail. But it's a well solvable problem: 
We could (in addition to the simple in-QEMU case) build an external 
daemon that does the proxying and hence owns CID3.


So the immediate plan would be to:

  1) Build a new vhost-vsock-forward object model that connects to 
vhost as CID 3 and then forwards every packet from CID 1 to the 
Enclave-CID and every packet that arrives on to CID 3 to CID 2.
  2) Create a machine option for -M nitro-enclave that automatically 
spawns the vhost-vsock-forward object. (default: off)



The above may need some fiddling with object creation times to ensure 
that the forward object gets CID 3, not the Enclave as auto-assigned CID.



Thanks,

Alex




Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597


[PATCH] lan78xx: Connect phy early

2018-03-14 Thread Alexander Graf
When using wicked with a lan78xx device attached to the system, we
end up with ethtool commands issued on the device before an ifup
got issued. That lead to the following crash:

Unable to handle kernel NULL pointer dereference at virtual address 039c
pgd = 800035b3
[039c] *pgd=
Internal error: Oops: 9604 [#1] SMP
Modules linked in: [...]
Supported: Yes
CPU: 3 PID: 638 Comm: wickedd Tainted: GE  
4.12.14-0-default #1
Hardware name: raspberrypi rpi/rpi, BIOS 2018.03-rc2 02/21/2018
task: 800035e74180 task.stack: 800036718000
PC is at phy_ethtool_ksettings_get+0x20/0x98
LR is at lan78xx_get_link_ksettings+0x44/0x60 [lan78xx]
pc : [] lr : [] pstate: 2005
sp : 80003671bb20
x29: 80003671bb20 x28: 800035e74180
x27: 08912000 x26: 001d
x25: 0124 x24: 08f74d00
x23: 004000114809 x22: 
x21: 80003671bbd0 x20: 
x19: 80003671bbd0 x18: 040d
x17: 0001 x16: 
x15:  x14: 
x13:  x12: 0020
x11: 0101010101010101 x10: fefefefefefefeff
x9 : 7f7f7f7f7f7f7f7f x8 : fefefeff31677364
x7 : 80808080 x6 : 80003671bc9c
x5 : 80003671b9f8 x4 : 80002c296190
x3 :  x2 : 
x1 : 80003671bbd0 x0 : 80003671bc00
Process wickedd (pid: 638, stack limit = 0x800036718000)
Call trace:
Exception stack(0x80003671b9e0 to 0x80003671bb20)
b9e0: 80003671bc00 80003671bbd0  
ba00: 80002c296190 80003671b9f8 80003671bc9c 80808080
ba20: fefefeff31677364 7f7f7f7f7f7f7f7f fefefefefefefeff 0101010101010101
ba40: 0020   
ba60:  0001 040d 80003671bbd0
ba80:  80003671bbd0  004000114809
baa0: 08f74d00 0124 001d 08912000
bac0: 800035e74180 80003671bb20 00dcca84 80003671bb20
bae0: 086f7f30 2005 80002c296000 800035223900
bb00:   80003671bb20 086f7f30
[] phy_ethtool_ksettings_get+0x20/0x98
[] lan78xx_get_link_ksettings+0x44/0x60 [lan78xx]
[] ethtool_get_settings+0x68/0x210
[] dev_ethtool+0x214/0x2180
[] dev_ioctl+0x400/0x630
[] sock_do_ioctl+0x70/0x88
[] sock_ioctl+0x208/0x368
[] do_vfs_ioctl+0xb0/0x848
[] SyS_ioctl+0x8c/0xa8
Exception stack(0x80003671bec0 to 0x80003671c000)
bec0: 0009 8946 f4e841d0 aa0032687465
bee0: fa2319d4 f4e841d4 32687465 32687465
bf00: 001d 7f7fff7f7f7f7f7f 72606b622e71ff4c 7f7f7f7f7f7f7f7f
bf20: 0101010101010101 0020  7f510c68
bf40: 7f6a9d18 7f44ce30 040d 7f6f98f0
bf60: f4e842c0 0001 fa2c2e00 7f6ab000
bf80: f4e842c0 7f62a000 fa2b9f20 fa2c2e00
bfa0: f4e84818 f4e841a0 7f5ad0cc f4e841a0
bfc0: 7f44ce3c 8000 0009 001d
bfe0:    

The culprit is quite simple: The driver tries to access the phy left and right,
but only actually has a working reference to it when the device is up.

The fix thus is quite simple too: Get a reference to the phy on probe already
and keep it even when the device is going down.

With this patch applied, I can successfully run wicked on my system and bring
the interface up and down as many times as I want, without getting NULL pointer
dereferences in between.

Signed-off-by: Alexander Graf 
---
 drivers/net/usb/lan78xx.c | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 60a604cc7647..931cc124ab0c 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -2082,8 +2082,6 @@ static int lan78xx_phy_init(struct lan78xx_net *dev)
 
dev->fc_autoneg = phydev->autoneg;
 
-   phy_start(phydev);
-
netif_dbg(dev, ifup, dev->net, "phy initialised successfully");
 
return 0;
@@ -2512,9 +2510,7 @@ static int lan78xx_open(struct net_device *net)
if (ret < 0)
goto done;
 
-   ret = lan78xx_phy_init(dev);
-   if (ret < 0)
-   goto done;
+   phy_start(net->phydev);
 
/* for Link Check */
if (dev->urb_intr) {
@@ -2575,13 +2571,7 @@ static int lan78xx_stop(str

[PATCH v2] lan78xx: Connect phy early

2018-04-03 Thread Alexander Graf
When using wicked with a lan78xx device attached to the system, we
end up with ethtool commands issued on the device before an ifup
got issued. That lead to the following crash:

Unable to handle kernel NULL pointer dereference at virtual address 039c
pgd = 800035b3
[039c] *pgd=
Internal error: Oops: 9604 [#1] SMP
Modules linked in: [...]
Supported: Yes
CPU: 3 PID: 638 Comm: wickedd Tainted: GE  
4.12.14-0-default #1
Hardware name: raspberrypi rpi/rpi, BIOS 2018.03-rc2 02/21/2018
task: 800035e74180 task.stack: 800036718000
PC is at phy_ethtool_ksettings_get+0x20/0x98
LR is at lan78xx_get_link_ksettings+0x44/0x60 [lan78xx]
pc : [] lr : [] pstate: 2005
sp : 80003671bb20
x29: 80003671bb20 x28: 800035e74180
x27: 08912000 x26: 001d
x25: 0124 x24: 08f74d00
x23: 004000114809 x22: 
x21: 80003671bbd0 x20: 
x19: 80003671bbd0 x18: 040d
x17: 0001 x16: 
x15:  x14: 
x13:  x12: 0020
x11: 0101010101010101 x10: fefefefefefefeff
x9 : 7f7f7f7f7f7f7f7f x8 : fefefeff31677364
x7 : 80808080 x6 : 80003671bc9c
x5 : 80003671b9f8 x4 : 80002c296190
x3 :  x2 : 
x1 : 80003671bbd0 x0 : 80003671bc00
Process wickedd (pid: 638, stack limit = 0x800036718000)
Call trace:
Exception stack(0x80003671b9e0 to 0x80003671bb20)
b9e0: 80003671bc00 80003671bbd0  
ba00: 80002c296190 80003671b9f8 80003671bc9c 80808080
ba20: fefefeff31677364 7f7f7f7f7f7f7f7f fefefefefefefeff 0101010101010101
ba40: 0020   
ba60:  0001 040d 80003671bbd0
ba80:  80003671bbd0  004000114809
baa0: 08f74d00 0124 001d 08912000
bac0: 800035e74180 80003671bb20 00dcca84 80003671bb20
bae0: 086f7f30 2005 80002c296000 800035223900
bb00:   80003671bb20 086f7f30
[] phy_ethtool_ksettings_get+0x20/0x98
[] lan78xx_get_link_ksettings+0x44/0x60 [lan78xx]
[] ethtool_get_settings+0x68/0x210
[] dev_ethtool+0x214/0x2180
[] dev_ioctl+0x400/0x630
[] sock_do_ioctl+0x70/0x88
[] sock_ioctl+0x208/0x368
[] do_vfs_ioctl+0xb0/0x848
[] SyS_ioctl+0x8c/0xa8
Exception stack(0x80003671bec0 to 0x80003671c000)
bec0: 0009 8946 f4e841d0 aa0032687465
bee0: fa2319d4 f4e841d4 32687465 32687465
bf00: 001d 7f7fff7f7f7f7f7f 72606b622e71ff4c 7f7f7f7f7f7f7f7f
bf20: 0101010101010101 0020  7f510c68
bf40: 7f6a9d18 7f44ce30 040d 7f6f98f0
bf60: f4e842c0 0001 fa2c2e00 7f6ab000
bf80: f4e842c0 7f62a000 fa2b9f20 fa2c2e00
bfa0: f4e84818 f4e841a0 7f5ad0cc f4e841a0
bfc0: 7f44ce3c 8000 0009 001d
bfe0:    

The culprit is quite simple: The driver tries to access the phy left and right,
but only actually has a working reference to it when the device is up.

The fix thus is quite simple too: Get a reference to the phy on probe already
and keep it even when the device is going down.

With this patch applied, I can successfully run wicked on my system and bring
the interface up and down as many times as I want, without getting NULL pointer
dereferences in between.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - move debug message (Andrew Lunn)
  - check for phydev != NULL before phy_stop (Sayed Nisar)
  - move phy_disconnect before unregister_netdev (Sayed Nisar)
  - properly unroll failing probe phy_init (Sayed Nisar)
  - change phy_init to phy_start in reset_resume (Sayed Nisar)
---
 drivers/net/usb/lan78xx.c | 34 ++
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 60a604cc7647..6986f53fe967 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -2082,10 +2082,6 @@ static int lan78xx_phy_init(struct lan78xx_net *dev)
 
dev->fc_autoneg = phydev->autoneg;
 
-   phy_start(phydev);
-
-   netif_dbg(dev, ifup, dev->net, "phy initialised successfully");
-
return 0;
 
 error:
@@ -2512,9 +2508,9 @@ static i

[PATCH] [iputils] Print received packets as icmp_seq

2007-08-07 Thread Alexander Graf
Now, ping and ping6 print the packets which are actually received, too, not
only the amount of sent packets.

It has the format:
icmp_seq=received/seq

Signed-off-by: Alexander Graf <[EMAIL PROTECTED]>
---
 ping_common.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/ping_common.c b/ping_common.c
index be36cbd..83be553 100644
--- a/ping_common.c
+++ b/ping_common.c
@@ -711,7 +711,7 @@ restamp:
} else {
int i;
__u8 *cp, *dp;
-   printf("%d bytes from %s: icmp_seq=%u", cc, from, seq);
+   printf("%d bytes from %s: icmp_seq=%li/%u", cc, from, 
nreceived, seq);
 
if (hops >= 0)
printf(" ttl=%d", hops);
-- 
1.5.2.4


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [iputils] Print packet loss with more precision

2007-08-07 Thread Alexander Graf
Signed-off-by: Alexander Graf <[EMAIL PROTECTED]>
---
 ping_common.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/ping_common.c b/ping_common.c
index 83be553..49acab2 100644
--- a/ping_common.c
+++ b/ping_common.c
@@ -795,9 +795,9 @@ void finish(void)
if (nerrors)
printf(", +%ld errors", nerrors);
if (ntransmitted) {
-   printf(", %d%% packet loss",
-  (int) long long)(ntransmitted - nreceived)) * 100) /
- ntransmitted));
+   printf(", %f%% packet loss",
+  (((long long)(ntransmitted - nreceived)) * 100.0) /
+ ntransmitted);
printf(", time %ldms", 1000*tv.tv_sec+tv.tv_usec/1000);
}
putchar('\n');
-- 
1.5.2.4


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] phy dp83867: depend on CONFIG_OF_MDIO

2016-05-16 Thread Alexander Graf
The DP83867 phy driver doesn't actually work when CONFIG_OF_MDIO isn't enabled.
It simply passes the device tree test, but leaves all internal configuration
initialized at 0. Then it configures the phy with those values and renders a
previously working configuration useless.

This patch makes sure that we only build the DP83867 phy code when
CONFIG_OF_MDIO is set, to not run into that problem.

Signed-off-by: Alexander Graf 
---
 drivers/net/phy/Kconfig   | 1 +
 drivers/net/phy/dp83867.c | 7 ---
 2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 6dad9a9..4265ad5 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -148,6 +148,7 @@ config DP83848_PHY
 
 config DP83867_PHY
tristate "Drivers for Texas Instruments DP83867 Gigabit PHY"
+   depends on OF_MDIO
---help---
  Currently supports the DP83867 PHY.
 
diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c
index 2afa61b..ff867ba 100644
--- a/drivers/net/phy/dp83867.c
+++ b/drivers/net/phy/dp83867.c
@@ -99,7 +99,6 @@ static int dp83867_config_intr(struct phy_device *phydev)
return phy_write(phydev, MII_DP83867_MICR, micr_status);
 }
 
-#ifdef CONFIG_OF_MDIO
 static int dp83867_of_init(struct phy_device *phydev)
 {
struct dp83867_private *dp83867 = phydev->priv;
@@ -123,12 +122,6 @@ static int dp83867_of_init(struct phy_device *phydev)
return of_property_read_u32(of_node, "ti,fifo-depth",
   &dp83867->fifo_depth);
 }
-#else
-static int dp83867_of_init(struct phy_device *phydev)
-{
-   return 0;
-}
-#endif /* CONFIG_OF_MDIO */
 
 static int dp83867_config_init(struct phy_device *phydev)
 {
-- 
1.8.5.6



Re: [PATCH] phy dp83867: depend on CONFIG_OF_MDIO

2016-05-16 Thread Alexander Graf
Hi Dan,

On 16.05.16 15:38, Dan Murphy wrote:
> Alexander
> 
> On 05/16/2016 06:28 AM, Alexander Graf wrote:
>> The DP83867 phy driver doesn't actually work when CONFIG_OF_MDIO isn't 
>> enabled.
>> It simply passes the device tree test, but leaves all internal configuration
>> initialized at 0. Then it configures the phy with those values and renders a
>> previously working configuration useless.
>>
>> This patch makes sure that we only build the DP83867 phy code when
>> CONFIG_OF_MDIO is set, to not run into that problem.
>>
>> Signed-off-by: Alexander Graf 
>> ---
>>  drivers/net/phy/Kconfig   | 1 +
>>  drivers/net/phy/dp83867.c | 7 ---
>>  2 files changed, 1 insertion(+), 7 deletions(-)
>>
>> diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
>> index 6dad9a9..4265ad5 100644
>> --- a/drivers/net/phy/Kconfig
>> +++ b/drivers/net/phy/Kconfig
>> @@ -148,6 +148,7 @@ config DP83848_PHY
>>  
>>  config DP83867_PHY
>>  tristate "Drivers for Texas Instruments DP83867 Gigabit PHY"
>> +depends on OF_MDIO
>>  ---help---
>>Currently supports the DP83867 PHY.
>>  
>> diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c
>> index 2afa61b..ff867ba 100644
>> --- a/drivers/net/phy/dp83867.c
>> +++ b/drivers/net/phy/dp83867.c
>> @@ -99,7 +99,6 @@ static int dp83867_config_intr(struct phy_device *phydev)
>>  return phy_write(phydev, MII_DP83867_MICR, micr_status);
>>  }
>>  
>> -#ifdef CONFIG_OF_MDIO
>>  static int dp83867_of_init(struct phy_device *phydev)
>>  {
>>  struct dp83867_private *dp83867 = phydev->priv;
>> @@ -123,12 +122,6 @@ static int dp83867_of_init(struct phy_device *phydev)
>>  return of_property_read_u32(of_node, "ti,fifo-depth",
>> &dp83867->fifo_depth);
>>  }
>> -#else
>> -static int dp83867_of_init(struct phy_device *phydev)
>> -{
>> -return 0;
>> -}
>> -#endif /* CONFIG_OF_MDIO */
>>  
>>  static int dp83867_config_init(struct phy_device *phydev)
>>  {
> I don't think we want this to depend solely on OF_MDIO.
> 
> The #else case should probably be coded to look at platform data, if
> it exists.  I don't have any boards that still used platform data to test this
> out so I did not feel comfortable adding code I could not test.

Since there was no code to look at platform data, those boards would be
broken just as well today, no? So at the end of the day, this change
should be no regression for them.


Alex


Re: [PATCH] phy dp83867: depend on CONFIG_OF_MDIO

2016-05-16 Thread Alexander Graf


On 16.05.16 14:44, Andrew Lunn wrote:
> On Mon, May 16, 2016 at 01:28:15PM +0200, Alexander Graf wrote:
>> The DP83867 phy driver doesn't actually work when CONFIG_OF_MDIO isn't 
>> enabled.
>> It simply passes the device tree test, but leaves all internal configuration
>> initialized at 0. Then it configures the phy with those values and renders a
>> previously working configuration useless.
>>
>> This patch makes sure that we only build the DP83867 phy code when
>> CONFIG_OF_MDIO is set, to not run into that problem.
> 
> Hi Alexander
> 
> Looking at the code, the parameters read from device tree are needed
> for RGMII mode. What about the case the PHY is used not in RGMII mode?
> Could it be there are device trees which legitimately don't have these
> properties since they are not needed, and are now going to get
> -ENODEV?

To be honest, I don't know. I only stumbled over this because we compile
all those phy drivers as modules and the driver simply rendered one
system I was working on unusable wrt network.

Dan?


Alex


[PATCH v2 1/2] phy dp83867: Fix compilation with CONFIG_OF_MDIO=m

2016-05-16 Thread Alexander Graf
When CONFIG_OF_MDIO is configured as module, the #define for it really
is CONFIG_OF_MDIO_MODULE, not CONFIG_OF_MDIO. So if we are compiling it
as module, the dp83867 doesn't see that OF_MDIO was selected and doesn't
read the dt rgmii parameters.

The fix is simple: Use IS_ENABLED(). It checks for both - module as well
as compiled in code.

Signed-off-by: Alexander Graf 
---
 drivers/net/phy/dp83867.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c
index 2afa61b..94cc278 100644
--- a/drivers/net/phy/dp83867.c
+++ b/drivers/net/phy/dp83867.c
@@ -99,7 +99,7 @@ static int dp83867_config_intr(struct phy_device *phydev)
return phy_write(phydev, MII_DP83867_MICR, micr_status);
 }
 
-#ifdef CONFIG_OF_MDIO
+#if IS_ENABLED(CONFIG_OF_MDIO)
 static int dp83867_of_init(struct phy_device *phydev)
 {
struct dp83867_private *dp83867 = phydev->priv;
-- 
1.8.5.6



[PATCH v2 2/2] phy dp83867: Make rgmii parameters optional

2016-05-16 Thread Alexander Graf
If you compile without OF_MDIO support in an RGMII configuration, we fail
to configure the dp83867 phy today by writing garbage into its configuration
registers.

On the other hand if you do compile with OF_MDIO and the phy gets loaded via
device tree, you have to have the properties set in the device tree, otherwise
we fail to load the driver and don't even attach the generic phy driver to
the interface anymore.

To make things slightly more consistent, make the rgmii configuration properties
optional and allow a user to omit them in their device tree.

Signed-off-by: Alexander Graf 
---
 drivers/net/phy/dp83867.c | 31 ---
 1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c
index 94cc278..1b01680 100644
--- a/drivers/net/phy/dp83867.c
+++ b/drivers/net/phy/dp83867.c
@@ -65,6 +65,7 @@ struct dp83867_private {
int rx_id_delay;
int tx_id_delay;
int fifo_depth;
+   int values_are_sane;
 };
 
 static int dp83867_ack_interrupt(struct phy_device *phydev)
@@ -113,15 +114,30 @@ static int dp83867_of_init(struct phy_device *phydev)
ret = of_property_read_u32(of_node, "ti,rx-internal-delay",
   &dp83867->rx_id_delay);
if (ret)
-   return ret;
+   goto invalid_dt;
 
ret = of_property_read_u32(of_node, "ti,tx-internal-delay",
   &dp83867->tx_id_delay);
if (ret)
-   return ret;
+   goto invalid_dt;
 
-   return of_property_read_u32(of_node, "ti,fifo-depth",
+   ret = of_property_read_u32(of_node, "ti,fifo-depth",
   &dp83867->fifo_depth);
+   if (ret)
+   goto invalid_dt;
+
+   dp83867->values_are_sane = 1;
+
+   return 0;
+
+invalid_dt:
+   phydev_err(phydev, "missing properties in device tree");
+
+   /*
+* We can still run with a broken dt by not using any of the optional
+* parameters, so just don't set dp83867->values_are_sane.
+*/
+   return 0;
 }
 #else
 static int dp83867_of_init(struct phy_device *phydev)
@@ -150,6 +166,15 @@ static int dp83867_config_init(struct phy_device *phydev)
dp83867 = (struct dp83867_private *)phydev->priv;
}
 
+   /*
+* With no or broken device tree, we don't have the values that we would
+* want to configure the phy with. In that case, cross our fingers and
+* assume that firmware did everything correctly for us or that we don't
+* need them.
+*/
+   if (!dp83867->values_are_sane)
+   return 0;
+
if (phy_interface_is_rgmii(phydev)) {
ret = phy_write(phydev, MII_DP83867_PHYCTRL,
(dp83867->fifo_depth << 
DP83867_PHYCR_FIFO_DEPTH_SHIFT));
-- 
1.8.5.6



Re: [PATCH v2 2/2] phy dp83867: Make rgmii parameters optional

2016-05-17 Thread Alexander Graf
Hi David,

> Am 17.05.2016 um 20:48 schrieb David Miller :
> 
> From: Dan Murphy 
> Date: Tue, 17 May 2016 13:34:34 -0500
> 
>> David
>> 
>>> On 05/17/2016 01:22 PM, David Miller wrote:
>>> From: Alexander Graf 
>>> Date: Mon, 16 May 2016 20:52:43 +0200
>>> 
>>>> If you compile without OF_MDIO support in an RGMII configuration, we fail
>>>> to configure the dp83867 phy today by writing garbage into its 
>>>> configuration
>>>> registers.
>>>> 
>>>> On the other hand if you do compile with OF_MDIO and the phy gets loaded 
>>>> via
>>>> device tree, you have to have the properties set in the device tree, 
>>>> otherwise
>>>> we fail to load the driver and don't even attach the generic phy driver to
>>>> the interface anymore.
>>>> 
>>>> To make things slightly more consistent, make the rgmii configuration 
>>>> properties
>>>> optional and allow a user to omit them in their device tree.
>>>> 
>>>> Signed-off-by: Alexander Graf 
>>> Applied.
>> 
>> This patch should not have been applied.
>> 
>> I did not believe the implementation was proper for that driver.
>> 
>> It seems my objection to the code was not seen.  Nor was Andrew's point 
>> about the DT bindings document
>> 
>> https://patchwork.kernel.org/patch/9105371/
> 
> The discussions around the recent phy patches have been a labrynth that I've
> found hard to follow, sorry.
> 
> I'll revert these two, sigh

The first patch is an obvious and correct fix. Discussions were only about the 
second one (which I'm happy to drop given the rat hole this turned out to be).

Alex




Re: [PATCH v2 2/2] phy dp83867: Make rgmii parameters optional

2016-05-17 Thread Alexander Graf


> Am 17.05.2016 um 22:40 schrieb Dan Murphy :
> 
>> On 05/17/2016 03:37 PM, Alexander Graf wrote:
>> Hi David,
>> 
>>> Am 17.05.2016 um 20:48 schrieb David Miller :
>>> 
>>> From: Dan Murphy 
>>> Date: Tue, 17 May 2016 13:34:34 -0500
>>> 
>>>> David
>>>> 
>>>>> On 05/17/2016 01:22 PM, David Miller wrote:
>>>>> From: Alexander Graf 
>>>>> Date: Mon, 16 May 2016 20:52:43 +0200
>>>>> 
>>>>>> If you compile without OF_MDIO support in an RGMII configuration, we fail
>>>>>> to configure the dp83867 phy today by writing garbage into its 
>>>>>> configuration
>>>>>> registers.
>>>>>> 
>>>>>> On the other hand if you do compile with OF_MDIO and the phy gets loaded 
>>>>>> via
>>>>>> device tree, you have to have the properties set in the device tree, 
>>>>>> otherwise
>>>>>> we fail to load the driver and don't even attach the generic phy driver 
>>>>>> to
>>>>>> the interface anymore.
>>>>>> 
>>>>>> To make things slightly more consistent, make the rgmii configuration 
>>>>>> properties
>>>>>> optional and allow a user to omit them in their device tree.
>>>>>> 
>>>>>> Signed-off-by: Alexander Graf 
>>>>> Applied.
>>>> This patch should not have been applied.
>>>> 
>>>> I did not believe the implementation was proper for that driver.
>>>> 
>>>> It seems my objection to the code was not seen.  Nor was Andrew's point 
>>>> about the DT bindings document
>>>> 
>>>> https://patchwork.kernel.org/patch/9105371/
>>> The discussions around the recent phy patches have been a labrynth that I've
>>> found hard to follow, sorry.
>>> 
>>> I'll revert these two, sigh
>> The first patch is an obvious and correct fix. Discussions were only about 
>> the second one (which I'm happy to drop given the rat hole this turned out 
>> to be).
> 
> So are you going to abandon the second patch all together?
> If you do, let me know I can submit a patch.

I'm fairly sure you'll be done with it much quicker than me and the real bug I 
wanted to fix is that the driver broke with OF_MDIO=m ;)

So yes, please take over from here.


Alex




Re: [PATCH] qeth: Default to allow promiscuous mode

2016-03-19 Thread Alexander Graf


On 17.03.16 18:52, Evgeny Cherkashin wrote:
> Hello all,
> 
> -Alexander Graf  wrote: -
> 
>> To: linux-s...@vger.kernel.org
>> From: Alexander Graf 
>> Date: 2016-03-17 20:01
>> Cc: Evgeny Cherkashin/Russia/IBM@IBMRU, linux-ker...@vger.kernel.org,
>> Heiko Carstens , Martin Schwidefsky
>> , Ursula Braun ,
>> i...@suse.de, m...@suse.com, pwieczorkiew...@suse.de
>> Subject: [PATCH] qeth: Default to allow promiscuous mode
>>
>> When a qeth device is in bridge role, one of the ports of an adapter
>> can
>> ask for promiscuous mode and get it enabled.
>>
>> The default until now was to not allow user space to enable
>> promiscuous mode
>> without switching sysfs attributes as well though, diverging from
>> usual
>> network device semantics.
>>
>> This patch sets the default to allow promiscuous enablement. That way
>> all
>> existing tools "just work", albeit only one port of an adapter can be
>> in
>> promiscuous mode at a given time.
>>
>> Signed-off-by: Alexander Graf 
>> ---
>> drivers/s390/net/qeth_l2_sys.c | 4 
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/s390/net/qeth_l2_sys.c
>> b/drivers/s390/net/qeth_l2_sys.c
>> index 692db49..98c7ac5 100644
>> --- a/drivers/s390/net/qeth_l2_sys.c
>> +++ b/drivers/s390/net/qeth_l2_sys.c
>> @@ -258,6 +258,10 @@ void qeth_l2_setup_bridgeport_attrs(struct
>> qeth_card *card)
>>  return;
>>  if (!card->options.sbp.supported_funcs)
>>  return;
>> +
>> +/* Allow to set promiscuous by default */
>> +card->options.sbp.reflect_promisc = 1;
>> +
>>  if (card->options.sbp.role != QETH_SBP_ROLE_NONE) {
>>  /* Conditional to avoid spurious error messages */
>>  qeth_bridgeport_setrole(card, card->options.sbp.role);
>> -- 
>> 1.8.5.6
>>
>>
> 
> 1. The patch changes the default behaviour (which is, on s390, to ignore 
> promisc setting) and has potential to break existing setups. One potentially 
> dangerous scenario is that a Linux instance may inadvertently snatch the 
> BRIDGEPORT role from a zVM VSWITCH that bridges the HiperSockets LAN to the 
> outer world, disrupting connectivity for many users.

Look at it the other way around. Hardware allows you to do X, why should
Linux prevent you from doing X then?

If you can break another LPAR on the system, then your virtualization
solution sucks and people should be aware of it and simply configure it
in a way that it doesn't. If you can't trust your admin of a Linux
system to call the right commands, you can't trust him to not be
malicious either and break it on purpose.

> 2. The patch sets the reflect_promisc field without setting the 
> reflect_promisc_primary field, making it unclear to the user what is the 
> default role. (It will be "secondary" but it's better to set it explicitly.)

I can send it explicitly if that makes it move obvious, sure.

> 3. The patch should probably go via the -networking maillist.

Ah, get_maintainers.pl didn't list it. You may want to send a patch for
the MAINTAINERS file here :).

> It seems to me that it would be better to use the sysfs attribute from a 
> startup script or a udev rule, *only* when the system configuration requires 
> "real" promiscuous mode, e.g. when the interface is a member of a software 
> bridge.

You've now successfully added a lot of headaches, curses and pain to the
plate of the admin of such a system. As an admin of such a system, I
don't want to have to deal with s390x specifics. That's why I'm using
Linux there.

And who decides what "real" promiscuous mode is? Nobody should set a
network device into promiscuous mode today already unless it's required.

The current thinking in your company seems to be that "real" users
should explicitly set sysfs attribute. Let's think this a bit ahead.
Imagine a world with this sysfs attribute. Every "real" user of
promiscuous mode will now set the sysfs attribute as well in parallel to
setting promiscuous mode on the device. Once you have patched 50 user
space packages to touch something they really shouldn't have to care
about, the same applications that today set an interface to promiscuous
mode, set it to promiscuous mode, just by writing to sysfs plus enabling it.

So at the end of the day, the only thing you've done is you've created a
lot of work for people who are not you, for no gain.

That leaves only one option. We can create a udev rule that *always*
sets the device to promiscuous capable. But that's the same net result
as the patch I've sent, no?


Alex