[PATCH net-next] net/mlxfw: Use kzalloc for allocating only one thing
Use kzalloc rather than kcalloc(1,...) The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // @@ @@ - kcalloc(1, + kzalloc( ...) // Signed-off-by: Zheng Yongjun --- drivers/net/ethernet/mellanox/mlxfw/mlxfw_mfa2.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlxfw/mlxfw_mfa2.c b/drivers/net/ethernet/mellanox/mlxfw/mlxfw_mfa2.c index 5d9ddf36fb4e..e6f677e42007 100644 --- a/drivers/net/ethernet/mellanox/mlxfw/mlxfw_mfa2.c +++ b/drivers/net/ethernet/mellanox/mlxfw/mlxfw_mfa2.c @@ -267,7 +267,7 @@ struct mlxfw_mfa2_file *mlxfw_mfa2_file_init(const struct firmware *fw) const void *first_tlv_ptr; const void *cb_top_ptr; - mfa2_file = kcalloc(1, sizeof(*mfa2_file), GFP_KERNEL); + mfa2_file = kzalloc(sizeof(*mfa2_file), GFP_KERNEL); if (!mfa2_file) return ERR_PTR(-ENOMEM); -- 2.22.0
Re: [PATCH rfc 2/3] virtio-net: support receive timestamp
On 2020/12/29 下午10:20, Willem de Bruijn wrote: On Tue, Dec 29, 2020 at 4:23 AM Jason Wang wrote: - Original Message - On Mon, Dec 28, 2020 at 7:55 PM Michael S. Tsirkin wrote: On Mon, Dec 28, 2020 at 02:30:31PM -0500, Willem de Bruijn wrote: On Mon, Dec 28, 2020 at 12:29 PM Michael S. Tsirkin wrote: On Mon, Dec 28, 2020 at 11:22:32AM -0500, Willem de Bruijn wrote: From: Willem de Bruijn Add optional PTP hardware timestamp offload for virtio-net. Accurate RTT measurement requires timestamps close to the wire. Introduce virtio feature VIRTIO_NET_F_RX_TSTAMP. If negotiated, the virtio-net header is expanded with room for a timestamp. A host may pass receive timestamps for all or some packets. A timestamp is valid if non-zero. The timestamp straddles (virtual) hardware domains. Like PTP, use international atomic time (CLOCK_TAI) as global clock base. It is guest responsibility to sync with host, e.g., through kvm-clock. Signed-off-by: Willem de Bruijn diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h index f6881b5b77ee..0ffe2eeebd4a 100644 --- a/include/uapi/linux/virtio_net.h +++ b/include/uapi/linux/virtio_net.h @@ -57,6 +57,7 @@ * Steering */ #define VIRTIO_NET_F_CTRL_MAC_ADDR 23/* Set MAC address */ +#define VIRTIO_NET_F_RX_TSTAMP 55/* Host sends TAI receive time */ #define VIRTIO_NET_F_TX_HASH 56/* Guest sends hash report */ #define VIRTIO_NET_F_HASH_REPORT 57 /* Supports hash report */ #define VIRTIO_NET_F_RSS 60/* Supports RSS RX steering */ @@ -182,6 +183,17 @@ struct virtio_net_hdr_v1_hash { }; }; +struct virtio_net_hdr_v12 { + struct virtio_net_hdr_v1 hdr; + struct { + __le32 value; + __le16 report; + __le16 flow_state; + } hash; + __virtio32 reserved; + __virtio64 tstamp; +}; + #ifndef VIRTIO_NET_NO_LEGACY /* This header comes first in the scatter-gather list. * For legacy virtio, if VIRTIO_F_ANY_LAYOUT is not negotiated, it must So it looks like VIRTIO_NET_F_RX_TSTAMP should depend on both VIRTIO_NET_F_RX_TSTAMP and VIRTIO_NET_F_HASH_REPORT then? Do you mean VIRTIO_NET_F_TX_TSTAMP depends on VIRTIO_NET_F_RX_TSTAMP? I think if either is enabled we need to enable the extended layout. Regardless of whether TX_HASH or HASH_REPORT are enabled. If they are not, then those fields are ignored. I mean do we waste the 8 bytes for hash if TSTAMP is set but HASH is not? If TSTAMP depends on HASH then point is moot. True, but the two features really are independent. I did consider using configurable metadata layout depending on features negotiated. If there are tons of optional extensions, that makes sense. But it is more complex and parsing error prone. With a handful of options each of a few bytes, that did not seem worth the cost to me at this point. Consider NFV workloads(64B) packet. Most fields of current vnet header is even a burdern. Such workloads will not negotiate these features, I imagine. I think this could only become an issue if a small packet workload would want to negotiate one optional feature, without the others. Yes. Even then, the actual cost of a few untouched bytes is negligible, as long as the headers don't span cache-lines. Right. It looks to me with hash report support the current header has exceeds 32 bytes which is the cacheline size for some archs. And it can be imagined that after two or more features it could be larger than 64 bytes. I expect it to be cheaper than having to parse a dynamic layout. The only overhead is probably analyzing "type" which should be cheap I guess. It might be the time to introduce configurable or self-descriptive vnet header. I did briefly toy with a type-val encoding. Not type-val-len, as all options have fixed length (so far), so length can be left implicit. But as each feature is negotiated before hand, the type can be left implicit too. Leaving just the data elements tightly packed. As long as order is defined. Right, so in this case there should be even no overhead. And importantly, such a mode can always be added later as a separate VIRTIO_NET_F_PACKED_HEADER option. What will do feature provide? The tightly packed data elements. Strip out the elements for non negotiated features. So if either tstamp option is negotiated, but hash is not, exclude the 64b of hash fields. Note that I'm not actually arguing for this (at this point). I second for this. Thanks
Re: [RFC v2 09/13] vduse: Add support for processing vhost iotlb message
On 2020/12/30 下午3:09, Yongji Xie wrote: On Wed, Dec 30, 2020 at 2:11 PM Jason Wang wrote: On 2020/12/29 下午6:26, Yongji Xie wrote: On Tue, Dec 29, 2020 at 5:11 PM Jason Wang wrote: - Original Message - On Mon, Dec 28, 2020 at 4:43 PM Jason Wang wrote: On 2020/12/28 下午4:14, Yongji Xie wrote: I see. So all the above two questions are because VHOST_IOTLB_INVALIDATE is expected to be synchronous. This need to be solved by tweaking the current VDUSE API or we can re-visit to go with descriptors relaying first. Actually all vdpa related operations are synchronous in current implementation. The ops.set_map/dma_map/dma_unmap should not return until the VDUSE_UPDATE_IOTLB/VDUSE_INVALIDATE_IOTLB message is replied by userspace. Could it solve this problem? I was thinking whether or not we need to generate IOTLB_INVALIDATE message to VDUSE during dma_unmap (vduse_dev_unmap_page). If we don't, we're probably fine. It seems not feasible. This message will be also used in the virtio-vdpa case to notify userspace to unmap some pages during consistent dma unmapping. Maybe we can document it to make sure the users can handle the message correctly. Just to make sure I understand your point. Do you mean you plan to notify the unmap of 1) streaming DMA or 2) coherent DMA? For 1) you probably need a workqueue to do that since dma unmap can be done in irq or bh context. And if usrspace does't do the unmap, it can still access the bounce buffer (if you don't zap pte)? I plan to do it in the coherent DMA case. Any reason for treating coherent DMA differently? Now the memory of the bounce buffer is allocated page by page in the page fault handler. So it can't be used in coherent DMA mapping case which needs some memory with contiguous virtual addresses. I can use vmalloc() to do allocation for the bounce buffer instead. But it might cause some memory waste. Any suggestion? I may miss something. But I don't see a relationship between the IOTLB_UNMAP and vmalloc(). It's true that userspace can access the dma buffer if userspace doesn't do the unmap. But the dma pages would not be freed and reused unless user space called munmap() for them. I wonder whether or not we could recycle IOVA in this case to avoid the IOTLB_UMAP message. We can achieve that if we use vmalloc() to do allocation for the bounce buffer which can be used in coherent DMA mapping case. But looks like we still have no way to avoid the IOTLB_UMAP message in vhost-vdpa case. I think that's fine. For virtio-vdpa, from VDUSE userspace perspective, it works like a driver that is using SWIOTLB in this case. Thanks Thanks, Yongji
Re: [PATCH rfc 2/3] virtio-net: support receive timestamp
On 2020/12/29 上午8:57, Willem de Bruijn wrote: On Mon, Dec 28, 2020 at 5:59 PM Jakub Kicinski wrote: On Mon, 28 Dec 2020 11:22:32 -0500 Willem de Bruijn wrote: From: Willem de Bruijn Add optional PTP hardware timestamp offload for virtio-net. Accurate RTT measurement requires timestamps close to the wire. Introduce virtio feature VIRTIO_NET_F_RX_TSTAMP. If negotiated, the virtio-net header is expanded with room for a timestamp. A host may pass receive timestamps for all or some packets. A timestamp is valid if non-zero. The timestamp straddles (virtual) hardware domains. Like PTP, use international atomic time (CLOCK_TAI) as global clock base. It is guest responsibility to sync with host, e.g., through kvm-clock. Would this not be confusing to some user space SW to have a NIC with no PHC deliver HW stamps? I'd CC Richard on this, unless you already discussed with him offline. Thanks, good point. I should have included Richard. There is a well understood method for synchronizing guest and host clock in KVM using ptp_kvm. For virtual environments without NIC hardware offload, the when host timestamps in software, this suffices. Syncing host with NIC is assumed if the host advertises the feature and implements using real hardware timestamps. Or it could be useful for virtio hardware when there's no KVM that provides PTP. Thanks
[PATCH] Allow user to set metric on default route learned via Router Advertisement.
Allow user to set metric on default route learned via Router Advertisement. Not: RFC 4191 does not say anything for metric for IPv6 default route. Fix: For IPv4, default route is learned via DHCPv4 and user is allowed to change metric using config etc/network/interfaces. But for IPv6, default route can be learned via RA, for which, currently a fixed metric value 1024 is used. Ideally, user should be able to configure metric on default route for IPv6 similar to IPv4. This fix adds sysctl for the same. Logs: For IPv4: Config in etc/network/interfaces ``` auto eth0 iface eth0 inet dhcp metric 4261413864 ``` IPv4 Kernel Route Table: ``` $ sudo route -n Kernel IP routing table Destination Gateway Genmask Flags Metric RefUse Iface 0.0.0.0 172.11.44.1 0.0.0.0 UG-33553432 00 eth0 ``` FRR Table, if default route is learned via routing protocol too. ``` # show ip route Codes: K - kernel route, C - connected, S - static, R - RIP, O - OSPF, I - IS-IS, B - BGP, P - PIM, E - EIGRP, N - NHRP, T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP, > - selected route, * - FIB route S>* 0.0.0.0/0 [20/0] is directly connected, eth0, 00:00:03 K 0.0.0.0/0 [254/1000] via 172.21.47.1, eth0, 6d08h51m ``` i.e. User can prefer Default Router learned via Routing Protocol, Similar behavior is not possible for IPv6, without this fix. After fix [for IPv6]: ``` sudo sysctl -w net.ipv6.conf.eth0.net.ipv6.conf.eth0.accept_ra_defrtr_metric=0x770003e9 ``` IP monitor: ``` default via fe80::be16:65ff:feb3:ce8e dev eth0 proto ra metric 1996489705 pref high ``` Kernel IPv6 routing table ``` DestinationNext Hop Flag Met Ref Use If ::/0 fe80::be16:65ff:feb3:ce8e UGDAe 1996489705 0 0 eth0 ``` FRR Routing Table, if default route is learned via routing protocol. # show ipv6 route Codes: K - kernel route, C - connected, S - static, R - RIPng, O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP, > - selected route, * - FIB route S>* ::/0 [20/0] is directly connected, eth0, 00:00:06 K ::/0 [119/1001] via fe80::be16:65ff:feb3:ce8e, eth0, 6d07h43m Praveen Chaudhary (1): Allow user to set metric on default route learned via Router Advertisement. Documentation/networking/ip-sysctl.rst | 8 include/linux/ipv6.h | 1 + include/net/ip6_route.h| 3 ++- include/uapi/linux/ipv6.h | 1 + include/uapi/linux/sysctl.h| 1 + net/ipv6/addrconf.c| 10 ++ net/ipv6/ndisc.c | 15 +++ net/ipv6/route.c | 8 +--- 8 files changed, 39 insertions(+), 8 deletions(-) -- 2.7.4
[PATCH] Allow user to set metric on default route learned via Router Advertisement.
Allow user to set metric on default route learned via Router Advertisement. Not: RFC 4191 does not say anything for metric for IPv6 default route. Fix: For IPv4, default route is learned via DHCPv4 and user is allowed to change metric using config etc/network/interfaces. But for IPv6, default route can be learned via RA, for which, currently a fixed metric value 1024 is used. Ideally, user should be able to configure metric on default route for IPv6 similar to IPv4. This fix adds sysctl for the same. Logs: For IPv4: Config in etc/network/interfaces ``` auto eth0 iface eth0 inet dhcp metric 4261413864 ``` IPv4 Kernel Route Table: ``` $ sudo route -n Kernel IP routing table Destination Gateway Genmask Flags Metric RefUse Iface 0.0.0.0 172.11.44.1 0.0.0.0 UG-33553432 00 eth0 ``` FRR Table, if default route is learned via routing protocol too. ``` # show ip route Codes: K - kernel route, C - connected, S - static, R - RIP, O - OSPF, I - IS-IS, B - BGP, P - PIM, E - EIGRP, N - NHRP, T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP, > - selected route, * - FIB route S>* 0.0.0.0/0 [20/0] is directly connected, eth0, 00:00:03 K 0.0.0.0/0 [254/1000] via 172.21.47.1, eth0, 6d08h51m ``` i.e. User can prefer Default Router learned via Routing Protocol, Similar behavior is not possible for IPv6, without this fix. After fix [for IPv6]: ``` sudo sysctl -w net.ipv6.conf.eth0.net.ipv6.conf.eth0.accept_ra_defrtr_metric=0x770003e9 ``` IP monitor: ``` default via fe80::be16:65ff:feb3:ce8e dev eth0 proto ra metric 1996489705 pref high ``` Kernel IPv6 routing table ``` DestinationNext Hop Flag Met Ref Use If ::/0 fe80::be16:65ff:feb3:ce8e UGDAe 1996489705 0 0 eth0 ``` FRR Routing Table, if default route is learned via routing protocol. # show ipv6 route Codes: K - kernel route, C - connected, S - static, R - RIPng, O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP, > - selected route, * - FIB route S>* ::/0 [20/0] is directly connected, eth0, 00:00:06 K ::/0 [119/1001] via fe80::be16:65ff:feb3:ce8e, eth0, 6d07h43m Praveen Chaudhary (1): Allow user to set metric on default route learned via Router Advertisement. Documentation/networking/ip-sysctl.rst | 8 include/linux/ipv6.h | 1 + include/net/ip6_route.h| 3 ++- include/uapi/linux/ipv6.h | 1 + include/uapi/linux/sysctl.h| 1 + net/ipv6/addrconf.c| 10 ++ net/ipv6/ndisc.c | 15 +++ net/ipv6/route.c | 8 +--- 8 files changed, 39 insertions(+), 8 deletions(-) -- 2.7.4
[PATCH] Allow user to set metric on default route learned via Router Advertisement.
Fix: For IPv4, default route is learned via DHCPv4 and user is allowed to change metric using config etc/network/interfaces. But for IPv6, default route can be learned via RA, for which, currently a fixed metric value 1024 is used. Ideally, user should be able to configure metric on default route for IPv6 similar to IPv4. This fix adds sysctl for the same. Signed-off-by: Praveen Chaudhary Signed-off-by: Zhenggen Xu --- Documentation/networking/ip-sysctl.rst | 8 include/linux/ipv6.h | 1 + include/net/ip6_route.h| 3 ++- include/uapi/linux/ipv6.h | 1 + include/uapi/linux/sysctl.h| 1 + net/ipv6/addrconf.c| 10 ++ net/ipv6/ndisc.c | 15 +++ net/ipv6/route.c | 8 +--- 8 files changed, 39 insertions(+), 8 deletions(-) diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst index dd2b12a..073c1f3 100644 --- a/Documentation/networking/ip-sysctl.rst +++ b/Documentation/networking/ip-sysctl.rst @@ -1871,6 +1871,14 @@ accept_ra_defrtr - BOOLEAN - enabled if accept_ra is enabled. - disabled if accept_ra is disabled. +accept_ra_defrtr_metric - INTEGER + Metric for default router learned in Router Advertisement. + + Functional default: + + * 0 if accept_ra_defrtr is enabled. + * Ignored, if accept_ra_defrtr is enabled. + accept_ra_from_local - BOOLEAN Accept RA with source-address that is found on local machine if the RA is otherwise proper and able to be accepted. diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h index dda61d1..19af90c 100644 --- a/include/linux/ipv6.h +++ b/include/linux/ipv6.h @@ -31,6 +31,7 @@ struct ipv6_devconf { __s32 max_desync_factor; __s32 max_addresses; __s32 accept_ra_defrtr; + __s32 accept_ra_defrtr_metric; __s32 accept_ra_min_hop_limit; __s32 accept_ra_pinfo; __s32 ignore_routes_with_linkdown; diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h index 2a52777..a470bda 100644 --- a/include/net/ip6_route.h +++ b/include/net/ip6_route.h @@ -174,7 +174,8 @@ struct fib6_info *rt6_get_dflt_router(struct net *net, struct net_device *dev); struct fib6_info *rt6_add_dflt_router(struct net *net, const struct in6_addr *gwaddr, -struct net_device *dev, unsigned int pref); +struct net_device *dev, unsigned int pref, +unsigned int defrtr_usr_metric); void rt6_purge_dflt_routers(struct net *net); diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h index 13e8751..945de5d 100644 --- a/include/uapi/linux/ipv6.h +++ b/include/uapi/linux/ipv6.h @@ -189,6 +189,7 @@ enum { DEVCONF_ACCEPT_RA_RT_INFO_MIN_PLEN, DEVCONF_NDISC_TCLASS, DEVCONF_RPL_SEG_ENABLED, + DEVCONF_ACCEPT_RA_DEFRTR_METRIC, DEVCONF_MAX }; diff --git a/include/uapi/linux/sysctl.h b/include/uapi/linux/sysctl.h index 458179d..5e79c19 100644 --- a/include/uapi/linux/sysctl.h +++ b/include/uapi/linux/sysctl.h @@ -571,6 +571,7 @@ enum { NET_IPV6_ACCEPT_SOURCE_ROUTE=25, NET_IPV6_ACCEPT_RA_FROM_LOCAL=26, NET_IPV6_ACCEPT_RA_RT_INFO_MIN_PLEN=27, + NET_IPV6_ACCEPT_RA_DEFRTR_METRIC=28, __NET_IPV6_MAX }; diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index eff2cac..702ec4a 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -205,6 +205,7 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = { .max_desync_factor = MAX_DESYNC_FACTOR, .max_addresses = IPV6_MAX_ADDRESSES, .accept_ra_defrtr = 1, + .accept_ra_defrtr_metric = 0, .accept_ra_from_local = 0, .accept_ra_min_hop_limit= 1, .accept_ra_pinfo= 1, @@ -260,6 +261,7 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = { .max_desync_factor = MAX_DESYNC_FACTOR, .max_addresses = IPV6_MAX_ADDRESSES, .accept_ra_defrtr = 1, + .accept_ra_defrtr_metric = 0, .accept_ra_from_local = 0, .accept_ra_min_hop_limit= 1, .accept_ra_pinfo= 1, @@ -5475,6 +5477,7 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf, array[DEVCONF_MAX_DESYNC_FACTOR] = cnf->max_desync_factor; array[DEVCONF_MAX_ADDRESSES] = cnf->max_addresses; array[DEVCONF_ACCEPT_RA_DEFRTR] = cnf->accept_ra_defrtr; + array[DEVCONF_ACCEPT_RA_DEFRTR_METRIC] = cnf->accept_ra_defrtr_metric; array[DEVCONF_ACCEPT_RA_MIN_HOP_LIMIT] = cnf->accept_ra_min_hop_limit; array[DEVCONF_ACCEPT_RA_PINFO] = cnf->acc
Re: BTFIDS: FAILED unresolved symbol udp6_sock
On Tue, Dec 29, 2020 at 11:28:35PM +, Qais Yousef wrote: > Hi Jiri > > On 12/29/20 18:34, Jiri Olsa wrote: > > On Tue, Dec 29, 2020 at 03:13:52PM +, Qais Yousef wrote: > > > Hi > > > > > > When I enable CONFIG_DEBUG_INFO_BTF I get the following error in the > > > BTFIDS > > > stage > > > > > > FAILED unresolved symbol udp6_sock > > > > > > I cross compile for arm64. My .config is attached. > > > > > > I managed to reproduce the problem on v5.9 and v5.10. Plus 5.11-rc1. > > > > > > Have you seen this before? I couldn't find a specific report about this > > > problem. > > > > > > Let me know if you need more info. > > > > hi, > > this looks like symptom of the gcc DWARF bug we were > > dealing with recently: > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97060 > > > > https://lore.kernel.org/lkml/CAE1WUT75gu9G62Q9uAALGN6vLX=o7vz9uhqtvwnbuv81dgm...@mail.gmail.com/#r > > > > what pahole/gcc version are you using? > > I'm on gcc 9.3.0 > > aarch64-linux-gnu-gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 > > I was on pahole v1.17. I moved to v1.19 but I still see the same problem. I can reproduce with your .config, but make 'defconfig' works, so I guess it's some config option issue, I'll check later today jirka
Re: Registering IRQ for MT7530 internal PHYs
Hi Heiner, Thanks for your reply. On Wed, Dec 30, 2020 at 3:39 PM Heiner Kallweit wrote: > I don't think that's the best option. I'm well aware of that. > You may want to add a PHY driver for your chip. Supposedly it > supports at least PHY suspend/resume. You can use the RTL8366RB > PHY driver as template. There's no MediaTek PHY driver yet. Do we really need a new one just for the interrupts? > > + dev_info_ratelimited(priv->dev, "interrupt status: 0x%08x\n", val); > > + dev_info_ratelimited(priv->dev, "interrupt enable: 0x%08x\n", > > mt7530_read(priv, MT7530_SYS_INT_EN)); > > + > This is debug code to be removed in the final version? Yes. > > + for (phy = 0; phy < MT7530_NUM_PHYS; phy++) { > > + if (val & BIT(phy)) { > > + unsigned int child_irq; > > + > > + child_irq = irq_find_mapping(priv->irq_domain, phy); > > + handle_nested_irq(child_irq); > > + handled = true; > > + } > > + } > > + > > + return handled ? IRQ_HANDLED : IRQ_NONE; > > IRQ_RETVAL() could be used here. Good to know :) > > > +} > > + > > +static void mt7530_irq_mask(struct irq_data *d) > > +{ > > + struct mt7530_priv *priv = irq_data_get_irq_chip_data(d); > > + > > + priv->irq_enable &= ~BIT(d->hwirq); > > Here you don't actually do something. HW doesn't support masking > interrupt generation for a port? priv->irq_enable will be written to MT7530_SYS_INT_EN in mt7530_irq_bus_sync_unlock. You can think of it as an inverted mask.
Re: Registering IRQ for MT7530 internal PHYs
On 30.12.2020 10:07, DENG Qingfang wrote: > Hi Heiner, > Thanks for your reply. > > On Wed, Dec 30, 2020 at 3:39 PM Heiner Kallweit wrote: >> I don't think that's the best option. > > I'm well aware of that. > >> You may want to add a PHY driver for your chip. Supposedly it >> supports at least PHY suspend/resume. You can use the RTL8366RB >> PHY driver as template. > > There's no MediaTek PHY driver yet. Do we really need a new one just > for the interrupts? > Not only for the interrupts. The genphy driver e.g. doesn't support PHY suspend/resume. And the PHY driver needs basically no code, just set the proper callbacks. >>> + dev_info_ratelimited(priv->dev, "interrupt status: 0x%08x\n", val); >>> + dev_info_ratelimited(priv->dev, "interrupt enable: 0x%08x\n", >>> mt7530_read(priv, MT7530_SYS_INT_EN)); >>> + >> This is debug code to be removed in the final version? > > Yes. > >>> + for (phy = 0; phy < MT7530_NUM_PHYS; phy++) { >>> + if (val & BIT(phy)) { >>> + unsigned int child_irq; >>> + >>> + child_irq = irq_find_mapping(priv->irq_domain, phy); >>> + handle_nested_irq(child_irq); >>> + handled = true; >>> + } >>> + } >>> + >>> + return handled ? IRQ_HANDLED : IRQ_NONE; >> >> IRQ_RETVAL() could be used here. > > Good to know :) > >> >>> +} >>> + >>> +static void mt7530_irq_mask(struct irq_data *d) >>> +{ >>> + struct mt7530_priv *priv = irq_data_get_irq_chip_data(d); >>> + >>> + priv->irq_enable &= ~BIT(d->hwirq); >> >> Here you don't actually do something. HW doesn't support masking >> interrupt generation for a port? > > priv->irq_enable will be written to MT7530_SYS_INT_EN in > mt7530_irq_bus_sync_unlock. You can think of it as an inverted mask. >
[PATCH v3 net-next] net: kcm: Replace fput with sockfd_put
The function sockfd_lookup uses fget on the value that is stored in the file field of the returned structure, so fput should ultimately be applied to this value. This can be done directly, but it seems better to use the specific macro sockfd_put, which does the same thing. Perform a source code refactoring by using the following semantic patch. // @@ expression s; @@ s = sockfd_lookup(...) ... + sockfd_put(s); - fput(s->file); // Signed-off-by: Zheng Yongjun --- net/kcm/kcmsock.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c index 56dad9565bc9..a9eb616f5521 100644 --- a/net/kcm/kcmsock.c +++ b/net/kcm/kcmsock.c @@ -1496,7 +1496,7 @@ static int kcm_attach_ioctl(struct socket *sock, struct kcm_attach *info) return 0; out: - fput(csock->file); + sockfd_put(csock); return err; } @@ -1644,7 +1644,7 @@ static int kcm_unattach_ioctl(struct socket *sock, struct kcm_unattach *info) spin_unlock_bh(&mux->lock); out: - fput(csock->file); + sockfd_put(csock); return err; } -- 2.22.0
Re: [PATCH] mlx4: style: replace zero-length array with flexible-array member.
On 12/30/2020 8:28 AM, YANG LI wrote: There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use "flexible array members"[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.9/process/ deprecated.html#zero-length-and-one-element-arrays Signed-off-by: YANG LI Reported-by: Abaci --- drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h index e8ed2319..4029a8b 100644 --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h @@ -314,7 +314,7 @@ struct mlx4_en_tx_ring { struct mlx4_en_rx_desc { /* actual number of entries depends on rx ring stride */ - struct mlx4_wqe_data_seg data[0]; + struct mlx4_wqe_data_seg data[]; }; struct mlx4_en_rx_ring { Reviewed-by: Tariq Toukan Thanks.
Re: Registering IRQ for MT7530 internal PHYs
Hi Qingfang, On 2020-12-30 04:22, DENG Qingfang wrote: Hi, I added MT7530 IRQ support and registered its internal PHYs to IRQ. It works but my patch used two hacks. 1. Removed phy_drv_supports_irq check, because config_intr and handle_interrupt are not set for Generic PHY. 2. Allocated ds->slave_mii_bus before calling ds->ops->setup, because we cannot call dsa_slave_mii_bus_init which is private. Any better ideas? Regards, Qingfang diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c index a67cac15a724..d59a8c50ede3 100644 --- a/drivers/net/dsa/mt7530.c +++ b/drivers/net/dsa/mt7530.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include #include @@ -1639,6 +1640,125 @@ mtk_get_tag_protocol(struct dsa_switch *ds, int port, } } +static irqreturn_t +mt7530_irq(int irq, void *data) +{ + struct mt7530_priv *priv = data; + bool handled = false; + int phy; + u32 val; + + val = mt7530_read(priv, MT7530_SYS_INT_STS); + mt7530_write(priv, MT7530_SYS_INT_STS, val); If that is an ack operation, it should be dealt with as such in an irqchip callback instead of being open-coded here. + + dev_info_ratelimited(priv->dev, "interrupt status: 0x%08x\n", val); + dev_info_ratelimited(priv->dev, "interrupt enable: 0x%08x\n", mt7530_read(priv, MT7530_SYS_INT_EN)); + I don't think printing these from an interrupt handler is a good idea. Use the debug primitives if you really want these messages. + for (phy = 0; phy < MT7530_NUM_PHYS; phy++) { + if (val & BIT(phy)) { + unsigned int child_irq; + + child_irq = irq_find_mapping(priv->irq_domain, phy); + handle_nested_irq(child_irq); + handled = true; + } + } + + return handled ? IRQ_HANDLED : IRQ_NONE; +} What is the reason for not implementing this as a chained interrupt flow? + +static void mt7530_irq_mask(struct irq_data *d) +{ + struct mt7530_priv *priv = irq_data_get_irq_chip_data(d); + + priv->irq_enable &= ~BIT(d->hwirq); +} + +static void mt7530_irq_unmask(struct irq_data *d) +{ + struct mt7530_priv *priv = irq_data_get_irq_chip_data(d); + + priv->irq_enable |= BIT(d->hwirq); +} + +static void mt7530_irq_bus_lock(struct irq_data *d) +{ + struct mt7530_priv *priv = irq_data_get_irq_chip_data(d); + + mutex_lock(&priv->reg_mutex); Are you always guaranteed to be in a thread context? I guess that is the case, given that you request a threaded interrupt, but it would be worth documenting. +} + +static void mt7530_irq_bus_sync_unlock(struct irq_data *d) +{ + struct mt7530_priv *priv = irq_data_get_irq_chip_data(d); + + mt7530_write(priv, MT7530_SYS_INT_EN, priv->irq_enable); + mutex_unlock(&priv->reg_mutex); +} + +static struct irq_chip mt7530_irq_chip = { + .name = MT7530_NAME, + .irq_mask = mt7530_irq_mask, + .irq_unmask = mt7530_irq_unmask, + .irq_bus_lock = mt7530_irq_bus_lock, + .irq_bus_sync_unlock = mt7530_irq_bus_sync_unlock, +}; + +static int +mt7530_irq_map(struct irq_domain *domain, unsigned int irq, + irq_hw_number_t hwirq) +{ + irq_set_chip_data(irq, domain->host_data); + irq_set_chip_and_handler(irq, &mt7530_irq_chip, handle_simple_irq); + irq_set_noprobe(irq); + + return 0; +} + +static const struct irq_domain_ops mt7530_irq_domain_ops = { + .map = mt7530_irq_map, + .xlate = irq_domain_xlate_onecell, +}; + +static void +mt7530_irq_init(struct mt7530_priv *priv) +{ + struct mii_bus *bus = priv->ds->slave_mii_bus; + struct device *dev = priv->dev; + struct device_node *np = dev->of_node; + int parent_irq; + int phy, ret; + + parent_irq = of_irq_get(np, 0); + if (parent_irq <= 0) { + dev_err(dev, "failed to get parent IRQ: %d\n", parent_irq); + return; It seems odd not to propagate the error, since I assume the device will not be functional. + } + + mt7530_set(priv, MT7530_TOP_SIG_CTRL, TOP_SIG_CTRL_NORMAL); + ret = devm_request_threaded_irq(dev, parent_irq, NULL, mt7530_irq, + IRQF_ONESHOT, MT7530_NAME, priv); + if (ret) { + dev_err(dev, "failed to request IRQ: %d\n", ret); + return; + } + + priv->irq_domain = irq_domain_add_linear(np, MT7530_NUM_PHYS, + &mt7530_irq_domain_ops, priv); The creation order is... interesting. You have a handler ready to fire, nothing seems to initialise the HW state, and the priv data structure is not fully populated. If any interrupt is pending at this stage, you have a panic in your hands. + if (!priv->irq_domain) { + dev_err(dev, "failed to create IRQ domain\n"); + return; + } + +
Re: [PATCH 07/21] vdpa: multiple address spaces support
On Wed, Dec 30, 2020 at 12:04:30PM +0800, Jason Wang wrote: > > On 2020/12/29 下午3:28, Eli Cohen wrote: > > > @@ -43,6 +43,8 @@ struct vdpa_vq_state { > > >* @index: device index > > >* @features_valid: were features initialized? for legacy guests > > >* @nvqs: the number of virtqueues > > > + * @ngroups: the number of virtqueue groups > > > + * @nas: the number of address spaces > > I am not sure these can be categorised as part of the state of the VQ. > > It's more of a property so maybe we can have a callback to get the > > properties of the VQ? > > > Or maybe there's a misunderstanding of the patch. > Yes, I misinterpreted the hunk. No issue here. > Those two attributes belongs to vdpa_device instead of vdpa_vq_state > actually. > > Thanks >
[PATCH] selftests: xfrm: fix test return value override issue in xfrm_policy.sh
When running this xfrm_policy.sh test script, even with some cases marked as FAIL, the overall test result will still be PASS: $ sudo ./xfrm_policy.sh PASS: policy before exception matches FAIL: expected ping to .254 to fail (exceptions) PASS: direct policy matches (exceptions) PASS: policy matches (exceptions) FAIL: expected ping to .254 to fail (exceptions and block policies) PASS: direct policy matches (exceptions and block policies) PASS: policy matches (exceptions and block policies) FAIL: expected ping to .254 to fail (exceptions and block policies after hresh changes) PASS: direct policy matches (exceptions and block policies after hresh changes) PASS: policy matches (exceptions and block policies after hresh changes) FAIL: expected ping to .254 to fail (exceptions and block policies after hthresh change in ns3) PASS: direct policy matches (exceptions and block policies after hthresh change in ns3) PASS: policy matches (exceptions and block policies after hthresh change in ns3) FAIL: expected ping to .254 to fail (exceptions and block policies after htresh change to normal) PASS: direct policy matches (exceptions and block policies after htresh change to normal) PASS: policy matches (exceptions and block policies after htresh change to normal) PASS: policies with repeated htresh change $ echo $? 0 This is because the $lret in check_xfrm() is not a local variable. Therefore when a test failed in check_exceptions(), the non-zero $lret will later get reset to 0 when the next test calls check_xfrm(). With this fix, the final return value will be 1. Make it easier for testers to spot this failure. Fixes: 39aa6928d462d0 ("xfrm: policy: fix netlink/pf_key policy lookups") Signed-off-by: Po-Hsu Lin --- tools/testing/selftests/net/xfrm_policy.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/xfrm_policy.sh b/tools/testing/selftests/net/xfrm_policy.sh index 7a1bf94..5922941 100755 --- a/tools/testing/selftests/net/xfrm_policy.sh +++ b/tools/testing/selftests/net/xfrm_policy.sh @@ -202,7 +202,7 @@ check_xfrm() { # 1: iptables -m policy rule count != 0 rval=$1 ip=$2 - lret=0 + local lret=0 ip netns exec ns1 ping -q -c 1 10.0.2.$ip > /dev/null -- 2.7.4
Re: [PATCH 12/21] vhost-vdpa: introduce uAPI to get the number of virtqueue groups
On Wed, Dec 16, 2020 at 02:48:09PM +0800, Jason Wang wrote: > Follows the vDPA support for multiple address spaces, this patch > introduce uAPI for the userspace to know the number of virtqueue > groups supported by the vDPA device. > > Signed-off-by: Jason Wang > --- > drivers/vhost/vdpa.c | 4 > include/uapi/linux/vhost.h | 3 +++ > 2 files changed, 7 insertions(+) > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c > index 060d5b5b7e64..1ba5901b28e7 100644 > --- a/drivers/vhost/vdpa.c > +++ b/drivers/vhost/vdpa.c > @@ -536,6 +536,10 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep, > case VHOST_VDPA_GET_VRING_NUM: > r = vhost_vdpa_get_vring_num(v, argp); > break; > + case VHOST_VDPA_GET_GROUP_NUM: > + r = copy_to_user(argp, &v->vdpa->ngroups, > + sizeof(v->vdpa->ngroups)); > + break; Is this and other ioctls already supported in qemu? > case VHOST_SET_LOG_BASE: > case VHOST_SET_LOG_FD: > r = -ENOIOCTLCMD; > diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h > index 59c6c0fbaba1..8a4e6e426bbf 100644 > --- a/include/uapi/linux/vhost.h > +++ b/include/uapi/linux/vhost.h > @@ -145,4 +145,7 @@ > /* Get the valid iova range */ > #define VHOST_VDPA_GET_IOVA_RANGE_IOR(VHOST_VIRTIO, 0x78, \ >struct vhost_vdpa_iova_range) > +/* Get the number of virtqueue groups. */ > +#define VHOST_VDPA_GET_GROUP_NUM _IOR(VHOST_VIRTIO, 0x79, unsigned int) > + > #endif > -- > 2.25.1 >
Re: Re: [RFC v2 09/13] vduse: Add support for processing vhost iotlb message
On Wed, Dec 30, 2020 at 4:41 PM Jason Wang wrote: > > > On 2020/12/30 下午3:09, Yongji Xie wrote: > > On Wed, Dec 30, 2020 at 2:11 PM Jason Wang wrote: > >> > >> On 2020/12/29 下午6:26, Yongji Xie wrote: > >>> On Tue, Dec 29, 2020 at 5:11 PM Jason Wang wrote: > > - Original Message - > > On Mon, Dec 28, 2020 at 4:43 PM Jason Wang wrote: > >> On 2020/12/28 下午4:14, Yongji Xie wrote: > I see. So all the above two questions are because > VHOST_IOTLB_INVALIDATE > is expected to be synchronous. This need to be solved by tweaking the > current VDUSE API or we can re-visit to go with descriptors relaying > first. > > >>> Actually all vdpa related operations are synchronous in current > >>> implementation. The ops.set_map/dma_map/dma_unmap should not return > >>> until the VDUSE_UPDATE_IOTLB/VDUSE_INVALIDATE_IOTLB message is replied > >>> by userspace. Could it solve this problem? > >> I was thinking whether or not we need to generate IOTLB_INVALIDATE > >> message to VDUSE during dma_unmap (vduse_dev_unmap_page). > >> > >> If we don't, we're probably fine. > >> > > It seems not feasible. This message will be also used in the > > virtio-vdpa case to notify userspace to unmap some pages during > > consistent dma unmapping. Maybe we can document it to make sure the > > users can handle the message correctly. > Just to make sure I understand your point. > > Do you mean you plan to notify the unmap of 1) streaming DMA or 2) > coherent DMA? > > For 1) you probably need a workqueue to do that since dma unmap can > be done in irq or bh context. And if usrspace does't do the unmap, it > can still access the bounce buffer (if you don't zap pte)? > > >>> I plan to do it in the coherent DMA case. > >> > >> Any reason for treating coherent DMA differently? > >> > > Now the memory of the bounce buffer is allocated page by page in the > > page fault handler. So it can't be used in coherent DMA mapping case > > which needs some memory with contiguous virtual addresses. I can use > > vmalloc() to do allocation for the bounce buffer instead. But it might > > cause some memory waste. Any suggestion? > > > I may miss something. But I don't see a relationship between the > IOTLB_UNMAP and vmalloc(). > In the vmalloc() case, the coherent DMA page will be taken from the memory allocated by vmalloc(). So IOTLB_UNMAP is not needed anymore during coherent DMA unmapping because those vmalloc'ed memory which has been mapped into userspace address space during initialization can be reused. And userspace should not unmap the region until we destroy the device. > > > > >>> It's true that userspace can > >>> access the dma buffer if userspace doesn't do the unmap. But the dma > >>> pages would not be freed and reused unless user space called munmap() > >>> for them. > >> > >> I wonder whether or not we could recycle IOVA in this case to avoid the > >> IOTLB_UMAP message. > >> > > We can achieve that if we use vmalloc() to do allocation for the > > bounce buffer which can be used in coherent DMA mapping case. But > > looks like we still have no way to avoid the IOTLB_UMAP message in > > vhost-vdpa case. > > > I think that's fine. For virtio-vdpa, from VDUSE userspace perspective, > it works like a driver that is using SWIOTLB in this case. > OK, will do it in v3. Thanks, Yongji
[PATCH v2] net: macb: Correct usage of MACB_CAPS_CLK_HW_CHG flag
A new flag MACB_CAPS_CLK_HW_CHG was added and all callers of macb_set_tx_clk were gated on the presence of this flag. - if (!clk) + if (!bp->tx_clk || !(bp->caps & MACB_CAPS_CLK_HW_CHG)) However the flag was not added to anything other than the new sama7g5_gem, turning that function call into a no op for all other systems. This breaks the networking on Zynq. The commit message adding this states: a new capability so that macb_set_tx_clock() to not be called for IPs having this capability This strongly implies that present of the flag was intended to skip the function not absence of the flag. Update the if statement to this effect, which repairs the existing users. Fixes: daafa1d33cc9 ("net: macb: add capability to not set the clock rate") Suggested-by: Andrew Lunn Signed-off-by: Charles Keepax --- Changes since v1: - Updated flag semantics to skip function, as appears to have been intended by the initial commit. Thanks, Charles drivers/net/ethernet/cadence/macb_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index d5d910916c2e8..814a5b10141d1 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -467,7 +467,7 @@ static void macb_set_tx_clk(struct macb *bp, int speed) { long ferr, rate, rate_rounded; - if (!bp->tx_clk || !(bp->caps & MACB_CAPS_CLK_HW_CHG)) + if (!bp->tx_clk || (bp->caps & MACB_CAPS_CLK_HW_CHG)) return; switch (speed) { -- 2.11.0
[PATCH net] macvlan: fix null pointer dereference in macvlan_changelink_sources()
From: Yunjian Wang Currently pointer data is dereferenced when declaring addr before pointer data is null checked. This could lead to a null pointer dereference. Fix this by checking if pointer data is null first. Fixes: 79cf79abce71 ("macvlan: add source mode") Signed-off-by: Yunjian Wang --- drivers/net/macvlan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index fb51329f8964..e412fd6b6798 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -1356,7 +1356,7 @@ static int macvlan_changelink_sources(struct macvlan_dev *vlan, u32 mode, struct nlattr *nla, *head; struct macvlan_source_entry *entry; - if (data[IFLA_MACVLAN_MACADDR]) + if (data && data[IFLA_MACVLAN_MACADDR]) addr = nla_data(data[IFLA_MACVLAN_MACADDR]); if (mode == MACVLAN_MACADDR_ADD) { -- 2.23.0
Re: [PATCH net-next] net/mlxfw: Use kzalloc for allocating only one thing
On Wed, Dec 30, 2020 at 04:18:35PM +0800, Zheng Yongjun wrote: > Use kzalloc rather than kcalloc(1,...) > > The semantic patch that makes this change is as follows: > (http://coccinelle.lip6.fr/) > > // > @@ > @@ > > - kcalloc(1, > + kzalloc( > ...) > // > > Signed-off-by: Zheng Yongjun Reviewed-by: Ido Schimmel Thanks
Re: [PATCH] CDC-NCM: remove "connected" log message
Am Dienstag, den 29.12.2020, 11:50 -0800 schrieb Roland Dreier: > > I looked at them again and found that there is a way to get > > the same effect that will make maintenance easier in the long run. > > Could I send them to you later this week for testing? > > Yes, please. I have a good test setup now so I can easily try out patches. Thank you, here we go. Regards Oliver From 7dfb5f35933ebbe12076e41606b178fa7d8d2e7b Mon Sep 17 00:00:00 2001 From: Oliver Neukum Date: Tue, 1 Dec 2020 11:31:15 +0100 Subject: [PATCH 1/2] usbnet: add method for reporting speed without MDIO The old method for reporting network speed upwards assumed that a device uses MDIO and uses the generic phy functions based on that. Add a a primitive internal version not making the assumption reporting back directly what the status operations record. Signed-off-by: Oliver Neukum --- drivers/net/usb/usbnet.c | 30 +- include/linux/usb/usbnet.h | 7 ++- 2 files changed, 35 insertions(+), 2 deletions(-) diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c index 1447da1d5729..bcd17f6d6de6 100644 --- a/drivers/net/usb/usbnet.c +++ b/drivers/net/usb/usbnet.c @@ -944,7 +944,7 @@ EXPORT_SYMBOL_GPL(usbnet_open); * they'll probably want to use this base set. */ -int usbnet_get_link_ksettings(struct net_device *net, +int usbnet_get_link_ksettings_mdio(struct net_device *net, struct ethtool_link_ksettings *cmd) { struct usbnet *dev = netdev_priv(net); @@ -956,6 +956,32 @@ int usbnet_get_link_ksettings(struct net_device *net, return 0; } +EXPORT_SYMBOL_GPL(usbnet_get_link_ksettings_mdio); + +int usbnet_get_link_ksettings(struct net_device *net, + struct ethtool_link_ksettings *cmd) +{ + struct usbnet *dev = netdev_priv(net); + + /* the assumption that speed is equal on tx and rx + * is deeply engrained into the networking layer. + * For wireless stuff it is not true. + * We assume that rxspeed matters more. + */ + if (dev->rxspeed != SPEED_UNKNOWN) + cmd->base.speed = dev->rxspeed / 100; + else if (dev->txspeed != SPEED_UNKNOWN) + cmd->base.speed = dev->txspeed / 100; + /* if a minidriver does not record speed we try to + * fall back on MDIO + */ + else if (!dev->mii.mdio_read) + cmd->base.speed = SPEED_UNKNOWN; + else + mii_ethtool_get_link_ksettings(&dev->mii, cmd); + + return 0; +} EXPORT_SYMBOL_GPL(usbnet_get_link_ksettings); int usbnet_set_link_ksettings(struct net_device *net, @@ -1661,6 +1687,8 @@ usbnet_probe (struct usb_interface *udev, const struct usb_device_id *prod) dev->intf = udev; dev->driver_info = info; dev->driver_name = name; + dev->rxspeed = -1; /* unknown or handled by MII */ + dev->txspeed = -1; net->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats); if (!net->tstats) diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h index 88a7673894d5..f748c758f82a 100644 --- a/include/linux/usb/usbnet.h +++ b/include/linux/usb/usbnet.h @@ -53,6 +53,8 @@ struct usbnet { u32 hard_mtu; /* count any extra framing */ size_t rx_urb_size; /* size for rx urbs */ struct mii_if_info mii; + int rxspeed; /* if MII is not used */ + int txspeed; /* if MII is not used */ /* various kinds of pending driver work */ struct sk_buff_head rxq; @@ -267,8 +269,11 @@ extern void usbnet_purge_paused_rxq(struct usbnet *); extern int usbnet_get_link_ksettings(struct net_device *net, struct ethtool_link_ksettings *cmd); -extern int usbnet_set_link_ksettings(struct net_device *net, +extern int usbnet_set_link_ksettings_mdio(struct net_device *net, const struct ethtool_link_ksettings *cmd); +/* Legacy - to be used if you really need an error to be returned */ +extern int usbnet_set_link_ksettings(struct net_device *net, + const struct ethtool_link_ksettings *cmd); extern u32 usbnet_get_link(struct net_device *net); extern u32 usbnet_get_msglevel(struct net_device *); extern void usbnet_set_msglevel(struct net_device *, u32); -- 2.26.2 From 447850c0e90aef5a8bb569d3888c76032a82c7c7 Mon Sep 17 00:00:00 2001 From: Oliver Neukum Date: Tue, 1 Dec 2020 11:33:38 +0100 Subject: [PATCH 2/2] CDC-NCM: record speed in status method The driver has a status method for receiving speed updates. The framework, however, had support functions only for devices that reported their speed upon an explicit query over a MDIO interface. CDC_NCM however gets direct notifications from the device. As new support functions have become available, we shall now record such notifications and tell the usbnet framework to make direct use of them without going through the PHY layer. Signed-off-by: Oliver Neukum --- drivers/net/usb/cdc_ncm.c | 17 ++--- 1 file changed, 2 insertions(+), 15 deletions(-) diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c index 2bac57d5e8d5..78cb3da8de0b 100644 --- a/drivers/net/usb/cdc_ncm.c +++ b/drivers/net/usb/cdc_ncm.c @@ -18
[PATCH net] selftests: mlxsw: Set headroom size of correct port
From: Ido Schimmel The test was setting the headroom size of the wrong port. This was not visible because of a firmware bug that canceled this bug. Set the headroom size of the correct port, so that the test will pass with both old and new firmware versions. Fixes: bfa804784e32 ("selftests: mlxsw: Add a PFC test") Signed-off-by: Ido Schimmel Reviewed-by: Petr Machata --- tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh b/tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh index 4d900bc1f76c..5c7700212f75 100755 --- a/tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh +++ b/tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh @@ -230,7 +230,7 @@ switch_create() __mlnx_qos -i $swp4 --pfc=0,1,0,0,0,0,0,0 >/dev/null # PG0 will get autoconfigured to Xoff, give PG1 arbitrarily 100K, which # is (-2*MTU) about 80K of delay provision. - __mlnx_qos -i $swp3 --buffer_size=0,$_100KB,0,0,0,0,0,0 >/dev/null + __mlnx_qos -i $swp4 --buffer_size=0,$_100KB,0,0,0,0,0,0 >/dev/null # bridges # --- -- 2.29.2
Re: [PATCH] xfrm: Fix wraparound in xfrm_policy_addr_delta()
On Tue, Dec 29, 2020 at 05:01:27PM +0100, Florian Westphal wrote: > Visa Hankala wrote: > > Use three-way comparison for address elements to avoid integer > > wraparound in the result of xfrm_policy_addr_delta(). > > > > This ensures that the search trees are built and traversed correctly > > when the difference between compared address elements is larger > > than INT_MAX. > > Please provide an update to tools/testing/selftests/net/xfrm_policy.sh > that shows that this is a problem. I will do that in the next revision. > > switch (family) { > > case AF_INET: > > - if (sizeof(long) == 4 && prefixlen == 0) > > - return ntohl(a->a4) - ntohl(b->a4); > > - return (ntohl(a->a4) & ((~0UL << (32 - prefixlen - > > - (ntohl(b->a4) & ((~0UL << (32 - prefixlen; > > + mask = ~0U << (32 - prefixlen); > > + ma = ntohl(a->a4) & mask; > > + mb = ntohl(b->a4) & mask; > > This is suspicious. Is prefixlen == 0 impossible? > > If not, then after patch > mask = ~0U << 32; > > ... and function returns 0. prefixlen == 0 is possible. However, I now realize that left shift by 32 should be avoided with 32-bit integers. With prefixlen == 0, there is only one equivalence class, so returning 0 seems reasonable to me. Is there a reason why the function has treated /0 prefix as /32 with IPv4? IPv6 does not have this treatment.
[PATCH net] net: ipv6: Validate GSO SKB before finish IPv6 processing
There are cases where GSO segment's length exceeds the egress MTU. If so: - Consume the SKB and its segments. - Issue an ICMP packet with 'Packet Too Big' message containing the MTU, allowing the source host to reduce its Path MTU appropriately. Note: These cases are handled in the same manner in IPv4 output finish. This patch aligns the behavior of IPv6 and IPv4. Fixes: 9e50849054a4 ("netfilter: ipv6: move POSTROUTING invocation before fragmentation") Signed-off-by: Aya Levin Reviewed-by: Tariq Toukan --- net/ipv6/ip6_output.c | 41 - 1 file changed, 40 insertions(+), 1 deletion(-) Hi, Please queue to -stable >= v2.6.35. Daniel Axtens, does this solve the issue referred in your commit? 8914a595110a bnx2x: disable GSO where gso_size is too big for hardware Thanks, Aya diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 749ad72386b2..36466f2211bd 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -125,8 +125,43 @@ static int ip6_finish_output2(struct net *net, struct sock *sk, struct sk_buff * return -EINVAL; } +static int +ip6_finish_output_gso_slowpath_drop(struct net *net, struct sock *sk, + struct sk_buff *skb, unsigned int mtu) +{ + struct sk_buff *segs, *nskb; + netdev_features_t features; + int ret; + + /* Please see corresponding comment in ip_finish_output_gso +* describing the cases where GSO segment length exceeds the +* egress MTU. +*/ + features = netif_skb_features(skb); + segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK); + if (IS_ERR_OR_NULL(segs)) { + kfree_skb(skb); + return -ENOMEM; + } + + consume_skb(skb); + + skb_list_walk_safe(segs, segs, nskb) { + int err; + + skb_mark_not_on_list(segs); + err = ip6_fragment(net, sk, segs, ip6_finish_output2); + if (err && ret == 0) + ret = err; + } + + return ret; +} + static int __ip6_finish_output(struct net *net, struct sock *sk, struct sk_buff *skb) { + unsigned int mtu; + #if defined(CONFIG_NETFILTER) && defined(CONFIG_XFRM) /* Policy lookup after SNAT yielded a new policy */ if (skb_dst(skb)->xfrm) { @@ -135,7 +170,11 @@ static int __ip6_finish_output(struct net *net, struct sock *sk, struct sk_buff } #endif - if ((skb->len > ip6_skb_dst_mtu(skb) && !skb_is_gso(skb)) || + mtu = ip6_skb_dst_mtu(skb); + if (skb_is_gso(skb) && !skb_gso_validate_network_len(skb, mtu)) + return ip6_finish_output_gso_slowpath_drop(net, sk, skb, mtu); + + if ((skb->len > mtu && !skb_is_gso(skb)) || dst_allfrag(skb_dst(skb)) || (IP6CB(skb)->frag_max_size && skb->len > IP6CB(skb)->frag_max_size)) return ip6_fragment(net, sk, skb, ip6_finish_output2); -- 2.14.1
Re: [PATCH rfc 2/3] virtio-net: support receive timestamp
On Mon, Dec 28, 2020 at 07:57:20PM -0500, Willem de Bruijn wrote: > There is a well understood method for synchronizing guest and host > clock in KVM using ptp_kvm. For virtual environments without NIC > hardware offload, the when host timestamps in software, this suffices. > > Syncing host with NIC is assumed if the host advertises the feature > and implements using real hardware timestamps. Sounds reasonable. Thanks, Richard
Re: [PATCH rfc 3/3] virtio-net: support transmit timestamp
On Mon, Dec 28, 2020 at 11:22:33AM -0500, Willem de Bruijn wrote: > From: Willem de Bruijn > > Add optional delivery time (SO_TXTIME) offload for virtio-net. > > The Linux TCP/IP stack tries to avoid bursty transmission and network > congestion through pacing: computing an skb delivery time based on > congestion information. Userspace protocol implementations can achieve > the same with SO_TXTIME. This may also reduce scheduling jitter and > improve RTT estimation. This description is clear, but the Subject line is confusing. It made me wonder whether this series is somehow about host/guest synchronization (but your comments do explain that that isn't the case). How about this instead? virtio-net: support future packet transmit time Thanks, Richard
Re: [PATCH] xfrm: Fix wraparound in xfrm_policy_addr_delta()
Visa Hankala wrote: > On Tue, Dec 29, 2020 at 05:01:27PM +0100, Florian Westphal wrote: > > This is suspicious. Is prefixlen == 0 impossible? > > > > If not, then after patch > > mask = ~0U << 32; > > > > ... and function returns 0. > > With prefixlen == 0, there is only one equivalence class, so > returning 0 seems reasonable to me. Right, that seems reasonable indeed. > Is there a reason why the function has treated /0 prefix as /32 > with IPv4? IPv6 does not have this treatment. Not that I recall, looks like a bug.
[PATCH 2/2] net: ks8851: Register MDIO bus and the internal PHY
The KS8851 has a reduced internal PHY, which is accessible through its registers at offset 0xe4. The PHY is compatible with KS886x PHY present in Micrel switches, except the PHY ID Low/High registers are swapped. Register MDIO bus so this PHY can be detected and probed by phylib. Signed-off-by: Marek Vasut Cc: Andrew Lunn Cc: Heiner Kallweit Cc: Lukas Wunner To: netdev@vger.kernel.org --- drivers/net/ethernet/micrel/ks8851.h| 1 + drivers/net/ethernet/micrel/ks8851_common.c | 69 - 2 files changed, 67 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/micrel/ks8851.h b/drivers/net/ethernet/micrel/ks8851.h index 2b319e451121..9bed9e024d81 100644 --- a/drivers/net/ethernet/micrel/ks8851.h +++ b/drivers/net/ethernet/micrel/ks8851.h @@ -403,6 +403,7 @@ struct ks8851_net { struct regulator*vdd_reg; struct regulator*vdd_io; int gpio; + struct mii_bus *mii_bus; void(*lock)(struct ks8851_net *ks, unsigned long *flags); diff --git a/drivers/net/ethernet/micrel/ks8851_common.c b/drivers/net/ethernet/micrel/ks8851_common.c index 6fc7483aea03..c4f5cb9768ef 100644 --- a/drivers/net/ethernet/micrel/ks8851_common.c +++ b/drivers/net/ethernet/micrel/ks8851_common.c @@ -23,6 +23,7 @@ #include #include +#include #include #include "ks8851.h" @@ -983,6 +984,24 @@ static void ks8851_phy_write(struct net_device *dev, } } +static int ks8851_mdio_read(struct mii_bus *bus, int phy_id, int reg) +{ + struct ks8851_net *ks = bus->priv; + + if (phy_id != 0) + return 0x; + + return ks8851_phy_read(ks->netdev, phy_id, reg); +} + +static int ks8851_mdio_write(struct mii_bus *bus, int phy_id, int reg, u16 val) +{ + struct ks8851_net *ks = bus->priv; + + ks8851_phy_write(ks->netdev, phy_id, reg, val); + return 0; +} + /** * ks8851_read_selftest - read the selftest memory info. * @ks: The device state @@ -1046,6 +1065,42 @@ int ks8851_resume(struct device *dev) } #endif +static int ks8851_register_mdiobus(struct ks8851_net *ks, struct device *dev) +{ + struct mii_bus *mii_bus; + int ret; + + mii_bus = mdiobus_alloc(); + if (!mii_bus) + return -ENOMEM; + + mii_bus->name = "ks8851_eth_mii"; + mii_bus->read = ks8851_mdio_read; + mii_bus->write = ks8851_mdio_write; + mii_bus->priv = ks; + mii_bus->parent = dev; + mii_bus->phy_mask = ~BIT(0); + snprintf(mii_bus->id, MII_BUS_ID_SIZE, "%s", dev_name(dev)); + + ret = mdiobus_register(mii_bus); + if (ret) + goto err_mdiobus_register; + + ks->mii_bus = mii_bus; + + return 0; + +err_mdiobus_register: + mdiobus_free(mii_bus); + return ret; +} + +static void ks8851_unregister_mdiobus(struct ks8851_net *ks) +{ + mdiobus_unregister(ks->mii_bus); + mdiobus_free(ks->mii_bus); +} + int ks8851_probe_common(struct net_device *netdev, struct device *dev, int msg_en) { @@ -1104,6 +1159,8 @@ int ks8851_probe_common(struct net_device *netdev, struct device *dev, INIT_WORK(&ks->rxctrl_work, ks8851_rxctrl_work); + SET_NETDEV_DEV(netdev, dev); + /* setup EEPROM state */ ks->eeprom.data = ks; ks->eeprom.width = PCI_EEPROM_WIDTH_93C46; @@ -1120,6 +1177,10 @@ int ks8851_probe_common(struct net_device *netdev, struct device *dev, dev_info(dev, "message enable is %d\n", msg_en); + ret = ks8851_register_mdiobus(ks, dev); + if (ret) + goto err_mdio; + /* set the default message enable */ ks->msg_enable = netif_msg_init(msg_en, NETIF_MSG_DRV | NETIF_MSG_PROBE | @@ -1128,7 +1189,6 @@ int ks8851_probe_common(struct net_device *netdev, struct device *dev, skb_queue_head_init(&ks->txq); netdev->ethtool_ops = &ks8851_ethtool_ops; - SET_NETDEV_DEV(netdev, dev); dev_set_drvdata(dev, ks); @@ -1156,7 +1216,7 @@ int ks8851_probe_common(struct net_device *netdev, struct device *dev, ret = register_netdev(netdev); if (ret) { dev_err(dev, "failed to register network device\n"); - goto err_netdev; + goto err_id; } netdev_info(netdev, "revision %d, MAC %pM, IRQ %d, %s EEPROM\n", @@ -1165,8 +1225,9 @@ int ks8851_probe_common(struct net_device *netdev, struct device *dev, return 0; -err_netdev: err_id: + ks8851_unregister_mdiobus(ks); +err_mdio: if (gpio_is_valid(gpio)) gpio_set_value(gpio, 0); regulator_disable(ks->vdd_reg); @@ -1180,6 +1241,8 @@ int ks8851_remove_common(struct device *dev) { struct ks8851_net *priv = dev_get_drvdata(dev); + ks8851_unregister_mdiobus(pri
[PATCH 1/2] net: phy: micrel: Add KS8851 PHY support
The KS8851 has a reduced internal PHY, which is accessible through its registers at offset 0xe4. The PHY is compatible with KS886x PHY present in Micrel switches, except the PHY ID Low/High registers are swapped. Add PHY ID for this KS8851 PHY and use custom PHY ID mask due to the swap. Signed-off-by: Marek Vasut Cc: Andrew Lunn Cc: Heiner Kallweit Cc: Lukas Wunner To: netdev@vger.kernel.org --- drivers/net/phy/micrel.c | 9 + include/linux/micrel_phy.h | 2 ++ 2 files changed, 11 insertions(+) diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c index 54e0d75203da..ca6da128e37a 100644 --- a/drivers/net/phy/micrel.c +++ b/drivers/net/phy/micrel.c @@ -1386,6 +1386,14 @@ static struct phy_driver ksphy_driver[] = { .read_status= ksz8873mll_read_status, .suspend= genphy_suspend, .resume = genphy_resume, +}, { + .phy_id = PHY_ID_KSZ8851, + .phy_id_mask= 0xfff000ff, + .name = "Micrel KSZ8851 Ethernet MAC", + /* PHY_BASIC_FEATURES */ + .config_init= kszphy_config_init, + .suspend= genphy_suspend, + .resume = genphy_resume, }, { .phy_id = PHY_ID_KSZ886X, .phy_id_mask= MICREL_PHY_ID_MASK, @@ -1432,6 +1440,7 @@ static struct mdio_device_id __maybe_unused micrel_tbl[] = { { PHY_ID_KSZ8061, MICREL_PHY_ID_MASK }, { PHY_ID_KSZ8081, MICREL_PHY_ID_MASK }, { PHY_ID_KSZ8873MLL, MICREL_PHY_ID_MASK }, + { PHY_ID_KSZ8851, 0xfff000ff }, { PHY_ID_KSZ886X, MICREL_PHY_ID_MASK }, { PHY_ID_LAN8814, MICREL_PHY_ID_MASK }, { } diff --git a/include/linux/micrel_phy.h b/include/linux/micrel_phy.h index 416ee6dd2574..1c26e4ac0dc9 100644 --- a/include/linux/micrel_phy.h +++ b/include/linux/micrel_phy.h @@ -29,6 +29,8 @@ #define PHY_ID_KSZ9131 0x00221640 #define PHY_ID_LAN8814 0x00221660 +/* The PHY ID Low/High registers are swapped on KSZ8851 */ +#define PHY_ID_KSZ8851 0x14300022 #define PHY_ID_KSZ886X 0x00221430 #define PHY_ID_KSZ8863 0x00221435 -- 2.29.2
Re: [PATCH] selftests: xfrm: fix test return value override issue in xfrm_policy.sh
Po-Hsu Lin wrote: > When running this xfrm_policy.sh test script, even with some cases > marked as FAIL, the overall test result will still be PASS: > > $ sudo ./xfrm_policy.sh > PASS: policy before exception matches > FAIL: expected ping to .254 to fail (exceptions) > PASS: direct policy matches (exceptions) > PASS: policy matches (exceptions) > FAIL: expected ping to .254 to fail (exceptions and block policies) > PASS: direct policy matches (exceptions and block policies) > PASS: policy matches (exceptions and block policies) > FAIL: expected ping to .254 to fail (exceptions and block policies after > hresh changes) > PASS: direct policy matches (exceptions and block policies after hresh > changes) > PASS: policy matches (exceptions and block policies after hresh changes) > FAIL: expected ping to .254 to fail (exceptions and block policies after > hthresh change in ns3) > PASS: direct policy matches (exceptions and block policies after hthresh > change in ns3) > PASS: policy matches (exceptions and block policies after hthresh change in > ns3) > FAIL: expected ping to .254 to fail (exceptions and block policies after > htresh change to normal) > PASS: direct policy matches (exceptions and block policies after htresh > change to normal) > PASS: policy matches (exceptions and block policies after htresh change to > normal) > PASS: policies with repeated htresh change > $ echo $? > 0 > > This is because the $lret in check_xfrm() is not a local variable. Acked-by: Florian Westphal
Announcing Netdev 0x15
While we bid goodbye to 2020, here's some good hopeful news for 2021. cheers, jamal Forwarded Message Subject: [NetDev-People] Announcing Netdev 0x15 Date: Sun, 27 Dec 2020 15:54:31 -0700 From: Tom Herbert via people Reply-To: Tom Herbert To: peo...@netdevconf.info Hello Netdev'ers, On behalf of Netdev Society, I am very pleased to announce the Netdev 0x15 conference! The tentative dates are July 20-23, 2021 and the location will be San Francisco, California, USA. Our current plan is to hold an in person conference assuming that the conditions around Covid-19 have improved, however if conditions warrant we do have the option of holding a virtual conference again. We will try to make a definitive announcement about this at least six weeks before the conference date. Additionally, it seems likely that even if we do hold the conference in person there will be a hybrid component to accommodate remote participants that wish to opt out of attending in person. The call for papers and workshops will be made early next year, so stay tuned! Thanks, Tom ___ people mailing list peo...@netdevconf.org https://lists.netdevconf.info/cgi-bin/mailman/listinfo/people
Re: BTFIDS: FAILED unresolved symbol udp6_sock
On Wed, Dec 30, 2020 at 02:28:02PM +0100, Jiri Olsa wrote: > On Wed, Dec 30, 2020 at 10:03:37AM +0100, Jiri Olsa wrote: > > On Tue, Dec 29, 2020 at 11:28:35PM +, Qais Yousef wrote: > > > Hi Jiri > > > > > > On 12/29/20 18:34, Jiri Olsa wrote: > > > > On Tue, Dec 29, 2020 at 03:13:52PM +, Qais Yousef wrote: > > > > > Hi > > > > > > > > > > When I enable CONFIG_DEBUG_INFO_BTF I get the following error in the > > > > > BTFIDS > > > > > stage > > > > > > > > > > FAILED unresolved symbol udp6_sock > > > > > > > > > > I cross compile for arm64. My .config is attached. > > > > > > > > > > I managed to reproduce the problem on v5.9 and v5.10. Plus 5.11-rc1. > > > > > > > > > > Have you seen this before? I couldn't find a specific report about > > > > > this > > > > > problem. > > > > > > > > > > Let me know if you need more info. > > > > > > > > hi, > > > > this looks like symptom of the gcc DWARF bug we were > > > > dealing with recently: > > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97060 > > > > > > > > https://lore.kernel.org/lkml/CAE1WUT75gu9G62Q9uAALGN6vLX=o7vz9uhqtvwnbuv81dgm...@mail.gmail.com/#r > > > > > > > > what pahole/gcc version are you using? > > > > > > I'm on gcc 9.3.0 > > > > > > aarch64-linux-gnu-gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 > > > > > > I was on pahole v1.17. I moved to v1.19 but I still see the same problem. > > > > I can reproduce with your .config, but make 'defconfig' works, > > so I guess it's some config option issue, I'll check later today > > so your .config has > CONFIG_CRYPTO_DEV_BCM_SPU=y > > and that defines 'struct device_private' which > clashes with the same struct defined in drivers/base/base.h > > so several networking structs will be doubled, like net_device: > > $ bpftool btf dump file ../vmlinux.config | grep net_device\' | grep > STRUCT > [2731] STRUCT 'net_device' size=2240 vlen=133 > [113981] STRUCT 'net_device' size=2240 vlen=133 > > each is using different 'struct device_private' when it's unwinded > > and that will confuse BTFIDS logic, becase we have multiple structs > with the same name, and we can't be sure which one to pick > > perhaps we should check on this in pahole and warn earlier with > better error message.. I'll check, but I'm not sure if pahole can > survive another hastab ;-) > > Andrii, any ideas on this? ;-) > > easy fix is the patch below that renames the bcm's structs, > it makes the kernel to compile.. but of course the new name > is probably wrong and we should push this through that code > authors also another quick fix is to switch it to module jirka
Re: BTFIDS: FAILED unresolved symbol udp6_sock
On Wed, Dec 30, 2020 at 10:03:37AM +0100, Jiri Olsa wrote: > On Tue, Dec 29, 2020 at 11:28:35PM +, Qais Yousef wrote: > > Hi Jiri > > > > On 12/29/20 18:34, Jiri Olsa wrote: > > > On Tue, Dec 29, 2020 at 03:13:52PM +, Qais Yousef wrote: > > > > Hi > > > > > > > > When I enable CONFIG_DEBUG_INFO_BTF I get the following error in the > > > > BTFIDS > > > > stage > > > > > > > > FAILED unresolved symbol udp6_sock > > > > > > > > I cross compile for arm64. My .config is attached. > > > > > > > > I managed to reproduce the problem on v5.9 and v5.10. Plus 5.11-rc1. > > > > > > > > Have you seen this before? I couldn't find a specific report about this > > > > problem. > > > > > > > > Let me know if you need more info. > > > > > > hi, > > > this looks like symptom of the gcc DWARF bug we were > > > dealing with recently: > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97060 > > > > > > https://lore.kernel.org/lkml/CAE1WUT75gu9G62Q9uAALGN6vLX=o7vz9uhqtvwnbuv81dgm...@mail.gmail.com/#r > > > > > > what pahole/gcc version are you using? > > > > I'm on gcc 9.3.0 > > > > aarch64-linux-gnu-gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 > > > > I was on pahole v1.17. I moved to v1.19 but I still see the same problem. > > I can reproduce with your .config, but make 'defconfig' works, > so I guess it's some config option issue, I'll check later today so your .config has CONFIG_CRYPTO_DEV_BCM_SPU=y and that defines 'struct device_private' which clashes with the same struct defined in drivers/base/base.h so several networking structs will be doubled, like net_device: $ bpftool btf dump file ../vmlinux.config | grep net_device\' | grep STRUCT [2731] STRUCT 'net_device' size=2240 vlen=133 [113981] STRUCT 'net_device' size=2240 vlen=133 each is using different 'struct device_private' when it's unwinded and that will confuse BTFIDS logic, becase we have multiple structs with the same name, and we can't be sure which one to pick perhaps we should check on this in pahole and warn earlier with better error message.. I'll check, but I'm not sure if pahole can survive another hastab ;-) Andrii, any ideas on this? ;-) easy fix is the patch below that renames the bcm's structs, it makes the kernel to compile.. but of course the new name is probably wrong and we should push this through that code authors jirka --- diff --git a/drivers/crypto/bcm/cipher.c b/drivers/crypto/bcm/cipher.c index 30390a7324b2..0e5537838ef3 100644 --- a/drivers/crypto/bcm/cipher.c +++ b/drivers/crypto/bcm/cipher.c @@ -42,7 +42,7 @@ /* = Device Structure == */ -struct device_private iproc_priv; +struct bcm_device_private iproc_priv; /* Parameters = */ diff --git a/drivers/crypto/bcm/cipher.h b/drivers/crypto/bcm/cipher.h index 0ad5892b445d..71281a3bdbdc 100644 --- a/drivers/crypto/bcm/cipher.h +++ b/drivers/crypto/bcm/cipher.h @@ -420,7 +420,7 @@ struct spu_hw { u32 num_chan; }; -struct device_private { +struct bcm_device_private { struct platform_device *pdev; struct spu_hw spu; @@ -467,6 +467,6 @@ struct device_private { struct mbox_chan **mbox; }; -extern struct device_private iproc_priv; +extern struct bcm_device_private iproc_priv; #endif diff --git a/drivers/crypto/bcm/util.c b/drivers/crypto/bcm/util.c index 2b304fc78059..77aeedb84055 100644 --- a/drivers/crypto/bcm/util.c +++ b/drivers/crypto/bcm/util.c @@ -348,7 +348,7 @@ char *spu_alg_name(enum spu_cipher_alg alg, enum spu_cipher_mode mode) static ssize_t spu_debugfs_read(struct file *filp, char __user *ubuf, size_t count, loff_t *offp) { - struct device_private *ipriv; + struct bcm_device_private *ipriv; char *buf; ssize_t ret, out_offset, out_count; int i;
Re: [PATCH v2 2/2] docs: networking: packet_mmap: fix old config reference
Hi Ulises, On Tue, Dec 29 2020, Ulises Alonso wrote: > Can you also replace the sentence > > "(like libpcap always does)." > > with > > "(which old libpcap versions used do)." Are you sure this change is correct? The text says: ... it requires two if you want to get packet's timestamp (like libpcap always does). I think libpcap still reads packets timestamps, though it most likely uses the newer interface for that. Maybe we should just drop the libpcap reference here? Thanks for reviewing the patch, baruch > On Tue, Dec 29, 2020 at 10:11 AM Baruch Siach wrote: > >> Before commit 889b8f964f2f ("packet: Kill CONFIG_PACKET_MMAP.") there >> used to be a CONFIG_PACKET_MMAP config symbol that depended on >> CONFIG_PACKET. The text still implies that PACKET_MMAP can be disabled. >> Remove that from the text, as well as reference to old kernel versions. >> >> Also, drop reference to broken link to information for pre 2.6.5 >> kernels. >> >> Make a slight working improvement (s/In/On/) while at it. >> >> Signed-off-by: Baruch Siach >> --- >> v2: Address comments from Jakub Kicinski and Willem de Bruijn >> >> * Don't change PACKET_MMAP >> >> * Remove mention of specific kernel versions >> >> * Don't reflow paragraphs >> >> * s/In/On/ >> --- >> Documentation/networking/packet_mmap.rst | 9 - >> 1 file changed, 4 insertions(+), 5 deletions(-) >> >> diff --git a/Documentation/networking/packet_mmap.rst >> b/Documentation/networking/packet_mmap.rst >> index f3646c80b019..500ef60b1b82 100644 >> --- a/Documentation/networking/packet_mmap.rst >> +++ b/Documentation/networking/packet_mmap.rst >> @@ -8,7 +8,7 @@ Abstract >> >> >> This file documents the mmap() facility available with the PACKET >> -socket interface on 2.4/2.6/3.x kernels. This type of sockets is used for >> +socket interface. This type of sockets is used for >> >> i) capture network traffic with utilities like tcpdump, >> ii) transmit network traffic, or any other that needs raw >> @@ -25,12 +25,12 @@ Please send your comments to >> Why use PACKET_MMAP >> === >> >> -In Linux 2.4/2.6/3.x if PACKET_MMAP is not enabled, the capture process >> is very >> +Non PACKET_MMAP capture process (plain AF_PACKET) is very >> inefficient. It uses very limited buffers and requires one system call to >> capture each packet, it requires two if you want to get packet's timestamp >> (like libpcap always does). >> >> -In the other hand PACKET_MMAP is very efficient. PACKET_MMAP provides a >> size >> +On the other hand PACKET_MMAP is very efficient. PACKET_MMAP provides a >> size >> configurable circular buffer mapped in user space that can be used to >> either >> send or receive packets. This way reading packets just needs to wait for >> them, >> most of the time there is no need to issue a single system call. >> Concerning >> @@ -252,8 +252,7 @@ PACKET_MMAP setting constraints >> >> In kernel versions prior to 2.4.26 (for the 2.4 branch) and 2.6.5 (2.6 >> branch), >> the PACKET_MMAP buffer could hold only 32768 frames in a 32 bit >> architecture or >> -16384 in a 64 bit architecture. For information on these kernel versions >> -see >> http://pusa.uv.es/~ulisses/packet_mmap/packet_mmap.pre-2.4.26_2.6.5.txt >> +16384 in a 64 bit architecture. >> >> Block size limit >> >> -- >> 2.29.2 -- ~. .~ Tk Open Systems =}ooO--U--Ooo{= - bar...@tkos.co.il - tel: +972.52.368.4656, http://www.tkos.co.il -
Re: [PATCH] ibmvnic: fix: NULL pointer dereference.
On Wed, Dec 30, 2020 at 03:23:14PM +0800, YANG LI wrote: > The error is due to dereference a null pointer in function > reset_one_sub_crq_queue(): > > if (!scrq) { > netdev_dbg(adapter->netdev, >"Invalid scrq reset. irq (%d) or msgs(%p).\n", > scrq->irq, scrq->msgs); > return -EINVAL; > } > > If the expression is true, scrq must be a null pointer and cannot > dereference. > > Signed-off-by: YANG LI > Reported-by: Abaci Fixes: 9281cf2d5840 ("ibmvnic: avoid memset null scrq msgs") > --- > drivers/net/ethernet/ibm/ibmvnic.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/drivers/net/ethernet/ibm/ibmvnic.c > b/drivers/net/ethernet/ibm/ibmvnic.c > index f302504..d7472be 100644 > --- a/drivers/net/ethernet/ibm/ibmvnic.c > +++ b/drivers/net/ethernet/ibm/ibmvnic.c > @@ -2981,9 +2981,7 @@ static int reset_one_sub_crq_queue(struct > ibmvnic_adapter *adapter, > int rc; > > if (!scrq) { > - netdev_dbg(adapter->netdev, > -"Invalid scrq reset. irq (%d) or msgs (%p).\n", > -scrq->irq, scrq->msgs); > + netdev_dbg(adapter->netdev, "Invalid scrq reset.\n"); > return -EINVAL; > } > > -- > 1.8.3.1 >
Re: [PATCH net-next v2 0/2] Add support for DSFP transceiver type
On 29-Dec-20 18:25, Andrew Lunn wrote: Hi Andrew, Following this conversation, I wrote some pseudocode checking if I'm on right path here. Please review: struct eeprom_page { u8 page_number; u8 bank_number; u16 offset; u16 data_length; u8 *data; } I'm wondering about offset and data_length, in this context. I would expect you always ask the kernel for the full page, not part of it. Even when user space asks for just part of a page. That keeps you cache management simpler. As far as I know, there may be bytes, which may change on read. For example, clear on read values in CMIS 4.0. Then, retrieving whole page every time may be incorrect. So I kept these for cases, when user asks for specific few bytes. But maybe some indicator of low/high is needed, since many pages are actually 1/2 pages? I was planning to use offset and data_length fields to indicate the available page. For example, high page will have offset 128 and data_length 128. The other thing to consider is SFF-8472 and its use of two different i2c addresses, A0h and A2h. These are different to pages and banks. I wasn't aware of that. It complicates things a bit, should we add a parameter of i2c address? So in this case page 0 will be with i2c address A0h. And if user needs page 0 from i2c address A2h, he will specify it in command line. And for other specs, this parameter will not be supported. print_human_readable() { spec_id = cache_get_page(0, 0, 0, 128)->data[0]; switch (spec_id) { case sff_: print_sff_(); case cmis_y: print_cmis_y(); default: print_hex(); } } You want to keep as much of the existing ethtool code as you can, but the basic idea looks O.K. Yes, under print_sff_(), for example, I meant using existing functions. Either as is, or refactoring according to cache requirements. getmodule_reply_cb() { if (offset || hex || bank_number || page number) print_hex(); else // if _human_readable() decoder needs more than page 00, it will // fetch it on demand print_human_readable(); } Things get interesting here. Say this is page 0, and print_human_readable() finds a bit indicating page 1 is valid. So it requests page 1. We go recursive. While deep down in print_human_readable(), we send the next netlink message and call getmodule_reply_cb() when the answer appears. I've had problems with some of the netlink code not liking recursive calls. So i suggest you try to find a different structure for the code. Try to complete the netlink call before doing the decoding. So add the page to the cache and then return. Do the decode after nlsock_sendmsg() has returned. I'm thinking about a standard-specific function, which will prefetch pages needed by print_human_readable(). It will check the standard ID, and go request pages and add them to the cache. Then, decoder kicks with already cached pages. This will eliminate recursive netlink calls. Driver It is required to implement get_module_eeprom_page() ndo, where it queries its EEPROM and copies to u8 *data array allocated by the kernel previously. The ndo has the following prototype: int get_module_eeprom_page(struct net_device *, u16 offset, u16 length, u8 page_number, u8 bank_number, u8 *data); I would include extack here, so we can get better error messages. OK, I will add extack. Thanks, Vlad
Re: BTFIDS: FAILED unresolved symbol udp6_sock
Em Wed, Dec 30, 2020 at 02:28:52PM +0100, Jiri Olsa escreveu: > On Wed, Dec 30, 2020 at 02:28:02PM +0100, Jiri Olsa wrote: > > On Wed, Dec 30, 2020 at 10:03:37AM +0100, Jiri Olsa wrote: > > > On Tue, Dec 29, 2020 at 11:28:35PM +, Qais Yousef wrote: > > > > Hi Jiri > > > > > > > > On 12/29/20 18:34, Jiri Olsa wrote: > > > > > On Tue, Dec 29, 2020 at 03:13:52PM +, Qais Yousef wrote: > > > > > > Hi > > > > > > > > > > > > When I enable CONFIG_DEBUG_INFO_BTF I get the following error in > > > > > > the BTFIDS > > > > > > stage > > > > > > > > > > > > FAILED unresolved symbol udp6_sock > > > > > > > > > > > > I cross compile for arm64. My .config is attached. > > > > > > > > > > > > I managed to reproduce the problem on v5.9 and v5.10. Plus 5.11-rc1. > > > > > > > > > > > > Have you seen this before? I couldn't find a specific report about > > > > > > this > > > > > > problem. > > > > > > > > > > > > Let me know if you need more info. > > > > > > > > > > hi, > > > > > this looks like symptom of the gcc DWARF bug we were > > > > > dealing with recently: > > > > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97060 > > > > > > > > > > https://lore.kernel.org/lkml/CAE1WUT75gu9G62Q9uAALGN6vLX=o7vz9uhqtvwnbuv81dgm...@mail.gmail.com/#r > > > > > > > > > > what pahole/gcc version are you using? > > > > > > > > I'm on gcc 9.3.0 > > > > > > > > aarch64-linux-gnu-gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 > > > > > > > > I was on pahole v1.17. I moved to v1.19 but I still see the same > > > > problem. There are some changes post v1.19 in the git repo: [acme@five pahole]$ git log --oneline v1.19.. b688e35970600c15 (HEAD -> master) btf_encoder: fix skipping per-CPU variables at offset 0 8c009d6ce762dfc9 btf_encoder: fix BTF variable generation for kernel modules b94e97e015a94e6b dwarves: Fix compilation on 32-bit architectures 17df51c700248f02 btf_encoder: Detect kernel module ftrace addresses 06ca639505fc56c6 btf_encoder: Use address size based on ELF's class aff60970d16b909e btf_encoder: Factor filter_functions function 1e6a3fed6e52d365 (quaco/master) rpm: Fix changelog date [acme@five pahole]$ But I think these won't matter in this case :-\ - Arnaldo > > > I can reproduce with your .config, but make 'defconfig' works, > > > so I guess it's some config option issue, I'll check later today > > > > so your .config has > > CONFIG_CRYPTO_DEV_BCM_SPU=y > > > > and that defines 'struct device_private' which > > clashes with the same struct defined in drivers/base/base.h > > > > so several networking structs will be doubled, like net_device: > > > > $ bpftool btf dump file ../vmlinux.config | grep net_device\' | grep > > STRUCT > > [2731] STRUCT 'net_device' size=2240 vlen=133 > > [113981] STRUCT 'net_device' size=2240 vlen=133 > > > > each is using different 'struct device_private' when it's unwinded > > > > and that will confuse BTFIDS logic, becase we have multiple structs > > with the same name, and we can't be sure which one to pick > > > > perhaps we should check on this in pahole and warn earlier with > > better error message.. I'll check, but I'm not sure if pahole can > > survive another hastab ;-) > > > > Andrii, any ideas on this? ;-) > > > > easy fix is the patch below that renames the bcm's structs, > > it makes the kernel to compile.. but of course the new name > > is probably wrong and we should push this through that code > > authors > > also another quick fix is to switch it to module > > jirka > -- - Arnaldo
[PATCH v1] af/rvu_cgx: Fix missing check bugs in rvu_cgx.c
From: Yingjie Wang In rvu_mbox_handler_cgx_mac_addr_get() and rvu_mbox_handler_cgx_mac_addr_set(), the msg is expected only from PFs that are mapped to CGX LMACs. It should be checked before mapping, so we add the is_cgx_config_permitted() in the functions. Signed-off-by: Yingjie Wang --- drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c index d298b93..6c6b411 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c @@ -469,6 +469,9 @@ int rvu_mbox_handler_cgx_mac_addr_set(struct rvu *rvu, int pf = rvu_get_pf(req->hdr.pcifunc); u8 cgx_id, lmac_id; + if (!is_cgx_config_permitted(rvu, req->hdr.pcifunc)) + return -EPERM; + rvu_get_cgx_lmac_id(rvu->pf2cgxlmac_map[pf], &cgx_id, &lmac_id); cgx_lmac_addr_set(cgx_id, lmac_id, req->mac_addr); @@ -485,6 +488,9 @@ int rvu_mbox_handler_cgx_mac_addr_get(struct rvu *rvu, int rc = 0, i; u64 cfg; + if (!is_cgx_config_permitted(rvu, req->hdr.pcifunc)) + return -EPERM; + rvu_get_cgx_lmac_id(rvu->pf2cgxlmac_map[pf], &cgx_id, &lmac_id); rsp->hdr.rc = rc; -- 2.7.4
[PATCH] mt76: fix enum conversion warning
From: Arnd Bergmann A recent patch changed some enum values, but not the type declaration for the assignment: drivers/net/wireless/mediatek/mt76/mt7615/mcu.c:238:9: error: implicit conversion from enumeration type 'enum mt76_mcuq_id' to different enumeration type 'enum mt76_txq_id' [-Werror,-Wenum-conversion] qid = MT_MCUQ_WM; ~ ^~ drivers/net/wireless/mediatek/mt76/mt7615/mcu.c:240:9: error: implicit conversion from enumeration type 'enum mt76_mcuq_id' to different enumeration type 'enum mt76_txq_id' [-Werror,-Wenum-conversion] qid = MT_MCUQ_FWDL; ~ ^~~~ Change the type to match again. Fixes: e637763b606b ("mt76: move mcu queues to mt76_dev q_mcu array") Signed-off-by: Arnd Bergmann --- drivers/net/wireless/mediatek/mt76/mt7615/mcu.c | 2 +- drivers/net/wireless/mediatek/mt76/mt7915/mcu.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/wireless/mediatek/mt76/mt7615/mcu.c b/drivers/net/wireless/mediatek/mt76/mt7615/mcu.c index a44b7766dec6..c13547841a4e 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7615/mcu.c +++ b/drivers/net/wireless/mediatek/mt76/mt7615/mcu.c @@ -231,7 +231,7 @@ mt7615_mcu_send_message(struct mt76_dev *mdev, struct sk_buff *skb, int cmd, int *seq) { struct mt7615_dev *dev = container_of(mdev, struct mt7615_dev, mt76); - enum mt76_txq_id qid; + enum mt76_mcuq_id qid; mt7615_mcu_fill_msg(dev, skb, cmd, seq); if (test_bit(MT76_STATE_MCU_RUNNING, &dev->mphy.state)) diff --git a/drivers/net/wireless/mediatek/mt76/mt7915/mcu.c b/drivers/net/wireless/mediatek/mt76/mt7915/mcu.c index 5fdd1a6d32ee..22753fbb05e5 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7915/mcu.c +++ b/drivers/net/wireless/mediatek/mt76/mt7915/mcu.c @@ -256,7 +256,7 @@ mt7915_mcu_send_message(struct mt76_dev *mdev, struct sk_buff *skb, struct mt7915_dev *dev = container_of(mdev, struct mt7915_dev, mt76); struct mt7915_mcu_txd *mcu_txd; u8 seq, pkt_fmt, qidx; - enum mt76_txq_id txq; + enum mt76_mcuq_id txq; __le32 *txd; u32 val; -- 2.29.2
Re: BTFIDS: FAILED unresolved symbol udp6_sock
On Wed, Dec 30, 2020 at 11:19:36AM -0300, Arnaldo Carvalho de Melo wrote: > Em Wed, Dec 30, 2020 at 02:28:52PM +0100, Jiri Olsa escreveu: > > On Wed, Dec 30, 2020 at 02:28:02PM +0100, Jiri Olsa wrote: > > > On Wed, Dec 30, 2020 at 10:03:37AM +0100, Jiri Olsa wrote: > > > > On Tue, Dec 29, 2020 at 11:28:35PM +, Qais Yousef wrote: > > > > > Hi Jiri > > > > > > > > > > On 12/29/20 18:34, Jiri Olsa wrote: > > > > > > On Tue, Dec 29, 2020 at 03:13:52PM +, Qais Yousef wrote: > > > > > > > Hi > > > > > > > > > > > > > > When I enable CONFIG_DEBUG_INFO_BTF I get the following error in > > > > > > > the BTFIDS > > > > > > > stage > > > > > > > > > > > > > > FAILED unresolved symbol udp6_sock > > > > > > > > > > > > > > I cross compile for arm64. My .config is attached. > > > > > > > > > > > > > > I managed to reproduce the problem on v5.9 and v5.10. Plus > > > > > > > 5.11-rc1. > > > > > > > > > > > > > > Have you seen this before? I couldn't find a specific report > > > > > > > about this > > > > > > > problem. > > > > > > > > > > > > > > Let me know if you need more info. > > > > > > > > > > > > hi, > > > > > > this looks like symptom of the gcc DWARF bug we were > > > > > > dealing with recently: > > > > > > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97060 > > > > > > > > > > > > https://lore.kernel.org/lkml/CAE1WUT75gu9G62Q9uAALGN6vLX=o7vz9uhqtvwnbuv81dgm...@mail.gmail.com/#r > > > > > > > > > > > > what pahole/gcc version are you using? > > > > > > > > > > I'm on gcc 9.3.0 > > > > > > > > > > aarch64-linux-gnu-gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 > > > > > > > > > > I was on pahole v1.17. I moved to v1.19 but I still see the same > > > > > problem. > > There are some changes post v1.19 in the git repo: > > [acme@five pahole]$ git log --oneline v1.19.. > b688e35970600c15 (HEAD -> master) btf_encoder: fix skipping per-CPU variables > at offset 0 > 8c009d6ce762dfc9 btf_encoder: fix BTF variable generation for kernel modules > b94e97e015a94e6b dwarves: Fix compilation on 32-bit architectures > 17df51c700248f02 btf_encoder: Detect kernel module ftrace addresses > 06ca639505fc56c6 btf_encoder: Use address size based on ELF's class > aff60970d16b909e btf_encoder: Factor filter_functions function > 1e6a3fed6e52d365 (quaco/master) rpm: Fix changelog date > [acme@five pahole]$ > > But I think these won't matter in this case :-\ yep, it did not.. I used the latest dwarves code jirka
Re: Registering IRQ for MT7530 internal PHYs
> > +static void mt7530_irq_bus_lock(struct irq_data *d) > > +{ > > + struct mt7530_priv *priv = irq_data_get_irq_chip_data(d); > > + > > + mutex_lock(&priv->reg_mutex); > > Are you always guaranteed to be in a thread context? I guess that > is the case, given that you request a threaded interrupt, but > it would be worth documenting. Hi Marc These Ethernet switches are often connected by MDIO, SPI or I2C busses to the SoC. So in order to access switch registers over these busses, sleeping is required. So yes, threaded interrupts are required. Andrew
Re: Registering IRQ for MT7530 internal PHYs
> 2. Allocated ds->slave_mii_bus before calling ds->ops->setup, because > we cannot call dsa_slave_mii_bus_init which is private. > > Any better ideas? Do what mv88e6xxx does, allocate the MDIO bus inside the switch driver. Andrew
Re: Registering IRQ for MT7530 internal PHYs
On Wed, Dec 30, 2020 at 09:42:09AM +, Marc Zyngier wrote: > > +static irqreturn_t > > +mt7530_irq(int irq, void *data) > > +{ > > + struct mt7530_priv *priv = data; > > + bool handled = false; > > + int phy; > > + u32 val; > > + > > + val = mt7530_read(priv, MT7530_SYS_INT_STS); > > + mt7530_write(priv, MT7530_SYS_INT_STS, val); > > If that is an ack operation, it should be dealt with as such in > an irqchip callback instead of being open-coded here. Hi Qingfang Does the PHY itself have interrupt control and status registers? My experience with the Marvell Switch and its embedded PHYs is that the PHYs are just the same as the discrete PHYs. There are bits to enable different interrupts, and there are status bits indicating what event caused the interrupt. Clearing the interrupt in the PHY clears the interrupt in the switch interrupt controller. So in the mv88e6xxx interrupt code, you see i do a read of the switch interrupt controller status register, but i don't write to it as you have done. Andrew
Re: [PATCH rfc 3/3] virtio-net: support transmit timestamp
On Wed, Dec 30, 2020 at 7:38 AM Richard Cochran wrote: > > On Mon, Dec 28, 2020 at 11:22:33AM -0500, Willem de Bruijn wrote: > > From: Willem de Bruijn > > > > Add optional delivery time (SO_TXTIME) offload for virtio-net. > > > > The Linux TCP/IP stack tries to avoid bursty transmission and network > > congestion through pacing: computing an skb delivery time based on > > congestion information. Userspace protocol implementations can achieve > > the same with SO_TXTIME. This may also reduce scheduling jitter and > > improve RTT estimation. > > This description is clear, but the Subject line is confusing. It made > me wonder whether this series is somehow about host/guest synchronization > (but your comments do explain that that isn't the case). > > How about this instead? > >virtio-net: support future packet transmit time Yes, that's clearer. As is, this could easily be mistaken for SOF_TIMESTAMPING_TX_*, which it is not. Will update, thanks. Related terms already in use are SO_TXTIME, Earliest Delivery Time (EDT) and Earliest TxTime First (ETF). I should probably also s/TX_TSTAMP/TX_TIME/ in the code for the same reason.
Re: BTFIDS: FAILED unresolved symbol udp6_sock
On 12/30/20 14:28, Jiri Olsa wrote: > On Wed, Dec 30, 2020 at 02:28:02PM +0100, Jiri Olsa wrote: > > On Wed, Dec 30, 2020 at 10:03:37AM +0100, Jiri Olsa wrote: > > > On Tue, Dec 29, 2020 at 11:28:35PM +, Qais Yousef wrote: > > > > Hi Jiri > > > > > > > > On 12/29/20 18:34, Jiri Olsa wrote: > > > > > On Tue, Dec 29, 2020 at 03:13:52PM +, Qais Yousef wrote: > > > > > > Hi > > > > > > > > > > > > When I enable CONFIG_DEBUG_INFO_BTF I get the following error in > > > > > > the BTFIDS > > > > > > stage > > > > > > > > > > > > FAILED unresolved symbol udp6_sock > > > > > > > > > > > > I cross compile for arm64. My .config is attached. > > > > > > > > > > > > I managed to reproduce the problem on v5.9 and v5.10. Plus 5.11-rc1. > > > > > > > > > > > > Have you seen this before? I couldn't find a specific report about > > > > > > this > > > > > > problem. > > > > > > > > > > > > Let me know if you need more info. > > > > > > > > > > hi, > > > > > this looks like symptom of the gcc DWARF bug we were > > > > > dealing with recently: > > > > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97060 > > > > > > > > > > https://lore.kernel.org/lkml/CAE1WUT75gu9G62Q9uAALGN6vLX=o7vz9uhqtvwnbuv81dgm...@mail.gmail.com/#r > > > > > > > > > > what pahole/gcc version are you using? > > > > > > > > I'm on gcc 9.3.0 > > > > > > > > aarch64-linux-gnu-gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 > > > > > > > > I was on pahole v1.17. I moved to v1.19 but I still see the same > > > > problem. > > > > > > I can reproduce with your .config, but make 'defconfig' works, > > > so I guess it's some config option issue, I'll check later today My .config was a defconfig + CONFIG_DEBUG_INFO_BTF + convert all modules to built-in. > > > > so your .config has > > CONFIG_CRYPTO_DEV_BCM_SPU=y Ah, how random. I removed this config and indeed it does work then, thanks! > > > > and that defines 'struct device_private' which > > clashes with the same struct defined in drivers/base/base.h > > > > so several networking structs will be doubled, like net_device: > > > > $ bpftool btf dump file ../vmlinux.config | grep net_device\' | grep > > STRUCT > > [2731] STRUCT 'net_device' size=2240 vlen=133 > > [113981] STRUCT 'net_device' size=2240 vlen=133 > > > > each is using different 'struct device_private' when it's unwinded > > > > and that will confuse BTFIDS logic, becase we have multiple structs > > with the same name, and we can't be sure which one to pick We can't tell which object/subsystem the struct come from? Or maybe we can introduce some annotation to help BTFIDS to pick the right one? > > > > perhaps we should check on this in pahole and warn earlier with > > better error message.. I'll check, but I'm not sure if pahole can > > survive another hastab ;-) > > > > Andrii, any ideas on this? ;-) > > > > easy fix is the patch below that renames the bcm's structs, > > it makes the kernel to compile.. but of course the new name > > is probably wrong and we should push this through that code > > authors > > also another quick fix is to switch it to module This works too. Thanks for your speedy response. Cheers -- Qais Yousef
[PATCH net,stable] net: usb: qmi_wwan: add Quectel EM160R-GL
New modem using ff/ff/30 for QCDM, ff/00/00 for AT and NMEA, and ff/ff/ff for RMNET/QMI. T: Bus=02 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=5000 MxCh= 0 D: Ver= 3.20 Cls=ef(misc ) Sub=02 Prot=01 MxPS= 9 #Cfgs= 1 P: Vendor=2c7c ProdID=0620 Rev= 4.09 S: Manufacturer=Quectel S: Product=EM160R-GL S: SerialNumber=e31cedc1 C:* #Ifs= 5 Cfg#= 1 Atr=a0 MxPwr=896mA I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=(none) E: Ad=81(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none) E: Ad=83(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=82(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none) E: Ad=85(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=84(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none) E: Ad=87(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=86(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=04(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none) E: Ad=88(I) Atr=03(Int.) MxPS= 8 Ivl=32ms E: Ad=8e(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=0f(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms Signed-off-by: Bjørn Mork --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index d166c321ee9b..af19513a9f75 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -1013,6 +1013,7 @@ static const struct usb_device_id products[] = { {QMI_MATCH_FF_FF_FF(0x2c7c, 0x0125)}, /* Quectel EC25, EC20 R2.0 Mini PCIe */ {QMI_MATCH_FF_FF_FF(0x2c7c, 0x0306)}, /* Quectel EP06/EG06/EM06 */ {QMI_MATCH_FF_FF_FF(0x2c7c, 0x0512)}, /* Quectel EG12/EM12 */ + {QMI_MATCH_FF_FF_FF(0x2c7c, 0x0620)}, /* Quectel EM160R-GL */ {QMI_MATCH_FF_FF_FF(0x2c7c, 0x0800)}, /* Quectel RM500Q-GL */ /* 3. Combined interface devices matching on interface number */ -- 2.29.2
Re: [PATCH net-next v2 0/2] Add support for DSFP transceiver type
On Wed, Dec 30, 2020 at 03:55:02PM +0200, Vladyslav Tarasiuk wrote: > > On 29-Dec-20 18:25, Andrew Lunn wrote: > > > Hi Andrew, > > > > > > Following this conversation, I wrote some pseudocode checking if I'm on > > > right path here. > > > Please review: > > > > > > struct eeprom_page { > > > u8 page_number; > > > u8 bank_number; > > > u16 offset; > > > u16 data_length; > > > u8 *data; > > > } > > I'm wondering about offset and data_length, in this context. I would > > expect you always ask the kernel for the full page, not part of > > it. Even when user space asks for just part of a page. That keeps you > > cache management simpler. > As far as I know, there may be bytes, which may change on read. > For example, clear on read values in CMIS 4.0. Ah, i did not know there were such bits. I will go read the spec. But it should not really matter. If the SFP driver is interested in these bits, it will have to intercept the read and act on the values. > I wasn't aware of that. It complicates things a bit, should we add a > parameter of i2c address? So in this case page 0 will be with i2c > address A0h. And if user needs page 0 from i2c address A2h, he will > specify it in command line. Not on the command line. You should be able to determine from reading page 0 at A0h is the diagnostics are at A2h or a page of A0h. That is the whole point of this API, we decode the first page, and that tells us what other pages should be available. So adding the i2c address to the netlink message would be sensible. And i would not be too surprised if there are SFPs with proprietary registers on other addresses, which could be interesting to dump, if you can get access to the needed datasheets. Andrew
Re: [PATCH 1/2] net: phy: micrel: Add KS8851 PHY support
On Wed, Dec 30, 2020 at 01:53:57PM +0100, Marek Vasut wrote: > The KS8851 has a reduced internal PHY, which is accessible through its > registers at offset 0xe4. The PHY is compatible with KS886x PHY present > in Micrel switches, except the PHY ID Low/High registers are swapped. Can you intercept the reads in the KS8851 driver and swap them back again? The mv88e6xxx driver does something similar. The mv88e6393 family of switches have PHYs with the Marvell OUI but no device ID. So the code traps these reads and provides an ID. Andrew
[PATCH 0/4] net: sfp: add support for GPON RTL8672/RTL9601C and Ubiquiti U-Fiber
This patch series add generic workaround for reading EEPROM content from broken GPON SFP modules based on Realtek RTL8672/RTL9601C chips and add another workarounds for GPON SFP module Ubiquiti U-Fiber Instant. GPON SFP modules based on Realtek RTL8672/RTL9601C chips do not have a real EEPROM but rather EEPROM emulator which is broken and needs special hack for reading its content. SFP module detection is done based on EEPROM content. But we obviously cannot read EEPROM correctly if we do not know what type of connected SFP module... And to have this chicken and egg problem more complicated, GPON vendors generally put garbage into their EEPROM content so even with knowing EEPROM content we do not know what kind of broken SFP is connected... Workaround for Realtek RTL8672/RTL9601C based modules is therefore done based on broken EEPROM reading characteristic. This patch series also available in my git branch sfp-rtl8672: https://git.kernel.org/pub/scm/linux/kernel/git/pali/linux.git/log/?h=sfp-rtl8672 Pali Rohár (4): net: sfp: add workaround for Realtek RTL8672 and RTL9601C chips net: sfp: allow to use also SFP modules which are detected as SFF net: sfp: assume that LOS is not implemented if both LOS normal and inverted is set net: sfp: add mode quirk for GPON module Ubiquiti U-Fiber Instant drivers/net/phy/sfp-bus.c | 15 + drivers/net/phy/sfp.c | 117 ++ 2 files changed, 83 insertions(+), 49 deletions(-) -- 2.20.1
[PATCH 3/4] net: sfp: assume that LOS is not implemented if both LOS normal and inverted is set
Some GPON SFP modules (e.g. Ubiquiti U-Fiber Instant) have set both SFP_OPTIONS_LOS_INVERTED and SFP_OPTIONS_LOS_NORMAL bits in their EEPROM. Such combination of bits is meaningless so assume that LOS signal is not implemented. This patch fixes link carrier for GPON SFP module Ubiquiti U-Fiber Instant. Co-developed-by: Russell King Signed-off-by: Russell King Signed-off-by: Pali Rohár --- drivers/net/phy/sfp.c | 36 ++-- 1 file changed, 22 insertions(+), 14 deletions(-) diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c index 73f3ecf15260..d47485ed239c 100644 --- a/drivers/net/phy/sfp.c +++ b/drivers/net/phy/sfp.c @@ -1475,15 +1475,19 @@ static void sfp_sm_link_down(struct sfp *sfp) static void sfp_sm_link_check_los(struct sfp *sfp) { - unsigned int los = sfp->state & SFP_F_LOS; + const __be16 los_inverted = cpu_to_be16(SFP_OPTIONS_LOS_INVERTED); + const __be16 los_normal = cpu_to_be16(SFP_OPTIONS_LOS_NORMAL); + __be16 los_options = sfp->id.ext.options & (los_inverted | los_normal); + bool los = false; /* If neither SFP_OPTIONS_LOS_INVERTED nor SFP_OPTIONS_LOS_NORMAL -* are set, we assume that no LOS signal is available. +* are set, we assume that no LOS signal is available. If both are +* set, we assume LOS is not implemented (and is meaningless.) */ - if (sfp->id.ext.options & cpu_to_be16(SFP_OPTIONS_LOS_INVERTED)) - los ^= SFP_F_LOS; - else if (!(sfp->id.ext.options & cpu_to_be16(SFP_OPTIONS_LOS_NORMAL))) - los = 0; + if (los_options == los_inverted) + los = !(sfp->state & SFP_F_LOS); + else if (los_options == los_normal) + los = !!(sfp->state & SFP_F_LOS); if (los) sfp_sm_next(sfp, SFP_S_WAIT_LOS, 0); @@ -1493,18 +1497,22 @@ static void sfp_sm_link_check_los(struct sfp *sfp) static bool sfp_los_event_active(struct sfp *sfp, unsigned int event) { - return (sfp->id.ext.options & cpu_to_be16(SFP_OPTIONS_LOS_INVERTED) && - event == SFP_E_LOS_LOW) || - (sfp->id.ext.options & cpu_to_be16(SFP_OPTIONS_LOS_NORMAL) && - event == SFP_E_LOS_HIGH); + const __be16 los_inverted = cpu_to_be16(SFP_OPTIONS_LOS_INVERTED); + const __be16 los_normal = cpu_to_be16(SFP_OPTIONS_LOS_NORMAL); + __be16 los_options = sfp->id.ext.options & (los_inverted | los_normal); + + return (los_options == los_inverted && event == SFP_E_LOS_LOW) || + (los_options == los_normal && event == SFP_E_LOS_HIGH); } static bool sfp_los_event_inactive(struct sfp *sfp, unsigned int event) { - return (sfp->id.ext.options & cpu_to_be16(SFP_OPTIONS_LOS_INVERTED) && - event == SFP_E_LOS_HIGH) || - (sfp->id.ext.options & cpu_to_be16(SFP_OPTIONS_LOS_NORMAL) && - event == SFP_E_LOS_LOW); + const __be16 los_inverted = cpu_to_be16(SFP_OPTIONS_LOS_INVERTED); + const __be16 los_normal = cpu_to_be16(SFP_OPTIONS_LOS_NORMAL); + __be16 los_options = sfp->id.ext.options & (los_inverted | los_normal); + + return (los_options == los_inverted && event == SFP_E_LOS_HIGH) || + (los_options == los_normal && event == SFP_E_LOS_LOW); } static void sfp_sm_fault(struct sfp *sfp, unsigned int next_state, bool warn) -- 2.20.1
[PATCH 2/4] net: sfp: allow to use also SFP modules which are detected as SFF
Some GPON SFP modules (e.g. Ubiquiti U-Fiber Instant) have set SFF phys_id in their EEPROM. Kernel SFP subsystem currently does not allow to use modules detected as SFF. This change extends check for SFP modules so also those with SFF phys_id are allowed. With this change also GPON SFP module Ubiquiti U-Fiber Instant is recognized. Signed-off-by: Pali Rohár --- drivers/net/phy/sfp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c index 490e78a72dd6..73f3ecf15260 100644 --- a/drivers/net/phy/sfp.c +++ b/drivers/net/phy/sfp.c @@ -273,7 +273,8 @@ static const struct sff_data sff_data = { static bool sfp_module_supported(const struct sfp_eeprom_id *id) { - return id->base.phys_id == SFF8024_ID_SFP && + return (id->base.phys_id == SFF8024_ID_SFP || + id->base.phys_id == SFF8024_ID_SFF_8472) && id->base.phys_ext_id == SFP_PHYS_EXT_ID_SFP; } -- 2.20.1
[PATCH 1/4] net: sfp: add workaround for Realtek RTL8672 and RTL9601C chips
Workaround for GPON SFP modules based on VSOL V2801F brand was added in commit 0d035bed2a4a ("net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 v2.0 workaround"). But it works only for ids explicitly added to the list. As there are more rebraded VSOL V2801F modules and OEM vendors are putting into vendor name random strings we cannot build workaround based on ids. Moreover issue which that commit tried to workaround is generic not only to VSOL based modules, but rather to all GPON modules based on Realtek RTL8672 and RTL9601C chips. They can be found for example in following GPON modules: * V-SOL V2801F * C-Data FD511GX-RM0 * OPTON GP801R * BAUDCOM BD-1234-SFM * CPGOS03-0490 v2.0 * Ubiquiti U-Fiber Instant * EXOT EGS1 Those Realtek chips have broken EEPROM emulator which for N-byte read operation returns just one byte of EEPROM data followed by N-1 zeros. So introduce a new function sfp_id_needs_byte_io() which detects SFP modules with these Realtek chips which have broken EEPROM emulator based on N-1 zeros and switch to 1 byte EEPROM reading operation which workaround this issue. This patch fixes reading EEPROM content from SFP modules based on Realtek RTL8672 and RTL9601C chips. Fixes: 0d035bed2a4a ("net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 v2.0 workaround") Co-developed-by: Russell King Signed-off-by: Russell King Signed-off-by: Pali Rohár --- drivers/net/phy/sfp.c | 78 --- 1 file changed, 44 insertions(+), 34 deletions(-) diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c index 91d74c1a920a..490e78a72dd6 100644 --- a/drivers/net/phy/sfp.c +++ b/drivers/net/phy/sfp.c @@ -336,19 +336,11 @@ static int sfp_i2c_read(struct sfp *sfp, bool a2, u8 dev_addr, void *buf, size_t len) { struct i2c_msg msgs[2]; - size_t block_size; + u8 bus_addr = a2 ? 0x51 : 0x50; + size_t block_size = sfp->i2c_block_size; size_t this_len; - u8 bus_addr; int ret; - if (a2) { - block_size = 16; - bus_addr = 0x51; - } else { - block_size = sfp->i2c_block_size; - bus_addr = 0x50; - } - msgs[0].addr = bus_addr; msgs[0].flags = 0; msgs[0].len = 1; @@ -1642,26 +1634,30 @@ static int sfp_sm_mod_hpower(struct sfp *sfp, bool enable) return 0; } -/* Some modules (Nokia 3FE46541AA) lock up if byte 0x51 is read as a - * single read. Switch back to reading 16 byte blocks unless we have - * a CarlitoxxPro module (rebranded VSOL V2801F). Even more annoyingly, - * some VSOL V2801F have the vendor name changed to OEM. +/* GPON modules based on Realtek RTL8672 and RTL9601C chips (e.g. V-SOL + * V2801F, CarlitoxxPro CPGOS03-0490, Ubiquiti U-Fiber Instant, ...) do + * not support multibyte reads from the EEPROM. Each multi-byte read + * operation returns just one byte of EEPROM followed by zeros. There is + * no way to identify which modules are using Realtek RTL8672 and RTL9601C + * chips. Moreover every OEM of V-SOL V2801F module puts its own vendor + * name and vendor id into EEPROM, so there is even no way to detect if + * module is V-SOL V2801F. Therefore check for those zeros in the read + * data and then based on check switch to reading EEPROM to one byte + * at a time. */ -static int sfp_quirk_i2c_block_size(const struct sfp_eeprom_base *base) +static bool sfp_id_needs_byte_io(struct sfp *sfp, void *buf, size_t len) { - if (!memcmp(base->vendor_name, "VSOL", 16)) - return 1; - if (!memcmp(base->vendor_name, "OEM ", 16) && - !memcmp(base->vendor_pn, "V2801F ", 16)) - return 1; + size_t i, block_size = sfp->i2c_block_size; - /* Some modules can't cope with long reads */ - return 16; -} + /* Already using byte IO */ + if (block_size == 1) + return false; -static void sfp_quirks_base(struct sfp *sfp, const struct sfp_eeprom_base *base) -{ - sfp->i2c_block_size = sfp_quirk_i2c_block_size(base); + for (i = 1; i < len; i += block_size) { + if (memchr_inv(buf + i, '\0', block_size - 1)) + return false; + } + return true; } static int sfp_cotsworks_fixup_check(struct sfp *sfp, struct sfp_eeprom_id *id) @@ -1705,11 +1701,7 @@ static int sfp_sm_mod_probe(struct sfp *sfp, bool report) u8 check; int ret; - /* Some modules (CarlitoxxPro CPGOS03-0490) do not support multibyte -* reads from the EEPROM, so start by reading the base identifying -* information one byte at a time. -*/ - sfp->i2c_block_size = 1; + sfp->i2c_block_size = 16; ret = sfp_read(sfp, false, 0, &id.base, sizeof(id.base)); if (ret < 0) { @@ -1723,6 +1715,27 @@ static int sfp_sm_mod_probe(struct sfp *sfp, bool report) return -EAGAIN; } +
[PATCH 4/4] net: sfp: add mode quirk for GPON module Ubiquiti U-Fiber Instant
SFP GPON module Ubiquiti U-Fiber Instant has in its EEPROM stored nonsense information. It claims that support all transceiver types including 10G Ethernet which is not truth. So clear all claimed modes and set only one mode which module supports: 1000baseX_Full. This change finally allows to detect and use SFP GPON module Ubiquiti U-Fiber Instant on Linux system. EEPROM content of this SFP module is (where XX is serial number): 00: 02 04 0b ff ff ff ff ff ff ff ff 03 0c 00 14 c8?????.?? 10: 00 00 00 00 55 42 4e 54 20 20 20 20 20 20 20 20UBNT 20: 20 20 20 20 00 18 e8 29 55 46 2d 49 4e 53 54 41.??)UF-INSTA 30: 4e 54 20 20 20 20 20 20 34 20 20 20 05 1e 00 36NT 4 ??.6 40: 00 06 00 00 55 42 4e 54 XX XX XX XX XX XX XX XX.?..UBNT 50: 20 20 20 20 31 34 30 31 32 33 20 20 60 80 02 41140123 `??A Signed-off-by: Pali Rohár --- drivers/net/phy/sfp-bus.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/drivers/net/phy/sfp-bus.c b/drivers/net/phy/sfp-bus.c index 20b91f5dfc6e..4cf874fb5c5b 100644 --- a/drivers/net/phy/sfp-bus.c +++ b/drivers/net/phy/sfp-bus.c @@ -44,6 +44,17 @@ static void sfp_quirk_2500basex(const struct sfp_eeprom_id *id, phylink_set(modes, 2500baseX_Full); } +static void sfp_quirk_ubnt_uf_instant(const struct sfp_eeprom_id *id, + unsigned long *modes) +{ + /* Ubiquiti U-Fiber Instant module claims that support all transceiver +* types including 10G Ethernet which is not truth. So clear all claimed +* modes and set only one mode which module supports: 1000baseX_Full. +*/ + phylink_zero(modes); + phylink_set(modes, 1000baseX_Full); +} + static const struct sfp_quirk sfp_quirks[] = { { // Alcatel Lucent G-010S-P can operate at 2500base-X, but @@ -63,6 +74,10 @@ static const struct sfp_quirk sfp_quirks[] = { .vendor = "HUAWEI", .part = "MA5671A", .modes = sfp_quirk_2500basex, + }, { + .vendor = "UBNT", + .part = "UF-INSTANT", + .modes = sfp_quirk_ubnt_uf_instant, }, }; -- 2.20.1
Re: [PATCH 2/2] net: ks8851: Register MDIO bus and the internal PHY
> +static int ks8851_mdio_read(struct mii_bus *bus, int phy_id, int reg) > +{ > + struct ks8851_net *ks = bus->priv; > + > + if (phy_id != 0) > + return 0x; > + Please check for C45 and return -EOPNOTSUPP. Andrew
Re: [PATCH 1/4] net: sfp: add workaround for Realtek RTL8672 and RTL9601C chips
On Wed, Dec 30, 2020 at 04:47:52PM +0100, Pali Rohár wrote: > Workaround for GPON SFP modules based on VSOL V2801F brand was added in > commit 0d035bed2a4a ("net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 > v2.0 workaround"). But it works only for ids explicitly added to the list. > As there are more rebraded VSOL V2801F modules and OEM vendors are putting > into vendor name random strings we cannot build workaround based on ids. > > Moreover issue which that commit tried to workaround is generic not only to > VSOL based modules, but rather to all GPON modules based on Realtek RTL8672 > and RTL9601C chips. > > They can be found for example in following GPON modules: > * V-SOL V2801F > * C-Data FD511GX-RM0 > * OPTON GP801R > * BAUDCOM BD-1234-SFM > * CPGOS03-0490 v2.0 > * Ubiquiti U-Fiber Instant > * EXOT EGS1 > > Those Realtek chips have broken EEPROM emulator which for N-byte read > operation returns just one byte of EEPROM data followed by N-1 zeros. > > So introduce a new function sfp_id_needs_byte_io() which detects SFP > modules with these Realtek chips which have broken EEPROM emulator based on > N-1 zeros and switch to 1 byte EEPROM reading operation which workaround > this issue. > > This patch fixes reading EEPROM content from SFP modules based on Realtek > RTL8672 and RTL9601C chips. > > Fixes: 0d035bed2a4a ("net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 v2.0 > workaround") > Co-developed-by: Russell King > Signed-off-by: Russell King > Signed-off-by: Pali Rohár > --- > drivers/net/phy/sfp.c | 78 --- > 1 file changed, 44 insertions(+), 34 deletions(-) > > diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c > index 91d74c1a920a..490e78a72dd6 100644 > --- a/drivers/net/phy/sfp.c > +++ b/drivers/net/phy/sfp.c > @@ -336,19 +336,11 @@ static int sfp_i2c_read(struct sfp *sfp, bool a2, u8 > dev_addr, void *buf, > size_t len) > { > struct i2c_msg msgs[2]; > - size_t block_size; > + u8 bus_addr = a2 ? 0x51 : 0x50; > + size_t block_size = sfp->i2c_block_size; > size_t this_len; > - u8 bus_addr; > int ret; > > - if (a2) { > - block_size = 16; > - bus_addr = 0x51; > - } else { > - block_size = sfp->i2c_block_size; > - bus_addr = 0x50; > - } > - NAK. You are undoing something that is definitely needed. The diagnostics must be read with sequential reads to be able to properly read the 16-bit values. The rest of the patch is fine; it's a shame the entire thing has been spoilt by this extra addition that was not in the patch we had been discussing off-list. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Re: [PATCH 2/4] net: sfp: allow to use also SFP modules which are detected as SFF
On Wed, Dec 30, 2020 at 04:47:53PM +0100, Pali Rohár wrote: > Some GPON SFP modules (e.g. Ubiquiti U-Fiber Instant) have set SFF phys_id > in their EEPROM. Kernel SFP subsystem currently does not allow to use > modules detected as SFF. > > This change extends check for SFP modules so also those with SFF phys_id > are allowed. With this change also GPON SFP module Ubiquiti U-Fiber Instant > is recognized. I really don't want to do this for every single module out there. It's likely that Ubiquiti do this crap as a vendor lock-in measure. Let's make it specific to Ubiquiti modules _only_ until such time that we know better. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Re: [PATCH 3/4] net: sfp: assume that LOS is not implemented if both LOS normal and inverted is set
On Wed, Dec 30, 2020 at 04:47:54PM +0100, Pali Rohár wrote: > Some GPON SFP modules (e.g. Ubiquiti U-Fiber Instant) have set both > SFP_OPTIONS_LOS_INVERTED and SFP_OPTIONS_LOS_NORMAL bits in their EEPROM. > > Such combination of bits is meaningless so assume that LOS signal is not > implemented. > > This patch fixes link carrier for GPON SFP module Ubiquiti U-Fiber Instant. > > Co-developed-by: Russell King > Signed-off-by: Russell King No, this is not co-developed. The patch content is exactly what _I_ sent you, only the commit description is your own. > Signed-off-by: Pali Rohár > --- > drivers/net/phy/sfp.c | 36 ++-- > 1 file changed, 22 insertions(+), 14 deletions(-) > > diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c > index 73f3ecf15260..d47485ed239c 100644 > --- a/drivers/net/phy/sfp.c > +++ b/drivers/net/phy/sfp.c > @@ -1475,15 +1475,19 @@ static void sfp_sm_link_down(struct sfp *sfp) > > static void sfp_sm_link_check_los(struct sfp *sfp) > { > - unsigned int los = sfp->state & SFP_F_LOS; > + const __be16 los_inverted = cpu_to_be16(SFP_OPTIONS_LOS_INVERTED); > + const __be16 los_normal = cpu_to_be16(SFP_OPTIONS_LOS_NORMAL); > + __be16 los_options = sfp->id.ext.options & (los_inverted | los_normal); > + bool los = false; > > /* If neither SFP_OPTIONS_LOS_INVERTED nor SFP_OPTIONS_LOS_NORMAL > - * are set, we assume that no LOS signal is available. > + * are set, we assume that no LOS signal is available. If both are > + * set, we assume LOS is not implemented (and is meaningless.) >*/ > - if (sfp->id.ext.options & cpu_to_be16(SFP_OPTIONS_LOS_INVERTED)) > - los ^= SFP_F_LOS; > - else if (!(sfp->id.ext.options & cpu_to_be16(SFP_OPTIONS_LOS_NORMAL))) > - los = 0; > + if (los_options == los_inverted) > + los = !(sfp->state & SFP_F_LOS); > + else if (los_options == los_normal) > + los = !!(sfp->state & SFP_F_LOS); > > if (los) > sfp_sm_next(sfp, SFP_S_WAIT_LOS, 0); > @@ -1493,18 +1497,22 @@ static void sfp_sm_link_check_los(struct sfp *sfp) > > static bool sfp_los_event_active(struct sfp *sfp, unsigned int event) > { > - return (sfp->id.ext.options & cpu_to_be16(SFP_OPTIONS_LOS_INVERTED) && > - event == SFP_E_LOS_LOW) || > -(sfp->id.ext.options & cpu_to_be16(SFP_OPTIONS_LOS_NORMAL) && > - event == SFP_E_LOS_HIGH); > + const __be16 los_inverted = cpu_to_be16(SFP_OPTIONS_LOS_INVERTED); > + const __be16 los_normal = cpu_to_be16(SFP_OPTIONS_LOS_NORMAL); > + __be16 los_options = sfp->id.ext.options & (los_inverted | los_normal); > + > + return (los_options == los_inverted && event == SFP_E_LOS_LOW) || > +(los_options == los_normal && event == SFP_E_LOS_HIGH); > } > > static bool sfp_los_event_inactive(struct sfp *sfp, unsigned int event) > { > - return (sfp->id.ext.options & cpu_to_be16(SFP_OPTIONS_LOS_INVERTED) && > - event == SFP_E_LOS_HIGH) || > -(sfp->id.ext.options & cpu_to_be16(SFP_OPTIONS_LOS_NORMAL) && > - event == SFP_E_LOS_LOW); > + const __be16 los_inverted = cpu_to_be16(SFP_OPTIONS_LOS_INVERTED); > + const __be16 los_normal = cpu_to_be16(SFP_OPTIONS_LOS_NORMAL); > + __be16 los_options = sfp->id.ext.options & (los_inverted | los_normal); > + > + return (los_options == los_inverted && event == SFP_E_LOS_HIGH) || > +(los_options == los_normal && event == SFP_E_LOS_LOW); > } > > static void sfp_sm_fault(struct sfp *sfp, unsigned int next_state, bool warn) > -- > 2.20.1 > > -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
[PATCH v2] xfrm: Fix wraparound in xfrm_policy_addr_delta()
Use three-way comparison for address components to avoid integer wraparound in the result of xfrm_policy_addr_delta(). This ensures that the search trees are built and traversed correctly. Treat IPv4 and IPv6 similarly by returning 0 when prefixlen == 0. Prefix /0 has only one equivalence class. Fixes: 9cf545ebd591d ("xfrm: policy: store inexact policies in a tree ordered by destination address") Signed-off-by: Visa Hankala --- net/xfrm/xfrm_policy.c | 26 + tools/testing/selftests/net/xfrm_policy.sh | 43 ++ 2 files changed, 61 insertions(+), 8 deletions(-) diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index d622c2548d22..68258ff6a30b 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -793,15 +793,22 @@ static int xfrm_policy_addr_delta(const xfrm_address_t *a, const xfrm_address_t *b, u8 prefixlen, u16 family) { + u32 ma, mb, mask; unsigned int pdw, pbi; int delta = 0; switch (family) { case AF_INET: - if (sizeof(long) == 4 && prefixlen == 0) - return ntohl(a->a4) - ntohl(b->a4); - return (ntohl(a->a4) & ((~0UL << (32 - prefixlen - - (ntohl(b->a4) & ((~0UL << (32 - prefixlen; + if (prefixlen == 0) + return 0; + mask = ~0U << (32 - prefixlen); + ma = ntohl(a->a4) & mask; + mb = ntohl(b->a4) & mask; + if (ma < mb) + delta = -1; + else if (ma > mb) + delta = 1; + break; case AF_INET6: pdw = prefixlen >> 5; pbi = prefixlen & 0x1f; @@ -812,10 +819,13 @@ static int xfrm_policy_addr_delta(const xfrm_address_t *a, return delta; } if (pbi) { - u32 mask = ~0u << (32 - pbi); - - delta = (ntohl(a->a6[pdw]) & mask) - - (ntohl(b->a6[pdw]) & mask); + mask = ~0U << (32 - pbi); + ma = ntohl(a->a6[pdw]) & mask; + mb = ntohl(b->a6[pdw]) & mask; + if (ma < mb) + delta = -1; + else if (ma > mb) + delta = 1; } break; default: diff --git a/tools/testing/selftests/net/xfrm_policy.sh b/tools/testing/selftests/net/xfrm_policy.sh index 7a1bf94c5bd3..9e2dffb670d5 100755 --- a/tools/testing/selftests/net/xfrm_policy.sh +++ b/tools/testing/selftests/net/xfrm_policy.sh @@ -287,6 +287,47 @@ check_hthresh_repeat() return 0 } +# insert non-overlapping policies in a random order and check that +# all of them can be fetched using the traffic selectors. +check_random_order() +{ + local ns=$1 + local log=$2 + + for i in $(seq 100); do + ip -net $ns xfrm policy flush + for j in $(seq 0 16 255 | sort -R); do + ip -net $ns xfrm policy add dst $j.0.0.0/24 dir out priority 10 action allow + done + for j in $(seq 0 16 255); do + if ! ip -net $ns xfrm policy get dst $j.0.0.0/24 dir out > /dev/null; then + echo "FAIL: $log" 1>&2 + return 1 + fi + done + done + + for i in $(seq 100); do + ip -net $ns xfrm policy flush + for j in $(seq 0 16 255 | sort -R); do + local addr=$(printf "e000::%02x00::/56" $j) + ip -net $ns xfrm policy add dst $addr dir out priority 10 action allow + done + for j in $(seq 0 16 255); do + local addr=$(printf "e000::%02x00::/56" $j) + if ! ip -net $ns xfrm policy get dst $addr dir out > /dev/null; then + echo "FAIL: $log" 1>&2 + return 1 + fi + done + done + + ip -net $ns xfrm policy flush + + echo "PASS: $log" + return 0 +} + #check for needed privileges if [ "$(id -u)" -ne 0 ];then echo "SKIP: Need root privileges" @@ -438,6 +479,8 @@ check_exceptions "exceptions and block policies after htresh change to normal" check_hthresh_repeat "policies with repeated htresh change" +check_random_order ns3 "policies inserted in random order" + for i in 1 2 3 4;do ip netns del ns$i;done exit $ret
Re: Registering IRQ for MT7530 internal PHYs
On 12/30/2020 1:12 AM, Heiner Kallweit wrote: > On 30.12.2020 10:07, DENG Qingfang wrote: >> Hi Heiner, >> Thanks for your reply. >> >> On Wed, Dec 30, 2020 at 3:39 PM Heiner Kallweit wrote: >>> I don't think that's the best option. >> >> I'm well aware of that. >> >>> You may want to add a PHY driver for your chip. Supposedly it >>> supports at least PHY suspend/resume. You can use the RTL8366RB >>> PHY driver as template. >> >> There's no MediaTek PHY driver yet. Do we really need a new one just >> for the interrupts? >> > Not only for the interrupts. The genphy driver e.g. doesn't support > PHY suspend/resume. And the PHY driver needs basically no code, > just set the proper callbacks. That statement about not supporting suspend/resume is not exactly true, the generic "1g" PHY driver only implements suspend/resume through the use of the standard BMCR power down bit, but not anything more complicated than that. Interrupt handling within the PHY itself is not defined by the existing standard registers and will typically not reside in a standard register space either, so just for that reason you do need a custom PHY driver. There are other advantages if you need to expose additional PHY features down the road like PHY counters, energy detection, automatic power down etc. I don't believe we will see discrete/standalone Mediatek PHY chips, but if that happens, then you would already have a framework for supporting them. -- Florian
Re: Registering IRQ for MT7530 internal PHYs
On 12/30/2020 7:16 AM, Andrew Lunn wrote: >> 2. Allocated ds->slave_mii_bus before calling ds->ops->setup, because >> we cannot call dsa_slave_mii_bus_init which is private. >> >> Any better ideas? > > Do what mv88e6xxx does, allocate the MDIO bus inside the switch > driver. Yes, exactly, or you could add additional hooks to allow intercepting the initialization and de-initialization of the ds->slave_mii_bus, something like this: https://github.com/ffainelli/linux/commit/758da087a819cd1a284de074ea7d8eae9f875f0b which was part of a larger series adding threaded IRQ support to the b53 driver: https://github.com/ffainelli/linux/commits/b53-irq -- Florian
Re: [PATCH] Allow user to set metric on default route learned via Router Advertisement.
On Wed, Dec 30, 2020 at 12:48:32AM -0800, Praveen Chaudhary wrote: > Allow user to set metric on default route learned via Router Advertisement. > Not: RFC 4191 does not say anything for metric for IPv6 default route. Hi Praveen Please take a look at https://www.kernel.org/doc/html/latest/process/submitting-patches.html and https://www.kernel.org/doc/html/latest/networking/netdev-FAQ.html In particular, you should be using git format-patch, which will correctly number your patches, make use of a version number inside the [PATCH v0 net-next] subject part, and please indicate which tree this is for. Andrew
Re: [PATCH v2] net: macb: Correct usage of MACB_CAPS_CLK_HW_CHG flag
On Wed, Dec 30, 2020 at 10:33:09AM +, Charles Keepax wrote: > A new flag MACB_CAPS_CLK_HW_CHG was added and all callers of > macb_set_tx_clk were gated on the presence of this flag. > > - if (!clk) > + if (!bp->tx_clk || !(bp->caps & MACB_CAPS_CLK_HW_CHG)) > > However the flag was not added to anything other than the new > sama7g5_gem, turning that function call into a no op for all other > systems. This breaks the networking on Zynq. > > The commit message adding this states: a new capability so that > macb_set_tx_clock() to not be called for IPs having this > capability > > This strongly implies that present of the flag was intended to skip > the function not absence of the flag. Update the if statement to > this effect, which repairs the existing users. > > Fixes: daafa1d33cc9 ("net: macb: add capability to not set the clock rate") > Suggested-by: Andrew Lunn > Signed-off-by: Charles Keepax Reviewed-by: Andrew Lunn Hi Charles Since this is a fix, this should be based on the net tree. And you indicate this in the subject line with [PATCH net]. If this applies cleanly to net, Dave will probably just accept it, but please keep this in mind for any more submissions you make to netdev. Andrew
Re: Registering IRQ for MT7530 internal PHYs
On 30.12.2020 17:15, Florian Fainelli wrote: > > > On 12/30/2020 1:12 AM, Heiner Kallweit wrote: >> On 30.12.2020 10:07, DENG Qingfang wrote: >>> Hi Heiner, >>> Thanks for your reply. >>> >>> On Wed, Dec 30, 2020 at 3:39 PM Heiner Kallweit >>> wrote: I don't think that's the best option. >>> >>> I'm well aware of that. >>> You may want to add a PHY driver for your chip. Supposedly it supports at least PHY suspend/resume. You can use the RTL8366RB PHY driver as template. >>> >>> There's no MediaTek PHY driver yet. Do we really need a new one just >>> for the interrupts? >>> >> Not only for the interrupts. The genphy driver e.g. doesn't support >> PHY suspend/resume. And the PHY driver needs basically no code, >> just set the proper callbacks. > > That statement about not supporting suspend/resume is not exactly true, > the generic "1g" PHY driver only implements suspend/resume through the > use of the standard BMCR power down bit, but not anything more > complicated than that. > Oh, right. Somehow I had in the back of my mind that the genphy driver has no suspend/resume callbacks set. > Interrupt handling within the PHY itself is not defined by the existing > standard registers and will typically not reside in a standard register > space either, so just for that reason you do need a custom PHY driver. > There are other advantages if you need to expose additional PHY features > down the road like PHY counters, energy detection, automatic power down etc. > > I don't believe we will see discrete/standalone Mediatek PHY chips, but > if that happens, then you would already have a framework for supporting > them. >
Re: [PATCH 1/4] net: sfp: add workaround for Realtek RTL8672 and RTL9601C chips
On Wednesday 30 December 2020 16:10:37 Russell King - ARM Linux admin wrote: > On Wed, Dec 30, 2020 at 04:47:52PM +0100, Pali Rohár wrote: > > Workaround for GPON SFP modules based on VSOL V2801F brand was added in > > commit 0d035bed2a4a ("net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 > > v2.0 workaround"). But it works only for ids explicitly added to the list. > > As there are more rebraded VSOL V2801F modules and OEM vendors are putting > > into vendor name random strings we cannot build workaround based on ids. > > > > Moreover issue which that commit tried to workaround is generic not only to > > VSOL based modules, but rather to all GPON modules based on Realtek RTL8672 > > and RTL9601C chips. > > > > They can be found for example in following GPON modules: > > * V-SOL V2801F > > * C-Data FD511GX-RM0 > > * OPTON GP801R > > * BAUDCOM BD-1234-SFM > > * CPGOS03-0490 v2.0 > > * Ubiquiti U-Fiber Instant > > * EXOT EGS1 > > > > Those Realtek chips have broken EEPROM emulator which for N-byte read > > operation returns just one byte of EEPROM data followed by N-1 zeros. > > > > So introduce a new function sfp_id_needs_byte_io() which detects SFP > > modules with these Realtek chips which have broken EEPROM emulator based on > > N-1 zeros and switch to 1 byte EEPROM reading operation which workaround > > this issue. > > > > This patch fixes reading EEPROM content from SFP modules based on Realtek > > RTL8672 and RTL9601C chips. > > > > Fixes: 0d035bed2a4a ("net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 > > v2.0 workaround") > > Co-developed-by: Russell King > > Signed-off-by: Russell King > > Signed-off-by: Pali Rohár > > --- > > drivers/net/phy/sfp.c | 78 --- > > 1 file changed, 44 insertions(+), 34 deletions(-) > > > > diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c > > index 91d74c1a920a..490e78a72dd6 100644 > > --- a/drivers/net/phy/sfp.c > > +++ b/drivers/net/phy/sfp.c > > @@ -336,19 +336,11 @@ static int sfp_i2c_read(struct sfp *sfp, bool a2, u8 > > dev_addr, void *buf, > > size_t len) > > { > > struct i2c_msg msgs[2]; > > - size_t block_size; > > + u8 bus_addr = a2 ? 0x51 : 0x50; > > + size_t block_size = sfp->i2c_block_size; > > size_t this_len; > > - u8 bus_addr; > > int ret; > > > > - if (a2) { > > - block_size = 16; > > - bus_addr = 0x51; > > - } else { > > - block_size = sfp->i2c_block_size; > > - bus_addr = 0x50; > > - } > > - > > NAK. You are undoing something that is definitely needed. The > diagnostics must be read with sequential reads to be able to properly > read the 16-bit values. This change is really required for those Realtek chips. I thought that it is obvious that from *both* addresses 0x50 and 0x51 can be read only one byte at the same time. Reading 2 bytes (for be16 value) cannot be really done by one i2 transfer, it must be done in two. Therefore I had to "undo" this change as it is breaking support for all those Realtek SFPs, including VSOL. That is why I also put Fixes tag into commit message as it is really fixing reading for VSOL SFP from address 0x51. I understand that you want to read __be16 val in sfp_hwmon_read_sensor() function via one i2c transfer to have "consistent" value, but it is impossible on these Realtek chips. I have already tested that sfp_hwmon_read_sensor() function see just zero on second byte when is trying to use 2-byte read via one i2c transfer. When it use two i2c transfers, one for low 8bits and one for high 8bits then reading is working. > The rest of the patch is fine; it's a shame the entire thing has > been spoilt by this extra addition that was not in the patch we had > been discussing off-list. Sorry for that. I really thought that it is obvious that this change needs to be undone and read must always use byte transfer if we detect these broken Realtek chips.
Re: [PATCH 3/4] net: sfp: assume that LOS is not implemented if both LOS normal and inverted is set
On Wednesday 30 December 2020 16:13:10 Russell King - ARM Linux admin wrote: > On Wed, Dec 30, 2020 at 04:47:54PM +0100, Pali Rohár wrote: > > Some GPON SFP modules (e.g. Ubiquiti U-Fiber Instant) have set both > > SFP_OPTIONS_LOS_INVERTED and SFP_OPTIONS_LOS_NORMAL bits in their EEPROM. > > > > Such combination of bits is meaningless so assume that LOS signal is not > > implemented. > > > > This patch fixes link carrier for GPON SFP module Ubiquiti U-Fiber Instant. > > > > Co-developed-by: Russell King > > Signed-off-by: Russell King > > No, this is not co-developed. The patch content is exactly what _I_ > sent you, only the commit description is your own. Sorry, in this case I misunderstood usage of this Co-developed-by tag. I will remove it in next iteration of patches. > > Signed-off-by: Pali Rohár > > --- > > drivers/net/phy/sfp.c | 36 ++-- > > 1 file changed, 22 insertions(+), 14 deletions(-) > > > > diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c > > index 73f3ecf15260..d47485ed239c 100644 > > --- a/drivers/net/phy/sfp.c > > +++ b/drivers/net/phy/sfp.c > > @@ -1475,15 +1475,19 @@ static void sfp_sm_link_down(struct sfp *sfp) > > > > static void sfp_sm_link_check_los(struct sfp *sfp) > > { > > - unsigned int los = sfp->state & SFP_F_LOS; > > + const __be16 los_inverted = cpu_to_be16(SFP_OPTIONS_LOS_INVERTED); > > + const __be16 los_normal = cpu_to_be16(SFP_OPTIONS_LOS_NORMAL); > > + __be16 los_options = sfp->id.ext.options & (los_inverted | los_normal); > > + bool los = false; > > > > /* If neither SFP_OPTIONS_LOS_INVERTED nor SFP_OPTIONS_LOS_NORMAL > > -* are set, we assume that no LOS signal is available. > > +* are set, we assume that no LOS signal is available. If both are > > +* set, we assume LOS is not implemented (and is meaningless.) > > */ > > - if (sfp->id.ext.options & cpu_to_be16(SFP_OPTIONS_LOS_INVERTED)) > > - los ^= SFP_F_LOS; > > - else if (!(sfp->id.ext.options & cpu_to_be16(SFP_OPTIONS_LOS_NORMAL))) > > - los = 0; > > + if (los_options == los_inverted) > > + los = !(sfp->state & SFP_F_LOS); > > + else if (los_options == los_normal) > > + los = !!(sfp->state & SFP_F_LOS); > > > > if (los) > > sfp_sm_next(sfp, SFP_S_WAIT_LOS, 0); > > @@ -1493,18 +1497,22 @@ static void sfp_sm_link_check_los(struct sfp *sfp) > > > > static bool sfp_los_event_active(struct sfp *sfp, unsigned int event) > > { > > - return (sfp->id.ext.options & cpu_to_be16(SFP_OPTIONS_LOS_INVERTED) && > > - event == SFP_E_LOS_LOW) || > > - (sfp->id.ext.options & cpu_to_be16(SFP_OPTIONS_LOS_NORMAL) && > > - event == SFP_E_LOS_HIGH); > > + const __be16 los_inverted = cpu_to_be16(SFP_OPTIONS_LOS_INVERTED); > > + const __be16 los_normal = cpu_to_be16(SFP_OPTIONS_LOS_NORMAL); > > + __be16 los_options = sfp->id.ext.options & (los_inverted | los_normal); > > + > > + return (los_options == los_inverted && event == SFP_E_LOS_LOW) || > > + (los_options == los_normal && event == SFP_E_LOS_HIGH); > > } > > > > static bool sfp_los_event_inactive(struct sfp *sfp, unsigned int event) > > { > > - return (sfp->id.ext.options & cpu_to_be16(SFP_OPTIONS_LOS_INVERTED) && > > - event == SFP_E_LOS_HIGH) || > > - (sfp->id.ext.options & cpu_to_be16(SFP_OPTIONS_LOS_NORMAL) && > > - event == SFP_E_LOS_LOW); > > + const __be16 los_inverted = cpu_to_be16(SFP_OPTIONS_LOS_INVERTED); > > + const __be16 los_normal = cpu_to_be16(SFP_OPTIONS_LOS_NORMAL); > > + __be16 los_options = sfp->id.ext.options & (los_inverted | los_normal); > > + > > + return (los_options == los_inverted && event == SFP_E_LOS_HIGH) || > > + (los_options == los_normal && event == SFP_E_LOS_LOW); > > } > > > > static void sfp_sm_fault(struct sfp *sfp, unsigned int next_state, bool > > warn) > > -- > > 2.20.1 > > > > > > -- > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ > FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Re: [PATCH 1/4] net: sfp: add workaround for Realtek RTL8672 and RTL9601C chips
On Wed, Dec 30, 2020 at 05:56:34PM +0100, Pali Rohár wrote: > This change is really required for those Realtek chips. I thought that > it is obvious that from *both* addresses 0x50 and 0x51 can be read only > one byte at the same time. Reading 2 bytes (for be16 value) cannot be > really done by one i2 transfer, it must be done in two. Then these modules are even more broken than first throught, and quite simply it is pointless supporting the diagnostics on them because we can never read the values in an atomic way. It's also a violation of the SFF-8472 that _requires_ multi-byte reads to read these 16 byte values atomically. Reading them with individual byte reads results in a non-atomic read, and the 16-bit value can not be trusted to be correct. That is really not optional, no matter what any manufacturer says - if they claim the SFP MSAs allows it, they're quite simply talking out of a donkey's backside and you should dispose of the module in biohazard packaging. :) So no, I hadn't understood this from your emails, and as I say above, if this is the case, then we quite simply disable diagnostics on these modules since they are _highly_ noncompliant. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Re: [PATCH 3/4] net: sfp: assume that LOS is not implemented if both LOS normal and inverted is set
On Wed, Dec 30, 2020 at 05:57:58PM +0100, Pali Rohár wrote: > On Wednesday 30 December 2020 16:13:10 Russell King - ARM Linux admin wrote: > > On Wed, Dec 30, 2020 at 04:47:54PM +0100, Pali Rohár wrote: > > > Some GPON SFP modules (e.g. Ubiquiti U-Fiber Instant) have set both > > > SFP_OPTIONS_LOS_INVERTED and SFP_OPTIONS_LOS_NORMAL bits in their EEPROM. > > > > > > Such combination of bits is meaningless so assume that LOS signal is not > > > implemented. > > > > > > This patch fixes link carrier for GPON SFP module Ubiquiti U-Fiber > > > Instant. > > > > > > Co-developed-by: Russell King > > > Signed-off-by: Russell King > > > > No, this is not co-developed. The patch content is exactly what _I_ > > sent you, only the commit description is your own. > > Sorry, in this case I misunderstood usage of this Co-developed-by tag. > I will remove it in next iteration of patches. You need to mark me as the author of the code at the very least... > > > Signed-off-by: Pali Rohár > > > --- > > > drivers/net/phy/sfp.c | 36 ++-- > > > 1 file changed, 22 insertions(+), 14 deletions(-) > > > > > > diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c > > > index 73f3ecf15260..d47485ed239c 100644 > > > --- a/drivers/net/phy/sfp.c > > > +++ b/drivers/net/phy/sfp.c > > > @@ -1475,15 +1475,19 @@ static void sfp_sm_link_down(struct sfp *sfp) > > > > > > static void sfp_sm_link_check_los(struct sfp *sfp) > > > { > > > - unsigned int los = sfp->state & SFP_F_LOS; > > > + const __be16 los_inverted = cpu_to_be16(SFP_OPTIONS_LOS_INVERTED); > > > + const __be16 los_normal = cpu_to_be16(SFP_OPTIONS_LOS_NORMAL); > > > + __be16 los_options = sfp->id.ext.options & (los_inverted | los_normal); > > > + bool los = false; > > > > > > /* If neither SFP_OPTIONS_LOS_INVERTED nor SFP_OPTIONS_LOS_NORMAL > > > - * are set, we assume that no LOS signal is available. > > > + * are set, we assume that no LOS signal is available. If both are > > > + * set, we assume LOS is not implemented (and is meaningless.) > > >*/ > > > - if (sfp->id.ext.options & cpu_to_be16(SFP_OPTIONS_LOS_INVERTED)) > > > - los ^= SFP_F_LOS; > > > - else if (!(sfp->id.ext.options & cpu_to_be16(SFP_OPTIONS_LOS_NORMAL))) > > > - los = 0; > > > + if (los_options == los_inverted) > > > + los = !(sfp->state & SFP_F_LOS); > > > + else if (los_options == los_normal) > > > + los = !!(sfp->state & SFP_F_LOS); > > > > > > if (los) > > > sfp_sm_next(sfp, SFP_S_WAIT_LOS, 0); > > > @@ -1493,18 +1497,22 @@ static void sfp_sm_link_check_los(struct sfp *sfp) > > > > > > static bool sfp_los_event_active(struct sfp *sfp, unsigned int event) > > > { > > > - return (sfp->id.ext.options & cpu_to_be16(SFP_OPTIONS_LOS_INVERTED) && > > > - event == SFP_E_LOS_LOW) || > > > -(sfp->id.ext.options & cpu_to_be16(SFP_OPTIONS_LOS_NORMAL) && > > > - event == SFP_E_LOS_HIGH); > > > + const __be16 los_inverted = cpu_to_be16(SFP_OPTIONS_LOS_INVERTED); > > > + const __be16 los_normal = cpu_to_be16(SFP_OPTIONS_LOS_NORMAL); > > > + __be16 los_options = sfp->id.ext.options & (los_inverted | los_normal); > > > + > > > + return (los_options == los_inverted && event == SFP_E_LOS_LOW) || > > > +(los_options == los_normal && event == SFP_E_LOS_HIGH); > > > } > > > > > > static bool sfp_los_event_inactive(struct sfp *sfp, unsigned int event) > > > { > > > - return (sfp->id.ext.options & cpu_to_be16(SFP_OPTIONS_LOS_INVERTED) && > > > - event == SFP_E_LOS_HIGH) || > > > -(sfp->id.ext.options & cpu_to_be16(SFP_OPTIONS_LOS_NORMAL) && > > > - event == SFP_E_LOS_LOW); > > > + const __be16 los_inverted = cpu_to_be16(SFP_OPTIONS_LOS_INVERTED); > > > + const __be16 los_normal = cpu_to_be16(SFP_OPTIONS_LOS_NORMAL); > > > + __be16 los_options = sfp->id.ext.options & (los_inverted | los_normal); > > > + > > > + return (los_options == los_inverted && event == SFP_E_LOS_HIGH) || > > > +(los_options == los_normal && event == SFP_E_LOS_LOW); > > > } > > > > > > static void sfp_sm_fault(struct sfp *sfp, unsigned int next_state, bool > > > warn) > > > -- > > > 2.20.1 > > > > > > > > > > -- > > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ > > FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! > -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Re: [PATCH 2/4] net: sfp: allow to use also SFP modules which are detected as SFF
On Wednesday 30 December 2020 16:11:51 Russell King - ARM Linux admin wrote: > On Wed, Dec 30, 2020 at 04:47:53PM +0100, Pali Rohár wrote: > > Some GPON SFP modules (e.g. Ubiquiti U-Fiber Instant) have set SFF phys_id > > in their EEPROM. Kernel SFP subsystem currently does not allow to use > > modules detected as SFF. > > > > This change extends check for SFP modules so also those with SFF phys_id > > are allowed. With this change also GPON SFP module Ubiquiti U-Fiber Instant > > is recognized. > > I really don't want to do this for every single module out there. > It's likely that Ubiquiti do this crap as a vendor lock-in measure. > Let's make it specific to Ubiquiti modules _only_ until such time > that we know better. Ok. This module_supported() function is called in sfp_sm_mod_probe() function. Current code is: /* Check whether we support this module */ if (!sfp->type->module_supported(&id)) { dev_err(sfp->dev, "module is not supported - phys id 0x%02x 0x%02x\n", sfp->id.base.phys_id, sfp->id.base.phys_ext_id); return -EINVAL; } Do you want to change code to something like this? /* Check whether we support this module */ if (!sfp->type->module_supported(&id) && (memcmp(id.base.vendor_name, "UBNT", 16) || memcmp(id.base.vendor_pn, "UF-INSTANT ", 16))) dev_err(sfp->dev, "module is not supported - phys id 0x%02x 0x%02x\n", sfp->id.base.phys_id, sfp->id.base.phys_ext_id); return -EINVAL; } Or do you have a better idea how to skip that module_supported check for this UBNT SFP?
Re: [PATCH 1/4] net: sfp: add workaround for Realtek RTL8672 and RTL9601C chips
On Wed, Dec 30, 2020 at 05:05:46PM +, Russell King - ARM Linux admin wrote: > On Wed, Dec 30, 2020 at 05:56:34PM +0100, Pali Rohár wrote: > > This change is really required for those Realtek chips. I thought that > > it is obvious that from *both* addresses 0x50 and 0x51 can be read only > > one byte at the same time. Reading 2 bytes (for be16 value) cannot be > > really done by one i2 transfer, it must be done in two. > > Then these modules are even more broken than first throught, and > quite simply it is pointless supporting the diagnostics on them > because we can never read the values in an atomic way. > > It's also a violation of the SFF-8472 that _requires_ multi-byte reads > to read these 16 byte values atomically. Reading them with individual > byte reads results in a non-atomic read, and the 16-bit value can not > be trusted to be correct. Hi Pali I have to agree with Russell here. I would rather have no diagnostics than untrustable diagnostics. The only way this is going to be accepted is if the manufacture says that reading the first byte of a word snapshots the second byte as well in an atomic way and returns that snapshot on the second read. But i highly doubt that happens, given how bad these SFPs are. Andrew
Re: [PATCH 3/4] net: sfp: assume that LOS is not implemented if both LOS normal and inverted is set
On Wed, Dec 30, 2020 at 05:06:23PM +, Russell King - ARM Linux admin wrote: > On Wed, Dec 30, 2020 at 05:57:58PM +0100, Pali Rohár wrote: > > On Wednesday 30 December 2020 16:13:10 Russell King - ARM Linux admin wrote: > > > On Wed, Dec 30, 2020 at 04:47:54PM +0100, Pali Rohár wrote: > > > > Some GPON SFP modules (e.g. Ubiquiti U-Fiber Instant) have set both > > > > SFP_OPTIONS_LOS_INVERTED and SFP_OPTIONS_LOS_NORMAL bits in their > > > > EEPROM. > > > > > > > > Such combination of bits is meaningless so assume that LOS signal is not > > > > implemented. > > > > > > > > This patch fixes link carrier for GPON SFP module Ubiquiti U-Fiber > > > > Instant. > > > > > > > > Co-developed-by: Russell King > > > > Signed-off-by: Russell King > > > > > > No, this is not co-developed. The patch content is exactly what _I_ > > > sent you, only the commit description is your own. > > > > Sorry, in this case I misunderstood usage of this Co-developed-by tag. > > I will remove it in next iteration of patches. > > You need to mark me as the author of the code at the very least... Hi Pali You also need to keep your own Signed-off-by, since the patch is coming through you. So basically, git commit --am --author="Russell King " and then two Signed-off-by: lines. Andrew
Re: [PATCH 2/4] net: sfp: allow to use also SFP modules which are detected as SFF
On Wed, 30 Dec 2020 18:06:52 +0100 Pali Rohár wrote: > if (!sfp->type->module_supported(&id) && > (memcmp(id.base.vendor_name, "UBNT", 16) || >memcmp(id.base.vendor_pn, "UF-INSTANT ", 16))) I would rather add a quirk member (bitfield) to the sfp structure and do something like this if (!sfp->type->module_supported(&id) && !(sfp->quirks & SFP_QUIRK_BAD_PHYS_ID)) or maybe put this check into the module_supported method. Marek
Re: [PATCH 1/4] net: sfp: add workaround for Realtek RTL8672 and RTL9601C chips
On Wednesday 30 December 2020 17:05:46 Russell King - ARM Linux admin wrote: > On Wed, Dec 30, 2020 at 05:56:34PM +0100, Pali Rohár wrote: > > This change is really required for those Realtek chips. I thought that > > it is obvious that from *both* addresses 0x50 and 0x51 can be read only > > one byte at the same time. Reading 2 bytes (for be16 value) cannot be > > really done by one i2 transfer, it must be done in two. > > Then these modules are even more broken than first throught, and > quite simply it is pointless supporting the diagnostics on them > because we can never read the values in an atomic way. They are broken in a way that neither holy water help them... But from diagnostic 0x51 address we can read at least 8bit registers in atomic way :-) > It's also a violation of the SFF-8472 that _requires_ multi-byte reads > to read these 16 byte values atomically. Reading them with individual > byte reads results in a non-atomic read, and the 16-bit value can not > be trusted to be correct. > > That is really not optional, no matter what any manufacturer says - if > they claim the SFP MSAs allows it, they're quite simply talking out of > a donkey's backside and you should dispose of the module in biohazard > packaging. :) > > So no, I hadn't understood this from your emails, and as I say above, > if this is the case, then we quite simply disable diagnostics on these > modules since they are _highly_ noncompliant. We have just two options: Disable 2 (and more) bytes reads from 0x51 address and therefore disable sfp_hwmon_read_sensor() function. Or allow 2 bytes non-atomic reads and allow at least semi-correct values for hwmon. I guess that upper 8bits would not change between two single byte i2c transfers too much (when they are done immediately one by one). But in any case we really need to set read operation for this eeprom to one byte.
Re: [PATCH 3/4] net: sfp: assume that LOS is not implemented if both LOS normal and inverted is set
On Wednesday 30 December 2020 18:17:41 Andrew Lunn wrote: > On Wed, Dec 30, 2020 at 05:06:23PM +, Russell King - ARM Linux admin > wrote: > > On Wed, Dec 30, 2020 at 05:57:58PM +0100, Pali Rohár wrote: > > > On Wednesday 30 December 2020 16:13:10 Russell King - ARM Linux admin > > > wrote: > > > > On Wed, Dec 30, 2020 at 04:47:54PM +0100, Pali Rohár wrote: > > > > > Some GPON SFP modules (e.g. Ubiquiti U-Fiber Instant) have set both > > > > > SFP_OPTIONS_LOS_INVERTED and SFP_OPTIONS_LOS_NORMAL bits in their > > > > > EEPROM. > > > > > > > > > > Such combination of bits is meaningless so assume that LOS signal is > > > > > not > > > > > implemented. > > > > > > > > > > This patch fixes link carrier for GPON SFP module Ubiquiti U-Fiber > > > > > Instant. > > > > > > > > > > Co-developed-by: Russell King > > > > > Signed-off-by: Russell King > > > > > > > > No, this is not co-developed. The patch content is exactly what _I_ > > > > sent you, only the commit description is your own. > > > > > > Sorry, in this case I misunderstood usage of this Co-developed-by tag. > > > I will remove it in next iteration of patches. > > > > You need to mark me as the author of the code at the very least... > > Hi Pali > > You also need to keep your own Signed-off-by, since the patch is > coming through you. > > So basically, git commit --am --author="Russell King > " > and then two Signed-off-by: lines. Got it, thank you!
Re: [PATCH 1/4] net: sfp: add workaround for Realtek RTL8672 and RTL9601C chips
On Wednesday 30 December 2020 18:13:15 Andrew Lunn wrote: > On Wed, Dec 30, 2020 at 05:05:46PM +, Russell King - ARM Linux admin > wrote: > > On Wed, Dec 30, 2020 at 05:56:34PM +0100, Pali Rohár wrote: > > > This change is really required for those Realtek chips. I thought that > > > it is obvious that from *both* addresses 0x50 and 0x51 can be read only > > > one byte at the same time. Reading 2 bytes (for be16 value) cannot be > > > really done by one i2 transfer, it must be done in two. > > > > Then these modules are even more broken than first throught, and > > quite simply it is pointless supporting the diagnostics on them > > because we can never read the values in an atomic way. > > > > It's also a violation of the SFF-8472 that _requires_ multi-byte reads > > to read these 16 byte values atomically. Reading them with individual > > byte reads results in a non-atomic read, and the 16-bit value can not > > be trusted to be correct. > > Hi Pali > > I have to agree with Russell here. I would rather have no diagnostics > than untrustable diagnostics. Ok! So should we completely skip hwmon_device_register_with_info() call if (i2c_block_size < 2) ? > The only way this is going to be accepted is if the manufacture says > that reading the first byte of a word snapshots the second byte as > well in an atomic way and returns that snapshot on the second > read. But i highly doubt that happens, given how bad these SFPs are. I do not think that manufacture says something. I think that they even do not know that their Realtek chips are completely broken. I can imagine that vendor just says: it is working in our branded boxes with SFP cages and if it does not work in your kernel then problem is with your custom kernel and we do not care about 3rd parties.
Re: [PATCH v2] btf: support ints larger than 128 bits
On 12/19/20 8:36 AM, Sean Young wrote: clang supports arbitrary length ints using the _ExtInt extension. This can be useful to hold very large values, e.g. 256 bit or 512 bit types. Larger types (e.g. 1024 bits) are possible but I am unaware of a use case for these. This requires the _ExtInt extension enabled in clang, which is under review. Link: https://clang.llvm.org/docs/LanguageExtensions.html#extended-integer-types Link: https://reviews.llvm.org/D93103 Signed-off-by: Sean Young --- changes since v2: - added tests as suggested by Yonghong Song - added kernel pretty-printer Documentation/bpf/btf.rst | 4 +- include/uapi/linux/btf.h | 2 +- kernel/bpf/btf.c | 54 +- tools/bpf/bpftool/btf_dumper.c| 40 ++ tools/include/uapi/linux/btf.h| 2 +- tools/lib/bpf/btf.c | 2 +- tools/testing/selftests/bpf/Makefile | 3 +- tools/testing/selftests/bpf/prog_tests/btf.c | 3 +- .../selftests/bpf/progs/test_btf_extint.c | 50 ++ tools/testing/selftests/bpf/test_extint.py| 535 ++ For easier review, maybe you can break this patch into a patch series like below? patch 1 (kernel related changes and doc) kernel/bpf/btf.c, include/uapi/linux/btf.h, tools/include/uapi/linux/btf.h Documentation/bpf/btf.rst patch 2 (libbpf support) tools/lib/bpf/btf.c patch 3 (bpftool support) tools/bpf/bpftool/btf_dumper.c patch 4 (testing) rest files 10 files changed, 679 insertions(+), 16 deletions(-) create mode 100644 tools/testing/selftests/bpf/progs/test_btf_extint.c create mode 100755 tools/testing/selftests/bpf/test_extint.py diff --git a/Documentation/bpf/btf.rst b/Documentation/bpf/btf.rst index 44dc789de2b4..784f1743dbc7 100644 --- a/Documentation/bpf/btf.rst +++ b/Documentation/bpf/btf.rst @@ -132,7 +132,7 @@ The following sections detail encoding of each kind. #define BTF_INT_ENCODING(VAL) (((VAL) & 0x0f00) >> 24) #define BTF_INT_OFFSET(VAL) (((VAL) & 0x00ff) >> 16) - #define BTF_INT_BITS(VAL) ((VAL) & 0x00ff) + #define BTF_INT_BITS(VAL) ((VAL) & 0x03ff) The ``BTF_INT_ENCODING`` has the following attributes:: @@ -147,7 +147,7 @@ pretty print. At most one encoding can be specified for the int type. The ``BTF_INT_BITS()`` specifies the number of actual bits held by this int type. For example, a 4-bit bitfield encodes ``BTF_INT_BITS()`` equals to 4. The ``btf_type.size * 8`` must be equal to or greater than ``BTF_INT_BITS()`` -for the type. The maximum value of ``BTF_INT_BITS()`` is 128. +for the type. The maximum value of ``BTF_INT_BITS()`` is 512. The ``BTF_INT_OFFSET()`` specifies the starting bit offset to calculate values for this int. For example, a bitfield struct member has: diff --git a/include/uapi/linux/btf.h b/include/uapi/linux/btf.h index 5a667107ad2c..1696fd02b302 100644 --- a/include/uapi/linux/btf.h +++ b/include/uapi/linux/btf.h @@ -84,7 +84,7 @@ struct btf_type { */ #define BTF_INT_ENCODING(VAL) (((VAL) & 0x0f00) >> 24) #define BTF_INT_OFFSET(VAL) (((VAL) & 0x00ff) >> 16) -#define BTF_INT_BITS(VAL) ((VAL) & 0x00ff) +#define BTF_INT_BITS(VAL) ((VAL) & 0x03ff) /* Attributes stored in the BTF_INT_ENCODING */ #define BTF_INT_SIGNED(1 << 0) diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 8d6bdb4f4d61..44bc17207e9b 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -166,7 +166,8 @@ * */ -#define BITS_PER_U128 (sizeof(u64) * BITS_PER_BYTE * 2) +#define BITS_PER_U128 128 +#define BITS_PER_U512 512 #define BITS_PER_BYTE_MASK (BITS_PER_BYTE - 1) #define BITS_PER_BYTE_MASKED(bits) ((bits) & BITS_PER_BYTE_MASK) #define BITS_ROUNDDOWN_BYTES(bits) ((bits) >> 3) @@ -1907,9 +1908,9 @@ static int btf_int_check_member(struct btf_verifier_env *env, nr_copy_bits = BTF_INT_BITS(int_data) + BITS_PER_BYTE_MASKED(struct_bits_off); - if (nr_copy_bits > BITS_PER_U128) { + if (nr_copy_bits > BITS_PER_U512) { btf_verifier_log_member(env, struct_type, member, - "nr_copy_bits exceeds 128"); + "nr_copy_bits exceeds 512"); return -EINVAL; } @@ -1963,9 +1964,9 @@ static int btf_int_check_kflag_member(struct btf_verifier_env *env, bytes_offset = BITS_ROUNDDOWN_BYTES(struct_bits_off); nr_copy_bits = nr_bits + BITS_PER_BYTE_MASKED(struct_bits_off); - if (nr_copy_bits > BITS_PER_U128) { + if (nr_copy_bits > BITS_PER_U512) { btf_verifier_log_member(env, struct_type, member, - "nr_copy_bits exceeds 128"); + "nr_copy_bits exceeds 512"); return -EINVAL; }
[PATCH net] r8169: work around power-saving bug on some chip versions
A user reported failing network with RTL8168dp (a quite rare chip version). Realtek confirmed that few chip versions suffer from a PLL power-down hw bug. Fixes: 07df5bd874f0 ("r8169: power down chip in probe") Signed-off-by: Heiner Kallweit --- Note that since the original change source file r8169.c was renamed to r8169_main.c. Also the name of the affected functions has changed from r8168_pll_power_down/up to rtl_pll_power_down/up. --- drivers/net/ethernet/realtek/r8169_main.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index 46d8510b2..a569abe7f 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -2207,7 +2207,8 @@ static void rtl_pll_power_down(struct rtl8169_private *tp) } switch (tp->mac_version) { - case RTL_GIGA_MAC_VER_25 ... RTL_GIGA_MAC_VER_33: + case RTL_GIGA_MAC_VER_25 ... RTL_GIGA_MAC_VER_26: + case RTL_GIGA_MAC_VER_32 ... RTL_GIGA_MAC_VER_33: case RTL_GIGA_MAC_VER_37: case RTL_GIGA_MAC_VER_39: case RTL_GIGA_MAC_VER_43: @@ -2233,7 +2234,8 @@ static void rtl_pll_power_down(struct rtl8169_private *tp) static void rtl_pll_power_up(struct rtl8169_private *tp) { switch (tp->mac_version) { - case RTL_GIGA_MAC_VER_25 ... RTL_GIGA_MAC_VER_33: + case RTL_GIGA_MAC_VER_25 ... RTL_GIGA_MAC_VER_26: + case RTL_GIGA_MAC_VER_32 ... RTL_GIGA_MAC_VER_33: case RTL_GIGA_MAC_VER_37: case RTL_GIGA_MAC_VER_39: case RTL_GIGA_MAC_VER_43: -- 2.29.2
Re: [PATCH v2] xfrm: Fix wraparound in xfrm_policy_addr_delta()
Visa Hankala wrote: > Use three-way comparison for address components to avoid integer > wraparound in the result of xfrm_policy_addr_delta(). This ensures > that the search trees are built and traversed correctly. > > Treat IPv4 and IPv6 similarly by returning 0 when prefixlen == 0. > Prefix /0 has only one equivalence class. Acked-by: Florian Westphal
Re: [PATCH 1/4] net: sfp: add workaround for Realtek RTL8672 and RTL9601C chips
On Wed, Dec 30, 2020 at 06:43:07PM +0100, Pali Rohár wrote: > On Wednesday 30 December 2020 18:13:15 Andrew Lunn wrote: > > Hi Pali > > > > I have to agree with Russell here. I would rather have no diagnostics > > than untrustable diagnostics. > > Ok! > > So should we completely skip hwmon_device_register_with_info() call > if (i2c_block_size < 2) ? I don't think that alone is sufficient - there's also the matter of ethtool -m which will dump that information as well, and we don't want to offer it to userspace in an unreliable form. For reference, here is what SFF-8472 which defines the diagnostics, says about this: To guarantee coherency of the diagnostic monitoring data, the host is required to retrieve any multi-byte fields from the diagnostic monitoring data structure (IE: Rx Power MSB - byte 104 in A2h, Rx Power LSB - byte 105 in A2h) by the use of a single two-byte read sequence across the two-wire interface interface. The transceiver is required to ensure that any multi-byte fields which are updated with diagnostic monitoring data (e.g. Rx Power MSB - byte 104 in A2h, Rx Power LSB - byte 105 in A2h) must have this update done in a fashion which guarantees coherency and consistency of the data. In other words, the update of a multi-byte field by the transceiver must not occur such that a partially updated multi-byte field can be transferred to the host. Also, the transceiver shall not update a multi-byte field within the structure during the transfer of that multi-byte field to the host, such that partially updated data would be transferred to the host. The first paragraph is extremely definitive in how these fields shall be read atomically - by a _single_ two-byte read sequence. From what you are telling us, these modules do not support that. Therefore, by definition, they do *not* support proper and reliable reporting of diagnostic data, and are non-conformant with the SFP MSAs. So, they are basically broken, and the diagnostics can't be used to retrieve data that can be said to be useful. > I do not think that manufacture says something. I think that they even > do not know that their Realtek chips are completely broken. Oh, they do know. I had a response from CarlitoxxPro passed to me, which was: That is a behavior related on how your router/switch try to read the EEPROM, as described in the datasheet of the GPON ONU SFP, the EEPROM can be read in Sequential Single-Byte mode, not in Multi-byte mode as you router do, basically, your router is trying to read the full a0h table in a single call, and retrieve a null response. that is normal and not affect the operations of the GPON ONU SFP, because these values are informational only. so the Software for your router should be able to read in Single-Byte mode to read the content of the EEPROM in concordance to SFF-8431 which totally misses the point that it is /not/ up to the module to choose whether multi-byte reads are supported or not. If they bothered to gain a proper understanding of the MSAs, they would have noticed that the device on 0xA0 is required to behave as an Atmel AT24Cxx EEPROM. The following from INF-8074i, which is the very first definition of the SFP form factor modules: The SFP serial ID provides access to sophisticated identification information that describes the transceiver's capabilities, standard interfaces, manufacturer, and other information. The serial interface uses the 2-wire serial CMOS E2PROM protocol defined for the ATMEL AT24C01A/02/04 family of components. As they took less than one working day to provide the above response, I suspect they know full well that there's a problem - and it likely affects other "routers" as well. They're also confused about their SFF specifications. SFF-8431 is: "SFP+ 10 Gb/s and Low Speed Electrical Interface" which is not the correct specification for a 1Gbps module. > I can imagine that vendor just says: it is working in our branded boxes > with SFP cages and if it does not work in your kernel then problem is > with your custom kernel and we do not care about 3rd parties. Which shows why it's pointless producing an EEPROM validation tool that runs under Linux (as has been your suggestion). They won't use it, since their testing only goes as far as "does it work in our product?" -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
[RFC PATCH v3 10/12] net: group skb_shinfo zerocopy related bits together.
From: Jonathan Lemon In preparation for expanded zerocopy (TX and RX), move the zerocopy related bits out of tx_flags into their own flag word. Signed-off-by: Jonathan Lemon --- drivers/net/tap.c | 3 +-- drivers/net/tun.c | 3 +-- drivers/net/xen-netback/interface.c | 4 +-- include/linux/skbuff.h | 38 - net/core/skbuff.c | 9 +++ net/ipv4/tcp.c | 2 +- net/kcm/kcmsock.c | 4 +-- 7 files changed, 32 insertions(+), 31 deletions(-) diff --git a/drivers/net/tap.c b/drivers/net/tap.c index 3f51f3766d18..f7a19d9b7c27 100644 --- a/drivers/net/tap.c +++ b/drivers/net/tap.c @@ -723,8 +723,7 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control, /* copy skb_ubuf_info for callback when skb has no error */ if (zerocopy) { skb_shinfo(skb)->destructor_arg = msg_control; - skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY; - skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG; + skb_shinfo(skb)->flags |= SKBFL_ZEROCOPY_FRAG; } else if (msg_control) { struct ubuf_info *uarg = msg_control; uarg->callback(NULL, uarg, false); diff --git a/drivers/net/tun.c b/drivers/net/tun.c index b64e2cc85751..dd9edbd72ae8 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1815,8 +1815,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, /* copy skb_ubuf_info for callback when skb has no error */ if (zerocopy) { skb_shinfo(skb)->destructor_arg = msg_control; - skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY; - skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG; + skb_shinfo(skb)->flags |= SKBFL_ZEROCOPY_FRAG; } else if (msg_control) { struct ubuf_info *uarg = msg_control; uarg->callback(NULL, uarg, false); diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c index acb786d8b1d8..08b0e3d0b7eb 100644 --- a/drivers/net/xen-netback/interface.c +++ b/drivers/net/xen-netback/interface.c @@ -47,7 +47,7 @@ /* Number of bytes allowed on the internal guest Rx queue. */ #define XENVIF_RX_QUEUE_BYTES (XEN_NETIF_RX_RING_SIZE/2 * PAGE_SIZE) -/* This function is used to set SKBTX_DEV_ZEROCOPY as well as +/* This function is used to set SKBFL_ZEROCOPY_ENABLE as well as * increasing the inflight counter. We need to increase the inflight * counter because core driver calls into xenvif_zerocopy_callback * which calls xenvif_skb_zerocopy_complete. @@ -55,7 +55,7 @@ void xenvif_skb_zerocopy_prepare(struct xenvif_queue *queue, struct sk_buff *skb) { - skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY; + skb_shinfo(skb)->flags |= SKBFL_ZEROCOPY_ENABLE; atomic_inc(&queue->inflight_packets); } diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 3c34c75ab004..66fde9a7b851 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -430,28 +430,32 @@ enum { /* device driver is going to provide hardware time stamp */ SKBTX_IN_PROGRESS = 1 << 2, - /* device driver supports TX zero-copy buffers */ - SKBTX_DEV_ZEROCOPY = 1 << 3, - /* generate wifi status information (where possible) */ SKBTX_WIFI_STATUS = 1 << 4, - /* This indicates at least one fragment might be overwritten -* (as in vmsplice(), sendfile() ...) -* If we need to compute a TX checksum, we'll need to copy -* all frags to avoid possible bad checksum -*/ - SKBTX_SHARED_FRAG = 1 << 5, - /* generate software time stamp when entering packet scheduling */ SKBTX_SCHED_TSTAMP = 1 << 6, }; -#define SKBTX_ZEROCOPY_FRAG(SKBTX_DEV_ZEROCOPY | SKBTX_SHARED_FRAG) #define SKBTX_ANY_SW_TSTAMP(SKBTX_SW_TSTAMP| \ SKBTX_SCHED_TSTAMP) #define SKBTX_ANY_TSTAMP (SKBTX_HW_TSTAMP | SKBTX_ANY_SW_TSTAMP) +/* Definitions for flags in struct skb_shared_info */ +enum { + /* use zcopy routines */ + SKBFL_ZEROCOPY_ENABLE = BIT(0), + + /* This indicates at least one fragment might be overwritten +* (as in vmsplice(), sendfile() ...) +* If we need to compute a TX checksum, we'll need to copy +* all frags to avoid possible bad checksum +*/ + SKBFL_SHARED_FRAG = BIT(1), +}; + +#define SKBFL_ZEROCOPY_FRAG(SKBFL_ZEROCOPY_ENABLE | SKBFL_SHARED_FRAG) + /* * The callback notifies userspace to release buffers when skb DMA is done in * lower device, the skb last reference should be 0 when calling this. @@ -506,7 +510,7 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb, * the end of the header data, ie. at skb->end. */ struct skb_shared_info { - __u8
[RFC PATCH v3 05/12] skbuff: replace sock_zerocopy_get with skb_zcopy_get
From: Jonathan Lemon Rename the get routines for consistency. Signed-off-by: Jonathan Lemon --- include/linux/skbuff.h | 12 ++-- net/core/skbuff.c | 2 +- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index a6c86839035b..5b8a53ab51fd 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -491,11 +491,6 @@ struct ubuf_info *sock_zerocopy_alloc(struct sock *sk, size_t size); struct ubuf_info *sock_zerocopy_realloc(struct sock *sk, size_t size, struct ubuf_info *uarg); -static inline void sock_zerocopy_get(struct ubuf_info *uarg) -{ - refcount_inc(&uarg->refcnt); -} - void sock_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref); void sock_zerocopy_callback(struct ubuf_info *uarg, bool success); @@ -1441,6 +1436,11 @@ static inline struct ubuf_info *skb_zcopy(struct sk_buff *skb) return is_zcopy ? skb_uarg(skb) : NULL; } +static inline void skb_zcopy_get(struct ubuf_info *uarg) +{ + refcount_inc(&uarg->refcnt); +} + static inline void skb_zcopy_set(struct sk_buff *skb, struct ubuf_info *uarg, bool *have_ref) { @@ -1448,7 +1448,7 @@ static inline void skb_zcopy_set(struct sk_buff *skb, struct ubuf_info *uarg, if (unlikely(have_ref && *have_ref)) *have_ref = false; else - sock_zerocopy_get(uarg); + skb_zcopy_get(uarg); skb_shinfo(skb)->destructor_arg = uarg; skb_shinfo(skb)->tx_flags |= SKBTX_ZEROCOPY_FRAG; } diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 0e028825367a..00f195908e79 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1163,7 +1163,7 @@ struct ubuf_info *sock_zerocopy_realloc(struct sock *sk, size_t size, /* no extra ref when appending to datagram (MSG_MORE) */ if (sk->sk_type == SOCK_STREAM) - sock_zerocopy_get(uarg); + skb_zcopy_get(uarg); return uarg; } -- 2.24.1
[RFC PATCH v3 06/12] skbuff: Add skb parameter to the ubuf zerocopy callback
From: Jonathan Lemon Add an optional skb parameter to the zerocopy callback parameter, which is passed down from skb_zcopy_clear(). This gives access to the original skb, which is needed for upcoming RX zero-copy error handling. Signed-off-by: Jonathan Lemon --- drivers/net/tap.c | 2 +- drivers/net/tun.c | 2 +- drivers/net/xen-netback/common.h | 3 ++- drivers/net/xen-netback/netback.c | 5 +++-- drivers/vhost/net.c | 3 ++- include/linux/skbuff.h| 17 - net/core/skbuff.c | 3 ++- 7 files changed, 19 insertions(+), 16 deletions(-) diff --git a/drivers/net/tap.c b/drivers/net/tap.c index 1f4bdd94407a..3f51f3766d18 100644 --- a/drivers/net/tap.c +++ b/drivers/net/tap.c @@ -727,7 +727,7 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control, skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG; } else if (msg_control) { struct ubuf_info *uarg = msg_control; - uarg->callback(uarg, false); + uarg->callback(NULL, uarg, false); } if (tap) { diff --git a/drivers/net/tun.c b/drivers/net/tun.c index fbed05ae7b0f..b64e2cc85751 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1819,7 +1819,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG; } else if (msg_control) { struct ubuf_info *uarg = msg_control; - uarg->callback(uarg, false); + uarg->callback(NULL, uarg, false); } skb_reset_network_header(skb); diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h index 8ee24e351bdc..4a16d6e33c09 100644 --- a/drivers/net/xen-netback/common.h +++ b/drivers/net/xen-netback/common.h @@ -399,7 +399,8 @@ void xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb); void xenvif_carrier_on(struct xenvif *vif); /* Callback from stack when TX packet can be released */ -void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success); +void xenvif_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *ubuf, + bool zerocopy_success); /* Unmap a pending page and release it back to the guest */ void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx); diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index bc3421d14576..19a27dce79d2 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -1091,7 +1091,7 @@ static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct sk_buff *s uarg = skb_shinfo(skb)->destructor_arg; /* increase inflight counter to offset decrement in callback */ atomic_inc(&queue->inflight_packets); - uarg->callback(uarg, true); + uarg->callback(NULL, uarg, true); skb_shinfo(skb)->destructor_arg = NULL; /* Fill the skb with the new (local) frags. */ @@ -1228,7 +1228,8 @@ static int xenvif_tx_submit(struct xenvif_queue *queue) return work_done; } -void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success) +void xenvif_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *ubuf, + bool zerocopy_success) { unsigned long flags; pending_ring_idx_t index; diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 531a00d703cd..bf28d0b75c1b 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -381,7 +381,8 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net, } } -static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success) +static void vhost_zerocopy_callback(struct sk_buff *skb, + struct ubuf_info *ubuf, bool success) { struct vhost_net_ubuf_ref *ubufs = ubuf->ctx; struct vhost_virtqueue *vq = ubufs->vq; diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 5b8a53ab51fd..b23c3b4b3209 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -461,7 +461,8 @@ enum { * The desc field is used to track userspace buffer index. */ struct ubuf_info { - void (*callback)(struct ubuf_info *, bool zerocopy_success); + void (*callback)(struct sk_buff *, struct ubuf_info *, +bool zerocopy_success); union { struct { unsigned long desc; @@ -493,7 +494,8 @@ struct ubuf_info *sock_zerocopy_realloc(struct sock *sk, size_t size, void sock_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref); -void sock_zerocopy_callback(struct ubuf_info *uarg, bool success); +void sock_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg, + bool success); int skb_zerocopy_iter_dgram(struct sk_buff *skb, struct msghdr *msg, int len); int
[RFC PATCH v3 03/12] skbuff: Push status and refcounts into sock_zerocopy_callback
From: Jonathan Lemon Before this change, the caller of sock_zerocopy_callback would need to save the zerocopy status, decrement and check the refcount, and then call the callback function - the callback was only invoked when the refcount reached zero. Now, the caller just passes the status into the callback function, which saves the status and handles its own refcounts. This makes the behavior of the sock_zerocopy_callback identical to the tpacket and vhost callbacks. Signed-off-by: Jonathan Lemon --- include/linux/skbuff.h | 3 --- net/core/skbuff.c | 14 +++--- 2 files changed, 11 insertions(+), 6 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 3ca8d7c7b30c..52e96c35f5af 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1479,9 +1479,6 @@ static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy) if (uarg) { if (skb_zcopy_is_nouarg(skb)) { /* no notification callback */ - } else if (uarg->callback == sock_zerocopy_callback) { - uarg->zerocopy = uarg->zerocopy && zerocopy; - sock_zerocopy_put(uarg); } else { uarg->callback(uarg, zerocopy); } diff --git a/net/core/skbuff.c b/net/core/skbuff.c index d88963f47f7d..8c18940723ff 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1194,7 +1194,7 @@ static bool skb_zerocopy_notify_extend(struct sk_buff *skb, u32 lo, u16 len) return true; } -void sock_zerocopy_callback(struct ubuf_info *uarg, bool success) +static void __sock_zerocopy_callback(struct ubuf_info *uarg) { struct sk_buff *tail, *skb = skb_from_uarg(uarg); struct sock_exterr_skb *serr; @@ -1222,7 +1222,7 @@ void sock_zerocopy_callback(struct ubuf_info *uarg, bool success) serr->ee.ee_origin = SO_EE_ORIGIN_ZEROCOPY; serr->ee.ee_data = hi; serr->ee.ee_info = lo; - if (!success) + if (!uarg->zerocopy) serr->ee.ee_code |= SO_EE_CODE_ZEROCOPY_COPIED; q = &sk->sk_error_queue; @@ -1241,11 +1241,19 @@ void sock_zerocopy_callback(struct ubuf_info *uarg, bool success) consume_skb(skb); sock_put(sk); } + +void sock_zerocopy_callback(struct ubuf_info *uarg, bool success) +{ + uarg->zerocopy = uarg->zerocopy & success; + + if (refcount_dec_and_test(&uarg->refcnt)) + __sock_zerocopy_callback(uarg); +} EXPORT_SYMBOL_GPL(sock_zerocopy_callback); void sock_zerocopy_put(struct ubuf_info *uarg) { - if (uarg && refcount_dec_and_test(&uarg->refcnt)) + if (uarg) uarg->callback(uarg, uarg->zerocopy); } EXPORT_SYMBOL_GPL(sock_zerocopy_put); -- 2.24.1
[RFC PATCH v3 01/12] skbuff: remove unused skb_zcopy_abort function
From: Jonathan Lemon skb_zcopy_abort() has no in-tree consumers, remove it. Signed-off-by: Jonathan Lemon Acked-by: Willem de Bruijn --- include/linux/skbuff.h | 11 --- 1 file changed, 11 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 333bcdc39635..3ca8d7c7b30c 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1490,17 +1490,6 @@ static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy) } } -/* Abort a zerocopy operation and revert zckey on error in send syscall */ -static inline void skb_zcopy_abort(struct sk_buff *skb) -{ - struct ubuf_info *uarg = skb_zcopy(skb); - - if (uarg) { - sock_zerocopy_put_abort(uarg, false); - skb_shinfo(skb)->tx_flags &= ~SKBTX_ZEROCOPY_FRAG; - } -} - static inline void skb_mark_not_on_list(struct sk_buff *skb) { skb->next = NULL; -- 2.24.1
[RFC PATCH v3 07/12] skbuff: Call sock_zerocopy_put_abort from skb_zcopy_put_abort
From: Jonathan Lemon The sock_zerocopy_put_abort function contains logic which is specific to the current zerocopy implementation. Add a wrapper which checks the callback and dispatches apppropriately. Signed-off-by: Jonathan Lemon --- include/linux/skbuff.h | 10 ++ net/core/skbuff.c | 12 +--- net/ipv4/ip_output.c | 3 +-- net/ipv4/tcp.c | 2 +- net/ipv6/ip6_output.c | 3 +-- 5 files changed, 18 insertions(+), 12 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index b23c3b4b3209..9f7393167f0a 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1478,6 +1478,16 @@ static inline void skb_zcopy_put(struct ubuf_info *uarg) uarg->callback(NULL, uarg, true); } +static inline void skb_zcopy_put_abort(struct ubuf_info *uarg, bool have_uref) +{ + if (uarg) { + if (uarg->callback == sock_zerocopy_callback) + sock_zerocopy_put_abort(uarg, have_uref); + else if (have_uref) + skb_zcopy_put(uarg); + } +} + /* Release a reference on a zerocopy structure */ static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy_success) { diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 89130b21d9f0..5b9cd528d6a6 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1254,15 +1254,13 @@ EXPORT_SYMBOL_GPL(sock_zerocopy_callback); void sock_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref) { - if (uarg) { - struct sock *sk = skb_from_uarg(uarg)->sk; + struct sock *sk = skb_from_uarg(uarg)->sk; - atomic_dec(&sk->sk_zckey); - uarg->len--; + atomic_dec(&sk->sk_zckey); + uarg->len--; - if (have_uref) - skb_zcopy_put(uarg); - } + if (have_uref) + sock_zerocopy_callback(NULL, uarg, true); } EXPORT_SYMBOL_GPL(sock_zerocopy_put_abort); diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 879b76ae4435..65f2299fd682 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1230,8 +1230,7 @@ static int __ip_append_data(struct sock *sk, error_efault: err = -EFAULT; error: - if (uarg) - sock_zerocopy_put_abort(uarg, extra_uref); + skb_zcopy_put_abort(uarg, extra_uref); cork->length -= length; IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTDISCARDS); refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc); diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 298a1fae841c..fb58215972ba 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1440,7 +1440,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) if (copied + copied_syn) goto out; out_err: - sock_zerocopy_put_abort(uarg, true); + skb_zcopy_put_abort(uarg, true); err = sk_stream_error(sk, flags, err); /* make sure we wake any epoll edge trigger waiter */ if (unlikely(tcp_rtx_and_write_queues_empty(sk) && err == -EAGAIN)) { diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 749ad72386b2..c8c87891533a 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1715,8 +1715,7 @@ static int __ip6_append_data(struct sock *sk, error_efault: err = -EFAULT; error: - if (uarg) - sock_zerocopy_put_abort(uarg, extra_uref); + skb_zcopy_put_abort(uarg, extra_uref); cork->length -= length; IP6_INC_STATS(sock_net(sk), rt->rt6i_idev, IPSTATS_MIB_OUTDISCARDS); refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc); -- 2.24.1
Re: [PATCH 2/4] net: sfp: allow to use also SFP modules which are detected as SFF
On Wed, Dec 30, 2020 at 06:27:07PM +0100, Marek Behún wrote: > On Wed, 30 Dec 2020 18:06:52 +0100 > Pali Rohár wrote: > > > if (!sfp->type->module_supported(&id) && > > (memcmp(id.base.vendor_name, "UBNT", 16) || > > memcmp(id.base.vendor_pn, "UF-INSTANT ", 16))) > > I would rather add a quirk member (bitfield) to the sfp structure and do > something like this > > if (!sfp->type->module_supported(&id) && > !(sfp->quirks & SFP_QUIRK_BAD_PHYS_ID)) > > or maybe put this check into the module_supported method. Sorry, definitely not. If you've ever looked at the SDHCI driver with its multiple "quirks" bitfields, doing this is a recipe for creating a very horrid hard to understand mess. What you suggest just results in yet more complexity. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
[PATCH iproute2] build: Fix link errors on some systems
Since moving get_rate() and get_size() from tc to lib, on some systems we fail to link because of missing the math lib. Move the link flag from tc makefile to the main makefile. ../lib/libutil.a(utils.o): In function `get_rate': utils.c:(.text+0x10dc): undefined reference to `floor' ../lib/libutil.a(utils.o): In function `get_size': utils.c:(.text+0x1394): undefined reference to `floor' ../lib/libutil.a(json_print.o): In function `sprint_size': json_print.c:(.text+0x14c0): undefined reference to `rint' json_print.c:(.text+0x14f4): undefined reference to `rint' json_print.c:(.text+0x157c): undefined reference to `rint' Fixes: f3be0e6366ac ("lib: Move get_rate(), get_rate64() from tc here") Fixes: 44396bdfcc0a ("lib: Move get_size() from tc here") Fixes: adbe5de96662 ("lib: Move sprint_size() from tc here, add print_size()") Signed-off-by: Roi Dayan --- Makefile| 1 + tc/Makefile | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/Makefile b/Makefile index e64c65992585..2a604ec58905 100644 --- a/Makefile +++ b/Makefile @@ -59,6 +59,7 @@ SUBDIRS=lib ip tc bridge misc netem genl tipc devlink rdma dcb man LIBNETLINK=../lib/libutil.a ../lib/libnetlink.a LDLIBS += $(LIBNETLINK) +LDFLAGS += -lm all: config.mk @set -e; \ diff --git a/tc/Makefile b/tc/Makefile index 5a517af20b7c..8d91900716c1 100644 --- a/tc/Makefile +++ b/tc/Makefile @@ -110,7 +110,7 @@ ifneq ($(TC_CONFIG_NO_XT),y) endif TCOBJ += $(TCMODULES) -LDLIBS += -L. -lm +LDLIBS += -L. ifeq ($(SHARED_LIBS),y) LDLIBS += -ldl -- 1.8.3.1
[RFC PATCH v3 04/12] skbuff: replace sock_zerocopy_put() with skb_zcopy_put()
From: Jonathan Lemon Replace sock_zerocopy_put with the generic skb_zcopy_put() function. Pass 'true' as the success argument, as this is identical to no change. Signed-off-by: Jonathan Lemon --- include/linux/skbuff.h | 7 ++- net/core/skbuff.c | 9 + net/ipv4/tcp.c | 2 +- 3 files changed, 8 insertions(+), 10 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 52e96c35f5af..a6c86839035b 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -496,7 +496,6 @@ static inline void sock_zerocopy_get(struct ubuf_info *uarg) refcount_inc(&uarg->refcnt); } -void sock_zerocopy_put(struct ubuf_info *uarg); void sock_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref); void sock_zerocopy_callback(struct ubuf_info *uarg, bool success); @@ -1471,6 +1470,12 @@ static inline void *skb_zcopy_get_nouarg(struct sk_buff *skb) return (void *)((uintptr_t) skb_shinfo(skb)->destructor_arg & ~0x1UL); } +static inline void skb_zcopy_put(struct ubuf_info *uarg) +{ + if (uarg) + uarg->callback(uarg, true); +} + /* Release a reference on a zerocopy structure */ static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy) { diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 8c18940723ff..0e028825367a 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1251,13 +1251,6 @@ void sock_zerocopy_callback(struct ubuf_info *uarg, bool success) } EXPORT_SYMBOL_GPL(sock_zerocopy_callback); -void sock_zerocopy_put(struct ubuf_info *uarg) -{ - if (uarg) - uarg->callback(uarg, uarg->zerocopy); -} -EXPORT_SYMBOL_GPL(sock_zerocopy_put); - void sock_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref) { if (uarg) { @@ -1267,7 +1260,7 @@ void sock_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref) uarg->len--; if (have_uref) - sock_zerocopy_put(uarg); + skb_zcopy_put(uarg); } } EXPORT_SYMBOL_GPL(sock_zerocopy_put_abort); diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index ed42d2193c5c..298a1fae841c 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1429,7 +1429,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) tcp_push(sk, flags, mss_now, tp->nonagle, size_goal); } out_nopush: - sock_zerocopy_put(uarg); + skb_zcopy_put(uarg); return copied + copied_syn; do_error: -- 2.24.1
[RFC PATCH v3 00/12] Generic zcopy_* functions
From: Jonathan Lemon This is set of cleanup patches for zerocopy which are intended to allow a introduction of a different zerocopy implementation. The top level API will use the skb_zcopy_*() functions, while the current TCP specific zerocopy ends up using msg_zerocopy_*() calls. There should be no functional changes from these patches. v2->v3: Rename zc_flags to 'flags'. Use SKBFL_xxx naming, similar to the SKBTX_xx naming. Leave zerocopy_success naming alone. Reorder patches. v1->v2: Break changes to skb_zcopy_put into 3 patches, in order to make it easier to follow the changes. Add Willem's suggestion about renaming sock_zerocopy_ Patch 1: remove dead function Patch 2: simplify sock_zerocopy_put Patch 3: push status/refcounts into sock_zerocopy_callback Patch 4: replace sock_zerocopy_put with skb_zcopy_put Patch 5: rename sock_zerocopy_get Patch 6: Add an optional skb parameter to callback, allowing access to the attached skb from the callback. Patch 7: Add skb_zcopy_put_abort, and move zerocopy logic into the callback function. There unfortunately is still a check against the callback type here. Patch 8: Relocate skb_zcopy_clear() in skb_release_data() Patch 9: rename sock_zerocopy_ to msg_zerocopy_ Patch 10: Move zerocopy bits from tx_flags into flags for clarity. These bits will be used in the RX path in the future. Patch 11: Set the skb flags from the ubuf being attached, instead of a fixed value, allowing different initialization types. Patch 12: Replace open-coded assignments with skb_zcopy_init() Jonathan Lemon (12): skbuff: remove unused skb_zcopy_abort function skbuff: simplify sock_zerocopy_put skbuff: Push status and refcounts into sock_zerocopy_callback skbuff: replace sock_zerocopy_put() with skb_zcopy_put() skbuff: replace sock_zerocopy_get with skb_zcopy_get skbuff: Add skb parameter to the ubuf zerocopy callback skbuff: Call sock_zerocopy_put_abort from skb_zcopy_put_abort skbuff: Call skb_zcopy_clear() before unref'ing fragments skbuff: rename sock_zerocopy_* to msg_zerocopy_* net: group skb_shinfo zerocopy related bits together. skbuff: add flags to ubuf_info for ubuf setup tap/tun: add skb_zcopy_init() helper for initialization. drivers/net/tap.c | 6 +- drivers/net/tun.c | 6 +- drivers/net/xen-netback/common.h| 3 +- drivers/net/xen-netback/interface.c | 4 +- drivers/net/xen-netback/netback.c | 5 +- drivers/vhost/net.c | 4 +- include/linux/skbuff.h | 106 +++- net/core/skbuff.c | 65 - net/ipv4/ip_output.c| 5 +- net/ipv4/tcp.c | 8 +-- net/ipv6/ip6_output.c | 5 +- net/kcm/kcmsock.c | 4 +- 12 files changed, 113 insertions(+), 108 deletions(-) -- 2.24.1
[RFC PATCH v3 12/12] tap/tun: add skb_zcopy_init() helper for initialization.
From: Jonathan Lemon Replace direct assignments with skb_zcopy_init() for zerocopy cases where a new skb is initialized, without changing the reference counts. Signed-off-by: Jonathan Lemon --- drivers/net/tap.c | 3 +-- drivers/net/tun.c | 3 +-- drivers/vhost/net.c| 1 + include/linux/skbuff.h | 9 +++-- 4 files changed, 10 insertions(+), 6 deletions(-) diff --git a/drivers/net/tap.c b/drivers/net/tap.c index f7a19d9b7c27..3c652c8ac5ba 100644 --- a/drivers/net/tap.c +++ b/drivers/net/tap.c @@ -722,8 +722,7 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control, tap = rcu_dereference(q->tap); /* copy skb_ubuf_info for callback when skb has no error */ if (zerocopy) { - skb_shinfo(skb)->destructor_arg = msg_control; - skb_shinfo(skb)->flags |= SKBFL_ZEROCOPY_FRAG; + skb_zcopy_init(skb, msg_control); } else if (msg_control) { struct ubuf_info *uarg = msg_control; uarg->callback(NULL, uarg, false); diff --git a/drivers/net/tun.c b/drivers/net/tun.c index dd9edbd72ae8..7414e0584729 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1814,8 +1814,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, /* copy skb_ubuf_info for callback when skb has no error */ if (zerocopy) { - skb_shinfo(skb)->destructor_arg = msg_control; - skb_shinfo(skb)->flags |= SKBFL_ZEROCOPY_FRAG; + skb_zcopy_init(skb, msg_control); } else if (msg_control) { struct ubuf_info *uarg = msg_control; uarg->callback(NULL, uarg, false); diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index bf28d0b75c1b..5c722c4179a9 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -904,6 +904,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock) ubuf->callback = vhost_zerocopy_callback; ubuf->ctx = nvq->ubufs; ubuf->desc = nvq->upend_idx; + ubuf->flags = SKBFL_ZEROCOPY_FRAG; refcount_set(&ubuf->refcnt, 1); msg.msg_control = &ctl; ctl.type = TUN_MSG_UBUF; diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 58010df9d183..3ec8b83aca3e 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1448,6 +1448,12 @@ static inline void skb_zcopy_get(struct ubuf_info *uarg) refcount_inc(&uarg->refcnt); } +static inline void skb_zcopy_init(struct sk_buff *skb, struct ubuf_info *uarg) +{ + skb_shinfo(skb)->destructor_arg = uarg; + skb_shinfo(skb)->flags |= uarg->flags; +} + static inline void skb_zcopy_set(struct sk_buff *skb, struct ubuf_info *uarg, bool *have_ref) { @@ -1456,8 +1462,7 @@ static inline void skb_zcopy_set(struct sk_buff *skb, struct ubuf_info *uarg, *have_ref = false; else skb_zcopy_get(uarg); - skb_shinfo(skb)->destructor_arg = uarg; - skb_shinfo(skb)->flags |= uarg->flags; + skb_zcopy_init(skb, uarg); } } -- 2.24.1
[RFC PATCH v3 11/12] skbuff: add flags to ubuf_info for ubuf setup
From: Jonathan Lemon Currently, when an ubuf is attached to a new skb, the shared flags word is initialized to a fixed value. Instead of doing this, set the default flags in the ubuf, and have new skbs inherit from this default. This is needed when setting up different zerocopy types. Signed-off-by: Jonathan Lemon --- include/linux/skbuff.h | 3 ++- net/core/skbuff.c | 1 + 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 66fde9a7b851..58010df9d183 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -480,6 +480,7 @@ struct ubuf_info { }; }; refcount_t refcnt; + u8 flags; struct mmpin { struct user_struct *user; @@ -1456,7 +1457,7 @@ static inline void skb_zcopy_set(struct sk_buff *skb, struct ubuf_info *uarg, else skb_zcopy_get(uarg); skb_shinfo(skb)->destructor_arg = uarg; - skb_shinfo(skb)->flags |= SKBFL_ZEROCOPY_FRAG; + skb_shinfo(skb)->flags |= uarg->flags; } } diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 124e4752afb6..ee288af095f0 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1119,6 +1119,7 @@ struct ubuf_info *msg_zerocopy_alloc(struct sock *sk, size_t size) uarg->len = 1; uarg->bytelen = size; uarg->zerocopy = 1; + uarg->flags = SKBFL_ZEROCOPY_FRAG; refcount_set(&uarg->refcnt, 1); sock_hold(sk); -- 2.24.1
[RFC PATCH v3 09/12] skbuff: rename sock_zerocopy_* to msg_zerocopy_*
From: Jonathan Lemon At Willem's suggestion, rename the sock_zerocopy_* functions so that they match the MSG_ZEROCOPY flag, which makes it clear they are specific to this zerocopy implementation. Signed-off-by: Jonathan Lemon Acked-by: Willem de Bruijn --- include/linux/skbuff.h | 16 net/core/skbuff.c | 28 ++-- net/ipv4/ip_output.c | 2 +- net/ipv4/tcp.c | 2 +- net/ipv6/ip6_output.c | 2 +- 5 files changed, 25 insertions(+), 25 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 9f7393167f0a..3c34c75ab004 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -488,13 +488,13 @@ struct ubuf_info { int mm_account_pinned_pages(struct mmpin *mmp, size_t size); void mm_unaccount_pinned_pages(struct mmpin *mmp); -struct ubuf_info *sock_zerocopy_alloc(struct sock *sk, size_t size); -struct ubuf_info *sock_zerocopy_realloc(struct sock *sk, size_t size, - struct ubuf_info *uarg); +struct ubuf_info *msg_zerocopy_alloc(struct sock *sk, size_t size); +struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size, + struct ubuf_info *uarg); -void sock_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref); +void msg_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref); -void sock_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg, +void msg_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg, bool success); int skb_zerocopy_iter_dgram(struct sk_buff *skb, struct msghdr *msg, int len); @@ -1481,8 +1481,8 @@ static inline void skb_zcopy_put(struct ubuf_info *uarg) static inline void skb_zcopy_put_abort(struct ubuf_info *uarg, bool have_uref) { if (uarg) { - if (uarg->callback == sock_zerocopy_callback) - sock_zerocopy_put_abort(uarg, have_uref); + if (uarg->callback == msg_zerocopy_callback) + msg_zerocopy_put_abort(uarg, have_uref); else if (have_uref) skb_zcopy_put(uarg); } @@ -2776,7 +2776,7 @@ static inline int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask) if (likely(!skb_zcopy(skb))) return 0; if (!skb_zcopy_is_nouarg(skb) && - skb_uarg(skb)->callback == sock_zerocopy_callback) + skb_uarg(skb)->callback == msg_zerocopy_callback) return 0; return skb_copy_ubufs(skb, gfp_mask); } diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 6d031ed99182..d520168accf9 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1094,7 +1094,7 @@ void mm_unaccount_pinned_pages(struct mmpin *mmp) } EXPORT_SYMBOL_GPL(mm_unaccount_pinned_pages); -struct ubuf_info *sock_zerocopy_alloc(struct sock *sk, size_t size) +struct ubuf_info *msg_zerocopy_alloc(struct sock *sk, size_t size) { struct ubuf_info *uarg; struct sk_buff *skb; @@ -1114,7 +1114,7 @@ struct ubuf_info *sock_zerocopy_alloc(struct sock *sk, size_t size) return NULL; } - uarg->callback = sock_zerocopy_callback; + uarg->callback = msg_zerocopy_callback; uarg->id = ((u32)atomic_inc_return(&sk->sk_zckey)) - 1; uarg->len = 1; uarg->bytelen = size; @@ -1124,15 +1124,15 @@ struct ubuf_info *sock_zerocopy_alloc(struct sock *sk, size_t size) return uarg; } -EXPORT_SYMBOL_GPL(sock_zerocopy_alloc); +EXPORT_SYMBOL_GPL(msg_zerocopy_alloc); static inline struct sk_buff *skb_from_uarg(struct ubuf_info *uarg) { return container_of((void *)uarg, struct sk_buff, cb); } -struct ubuf_info *sock_zerocopy_realloc(struct sock *sk, size_t size, - struct ubuf_info *uarg) +struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size, + struct ubuf_info *uarg) { if (uarg) { const u32 byte_limit = 1 << 19; /* limit to a few TSO */ @@ -1171,9 +1171,9 @@ struct ubuf_info *sock_zerocopy_realloc(struct sock *sk, size_t size, } new_alloc: - return sock_zerocopy_alloc(sk, size); + return msg_zerocopy_alloc(sk, size); } -EXPORT_SYMBOL_GPL(sock_zerocopy_realloc); +EXPORT_SYMBOL_GPL(msg_zerocopy_realloc); static bool skb_zerocopy_notify_extend(struct sk_buff *skb, u32 lo, u16 len) { @@ -1195,7 +1195,7 @@ static bool skb_zerocopy_notify_extend(struct sk_buff *skb, u32 lo, u16 len) return true; } -static void __sock_zerocopy_callback(struct ubuf_info *uarg) +static void __msg_zerocopy_callback(struct ubuf_info *uarg) { struct sk_buff *tail, *skb = skb_from_uarg(uarg); struct sock_exterr_skb *serr; @@ -1243,17 +1243,17 @@ static void __sock_zerocopy_callback(struct ubuf_info *uarg) sock_put(sk); } -void sock_zerocopy_callback(struct sk_buff *skb, st
[RFC PATCH v3 08/12] skbuff: Call skb_zcopy_clear() before unref'ing fragments
From: Jonathan Lemon RX zerocopy fragment pages which are not allocated from the system page pool require special handling. Give the callback in skb_zcopy_clear() a chance to process them first. Signed-off-by: Jonathan Lemon --- net/core/skbuff.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 5b9cd528d6a6..6d031ed99182 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -605,13 +605,14 @@ static void skb_release_data(struct sk_buff *skb) &shinfo->dataref)) return; + skb_zcopy_clear(skb, true); + for (i = 0; i < shinfo->nr_frags; i++) __skb_frag_unref(&shinfo->frags[i]); if (shinfo->frag_list) kfree_skb_list(shinfo->frag_list); - skb_zcopy_clear(skb, true); skb_free_head(skb); } -- 2.24.1
[RFC PATCH v3 02/12] skbuff: simplify sock_zerocopy_put
From: Jonathan Lemon All 'struct ubuf_info' users should have a callback defined as of commit 0a4a060bb204 ("sock: fix zerocopy_success regression with msg_zerocopy"). Remove the dead code path to consume_skb(), which makes assumptions about how the structure was allocated. Signed-off-by: Jonathan Lemon Acked-by: Willem de Bruijn --- net/core/skbuff.c | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index f62cae3f75d8..d88963f47f7d 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1245,12 +1245,8 @@ EXPORT_SYMBOL_GPL(sock_zerocopy_callback); void sock_zerocopy_put(struct ubuf_info *uarg) { - if (uarg && refcount_dec_and_test(&uarg->refcnt)) { - if (uarg->callback) - uarg->callback(uarg, uarg->zerocopy); - else - consume_skb(skb_from_uarg(uarg)); - } + if (uarg && refcount_dec_and_test(&uarg->refcnt)) + uarg->callback(uarg, uarg->zerocopy); } EXPORT_SYMBOL_GPL(sock_zerocopy_put); -- 2.24.1
Re: [PATCH 1/4] net: sfp: add workaround for Realtek RTL8672 and RTL9601C chips
On Wed, Dec 30, 2020 at 06:31:52PM +0100, Pali Rohár wrote: > On Wednesday 30 December 2020 17:05:46 Russell King - ARM Linux admin wrote: > > On Wed, Dec 30, 2020 at 05:56:34PM +0100, Pali Rohár wrote: > > > This change is really required for those Realtek chips. I thought that > > > it is obvious that from *both* addresses 0x50 and 0x51 can be read only > > > one byte at the same time. Reading 2 bytes (for be16 value) cannot be > > > really done by one i2 transfer, it must be done in two. > > > > Then these modules are even more broken than first throught, and > > quite simply it is pointless supporting the diagnostics on them > > because we can never read the values in an atomic way. > > They are broken in a way that neither holy water help them... > > But from diagnostic 0x51 address we can read at least 8bit registers in > atomic way :-) ... which doesn't fit the requirements. > > It's also a violation of the SFF-8472 that _requires_ multi-byte reads > > to read these 16 byte values atomically. Reading them with individual > > byte reads results in a non-atomic read, and the 16-bit value can not > > be trusted to be correct. > > > > That is really not optional, no matter what any manufacturer says - if > > they claim the SFP MSAs allows it, they're quite simply talking out of > > a donkey's backside and you should dispose of the module in biohazard > > packaging. :) > > > > So no, I hadn't understood this from your emails, and as I say above, > > if this is the case, then we quite simply disable diagnostics on these > > modules since they are _highly_ noncompliant. > > We have just two options: > > Disable 2 (and more) bytes reads from 0x51 address and therefore disable > sfp_hwmon_read_sensor() function. > > Or allow 2 bytes non-atomic reads and allow at least semi-correct values > for hwmon. I guess that upper 8bits would not change between two single > byte i2c transfers too much (when they are done immediately one by one). So when you read the temperature, and the MSB reads as the next higher value than the LSB, causing an error of 256, or vice versa causing an error of -256, which when scaled according to the factors causes a big error, that's acceptable. No, it isn't. If the data can't be read reliably, the data is useless. Consider a system that implements userspace monitoring for modules and checks the current values against pre-set thresholds - it suddenly gets a value that is outside of its alarm threshold due to this. It raises a false alarm. This is not good. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Re: [PATCH 1/4] net: sfp: add workaround for Realtek RTL8672 and RTL9601C chips
> Ok! > > So should we completely skip hwmon_device_register_with_info() call > if (i2c_block_size < 2) ? Yep. That would be a nice simple test. But does ethtool -m still export the second page? That is also unreliable. Andrew
general protection fault in dst_dev_put (6)
Hello, syzbot found the following issue on: HEAD commit:40f78232 Merge tag 'pci-v5.11-fixes-1' of git://git.kernel.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=1731027750 kernel config: https://syzkaller.appspot.com/x/.config?x=9798d6898fec8a72 dashboard link: https://syzkaller.appspot.com/bug?extid=0f1d7ae54b1e86cb1b67 compiler: gcc (GCC) 10.1.0-syz 20200507 Unfortunately, I don't have any reproducer for this issue yet. IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+0f1d7ae54b1e86cb1...@syzkaller.appspotmail.com general protection fault, probably for non-canonical address 0xdc01: [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0008-0x000f] CPU: 0 PID: 8482 Comm: syz-fuzzer Not tainted 5.10.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:dst_dev_put+0x1d/0x220 net/core/dst.c:156 Code: a0 b1 fa e9 53 ff ff ff cc cc cc cc cc 41 54 55 53 48 89 fb e8 64 b2 6e fa 48 89 da 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 8b 01 00 00 48 8d 7b 3a 4c 8b 23 48 b8 00 00 00 RSP: 0018:c9007db8 EFLAGS: 00010203 RAX: dc00 RBX: 000f RCX: 0100 RDX: 0001 RSI: 87049d6c RDI: 000f RBP: R08: R09: R10: 87926f33 R11: 0001 R12: 0007 R13: fbfff1af7aec R14: 607f4607e398 R15: 000f FS: 00c0002a0e90() GS:8880b9c0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 00c000b37000 CR3: 19e32000 CR4: 001526f0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 Call Trace: rt_fibinfo_free_cpus.part.0+0x118/0x180 net/ipv4/fib_semantics.c:202 rt_fibinfo_free_cpus net/ipv4/fib_semantics.c:194 [inline] fib_nh_common_release+0xfb/0x300 net/ipv4/fib_semantics.c:215 fib6_info_destroy_rcu+0x187/0x210 net/ipv6/ip6_fib.c:174 rcu_do_batch kernel/rcu/tree.c:2489 [inline] rcu_core+0x75d/0xf80 kernel/rcu/tree.c:2723 __do_softirq+0x2bc/0xa77 kernel/softirq.c:343 asm_call_irq_on_stack+0xf/0x20 __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline] run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline] do_softirq_own_stack+0xaa/0xd0 arch/x86/kernel/irq_64.c:77 invoke_softirq kernel/softirq.c:226 [inline] __irq_exit_rcu+0x17f/0x200 kernel/softirq.c:420 irq_exit_rcu+0x5/0x20 kernel/softirq.c:432 sysvec_apic_timer_interrupt+0x4d/0x100 arch/x86/kernel/apic/apic.c:1096 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:628 RIP: 0010:stack_trace_consume_entry+0xd5/0x160 kernel/stacktrace.c:92 Code: 0f 85 92 00 00 00 8d 45 01 89 43 10 48 8b 03 48 8d 2c e8 48 b8 00 00 00 00 00 fc ff df 48 89 ea 48 c1 ea 03 80 3c 02 00 75 5c <48> 89 75 00 8b 43 08 39 43 10 0f 92 c0 48 83 c4 08 5b 5d c3 83 e8 RSP: 0018:c900011ef5d8 EFLAGS: 0246 RAX: dc00 RBX: c900011ef6b0 RCX: RDX: 19200023deeb RSI: 86f8c6ef RDI: c900011ef6bc RBP: c900011ef758 R08: 8e6efcd6 R09: 0001 R10: f5200023deca R11: 0001 R12: c900011ef6b0 R13: R14: 88802805c380 R15: 0280 arch_stack_walk+0x6d/0xe0 arch/x86/kernel/stacktrace.c:27 stack_trace_save+0x8c/0xc0 kernel/stacktrace.c:121 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:46 [inline] set_alloc_info mm/kasan/common.c:401 [inline] kasan_kmalloc.constprop.0+0x7f/0xa0 mm/kasan/common.c:429 __kmalloc_reserve net/core/skbuff.c:142 [inline] __alloc_skb+0xae/0x5c0 net/core/skbuff.c:210 alloc_skb_fclone include/linux/skbuff.h:1149 [inline] sk_stream_alloc_skb+0x109/0xc30 net/ipv4/tcp.c:888 tcp_sendmsg_locked+0xc00/0x2da0 net/ipv4/tcp.c:1310 tcp_sendmsg+0x2b/0x40 net/ipv4/tcp.c:1459 inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:817 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:672 sock_write_iter+0x289/0x3c0 net/socket.c:999 call_write_iter include/linux/fs.h:1901 [inline] new_sync_write+0x426/0x650 fs/read_write.c:518 vfs_write+0x7dd/0xa80 fs/read_write.c:605 ksys_write+0x1ee/0x250 fs/read_write.c:658 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x4b117b Code: ff e9 69 ff ff ff cc cc cc cc cc cc cc cc cc e8 9b c2 f8 ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30 RSP: 002b:00c000575480 EFLAGS: 0206 ORIG_RAX: 0001 RAX: ffda RBX: 00c20800 RCX: 004b117b RDX: 00f0 RSI: 00c0a200 RDI: 0006 RBP: 00c00
KASAN: use-after-free Read in vlan_dev_real_dev
Hello, syzbot found the following issue on: HEAD commit:1f45dc22 ibmvnic: continue fatal error reset after passive.. git tree: net console output: https://syzkaller.appspot.com/x/log.txt?x=10dbe3ff50 kernel config: https://syzkaller.appspot.com/x/.config?x=9c6feb14e07e1220 dashboard link: https://syzkaller.appspot.com/bug?extid=944af49d74bdd9175f4b compiler: gcc (GCC) 10.1.0-syz 20200507 Unfortunately, I don't have any reproducer for this issue yet. IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+944af49d74bdd9175...@syzkaller.appspotmail.com == BUG: KASAN: use-after-free in is_vlan_dev include/linux/if_vlan.h:74 [inline] BUG: KASAN: use-after-free in vlan_dev_real_dev+0xf9/0x120 net/8021q/vlan_core.c:105 Read of size 4 at addr 8880263c222c by task kworker/u4:0/8 CPU: 1 PID: 8 Comm: kworker/u4:0 Not tainted 5.10.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: gid-cache-wq netdevice_event_work_handler Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 print_address_description.constprop.0.cold+0xae/0x4c8 mm/kasan/report.c:385 __kasan_report mm/kasan/report.c:545 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:562 is_vlan_dev include/linux/if_vlan.h:74 [inline] vlan_dev_real_dev+0xf9/0x120 net/8021q/vlan_core.c:105 rdma_vlan_dev_real_dev include/rdma/ib_addr.h:267 [inline] is_eth_port_of_netdev_filter.part.0+0xe7/0x300 drivers/infiniband/core/roce_gid_mgmt.c:157 is_eth_port_of_netdev_filter+0x28/0x40 drivers/infiniband/core/roce_gid_mgmt.c:153 ib_enum_roce_netdev+0x18f/0x2b0 drivers/infiniband/core/device.c:2287 ib_enum_all_roce_netdevs+0xbd/0x130 drivers/infiniband/core/device.c:2316 netdevice_event_work_handler+0x9c/0x1b0 drivers/infiniband/core/roce_gid_mgmt.c:626 process_one_work+0x98d/0x1630 kernel/workqueue.c:2275 worker_thread+0x64c/0x1120 kernel/workqueue.c:2421 kthread+0x3b1/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296 Allocated by task 19697: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:461 kmalloc_node include/linux/slab.h:575 [inline] kvmalloc_node+0x61/0xf0 mm/util.c:575 kvmalloc include/linux/mm.h:773 [inline] kvzalloc include/linux/mm.h:781 [inline] alloc_netdev_mqs+0x97/0xea0 net/core/dev.c:10540 rtnl_create_link+0x219/0xad0 net/core/rtnetlink.c:3171 __rtnl_newlink+0xf9b/0x1750 net/core/rtnetlink.c:3433 rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3502 rtnetlink_rcv_msg+0x498/0xb80 net/core/rtnetlink.c:5564 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330 netlink_sendmsg+0x907/0xe40 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:672 sys_sendmsg+0x6e8/0x810 net/socket.c:2345 ___sys_sendmsg+0xf3/0x170 net/socket.c:2399 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2432 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 19689: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track+0x1c/0x30 mm/kasan/common.c:56 kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:352 __kasan_slab_free+0x102/0x140 mm/kasan/common.c:422 slab_free_hook mm/slub.c:1544 [inline] slab_free_freelist_hook+0x5d/0x150 mm/slub.c:1577 slab_free mm/slub.c:3140 [inline] kfree+0xdb/0x3c0 mm/slub.c:4122 kvfree+0x42/0x50 mm/util.c:604 device_release+0x9f/0x240 drivers/base/core.c:1962 kobject_cleanup lib/kobject.c:705 [inline] kobject_release lib/kobject.c:736 [inline] kref_put include/linux/kref.h:65 [inline] kobject_put+0x1c8/0x540 lib/kobject.c:753 put_device+0x1b/0x30 drivers/base/core.c:3190 free_netdev+0x3d7/0x500 net/core/dev.c:10660 ppp_destroy_interface+0x2ab/0x340 drivers/net/ppp/ppp_generic.c:3358 ppp_release+0x1bf/0x240 drivers/net/ppp/ppp_generic.c:410 __fput+0x283/0x920 fs/file_table.c:280 task_work_run+0xdd/0x190 kernel/task_work.c:140 tracehook_notify_resume include/linux/tracehook.h:189 [inline] exit_to_user_mode_loop kernel/entry/common.c:174 [inline] exit_to_user_mode_prepare+0x1f0/0x200 kernel/entry/common.c:201 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline] syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The buggy address belongs to the object at 8880263c2000 which belongs to the cache kmalloc-4k of size 4096 The buggy address is located 556 bytes inside of 4096-byte region [8880263c2000, 8880263c3000) The buggy address belongs to the page: page:52fa6739 refcount:1 mapcount:0 mapping:0
Re: [PATCH 1/4] net: sfp: add workaround for Realtek RTL8672 and RTL9601C chips
On Wed, Dec 30, 2020 at 08:30:52PM +0100, Andrew Lunn wrote: > > Ok! > > > > So should we completely skip hwmon_device_register_with_info() call > > if (i2c_block_size < 2) ? > > Yep. That would be a nice simple test. > > But does ethtool -m still export the second page? That is also > unreliable. It will do, but we need to check how ethtool decides to request/dump that information - we probably need to make sfp_module_info() report that it's a ETH_MODULE_SFF_8079 style EEPROM, not the ETH_MODULE_SFF_8472 style. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Re: [PATCH 1/4] net: sfp: add workaround for Realtek RTL8672 and RTL9601C chips
> Which shows why it's pointless producing an EEPROM validation tool that > runs under Linux (as has been your suggestion). They won't use it, > since their testing only goes as far as "does it work in our product?" Yes, i would need SNIA to push for conformance testing, hold plug-testing events, and marketing that only devices which pass the conformance test are allowed to use the SNIA logo, etc. They do seem to have conformance testing for some of their storage standards, but nothing for SFPs. Andrew
Re: [PATCH 2/3] vhost/vsock: support for SOCK_SEQPACKET socket.
On Tue, Dec 29, 2020 at 02:06:33PM +0300, Arseny Krasnov wrote: > This patch simply adds transport ops and removes > ignore of non-stream type of packets. > > Signed-off-by: Arseny Krasnov How is this supposed to work? virtio vsock at the moment has byte level end to end credit accounting at the protocol level. I suspect some protocol changes involving more than this tweak would be needed to properly support anything that isn't a stream. > --- > drivers/vhost/vsock.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c > index a483cec31d5c..4a36ef1c52d0 100644 > --- a/drivers/vhost/vsock.c > +++ b/drivers/vhost/vsock.c > @@ -346,8 +346,7 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq, > return NULL; > } > > - if (le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_STREAM) > - pkt->len = le32_to_cpu(pkt->hdr.len); > + pkt->len = le32_to_cpu(pkt->hdr.len); > > /* No payload */ > if (!pkt->len) > @@ -416,6 +415,9 @@ static struct virtio_transport vhost_transport = { > .stream_is_active = virtio_transport_stream_is_active, > .stream_allow = virtio_transport_stream_allow, > > + .seqpacket_seq_send_len = > virtio_transport_seqpacket_seq_send_len, > + .seqpacket_seq_get_len= > virtio_transport_seqpacket_seq_get_len, > + > .notify_poll_in = virtio_transport_notify_poll_in, > .notify_poll_out = virtio_transport_notify_poll_out, > .notify_recv_init = virtio_transport_notify_recv_init, > -- > 2.25.1