[PATCH net-next v2 01/10] drop_monitor: Split tracing enable / disable to different functions
From: Ido Schimmel Subsequent patches will need to enable / disable tracing based on the configured alerting mode. Reduce the nesting level and prepare for the introduction of this functionality by splitting the tracing enable / disable operations into two different functions. Signed-off-by: Ido Schimmel --- net/core/drop_monitor.c | 79 ++--- 1 file changed, 51 insertions(+), 28 deletions(-) diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c index 4deb86f990f1..8b9b0b899ebc 100644 --- a/net/core/drop_monitor.c +++ b/net/core/drop_monitor.c @@ -241,11 +241,58 @@ static void trace_napi_poll_hit(void *ignore, struct napi_struct *napi, rcu_read_unlock(); } +static int net_dm_trace_on_set(struct netlink_ext_ack *extack) +{ + int rc; + + if (!try_module_get(THIS_MODULE)) { + NL_SET_ERR_MSG_MOD(extack, "Failed to take reference on module"); + return -ENODEV; + } + + rc = register_trace_kfree_skb(trace_kfree_skb_hit, NULL); + if (rc) { + NL_SET_ERR_MSG_MOD(extack, "Failed to connect probe to kfree_skb() tracepoint"); + goto err_module_put; + } + + rc = register_trace_napi_poll(trace_napi_poll_hit, NULL); + if (rc) { + NL_SET_ERR_MSG_MOD(extack, "Failed to connect probe to napi_poll() tracepoint"); + goto err_unregister_trace; + } + + return 0; + +err_unregister_trace: + unregister_trace_kfree_skb(trace_kfree_skb_hit, NULL); +err_module_put: + module_put(THIS_MODULE); + return rc; +} + +static void net_dm_trace_off_set(void) +{ + struct dm_hw_stat_delta *new_stat, *temp; + + unregister_trace_napi_poll(trace_napi_poll_hit, NULL); + unregister_trace_kfree_skb(trace_kfree_skb_hit, NULL); + + tracepoint_synchronize_unregister(); + + list_for_each_entry_safe(new_stat, temp, &hw_stats_list, list) { + if (new_stat->dev == NULL) { + list_del_rcu(&new_stat->list); + kfree_rcu(new_stat, rcu); + } + } + + module_put(THIS_MODULE); +} + static int set_all_monitor_traces(int state, struct netlink_ext_ack *extack) { int rc = 0; - struct dm_hw_stat_delta *new_stat = NULL; - struct dm_hw_stat_delta *temp; if (state == trace_state) { NL_SET_ERR_MSG_MOD(extack, "Trace state already set to requested state"); @@ -254,34 +301,10 @@ static int set_all_monitor_traces(int state, struct netlink_ext_ack *extack) switch (state) { case TRACE_ON: - if (!try_module_get(THIS_MODULE)) { - NL_SET_ERR_MSG_MOD(extack, "Failed to take reference on module"); - rc = -ENODEV; - break; - } - - rc |= register_trace_kfree_skb(trace_kfree_skb_hit, NULL); - rc |= register_trace_napi_poll(trace_napi_poll_hit, NULL); + rc = net_dm_trace_on_set(extack); break; - case TRACE_OFF: - rc |= unregister_trace_kfree_skb(trace_kfree_skb_hit, NULL); - rc |= unregister_trace_napi_poll(trace_napi_poll_hit, NULL); - - tracepoint_synchronize_unregister(); - - /* -* Clean the device list -*/ - list_for_each_entry_safe(new_stat, temp, &hw_stats_list, list) { - if (new_stat->dev == NULL) { - list_del_rcu(&new_stat->list); - kfree_rcu(new_stat, rcu); - } - } - - module_put(THIS_MODULE); - + net_dm_trace_off_set(); break; default: rc = 1; -- 2.21.0
[PATCH net-next v2 00/10] drop_monitor: Capture dropped packets and metadata
From: Ido Schimmel So far drop monitor supported only one mode of operation in which a summary of recent packet drops is periodically sent to user space as a netlink event. The event only includes the drop location (program counter) and number of drops in the last interval. While this mode of operation allows one to understand if the system is dropping packets, it is not sufficient if a more detailed analysis is required. Both the packet itself and related metadata are missing. This patchset extends drop monitor with another mode of operation where the packet - potentially truncated - and metadata (e.g., drop location, timestamp, netdev) are sent to user space as a netlink event. Thanks to the extensible nature of netlink, more metadata can be added in the future. To avoid performing expensive operations in the context in which kfree_skb() is called, the dropped skbs are cloned and queued on per-CPU skb drop list. The list is then processed in process context (using a workqueue), where the netlink messages are allocated, prepared and finally sent to user space. A follow-up patchset will integrate drop monitor with devlink and allow the latter to call into drop monitor to report hardware drops. In the future, XDP drops can be added as well, thereby making drop monitor the go-to netlink channel for diagnosing all packet drops. Example usage with patched dropwatch [1] can be found here [2]. Example dissection of drop monitor netlink events with patched wireshark [3] can be found here [4]. I will submit both changes upstream after the kernel changes are accepted. Another change worth making is adding a dropmon pseudo interface to libpcap, similar to the nflog interface [5]. This will allow users to specifically listen on dropmon traffic instead of capturing all netlink packets via the nlmon netdev. Patches #1-#5 prepare the code towards the actual changes in later patches. Patch #6 adds another mode of operation to drop monitor in which the dropped packet itself is notified to user space along with metadata. Patch #7 allows users to truncate reported packets to a specific length, in case only the headers are of interest. The original length of the packet is added as metadata to the netlink notification. Patch #8 allows user to query the current configuration of drop monitor (e.g., alert mode, truncation length). Patches #9-#10 allow users to tune the length of the per-CPU skb drop list according to their needs. Changes since v1 [6]: * Add skb protocol as metadata. This allows user space to correctly dissect the packet instead of blindly assuming it is an Ethernet packet Changes since RFC [7]: * Limit the length of the per-CPU skb drop list and make it configurable * Do not use the hysteresis timer in packet alert mode * Introduce alert mode operations in a separate patch and only then introduce the new alert mode * Use 'skb->skb_iif' instead of 'skb->dev' because the latter is inside a union with 'dev_scratch' and therefore not guaranteed to point to a valid netdev * Return '-EBUSY' instead of '-EOPNOTSUPP' when trying to configure drop monitor while it is monitoring * Did not change schedule_work() in favor of schedule_work_on() as I did not observe a change in number of tail drops [1] https://github.com/idosch/dropwatch/tree/packet-mode [2] https://gist.github.com/idosch/3d524b887e16bc11b4b19e25c23dcc23#file-gistfile1-txt [3] https://github.com/idosch/wireshark/tree/drop-monitor-v2 [4] https://gist.github.com/idosch/3d524b887e16bc11b4b19e25c23dcc23#file-gistfile2-txt [5] https://github.com/the-tcpdump-group/libpcap/blob/master/pcap-netfilter-linux.c [6] https://patchwork.ozlabs.org/cover/1143443/ [7] https://patchwork.ozlabs.org/cover/1135226/ Ido Schimmel (10): drop_monitor: Split tracing enable / disable to different functions drop_monitor: Initialize timer and work item upon tracing enable drop_monitor: Reset per-CPU data before starting to trace drop_monitor: Require CAP_NET_ADMIN for drop monitor configuration drop_monitor: Add alert mode operations drop_monitor: Add packet alert mode drop_monitor: Allow truncation of dropped packets drop_monitor: Add a command to query current configuration drop_monitor: Make drop queue length configurable drop_monitor: Expose tail drop counter include/uapi/linux/net_dropmon.h | 51 +++ net/core/drop_monitor.c | 599 +-- 2 files changed, 613 insertions(+), 37 deletions(-) -- 2.21.0
[PATCH net-next v2 10/10] drop_monitor: Expose tail drop counter
From: Ido Schimmel Previous patch made the length of the per-CPU skb drop list configurable. Expose a counter that shows how many packets could not be enqueued to this list. This allows users determine the desired queue length. Signed-off-by: Ido Schimmel --- include/uapi/linux/net_dropmon.h | 10 +++ net/core/drop_monitor.c | 101 +++ 2 files changed, 111 insertions(+) diff --git a/include/uapi/linux/net_dropmon.h b/include/uapi/linux/net_dropmon.h index 1d0bdb1ba954..405b31cbf723 100644 --- a/include/uapi/linux/net_dropmon.h +++ b/include/uapi/linux/net_dropmon.h @@ -56,6 +56,8 @@ enum { NET_DM_CMD_PACKET_ALERT, NET_DM_CMD_CONFIG_GET, NET_DM_CMD_CONFIG_NEW, + NET_DM_CMD_STATS_GET, + NET_DM_CMD_STATS_NEW, _NET_DM_CMD_MAX, }; @@ -80,6 +82,7 @@ enum net_dm_attr { NET_DM_ATTR_TRUNC_LEN, /* u32 */ NET_DM_ATTR_ORIG_LEN, /* u32 */ NET_DM_ATTR_QUEUE_LEN, /* u32 */ + NET_DM_ATTR_STATS, /* nested */ __NET_DM_ATTR_MAX, NET_DM_ATTR_MAX = __NET_DM_ATTR_MAX - 1 @@ -103,4 +106,11 @@ enum { NET_DM_ATTR_PORT_MAX = __NET_DM_ATTR_PORT_MAX - 1 }; +enum { + NET_DM_ATTR_STATS_DROPPED, /* u64 */ + + __NET_DM_ATTR_STATS_MAX, + NET_DM_ATTR_STATS_MAX = __NET_DM_ATTR_STATS_MAX - 1 +}; + #endif diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c index eb3c34d69ea9..39e094907391 100644 --- a/net/core/drop_monitor.c +++ b/net/core/drop_monitor.c @@ -51,12 +51,18 @@ static int trace_state = TRACE_OFF; */ static DEFINE_MUTEX(net_dm_mutex); +struct net_dm_stats { + u64 dropped; + struct u64_stats_sync syncp; +}; + struct per_cpu_dm_data { spinlock_t lock; /* Protects 'skb' and 'send_timer' */ struct sk_buff *skb; struct sk_buff_head drop_queue; struct work_struct dm_alert_work; struct timer_list send_timer; + struct net_dm_stats stats; }; struct dm_hw_stat_delta { @@ -300,6 +306,9 @@ static void net_dm_packet_trace_kfree_skb_hit(void *ignore, unlock_free: spin_unlock_irqrestore(&data->drop_queue.lock, flags); + u64_stats_update_begin(&data->stats.syncp); + data->stats.dropped++; + u64_stats_update_end(&data->stats.syncp); consume_skb(nskb); } @@ -732,6 +741,93 @@ static int net_dm_cmd_config_get(struct sk_buff *skb, struct genl_info *info) return rc; } +static void net_dm_stats_read(struct net_dm_stats *stats) +{ + int cpu; + + memset(stats, 0, sizeof(*stats)); + for_each_possible_cpu(cpu) { + struct per_cpu_dm_data *data = &per_cpu(dm_cpu_data, cpu); + struct net_dm_stats *cpu_stats = &data->stats; + unsigned int start; + u64 dropped; + + do { + start = u64_stats_fetch_begin_irq(&cpu_stats->syncp); + dropped = cpu_stats->dropped; + } while (u64_stats_fetch_retry_irq(&cpu_stats->syncp, start)); + + stats->dropped += dropped; + } +} + +static int net_dm_stats_put(struct sk_buff *msg) +{ + struct net_dm_stats stats; + struct nlattr *attr; + + net_dm_stats_read(&stats); + + attr = nla_nest_start(msg, NET_DM_ATTR_STATS); + if (!attr) + return -EMSGSIZE; + + if (nla_put_u64_64bit(msg, NET_DM_ATTR_STATS_DROPPED, + stats.dropped, NET_DM_ATTR_PAD)) + goto nla_put_failure; + + nla_nest_end(msg, attr); + + return 0; + +nla_put_failure: + nla_nest_cancel(msg, attr); + return -EMSGSIZE; +} + +static int net_dm_stats_fill(struct sk_buff *msg, struct genl_info *info) +{ + void *hdr; + int rc; + + hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq, + &net_drop_monitor_family, 0, NET_DM_CMD_STATS_NEW); + if (!hdr) + return -EMSGSIZE; + + rc = net_dm_stats_put(msg); + if (rc) + goto nla_put_failure; + + genlmsg_end(msg, hdr); + + return 0; + +nla_put_failure: + genlmsg_cancel(msg, hdr); + return -EMSGSIZE; +} + +static int net_dm_cmd_stats_get(struct sk_buff *skb, struct genl_info *info) +{ + struct sk_buff *msg; + int rc; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + rc = net_dm_stats_fill(msg, info); + if (rc) + goto free_msg; + + return genlmsg_reply(msg, info); + +free_msg: + nlmsg_free(msg); + return rc; +} + static int dropmon_net_event(struct notifier_block *ev_block, unsigned long event, void *ptr) { @@ -799,6 +895,10 @@ static const struct genl_ops dropmon_ops[] = {
[PATCH net-next v2 03/10] drop_monitor: Reset per-CPU data before starting to trace
From: Ido Schimmel The function reset_per_cpu_data() allocates and prepares a new skb for the summary netlink alert message ('NET_DM_CMD_ALERT'). The new skb is stored in the per-CPU 'data' variable and the old is returned. The function is invoked during module initialization and from the workqueue, before an alert is sent. This means that it is possible to receive an alert with stale data, if we stopped tracing when the hysteresis timer ('data->send_timer') was pending. Instead of invoking the function during module initialization, invoke it just before we start tracing and ensure we get a fresh skb. This also allows us to remove the calls to initialize the timer and the work item from the module initialization path, since both could have been triggered by the error paths of reset_per_cpu_data(). Signed-off-by: Ido Schimmel --- net/core/drop_monitor.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c index b266dc1660ed..1cf4988de591 100644 --- a/net/core/drop_monitor.c +++ b/net/core/drop_monitor.c @@ -252,9 +252,16 @@ static int net_dm_trace_on_set(struct netlink_ext_ack *extack) for_each_possible_cpu(cpu) { struct per_cpu_dm_data *data = &per_cpu(dm_cpu_data, cpu); + struct sk_buff *skb; INIT_WORK(&data->dm_alert_work, send_dm_alert); timer_setup(&data->send_timer, sched_send_work, 0); + /* Allocate a new per-CPU skb for the summary alert message and +* free the old one which might contain stale data from +* previous tracing. +*/ + skb = reset_per_cpu_data(data); + consume_skb(skb); } rc = register_trace_kfree_skb(trace_kfree_skb_hit, NULL); @@ -475,10 +482,7 @@ static int __init init_net_drop_monitor(void) for_each_possible_cpu(cpu) { data = &per_cpu(dm_cpu_data, cpu); - INIT_WORK(&data->dm_alert_work, send_dm_alert); - timer_setup(&data->send_timer, sched_send_work, 0); spin_lock_init(&data->lock); - reset_per_cpu_data(data); } goto out; -- 2.21.0
[PATCH net-next v2 04/10] drop_monitor: Require CAP_NET_ADMIN for drop monitor configuration
From: Ido Schimmel Currently, the configure command does not do anything but return an error. Subsequent patches will enable the command to change various configuration options such as alert mode and packet truncation. Similar to other netlink-based configuration channels, make sure only users with the CAP_NET_ADMIN capability set can execute this command. Signed-off-by: Ido Schimmel --- net/core/drop_monitor.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c index 1cf4988de591..cd2f3069f34e 100644 --- a/net/core/drop_monitor.c +++ b/net/core/drop_monitor.c @@ -409,6 +409,7 @@ static const struct genl_ops dropmon_ops[] = { .cmd = NET_DM_CMD_CONFIG, .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = net_dm_cmd_config, + .flags = GENL_ADMIN_PERM, }, { .cmd = NET_DM_CMD_START, -- 2.21.0
[PATCH net-next v2 08/10] drop_monitor: Add a command to query current configuration
From: Ido Schimmel Users should be able to query the current configuration of drop monitor before they start using it. Add a command to query the existing configuration which currently consists of alert mode and packet truncation length. Signed-off-by: Ido Schimmel --- include/uapi/linux/net_dropmon.h | 2 ++ net/core/drop_monitor.c | 48 2 files changed, 50 insertions(+) diff --git a/include/uapi/linux/net_dropmon.h b/include/uapi/linux/net_dropmon.h index 5cd7eb1f66ba..3b765a8428b5 100644 --- a/include/uapi/linux/net_dropmon.h +++ b/include/uapi/linux/net_dropmon.h @@ -54,6 +54,8 @@ enum { NET_DM_CMD_START, NET_DM_CMD_STOP, NET_DM_CMD_PACKET_ALERT, + NET_DM_CMD_CONFIG_GET, + NET_DM_CMD_CONFIG_NEW, _NET_DM_CMD_MAX, }; diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c index 9f884adaa85f..135638474ab8 100644 --- a/net/core/drop_monitor.c +++ b/net/core/drop_monitor.c @@ -676,6 +676,50 @@ static int net_dm_cmd_trace(struct sk_buff *skb, return -EOPNOTSUPP; } +static int net_dm_config_fill(struct sk_buff *msg, struct genl_info *info) +{ + void *hdr; + + hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq, + &net_drop_monitor_family, 0, NET_DM_CMD_CONFIG_NEW); + if (!hdr) + return -EMSGSIZE; + + if (nla_put_u8(msg, NET_DM_ATTR_ALERT_MODE, net_dm_alert_mode)) + goto nla_put_failure; + + if (nla_put_u32(msg, NET_DM_ATTR_TRUNC_LEN, net_dm_trunc_len)) + goto nla_put_failure; + + genlmsg_end(msg, hdr); + + return 0; + +nla_put_failure: + genlmsg_cancel(msg, hdr); + return -EMSGSIZE; +} + +static int net_dm_cmd_config_get(struct sk_buff *skb, struct genl_info *info) +{ + struct sk_buff *msg; + int rc; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) + return -ENOMEM; + + rc = net_dm_config_fill(msg, info); + if (rc) + goto free_msg; + + return genlmsg_reply(msg, info); + +free_msg: + nlmsg_free(msg); + return rc; +} + static int dropmon_net_event(struct notifier_block *ev_block, unsigned long event, void *ptr) { @@ -738,6 +782,10 @@ static const struct genl_ops dropmon_ops[] = { .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, .doit = net_dm_cmd_trace, }, + { + .cmd = NET_DM_CMD_CONFIG_GET, + .doit = net_dm_cmd_config_get, + }, }; static int net_dm_nl_pre_doit(const struct genl_ops *ops, -- 2.21.0
[PATCH net-next v2 07/10] drop_monitor: Allow truncation of dropped packets
From: Ido Schimmel When sending dropped packets to user space it is not always necessary to copy the entire packet as usually only the headers are of interest. Allow user to specify the truncation length and add the original length of the packet as additional metadata to the netlink message. By default no truncation is performed. Signed-off-by: Ido Schimmel --- include/uapi/linux/net_dropmon.h | 2 ++ net/core/drop_monitor.c | 19 +++ 2 files changed, 21 insertions(+) diff --git a/include/uapi/linux/net_dropmon.h b/include/uapi/linux/net_dropmon.h index cfaaf75371b8..5cd7eb1f66ba 100644 --- a/include/uapi/linux/net_dropmon.h +++ b/include/uapi/linux/net_dropmon.h @@ -75,6 +75,8 @@ enum net_dm_attr { NET_DM_ATTR_PROTO, /* u16 */ NET_DM_ATTR_PAYLOAD,/* binary */ NET_DM_ATTR_PAD, + NET_DM_ATTR_TRUNC_LEN, /* u32 */ + NET_DM_ATTR_ORIG_LEN, /* u32 */ __NET_DM_ATTR_MAX, NET_DM_ATTR_MAX = __NET_DM_ATTR_MAX - 1 diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c index ba765832413b..9f884adaa85f 100644 --- a/net/core/drop_monitor.c +++ b/net/core/drop_monitor.c @@ -77,6 +77,7 @@ static unsigned long dm_hw_check_delta = 2*HZ; static LIST_HEAD(hw_stats_list); static enum net_dm_alert_mode net_dm_alert_mode = NET_DM_ALERT_MODE_SUMMARY; +static u32 net_dm_trunc_len; struct net_dm_alert_ops { void (*kfree_skb_probe)(void *ignore, struct sk_buff *skb, @@ -334,6 +335,8 @@ static size_t net_dm_packet_report_size(size_t payload_len) net_dm_in_port_size() + /* NET_DM_ATTR_TIMESTAMP */ nla_total_size(sizeof(struct timespec)) + + /* NET_DM_ATTR_ORIG_LEN */ + nla_total_size(sizeof(u32)) + /* NET_DM_ATTR_PROTO */ nla_total_size(sizeof(u16)) + /* NET_DM_ATTR_PAYLOAD */ @@ -391,6 +394,9 @@ static int net_dm_packet_report_fill(struct sk_buff *msg, struct sk_buff *skb, nla_put(msg, NET_DM_ATTR_TIMESTAMP, sizeof(ts), &ts)) goto nla_put_failure; + if (nla_put_u32(msg, NET_DM_ATTR_ORIG_LEN, skb->len)) + goto nla_put_failure; + if (!payload_len) goto out; @@ -429,6 +435,8 @@ static void net_dm_packet_report(struct sk_buff *skb) /* Ensure packet fits inside a single netlink attribute */ payload_len = min_t(size_t, skb->len, NET_DM_MAX_PACKET_SIZE); + if (net_dm_trunc_len) + payload_len = min_t(size_t, net_dm_trunc_len, payload_len); msg = nlmsg_new(net_dm_packet_report_size(payload_len), GFP_KERNEL); if (!msg) @@ -627,6 +635,14 @@ static int net_dm_alert_mode_set(struct genl_info *info) return 0; } +static void net_dm_trunc_len_set(struct genl_info *info) +{ + if (!info->attrs[NET_DM_ATTR_TRUNC_LEN]) + return; + + net_dm_trunc_len = nla_get_u32(info->attrs[NET_DM_ATTR_TRUNC_LEN]); +} + static int net_dm_cmd_config(struct sk_buff *skb, struct genl_info *info) { @@ -642,6 +658,8 @@ static int net_dm_cmd_config(struct sk_buff *skb, if (rc) return rc; + net_dm_trunc_len_set(info); + return 0; } @@ -700,6 +718,7 @@ static int dropmon_net_event(struct notifier_block *ev_block, static const struct nla_policy net_dm_nl_policy[NET_DM_ATTR_MAX + 1] = { [NET_DM_ATTR_UNSPEC] = { .strict_start_type = NET_DM_ATTR_UNSPEC + 1 }, [NET_DM_ATTR_ALERT_MODE] = { .type = NLA_U8 }, + [NET_DM_ATTR_TRUNC_LEN] = { .type = NLA_U32 }, }; static const struct genl_ops dropmon_ops[] = { -- 2.21.0
[PATCH net-next v2 09/10] drop_monitor: Make drop queue length configurable
From: Ido Schimmel In packet alert mode, each CPU holds a list of dropped skbs that need to be processed in process context and sent to user space. To avoid exhausting the system's memory the maximum length of this queue is currently set to 1000. Allow users to tune the length of this queue according to their needs. The configured length is reported to user space when drop monitor configuration is queried. Signed-off-by: Ido Schimmel --- include/uapi/linux/net_dropmon.h | 1 + net/core/drop_monitor.c | 19 --- 2 files changed, 17 insertions(+), 3 deletions(-) diff --git a/include/uapi/linux/net_dropmon.h b/include/uapi/linux/net_dropmon.h index 3b765a8428b5..1d0bdb1ba954 100644 --- a/include/uapi/linux/net_dropmon.h +++ b/include/uapi/linux/net_dropmon.h @@ -79,6 +79,7 @@ enum net_dm_attr { NET_DM_ATTR_PAD, NET_DM_ATTR_TRUNC_LEN, /* u32 */ NET_DM_ATTR_ORIG_LEN, /* u32 */ + NET_DM_ATTR_QUEUE_LEN, /* u32 */ __NET_DM_ATTR_MAX, NET_DM_ATTR_MAX = __NET_DM_ATTR_MAX - 1 diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c index 135638474ab8..eb3c34d69ea9 100644 --- a/net/core/drop_monitor.c +++ b/net/core/drop_monitor.c @@ -78,6 +78,7 @@ static LIST_HEAD(hw_stats_list); static enum net_dm_alert_mode net_dm_alert_mode = NET_DM_ALERT_MODE_SUMMARY; static u32 net_dm_trunc_len; +static u32 net_dm_queue_len = 1000; struct net_dm_alert_ops { void (*kfree_skb_probe)(void *ignore, struct sk_buff *skb, @@ -93,8 +94,6 @@ struct net_dm_skb_cb { #define NET_DM_SKB_CB(__skb) ((struct net_dm_skb_cb *)&((__skb)->cb[0])) -#define NET_DM_QUEUE_LEN 1000 - static struct sk_buff *reset_per_cpu_data(struct per_cpu_dm_data *data) { size_t al; @@ -289,7 +288,7 @@ static void net_dm_packet_trace_kfree_skb_hit(void *ignore, data = this_cpu_ptr(&dm_cpu_data); spin_lock_irqsave(&data->drop_queue.lock, flags); - if (skb_queue_len(&data->drop_queue) < NET_DM_QUEUE_LEN) + if (skb_queue_len(&data->drop_queue) < net_dm_queue_len) __skb_queue_tail(&data->drop_queue, nskb); else goto unlock_free; @@ -643,6 +642,14 @@ static void net_dm_trunc_len_set(struct genl_info *info) net_dm_trunc_len = nla_get_u32(info->attrs[NET_DM_ATTR_TRUNC_LEN]); } +static void net_dm_queue_len_set(struct genl_info *info) +{ + if (!info->attrs[NET_DM_ATTR_QUEUE_LEN]) + return; + + net_dm_queue_len = nla_get_u32(info->attrs[NET_DM_ATTR_QUEUE_LEN]); +} + static int net_dm_cmd_config(struct sk_buff *skb, struct genl_info *info) { @@ -660,6 +667,8 @@ static int net_dm_cmd_config(struct sk_buff *skb, net_dm_trunc_len_set(info); + net_dm_queue_len_set(info); + return 0; } @@ -691,6 +700,9 @@ static int net_dm_config_fill(struct sk_buff *msg, struct genl_info *info) if (nla_put_u32(msg, NET_DM_ATTR_TRUNC_LEN, net_dm_trunc_len)) goto nla_put_failure; + if (nla_put_u32(msg, NET_DM_ATTR_QUEUE_LEN, net_dm_queue_len)) + goto nla_put_failure; + genlmsg_end(msg, hdr); return 0; @@ -763,6 +775,7 @@ static const struct nla_policy net_dm_nl_policy[NET_DM_ATTR_MAX + 1] = { [NET_DM_ATTR_UNSPEC] = { .strict_start_type = NET_DM_ATTR_UNSPEC + 1 }, [NET_DM_ATTR_ALERT_MODE] = { .type = NLA_U8 }, [NET_DM_ATTR_TRUNC_LEN] = { .type = NLA_U32 }, + [NET_DM_ATTR_QUEUE_LEN] = { .type = NLA_U32 }, }; static const struct genl_ops dropmon_ops[] = { -- 2.21.0
[PATCH net-next v2 02/10] drop_monitor: Initialize timer and work item upon tracing enable
From: Ido Schimmel The timer and work item are currently initialized once during module init, but subsequent patches will need to associate different functions with the work item, based on the configured alert mode. Allow subsequent patches to make that change by initializing and de-initializing these objects during tracing enable and disable. This also guarantees that once the request to disable tracing returns, no more netlink notifications will be generated. Signed-off-by: Ido Schimmel --- net/core/drop_monitor.c | 24 +++- 1 file changed, 19 insertions(+), 5 deletions(-) diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c index 8b9b0b899ebc..b266dc1660ed 100644 --- a/net/core/drop_monitor.c +++ b/net/core/drop_monitor.c @@ -243,13 +243,20 @@ static void trace_napi_poll_hit(void *ignore, struct napi_struct *napi, static int net_dm_trace_on_set(struct netlink_ext_ack *extack) { - int rc; + int cpu, rc; if (!try_module_get(THIS_MODULE)) { NL_SET_ERR_MSG_MOD(extack, "Failed to take reference on module"); return -ENODEV; } + for_each_possible_cpu(cpu) { + struct per_cpu_dm_data *data = &per_cpu(dm_cpu_data, cpu); + + INIT_WORK(&data->dm_alert_work, send_dm_alert); + timer_setup(&data->send_timer, sched_send_work, 0); + } + rc = register_trace_kfree_skb(trace_kfree_skb_hit, NULL); if (rc) { NL_SET_ERR_MSG_MOD(extack, "Failed to connect probe to kfree_skb() tracepoint"); @@ -274,12 +281,23 @@ static int net_dm_trace_on_set(struct netlink_ext_ack *extack) static void net_dm_trace_off_set(void) { struct dm_hw_stat_delta *new_stat, *temp; + int cpu; unregister_trace_napi_poll(trace_napi_poll_hit, NULL); unregister_trace_kfree_skb(trace_kfree_skb_hit, NULL); tracepoint_synchronize_unregister(); + /* Make sure we do not send notifications to user space after request +* to stop tracing returns. +*/ + for_each_possible_cpu(cpu) { + struct per_cpu_dm_data *data = &per_cpu(dm_cpu_data, cpu); + + del_timer_sync(&data->send_timer); + cancel_work_sync(&data->dm_alert_work); + } + list_for_each_entry_safe(new_stat, temp, &hw_stats_list, list) { if (new_stat->dev == NULL) { list_del_rcu(&new_stat->list); @@ -481,14 +499,10 @@ static void exit_net_drop_monitor(void) /* * Because of the module_get/put we do in the trace state change path * we are guarnateed not to have any current users when we get here -* all we need to do is make sure that we don't have any running timers -* or pending schedule calls */ for_each_possible_cpu(cpu) { data = &per_cpu(dm_cpu_data, cpu); - del_timer_sync(&data->send_timer); - cancel_work_sync(&data->dm_alert_work); /* * At this point, we should have exclusive access * to this struct and can free the skb inside it -- 2.21.0
[PATCH net-next v2 06/10] drop_monitor: Add packet alert mode
From: Ido Schimmel So far drop monitor supported only one alert mode in which a summary of locations in which packets were recently dropped was sent to user space. This alert mode is sufficient in order to understand that packets were dropped, but lacks information to perform a more detailed analysis. Add a new alert mode in which the dropped packet itself is passed to user space along with metadata: The drop location (as program counter and resolved symbol), ingress netdevice and drop timestamp. More metadata can be added in the future. To avoid performing expensive operations in the context in which kfree_skb() is invoked (can be hard IRQ), the dropped skb is cloned and queued on per-CPU skb drop list. Then, in process context the netlink message is allocated, prepared and finally sent to user space. The per-CPU skb drop list is limited to 1000 skbs to prevent exhausting the system's memory. Subsequent patches will make this limit configurable and also add a counter that indicates how many skbs were tail dropped. Signed-off-by: Ido Schimmel --- include/uapi/linux/net_dropmon.h | 27 +++ net/core/drop_monitor.c | 280 ++- 2 files changed, 305 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/net_dropmon.h b/include/uapi/linux/net_dropmon.h index 0fecdedeb6ca..cfaaf75371b8 100644 --- a/include/uapi/linux/net_dropmon.h +++ b/include/uapi/linux/net_dropmon.h @@ -53,6 +53,7 @@ enum { NET_DM_CMD_CONFIG, NET_DM_CMD_START, NET_DM_CMD_STOP, + NET_DM_CMD_PACKET_ALERT, _NET_DM_CMD_MAX, }; @@ -63,12 +64,38 @@ enum { */ #define NET_DM_GRP_ALERT 1 +enum net_dm_attr { + NET_DM_ATTR_UNSPEC, + + NET_DM_ATTR_ALERT_MODE, /* u8 */ + NET_DM_ATTR_PC, /* u64 */ + NET_DM_ATTR_SYMBOL, /* string */ + NET_DM_ATTR_IN_PORT,/* nested */ + NET_DM_ATTR_TIMESTAMP, /* struct timespec */ + NET_DM_ATTR_PROTO, /* u16 */ + NET_DM_ATTR_PAYLOAD,/* binary */ + NET_DM_ATTR_PAD, + + __NET_DM_ATTR_MAX, + NET_DM_ATTR_MAX = __NET_DM_ATTR_MAX - 1 +}; + /** * enum net_dm_alert_mode - Alert mode. * @NET_DM_ALERT_MODE_SUMMARY: A summary of recent drops is sent to user space. + * @NET_DM_ALERT_MODE_PACKET: Each dropped packet is sent to user space along + *with metadata. */ enum net_dm_alert_mode { NET_DM_ALERT_MODE_SUMMARY, + NET_DM_ALERT_MODE_PACKET, +}; + +enum { + NET_DM_ATTR_PORT_NETDEV_IFINDEX,/* u32 */ + + __NET_DM_ATTR_PORT_MAX, + NET_DM_ATTR_PORT_MAX = __NET_DM_ATTR_PORT_MAX - 1 }; #endif diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c index 9cd2f662cb9e..ba765832413b 100644 --- a/net/core/drop_monitor.c +++ b/net/core/drop_monitor.c @@ -54,6 +54,7 @@ static DEFINE_MUTEX(net_dm_mutex); struct per_cpu_dm_data { spinlock_t lock; /* Protects 'skb' and 'send_timer' */ struct sk_buff *skb; + struct sk_buff_head drop_queue; struct work_struct dm_alert_work; struct timer_list send_timer; }; @@ -85,6 +86,14 @@ struct net_dm_alert_ops { void (*work_item_func)(struct work_struct *work); }; +struct net_dm_skb_cb { + void *pc; +}; + +#define NET_DM_SKB_CB(__skb) ((struct net_dm_skb_cb *)&((__skb)->cb[0])) + +#define NET_DM_QUEUE_LEN 1000 + static struct sk_buff *reset_per_cpu_data(struct per_cpu_dm_data *data) { size_t al; @@ -257,8 +266,214 @@ static const struct net_dm_alert_ops net_dm_alert_summary_ops = { .work_item_func = send_dm_alert, }; +static void net_dm_packet_trace_kfree_skb_hit(void *ignore, + struct sk_buff *skb, + void *location) +{ + ktime_t tstamp = ktime_get_real(); + struct per_cpu_dm_data *data; + struct sk_buff *nskb; + unsigned long flags; + + nskb = skb_clone(skb, GFP_ATOMIC); + if (!nskb) + return; + + NET_DM_SKB_CB(nskb)->pc = location; + /* Override the timestamp because we care about the time when the +* packet was dropped. +*/ + nskb->tstamp = tstamp; + + data = this_cpu_ptr(&dm_cpu_data); + + spin_lock_irqsave(&data->drop_queue.lock, flags); + if (skb_queue_len(&data->drop_queue) < NET_DM_QUEUE_LEN) + __skb_queue_tail(&data->drop_queue, nskb); + else + goto unlock_free; + spin_unlock_irqrestore(&data->drop_queue.lock, flags); + + schedule_work(&data->dm_alert_work); + + return; + +unlock_free: + spin_unlock_irqrestore(&data->drop_queue.lock, flags); + consume_skb(nskb); +} + +static void net_dm_packet_trace_napi_poll_hit(void *ignore, +
[PATCH net-next v2 05/10] drop_monitor: Add alert mode operations
From: Ido Schimmel The next patch is going to add another alert mode in which the dropped packet is notified to user space, instead of only a summary of recent drops. Abstract the differences between the modes by adding alert mode operations. The operations are selected based on the currently configured mode and associated with the probes and the work item just before tracing starts. Signed-off-by: Ido Schimmel --- include/uapi/linux/net_dropmon.h | 9 net/core/drop_monitor.c | 38 +++- 2 files changed, 41 insertions(+), 6 deletions(-) diff --git a/include/uapi/linux/net_dropmon.h b/include/uapi/linux/net_dropmon.h index 5edbd0a675fd..0fecdedeb6ca 100644 --- a/include/uapi/linux/net_dropmon.h +++ b/include/uapi/linux/net_dropmon.h @@ -62,4 +62,13 @@ enum { * Our group identifiers */ #define NET_DM_GRP_ALERT 1 + +/** + * enum net_dm_alert_mode - Alert mode. + * @NET_DM_ALERT_MODE_SUMMARY: A summary of recent drops is sent to user space. + */ +enum net_dm_alert_mode { + NET_DM_ALERT_MODE_SUMMARY, +}; + #endif diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c index cd2f3069f34e..9cd2f662cb9e 100644 --- a/net/core/drop_monitor.c +++ b/net/core/drop_monitor.c @@ -75,6 +75,16 @@ static int dm_delay = 1; static unsigned long dm_hw_check_delta = 2*HZ; static LIST_HEAD(hw_stats_list); +static enum net_dm_alert_mode net_dm_alert_mode = NET_DM_ALERT_MODE_SUMMARY; + +struct net_dm_alert_ops { + void (*kfree_skb_probe)(void *ignore, struct sk_buff *skb, + void *location); + void (*napi_poll_probe)(void *ignore, struct napi_struct *napi, + int work, int budget); + void (*work_item_func)(struct work_struct *work); +}; + static struct sk_buff *reset_per_cpu_data(struct per_cpu_dm_data *data) { size_t al; @@ -241,10 +251,23 @@ static void trace_napi_poll_hit(void *ignore, struct napi_struct *napi, rcu_read_unlock(); } +static const struct net_dm_alert_ops net_dm_alert_summary_ops = { + .kfree_skb_probe= trace_kfree_skb_hit, + .napi_poll_probe= trace_napi_poll_hit, + .work_item_func = send_dm_alert, +}; + +static const struct net_dm_alert_ops *net_dm_alert_ops_arr[] = { + [NET_DM_ALERT_MODE_SUMMARY] = &net_dm_alert_summary_ops, +}; + static int net_dm_trace_on_set(struct netlink_ext_ack *extack) { + const struct net_dm_alert_ops *ops; int cpu, rc; + ops = net_dm_alert_ops_arr[net_dm_alert_mode]; + if (!try_module_get(THIS_MODULE)) { NL_SET_ERR_MSG_MOD(extack, "Failed to take reference on module"); return -ENODEV; @@ -254,7 +277,7 @@ static int net_dm_trace_on_set(struct netlink_ext_ack *extack) struct per_cpu_dm_data *data = &per_cpu(dm_cpu_data, cpu); struct sk_buff *skb; - INIT_WORK(&data->dm_alert_work, send_dm_alert); + INIT_WORK(&data->dm_alert_work, ops->work_item_func); timer_setup(&data->send_timer, sched_send_work, 0); /* Allocate a new per-CPU skb for the summary alert message and * free the old one which might contain stale data from @@ -264,13 +287,13 @@ static int net_dm_trace_on_set(struct netlink_ext_ack *extack) consume_skb(skb); } - rc = register_trace_kfree_skb(trace_kfree_skb_hit, NULL); + rc = register_trace_kfree_skb(ops->kfree_skb_probe, NULL); if (rc) { NL_SET_ERR_MSG_MOD(extack, "Failed to connect probe to kfree_skb() tracepoint"); goto err_module_put; } - rc = register_trace_napi_poll(trace_napi_poll_hit, NULL); + rc = register_trace_napi_poll(ops->napi_poll_probe, NULL); if (rc) { NL_SET_ERR_MSG_MOD(extack, "Failed to connect probe to napi_poll() tracepoint"); goto err_unregister_trace; @@ -279,7 +302,7 @@ static int net_dm_trace_on_set(struct netlink_ext_ack *extack) return 0; err_unregister_trace: - unregister_trace_kfree_skb(trace_kfree_skb_hit, NULL); + unregister_trace_kfree_skb(ops->kfree_skb_probe, NULL); err_module_put: module_put(THIS_MODULE); return rc; @@ -288,10 +311,13 @@ static int net_dm_trace_on_set(struct netlink_ext_ack *extack) static void net_dm_trace_off_set(void) { struct dm_hw_stat_delta *new_stat, *temp; + const struct net_dm_alert_ops *ops; int cpu; - unregister_trace_napi_poll(trace_napi_poll_hit, NULL); - unregister_trace_kfree_skb(trace_kfree_skb_hit, NULL); + ops = net_dm_alert_ops_arr[net_dm_alert_mode]; + + unregister_trace_napi_poll(ops->napi_poll_probe, NULL); + unregister_trace_kfree_skb(ops->kfree_skb_probe, NULL); tracepoint_synchronize_unregister(); -- 2.21.0
[PATCH net] mlxsw: spectrum_ptp: Keep unmatched entries in a linked list
From: Petr Machata To identify timestamps for matching with their packets, Spectrum-1 uses a five-tuple of (port, direction, domain number, message type, sequence ID). If there are several clients from the same domain behind a single port sending Delay_Req's, the only thing differentiating these packets, as far as Spectrum-1 is concerned, is the sequence ID. Should sequence IDs between individual clients be similar, conflicts may arise. That is not a problem to hardware, which will simply deliver timestamps on a first comes, first served basis. However the driver uses a simple hash table to store the unmatched pieces. When a new conflicting piece arrives, it pushes out the previously stored one, which if it is a packet, is delivered without timestamp. Later on as the corresponding timestamps arrive, the first one is mismatched to the second packet, and the second one is never matched and eventually is GCd. To correct this issue, instead of using a simple rhashtable, use rhltable to keep the unmatched entries. Previously, a found unmatched entry would always be removed from the hash table. That is not the case anymore--an incompatible entry is left in the hash table. Therefore removal from the hash table cannot be used to confirm the validity of the looked-up pointer, instead the lookup would simply need to be redone. Therefore move it inside the critical section. This simplifies a lot of the code. Fixes: 8748642751ed ("mlxsw: spectrum: PTP: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls") Reported-by: Alex Veber Signed-off-by: Petr Machata Signed-off-by: Ido Schimmel --- .../ethernet/mellanox/mlxsw/spectrum_ptp.c| 138 +++--- 1 file changed, 55 insertions(+), 83 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c index 63b07edd9d81..38bb1cfe4e8c 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c @@ -29,7 +29,7 @@ struct mlxsw_sp_ptp_state { struct mlxsw_sp *mlxsw_sp; - struct rhashtable unmatched_ht; + struct rhltable unmatched_ht; spinlock_t unmatched_lock; /* protects the HT */ struct delayed_work ht_gc_dw; u32 gc_cycle; @@ -45,7 +45,7 @@ struct mlxsw_sp1_ptp_key { struct mlxsw_sp1_ptp_unmatched { struct mlxsw_sp1_ptp_key key; - struct rhash_head ht_node; + struct rhlist_head ht_node; struct rcu_head rcu; struct sk_buff *skb; u64 timestamp; @@ -359,7 +359,7 @@ static int mlxsw_sp_ptp_parse(struct sk_buff *skb, /* Returns NULL on successful insertion, a pointer on conflict, or an ERR_PTR on * error. */ -static struct mlxsw_sp1_ptp_unmatched * +static int mlxsw_sp1_ptp_unmatched_save(struct mlxsw_sp *mlxsw_sp, struct mlxsw_sp1_ptp_key key, struct sk_buff *skb, @@ -368,41 +368,51 @@ mlxsw_sp1_ptp_unmatched_save(struct mlxsw_sp *mlxsw_sp, int cycles = MLXSW_SP1_PTP_HT_GC_TIMEOUT / MLXSW_SP1_PTP_HT_GC_INTERVAL; struct mlxsw_sp_ptp_state *ptp_state = mlxsw_sp->ptp_state; struct mlxsw_sp1_ptp_unmatched *unmatched; - struct mlxsw_sp1_ptp_unmatched *conflict; + int err; unmatched = kzalloc(sizeof(*unmatched), GFP_ATOMIC); if (!unmatched) - return ERR_PTR(-ENOMEM); + return -ENOMEM; unmatched->key = key; unmatched->skb = skb; unmatched->timestamp = timestamp; unmatched->gc_cycle = mlxsw_sp->ptp_state->gc_cycle + cycles; - conflict = rhashtable_lookup_get_insert_fast(&ptp_state->unmatched_ht, - &unmatched->ht_node, - mlxsw_sp1_ptp_unmatched_ht_params); - if (conflict) + err = rhltable_insert(&ptp_state->unmatched_ht, &unmatched->ht_node, + mlxsw_sp1_ptp_unmatched_ht_params); + if (err) kfree(unmatched); - return conflict; + return err; } static struct mlxsw_sp1_ptp_unmatched * mlxsw_sp1_ptp_unmatched_lookup(struct mlxsw_sp *mlxsw_sp, - struct mlxsw_sp1_ptp_key key) + struct mlxsw_sp1_ptp_key key, int *p_length) { - return rhashtable_lookup(&mlxsw_sp->ptp_state->unmatched_ht, &key, -mlxsw_sp1_ptp_unmatched_ht_params); + struct mlxsw_sp1_ptp_unmatched *unmatched, *last = NULL; + struct rhlist_head *tmp, *list; + int length = 0; + + list = rhltable_lookup(&mlxsw_sp->ptp_state->unmatched_ht, &key, + mlxsw_sp1_ptp_unmatched_ht_params); + rhl_for_each_entry_rcu(unmatched, tmp, list, ht_node) { + last = unmatched; + length++; + } + + *p_length = length; + return last; } static int mlxsw_sp1_ptp_unmatched_remo
Re: [PATCH net-next v2 1/1] net: dsa: fix fixed-link port registration
Hello! Just noticed a comment typo... On 11.08.2019 6:18, Marek Behún wrote: Commit 88d6272acaaa ("net: phy: avoid unneeded MDIO reads in genphy_read_status") broke fixed link DSA port registration in dsa_port_fixed_link_register_of: the genphy_read_status does not do what it is supposed to and the following adjust_link is given wrong parameters. This causes a regression on Turris Omnia, where the mvneta driver for the interface connected to the switch reports crc errors, for some reason. I realize this fix is not ideal, something else could change in genphy functions which could cause DSA fixed-link port to break again. Hopefully DSA fixed-link port functionality will be converted to phylink API soon. Signed-off-by: Marek Behún Cc: Heiner Kallweit Cc: Sebastian Reichel Cc: Vivien Didelot Cc: Andrew Lunn Cc: Florian Fainelli Cc: David S. Miller --- net/dsa/port.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/net/dsa/port.c b/net/dsa/port.c index 363eab6df51b..c424ebb373e1 100644 --- a/net/dsa/port.c +++ b/net/dsa/port.c @@ -485,6 +485,17 @@ static int dsa_port_fixed_link_register_of(struct dsa_port *dp) phydev->interface = mode; genphy_config_init(phydev); + + /* +* Commit 88d6272acaaa caused genphy_read_status not to do it's work if its. +* autonegotiation is enabled and link status did not change. This is +* the case for fixed_phy. By setting phydev->link = 0 before the call +* to genphy_read_status we force it to read and fill in the parameters. +* +* Hopefully this dirty hack will be removed soon by converting DSA +* fixed link ports to phylink API. +*/ + phydev->link = 0; genphy_read_status(phydev); if (ds->ops->adjust_link) MBR, Sergei
Re: [PATCH net-next] net: openvswitch: Set OvS recirc_id from tc chain index
On 8/8/2019 11:53 PM, Pravin Shelar wrote: > On Wed, Aug 7, 2019 at 5:08 AM Paul Blakey wrote: >> Offloaded OvS datapath rules are translated one to one to tc rules, >> for example the following simplified OvS rule: >> >> recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) >> actions:ct(),recirc(2) >> >> Will be translated to the following tc rule: >> >> $ tc filter add dev dev1 ingress \ >> prio 1 chain 0 proto ip \ >> flower tcp ct_state -trk \ >> action ct pipe \ >> action goto chain 2 >> >> Received packets will first travel though tc, and if they aren't stolen >> by it, like in the above rule, they will continue to OvS datapath. >> Since we already did some actions (action ct in this case) which might >> modify the packets, and updated action stats, we would like to continue >> the proccessing with the correct recirc_id in OvS (here recirc_id(2)) >> where we left off. >> >> To support this, introduce a new skb extension for tc, which >> will be used for translating tc chain to ovs recirc_id to >> handle these miss cases. Last tc chain index will be set >> by tc goto chain action and read by OvS datapath. >> >> Signed-off-by: Paul Blakey >> Signed-off-by: Vlad Buslov >> Acked-by: Jiri Pirko >> --- >> include/linux/skbuff.h| 13 + >> include/net/sch_generic.h | 5 - >> net/core/skbuff.c | 6 ++ >> net/openvswitch/flow.c| 9 + >> net/sched/Kconfig | 13 + >> net/sched/act_api.c | 1 + >> net/sched/cls_api.c | 12 >> 7 files changed, 58 insertions(+), 1 deletion(-) >> >> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h >> index 3aef8d8..fb2a792 100644 >> --- a/include/linux/skbuff.h >> +++ b/include/linux/skbuff.h >> @@ -279,6 +279,16 @@ struct nf_bridge_info { >> }; >> #endif >> >> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT) >> +/* Chain in tc_skb_ext will be used to share the tc chain with >> + * ovs recirc_id. It will be set to the current chain by tc >> + * and read by ovs to recirc_id. >> + */ >> +struct tc_skb_ext { >> + __u32 chain; >> +}; >> +#endif >> + >> struct sk_buff_head { >> /* These two members must be first. */ >> struct sk_buff *next; >> @@ -4050,6 +4060,9 @@ enum skb_ext_id { >> #ifdef CONFIG_XFRM >> SKB_EXT_SEC_PATH, >> #endif >> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT) >> + TC_SKB_EXT, >> +#endif >> SKB_EXT_NUM, /* must be last */ >> }; >> >> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h >> index 6b6b012..871feea 100644 >> --- a/include/net/sch_generic.h >> +++ b/include/net/sch_generic.h >> @@ -275,7 +275,10 @@ struct tcf_result { >> unsigned long class; >> u32 classid; >> }; >> - const struct tcf_proto *goto_tp; >> + struct { >> + const struct tcf_proto *goto_tp; >> + u32 goto_index; >> + }; >> >> /* used in the skb_tc_reinsert function */ >> struct { >> diff --git a/net/core/skbuff.c b/net/core/skbuff.c >> index ea8e8d3..2b40b5a 100644 >> --- a/net/core/skbuff.c >> +++ b/net/core/skbuff.c >> @@ -4087,6 +4087,9 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff >> *skb) >> #ifdef CONFIG_XFRM >> [SKB_EXT_SEC_PATH] = SKB_EXT_CHUNKSIZEOF(struct sec_path), >> #endif >> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT) >> + [TC_SKB_EXT] = SKB_EXT_CHUNKSIZEOF(struct tc_skb_ext), >> +#endif >> }; >> >> static __always_inline unsigned int skb_ext_total_length(void) >> @@ -4098,6 +4101,9 @@ static __always_inline unsigned int >> skb_ext_total_length(void) >> #ifdef CONFIG_XFRM >> skb_ext_type_len[SKB_EXT_SEC_PATH] + >> #endif >> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT) >> + skb_ext_type_len[TC_SKB_EXT] + >> +#endif >> 0; >> } >> >> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c >> index bc89e16..0287ead 100644 >> --- a/net/openvswitch/flow.c >> +++ b/net/openvswitch/flow.c >> @@ -816,6 +816,9 @@ static int key_extract_mac_proto(struct sk_buff *skb) >> int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info, >> struct sk_buff *skb, struct sw_flow_key *key) >> { >> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT) >> + struct tc_skb_ext *tc_ext; >> +#endif >> int res, err; >> >> /* Extract metadata from packet. */ >> @@ -848,7 +851,13 @@ int ovs_flow_key_extract(const struct ip_tunnel_info >> *tun_info, >> if (res < 0) >> return res; >> key->mac_proto = res; >> + >> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT) >> + tc_ext = skb_ext_find(skb, TC_SKB_EXT); >> + key->recirc_id = tc_ext ? tc_ext->chain : 0; >> +#else >> key->recirc_id = 0; >> +#endif >> >
Re: [PATCH net-next] net: openvswitch: Set OvS recirc_id from tc chain index
On 8/8/2019 11:53 PM, Pravin Shelar wrote: > On Wed, Aug 7, 2019 at 5:08 AM Paul Blakey wrote: >> Offloaded OvS datapath rules are translated one to one to tc rules, >> for example the following simplified OvS rule: >> >> recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) >> actions:ct(),recirc(2) >> >> Will be translated to the following tc rule: >> >> $ tc filter add dev dev1 ingress \ >> prio 1 chain 0 proto ip \ >> flower tcp ct_state -trk \ >> action ct pipe \ >> action goto chain 2 >> >> Received packets will first travel though tc, and if they aren't stolen >> by it, like in the above rule, they will continue to OvS datapath. >> Since we already did some actions (action ct in this case) which might >> modify the packets, and updated action stats, we would like to continue >> the proccessing with the correct recirc_id in OvS (here recirc_id(2)) >> where we left off. >> >> To support this, introduce a new skb extension for tc, which >> will be used for translating tc chain to ovs recirc_id to >> handle these miss cases. Last tc chain index will be set >> by tc goto chain action and read by OvS datapath. >> >> Signed-off-by: Paul Blakey >> Signed-off-by: Vlad Buslov >> Acked-by: Jiri Pirko >> --- >> include/linux/skbuff.h| 13 + >> include/net/sch_generic.h | 5 - >> net/core/skbuff.c | 6 ++ >> net/openvswitch/flow.c| 9 + >> net/sched/Kconfig | 13 + >> net/sched/act_api.c | 1 + >> net/sched/cls_api.c | 12 >> 7 files changed, 58 insertions(+), 1 deletion(-) >> >> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h >> index 3aef8d8..fb2a792 100644 >> --- a/include/linux/skbuff.h >> +++ b/include/linux/skbuff.h >> @@ -279,6 +279,16 @@ struct nf_bridge_info { >> }; >> #endif >> >> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT) >> +/* Chain in tc_skb_ext will be used to share the tc chain with >> + * ovs recirc_id. It will be set to the current chain by tc >> + * and read by ovs to recirc_id. >> + */ >> +struct tc_skb_ext { >> + __u32 chain; >> +}; >> +#endif >> + >> struct sk_buff_head { >> /* These two members must be first. */ >> struct sk_buff *next; >> @@ -4050,6 +4060,9 @@ enum skb_ext_id { >> #ifdef CONFIG_XFRM >> SKB_EXT_SEC_PATH, >> #endif >> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT) >> + TC_SKB_EXT, >> +#endif >> SKB_EXT_NUM, /* must be last */ >> }; >> >> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h >> index 6b6b012..871feea 100644 >> --- a/include/net/sch_generic.h >> +++ b/include/net/sch_generic.h >> @@ -275,7 +275,10 @@ struct tcf_result { >> unsigned long class; >> u32 classid; >> }; >> - const struct tcf_proto *goto_tp; >> + struct { >> + const struct tcf_proto *goto_tp; >> + u32 goto_index; >> + }; >> >> /* used in the skb_tc_reinsert function */ >> struct { >> diff --git a/net/core/skbuff.c b/net/core/skbuff.c >> index ea8e8d3..2b40b5a 100644 >> --- a/net/core/skbuff.c >> +++ b/net/core/skbuff.c >> @@ -4087,6 +4087,9 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff >> *skb) >> #ifdef CONFIG_XFRM >> [SKB_EXT_SEC_PATH] = SKB_EXT_CHUNKSIZEOF(struct sec_path), >> #endif >> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT) >> + [TC_SKB_EXT] = SKB_EXT_CHUNKSIZEOF(struct tc_skb_ext), >> +#endif >> }; >> >> static __always_inline unsigned int skb_ext_total_length(void) >> @@ -4098,6 +4101,9 @@ static __always_inline unsigned int >> skb_ext_total_length(void) >> #ifdef CONFIG_XFRM >> skb_ext_type_len[SKB_EXT_SEC_PATH] + >> #endif >> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT) >> + skb_ext_type_len[TC_SKB_EXT] + >> +#endif >> 0; >> } >> >> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c >> index bc89e16..0287ead 100644 >> --- a/net/openvswitch/flow.c >> +++ b/net/openvswitch/flow.c >> @@ -816,6 +816,9 @@ static int key_extract_mac_proto(struct sk_buff *skb) >> int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info, >> struct sk_buff *skb, struct sw_flow_key *key) >> { >> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT) >> + struct tc_skb_ext *tc_ext; >> +#endif >> int res, err; >> >> /* Extract metadata from packet. */ >> @@ -848,7 +851,13 @@ int ovs_flow_key_extract(const struct ip_tunnel_info >> *tun_info, >> if (res < 0) >> return res; >> key->mac_proto = res; >> + >> +#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT) >> + tc_ext = skb_ext_find(skb, TC_SKB_EXT); >> + key->recirc_id = tc_ext ? tc_ext->chain : 0; >> +#else >> key->recirc_id = 0; >> +#endif >> >
Re: [PATCH] xen-netback: no need to check return value of debugfs_create functions
On Sat, Aug 10, 2019 at 12:31:08PM +0200, Greg Kroah-Hartman wrote: > When calling debugfs functions, there is no need to ever check the > return value. The function can work or not, but the code logic should > never do something different based on this. > > Cc: Wei Liu > Cc: Paul Durrant > Cc: xen-de...@lists.xenproject.org > Cc: netdev@vger.kernel.org > Signed-off-by: Greg Kroah-Hartman Acked-by: Wei Liu
Re: [PATCH net-next v2 1/1] net: dsa: fix fixed-link port registration
On 11.08.2019 05:39, Andrew Lunn wrote: > On Sun, Aug 11, 2019 at 05:18:57AM +0200, Marek Behún wrote: >> Commit 88d6272acaaa ("net: phy: avoid unneeded MDIO reads in >> genphy_read_status") broke fixed link DSA port registration in >> dsa_port_fixed_link_register_of: the genphy_read_status does not do what >> it is supposed to and the following adjust_link is given wrong >> parameters. > > Hi Marek > > Which parameters are incorrect? > > In fixed_phy.c, __fixed_phy_register() there is: > > /* propagate the fixed link values to struct phy_device */ > phy->link = status->link; > if (status->link) { > phy->speed = status->speed; > phy->duplex = status->duplex; > phy->pause = status->pause; > phy->asym_pause = status->asym_pause; > } > > Are we not initialising something? Or is the initialisation done here > getting reset sometime afterwards? > In addition to Andrew's question: We talk about this DT config: armada-385-turris-omnia.dts ? Which kernel version are you using? > Thanks > Andrew > Heiner
[BUG] fec mdio times out under system stress
Hi Fabio, When I woke up this morning, I found that one of the Hummingboards had gone offline (as in, lost network link) during the night. Investigating, I find that the system had gone into OOM, and at that time, triggered an unrelated: [4111697.698776] fec 2188000.ethernet eth0: MDIO read timeout [4111697.712996] MII_DATA: 0x6006796d [4111697.729415] MII_SPEED: 0x001a [4111697.745232] IEVENT: 0x [4111697.745242] IMASK: 0x0a8000aa [4111698.002233] Atheros 8035 ethernet 2188000.ethernet-1:00: PHY state change RUNNING -> HALTED [4111698.009882] fec 2188000.ethernet eth0: Link is Down This is on a dual-core iMX6. It looks like the read actually completed (since MII_DATA contains the register data) but we somehow lost the interrupt (or maybe received the interrupt after wait_for_completion_timeout() timed out.) >From what I can see, the OOM events happened on CPU1, CPU1 was allocated the FEC interrupt, and the PHY polling that suffered the MDIO timeout was on CPU0. Given that IEVENT is zero, it seems that CPU1 had read serviced the interrupt, but it is not clear how far through processing that it was - it may be that fec_enet_interrupt() had been delayed by the OOM condition. This seems rather fragile - as the system slowing down due to OOM triggers the network to completely collapse by phylib taking the PHY offline, making the system inaccessible except through the console. In my case, even serial console wasn't operational (except for magic sysrq). Not sure what agetty was playing at... so the only way I could recover any information from the system was to connect the HDMI and plug in a USB keyboard. Any thoughts on how FEC MDIO accesses could be made more robust? Maybe phylib should retry a number of times - but with read-sensitive registers, if the read has already completed successfully, and its just a problem with the FEC MDIO hardware, that could cause issues. Thanks. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up
Re: [PATCH net-next v2 1/1] net: dsa: fix fixed-link port registration
OK guys, something is terribly wrong here. I bisected to the commit mentioned (88d6272acaaa), looked around at the genphy functions, tried adding the link=0 workaround and it did work, so I though this was the issue. What I realized now is that before the commit 88d6272acaaa things worked because of two bugs, which negated each other. This commit caused one of this bugs not to fire, and thus the second bug was not negated. What actually happened before the commit that broke it is this: - after the fixed_phy is created, the parameters are corrent - genphy_read_status breaks the parameters: - first it sets the parameters to unknown (SPEED_UNKNOWN, DUPLEX_UNKNOWN) - then read the registers, which are simulated for fixed_phy - then it uses phy-core.c:phy_resolve_aneg_linkmode function, which looks for correct settings by bit-anding the ->advertising and ->lp_advertigins bit arrays. But in fixed_phy, ->lp_advertising is set to zero, so the parameters are left at SPEED_UNKNOWN, ... (this is the first bug) - then adjust_link is called, which then goes to mv88e6xxx_port_setup_mac, where there is a test if it should change something: if (state.link == link && state.speed == speed && state.duplex == duplex) return 0; - since current speed on the switch port (state.speed) is SPEED_1000, and new speed is SPEED_UNKNOWN, this test fails, and so the rest of this function is called, which makes the port work (the if test is the second bug) After the commit that broke things: - after the fixed_phy is created, the parameters are corrent - genphy_read_status doesn't change them - mv88e6xxx_port_setup_mac does nothing, since the if condition above is true So, there are two things that are broken: - the test in mv88e6xxx_port_setup_mac whether there is to be a change should be more sophisticated - fixed_phy should also simulate the lp_advertising register What do you think of this? Marek On Sun, 11 Aug 2019 13:35:20 +0200 Heiner Kallweit wrote: > On 11.08.2019 05:39, Andrew Lunn wrote: > > On Sun, Aug 11, 2019 at 05:18:57AM +0200, Marek Behún wrote: > >> Commit 88d6272acaaa ("net: phy: avoid unneeded MDIO reads in > >> genphy_read_status") broke fixed link DSA port registration in > >> dsa_port_fixed_link_register_of: the genphy_read_status does not do what > >> it is supposed to and the following adjust_link is given wrong > >> parameters. > > > > Hi Marek > > > > Which parameters are incorrect? > > > > In fixed_phy.c, __fixed_phy_register() there is: > > > > /* propagate the fixed link values to struct phy_device */ > > phy->link = status->link; > > if (status->link) { > > phy->speed = status->speed; > > phy->duplex = status->duplex; > > phy->pause = status->pause; > > phy->asym_pause = status->asym_pause; > > } > > > > Are we not initialising something? Or is the initialisation done here > > getting reset sometime afterwards? > > > In addition to Andrew's question: > We talk about this DT config: armada-385-turris-omnia.dts ? > Which kernel version are you using? > > > Thanks > > Andrew > > > Heiner
Re: [BUG] fec mdio times out under system stress
On Sun, Aug 11, 2019 at 02:37:07PM +0100, Russell King - ARM Linux admin wrote: > Hi Fabio, > > When I woke up this morning, I found that one of the Hummingboards > had gone offline (as in, lost network link) during the night. > Investigating, I find that the system had gone into OOM, and at > that time, triggered an unrelated: > > [4111697.698776] fec 2188000.ethernet eth0: MDIO read timeout > [4111697.712996] MII_DATA: 0x6006796d > [4111697.729415] MII_SPEED: 0x001a > [4111697.745232] IEVENT: 0x > [4111697.745242] IMASK: 0x0a8000aa > [4111698.002233] Atheros 8035 ethernet 2188000.ethernet-1:00: PHY state > change RUNNING -> HALTED > [4111698.009882] fec 2188000.ethernet eth0: Link is Down > > This is on a dual-core iMX6. > > It looks like the read actually completed (since MII_DATA contains > the register data) but we somehow lost the interrupt (or maybe > received the interrupt after wait_for_completion_timeout() timed > out.) > > From what I can see, the OOM events happened on CPU1, CPU1 was > allocated the FEC interrupt, and the PHY polling that suffered the > MDIO timeout was on CPU0. > > Given that IEVENT is zero, it seems that CPU1 had read serviced the > interrupt, but it is not clear how far through processing that it > was - it may be that fec_enet_interrupt() had been delayed by the > OOM condition. > > This seems rather fragile - as the system slowing down due to OOM > triggers the network to completely collapse by phylib taking the > PHY offline, making the system inaccessible except through the > console. > > In my case, even serial console wasn't operational (except for > magic sysrq). Not sure what agetty was playing at... so the only > way I could recover any information from the system was to connect > the HDMI and plug in a USB keyboard. > > Any thoughts on how FEC MDIO accesses could be made more robust? > > Maybe phylib should retry a number of times - but with read-sensitive > registers, if the read has already completed successfully, and its > just a problem with the FEC MDIO hardware, that could cause issues. I should also note for the phylib people: After phylib has received an error, and has entered the HALTED state, downing the interface produces this warning, which seems rather unfair as the FEC doesn't know that its PHY suffered an error - it is merely reversing the phy_start() that happened when the interface was opened. [4144039.099786] [ cut here ] [4144039.109001] WARNING: CPU: 1 PID: 25842 at drivers/net/phy/phy.c:835 fec_enet_close+0x14/0x148 [4144039.124366] called from state HALTED [4144039.132626] Modules linked in: 8021q brcmfmac brcmutil imx_thermal snd_soc_imx_spdif snd_soc_imx_audmux nvmem_imx_ocotp cfg80211 snd_soc_sgtl5000 imx_sdma virt_dma rc_cec snd_soc_fsl_ssi snd_soc_fsl_spdif coda imx_pcm_dma v4l2_mem2mem imx_vdoa etnaviv videobuf2_dma_contig gpu_sched dw_hdmi_cec dw_hdmi_ahb_audio imx6q_cpufreq caamrng caam_jr caam error ip_tables x_tables [last unloaded: evbug][4144039.177249] CPU: 1 PID: 25842 Comm: ip Tainted: GW 5.1.0+ #319 [4144039.189477] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) [4144039.201025] [] (unwind_backtrace) from [] (show_stack+0x10/0x14) [4144039.213769] [] (show_stack) from [] (dump_stack+0x9c/0xd4) [4144039.225959] [] (dump_stack) from [] (__warn+0xf8/0x124) [4144039.237828] [] (__warn) from [] (warn_slowpath_fmt+0x38/0x48) [4144039.250275] [] (warn_slowpath_fmt) from [] (fec_enet_close+0x14/0x148) [4144039.263558] [] (fec_enet_close) from [] (__dev_close_many+0x88/0xf0) [4144039.276639] [] (__dev_close_many) from [] (__dev_change_flags+0xa4/0x1a0) [4144039.290130] [] (__dev_change_flags) from [] (dev_change_flags+0x18/0x48) [4144039.303558] [] (dev_change_flags) from [] (do_setlink+0x29c/0x990) [4144039.316517] [] (do_setlink) from [] (__rtnl_newlink+0x4a4/0x72c) [4144039.329282] [] (__rtnl_newlink) from [] (rtnl_newlink+0x40/0x60) [4144039.342023] [] (rtnl_newlink) from [] (rtnetlink_rcv_msg+0x244/0x470) [4144039.355224] [] (rtnetlink_rcv_msg) from [] (netlink_rcv_skb+0xe0/0xf4) [4144039.368492] [] (netlink_rcv_skb) from [] (netlink_unicast+0x170/0x1b8) [4144039.381758] [] (netlink_unicast) from [] (netlink_sendmsg+0x2b8/0x340) [4144039.395023] [] (netlink_sendmsg) from [] (sock_sendmsg+0x14/0x24) [4144039.407825] [] (sock_sendmsg) from [] (___sys_sendmsg+0x200/0x214) [4144039.420714] [] (___sys_sendmsg) from [] (__sys_sendmsg+0x40/0x6c) [4144039.433506] [] (__sys_sendmsg) from [] (ret_fast_syscall+0x0/0x28) [4144039.446366] Exception stack(0xe1933fa8 to 0xe1933ff0) [4144039.456329] 3fa0: bea086f0 bea086bc 0003 bea086d0 [4144039.469464] 3fc0: bea086f0 bea086bc 0128 0062a278 bea086d0 5d5011f3 0062a000 [4144039.482590] 3fe0: 0128 bea08680 b6dfc4cb b6d7d6f6 [4144039.492652] irq event stamp: 0 [4144039.500650] hardirqs last enabled at (0): [<>] (null) [4144039.511628] h
Re: [BUG] fec mdio times out under system stress
Hi Russell, Fabio, On Sun, 11 Aug 2019 at 16:42, Russell King - ARM Linux admin wrote: > > Hi Fabio, > > When I woke up this morning, I found that one of the Hummingboards > had gone offline (as in, lost network link) during the night. > Investigating, I find that the system had gone into OOM, and at > that time, triggered an unrelated: > > [4111697.698776] fec 2188000.ethernet eth0: MDIO read timeout > [4111697.712996] MII_DATA: 0x6006796d > [4111697.729415] MII_SPEED: 0x001a > [4111697.745232] IEVENT: 0x > [4111697.745242] IMASK: 0x0a8000aa > [4111698.002233] Atheros 8035 ethernet 2188000.ethernet-1:00: PHY state > change RUNNING -> HALTED > [4111698.009882] fec 2188000.ethernet eth0: Link is Down > > This is on a dual-core iMX6. > > It looks like the read actually completed (since MII_DATA contains > the register data) but we somehow lost the interrupt (or maybe > received the interrupt after wait_for_completion_timeout() timed > out.) > > From what I can see, the OOM events happened on CPU1, CPU1 was > allocated the FEC interrupt, and the PHY polling that suffered the > MDIO timeout was on CPU0. > > Given that IEVENT is zero, it seems that CPU1 had read serviced the > interrupt, but it is not clear how far through processing that it > was - it may be that fec_enet_interrupt() had been delayed by the > OOM condition. > > This seems rather fragile - as the system slowing down due to OOM > triggers the network to completely collapse by phylib taking the > PHY offline, making the system inaccessible except through the > console. > > In my case, even serial console wasn't operational (except for > magic sysrq). Not sure what agetty was playing at... so the only > way I could recover any information from the system was to connect > the HDMI and plug in a USB keyboard. > > Any thoughts on how FEC MDIO accesses could be made more robust? > I think a better question is why is the FEC MDIO controller configured to emit interrupts anyway (especially since the API built on top does not benefit in any way from this)? Hubert (copied) sent an interesting email very recently where he pointed out that this is one of the main sources of jitter when reading the PTP clock on a Marvell switch attached over MDIO. > Maybe phylib should retry a number of times - but with read-sensitive > registers, if the read has already completed successfully, and its > just a problem with the FEC MDIO hardware, that could cause issues. > > Thanks. > > -- > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ > FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up > According to speedtest.net: 11.9Mbps down 500kbps up Regards, -Vladimir
[PATCH net-next 1/2] net: dsa: mv88e6xxx: fix RGMII-ID port setup
The mv88e6xxx_port_setup_mac looks if one of the {link, speed, duplex} parameters is being changed from the current setting, and if not, does not do anything. This test is wrong in some situations: this method also has the mode argument, which can also be changed. For example on Turris Omnia, the mode is PHY_INTERFACE_MODE_RGMII_ID, which has to be set byt the ->port_set_rgmii_delay method. The test does not look if mode is being changed (in fact there is currently no method to determine port mode as phy_interface_t type). The simplest solution seems to be to drop this test altogether and simply do the setup when requested. Signed-off-by: Marek Behún Cc: Heiner Kallweit Cc: Sebastian Reichel Cc: Vivien Didelot Cc: Andrew Lunn Cc: Florian Fainelli Cc: David S. Miller --- drivers/net/dsa/mv88e6xxx/chip.c | 9 - 1 file changed, 9 deletions(-) diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index 2e8b1ab2c6f7..aae63f6515b3 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -420,15 +420,6 @@ int mv88e6xxx_port_setup_mac(struct mv88e6xxx_chip *chip, int port, int link, if (err) return err; - /* Has anything actually changed? We don't expect the -* interface mode to change without one of the other -* parameters also changing -*/ - if (state.link == link && - state.speed == speed && - state.duplex == duplex) - return 0; - /* Port's MAC control must not be changed unless the link is down */ err = chip->info->ops->port_set_link(chip, port, 0); if (err) -- 2.21.0
[PATCH net-next 2/2] net: fixed_phy: set is_gigabit_capable member when needed
The fixed_phy driver does not set the phydev->is_gigabit_capable member when the fixed_phy is gigabit capable. This in turn causes phy_device.c:genphy_read_status function to return unknown phy parameters (SPEED_UNKNOWN, DUPLEX_UNKNOWN, ...), if the fixed_phy is created with SPEED_1000. Signed-off-by: Marek Behún Cc: Heiner Kallweit Cc: Sebastian Reichel Cc: Vivien Didelot Cc: Andrew Lunn Cc: Florian Fainelli Cc: David S. Miller --- drivers/net/phy/fixed_phy.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/phy/fixed_phy.c b/drivers/net/phy/fixed_phy.c index 3ffe46df249e..424b02ad7b7b 100644 --- a/drivers/net/phy/fixed_phy.c +++ b/drivers/net/phy/fixed_phy.c @@ -286,6 +286,7 @@ static struct phy_device *__fixed_phy_register(unsigned int irq, phy->supported); linkmode_set_bit(ETHTOOL_LINK_MODE_1000baseT_Full_BIT, phy->supported); + phy->is_gigabit_capable = 1; /* fall through */ case SPEED_100: linkmode_set_bit(ETHTOOL_LINK_MODE_100baseT_Half_BIT, -- 2.21.0
Re: [PATCH net-next v2 1/1] net: dsa: fix fixed-link port registration
I have sent two new patches, each fixing one of the bugs that negated each other. Marek
Re: [PATCH net-next v2 1/1] net: dsa: fix fixed-link port registration
Hi Marek, On Sun, 11 Aug 2019 at 17:06, Marek Behun wrote: > > OK guys, something is terribly wrong here. > > I bisected to the commit mentioned (88d6272acaaa), looked around at the > genphy functions, tried adding the link=0 workaround and it did work, > so I though this was the issue. > > What I realized now is that before the commit 88d6272acaaa things > worked because of two bugs, which negated each other. This commit caused > one of this bugs not to fire, and thus the second bug was not negated. > > What actually happened before the commit that broke it is this: > - after the fixed_phy is created, the parameters are corrent > - genphy_read_status breaks the parameters: > - first it sets the parameters to unknown (SPEED_UNKNOWN, >DUPLEX_UNKNOWN) > - then read the registers, which are simulated for fixed_phy > - then it uses phy-core.c:phy_resolve_aneg_linkmode function, which >looks for correct settings by bit-anding the ->advertising and >->lp_advertigins bit arrays. But in fixed_phy, ->lp_advertising >is set to zero, so the parameters are left at SPEED_UNKNOWN, ... >(this is the first bug) > - then adjust_link is called, which then goes to > mv88e6xxx_port_setup_mac, where there is a test if it should change > something: >if (state.link == link && state.speed == speed && >state.duplex == duplex) >return 0; > - since current speed on the switch port (state.speed) is SPEED_1000, > and new speed is SPEED_UNKNOWN, this test fails, and so the rest of > this function is called, which makes the port work > (the if test is the second bug) > > After the commit that broke things: > - after the fixed_phy is created, the parameters are corrent > - genphy_read_status doesn't change them > - mv88e6xxx_port_setup_mac does nothing, since the if condition above > is true > > So, there are two things that are broken: > - the test in mv88e6xxx_port_setup_mac whether there is to be a change >should be more sophisticated > - fixed_phy should also simulate the lp_advertising register > > What do you think of this? > I don't know. But I think Heiner was asking you what kernel you're on because of what you said here: > Hopefully DSA fixed-link port functionality will be converted to phylink > API soon. The DSA fixed-link port functionality *has* been converted to phylink. See: - https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=0e27921816ad9 - https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=7fb5a711545d7d25fe9726a9ad277474dd83bd06 > Marek > > On Sun, 11 Aug 2019 13:35:20 +0200 > Heiner Kallweit wrote: > > > On 11.08.2019 05:39, Andrew Lunn wrote: > > > On Sun, Aug 11, 2019 at 05:18:57AM +0200, Marek Behún wrote: > > >> Commit 88d6272acaaa ("net: phy: avoid unneeded MDIO reads in > > >> genphy_read_status") broke fixed link DSA port registration in > > >> dsa_port_fixed_link_register_of: the genphy_read_status does not do what > > >> it is supposed to and the following adjust_link is given wrong > > >> parameters. > > > > > > Hi Marek > > > > > > Which parameters are incorrect? > > > > > > In fixed_phy.c, __fixed_phy_register() there is: > > > > > > /* propagate the fixed link values to struct phy_device */ > > > phy->link = status->link; > > > if (status->link) { > > > phy->speed = status->speed; > > > phy->duplex = status->duplex; > > > phy->pause = status->pause; > > > phy->asym_pause = status->asym_pause; > > > } > > > > > > Are we not initialising something? Or is the initialisation done here > > > getting reset sometime afterwards? > > > > > In addition to Andrew's question: > > We talk about this DT config: armada-385-turris-omnia.dts ? > > Which kernel version are you using? > > > > > Thanks > > > Andrew > > > > > Heiner > Regards, -Vladimir
Re: [PATCH net-next v2 1/1] net: dsa: fix fixed-link port registration
On Sun, Aug 11, 2019 at 04:04:04PM +0200, Marek Behun wrote: > OK guys, something is terribly wrong here. > > I bisected to the commit mentioned (88d6272acaaa), looked around at the > genphy functions, tried adding the link=0 workaround and it did work, > so I though this was the issue. > > What I realized now is that before the commit 88d6272acaaa things > worked because of two bugs, which negated each other. This commit caused > one of this bugs not to fire, and thus the second bug was not negated. > > What actually happened before the commit that broke it is this: > - after the fixed_phy is created, the parameters are corrent > - genphy_read_status breaks the parameters: > - first it sets the parameters to unknown (SPEED_UNKNOWN, >DUPLEX_UNKNOWN) > - then read the registers, which are simulated for fixed_phy > - then it uses phy-core.c:phy_resolve_aneg_linkmode function, which >looks for correct settings by bit-anding the ->advertising and >->lp_advertigins bit arrays. But in fixed_phy, ->lp_advertising >is set to zero, so the parameters are left at SPEED_UNKNOWN, ... >(this is the first bug) > - then adjust_link is called, which then goes to > mv88e6xxx_port_setup_mac, where there is a test if it should change > something: >if (state.link == link && state.speed == speed && >state.duplex == duplex) >return 0; > - since current speed on the switch port (state.speed) is SPEED_1000, > and new speed is SPEED_UNKNOWN, this test fails, and so the rest of > this function is called, which makes the port work > (the if test is the second bug) > > After the commit that broke things: > - after the fixed_phy is created, the parameters are corrent > - genphy_read_status doesn't change them > - mv88e6xxx_port_setup_mac does nothing, since the if condition above > is true > > So, there are two things that are broken: > - the test in mv88e6xxx_port_setup_mac whether there is to be a change >should be more sophisticated > - fixed_phy should also simulate the lp_advertising register > > What do you think of this? Marek This is the sort of information i like. I was having trouble understanding what was really wrong and how it was fixed by your previous patch. So setting the emulated lp_advertise to advertise makes a lot of sense for fixed phy. And it is something worthy of stable. As for mv88e6xxx_port_setup_mac(), which parameter is actually important here? My assumption was, if one of the other parameters changes, it would not happen alone. The link would also go down, and later come up again, etc. But it seems that assumption is wrong. At a guess, it is the RGMII delays. That would explain CRC errors in frames received by the master interface. Andrew
Re: [PATCH net-next 1/2] net: dsa: mv88e6xxx: fix RGMII-ID port setup
On Sun, Aug 11, 2019 at 05:08:11PM +0200, Marek Behún wrote: > The mv88e6xxx_port_setup_mac looks if one of the {link, speed, duplex} > parameters is being changed from the current setting, and if not, does > not do anything. This test is wrong in some situations: this method also > has the mode argument, which can also be changed. > > For example on Turris Omnia, the mode is PHY_INTERFACE_MODE_RGMII_ID, > which has to be set byt the ->port_set_rgmii_delay method. The test does > not look if mode is being changed (in fact there is currently no method > to determine port mode as phy_interface_t type). > > The simplest solution seems to be to drop this test altogether and > simply do the setup when requested. Hi Marek Unfortunately, that code is there for a reason. phylink can call the ->mac_config() method once per second. It is documented that mac_config() should only reconfigure what, if anything, has changed. And mv88e6xxx_port_setup_mac() needs to disable the port in order to change anything. So the change you propose here, under some conditions, will cause the port to be disabled/enables once per second. We need to fix this by expanding the test, not removing it. My current _guess_ would be, we need to add a ops->port_get_rgmii_delay() so we can see if that is what needs configuring. Andrew
Re: [PATCH net-next 2/2] net: fixed_phy: set is_gigabit_capable member when needed
On Sun, Aug 11, 2019 at 05:08:12PM +0200, Marek Behún wrote: > The fixed_phy driver does not set the phydev->is_gigabit_capable member > when the fixed_phy is gigabit capable. Neither does any other PHY driver. It should be possible to tell if a PHY supports 1G by looking at register values. If this does not work for fixed_link, it means we are missing something in the emulation. That is what we should be fixing. Also, this change has nothing to do the lp_advertise, what you previously said the problem was. At the moment, i don't get the feeling you have really dug all the way down and really understand the root causes. Andrew
Re: [BUG] fec mdio times out under system stress
> I think a better question is why is the FEC MDIO controller configured > to emit interrupts anyway (especially since the API built on top does > not benefit in any way from this)? Hubert (copied) sent an interesting > email very recently where he pointed out that this is one of the main > sources of jitter when reading the PTP clock on a Marvell switch > attached over MDIO. Hi Vladimir One reason is runtime power management. For a write operation, you could set it going and return immediately. Many drivers do, and then when the next read/write operation comes along, they poll in a loop waiting for the device to go idle, if it was still busy from the last operation. However, FEC supports runtime PM. When an MDIO read/write call is made, it calls runtime PM functions to indicate the device is active. Once it has completed the MDIO transaction, it calls runtime PM functions to indicate the device is inactive. These transitions can cause clocks to be enabled/disabled. If we don't wait around for the operation to complete, the clock could be disabled too early, and bad things would happen. You could replace the interrupt with a sleeping poll, but my guess would be, that has more jitter than using an interrupt. Andrew
Re: [PATCH net-next 2/2] net: fixed_phy: set is_gigabit_capable member when needed
On Sun, 11 Aug 2019 17:40:01 +0200 Andrew Lunn wrote: > On Sun, Aug 11, 2019 at 05:08:12PM +0200, Marek Behún wrote: > > The fixed_phy driver does not set the phydev->is_gigabit_capable member > > when the fixed_phy is gigabit capable. > > Neither does any other PHY driver. It should be possible to tell if a > PHY supports 1G by looking at register values. If this does not work > for fixed_link, it means we are missing something in the emulation. > That is what we should be fixing. > > Also, this change has nothing to do the lp_advertise, what you > previously said the problem was. At the moment, i don't get the > feeling you have really dug all the way down and really understand the > root causes. > > Andrew Andrew, is_gigabit_capable is otherwise set only in the phy_probe function. This function is not called at all for the DSA cpu port fixed_link phy. Why is that? But I guess it is not important anymore, if CPU and DSA were converted to phylink in net-next. I shall test it and let you know. In any case, sorry for the spam. Marek
Re: [PATCH net-next 1/2] net: dsa: mv88e6xxx: fix RGMII-ID port setup
On Sun, 11 Aug 2019 17:31:53 +0200 Andrew Lunn wrote: > On Sun, Aug 11, 2019 at 05:08:11PM +0200, Marek Behún wrote: > > The mv88e6xxx_port_setup_mac looks if one of the {link, speed, duplex} > > parameters is being changed from the current setting, and if not, does > > not do anything. This test is wrong in some situations: this method also > > has the mode argument, which can also be changed. > > > > For example on Turris Omnia, the mode is PHY_INTERFACE_MODE_RGMII_ID, > > which has to be set byt the ->port_set_rgmii_delay method. The test does > > not look if mode is being changed (in fact there is currently no method > > to determine port mode as phy_interface_t type). > > > > The simplest solution seems to be to drop this test altogether and > > simply do the setup when requested. > > Hi Marek > > Unfortunately, that code is there for a reason. phylink can call the > ->mac_config() method once per second. It is documented that > mac_config() should only reconfigure what, if anything, has changed. > And mv88e6xxx_port_setup_mac() needs to disable the port in order to > change anything. So the change you propose here, under some > conditions, will cause the port to be disabled/enables once per > second. > > We need to fix this by expanding the test, not removing it. My > current _guess_ would be, we need to add a ops->port_get_rgmii_delay() > so we can see if that is what needs configuring. > >Andrew Hi Andrew, what if we added a phy_mode member to struct mv88e6xxx_port, and either set it to PHY_INTERFACE_MODE_NA in mv88e6xxx_setup, or implemented methods for converting the switch register values to PHY_INTERFACE_MODE_* values. This way we could just add port.mode == mode check to port_setup_mac method. I am willing to implement this if you think this could work. Marek
Re: [PATCH net-next v2 1/1] net: dsa: fix fixed-link port registration
On Sun, 11 Aug 2019 18:16:11 +0300 Vladimir Oltean wrote: > The DSA fixed-link port functionality *has* been converted to phylink. > See: > - > https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=0e27921816ad9 > - > https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=7fb5a711545d7d25fe9726a9ad277474dd83bd06 > /o\ my bad. I did not realize that I was working on 5.2 :(. Sorry.
Re: [BUG] fec mdio times out under system stress
On Sun, Aug 11, 2019 at 02:37:07PM +0100, Russell King - ARM Linux admin wrote: > Hi Fabio, > > When I woke up this morning, I found that one of the Hummingboards > had gone offline (as in, lost network link) during the night. > Investigating, I find that the system had gone into OOM, and at > that time, triggered an unrelated: > > [4111697.698776] fec 2188000.ethernet eth0: MDIO read timeout > [4111697.712996] MII_DATA: 0x6006796d > [4111697.729415] MII_SPEED: 0x001a > [4111697.745232] IEVENT: 0x > [4111697.745242] IMASK: 0x0a8000aa > [4111698.002233] Atheros 8035 ethernet 2188000.ethernet-1:00: PHY state > change RUNNING -> HALTED > [4111698.009882] fec 2188000.ethernet eth0: Link is Down > > This is on a dual-core iMX6. > > It looks like the read actually completed (since MII_DATA contains > the register data) but we somehow lost the interrupt (or maybe > received the interrupt after wait_for_completion_timeout() timed > out.) Hi Russell The timeout is quite short, #define FEC_MII_TIMEOUT 3 /* us */ Looking at the Vybrid datasheet, there does not appear to be any way to determine if the hardware is busy other than waiting for the interrupt. There is no 'busy' bit which gets cleared on completion. So about the only option is to make the timeout bigger. Andrew
Re: [BUG] fec mdio times out under system stress
> Maybe phylib should retry a number of times - but with read-sensitive > registers, if the read has already completed successfully, and its > just a problem with the FEC MDIO hardware, that could cause issues. Hi Russell At the bus level, MDIO cannot fail. The bits get clocked out, and the bits get clocked in. There is no way for the PHY to stretch the clock as I2C slaves can. There is nothing like the USB NACK, try again later. If something fails, it fails at a higher level, which means it is a driver issue. In this case, the interrupt got delayed, after the timer interrupt. The FEC is also quite unusual in using an interrupt. Most MDIO drivers poll. And if time gets 'stretched' because the system is too busy, generally, the right thing happens anyway. So i don't think it is phylib job to work around this issue. Andrew
Re: [PATCH net-next 1/2] net: dsa: mv88e6xxx: fix RGMII-ID port setup
> what if we added a phy_mode member to struct mv88e6xxx_port, and either > set it to PHY_INTERFACE_MODE_NA in mv88e6xxx_setup, or implemented > methods for converting the switch register values to > PHY_INTERFACE_MODE_* values. We should read the switch registers. I think you can set the defaults using strapping pins. And in general, the driver always reads state from the hardware rather than caching it. Andrew
Re: [PATCH net-next 2/2] net: fixed_phy: set is_gigabit_capable member when needed
On 11.08.2019 18:08, Marek Behun wrote: > On Sun, 11 Aug 2019 17:40:01 +0200 > Andrew Lunn wrote: > >> On Sun, Aug 11, 2019 at 05:08:12PM +0200, Marek Behún wrote: >>> The fixed_phy driver does not set the phydev->is_gigabit_capable member >>> when the fixed_phy is gigabit capable. >> >> Neither does any other PHY driver. It should be possible to tell if a >> PHY supports 1G by looking at register values. If this does not work >> for fixed_link, it means we are missing something in the emulation. >> That is what we should be fixing. >> >> Also, this change has nothing to do the lp_advertise, what you >> previously said the problem was. At the moment, i don't get the >> feeling you have really dug all the way down and really understand the >> root causes. >> >> Andrew > > Andrew, > is_gigabit_capable is otherwise set only in the phy_probe function. > This function is not called at all for the DSA cpu port fixed_link phy. > Why is that? But I guess it is not important anymore, if CPU and DSA > were converted to phylink in net-next. I shall test it and let you know. > In any case, sorry for the spam. > Marek > A fixed phy should be bound to the genphy driver, but that happens later in phy_attach_direct(). Maybe the fixed phy device of a CPU port gets never attached to a net device? This would explain why phy_probe() doesn't get called. Following idea: We could swphy let return PHY ID 0x (instead of current value 0x). Then the fixed phy device would be bound to the genphy driver immediately at device registration time. As a consequence phy_probe() would call genphy_read_abilities() that sets supported modes properly. This should allow to remove the manual adding of supported modes in dsa_port_fixed_link_register_of(). One more thing regarding genphy_read_abilities() and fixed phys in general: swphy sets bit BMSR_ESTATEN, then genphy_read_abilities() reads MII_ESTATUS to check if and which 1Gbps modes are supported. MII_ESTATUS however isn't defined in swphy. We're just fortunate that therefore the default value 0x is returned and both 1Gbps modes seem to be supported. Last but not least I think the calls to genphy_config_init() and genphy_read_status() in dsa_port_fixed_link_register_of() are both not needed and could be removed. Heiner
Re: [PATCH net-next v2 00/10] drop_monitor: Capture dropped packets and metadata
From: Ido Schimmel Date: Sun, 11 Aug 2019 10:35:45 +0300 > From: Ido Schimmel > > So far drop monitor supported only one mode of operation in which a > summary of recent packet drops is periodically sent to user space as a > netlink event. The event only includes the drop location (program > counter) and number of drops in the last interval. > > While this mode of operation allows one to understand if the system is > dropping packets, it is not sufficient if a more detailed analysis is > required. Both the packet itself and related metadata are missing. > > This patchset extends drop monitor with another mode of operation where > the packet - potentially truncated - and metadata (e.g., drop location, > timestamp, netdev) are sent to user space as a netlink event. Thanks to > the extensible nature of netlink, more metadata can be added in the > future. ... Series applied.
pull-request: bpf 2019-08-11
Hi David, The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) x64 JIT code generation fix for backward-jumps to 1st insn, from Alexei. 2) Fix buggy multi-closing of BTF file descriptor in libbpf, from Andrii. 3) Fix libbpf_num_possible_cpus() to make it thread safe, from Takshak. 4) Fix bpftool to dump an error if pinning fails, from Jakub. Please consider pulling these changes from: git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git Thanks a lot! The following changes since commit 0a062ba725cdad3b167782179ee914a8402a0184: Merge tag 'mlx5-fixes-2019-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux (2019-07-26 14:26:41 -0700) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git for you to fetch changes up to 4f7aafd78aeaf18a4f6dea9415df60e745c9dfa7: Merge branch 'bpf-bpftool-pinning-error-msg' (2019-08-09 17:38:53 +0200) Alexei Starovoitov (2): bpf: fix x64 JIT code generation for jmp to 1st insn selftests/bpf: tests for jmp to 1st insn Andrii Nakryiko (2): libbpf: fix erroneous multi-closing of BTF FD libbpf: set BTF FD for prog only when there is supported .BTF.ext data Daniel Borkmann (1): Merge branch 'bpf-bpftool-pinning-error-msg' Jakub Kicinski (2): tools: bpftool: fix error message (prog -> object) tools: bpftool: add error message on pin failure Takshak Chahande (1): libbpf : make libbpf_num_possible_cpus function thread safe arch/x86/net/bpf_jit_comp.c | 9 tools/bpf/bpftool/common.c| 8 +-- tools/lib/bpf/libbpf.c| 33 +++ tools/testing/selftests/bpf/verifier/loops1.c | 28 +++ 4 files changed, 57 insertions(+), 21 deletions(-)
Re: pull-request: bpf 2019-08-11
From: Daniel Borkmann Date: Sun, 11 Aug 2019 21:58:34 +0200 > The following pull-request contains BPF updates for your *net* tree. > > The main changes are: > > 1) x64 JIT code generation fix for backward-jumps to 1st insn, from Alexei. > > 2) Fix buggy multi-closing of BTF file descriptor in libbpf, from Andrii. > > 3) Fix libbpf_num_possible_cpus() to make it thread safe, from Takshak. > > 4) Fix bpftool to dump an error if pinning fails, from Jakub. > > Please consider pulling these changes from: > > git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git Pulled, thanks Daniel.
Re: [patch net-next rfc 3/7] net: rtnetlink: add commands to add and delete alternative ifnames
On Sat, Aug 10, 2019 at 12:39:31PM -0700, Roopa Prabhu wrote: > On Sat, Aug 10, 2019 at 8:50 AM Michal Kubecek wrote: > > > > On Sat, Aug 10, 2019 at 06:46:57AM -0700, Roopa Prabhu wrote: > > > On Fri, Aug 9, 2019 at 8:46 AM Michal Kubecek wrote: > > > > > > > > On Fri, Aug 09, 2019 at 08:40:25AM -0700, Roopa Prabhu wrote: > > > > > to that point, I am also not sure why we have a new API For multiple > > > > > names. I mean why support more than two names (existing old name and > > > > > a new name to remove the length limitation) ? > > > > > > > > One use case is to allow "predictable names" from udev/systemd to work > > > > the way do for e.g. block devices, see > > > > > > > > http://lkml.kernel.org/r/20190628162716.gf29...@unicorn.suse.cz > > > > > > > > > > thanks for the link. don't know the details about alternate block > > > device names. Does user-space generate multiple and assign them to a > > > kernel object as proposed in this series ?. is there a limit to number > > > of names ?. my understanding of 'predictable names' was still a single > > > name but predictable structure to the name. > > > > It is a single name but IMHO mostly because we can only have one name. > > For block devices, udev uses symlinks to create multiple aliases based > > on different naming schemes, e.g. > > > > mike@lion:~> find -L /dev/disk/ -samefile /dev/sda2 -exec ls -l {} + > > lrwxrwxrwx 1 root root 10 srp 5 21:47 > > /dev/disk/by-id/ata-WDC_WD30EFRX-68AX9N0_WD-WMC1T3114933-part2 -> ../../sda2 > > lrwxrwxrwx 1 root root 10 srp 5 21:47 > > /dev/disk/by-id/scsi-SATA_WDC_WD30EFRX-68A_WD-WMC1T3114933-part2 -> > > ../../sda2 > > lrwxrwxrwx 1 root root 10 srp 5 21:47 > > /dev/disk/by-id/scsi-SATA_WDC_WD30EFRX-68_WD-WMC1T3114933-part2 -> > > ../../sda2 > > lrwxrwxrwx 1 root root 10 srp 5 21:47 > > /dev/disk/by-id/scsi-0ATA_WDC_WD30EFRX-68A_WD-WMC1T3114933-part2 -> > > ../../sda2 > > lrwxrwxrwx 1 root root 10 srp 5 21:47 > > /dev/disk/by-id/scsi-1ATA_WDC_WD30EFRX-68AX9N0_WD-WMC1T3114933-part2 -> > > ../../sda2 > > lrwxrwxrwx 1 root root 10 srp 5 21:47 > > /dev/disk/by-id/scsi-350014ee6589cfea0-part2 -> ../../sda2 > > lrwxrwxrwx 1 root root 10 srp 5 21:47 > > /dev/disk/by-id/wwn-0x50014ee6589cfea0-part2 -> ../../sda2 > > lrwxrwxrwx 1 root root 10 srp 5 21:47 /dev/disk/by-partlabel/root2 -> > > ../../sda2 > > lrwxrwxrwx 1 root root 10 srp 5 21:47 > > /dev/disk/by-partuuid/71affb47-a93b-40fd-8986-d2e227e1b39d -> ../../sda2 > > lrwxrwxrwx 1 root root 10 srp 5 21:47 > > /dev/disk/by-path/pci-:00:11.0-ata-1-part2 -> ../../sda2 > > lrwxrwxrwx 1 root root 10 srp 5 21:47 > > /dev/disk/by-path/pci-:00:11.0-scsi-0:0:0:0-part2 -> ../../sda2 > > > > Few years ago, udev even dropped support for renaming block and > > character devices (NAME="...") so that it now keeps kernel name and only > > creates symlinks to it. Recent versions only allow NAME="..." for > > network devices. > > ok thanks for the details. This looks like names that are structured > on hardware info which could fall into devlinks scope and they point > to a single name. > We should think about keeping them under devlink (by-id, by-mac etc). > It already can recognize network interfaces by id. Not all of them are hardware based, there are also links based on filesystem label or UUID. But my point is rather that udev creates multiple links so that any of them can be used in any place where a block device is to be identified. As network devices can have only one name, udev drops kernel provided name completely and replaces it with name following one naming scheme. Thus we have to know which naming scheme is going to be used and make sure it does not change. With multiple alternative names, we could also have all udev provided names at once (and also the original one from kernel). Michal
Re: [PATCH bpf-next v2 4/4] selftests/bpf: add sockopt clone/inheritance test
On 8/9/19 9:10 AM, Stanislav Fomichev wrote: > Add a test that calls setsockopt on the listener socket which triggers > BPF program. This BPF program writes to the sk storage and sets > clone flag. Make sure that sk storage is cloned for a newly > accepted connection. > > We have two cloned maps in the tests to make sure we hit both cases > in bpf_sk_storage_clone: first element (sk_storage_alloc) and > non-first element(s) (selem_link_map). > > Cc: Martin KaFai Lau > Cc: Yonghong Song > Signed-off-by: Stanislav Fomichev Acked-by: Yonghong Song
Re: [PATCH bpf-next v2 1/4] bpf: export bpf_map_inc_not_zero
On 8/9/19 9:10 AM, Stanislav Fomichev wrote: > Rename existing bpf_map_inc_not_zero to __bpf_map_inc_not_zero to > indicate that it's caller's responsibility to do proper locking. > Create and export bpf_map_inc_not_zero wrapper that properly > locks map_idr_lock. Will be used in the next commit to > hold a map while cloning a socket. > > Cc: Martin KaFai Lau > Cc: Yonghong Song > Signed-off-by: Stanislav Fomichev Acked-by: Yonghong Song
Re: [PATCH bpf-next v2 2/4] bpf: support cloning sk storage on accept()
On 8/9/19 9:10 AM, Stanislav Fomichev wrote: > Add new helper bpf_sk_storage_clone which optionally clones sk storage > and call it from sk_clone_lock. > > Cc: Martin KaFai Lau > Cc: Yonghong Song > Signed-off-by: Stanislav Fomichev Acked-by: Yonghong Song
Re: [v4,1/4] tools: bpftool: add net attach command to attach XDP on interface
On Fri, Aug 9, 2019 at 6:35 AM Daniel T. Lee wrote: > > By this commit, using `bpftool net attach`, user can attach XDP prog on > interface. New type of enum 'net_attach_type' has been made, as stated at > cover-letter, the meaning of 'attach' is, prog will be attached on interface. > > With 'overwrite' option at argument, attached XDP program could be replaced. > Added new helper 'net_parse_dev' to parse the network device at argument. > > BPF prog will be attached through libbpf 'bpf_set_link_xdp_fd'. > > Signed-off-by: Daniel T. Lee > --- > tools/bpf/bpftool/net.c | 136 +--- > 1 file changed, 129 insertions(+), 7 deletions(-) > > diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c > index 67e99c56bc88..74cc346c36cd 100644 > --- a/tools/bpf/bpftool/net.c > +++ b/tools/bpf/bpftool/net.c > @@ -55,6 +55,35 @@ struct bpf_attach_info { > __u32 flow_dissector_id; > }; > > +enum net_attach_type { > + NET_ATTACH_TYPE_XDP, > + NET_ATTACH_TYPE_XDP_GENERIC, > + NET_ATTACH_TYPE_XDP_DRIVER, > + NET_ATTACH_TYPE_XDP_OFFLOAD, > +}; > + > +static const char * const attach_type_strings[] = { > + [NET_ATTACH_TYPE_XDP] = "xdp", > + [NET_ATTACH_TYPE_XDP_GENERIC] = "xdpgeneric", > + [NET_ATTACH_TYPE_XDP_DRIVER]= "xdpdrv", > + [NET_ATTACH_TYPE_XDP_OFFLOAD] = "xdpoffload", > +}; > + > +const size_t net_attach_type_size = ARRAY_SIZE(attach_type_strings); > + > +static enum net_attach_type parse_attach_type(const char *str) > +{ > + enum net_attach_type type; > + > + for (type = 0; type < net_attach_type_size; type++) { > + if (attach_type_strings[type] && > + is_prefix(str, attach_type_strings[type])) > + return type; > + } > + > + return net_attach_type_size; > +} > + > static int dump_link_nlmsg(void *cookie, void *msg, struct nlattr **tb) > { > struct bpf_netdev_t *netinfo = cookie; > @@ -223,6 +252,97 @@ static int query_flow_dissector(struct bpf_attach_info > *attach_info) > return 0; > } > > +static int net_parse_dev(int *argc, char ***argv) > +{ > + int ifindex; > + > + if (is_prefix(**argv, "dev")) { > + NEXT_ARGP(); > + > + ifindex = if_nametoindex(**argv); > + if (!ifindex) > + p_err("invalid devname %s", **argv); > + > + NEXT_ARGP(); > + } else { > + p_err("expected 'dev', got: '%s'?", **argv); > + return -1; > + } > + > + return ifindex; > +} > + > +static int do_attach_detach_xdp(int progfd, enum net_attach_type attach_type, > + int ifindex, bool overwrite) > +{ > + __u32 flags = 0; > + > + if (!overwrite) > + flags = XDP_FLAGS_UPDATE_IF_NOEXIST; > + if (attach_type == NET_ATTACH_TYPE_XDP_GENERIC) > + flags |= XDP_FLAGS_SKB_MODE; > + if (attach_type == NET_ATTACH_TYPE_XDP_DRIVER) > + flags |= XDP_FLAGS_DRV_MODE; > + if (attach_type == NET_ATTACH_TYPE_XDP_OFFLOAD) > + flags |= XDP_FLAGS_HW_MODE; > + > + return bpf_set_link_xdp_fd(ifindex, progfd, flags); > +} > + > +static int do_attach(int argc, char **argv) > +{ > + enum net_attach_type attach_type; > + int progfd, ifindex, err = 0; > + bool overwrite = false; > + > + /* parse attach args */ > + if (!REQ_ARGS(5)) > + return -EINVAL; > + > + attach_type = parse_attach_type(*argv); > + if (attach_type == net_attach_type_size) { > + p_err("invalid net attach/detach type: %s", *argv); > + return -EINVAL; > + } > + NEXT_ARG(); > + > + progfd = prog_parse_fd(&argc, &argv); > + if (progfd < 0) > + return -EINVAL; > + > + ifindex = net_parse_dev(&argc, &argv); > + if (ifindex < 1) { > + close(progfd); > + return -EINVAL; > + } > + > + if (argc) { > + if (is_prefix(*argv, "overwrite")) { > + overwrite = true; > + } else { > + p_err("expected 'overwrite', got: '%s'?", *argv); > + close(progfd); > + return -EINVAL; > + } > + } > + > + /* attach xdp prog */ > + if (is_prefix("xdp", attach_type_strings[attach_type])) > + err = do_attach_detach_xdp(progfd, attach_type, ifindex, > + overwrite); > + > + if (err < 0) { > + p_err("interface %s attach failed: %s", > + attach_type_strings[attach_type], strerror(errno)); > + return err; > + } I tried the below example, -bash-4.4$ sudo ./bpftool net attach x pinned /sys/fs/bpf/xdp_example dev v1 -bash-4.4$ sudo ./bpftool net attach x
Re: [v4,2/4] tools: bpftool: add net detach command to detach XDP on interface
On Fri, Aug 9, 2019 at 6:35 AM Daniel T. Lee wrote: > > By this commit, using `bpftool net detach`, the attached XDP prog can > be detached. Detaching the BPF prog will be done through libbpf > 'bpf_set_link_xdp_fd' with the progfd set to -1. > > Signed-off-by: Daniel T. Lee > --- > tools/bpf/bpftool/net.c | 42 - > 1 file changed, 41 insertions(+), 1 deletion(-) > > diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c > index 74cc346c36cd..ef1e576c6dba 100644 > --- a/tools/bpf/bpftool/net.c > +++ b/tools/bpf/bpftool/net.c > @@ -343,6 +343,43 @@ static int do_attach(int argc, char **argv) > return 0; > } > > +static int do_detach(int argc, char **argv) > +{ > + enum net_attach_type attach_type; > + int progfd, ifindex, err = 0; > + > + /* parse detach args */ > + if (!REQ_ARGS(3)) > + return -EINVAL; > + > + attach_type = parse_attach_type(*argv); > + if (attach_type == net_attach_type_size) { > + p_err("invalid net attach/detach type: %s", *argv); > + return -EINVAL; > + } > + NEXT_ARG(); > + > + ifindex = net_parse_dev(&argc, &argv); > + if (ifindex < 1) > + return -EINVAL; > + > + /* detach xdp prog */ > + progfd = -1; > + if (is_prefix("xdp", attach_type_strings[attach_type])) > + err = do_attach_detach_xdp(progfd, attach_type, ifindex, > NULL); > + > + if (err < 0) { > + p_err("interface %s detach failed: %s", > + attach_type_strings[attach_type], strerror(errno)); > + return err; > + } Similar to previous patch, here we should use "strerror(-err)". With this fixed, you can add my ack: Acked-by: Yonghong Song > + > + if (json_output) > + jsonw_null(json_wtr); > + > + return 0; > +} > + [...]
Re: [patch net-next rfc 3/7] net: rtnetlink: add commands to add and delete alternative ifnames
On 8/10/19 12:30 AM, Jiri Pirko wrote: > Could you please write me an example message of add/remove? altnames are for existing netdevs, yes? existing netdevs have an id and a name - 2 existing references for identifying the existing netdev for which an altname will be added. Even using the altname as the main 'handle' for a setlink change, I see no reason why the GETLINK api can not take an the IFLA_ALT_IFNAME and return the full details of the device if the altname is unique. So, what do the new RTM commands give you that you can not do with RTM_*LINK?
Re: [patch net-next rfc 3/7] net: rtnetlink: add commands to add and delete alternative ifnames
On 8/11/19 7:34 PM, David Ahern wrote: > On 8/10/19 12:30 AM, Jiri Pirko wrote: >> Could you please write me an example message of add/remove? > > altnames are for existing netdevs, yes? existing netdevs have an id and > a name - 2 existing references for identifying the existing netdev for > which an altname will be added. Even using the altname as the main > 'handle' for a setlink change, I see no reason why the GETLINK api can > not take an the IFLA_ALT_IFNAME and return the full details of the > device if the altname is unique. > > So, what do the new RTM commands give you that you can not do with > RTM_*LINK? > To put this another way, the ALT_NAME is an attribute of an object - a LINK. It is *not* a separate object which requires its own set of commands for manipulating.
Re: [PATCH net] ipv4/route: do not check saddr dev if iif is LOOPBACK_IFINDEX
From: David Ahern Date: Thu, 1 Aug 2019 22:16:00 -0600 > On 8/1/19 10:13 PM, Hangbin Liu wrote: >> On Thu, Aug 01, 2019 at 01:51:25PM -0600, David Ahern wrote: >>> On 8/1/19 2:29 AM, Hangbin Liu wrote: Jianlin reported a bug that for IPv4, ip route get from src_addr would fail if src_addr is not an address on local system. \# ip route get 1.1.1.1 from 2.2.2.2 RTNETLINK answers: Invalid argument >>> >>> so this is a forwarding lookup in which case iif should be set. Based on >> >> with out setting iif in userspace, the kernel set iif to lo by default. > > right, it presumes locally generated traffic. >> >>> the above 'route get' inet_rtm_getroute is doing a lookup as if it is >>> locally generated traffic. >> >> yeah... but what about the IPv6 part. That cause a different behavior in >> userspace. > > just one of many, many annoying differences between v4 and v6. We could > try to catalog it. I think we just have to accept this difference because this change would change behavior for all route lookups, not just those done by ip route get.
Re: [PATCH net] netdevsim: Restore per-network namespace accounting for fib entries
From: David Ahern Date: Tue, 6 Aug 2019 12:15:17 -0700 > From: David Ahern > > Prior to the commit in the fixes tag, the resource controller in netdevsim > tracked fib entries and rules per network namespace. Restore that behavior. > > Fixes: 5fc494225c1e ("netdevsim: create devlink instance per netdevsim > instance") > Signed-off-by: David Ahern Applied, thanks for bringing this to our attention and fixing it David. Jiri, I disagree you on every single possible level. If you didn't like how netdevsim worked in this area the opportunity to do something about it was way back when it went in. No matter how completely busted or disagreeable an interface is, once we have committed it to a release (and in particular people are knowingly using and depending upon it) you cannot break it. It doesn't matter how much you disagree with something, you cannot break it when it's out there and actively in use. Do you have any idea how much stuff I'd like to break because I think the design turned out to be completely wrong? But I can't.
Re: [PATCHv2 net 0/2] Add netdev_level_ratelimited to avoid netdev msg flush
From: Hangbin Liu Date: Fri, 9 Aug 2019 08:29:39 +0800 > ibmveth 3003 env3: h_multicast_ctrl rc=4 when adding an entry to the > filter table You need to root cause and fix the reason this message appears so much. Once I let you rate limit the message you will have zero incentive to fix the real problem and fix it.
RE: [PATCH net-next v3 2/2] qed: Add driver API for flashing the config attributes.
> -Original Message- > From: Ariel Elior > Sent: Monday, August 5, 2019 8:00 PM > To: Sudarsana Reddy Kalluru ; David Miller > > Cc: netdev@vger.kernel.org; Michal Kalderon > Subject: RE: [PATCH net-next v3 2/2] qed: Add driver API for flashing the > config attributes. > > > From: Sudarsana Reddy Kalluru > > Sent: Tuesday, July 30, 2019 6:36 AM > > To: David Miller > > > > > -Original Message- > > > From: David Miller > > > Sent: Monday, July 29, 2019 11:34 PM > > > To: Sudarsana Reddy Kalluru > > > Cc: netdev@vger.kernel.org; Michal Kalderon > ; > > > Ariel Elior > > > Subject: Re: [PATCH net-next v3 2/2] qed: Add driver API for > > > flashing the config attributes. > > > > > > From: Sudarsana Reddy Kalluru > > > Date: Sat, 27 Jul 2019 18:55:49 -0700 > > > > > > > @@ -2268,6 +2330,9 @@ static int qed_nvm_flash(struct qed_dev > > > > *cdev, > > > const char *name) > > > > rc = qed_nvm_flash_image_access(cdev, &data, > > > > &check_resp); > > > > break; > > > > + case QED_NVM_FLASH_CMD_NVM_CFG_ID: > > > > + rc = qed_nvm_flash_cfg_write(cdev, &data); > > > > + break; > > > > > > default: > > > > DP_ERR(cdev, "Unknown command %08x\n", > > > cmd_type); > > > > > > I don't see how any existing portable interface can cause this new > > > code to actually be used. > > > > > > You have to explain this to me. > > The API qed_nvm_flash() is used to flash the user provided data (e.g., > > Management FW) to the required partitions of the adapter. > >- Format of the input file would be - file signature info, followed > > by one or more data sets. > >- Each data set is represented with the header followed by its contents. > > Header captures info such as command name (e.g., FILE_START), data > > size etc., which specifies how to handle the data. > > The API qed_nvm_flash() validates the user provided input file, parses > > the data sets and handles each accordingly. Here one of the data sets > > (preferably the last one) could be nvm-attributes page (with cmd-id = > > QED_NVM_FLASH_CMD_NVM_CHANGE). > > This is basically an expansion of our existing ethtool -f implementation. > The management FW has exposed an additional method of configuring some > of the nvram options, and this makes use of that. The new code will come > into use when newer FW files which contain configuration directives > employing this API will be provided to ethtool -f. > > thanks, > Ariel Dave, The series appears as "changes requested" in patchwork. Please let us know if any modifications need to be incorporated on this series? Thanks, Sudarsana
Re: [patch net-next] netdevsim: register couple of devlink params
From: Jiri Pirko Date: Fri, 9 Aug 2019 13:05:12 +0200 > From: Jiri Pirko > > Register couple of devlink params, one generic, one driver-specific. > Make the values available over debugfs. > > Example: > $ echo "111" > /sys/bus/netdevsim/new_device > $ devlink dev param > netdevsim/netdevsim111: > name max_macs type generic > values: > cmode driverinit value 32 > name test1 type driver-specific > values: > cmode driverinit value true > $ cat /sys/kernel/debug/netdevsim/netdevsim111/max_macs > 32 > $ cat /sys/kernel/debug/netdevsim/netdevsim111/test1 > Y > $ devlink dev param set netdevsim/netdevsim111 name max_macs cmode driverinit > value 16 > $ devlink dev param set netdevsim/netdevsim111 name test1 cmode driverinit > value false > $ devlink dev reload netdevsim/netdevsim111 > $ cat /sys/kernel/debug/netdevsim/netdevsim111/max_macs > 16 > $ cat /sys/kernel/debug/netdevsim/netdevsim111/test1 > > Signed-off-by: Jiri Pirko Applied, thanks Jiri.
Re: [PATCH net-next v2 0/4] net: phy: realtek: add support for integrated 2.5Gbps PHY in RTL8125
From: Heiner Kallweit Date: Fri, 9 Aug 2019 20:41:58 +0200 > This series adds support for the integrated 2.5Gbps PHY in RTL8125. > First three patches add necessary functionality to phylib. > > Changes in v2: > - added patch 1 > - changed patch 4 to use a fake PHY ID that is injected by the > network driver. This allows to use a dedicated PHY driver. Series applied, thanks Heiner.
Re: [PATCH net-next] r8169: inline rtl8169_free_rx_databuff
From: Heiner Kallweit Date: Fri, 9 Aug 2019 22:59:07 +0200 > rtl8169_free_rx_databuff is used in only one place, so let's inline it. > We can improve the loop because rtl8169_init_ring zero's RX_databuff > before calling rtl8169_rx_fill, and rtl8169_rx_fill fills > Rx_databuff starting from index 0. > > Signed-off-by: Heiner Kallweit Applied, thanks Heiner.
Re: [PATCH net-next 0/7] net: dsa: mv88e6xxx: prepare Wait Bit operation
From: Vivien Didelot Date: Fri, 9 Aug 2019 18:47:52 -0400 > The Remote Management Interface has its own implementation of a Wait > Bit operation, which requires a bit number and a value to wait for. > > In order to prepare the introduction of this implementation, rework the > code waiting for bits and masks in mv88e6xxx to match this signature. > > This has the benefit to unify the implementation of wait routines while > removing obsolete wait and update functions and also reducing the code. Series applied, thanks.
Re: [PATCH] xen-netback: no need to check return value of debugfs_create functions
From: Greg Kroah-Hartman Date: Sat, 10 Aug 2019 12:31:08 +0200 > When calling debugfs functions, there is no need to ever check the > return value. The function can work or not, but the code logic should > never do something different based on this. > > Cc: Wei Liu > Cc: Paul Durrant > Cc: xen-de...@lists.xenproject.org > Cc: netdev@vger.kernel.org > Signed-off-by: Greg Kroah-Hartman Applied to net-next.
Re: [PATCH] caif: no need to check return value of debugfs_create functions
From: Greg Kroah-Hartman Date: Sat, 10 Aug 2019 12:42:43 +0200 > When calling debugfs functions, there is no need to ever check the > return value. The function can work or not, but the code logic should > never do something different based on this. > > Cc: Richard Fontana > Cc: Steve Winslow > Cc: netdev@vger.kernel.org > Signed-off-by: Greg Kroah-Hartman Applied.
Re: [PATCH net] mlxsw: spectrum_ptp: Keep unmatched entries in a linked list
From: Ido Schimmel Date: Sun, 11 Aug 2019 10:48:37 +0300 > From: Petr Machata > > To identify timestamps for matching with their packets, Spectrum-1 uses a > five-tuple of (port, direction, domain number, message type, sequence ID). > If there are several clients from the same domain behind a single port > sending Delay_Req's, the only thing differentiating these packets, as far > as Spectrum-1 is concerned, is the sequence ID. Should sequence IDs between > individual clients be similar, conflicts may arise. That is not a problem > to hardware, which will simply deliver timestamps on a first comes, first > served basis. > > However the driver uses a simple hash table to store the unmatched pieces. > When a new conflicting piece arrives, it pushes out the previously stored > one, which if it is a packet, is delivered without timestamp. Later on as > the corresponding timestamps arrive, the first one is mismatched to the > second packet, and the second one is never matched and eventually is GCd. > > To correct this issue, instead of using a simple rhashtable, use rhltable > to keep the unmatched entries. > > Previously, a found unmatched entry would always be removed from the hash > table. That is not the case anymore--an incompatible entry is left in the > hash table. Therefore removal from the hash table cannot be used to confirm > the validity of the looked-up pointer, instead the lookup would simply need > to be redone. Therefore move it inside the critical section. This > simplifies a lot of the code. > > Fixes: 8748642751ed ("mlxsw: spectrum: PTP: Support SIOCGHWTSTAMP, > SIOCSHWTSTAMP ioctls") > Reported-by: Alex Veber > Signed-off-by: Petr Machata > Signed-off-by: Ido Schimmel Applied.