Re: [ovs-dev] [PATCH] Genericize/simplify kernel sFlow implementation

Jesse Gross Mon, 12 Sep 2011 16:38:12 -0700

On Tue, Aug 30, 2011 at 6:59 PM, Pravin Shelar <pshe...@nicira.com> wrote:
> diff --git a/datapath/actions.c b/datapath/actions.c
> index 8aec438..3178bdd 100644
> --- a/datapath/actions.c
> +++ b/datapath/actions.c
> +static int sample(struct datapath *dp, struct sk_buff *skb,
> +                 const struct nlattr *a)
> +{
> +       u32 probability;
> +       const struct nlattr *sample_args, *nested_act;
> +       int rem, err;
> +
> +       rem = nla_len(a);
> +
> +       /* First arg: probability */


It's not really good to be mandating specific positioning of the
attributes since it makes future compatibility more difficult.  This
is perhaps an exception because it is on the fast path (another good
reason to not do Netlink parsing here).  On the other hand, maybe it's
better to do it right and then we can optimize it in the future.  In
any case, this function currently has a strange duality between caring
about performance or not.  For example, nla_for_each_nested() has
extra checks that aren't necessary because we already did validation.

> +       sample_args = nla_data(a);
> +       memcpy(&probability, nla_data(sample_args), sizeof (probability));

I noticed that both in the kernel and userspace there is a lot of
direct access to the probability argument either like this or treating
it as unspec.  Where possible possible we should use the existing
helper functions like nla_get_u32().

> +       /* Second arg: actions to execute */
> +       sample_args = nla_next(sample_args, &rem);
> +       /* Second arg data contains actions to execute */
> +       nla_for_each_nested(nested_act, sample_args, rem) {
> +               switch (nla_type(nested_act)) {
> +                       case OVS_ACTION_ATTR_USERSPACE:
> +                               err = output_userspace(dp, skb,
> +                                                nla_get_u64(nested_act));
> +                               return err;
> +                       break;
> +                       default:
> +                               return -EINVAL;
> +               }
> +       }

I think that it's reasonable to restrict the supported actions
initially but it's not clear to me that it's actually a win either
here or any of the other places that this is now scattered around the
code base.  Since handling the new action list is simply a recursive
call, I think it's actually pretty straightforward to support all
actions without any special casing.

> +       return -EINVAL;

I think having no actions should be a legal, if odd, argument.

> @@ -258,6 +287,9 @@ static int do_execute_actions(struct datapath *dp, struct 
> sk_buff *skb,
>                }
>
>                switch (nla_type(a)) {
> +               case ODP_ACTION_ATTR_SAMPLING:
> +                       err = sample(dp, skb, a);

The action is called SAMPLING but most places refer to sample, which
seems like a better name and is more consistent with the other actions
which are verbs.

Also, I think it makes sense to put new actions at the bottom rather
than the top.

> diff --git a/datapath/datapath.c b/datapath/datapath.c
> index 0b6e2e5..c6ab8a8 100644
> --- a/datapath/datapath.c
> +++ b/datapath/datapath.c
> +static int validate_sample(const struct nlattr *a)
> +{
> +       const struct nlattr *sample_args, *nested_act;
> +       int rem;
> +
> +       rem = nla_len(a);
> +
> +       /* First arg: probability */
> +       sample_args = nla_data(a);
> +       if (nla_type(sample_args) != OVS_SAMPLE_ATTR_PROBABILITY)
> +               return -EINVAL;
> +
> +       if (nla_len(sample_args) != sizeof (unsigned int))
> +               return -EINVAL;

The actual type of sample_args is u32, not unsigned int so this
comparison doesn't logically make sense even if they happen to be the
same.

> +       /* Second arg: actions to execute */
> +       if (!nla_ok(sample_args, rem))
> +               return -EINVAL;

Doesn't this check actually relate to the first attribute, not the second?

> +       sample_args = nla_next(sample_args, &rem);
> +       if (nla_type(sample_args) != OVS_SAMPLE_ATTR_ACTIONS)
> +               return -EINVAL;
> +
> +       /* Second arg data contains actions to execute */
> +       nla_for_each_nested(nested_act, sample_args, rem) {
> +               switch (nla_type(nested_act)) {
> +                       case OVS_ACTION_ATTR_USERSPACE:
> +                               if (nla_len(nested_act) != 8)
> +                                       return -EINVAL;
> +                               return 0;
> +                       break;
> +                       default:
> +                               /* at this moment userspace is only
> +                                * supported action */
> +                               return -EINVAL;
> +               }
> +       }

It would be a lot nicer if this function could be replaced with a call
to nla_validate_nested() and a recursive invocation of
validate_actions().

> diff --git a/include/openvswitch/datapath-protocol.h 
> b/include/openvswitch/datapath-protocol.h
> index 535aab3..2b81dff 100644
> --- a/include/openvswitch/datapath-protocol.h
> +++ b/include/openvswitch/datapath-protocol.h
> +/**
> + * enum ovs_sample_attr - attributes for ODP_ACTION_ATTR_SAMPLING
> + * @OVS_SAMPLE_ATTR_PROBABILITY: Prabability for sampling event.
> + * @OVS_SAMPLE_ATTR_ACTIONS: Set of actions to execute in sampling event.
> + * actions are passed as nested attributes.

This sentence should be capitalized.

> + */
> +enum ovs_sample_attr {
> +       OVS_SAMPLE_ATTR_UNSPEC,
> +       OVS_SAMPLE_ATTR_PROBABILITY,
> +       OVS_SAMPLE_ATTR_ACTIONS,
> +       __OVS_SAMPLE_ATTR_MAX,
> +};
> +
> +

There should be a #define for OVS_SAMPLE_ATTR_MAX.  Also you have an
extra blank line.

> diff --git a/lib/odp-util.c b/lib/odp-util.c
> index 2830fe8..f377f7a 100644
> --- a/lib/odp-util.c
> +++ b/lib/odp-util.c
> +static void format_odp_sample_action(struct ds *ds, const struct nlattr *a)
> +{
> +    struct nlattr *prob, *userdata, *nla_acts;
> +    uint64_t *arg;
> +    uint32_t *probability;
> +
> +    ds_put_cstr(ds, "sample");
> +    prob = (struct nlattr *) nl_attr_find_nested(a, 
> OVS_SAMPLE_ATTR_PROBABILITY);
> +    if (prob == NULL) {
> +        ds_put_cstr(ds, "(No probability attribute)");
> +        return;
> +    }
> +
> +    probability = (uint32_t *) nl_attr_get_unspec(prob, 
> sizeof(*probability));
> +    if (probability == NULL) {
> +        ds_put_cstr(ds, "(No probability assigned)");
> +        return;
> +    }
> +
> +    nla_acts = (struct nlattr *) nl_attr_find_nested(a, 
> OVS_SAMPLE_ATTR_ACTIONS);
> +    if (nla_acts == NULL) {
> +        ds_put_cstr(ds, "(No actions for sampling)");
> +        return;
> +    }
> +
> +    userdata = (struct nlattr *) nl_attr_find_nested(nla_acts,
> +                                OVS_ACTION_ATTR_USERSPACE);
> +    if (userdata == NULL) {
> +        ds_put_cstr(ds, "(No userspace action)");
> +        return;
> +    }
> +
> +    arg = (uint64_t *) nl_attr_get_unspec(userdata, sizeof(*arg));
> +    if (arg == NULL) {
> +        ds_put_cstr(ds, "(No userspace data)");
> +        return;
> +    }

Can't you use nl_policy_parse() instead of all this?

> +    ds_put_format(ds, "(sample=%"PRIu32", ", (UINT32_MAX / *probability));
> +    ds_put_format(ds, "actions(userspace(%"PRIx64")))", *arg);

This formatting isn't consistent with that used by other actions.
Also, if there is a parse error then you'll just get a fairly generic
error message without any context on what it applies to.

dpif-sflow.c b/ofproto/ofproto-dpif-sflow.c
> index d41b7da..719905c 100644
> --- a/ofproto/ofproto-dpif-sflow.c
> +++ b/ofproto/ofproto-dpif-sflow.c
> +int dpif_sflow_get_probability(const struct dpif_sflow *ds)
> +{
> +    return MAX(1, UINT32_MAX / ds->options->sampling_rate);
> +}

Given that this now gets called on every flow setup, it seems like it
might be worth caching the result.

>  void
>  dpif_sflow_received(struct dpif_sflow *ds, const struct dpif_upcall *upcall,
> -                    const struct flow *flow)
> +                    const struct flow *flow, uint32_t sample_pool)
>  {
>     SFL_FLOW_SAMPLE_TYPE fs;
>     SFLFlow_sample_element hdrElem;
>     SFLSampled_header *header;
>     SFLFlow_sample_element switchElem;
>     SFLSampler *sampler;
> -    unsigned int left;
> -    struct nlattr *a;
> +    struct userdata_cookie data;
>     size_t n_outputs;
> +    uint16_t tci;
>
>     /* Build a flow sample */
>     memset(&fs, 0, sizeof fs);
>     fs.input = dpif_sflow_odp_port_to_ifindex(ds, flow->in_port);
>     fs.output = 0;              /* Filled in correctly below. */

I think you can just drop this initialization of fs.output.  It's
redundant with the memset and doesn't add much value now.

> @@ -527,26 +526,16 @@ dpif_sflow_received(struct dpif_sflow *ds, const struct 
> dpif_upcall *upcall,
>     switchElem.flowType.sw.dst_vlan = switchElem.flowType.sw.src_vlan;
>     switchElem.flowType.sw.dst_priority = switchElem.flowType.sw.src_priority;
>
> -    /* Figure out the output ports. */
>     n_outputs = 0;
> -    NL_ATTR_FOR_EACH_UNSAFE (a, left, upcall->actions, upcall->actions_len) {
> -        ovs_be16 tci;
> -
> -        switch (nl_attr_type(a)) {
> -        case OVS_ACTION_ATTR_OUTPUT:
> -            fs.output = dpif_sflow_odp_port_to_ifindex(ds, 
> nl_attr_get_u32(a));
> -            n_outputs++;
> -            break;
> -
> -        case OVS_ACTION_ATTR_SET_DL_TCI:
> -            tci = nl_attr_get_be16(a);
> -            switchElem.flowType.sw.dst_vlan = vlan_tci_to_vid(tci);
> -            switchElem.flowType.sw.dst_priority = vlan_tci_to_pcp(tci);
> -            break;
> -
> -        default:
> -            break;
> -        }
> +    if (data.output) {
> +        fs.output = dpif_sflow_odp_port_to_ifindex(ds, data.output);
> +        n_outputs = data.n_output;
> +    }

I'm not sure why this is protected by a check on data.output.
Shouldn't n_outputs tell us what we are looking for?

> +    tci = data.tci;
> +    if (tci) {
> +        switchElem.flowType.sw.dst_vlan = vlan_tci_to_vid(tci);
> +        switchElem.flowType.sw.dst_priority = vlan_tci_to_pcp(tci);
>     }

Here the TCI reflects the vlan tag, if any, that was added to the
packet.  Since this is an exact match flow, it's actually possible to
just store the output vlan information that we are looking for
directly.

> diff --git a/ofproto/ofproto-dpif-sflow.h b/ofproto/ofproto-dpif-sflow.h
> index c7c8b5a..374c381 100644
> --- a/ofproto/ofproto-dpif-sflow.h
> +++ b/ofproto/ofproto-dpif-sflow.h
> @@ -27,6 +27,8 @@ struct flow;
>  struct ofproto_sflow_options;
>
>  struct dpif_sflow *dpif_sflow_create(struct dpif *);
> +int dpif_sflow_get_probability(const struct dpif_sflow *);
> +
>  void dpif_sflow_destroy(struct dpif_sflow *);
>  void dpif_sflow_set_options(struct dpif_sflow *,
>                             const struct ofproto_sflow_options *);
> @@ -41,6 +43,6 @@ void dpif_sflow_run(struct dpif_sflow *);
>  void dpif_sflow_wait(struct dpif_sflow *);
>
>  void dpif_sflow_received(struct dpif_sflow *, const struct dpif_upcall *,
> -                         const struct flow *);
> +                         const struct flow *, uint32_t);

Where the argument isn't obvious from the type, you should include the
name as well.

> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> index f09c230..fa70be6 100644
> --- a/ofproto/ofproto-dpif.c
> +++ b/ofproto/ofproto-dpif.c
> @@ -189,6 +189,9 @@ struct action_xlate_ctx {
>     struct flow base_flow;      /* Flow at the last commit. */
>     uint32_t base_priority;     /* Priority at the last commit. */
>     uint8_t table_id;           /* OpenFlow table ID where flow was found. */
> +    uint16_t sflow_odp_port;    /* Output port for composing sFlow action */
> +    uint16_t sflow_n_outputs;   /* Number of output port, for sFlow */
> +    uint16_t dst_tci;

We already track the vlan in the flow information in many places.  Is
it possible to use that or expand on it?

> @@ -764,6 +766,7 @@ set_sflow(struct ofproto *ofproto_,
>  {
>     struct ofproto_dpif *ofproto = ofproto_dpif_cast(ofproto_);
>     struct dpif_sflow *ds = ofproto->sflow;
> +
>     if (sflow_options) {
>         if (!ds) {
>             struct ofport_dpif *ofport;
> @@ -773,11 +776,15 @@ set_sflow(struct ofproto *ofproto_,
>                 dpif_sflow_add_port(ds, ofport->odp_port,
>                                     netdev_get_name(ofport->up.netdev));
>             }
> +            flush(ofproto_);

Is it actually necessary to flush all of the flows?  I think that
revalidation should be sufficient and is lighter weight.

>         }
>         dpif_sflow_set_options(ds, sflow_options);
>     } else {
>         dpif_sflow_destroy(ds);
> -        ofproto->sflow = NULL;
> +        if (ofproto->sflow) {

Everywhere else in this function refers to ofproto->sflow as ds, so
it's better to use that so it's obvious that they are the same.

> +handle_userspace_upcall(struct ofproto_dpif *ofproto,
> +                        struct dpif_upcall *upcall)
>  {
>     struct flow flow;
> +    struct userdata_cookie *data;
> +    struct ovs_dp_stats s;
>
> -    switch (upcall->type) {
> -    case DPIF_UC_ACTION:
> -        COVERAGE_INC(ofproto_dpif_ctlr_action);
> -        odp_flow_key_to_flow(upcall->key, upcall->key_len, &flow);
> -        send_packet_in(ofproto, upcall, &flow, false);
> -        break;
> +    data = (struct userdata_cookie *) &upcall->userdata;

Doesn't this cast cause strict aliasing problems?

> -    case DPIF_UC_SAMPLE:
> +    if (data->type == OVS_USERDATA_SFLOW) {

The structure of the if statements in this function is very strange
with the returns, etc.  Why not a simple if..else if?

>         if (ofproto->sflow) {
> +            dpif_get_dp_stats(ofproto->dpif, &s);
>             odp_flow_key_to_flow(upcall->key, upcall->key_len, &flow);
> -            dpif_sflow_received(ofproto->sflow, upcall, &flow);
> +            dpif_sflow_received(ofproto->sflow, upcall, &flow,
> +                                           (s.n_hit + s.n_missed));

This will miss packets that are completely synthesized by the
controller or userspace.

> +/*
> + * This function reserves space for sampling action.
> + * only information we have is sflow sampling probability. other
> + * field are set to invalid values.
> + **/
> +static void __compose_sflow_action(struct ofpbuf *odp_actions,
> +                                   uint32_t probability)
> +{
> +    struct userdata_cookie data;
> +    size_t offset1, offset2;

Can you use more descriptive variable names than data, offset1, offset2?

> +    offset1 = nl_msg_start_nested(odp_actions, ODP_ACTION_ATTR_SAMPLING);
> +
> +    nl_msg_put_unspec(odp_actions, OVS_SAMPLE_ATTR_PROBABILITY,
> +                         &probability, sizeof probability);

The userspace coding style is to use parentheses after sizeof.

> +/**
> + * Fix SAMPLING action according to data collected while computing actions.
> + * Specifically we need to fix USERSPACE argument which is required for
> + * sflow.
> + */
> +static void
> +fix_sample_action(struct action_xlate_ctx *ctx)
> +{
> +    struct userdata_cookie *arg;
> +    struct nlattr *nla, *userdata, *nla_acts;
> +
> +    if (!ctx->ofproto->sflow)
> +       return;
> +
> +    nla = (struct nlattr *) nl_attr_find(ctx->odp_actions, 0,
> +                                              ODP_ACTION_ATTR_SAMPLING);

At the moment there should only be one sampling action but the in the
future that might not be true so ideally we would handle that
gracefully.

> +    if (nla == NULL) {
> +        VLOG_ERR("No sampling attr");
> +        return;
> +    }

This really isn't something that should be logged because it indicates
a logic error.  I think an assert would be more appropriate.

>  static void
>  commit_odp_actions(struct action_xlate_ctx *ctx)
>  {
> @@ -2818,6 +2933,8 @@ commit_odp_actions(struct action_xlate_ctx *ctx)
>     struct flow *base = &ctx->base_flow;
>     struct ofpbuf *odp_actions = ctx->odp_actions;
>
> +    add_sample_action(ctx);
> +

This function is called from many places including before every output
action.  This means that for an action list of any complexity we'll
get many copies of the sampling action.

> +static void compose_con_userspace_action(struct ofpbuf *odp_actions, int len)

What does the "con" in this function name mean?  Can it be made more clear?

> diff --git a/ofproto/ofproto.h b/ofproto/ofproto.h
> index e0c99ea..acaf6e3 100644
> --- a/ofproto/ofproto.h
> +++ b/ofproto/ofproto.h
> @@ -239,6 +239,26 @@ int ofproto_port_get_cfm_fault(const struct ofproto *, 
> uint16_t ofp_port);
>  void ofproto_get_ofproto_controller_info(const struct ofproto *, struct 
> shash *);
>  void ofproto_free_ofproto_controller_info(struct shash *);
>
> +enum ovs_userdata_dst {
> +        OVS_USERDATA_CONTROLLER = 1,
> +        OVS_USERDATA_SFLOW,
> +};

Why do you need to assign an explicit value to OVS_USERDATA_CONTROLLER?

> +
> +/* userdata_cookie is passed as argument to OVS_ACTION_ATTR_USERSPACE
> + * Since is it passed to kernel,  its size can not be changed to 8 bytes */
> +struct userdata_cookie {
> +    uint16_t  type;
> +    uint16_t  tci;
> +    union {
> +        uint32_t  data;
> +        struct {
> +            uint16_t output;
> +            uint16_t n_output;

In general, port numbers are being slowly expanded to 32-bit so we
should follow that for new code.  I think that type and n_output could
each fit in 1 byte values, which would give us enough space for 32-bit
port numbers.

> +        };
> +    };
> +}__attribute__((__packed__));
> +
> +BUILD_ASSERT_DECL(8 == sizeof(struct userdata_cookie));

There's no need to include the packed attribute.  Any real compiler
will lay this out the way you expect and that's checked by an
assertion anyways.  Adding packed makes things slower on RISC
architectures even if it isn't needed.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH] Genericize/simplify kernel sFlow implementation

Reply via email to