> On Nov 25, 2015, at 3:14 PM, Joe Stringer <j...@ovn.org> wrote: > > On 25 November 2015 at 15:12, Joe Stringer <joestrin...@nicira.com> wrote: >> On 25 November 2015 at 15:06, Jarno Rajahalme <ja...@ovn.org> wrote: >>> >>>> On Nov 25, 2015, at 2:58 PM, Joe Stringer <j...@ovn.org> wrote: >>>> >>>> On 25 November 2015 at 11:23, Jarno Rajahalme <ja...@ovn.org >>>> <mailto:ja...@ovn.org>> wrote: >>>>> >>>>> On Nov 25, 2015, at 11:11 AM, Jarno Rajahalme <ja...@ovn.org> wrote: >>>>> >>>>> >>>>> On Nov 25, 2015, at 10:52 AM, Joe Stringer <j...@ovn.org> wrote: >>>>> >>>>> On 25 November 2015 at 10:31, Jarno Rajahalme <ja...@ovn.org> wrote: >>>>> >>>>> >>>>> On Nov 24, 2015, at 5:02 PM, Joe Stringer <j...@ovn.org> wrote: >>>>> >>>>> On 24 November 2015 at 13:41, Jarno Rajahalme <ja...@ovn.org> wrote: >>>>> >>>>> Sometimes xlate_actions() fails due to too deep recursion, too many >>>>> MPLS labels, or missing recirculation context. Make xlate_actions() >>>>> clear out the produced odp actions in these cases to make it easy for >>>>> the caller to install a drop flow (instead or installing a flow with >>>>> partially translated actions). Also, return a specific error code, so >>>>> that the error can be properly propagated where meaningful. >>>>> >>>>> Before this patch it was possible that the revalidation installed a >>>>> flow with a recirculation ID with an invalid recirc ID (== 0), due to >>>>> the introduction of in-place modification in commit 43b2f131a229 >>>>> (ofproto: Allow in-place modifications of datapath flows). >>>>> >>>>> Signed-off-by: Jarno Rajahalme <ja...@ovn.org> >>>>> >>>>> >>>>> Should this also set the error when receiving packets on a mirror port >>>>> in xlate_actions()? Or when receiving tagged VLAN traffic that doesn't >>>>> correspond to the port's vlan tag? Or when a group has no live bucket? >>>>> Are there any other cases that should also be covered? (I just scanned >>>>> across ofproto/ofproto-dpif-xlate.c looking for cases where we're >>>>> already logging that we drop the packet, but maybe there's a reasoning >>>>> behind not including these - if so, please enlighten me) >>>>> >>>>> >>>>> No reasoning for missing those, I just did not notice them. Thanks for >>>>> pointing them out. >>>>> >>>>> >>>>> OK, I thought it may have been something like "expected errors" vs. >>>>> "unexpected errors". >>>>> >>>>> >>>>> Looking into these I noticed this to be the case. Must discern whether to >>>>> fail just the individual action v.s. the whole pipeline. >>>>> >>>>> >>>>> How about this incremental to cover two cases here (rest are “expected >>>>> errors” IMO): >>>>> >>>>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c >>>>> index 36a6fbc..2908339 100644 >>>>> --- a/ofproto/ofproto-dpif-xlate.c >>>>> +++ b/ofproto/ofproto-dpif-xlate.c >>>>> @@ -336,6 +336,10 @@ const char *xlate_strerror(enum xlate_error error) >>>>> return "Recirculation conflict"; >>>>> case XLATE_TOO_MANY_MPLS_LABELS: >>>>> return "Too many MPLS labels"; >>>>> + case XLATE_BUCKET_CHAINING_TOO_DEEP: >>>>> + return "Bucket chaining too deep"; >>>>> + case XLATE_NO_INPUT_BUNDLE: >>>>> + return "No input bundle"; >>>>> } >>>>> return "Unknown error"; >>>>> } >>>>> @@ -1444,10 +1448,9 @@ bucket_is_alive(const struct xlate_ctx *ctx, >>>>> struct ofputil_bucket *bucket, int depth) >>>>> { >>>>> if (depth >= MAX_LIVENESS_RECURSION) { >>>>> - static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1); >>>>> - >>>>> - VLOG_WARN_RL(&rl, "bucket chaining exceeded %d links", >>>>> - MAX_LIVENESS_RECURSION); >>>>> + XLATE_REPORT_ERROR(ctx, "bucket chaining exceeded %d links", >>>>> + MAX_LIVENESS_RECURSION); >>>>> + ctx->error = XLATE_BUCKET_CHAINING_TOO_DEEP; >>>>> return false; >>>>> } >>>>> >>>>> @@ -2323,7 +2326,8 @@ xlate_normal(struct xlate_ctx *ctx) >>>>> in_xbundle = lookup_input_bundle(ctx->xbridge, flow->in_port.ofp_port, >>>>> ctx->xin->packet != NULL, &in_port); >>>>> if (!in_xbundle) { >>>>> - xlate_report(ctx, "no input bundle, dropping"); >>>>> + XLATE_REPORT_ERROR(ctx, "no input bundle, dropping"); >>>>> + ctx->error = XLATE_NO_INPUT_BUNDLE; >>>>> return; >>>>> } >>>>> >>>>> >>>>> The last one is debatable, as setting the error fails the whole >>>>> translation >>>>> rather than just the normal action. But this is most likely an >>>>> configuration >>>>> error, so maybe failing the whole pipeline (and installing a drop flow) is >>>>> the right thing to do here? >>>> >>>> Jarno and I discussed this offline, and I'll try to summarise here. >>>> Broadly speaking, we're talking about the decision between failing an >>>> individual (piece of an) action or completely failing the action >>>> processing for the flow. And I think arguably the approach should be >>>> that if it is a serious error such as running out of resources or an >>>> internal conflict of recirc IDs, then we should fail the entire action >>>> processing. In this case it will have two user-visible effects: >>>> 1) ofproto/trace will tell the user which serious condition is being >>>> triggered that causes dropping of the flow >>>> 2) OpenFlow controllers attempting packet_out could be notified that >>>> the error occurred (rather than silently failing like currently) >>>> >>>> However, in the two cases in the incremental patch here, the actions >>>> inherently have some ambiguity as to whether they successfully execute >>>> (eg output) or not. The more obvious case is in the bucket_is_alive() >>>> logic, where recursion will cause a bucket to be not used. If a bucket >>>> is not live in the spec, this doesn't mean that the entire flow should >>>> stop processing. In the case of normal, I'd argue it's very similar in >>>> that 'normal' doesn't specifically attempt to output to a particular >>>> port; sending packets out to different ports may fail for different >>>> reasons, but this shouldn't prevent later actions in the actions list >>>> from being executed. >>>> >>>> I think the latter cases should be reported for ofproto/trace, though. >>>> >>>> Looking back across this thread, it looks not far off your reasoning >>>> described earlier so I think we're converging on the same view. Does >>>> this sound like a fair approach? >>>> >>>> -- >>>> >>>> In the mirror case, the point is moot because do_xlate_actions() isn't >>>> even called in that case, so it's purely a matter of whether we want >>>> to return the error up the stack or not. Maybe that should be reported >>>> for ofproto/trace as well. >>>> >>>> I didn't see any other cases that might need handling through this. >>> >>> So the only ask I see here is that more of the cases of individual actions >>> bailing out should have xlate_report() calls on them. To me this sounds >>> like a different patch, not directly related to erroring out the whole >>> translation. As such I hope to get an Ack on the original patch of this now >>> lengthy discussion… >> >> I agree. >> >> Acked-by: Joe Stringer <j...@ovn.org> > > Ah, mismatch between my sender email an the ack. Here: > > Acked-by: Joe Stringer <j...@ovn.org>
Thanks for the thorough review, pushed to master. Jarno _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev