On Thu, Nov 21, 2013 at 02:25:31PM -0800, Jarno Rajahalme wrote: > Add a prefix tree (trie) structure for tracking the used address > space, enabling skipping classifier tables containing longer masks > than necessary for an address field value in a flow being classified. > This enables less unwildcarding for datapath flows in parts of the > address space without host routes. > > Trie lookup is interwoven to the staged lookup, so that a trie is > searched only when the configured trie field field becomes relevant > for the lookup. The trie lookup results are retained so that each > trie is checked at most once for each classifier lookup. > > This implementation tracks the number of rules at each address prefix > for the whole classifier. More aggressive table skipping would be > possible by maintaining lists of tables that have prefixes at the > lengths encountered on tree traversal, or by maintaining separate > tries for subsets of rules separated by metadata fields. > > Prefix tracking is configured via OVSDB. A new column "fieldspec" is > added to the database table "Flow_Table". "fieldspec" is a string map > with only the "prefix" key defined so far (other keys can be added in > future): > > "prefix" A comma separated list of field names for which prefix > lookup should be used. > > The fields for which prefix lookup can be enabled are: > - tun_id, tun_src, tun_dst > - metadata > - reg0 - reg7 > - nw_src, nw_dst (or aliases ip_src and ip_dst) > - ipv6_src, ipv6_dst > > There is a maximum number of fields that can be enabled for any one > flow table. Currently this limit is 3. > > Examples: > > ovs-vsctl set Bridge br0 flow_tables:0=@N1 -- \ > --id=@N1 create Flow_Table name=table0 > ovs-vsctl set Bridge br0 flow_tables:1=@N1 -- \ > --id=@N1 create Flow_Table name=table1 > > ovs-vsctl set Flow_Table table0 fieldspec:prefix=tun_id > ovs-vsctl set Flow_Table table1 fieldspec:prefix=ip_dst,ip_src > > ovs-vsctl remove Flow_Table table1 fieldspec prefix > > Signed-off-by: Jarno Rajahalme <jrajaha...@nicira.com>
I am not done studying this code, but I have some preliminary comments. Some of the fields that allow prefix length are in network byte order, and others (the registers) are in host byte order, but the code that looks at the fields doesn't seem to distinguish. "Prefixes" therefore turn into suffixes for the registers, but only on little-endian machines. vswitch.xml should document the new Flow_Table column. Thanks for correcting the comment on miniflow_init__(). I'm not sure that I would list the fields that can be used for prefix lookup so explicitly in the classifier.h comment, because lists of this kind tend to get out of date. trie_destroy() doesn't follow the usual OVS convention that a destroy function should accept NULL, but making it accept NULL would allow some callers to be slightly simplified. I don't see anything in classifier_set_prefix_fields() or its callers to detect and disallow listing the same field twice. Does it matter? mf_from_id() assert-fails on a bad id, but classifier_set_prefix_fields() still checks for a null return value. minimask_get_prefix_len() adds restriction to the fields that can be used for prefix matching, that is not mentioned elsewhere: it must be a multiple of 4 bytes long. trie_insert() adds a subtle variation that it must be a multiple of 32 bits long. The latter seems more appropriate, since we could have, say, 24-bit field that is aligned on a 4-byte boundary, and none of the code seems to deal with anything that is not a multiple of 32 bits. minimask_get_prefix_len() doesn't seem to be on a fast path, so it might be better to write really clear code by using minimask_get() in place of directly dealing with 'map' and values'. In minimask_get_prefix_len(), I think that: if (mask) { mask_tz = raw_ctz(mask); if (~mask & (~0u << mask_tz)) { return 0; /* Mask not contiguous. */ } nbits += 32 - mask_tz; } else { mask_tz = 32; /* End of mask seen. */ } could be simplified to just: if (~mask & (~mask + 1)) { return 0; /* Mask not contiguous. */ } mask_tz = ctz32(mask); nbits += 32 - mask_tz; using the alternative test for a non-CIDR mask found in ip_is_cidr(). (I did not test this change.) The !mf test at the top of trie_insert() seems odd. Is it really possible for mf to be null here? In trie_insert(), is the call to minimask_get_prefix_len() necessary? I may be missing something but I'm not sure how the rule's prefix len could be different from the rule's subtable's prefix length. trie_prefix_equal_bits() looks to me like it doesn't handle the case where ofs...plen crosses a 32-bit boundary, since it only looks at a single word in the prefix[] array. But the code for that function is really opaque to me so maybe I just misunderstand. trie_equal_bits() and trie_prefix_equal_bits() are very similar, I suspect that they could be written as trivial wrappers around a common function. Thanks, Ben. _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev