> > > In the C IDL, allows to create multicolumn indexes in the tables, > > > that are keep synched with the data in the replica. > > > > > > Signed-off-by: Esteban Rodriguez Betancourt <esteb...@hpe.com>
> > I don't understand why indexes have string names. It seems like kind > > of a curious design. Why isn't the client expected to retain a > > pointer to the index that it wants to traverse, instead of a name? > > One reason is to prevent the developer from modifying the values of > the index by referencing it using a string instead of giving direct > access to the index data structure. That doesn't seem like a good reason. Data hiding is valuable, but we don't use it in OVS simply because we don't trust developers' intention; on the contrary, we rely on developers to do the right thing. Occasionally we do motivate data hiding, etc. on the basis that it would simply be too easy for a developer to accidentally do the wrong thing; for example, this is why the IDL public interface always provides access to rows and columns via "const"-qualified pointers: otherwise it would be too easy to accidentally modify a value. > The other one is to allow a more intuitive way to refer to the > indexes, and to select them when programming. I don't understand this yet, can you explain further? Usually, an appropriately named variable is a good way to select something. > > I found myself wondering whether there's value in doing dynamic > > allocation of the three parallel arrays inside struct ovsdb_idl_index. > > It would be very unusual to have an index over more than, say, 3 > > columns, and so it might be easier to have fixed-size arrays of 3 > > elements (or a few more, to allow room for expansion). I imagine that > > the columns in indexes will be fixed at build time, after all. > > > > ovsdb_idl_index_add_column() seems to be working hard to avoid using > > xrealloc(), but I don't know why. > > Changed in v2 patch. The indexes support any number of "columns" for > doing the comparison. But that number isn't limited to the number of > columns in a row, because eventually one column can have several > values indexed, for example, different keys from a map column. Hmm, OK, I didn't realize that was a possibility. > > I don't understand why only certain columns have a default comparison > > function. Why not use ovsdb_datum_compare_3way() as the default for > > every column? I guess that the ovsdb_datums and ovsdb_type would have > > to be passed into column_comparator instead of a pair of void * > > pointers, but that would arguably be an improvement anyhow, certainly > > from a type- safety viewpoint. > > We decided against using the datums for the comparison because it > seems intuitive that the parsed values in the ovsrec are easier and > faster to compare. > > In some cases, a table could have a really really high number of rows, > and receive a lot of updates per second, so is important that the > comparison functions are fast. To me, performance seems like a difficult way to justify this argument, because the big-O is the same either way, and it's really not clear to me that the actual comparisons are a significant cost when maintaining the indexes. "Easier" seems like a difficult argument too, since it's really easy to just use ovsdb_datum_compare_3way() as a default. > > Do you have an example of a useful use for a custom comparison function? > > Sorting a string column, that contains IPv4 or IPv6 values. > Sort an specific member inside a map column Sort a string as a number Sort > strings as dates Sort by a value contained in a referenced row OK... Most of these seem reasonable, but I don't think that the last one could work because I doubt that the index would be updated if the referenced row changed (although I guess the value in the referenced row could be immutable). > > The generic comparison functions takes 'row_sync' into account. Do > > custom comparison functions also need to take 'row_sync' into account? > > (Why or why not?) > > No they don't. The row_sync is used exclusively by the generic comparison > function. > More about it below. > > > It looks like row_sync is needed because the skiplist data structure > > implemented in the previous patch does not tolerate insertion of duplicates. > > That seems like a serious drawback for a data structure that is > > supposed to be used as an index that may contain duplicates. > > Did you consider implementing a skip list (or other balanced tree > > structure) that gracefully supports duplicates? That would be a much > > more straightforward solution to the problem. > > Allowing duplicates in the skiplist would make difficult to have a fast > mechanism for deleting/modifying rows. > > If the "key" is duplicated then we would need to iterate over all the equal > keys, until we found the one with the value (pointer to row) that we want. If > the "key" > is uncommon (e.g.: an UUID) that couldn't be very bad, but if the "key" is > common (e.g.: a Boolean) each update/deletion of a row would need to iterate > over many rows, potentially thousands. > > That would be O(n), not O(log(n)). The row_sync solution was a balance > between good performance for compound indexes, and adding a data structure > for general use. OK. Please come up with a better name for 'row_sync', as-is it doesn't help anyone to understand what's going on. _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev