Re: [ovs-dev] [PATCH 3/5] ovsdb-idl: IDL Compound Indexes Implementation

Ben Pfaff Mon, 11 Apr 2016 15:32:43 -0700

> > > In the C IDL, allows to create multicolumn indexes in the tables, 
> > > that are keep synched with the data in the replica.
> > >
> > > Signed-off-by: Esteban Rodriguez Betancourt <esteb...@hpe.com>


> > I don't understand why indexes have string names.  It seems like kind 
> > of a curious design.  Why isn't the client expected to retain a 
> > pointer to the index that it wants to traverse, instead of a name?
> 
> One reason is to prevent the developer from modifying the values of
> the index by referencing it using a string instead of giving direct
> access to the index data structure.

That doesn't seem like a good reason.  Data hiding is valuable, but we
don't use it in OVS simply because we don't trust developers' intention;
on the contrary, we rely on developers to do the right thing.
Occasionally we do motivate data hiding, etc. on the basis that it would
simply be too easy for a developer to accidentally do the wrong thing;
for example, this is why the IDL public interface always provides access
to rows and columns via "const"-qualified pointers: otherwise it would
be too easy to accidentally modify a value.

> The other one is to allow a more intuitive way to refer to the
> indexes, and to select them when programming.

I don't understand this yet, can you explain further?  Usually, an
appropriately named variable is a good way to select something.

> > I found myself wondering whether there's value in doing dynamic 
> > allocation of the three parallel arrays inside struct ovsdb_idl_index.
> > It would be very unusual to have an index over more than, say, 3 
> > columns, and so it might be easier to have fixed-size arrays of 3 
> > elements (or a few more, to allow room for expansion).  I imagine that 
> > the columns in indexes will be fixed at build time, after all.
> > 
> > ovsdb_idl_index_add_column() seems to be working hard to avoid using 
> > xrealloc(), but I don't know why.
> 
> Changed in v2 patch. The indexes support any number of "columns" for
> doing the comparison. But that number isn't limited to the number of
> columns in a row, because eventually one column can have several
> values indexed, for example, different keys from a map column.

Hmm, OK, I didn't realize that was a possibility.

> > I don't understand why only certain columns have a default comparison 
> > function.  Why not use ovsdb_datum_compare_3way() as the default for 
> > every column?  I guess that the ovsdb_datums and ovsdb_type would have 
> > to be passed into column_comparator instead of a pair of void * 
> > pointers, but that would arguably be an improvement anyhow, certainly 
> > from a type- safety viewpoint.
> 
> We decided against using the datums for the comparison because it
> seems intuitive that the parsed values in the ovsrec are easier and
> faster to compare.
> 
> In some cases, a table could have a really really high number of rows,
> and receive a lot of updates per second, so is important that the
> comparison functions are fast.

To me, performance seems like a difficult way to justify this argument,
because the big-O is the same either way, and it's really not clear to
me that the actual comparisons are a significant cost when maintaining
the indexes.  "Easier" seems like a difficult argument too, since it's
really easy to just use ovsdb_datum_compare_3way() as a default.

> > Do you have an example of a useful use for a custom comparison function?
> 
> Sorting a string column, that contains IPv4 or IPv6 values.
> Sort an specific member inside a map column Sort a string as a number Sort 
> strings as dates Sort by a value contained in a referenced row

OK...  Most of these seem reasonable, but I don't think that the last
one could work because I doubt that the index would be updated if the
referenced row changed (although I guess the value in the referenced row
could be immutable).

> > The generic comparison functions takes 'row_sync' into account.  Do 
> > custom comparison functions also need to take 'row_sync' into account?
> > (Why or why not?)
> 
> No they don't. The row_sync is used exclusively by the generic comparison 
> function.
> More about it below.
> 
> > It looks like row_sync is needed because the skiplist data structure 
> > implemented in the previous patch does not tolerate insertion of duplicates.
> > That seems like a serious drawback for a data structure that is 
> > supposed to be used as an index that may contain duplicates.
> > Did you consider implementing a skip list (or other balanced tree
> > structure) that gracefully supports duplicates?  That would be a much 
> > more straightforward solution to the problem.
> 
> Allowing duplicates in the skiplist would make difficult to have a fast 
> mechanism for deleting/modifying rows.
> 
> If the "key" is duplicated then we would need to iterate over all the equal 
> keys, until we found the one with the value (pointer to row) that we want. If 
> the "key"
> is uncommon (e.g.: an UUID) that couldn't be very bad, but if the "key" is 
> common (e.g.: a Boolean) each update/deletion of a row would need to iterate 
> over many rows, potentially thousands.
> 
> That would be O(n), not O(log(n)). The row_sync solution was a balance 
> between good performance for compound indexes, and adding a data structure 
> for general use.

OK.

Please come up with a better name for 'row_sync', as-is it doesn't help
anyone to understand what's going on.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH 3/5] ovsdb-idl: IDL Compound Indexes Implementation

Reply via email to