On 8/12/14 4:39 PM, Branko Čibej wrote: > This thing against flags and booleans that change API behaviour does have a > very good argument going for it: it makes the code less obvious for the sake > of > saving a few function names. > > Consider the standard example, which (these days) would be an API to a > DOM-like > object that can be displayed or not; to query its state, you'd have a method > called > > bool is_visible(); > > To change the state, the "flaggy" design would be > > void set_visibility(bool); > > but the obviously more readable design uses two methods, > > void show(); > void hide(); > > With the former, you always have to worry about what the boolean parameter > actually means; with the latter, there can be no doubt. > > So in the case of displaying the contents of checksums, first of all something > called > > svn_checksum_to_cstring_display > > is clearly a broken API, especially if it's used to format the checksum for > the > protocol, i.e., has nothing to do with displaying it at all. The docstring > says > nothing about displaying; and there's no difference between this function and > svn_checksum_to_cstring, other than that the latter accepts (and possibly > returns) a NULL, where the former does not.
Agree with you so far. Yes svn_checksum_to_cstring() accepts and returns NULL. > > Then we have svn_checksum_serialize and svn_checksum_deserialize, which I > would > expect to be always used for the protocol; These are used by the wc indirectly by way of svn_sqlite__bind_checksum. The key difference here is they include the type and the digest itself in the output, allowing the svn_checksum_t value to be stored in the database and then recreated from that, without assuming any particular type. This is probably what we should have been doing for the protocol and the repository, but we didn't and this is a relatively new (1.7) API. I'd suggest that FSX should start using this but since it's probably going to be more binary there's probably going to be more compact serialization used for it. > and svn_checksum_parse_hex, where > it's not clear if it's the inverse of svn_checksum_to_cstring, or the _display > variant, or both, or WTF. svn_checksum_parse_hex takes a hash type (which is usually inferred from where the hex string is coming from) and a hex string and produces a svn_checksum_t, it can be used to parse the output from either svn_checksum_to_cstring or svn_checksum_to_cstring_display. The difference between svn_checksum_to_cstring and svn_checksum_to_cstring_display is how they handle a digest that is all zeros. svn_checksum_to_cstring() returns NULL while svn_checksum_to_cstring_display returns a string with the appropriate number of zeros in it. There are places where one or the other variant is useful. svn_checksum_parse_hex handles both NULL or a string with all zeros as the digest the same. > In other words, at least one of these APIs needs to be deprecated because it > has a terrible name, and all of them need better documentation. The documentation is sufficient to understand what's going on but it's not entirely obvious until you read all of it. So yes we could improve this by adding hints like that svn_checksum_parse_hex() can handle the output of either svn_checksum_to_cstring() or svn_checksum_to_cstring_display(). > If we want a > function that's intended specifically for printing fingerprints, then by all > means call it > > svn_checksum_fingerprint > > although, really, A fingerprint is just another name for a hash/checksum/digest. The problem here is that there are several variations of how to display this data. It is highly useful to try and display this data in the common format used for the context. For instance, almost everything displays X.509 cert fingerprints as SHA1 hashes with upper case hex digits with a colon between the characters for each byte. On the other hand you have software like GPG and PGP where the standard is to use upper case hex digits with a space between every 2 bytes of the hash. Then you have software like sha1sum, md5sum, etc... that just gives you a stream of all lower case hex digits with no separators. If you're wanting to manually verify that two match you can sit there and try to read one and compare it to the other which is an error prone operation even if you are using a format that has some sort of periodic separator. Or if you happen to have both in the computer you can copy one or the other out and put it in a editor and then search to see if the other one matches (or use the search in your terminal). This of course only works if you have them formatted the same. So creating a function named svn_checksum_fingerprint() does absolutely nothing to improve the situation. It's just as ambiguous as the current situation. If we want to avoid flags and booleans then I guess we need to come up with names for the styles like: svn_checksum_fingerprint_x509style() or svn_checksum_fingerprint_uppercolons() etc... I'll try to come up with some final names, but can we at least agree that as long as they're reasonably descriptive we don't need a week long thread to agree to names? > I don't know why such a thing would have to be in > libsvn_subr, or perhaps in svn_checksum.h; I could imagine such a thing living > in subr but exposed in svn_cmdline.h, called svn_cmdline_fingerprint or some > such. What does this have anything to do with the command line? The svn_cmdline.h stuff is all things that either handle input or output to a terminal. The few functions we have that handle character set encoding are there so they can handle the detection of the output encoding (done in svn_cmdline_init()) and use the correct encoding automagically. I see no reason why this sort of formatting function shouldn't return UTF-8 and then the command line tools use the svn_cmdline_* functions to output it. Whatever formatting functions we provide for svn_checksum_t should live in svn_checksum.h.