Date: Sun, 6 Nov 2022 17:31:59 +0000 From: David Holland <dholland-t...@netbsd.org> Message-ID: <Y2fvj/+e7oox4...@netbsd.org>
| It seems to orphan this text in POSIX: | | : If a struct tm broken-down time structure is created or modified by | : gmtime() or gmtime_r(), it is unspecified whether the result of the %Z | : and %z conversion specifiers shall refer to UTC or the current local | : timezone, when strftime( ) is called with such a broken-down time | : structure. Not necessarily: | since in the absence of significant magic this clearly requires | having and using timezone fields in struct tm. No, it doesn't. That text is there now, the tm_zone and tm_gmtoff fields are only just being added in the next version. The same text was there in Issue6 TC1 of POSIX (2003) which is where it seems to have been added (it isn't in the original Issue 6 (2001)). All of that is LONG before anyone in the standards community would have been taking any notice of tm_zone or tm_gmtoff, the reason for it needs to be something completely different than those. As I said in an earlier message, my guess (and certainly, this is not anything more than that) is that there are (were) implementations of gmtime which set the tzaddr[] array to "UTC" (or "GMT") at least for tzaddr[0]. If one calls gmtime() after, and then strftime, it is entirely possible that %z may produce 0 and %Z "UTC" (or "GMT") on such an implementation. Since POSIX likes (mostly) to standardise what exists, allowing either the local timezone info, or that for GMT in such a case is exactly the kind of thing they would do. No assumptions about tm_zone (etc) required. | I do wonder, however, since it was mentioned earlier in this thread | that POSIX is planning to add timezone fields to struct tm, what | they're intending in this regard. Do you have a link to that proposal? Certainly: https://austingroupbugs.net/view.php?id=1533 which is a 2021 proposal - note which is much more recent than the proposal to add %s to strftime, which was more or less complete in 2009 (it just needed to wait for a new standard version to actually be incorporated). That bug forgot to update mktime() which needs doing as part of the addition of those extra fields (because of the lazy way it was specified). I filed https://austingroupbugs.net/view.php?id=1613 just a few days ago to correct that. There has been no reaction to that one, which I suspect probably means it will be accepted more or less as written (the actual wording might be altered, I do not always to a very good job of explaining things clearly, so that often happens). On the other hand https://austingroupbugs.net/view.php?id=1614 which is another defect report I filed against mktime - one which I expected to be a no-brainer (much less controversial, I thought) than 1613, has turned out to generate a lot of ire... (This one has nothing at all to do with anything we have been discussing in this thread.) | Meanwhile, however, this doesn't affect %s since it isn't in C99 and | POSIX is free to define it as derived from whatever struct tm fields | they want. They could have done, yes, but unless there is a very good reason to be different from the implementations, they don't. %s is defined to use mktime() - what that uses I will come to below (and is the subject of bug 1613 above). | Furthermore, though I gather this has not actually been | done, it would be reasonable to extend the above permission to use UTC | to %s. No, only if implementations actually do that, and they don't. Sice %s is specified to use mktime() it uses exactly the fields of the struct tm that mktime() uses, and those are defined (defined in a rather stupid and lazy way, which is harder than it needs to be to understand, and not future proof, which it should be - that's what bug 1613 is intended to fix). But they are specified. You are, of course, perfectly entitled to ask the POSIX maintainers (the Austin Group) to modify things that way, if you like - you'd need to supply exactly the language needed to change to make that happen, and into what, and generally, note implementations which already act as you are planning (and for this, NetBSD isn't really generally considered to be a big enough player to count for much). | I also don't see that any of this constitutes permission to pass a | struct tm containing uninitialized fields, especially if the format | contains %s. There is, actually. | There is nothing in the C99 description of mktime (or | POSIX either) that specifies which fields it uses, I don't know what's in the C standards, but in POSIX that is specified. I would assume (since nothing in POSIX indicates that this is an extension) that the C standard is probably the same. Maybe someone with access to it (or a late draft, hint, hint) could tell us if it is substantially the same or not. What is said in XSH 3.mktime (this from the current POSIX standard, Issue7 TC2, 2018 I think - but this text is much older than that) is: The mktime( ) function shall convert the broken-down time, expressed as local time, in the structure pointed to by timeptr, into a time since the Epoch value with the same encoding as that of the values returned by time( ). The original values of the tm_wday and tm_yday components of the structure shall be ignored, and the original values of the other components shall not be restricted to the ranges described in <time.h>. Before anything else, note the "expressed as a local time" words (though those are not relevant to this particular point). That is, it uses the values from a struct tm, as defined by <time.h> except that it ignores tm_wday and tm_yday (hence those fields need not be set to anything, as they will not be referenced - they will be set before a non-error return however, that is stated later). To understand what that really means, one needs to also look at <time.h> The relevant part of XBD 13.<time.h> is: The <time.h> header shall declare the tm structure, which shall include at least the following members: int tm_sec Seconds [0,60]. int tm_min Minutes [0,59]. int tm_hour Hour [0,23]. int tm_mday Day of month [1,31]. int tm_mon Month of year [0,11]. int tm_year Years since 1900. int tm_wday Day of week [0,6] (Sunday =0). int tm_yday Day of year [0,365]. int tm_isdst Daylight Savings flag. The value of tm_isdst shall be positive if Daylight Savings Time is in effect, 0 if Daylight Savings Time is not in effect, and negative if the information is not available. While implementations are permitted to add fields (which is how we get tm_zone and tm_gmtoff) conforming applications are not permitted to expect any others to exist, and hence cannot expressly set them to anything (they'd need to be relying on something extra, like NETBSD_SOURCE or GNU_SOURCE or whatever it is called) which defines the extra fields, and that would make a non-standards conforming application. Hence, when XSH 3.mktime says "in the structure pointed to by timeptr" (a struct tm) it can only possibly be requiring the fields it has defined - and hence conforming applications only need set those (the 7 that remain after tm_wday and tm_yday are ignored). I know this all sounds rather roundabout and convoluted, but trust me, this is exactly how the standards are interpreted. | If the state of the real world is that code that conses up struct tm | doesn't actually bzero it first (or use an explicit struct | initializer), then using the timezone fields is perhaps unwise in the | near term. I also wonder how POSIX plans to handle this if they're | proposing adding those fields. They are adding the fields as outputs from localtime(), mktime() etc. They are not being used as input values to anything - if they were, then existing conforming applications would automatically break, and that isn't something that happens (application people would veto the idea). Existing conforming applications can simply ignore the new fields completely, and as long as they stick to the standard interfaces (even including strftime("%s") since that's been around and coming for 13 years now - an plenty of people knew that) the application will continue to work just fine (just as such applications do when compiled on NetBSD which has had the extra fields sice conception). Applications that conform to the coming standard will be able to use the tm_gmtoff and tm_zone fields (subject particularly to the lifetime restrictions on the string tm_zone points to) however they see fit - if desired, instead of using %Z and %z. [Aside: the 'E' and 'O' modifiers for the strftime() conversions currently have no defined meaning for %z or %Z, so while it would be a wild variation to the current meanings of those modifiers on the conversions to which they can be currently applied, an implementation could, I suppose, modify strftime so that one of %EZ or %OZ (and %Ez or %Oz) behaved the way you'd like them too - %s is more difficult, though could also be done but would require more specification, and the implementation would be very messy I suspect.] | (Note that questions about bzero vs. explicit initializers and all | bits zero are a red herring -- we do not run on the DS9000.) Agreed, that's noise. | Like I said above, it isn't clear to me that anything grants | permission to pass uninitialized fields to strftime, and it _is_ clear | (given the proposed and de facto definition of %s) that passing | uninitialized fields to strftime while using %s is UB. Only for the fields that mktime() uses, as %s is defined to give (as a decimal character string) the same value that mktime() returns as a time_t | It came up earlier, but got left by the wayside because it's not | portable and the question was what the portable functions should do. OK - I mentioned it just as it was noted that we'd need a new API, and it turns out that we already have it. | Can you file a PR on that? It seems likely that if nobody does | anything about it sooner or later the way we'll discover it again is | by random barely-debuggable browser or JVM misbehavior. I could, but first let me deal with your followup message on this point (there isn't likely to be any hurry here, we cannot alter the POSIX version we support without auditing everything in the newer version of the standard, and our implementations of it all, to make sure that we do in fact implement all of that - that's not happening (finishing) any time this year). dholland-t...@netbsd.org said in a followup message: | The plot thickens! This is not true. There are two copies of strftime_l | in strftime.c, Yes, I know. | one that works and one that behaves as you describe; I'm not sure which you mean there ... the one we use currently is the one that uses the locale, the one we don't, is the one which ignores the passed in locale (which is currently simply omitted). | the latter one is inside #if HAVE_STRFTIME_L, which I guess is false. Yes, I think so. | So I guess if the _POSIX_VERSION gets bumped it'll just break the build. :-) You mean because then there would be two .... But no, I think not (while I haven't actually tested this, and should, and certainly would before filing a PR) I believe what happens in that case, is that the strftime_l() which is currently omitted becomes strftime_l() instead of the one we currently used, and the other one, by this piece of magic in private.h # if HAVE_STRFTIME_L # undef strftime_l # define strftime_l tz_strftime_l # endif gets renamed to be tz_strftime_l() instead. Or at least that is what is intended to happen. dholland-t...@netbsd.org said in that same follup message: | Any reason not to remove that one? I think you already answered that in the message to which I am replying (the one quoted through most of this reply) | (Also, most people are afraid to touch tzcode, with good reason...) Fortunately, as we have additions/changes to tzcode already (some of them, controlled by #if NETBSD_INSPIRED in the tzcode sources) we have some brave souls amongst the NetBSD developers, so all hope is not lost. kre