Here is an executive summary of my thoughts on the discussion so far:
*How much of ISO-8601 should the Elixir standard library support?*
Of course everyone would love full support out of the box, but as Kip
describes this is a massive surface area in terms of development and
maintenance, so I also feel as if an external library is the best home for
comprehensive approaches, at least for now. We should support the bare
minimum to handle common cases.
*What is the bare minimum? What are the common cases?*
Arguably, ordinal dates are not super common. However, as Paul points out,
the `{Date, DateTime, Time, NaiveDateTime}.from_iso8601` class of functions
set certain expectations here, and they defer to `Calendar.ISO`, which is
why the PR makes changes to `Calendar.ISO.{parse_date,
parse_naive_datetime, parse_utc_datetime}` functions. More importantly, the
`from_iso8601` functions power the date, time, and datetime sigils that are
wildly common and useful, so it is important we get them right.
To me this suggests that we are committed to supporting at a minimum the
explicit fully-qualified date, time, and datetime parts of the ISO-8601
spec that do not require "prior agreement" between parties. This takes the
following components of the spec off the table:
- The by-agreement expanded year references via `±YYYY` to represent years
outside the 0000-9999 range
- References to things that cannot become fully-qualified dates, times, or
datetimes: like year, month, week, hour, or minute-resolution references
(via the "week" and "reduced precision" notations)
- Things that are not dates, times, or datetimes like durations and
intervals
- The by-agreement truncated references like `--MMDD`, which were also
removed from the spec
By my reading of the spec, this just leaves support for the extended format
(which we already have), and also ordinal date formats (this PR) and the
basic formats (as José brings up).
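To make the ordinal case concrete, here is a minimal sketch of how an extended-format ordinal date could be turned into a `Date` using only functions that already exist in the standard library. The `OrdinalSketch` module and its `parse/1` function are hypothetical names for illustration; this is not the code in the PR, and it deliberately ignores the basic format.

```elixir
defmodule OrdinalSketch do
  # Hypothetical helper for illustration only; not part of Elixir or the PR.
  # Accepts the extended ordinal form "YYYY-DDD" and nothing else.
  def parse(string) when is_binary(string) do
    with [_, y, d] <- Regex.run(~r/\A(\d{4})-(\d{3})\z/, string),
         year = String.to_integer(y),
         day = String.to_integer(d),
         {:ok, jan1} <- Date.new(year, 1, 1),
         # Day 366 is only valid in leap years.
         true <- day in 1..days_in_year(jan1) do
      # Day 1 is January 1st, so shift by day - 1.
      {:ok, Date.add(jan1, day - 1)}
    else
      _ -> {:error, :invalid_format}
    end
  end

  defp days_in_year(date), do: if(Date.leap_year?(date), do: 366, else: 365)
end
```

With this sketch, `OrdinalSketch.parse("2021-034")` returns `{:ok, ~D[2021-02-03]}`, while `"2021-366"` is rejected because 2021 is not a leap year.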
*How to indicate desired formats?*
I like José's suggestion of supporting a flag, but it gets kind of
complicated as there are several dimensions here even in our reduced case.
Dates, times, and datetimes support either basic or extended notations;
dates and datetimes support calendar dates or ordinal dates; both are
applicable to any parsing.
If we went with this approach I'd lean towards always accepting either form
for one of the dimensions, and using flags to the sigil and parsing
functions to indicate intent for the other.
On the other hand, we are claiming to support ISO-8601 here (with a reduced
surface area of only supporting calendar dates), so I'm more inclined to
say we should accept any permutation of these options.
Someone may only want to support some of these permutations, though the
feedback of "invalid date" from a `from_iso8601` function for something
supported by ISO-8601 would be kind of strange. But if there's strong
support for that, I'd compromise by saying we should accept everything by
default, and then support optional flags to restrict what the parser
views as valid.
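As a rough illustration of "accept everything by default, restrict by option", the sketch below classifies a date string along the two dimensions (basic vs. extended, calendar vs. ordinal) and then checks it against an allow-list. Everything here is an assumption made for the example: the `FormatSketch` module, the `:only` option, and the shape tuples are hypothetical and are not an existing or proposed Elixir API. It only inspects the shape of the string and does not validate the date itself.

```elixir
defmodule FormatSketch do
  # Hypothetical module for illustration; the names and the :only option
  # are assumptions, not an existing or proposed standard library API.

  # Each date shape is a {notation, kind} pair.
  defp shapes do
    [
      {~r/\A\d{4}-\d{2}-\d{2}\z/, {:extended, :calendar}},
      {~r/\A\d{8}\z/, {:basic, :calendar}},
      {~r/\A\d{4}-\d{3}\z/, {:extended, :ordinal}},
      {~r/\A\d{7}\z/, {:basic, :ordinal}}
    ]
  end

  # Returns {:ok, shape} for a recognised shape, :error otherwise.
  def classify(string) when is_binary(string) do
    Enum.find_value(shapes(), :error, fn {regex, shape} ->
      if Regex.match?(regex, string), do: {:ok, shape}
    end)
  end

  # Accepts every shape by default; `only: [...]` narrows what is allowed.
  def accept?(string, opts \\ []) do
    allowed = Keyword.get(opts, :only, Enum.map(shapes(), &elem(&1, 1)))

    case classify(string) do
      {:ok, shape} -> shape in allowed
      :error -> false
    end
  end
end
```

Under these assumptions, `FormatSketch.accept?("2021-034")` is `true`, while `FormatSketch.accept?("2021-034", only: [{:extended, :calendar}])` is `false`, which is the kind of opt-out José's invoice-number example would need.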
On Thursday, February 4, 2021 at 12:59:31 PM UTC-8 Paul Schoenfelder wrote:
> I think in the case of someone who explicitly wants to omit support for
> some specific part of the standard, they should use a custom date/time
> parsing library to handle that, allowing them precise control over what is
> a valid parse. If you choose to use `DateTime.from_iso8601/`, presumably
> you are happy to allow any valid ISO-8601 date/time supported by the
> standard. If you want a subset of ISO-8601, such as RFC-3339, then you'd
> necessarily want something like `DateTime.from_rfc3339/1`, but rather than
> add a proliferation of such APIs to Elixir, I'd propose supporting a wide
> breadth of ISO-8601, and providing support for parsing based on strftime
> format strings, or going a step further and adding support for some kind of
> extensible parsing primitives.
>
> Just to illustrate what I mean: Timex took the approach of supporting two
> format syntaxes out of the box - strftime and the default syntax which aims
> to be a more readable form of strftime. In addition to the primitive
> directives common to both syntaxes, the default syntax added directives to
> parse ISO-8601, RFC-3339, and a variety of other common date/time formats,
> without having to know the correct format string for those standards.
>
> This was all supported via the `Timex.parse/2` and `Timex.parse/3` API,
> where the latter allows one to provide a custom tokenizer for the format
> string. The tokenizer then parses the format string and produces a list of
> directives. A directive here is the primitive that Timex parsing builds on,
> and is either one of the built-in directive types, such as `:year4` , or
> can be a custom parser function. The date/time input string is parsed by
> applying the directives to the input, feeding the unparsed input from the
> previous parser as the input to the next parser, until either all parsers
> were successfully applied, a parsing error occurs, or the input is fully
> consumed. Internally, both strftime and the default format syntax are
> implemented as custom tokenizers on top of the same primitives provided to
> third-party libraries.
>
> In my opinion, I think it was probably a mistake to have named the
> `from_iso8601` function as such, since it sounds like the intent is to
> support more or less RFC-3339. Not much that can be done about that now
> though, which is why it seems to me that the best path forward is to
> provide some kind of support for parsing based on format strings. Obviously
> one doesn't need to go all the way to making it fully extensible like I did
> with Timex, that's just one extreme, but supporting at least strftime seems
> like a good compromise, since that is very common across languages, and
> provides an escape hatch when the behavior of `from_iso8601` isn't enough.
>
> Of course we could just rely on the community to provide libraries that
> support varying degrees of parsing functionality, but I think it's a
> solved-enough problem that providing the core primitives in the standard
> library and having the community build libraries around those primitives
> will result in a better ecosystem. The alternative is that every app has
> like three different custom date/time libraries somewhere in their
> dependency tree, with potentially varying degrees of support for standards
> like ISO-8601.
>
> I don't have a strong feeling one way or the other on how best to resolve
> the original issue here, but I think if we're going to go down the road of
> saying a function that parses ISO-8601 doesn't _really_ parse ISO-8601,
> then the standard library should probably provide facilities for parsing
> based on format strings as an escape hatch. Honestly, after I saw the
> original email to the list, I realized that I at some point made the same
> assumption that was made in the standard library, and I'm planning to fix
> the Timex parser to properly support the spec.
>
> Paul
>
>
> On Thu, Feb 4, 2021, at 7:55 AM, José Valim wrote:
>
> To complement what Kip says, the ISO standard also focuses a lot on what
> is agreed between parties. For example, ISO says you can submit a date as
> 2021-01, as long as both parties agree on that. Does it mean we should
> support 2021-01 on Elixir out of the box?
>
> There is also the argument that, if we parse 2021-034 by default, what
> happens when the user explicitly does not want to support this format? For
> example, my invoices follow precisely the YYYY-NNN format, and if someone
> wrote code that distinguishes between invoice numbers and dates, you may now
> accidentally parse invoices as dates.
>
> So if we want to support this, maybe we should tag it accordingly:
>
> Date.from_iso8601("2020-323", :ordinal)
>
> That can also be the way to support the :basic format, which we currently
> do not support.
>
> On Thu, Feb 4, 2021 at 2:58 AM Christopher Keele <[email protected]>
> wrote:
>
> > I think it's important to be careful on the scope. The ISO8601 spec is
> > vast (I am implementing a new lib that implements the whole thing and it's a
> > huge task).
>
> This is a good point. I'm carefully pulling this particular thread, trying
> not to unweave the whole tapestry, because it does seem in line with what
> the stdlib tries to accommodate today:
>
> - We do not handle century dates, decade dates, etc
> - We do not handle implicit or "relative" dates
> - We do not handle week references
>
> Generally, we only handle explicit year/month/day references to precise
> calendar dates in string parsing. Ordinal dates are an edge case here: they
> can be trivially converted to such with the current calendar protocol.
>
> My cursory reading of the ISO8601 spec suggests that this is possibly the
> only "advanced" date descriptor that is trivial to implement against the
> calendar protocol today, but I'd appreciate your insight into that! I could
> have easily overlooked some features; and by implementing ordinal dates, I
> may be opening a can of worms definitely better left to a library.
> On Wednesday, February 3, 2021 at 5:45:22 PM UTC-8 Kip wrote:
>
> I think it's important to be careful on the scope. The ISO8601 spec is
> vast (I am implementing a new lib that implements the whole thing and it's a
> huge task).
>
> > "I think we should probably consider this a bug, and fix functions which
> parse ISO-8601 dates so that they support what's allowed in the ISO-8601
> spec". This is not a realistic goal for Elixir core I think.
>
> Ordinal dates, explicit and implicit forms of dates, century dates, decade
> dates, ... that's a lot of surface area for maintenance that in the spirit
> of Elixir probably lies better in an external library if required.
>
>
> On Thursday, February 4, 2021 at 9:30:32 AM UTC+8 [email protected]
> wrote:
>
> WIP PR available here: https://github.com/elixir-lang/elixir/pull/10687
>
> I've proved the concept, but want to step back and solicit feedback, get
> some discussion going on explicit points I call out in the PR, and think of
> ways to clean the implementation up a little.
> On Wednesday, February 3, 2021 at 2:53:23 PM UTC-8 [email protected] wrote:
>
> +1
>
>
> On Wed, Feb 3, 2021 at 2:54 PM Christopher Keele <[email protected]>
> wrote:
>
> As observed by @ryanbigg <https://twitter.com/ryanbigg> on Twitter
> <https://twitter.com/ryanbigg/status/1356847900035190786>, *"2021-034"*
> is a valid ISO 8601 date.
>
> Specifically, it is an ordinal date
> <https://en.wikipedia.org/wiki/Ordinal_date> descriptor of the format
> YYYY-DDD. Unlike some of the more exotic ISO 8601 formats, like naming a
> week of the year or a day+month without a year; it does fully describe a
> single date in time.
>
> As Ryan observes, Ruby supports parsing ordinal date strings but Elixir
> does not. Is this something we'd want to add? Honestly the correct
> behaviour here is almost more surprising to me than our lack of support for
> it, but I wanted to field a discussion about it.
>
>
> --
>
> You received this message because you are subscribed to the Google Groups
> "elixir-lang-core" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elixir-lang-core/51e44339-31aa-4ec6-93c8-3ca0f7901926n%40googlegroups.com
>
> <https://groups.google.com/d/msgid/elixir-lang-core/51e44339-31aa-4ec6-93c8-3ca0f7901926n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
>
>
>
> --
>
> Regards,
> Bruce Tate
> CEO
>
>
> Groxio, LLC.
> 512.799.9366 <(512)%20799-9366>
> [email protected]
> grox.io
>