Re: Allow to_date() and to_timestamp() to accept localized names

Juan José Santamaría Flecha Fri, 24 Jan 2020 01:49:25 -0800

On Thu, Jan 23, 2020 at 11:00 PM Tom Lane <t...@sss.pgh.pa.us> wrote:


Thank you for your time looking into this.

Here's a v7 patch, rebased over my recent hacking, and addressing
> most of the complaints I raised in <31691.1579648...@sss.pgh.pa.us>.
> However, I've got some new complaints now that I've looked harder
> at the code:
>
> * It's not exactly apparent to me why this code should be concerned
> about non-normalized characters when noplace else in the backend is.
> I think we should either rip that out entirely, or move the logic
> into str_tolower (and hence also str_toupper etc).  I'd tend to favor
> the former, but of course I don't speak any languages where this would
> be an issue.  Perhaps a better idea is to invent a new SQL function
> that normalizes a given string, and expect users to call that first
> if they'd like to_date() to match unnormalized text.
>
>
There is an open patch that will make the normalization functionality user
visible [1]. So, if a user can call to_date(normalize('01 ŞUB 2010'), 'DD
TMMON YYYY') I would vote to drop the normalization logic inside this patch
altogether.

* I have no faith in this calculation that decides how long the match
> length was:
>
>                 *len = element_len + name_len - norm_len;
>
> I seriously doubt that this does the right thing if either the
> normalization or the downcasing changed the byte length of the
> string.  I'm not actually sure how we can do that correctly.
> There's no good way to know whether the changes happened within
> the part of the "name" string that we matched, or the part beyond
> what we matched, but we only want to count the former.  So this
> seems like a pretty hard problem, and even if this logic is somehow
> correct as it stands, it needs a comment explaining why.
>
>
The proper logic would come from do_to_timestamp() receiving a normalized
"date_txt" input, so we do not operate with unnormalize and normalize
strings at the same time.


> * I'm still concerned about the risk of integer overflow in the
> string allocations in seq_search_localized().  Those need to be
> adjusted to be more paranoid, similar to the code in e.g. str_tolower.
>

Please find attached a patch with the normalization logic removed, thus no
direct allocations in seq_search_localized().

I would like to rise a couple of questions myself:

* When compiled with DEBUG_TO_FROM_CHAR, there is a warning "‘dump_node’
defined but not used". Should we drop this function or uncomment its usage?

* Would it be worth moving str_tolower(localized_name)
from seq_search_localized() into cache_locale_time()?

[1]
https://www.postgresql.org/message-id/014866c8-d7ff-2a4f-c185-cf7e3ceb7028%402ndquadrant.com

Regards,

Juan José Santamaría Flecha

0001-Allow-localized-month-names-to_date-v8.patch
Description: Binary data

Re: Allow to_date() and to_timestamp() to accept localized names

Reply via email to