On 10.12.2024 13:59, Victor Yegorov wrote:
Thu, 5 Dec 2024 at 17:02, Alexander Borisov <lex.bori...@gmail.com>:
[..]
Hey, I had a look at this patch and found its functionality mature and
performant.
Since Peter mentioned pguri, I used it for a comparison with the proposed
extension. This brought up the following differences:
- pguri (uriparser 0.9.8) doesn't support Chinese symbols in the host
part of URI (uri_test1.sh):
ERROR: invalid input syntax for type uri at or near "事例.com#comments"
Therefore, I avoided Chinese or Cyrillic symbols in the pguri test
script.
- There are no set functions in pguri, so changing specific portions of a
URI is troublesome. I used replace() in the test, but this is an
error-prone approach.
- It's even more troublesome to set parts of the URI that are not
initially there. Probably a full decomposition into parts followed by
reassembly is needed.
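As a side note on the decompose-and-reassemble workaround, the pattern can be sketched with Python's standard urllib.parse (this is only an illustration of the general idea, not the API of either extension):

```python
from urllib.parse import urlsplit, urlunsplit

# Set a part that is not initially present (here: a fragment) by
# splitting the URL into its components, substituting the missing
# field, and reassembling the whole URL.
url = "http://example.com/page"

parts = urlsplit(url)                    # scheme, netloc, path, query, fragment
new = parts._replace(fragment="comments")

print(urlunsplit(new))
# http://example.com/page#comments
```

With per-part set functions in the extension, this split/replace/join round trip is unnecessary.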
The suggested extension has no such limitations. Additionally, pguri
extracts userinfo as a whole, while the suggested extension can get/set
the user and password individually.
Running tests (attached) I got the following numbers:
$ ./url_test.sh
NOTICE: extension "url" already exists, skipping
tps = 13068.287423 (without initial connection time)
tps = 12888.937747 (without initial connection time)
tps = 12830.642558 (without initial connection time)
tps = 12846.341411 (without initial connection time)
tps = 13187.955601 (without initial connection time)
$ ./uri_test2.sh
NOTICE: extension "uri" already exists, skipping
tps = 2441.934308 (without initial connection time)
tps = 2513.277660 (without initial connection time)
tps = 2484.641673 (without initial connection time)
tps = 2519.312395 (without initial connection time)
tps = 2512.364492 (without initial connection time)
So it's 12.9k vs 2.5k tps, roughly 5x faster for a case where we replace
5 parts of the original URL.
Given its performance and functionality, I find the suggested URL
extension better than pguri.
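Regarding the failing Chinese-host case above: the host there is an internationalized domain name, and WHATWG-style URL parsers convert such hosts to their ASCII (punycode) form before validating them. A minimal sketch of that conversion using Python's standard idna codec (illustration only, unrelated to either extension's code):

```python
# Convert an internationalized hostname to its ASCII (punycode) form,
# as WHATWG-style URL parsers do before host validation.
host = "事例.com"

ascii_host = host.encode("idna").decode("ascii")
print(ascii_host)  # an "xn--" prefixed ASCII label, e.g. xn--....com

# The conversion round-trips back to the original Unicode form.
assert ascii_host.encode("ascii").decode("idna") == host
```

A parser that applies this mapping can accept such hosts, whereas a strict RFC 3986 parser like uriparser rejects the raw non-ASCII input.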
Thanks for the constructive comments and the testing you have done.
Now to the review part.
1. Applying the patch causes indentation warnings; please bring the
spacing in line with the project's policy:
$ git apply ~/0001-Add-url-data-type-to-contrib.patch
/home/vyegorov/0001-Add-url-data-type-to-contrib.patch:837: indent with spaces.
return lexbor_calloc(1, sizeof(lexbor_array_t));
/home/vyegorov/0001-Add-url-data-type-to-contrib.patch:843: indent with spaces.
if (array == NULL) {
/home/vyegorov/0001-Add-url-data-type-to-contrib.patch:844: indent with spaces.
return LXB_STATUS_ERROR_OBJECT_IS_NULL;
/home/vyegorov/0001-Add-url-data-type-to-contrib.patch:845: indent with spaces.
}
/home/vyegorov/0001-Add-url-data-type-to-contrib.patch:847: indent with spaces.
if (size == 0) {
warning: squelched 6350 whitespace errors
warning: 6355 lines add whitespace errors.
This will be fixed when the main URL parser code is rewritten to fit
the Postgres style/infrastructure.
2. There's a lexbor/ library that contains core and url parts. It feels
like some commentary about what's inside is needed.
Yeah, that's a good point.
3. Do you think it's possible to adjust your code to use the existing
Postgres infrastructure instead? I don't think having its own
infrastructure for a single extension is a good thing. Also, this might
be a source of performance improvements in the core.
Implementing (rewriting/modifying) the URL parser using exclusively
Postgres infrastructure is one of the main goals.
4. There's no user-visible documentation; please add some.
That's a good point.
I've created a commitfest entry for the patch:
https://commitfest.postgresql.org/51/5432/
I was not able to find you; please register a community account and set
yourself as an author of the patch.
Done.
--
Victor Yegorov
--
Alexander Borisov