чт, 5 дек. 2024 г. в 17:02, Alexander Borisov <lex.bori...@gmail.com>:

> My name is Alexander Borisov, I want to propose/discuss
> adding a new URL type as an extension in PostgreSQL contrib.
> I think everyone here knows/seen what URL/URI and their basics.
> If someone is interested in the basics, you can read the original
> RFC 1738 [1].
>
> ...
>
> I am willing to take care of the implementation of the new data type
> and its further support.  If it's of interest to the community.
>

Hey, I had a look at this patch and found its functionality mature and
performant.

As Peter mentioned pguri, I used it to compare with the proposed extension.
This brought up
the following differences:
- pguri (uriparser 0.9.8) doesn't support Chinese symbols in the host part
of URI (uri_test1.sh):

    ERROR:  invalid input syntax for type uri at or near "事例.com#comments
<http://xn--3kq3x.com#comments>"

  Therefore, I avoided Chinese or Cyrillic symbols in the pguri test script.
- There are no SET functions in the pguri, changing specific portions of
URI is troublesome. I used
  replace() in the test, but this is an error prone approach.
- It's even more troublesome to set parts of the URI that are not initially
there. Probably, a full decomposition
  into parts and the following wrap up is needed
Suggested extension has no such limitations. Additionally, pguri extracts
userinfo as a whole,
while suggested extension can get/set user and password individually.


Running tests (attached) I got the following numbers:
$ ./url_test.sh
NOTICE:  extension "url" already exists, skipping
tps = 13068.287423 (without initial connection time)
tps = 12888.937747 (without initial connection time)
tps = 12830.642558 (without initial connection time)
tps = 12846.341411 (without initial connection time)
tps = 13187.955601 (without initial connection time)

$ ./uri_test2.sh
NOTICE:  extension "uri" already exists, skipping
tps = 2441.934308 (without initial connection time)
tps = 2513.277660 (without initial connection time)
tps = 2484.641673 (without initial connection time)
tps = 2519.312395 (without initial connection time)
tps = 2512.364492 (without initial connection time)

So it's 12.9k vs 2.5k, or 6x faster for a case where we replace 5 parts of
the original URL.

Given its performance and functionality, I find the suggested URL extension
better than pguri.


Now to the review part.

1. Applying patch causes indentation warning, please, bring spacing to the
project's policy

$ git apply ~/0001-Add-url-data-type-to-contrib.patch
/home/vyegorov/0001-Add-url-data-type-to-contrib.patch:837: indent with
spaces.
    return lexbor_calloc(1, sizeof(lexbor_array_t));
/home/vyegorov/0001-Add-url-data-type-to-contrib.patch:843: indent with
spaces.
    if (array == NULL) {
/home/vyegorov/0001-Add-url-data-type-to-contrib.patch:844: indent with
spaces.
        return LXB_STATUS_ERROR_OBJECT_IS_NULL;
/home/vyegorov/0001-Add-url-data-type-to-contrib.patch:845: indent with
spaces.
    }
/home/vyegorov/0001-Add-url-data-type-to-contrib.patch:847: indent with
spaces.
    if (size == 0) {
warning: squelched 6350 whitespace errors
warning: 6355 lines add whitespace errors.

2. There's a lexbor/ library that contains core and url parts. Feels like
some commentary about what's
  inside is required.

3. Do you think it's possible to adjust your code to use existing postgres
infrastructure instead? I don't
  think having its own infrastructure for a single extension is a good
thing. Also, this might be a source
  for performance improvements in the core.

4. There's no user visible documentation, please, add one.

I've created a commitfest entry for the patch:
https://commitfest.postgresql.org/51/5432/
I was not able to find you, please, register a community account and set
yourself as an author for the patch.

-- 
Victor Yegorov
#!/bin/bash

export PATH=$PATH:$PWD/bin
dbname=postgres
secs=30
psql -c "alter system set max_parallel_workers_per_gather = 0;" $dbname > 
/dev/null
psql -c "select pg_reload_conf();" $dbname > /dev/null
psql -c "create extension if not exists url;" $dbname > /dev/null


cat << EOSQL > bench.sql
WITH u(url) AS MATERIALIZED (
  SELECT 'https://example.com#comments'::url
)
SELECT url_fragment_set(
           url_path_set(
             url_port_set(
               url_host_set(
                 url_password_set(
                   url_username_set(
                     url_scheme_set( url, 'wss' )
                   , 'guest' )
                 , '12345' )
               , '事例.com' )
             , '8080' )
           , '/а/б/в' )
       , 'comment' )
  FROM u;
EOSQL

for i in {1..5}
do
        pgbench -n -f bench.sql -M prepared -T $secs $dbname | grep tps
done
#!/bin/bash

export PATH=$PATH:$PWD/bin
dbname=postgres
secs=30
psql -c "alter system set max_parallel_workers_per_gather = 0;" $dbname > 
/dev/null
psql -c "select pg_reload_conf();" $dbname > /dev/null
psql -c "create extension if not exists uri;" $dbname > /dev/null


cat << EOSQL > bench.sql
WITH u(uri) AS MATERIALIZED (
  SELECT 'https://example.com#comments'::uri
)
SELECT regexp_replace( uri::text, uri_host( uri ), '事例.com' )::uri
  FROM u;
EOSQL

for i in {1..5}
do
        pgbench -n -f bench.sql -M prepared -T $secs $dbname | grep tps
done
#!/bin/bash

export PATH=$PATH:$PWD/bin
dbname=postgres
secs=30
psql -c "alter system set max_parallel_workers_per_gather = 0;" $dbname > 
/dev/null
psql -c "select pg_reload_conf();" $dbname > /dev/null
psql -c "create extension if not exists uri;" $dbname > /dev/null


cat << EOSQL > bench.sql
WITH u(uri) AS MATERIALIZED (
  SELECT 'https://root:qe...@example.com:80/a/b/c#comments'::uri
), a(uri) AS (
  SELECT replace( uri::text, uri_scheme( uri ), 'wss' )::uri FROM u
), b(uri) AS (
  SELECT replace( uri::text, uri_userinfo( uri ), '123:abc' )::uri FROM a
), c(uri) AS (
  SELECT replace( uri::text, uri_port( uri )::text, '5432' )::uri FROM b
), d(uri) AS (
  SELECT replace( uri::text, uri_path( uri )::text, '/d/e/f' )::uri FROM c
)
SELECT replace( uri::text, uri_fragment( uri )::text, 'version' )::uri FROM d;
EOSQL

for i in {1..5}
do
        pgbench -n -f bench.sql -T $secs $dbname | grep tps
done

Reply via email to