Re: delete query using CTE

2022-03-13 Thread benj . dev

Le 13/03/2022 à 15:44, Roger Bos a écrit :
Hello, trying to use CTE to remove duplicates from a table.  The DELETE 
version does not work, but the SELECT version does, so I am not 
understanding what the problem is.  Any suggestions on how to fix it?


Here is my query:

WITH cte AS
( SELECT *, ROW_NUMBER() OVER ( PARTITION BY ticker, date ORDER BY 
ticker, date) my_row_num FROM price_old)

DELETE FROM cte WHERE my_row_num > 1;

I get the following error:

ERROR: relation "cte" does not exist LINE 3: DELETE FROM cte WHERE 
my_row_num > 1;


But when I run change the query to a select query it runs fine (in that 
it returns all the duplicate rows). For example:


WITH cte AS
( SELECT *, ROW_NUMBER() OVER ( PARTITION BY ticker, date ORDER BY 
ticker, date) my_row_num FROM price_old)

SELECT * FROM cte WHERE my_row_num > 1;

Sample output:

"US00094541" "AAC" "2022-03-08 00:00:00-05" 9.75 9.76 9.75 9.75 
100215 9.75 9.76 9.75 9.75 100215 0 1 2
"US00094541" "AAC" "2022-03-09 00:00:00-05" 9.75 9.76 9.75 9.76 
111334 9.75 9.76 9.75 9.76 111334 0 1 2
"US9823" "AAC" "2022-03-10 00:00:00-05" 9.75 9.76 9.74 9.74 
170474 9.75 9.76 9.74 9.74 170474 0 1 2
"US00090393" "ABCL" "2022-03-08 00:00:00-05" 8.19 8.545 7.81 8.22 
1984348 8.19 8.545 7.81 8.22 1984348 0 1 2


Thanks,
Roger



As Michael Lewis says, you can't use delete from cte but you can build 
your request to do the work with something like (example from you 
original request) :


WITH cte AS
( SELECT *
 , ROW_NUMBER() OVER (PARTITION BY a ORDER BY b) my_row_num
 , ctid -- or pk or other unique identifier if exists
 FROM price_old
)
DELETE FROM tmp24
WHERE ctid IN (SELECT ctid FROM cte WHERE my_row_num = 1);

Benj


OpenPGP_signature
Description: OpenPGP digital signature


Planner choose to use big index instead of smaller one

2022-11-23 Thread benj . dev
Hi,

I have a table tpoint near that 820Mo with 700K lignes
I have created 2 index on this table :

CREATE INDEX idx_small_index ON public.tpoint USING btree (match_id);
-- Size : 4560 kB

CREATE INDEX idx_big_index   ON public.tpoint USING btree (version, match_id, 
playnum, code_action, num_balle );
-- Size : 34 MB

If I execute this request
 
EXPLAIN(analyse, buffers)
SELECT count(*) FROM tpoint
WHERE match_id = 'SM001'

The query planner use the idx_small_index as expected

  ->  Index Only Scan using idx_small_index on tpoint  (cost=0.42..507.10 
rows=624 width=0) (actual time=0.025..0.160 rows=1017 loops=1)
    Index Cond: (match_id = 'SM001'::bpchar)
    Heap Fetches: 199
    Buffers: shared hit=45
                
But if I execute this other request

EXPLAIN(analyse, buffers)
SELECT count(*) FROM tpoint
WHERE match_id LIKE 'SM001%'

The query planner use idx_big_index

  ->  Index Only Scan using idx_big_index on tpoint  (cost=0.42..73940.37 
rows=5191 width=0) (actual time=111.014..143.379 rows=1017 loops=1)
    Filter: (match_id ~~ 'SM001%'::text)
    Rows Removed by Filter: 636819
    Heap Fetches: 132426
    Buffers: shared hit=473963
                
I really don't understand why the planner prefers to use the  "big index" 
instead of the "small" one that containt all needed element to do an INDEX ONLY 
SCAN.
 
If I deactivate the "big index"
UPDATE pg_index SET indisvalid = false WHERE indexrelid = 
'idx_big_index'::regclass
 
I can see that for a first explain :
   ->  Index Only Scan using idx_small_index on tpoint  (cost=0.42..92107.62 
rows=5191 width=0) (actual time=59.980..78.683 rows=1017 loops=1)
    Filter: (match_id ~~ 'SM001%'::text)
    Rows Removed by Filter: 636819
    Heap Fetches: 132426
    Buffers: shared hit=27668 read=564
                
or that for a second explain (No read):                
  ->  Index Only Scan using idx_small_index on tpoint  (cost=0.42..92107.62 
rows=5191 width=0) (actual time=64.012..81.122 rows=1017 loops=1)
    Filter: (match_id ~~ 'SM001%'::text)
    Rows Removed by Filter: 636819
    Heap Fetches: 132426
    Buffers: shared hit=28232

Tests realized on a docker postgres
version = PostgreSQL 15.0 (Debian 15.0-1.pgdg110+1) on x86_64-pc-linux-gnu, 
compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
SET max_parallel_workers_per_gather TO 0

Thanks for any explanation‌
regards

‌


plpgsql function problem whith creating temp table - not correctly using search_path ?

2022-01-11 Thread benj . dev
Hi,

I would like to submit a problem (bug ?) that I encountered while handling 
temporary tables in plpgsql functions.

First, if I create a TABLE and a TEMP TABLE with the same name, and I request 
without specified the schema, the temporary table is used.

-- SHOW search_path; -- => "$user", public
-- Exectute next commands in the same transaction
/* Début Transaction 1 */
DROP TABLE IF EXISTS public.myexemple;
DROP TABLE IF EXISTS pg_temp.myexemple;
CREATE TABLE IF NOT EXISTS myexemple(i INT);
CREATE TEMP TABLE IF NOT EXISTS myexemple(i INT) ON COMMIT DROP;
INSERT INTO public.myexemple VALUES(1);
INSERT INTO public.myexemple VALUES(2);
INSERT INTO public.myexemple VALUES(3);
INSERT INTO myexemple VALUES(7);
SELECT 'public', * FROM public.myexemple
 UNION ALL SELECT 'noschema', * FROM myexemple
 UNION ALL SELECT 'pg_temp', * FROM pg_temp.myexemple;
/* Fin Transaction 1 */
=>
public;1
public;2
public;3
noschema;7
pg_temp;7

Here, all is fine It's the expected behavior.

But If I create the the TEMP TABLE in a function, it's (sometimes) the public 
table which is used and not the temporary table.

CREATE OR REPLACE FUNCTION test_search_path_v1(tmp boolean)
    RETURNS TABLE(ori text, i int)
    LANGUAGE plpgsql
    AS $_$
DECLARE
    txt text;
BEGIN
  CREATE TEMP TABLE IF NOT EXISTS return_table(ori TEXT, i INT) ON COMMIT DROP;
 
  IF $1 THEN
  CREATE TEMP TABLE IF NOT EXISTS my_table(i INT) ON COMMIT DROP;
  ELSE
  CREATE TABLE IF NOT EXISTS my_table(i INT);  
  END IF;

  SELECT setting FROM pg_settings WHERE name = 'search_path' INTO txt;
  RAISE INFO 'search_path = %', txt;
 
  INSERT INTO my_table VALUES((random() * 100)::INT);

  FOR txt IN SELECT schemaname FROM pg_tables WHERE tablename = 'my_table' LOOP
  RAISE INFO '==> %', txt;
  END LOOP;
 
  IF EXISTS (SELECT 1 FROM pg_tables WHERE schemaname ='public' AND tablename = 
'my_table') THEN
  RAISE INFO 'public.my_table exists';
  INSERT INTO return_table SELECT 'public', t.i FROM public.my_table t;
  END IF;
 
  IF EXISTS (SELECT 1 FROM pg_tables WHERE schemaname LIKE 'pg_temp%' AND 
tablename = 'my_table') THEN
  RAISE INFO 'pg_temp.my_table exists';
  INSERT INTO return_table SELECT 'pg_temp', t.i FROM pg_temp.my_table t;
  END IF;
 
  INSERT INTO return_table SELECT '', t.i FROM my_table t;
  RETURN QUERY SELECT t.ori, t.i FROM return_table t;
END;
$_$;

SHOW search_path -- => "$user", public;
DROP TABLE IF EXISTS my_table;
-- Executing each row on differents transactions but in the same session
/*Session A - Transaction 1*/ SELECT * FROM test_search_path_v1(true);  -- => 
OK takes table from pg_temp (no existing table in public)
/*Session A - Transaction 2*/ SELECT * FROM test_search_path_v1(false); -- => 
OK takes table from public
/*Session A - Transaction 3*/ SELECT * FROM test_search_path_v1(true);  -- => 
OK takes table from pg_temp (and the existing from public)
/*Session A - Transaction 4*/ SELECT * FROM test_search_path_v1(false); -- => 
OK takes table from public
/*Session A - Transaction 5*/ SELECT * FROM test_search_path_v1(true);  -- => 
NOK => it takes public and not pg_temp


In fact, if I call 0 or 1 time "test_search_path_v1(false)" all the call with 
"true" will be ok.
But AFTER the second call with false, all subsequent call with true will failed
=> using public instead of pg_temp for the INSERT INTO my_table 
VALUES((random() * 100)::INT)

If I do the test with changing session before exh call, the problem doesn't 
appear
-- Executing each row on differents session
/*Session A */ SELECT * FROM test_search_path_v1(true);  -- => OK takes table 
from pg_temp (no existing table in public)
/*Session B */ SELECT * FROM test_search_path_v1(false); -- => OK takes table 
from public
/*Session C */ SELECT * FROM test_search_path_v1(true);  -- => OK takes table 
from pg_temp (and the existing from public)
/*Session D */ SELECT * FROM test_search_path_v1(false); -- => OK takes table 
from public
/*Session E */ SELECT * FROM test_search_path_v1(true);  -- => OK takes table 
from pg_temp (and the existing from public)


It's possible to bypass te problem with enforce the use of pg_temp like in this 
second version.

CREATE OR REPLACE FUNCTION test_search_path_v2(tmp boolean) RETURNS table(ori 
text, i int)
    LANGUAGE plpgsql
    AS $_$
DECLARE
    txt text;
BEGIN
  CREATE TEMP TABLE IF NOT EXISTS return_table(ori TEXT, i INT) ON COMMIT DROP;
 
  IF $1 THEN
  PERFORM set_config('search_path', 'pg_temp, "$user", public', true); -- 
is_local = true
  CREATE TEMP TABLE IF NOT EXISTS my_table(i INT) ON COMMIT DROP;
  ELSE
  PERFORM set_config('search_path', '"$user", public', true); -- is_local = 
true
  CREATE TABLE IF NOT EXISTS my_table(i INT);  
  END IF;

  SELECT setting FROM pg_settings WHERE name = 'search_path' INTO txt;
  RAISE INFO 'search_path = %', txt;
 
  INSERT INTO my_table VALUES((random() * 100)::INT);

  FOR txt IN SELECT schemaname FROM pg_tables WHERE tablename = 'my_

Re: plpgsql function problem whith creating temp table - not correctly using search_path ?

2022-01-11 Thread benj . dev
‌Hi, Thank you for pointing this part of the documentation.
It's actually works with EXECUTE 'INSERT INTO my_table VALUES((random() * 
100)::INT);'; INSTEAD OF INSERT INTO my_table VALUES((random() * 100)::INT);
And it's possible to suppose that the test_search_path_v2 worked before because 
of the 
PERFORM set_config('search_path', '"$user", public', true); 
I imagine that the changement of search_path have on side effect to invalidate 
the cached plans here and force to recompute the query plan.
I imagine that the probleme appears in  test_search_path_v1 after second call 
maybe because  the cache is kept by following rules such as a certain number of 
executions

In this example, use EXECUTE only in INSERT INTO my_table is sufficient to 
remove the problem.
subsequent SELECT works (without the EXECUTE).
Does doing an EXECUTE on a table have the side effect of invalidating caches 
using that table name or is it just a "chance" here and for added security I 
should use EXECUTE everywhere?

Thanks
 

De : "David G. Johnston"
A : "benj@laposte.net" ,"pgsql-gene...@postgresql.org"
Envoyé: mardi 11 Janvier 2022 15:18
Objet : Re: plpgsql function problem whith creating temp table - not correctly 
using search_path ?
 
On Tuesday, January 11, 2022,  wrote:


SHOW search_path -- => "$user", public;
DROP TABLE IF EXISTS my_table;
-- Executing each row on differents transactions but in the same session
/*Session A - Transaction 1*/ SELECT * FROM test_search_path_v1(true);  -- => 
OK takes table from pg_temp (no existing table in public)
/*Session A - Transaction 2*/ SELECT * FROM test_search_path_v1(false); -- => 
OK takes table from public
/*Session A - Transaction 3*/ SELECT * FROM test_search_path_v1(true);  -- => 
OK takes table from pg_temp (and the existing from public)
/*Session A - Transaction 4*/ SELECT * FROM test_search_path_v1(false); -- => 
OK takes table from public
/*Session A - Transaction 5*/ SELECT * FROM test_search_path_v1(true);  -- => 
NOK => it takes public and not pg_temp

 

Per [1], you are violating:

 

“Because PL/pgSQL saves prepared statements and sometimes execution plans in 
this way, SQL commands that appear directly in a PL/pgSQL function must refer 
to the same tables and columns on every execution; that is, you cannot use a 
parameter as the name of a table or column in an SQL command.”

 

While that wording is odd, it is not even possible to use variables in place of 
table and column names, what you are doing here is effectively the same.  I 
cannot explain the precise observed behavior, and maybe there is a bug here, 
but on its face what you are doing is simply invalid in face of how the session 
relation cache works.

 

[1] 
https://www.postgresql.org/docs/current/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING
 

 

David J.

 




Re: Counting the number of repeated phrases in a column

2022-01-25 Thread benj . dev

Le 25/01/2022 à 18:10, Shaozhong SHI a écrit :

There is a short of a function in the standard Postgres to do the following:

It is easy to count the number of occurrence of words, but it is rather 
difficult to count the number of occurrence of phrases.


For instance:

A cell of value:  'Hello World' means 1 occurrence a phrase.

A cell of value: 'Hello World World Hello' means no occurrence of any 
repeated phrase.


But, A cell of value: 'Hello World World Hello Hello World' means 2 
occurrences of 'Hello World'.


'The City of London, London' also has no occurrences of any repeated phrase.

Anyone has got such a function to check out the number of occurrence of 
any repeated phrases?


Regards,

David


Don't know if it's exactly what you want, but you can replace all 
occurence of the phrase in the text by empty string and compute the diff 
between the initial and the result and next divide by the length of your 
phrase.


Example :
WITH x AS (SELECT 'toto like tata and toto like titi and toto like tutu' 
, 'toto like' phrase)
SELECT (char_length(texte) - char_length(replace(texte, phrase, ''))) / 
char_length(phrase) AS nb_occurence

FROM x



OpenPGP_signature
Description: OpenPGP digital signature


Re: Counting the number of repeated phrases in a column

2022-01-26 Thread benj . dev
>On Tue, 25 Jan 2022 at 21:33, Ivan Panchenko 
>
>wrote:
>
>
>>
>> On 26.01.2022 00:21, benj(dot)dev(at)laposte(dot)net wrote:
>> > Le 25/01/2022 à 18:10, Shaozhong SHI a écrit :
>> >> There is a short of a function in the standard Postgres to do the
>> >> following:
>> >>
>> >> It is easy to count the number of occurrence of words, but it is
>> >> rather difficult to count the number of occurrence of phrases.
>> >>
>> >> For instance:
>> >>
>> >> A cell of value:  'Hello World' means 1 occurrence a phrase.
>> >>
>> >> A cell of value: 'Hello World World Hello' means no occurrence of any
>> >> repeated phrase.
>> >>
>> >> But, A cell of value: 'Hello World World Hello Hello World' means 2
>> >> occurrences of 'Hello World'.
>> >>
>> >> 'The City of London, London' also has no occurrences of any repeated
>> >> phrase.
>> >>
>> >> Anyone has got such a function to check out the number of occurrence
>> >> of any repeated phrases?
>> >>
>> >> Regards,
>> >>
>> >> David
>> >
>> > Don't know if it's exactly what you want, but you can replace all
>> > occurence of the phrase in the text by empty string and compute the
>> > diff between the initial and the result and next divide by the length
>> > of your phrase.
>> >
>> > Example :
>> > WITH x AS (SELECT 'toto like tata and toto like titi and toto like
>> > tutu' , 'toto like' phrase)
>> > SELECT (char_length(texte) - char_length(replace(texte, phrase, '')))
>> > / char_length(phrase) AS nb_occurence
>> > FROM x
>> >
>> This works if the user knows the phrase. As far as I understood, the
>> phrase is not known, and user wants to count number of repeats of any
>> phrases.
>> Of course this can be done with recursive CTE. Split into words,
>> generate all phrases (AFAIK requires recursion), then group and count.
>>
>> But probably in PL/Perl this could be done  more effectively.
>>
>
>
>Is there an example of using recursive CTE to split a text string into
>words?
>
>
>Regards,
>
>
>David

Without recursive, a "brutal" solution may be something like

WITH original_text AS (SELECT 'Hello World World Hello my friend Hello World' 
sentence)
, range_to_search AS (SELECT *, generate_series(1,5) gs FROM original_text) -- 
1 is the minimal group of word searched, 5 is the maximal grouped word searched
, x AS (
SELECT r.sentence, gs
, array_to_string((array_agg(rstt.word) OVER (PARTITION BY gs ORDER BY rstt.pos 
ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING))[1:gs],' ') AS search_words
FROM range_to_search r
LEFT JOIN LATERAL regexp_split_to_table(r.sentence, ' ') WITH ORDINALITY 
rstt(word,pos) ON true
)
SELECT DISTINCT search_words, (char_length(sentence) - 
char_length(replace(sentence, search_words, '')))
/ NULLIF(char_length(search_words),0) AS nb_occurence
FROM x

It's also possible to define a minimal number of word accepted

 

Re: Counting the number of repeated phrases in a column

2022-01-27 Thread benj . dev



Le 27/01/2022 à 18:35, Merlin Moncure a écrit :

On Thu, Jan 27, 2022 at 11:09 AM Rob Sargent  wrote:


On 1/27/22 10:03, Merlin Moncure wrote:

On Wed, Jan 26, 2022 at 5:23 PM Merlin Moncure  wrote:

with s as (select 'Hello World Hello World' as sentence)
select
   phrase,
   array_upper(string_to_array((select sentence from s), phrase), 1) -
1 as occurrances
from
(
   select array_to_string(x, ' ') as phrase
   from
   (
 select distinct v[a:b]  x
 from regexp_split_to_array((select sentence from s), ' ') v
 cross join lateral generate_series(1, array_upper(v, 1)) a
 cross join lateral generate_series(a + 1, array_upper(v, 1)) b
   ) q
) q;

Simplified to:
select distinct array_to_string(v[a:b], ' ') phrase, count(*) as occurrences
from regexp_split_to_array('Hello World Hello World', ' ') v
cross join lateral generate_series(1, array_upper(v, 1)) a
cross join lateral generate_series(a + 1, array_upper(v, 1)) b
group by 1;

  phrase  │ occurances
─┼
  World Hello │  1
  Hello World Hello   │  1
  Hello World │  2
  Hello World Hello World │  1
  World Hello World   │  1

merlin


And since we're looking for repeated phrases maybe add

having count(*) > 1


thanks.  also, testing on actual data, I noticed that a couple other
things are mandatory, mainly doing a bit of massaging before
tokenizing:

select distinct array_to_string(v[a:b], ' ') phrase, count(*) as occurrences
from
(
   select array_agg(t) v
   from
   (
 select trim(replace(unnest(v), E'\n', '')) t
 from regexp_split_to_array(, ' ') v
   ) q
   where length(t) > 1
) q
cross join lateral generate_series(1, array_upper(v, 1)) a
cross join lateral generate_series(a + 1, array_upper(v, 1)) b
group by 1
having count(*) > 1;

We are definitely in N^2 space here, so look for things to start
breaking down for sentences > 1000 words.

merlin



(for better complexity) you may search about "Ukkonen suffix tree"
Similar problem as yours : 
https://www.geeksforgeeks.org/suffix-tree-application-3-longest-repeated-substring/?ref=lbp





OpenPGP_signature
Description: OpenPGP digital signature