Re: delete query using CTE
Le 13/03/2022 à 15:44, Roger Bos a écrit : Hello, trying to use CTE to remove duplicates from a table. The DELETE version does not work, but the SELECT version does, so I am not understanding what the problem is. Any suggestions on how to fix it? Here is my query: WITH cte AS ( SELECT *, ROW_NUMBER() OVER ( PARTITION BY ticker, date ORDER BY ticker, date) my_row_num FROM price_old) DELETE FROM cte WHERE my_row_num > 1; I get the following error: ERROR: relation "cte" does not exist LINE 3: DELETE FROM cte WHERE my_row_num > 1; But when I run change the query to a select query it runs fine (in that it returns all the duplicate rows). For example: WITH cte AS ( SELECT *, ROW_NUMBER() OVER ( PARTITION BY ticker, date ORDER BY ticker, date) my_row_num FROM price_old) SELECT * FROM cte WHERE my_row_num > 1; Sample output: "US00094541" "AAC" "2022-03-08 00:00:00-05" 9.75 9.76 9.75 9.75 100215 9.75 9.76 9.75 9.75 100215 0 1 2 "US00094541" "AAC" "2022-03-09 00:00:00-05" 9.75 9.76 9.75 9.76 111334 9.75 9.76 9.75 9.76 111334 0 1 2 "US9823" "AAC" "2022-03-10 00:00:00-05" 9.75 9.76 9.74 9.74 170474 9.75 9.76 9.74 9.74 170474 0 1 2 "US00090393" "ABCL" "2022-03-08 00:00:00-05" 8.19 8.545 7.81 8.22 1984348 8.19 8.545 7.81 8.22 1984348 0 1 2 Thanks, Roger As Michael Lewis says, you can't use delete from cte but you can build your request to do the work with something like (example from you original request) : WITH cte AS ( SELECT * , ROW_NUMBER() OVER (PARTITION BY a ORDER BY b) my_row_num , ctid -- or pk or other unique identifier if exists FROM price_old ) DELETE FROM tmp24 WHERE ctid IN (SELECT ctid FROM cte WHERE my_row_num = 1); Benj OpenPGP_signature Description: OpenPGP digital signature
Planner choose to use big index instead of smaller one
Hi, I have a table tpoint near that 820Mo with 700K lignes I have created 2 index on this table : CREATE INDEX idx_small_index ON public.tpoint USING btree (match_id); -- Size : 4560 kB CREATE INDEX idx_big_index ON public.tpoint USING btree (version, match_id, playnum, code_action, num_balle ); -- Size : 34 MB If I execute this request EXPLAIN(analyse, buffers) SELECT count(*) FROM tpoint WHERE match_id = 'SM001' The query planner use the idx_small_index as expected -> Index Only Scan using idx_small_index on tpoint (cost=0.42..507.10 rows=624 width=0) (actual time=0.025..0.160 rows=1017 loops=1) Index Cond: (match_id = 'SM001'::bpchar) Heap Fetches: 199 Buffers: shared hit=45 But if I execute this other request EXPLAIN(analyse, buffers) SELECT count(*) FROM tpoint WHERE match_id LIKE 'SM001%' The query planner use idx_big_index -> Index Only Scan using idx_big_index on tpoint (cost=0.42..73940.37 rows=5191 width=0) (actual time=111.014..143.379 rows=1017 loops=1) Filter: (match_id ~~ 'SM001%'::text) Rows Removed by Filter: 636819 Heap Fetches: 132426 Buffers: shared hit=473963 I really don't understand why the planner prefers to use the "big index" instead of the "small" one that containt all needed element to do an INDEX ONLY SCAN. If I deactivate the "big index" UPDATE pg_index SET indisvalid = false WHERE indexrelid = 'idx_big_index'::regclass I can see that for a first explain : -> Index Only Scan using idx_small_index on tpoint (cost=0.42..92107.62 rows=5191 width=0) (actual time=59.980..78.683 rows=1017 loops=1) Filter: (match_id ~~ 'SM001%'::text) Rows Removed by Filter: 636819 Heap Fetches: 132426 Buffers: shared hit=27668 read=564 or that for a second explain (No read): -> Index Only Scan using idx_small_index on tpoint (cost=0.42..92107.62 rows=5191 width=0) (actual time=64.012..81.122 rows=1017 loops=1) Filter: (match_id ~~ 'SM001%'::text) Rows Removed by Filter: 636819 Heap Fetches: 132426 Buffers: shared hit=28232 Tests realized on a docker postgres version = PostgreSQL 15.0 (Debian 15.0-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit SET max_parallel_workers_per_gather TO 0 Thanks for any explanation regards
plpgsql function problem whith creating temp table - not correctly using search_path ?
Hi, I would like to submit a problem (bug ?) that I encountered while handling temporary tables in plpgsql functions. First, if I create a TABLE and a TEMP TABLE with the same name, and I request without specified the schema, the temporary table is used. -- SHOW search_path; -- => "$user", public -- Exectute next commands in the same transaction /* Début Transaction 1 */ DROP TABLE IF EXISTS public.myexemple; DROP TABLE IF EXISTS pg_temp.myexemple; CREATE TABLE IF NOT EXISTS myexemple(i INT); CREATE TEMP TABLE IF NOT EXISTS myexemple(i INT) ON COMMIT DROP; INSERT INTO public.myexemple VALUES(1); INSERT INTO public.myexemple VALUES(2); INSERT INTO public.myexemple VALUES(3); INSERT INTO myexemple VALUES(7); SELECT 'public', * FROM public.myexemple UNION ALL SELECT 'noschema', * FROM myexemple UNION ALL SELECT 'pg_temp', * FROM pg_temp.myexemple; /* Fin Transaction 1 */ => public;1 public;2 public;3 noschema;7 pg_temp;7 Here, all is fine It's the expected behavior. But If I create the the TEMP TABLE in a function, it's (sometimes) the public table which is used and not the temporary table. CREATE OR REPLACE FUNCTION test_search_path_v1(tmp boolean) RETURNS TABLE(ori text, i int) LANGUAGE plpgsql AS $_$ DECLARE txt text; BEGIN CREATE TEMP TABLE IF NOT EXISTS return_table(ori TEXT, i INT) ON COMMIT DROP; IF $1 THEN CREATE TEMP TABLE IF NOT EXISTS my_table(i INT) ON COMMIT DROP; ELSE CREATE TABLE IF NOT EXISTS my_table(i INT); END IF; SELECT setting FROM pg_settings WHERE name = 'search_path' INTO txt; RAISE INFO 'search_path = %', txt; INSERT INTO my_table VALUES((random() * 100)::INT); FOR txt IN SELECT schemaname FROM pg_tables WHERE tablename = 'my_table' LOOP RAISE INFO '==> %', txt; END LOOP; IF EXISTS (SELECT 1 FROM pg_tables WHERE schemaname ='public' AND tablename = 'my_table') THEN RAISE INFO 'public.my_table exists'; INSERT INTO return_table SELECT 'public', t.i FROM public.my_table t; END IF; IF EXISTS (SELECT 1 FROM pg_tables WHERE schemaname LIKE 'pg_temp%' AND tablename = 'my_table') THEN RAISE INFO 'pg_temp.my_table exists'; INSERT INTO return_table SELECT 'pg_temp', t.i FROM pg_temp.my_table t; END IF; INSERT INTO return_table SELECT '', t.i FROM my_table t; RETURN QUERY SELECT t.ori, t.i FROM return_table t; END; $_$; SHOW search_path -- => "$user", public; DROP TABLE IF EXISTS my_table; -- Executing each row on differents transactions but in the same session /*Session A - Transaction 1*/ SELECT * FROM test_search_path_v1(true); -- => OK takes table from pg_temp (no existing table in public) /*Session A - Transaction 2*/ SELECT * FROM test_search_path_v1(false); -- => OK takes table from public /*Session A - Transaction 3*/ SELECT * FROM test_search_path_v1(true); -- => OK takes table from pg_temp (and the existing from public) /*Session A - Transaction 4*/ SELECT * FROM test_search_path_v1(false); -- => OK takes table from public /*Session A - Transaction 5*/ SELECT * FROM test_search_path_v1(true); -- => NOK => it takes public and not pg_temp In fact, if I call 0 or 1 time "test_search_path_v1(false)" all the call with "true" will be ok. But AFTER the second call with false, all subsequent call with true will failed => using public instead of pg_temp for the INSERT INTO my_table VALUES((random() * 100)::INT) If I do the test with changing session before exh call, the problem doesn't appear -- Executing each row on differents session /*Session A */ SELECT * FROM test_search_path_v1(true); -- => OK takes table from pg_temp (no existing table in public) /*Session B */ SELECT * FROM test_search_path_v1(false); -- => OK takes table from public /*Session C */ SELECT * FROM test_search_path_v1(true); -- => OK takes table from pg_temp (and the existing from public) /*Session D */ SELECT * FROM test_search_path_v1(false); -- => OK takes table from public /*Session E */ SELECT * FROM test_search_path_v1(true); -- => OK takes table from pg_temp (and the existing from public) It's possible to bypass te problem with enforce the use of pg_temp like in this second version. CREATE OR REPLACE FUNCTION test_search_path_v2(tmp boolean) RETURNS table(ori text, i int) LANGUAGE plpgsql AS $_$ DECLARE txt text; BEGIN CREATE TEMP TABLE IF NOT EXISTS return_table(ori TEXT, i INT) ON COMMIT DROP; IF $1 THEN PERFORM set_config('search_path', 'pg_temp, "$user", public', true); -- is_local = true CREATE TEMP TABLE IF NOT EXISTS my_table(i INT) ON COMMIT DROP; ELSE PERFORM set_config('search_path', '"$user", public', true); -- is_local = true CREATE TABLE IF NOT EXISTS my_table(i INT); END IF; SELECT setting FROM pg_settings WHERE name = 'search_path' INTO txt; RAISE INFO 'search_path = %', txt; INSERT INTO my_table VALUES((random() * 100)::INT); FOR txt IN SELECT schemaname FROM pg_tables WHERE tablename = 'my_
Re: plpgsql function problem whith creating temp table - not correctly using search_path ?
Hi, Thank you for pointing this part of the documentation. It's actually works with EXECUTE 'INSERT INTO my_table VALUES((random() * 100)::INT);'; INSTEAD OF INSERT INTO my_table VALUES((random() * 100)::INT); And it's possible to suppose that the test_search_path_v2 worked before because of the PERFORM set_config('search_path', '"$user", public', true); I imagine that the changement of search_path have on side effect to invalidate the cached plans here and force to recompute the query plan. I imagine that the probleme appears in test_search_path_v1 after second call maybe because the cache is kept by following rules such as a certain number of executions In this example, use EXECUTE only in INSERT INTO my_table is sufficient to remove the problem. subsequent SELECT works (without the EXECUTE). Does doing an EXECUTE on a table have the side effect of invalidating caches using that table name or is it just a "chance" here and for added security I should use EXECUTE everywhere? Thanks De : "David G. Johnston" A : "benj@laposte.net" ,"pgsql-gene...@postgresql.org" Envoyé: mardi 11 Janvier 2022 15:18 Objet : Re: plpgsql function problem whith creating temp table - not correctly using search_path ? On Tuesday, January 11, 2022, wrote: SHOW search_path -- => "$user", public; DROP TABLE IF EXISTS my_table; -- Executing each row on differents transactions but in the same session /*Session A - Transaction 1*/ SELECT * FROM test_search_path_v1(true); -- => OK takes table from pg_temp (no existing table in public) /*Session A - Transaction 2*/ SELECT * FROM test_search_path_v1(false); -- => OK takes table from public /*Session A - Transaction 3*/ SELECT * FROM test_search_path_v1(true); -- => OK takes table from pg_temp (and the existing from public) /*Session A - Transaction 4*/ SELECT * FROM test_search_path_v1(false); -- => OK takes table from public /*Session A - Transaction 5*/ SELECT * FROM test_search_path_v1(true); -- => NOK => it takes public and not pg_temp Per [1], you are violating: “Because PL/pgSQL saves prepared statements and sometimes execution plans in this way, SQL commands that appear directly in a PL/pgSQL function must refer to the same tables and columns on every execution; that is, you cannot use a parameter as the name of a table or column in an SQL command.” While that wording is odd, it is not even possible to use variables in place of table and column names, what you are doing here is effectively the same. I cannot explain the precise observed behavior, and maybe there is a bug here, but on its face what you are doing is simply invalid in face of how the session relation cache works. [1] https://www.postgresql.org/docs/current/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING David J.
Re: Counting the number of repeated phrases in a column
Le 25/01/2022 à 18:10, Shaozhong SHI a écrit : There is a short of a function in the standard Postgres to do the following: It is easy to count the number of occurrence of words, but it is rather difficult to count the number of occurrence of phrases. For instance: A cell of value: 'Hello World' means 1 occurrence a phrase. A cell of value: 'Hello World World Hello' means no occurrence of any repeated phrase. But, A cell of value: 'Hello World World Hello Hello World' means 2 occurrences of 'Hello World'. 'The City of London, London' also has no occurrences of any repeated phrase. Anyone has got such a function to check out the number of occurrence of any repeated phrases? Regards, David Don't know if it's exactly what you want, but you can replace all occurence of the phrase in the text by empty string and compute the diff between the initial and the result and next divide by the length of your phrase. Example : WITH x AS (SELECT 'toto like tata and toto like titi and toto like tutu' , 'toto like' phrase) SELECT (char_length(texte) - char_length(replace(texte, phrase, ''))) / char_length(phrase) AS nb_occurence FROM x OpenPGP_signature Description: OpenPGP digital signature
Re: Counting the number of repeated phrases in a column
>On Tue, 25 Jan 2022 at 21:33, Ivan Panchenko > >wrote: > > >> >> On 26.01.2022 00:21, benj(dot)dev(at)laposte(dot)net wrote: >> > Le 25/01/2022 à 18:10, Shaozhong SHI a écrit : >> >> There is a short of a function in the standard Postgres to do the >> >> following: >> >> >> >> It is easy to count the number of occurrence of words, but it is >> >> rather difficult to count the number of occurrence of phrases. >> >> >> >> For instance: >> >> >> >> A cell of value: 'Hello World' means 1 occurrence a phrase. >> >> >> >> A cell of value: 'Hello World World Hello' means no occurrence of any >> >> repeated phrase. >> >> >> >> But, A cell of value: 'Hello World World Hello Hello World' means 2 >> >> occurrences of 'Hello World'. >> >> >> >> 'The City of London, London' also has no occurrences of any repeated >> >> phrase. >> >> >> >> Anyone has got such a function to check out the number of occurrence >> >> of any repeated phrases? >> >> >> >> Regards, >> >> >> >> David >> > >> > Don't know if it's exactly what you want, but you can replace all >> > occurence of the phrase in the text by empty string and compute the >> > diff between the initial and the result and next divide by the length >> > of your phrase. >> > >> > Example : >> > WITH x AS (SELECT 'toto like tata and toto like titi and toto like >> > tutu' , 'toto like' phrase) >> > SELECT (char_length(texte) - char_length(replace(texte, phrase, ''))) >> > / char_length(phrase) AS nb_occurence >> > FROM x >> > >> This works if the user knows the phrase. As far as I understood, the >> phrase is not known, and user wants to count number of repeats of any >> phrases. >> Of course this can be done with recursive CTE. Split into words, >> generate all phrases (AFAIK requires recursion), then group and count. >> >> But probably in PL/Perl this could be done more effectively. >> > > >Is there an example of using recursive CTE to split a text string into >words? > > >Regards, > > >David Without recursive, a "brutal" solution may be something like WITH original_text AS (SELECT 'Hello World World Hello my friend Hello World' sentence) , range_to_search AS (SELECT *, generate_series(1,5) gs FROM original_text) -- 1 is the minimal group of word searched, 5 is the maximal grouped word searched , x AS ( SELECT r.sentence, gs , array_to_string((array_agg(rstt.word) OVER (PARTITION BY gs ORDER BY rstt.pos ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING))[1:gs],' ') AS search_words FROM range_to_search r LEFT JOIN LATERAL regexp_split_to_table(r.sentence, ' ') WITH ORDINALITY rstt(word,pos) ON true ) SELECT DISTINCT search_words, (char_length(sentence) - char_length(replace(sentence, search_words, ''))) / NULLIF(char_length(search_words),0) AS nb_occurence FROM x It's also possible to define a minimal number of word accepted
Re: Counting the number of repeated phrases in a column
Le 27/01/2022 à 18:35, Merlin Moncure a écrit : On Thu, Jan 27, 2022 at 11:09 AM Rob Sargent wrote: On 1/27/22 10:03, Merlin Moncure wrote: On Wed, Jan 26, 2022 at 5:23 PM Merlin Moncure wrote: with s as (select 'Hello World Hello World' as sentence) select phrase, array_upper(string_to_array((select sentence from s), phrase), 1) - 1 as occurrances from ( select array_to_string(x, ' ') as phrase from ( select distinct v[a:b] x from regexp_split_to_array((select sentence from s), ' ') v cross join lateral generate_series(1, array_upper(v, 1)) a cross join lateral generate_series(a + 1, array_upper(v, 1)) b ) q ) q; Simplified to: select distinct array_to_string(v[a:b], ' ') phrase, count(*) as occurrences from regexp_split_to_array('Hello World Hello World', ' ') v cross join lateral generate_series(1, array_upper(v, 1)) a cross join lateral generate_series(a + 1, array_upper(v, 1)) b group by 1; phrase │ occurances ─┼ World Hello │ 1 Hello World Hello │ 1 Hello World │ 2 Hello World Hello World │ 1 World Hello World │ 1 merlin And since we're looking for repeated phrases maybe add having count(*) > 1 thanks. also, testing on actual data, I noticed that a couple other things are mandatory, mainly doing a bit of massaging before tokenizing: select distinct array_to_string(v[a:b], ' ') phrase, count(*) as occurrences from ( select array_agg(t) v from ( select trim(replace(unnest(v), E'\n', '')) t from regexp_split_to_array(, ' ') v ) q where length(t) > 1 ) q cross join lateral generate_series(1, array_upper(v, 1)) a cross join lateral generate_series(a + 1, array_upper(v, 1)) b group by 1 having count(*) > 1; We are definitely in N^2 space here, so look for things to start breaking down for sentences > 1000 words. merlin (for better complexity) you may search about "Ukkonen suffix tree" Similar problem as yours : https://www.geeksforgeeks.org/suffix-tree-application-3-longest-repeated-substring/?ref=lbp OpenPGP_signature Description: OpenPGP digital signature