>On Tue, 25 Jan 2022 at 21:33, Ivan Panchenko ><i(dot)panchenko(at)postgrespro(dot)ru> >wrote: > > >> >> On 26.01.2022 00:21, benj(dot)dev(at)laposte(dot)net wrote: >> > Le 25/01/2022 à 18:10, Shaozhong SHI a écrit : >> >> There is a short of a function in the standard Postgres to do the >> >> following: >> >> >> >> It is easy to count the number of occurrence of words, but it is >> >> rather difficult to count the number of occurrence of phrases. >> >> >> >> For instance: >> >> >> >> A cell of value: 'Hello World' means 1 occurrence a phrase. >> >> >> >> A cell of value: 'Hello World World Hello' means no occurrence of any >> >> repeated phrase. >> >> >> >> But, A cell of value: 'Hello World World Hello Hello World' means 2 >> >> occurrences of 'Hello World'. >> >> >> >> 'The City of London, London' also has no occurrences of any repeated >> >> phrase. >> >> >> >> Anyone has got such a function to check out the number of occurrence >> >> of any repeated phrases? >> >> >> >> Regards, >> >> >> >> David >> > >> > Don't know if it's exactly what you want, but you can replace all >> > occurence of the phrase in the text by empty string and compute the >> > diff between the initial and the result and next divide by the length >> > of your phrase. >> > >> > Example : >> > WITH x AS (SELECT 'toto like tata and toto like titi and toto like >> > tutu' , 'toto like' phrase) >> > SELECT (char_length(texte) - char_length(replace(texte, phrase, ''))) >> > / char_length(phrase) AS nb_occurence >> > FROM x >> > >> This works if the user knows the phrase. As far as I understood, the >> phrase is not known, and user wants to count number of repeats of any >> phrases. >> Of course this can be done with recursive CTE. Split into words, >> generate all phrases (AFAIK requires recursion), then group and count. >> >> But probably in PL/Perl this could be done more effectively. >> > > >Is there an example of using recursive CTE to split a text string into >words? > > >Regards, > > >David
Without recursive, a "brutal" solution may be something like WITH original_text AS (SELECT 'Hello World World Hello my friend Hello World' sentence) , range_to_search AS (SELECT *, generate_series(1,5) gs FROM original_text) -- 1 is the minimal group of word searched, 5 is the maximal grouped word searched , x AS ( SELECT r.sentence, gs , array_to_string((array_agg(rstt.word) OVER (PARTITION BY gs ORDER BY rstt.pos ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING))[1:gs],' ') AS search_words FROM range_to_search r LEFT JOIN LATERAL regexp_split_to_table(r.sentence, ' ') WITH ORDINALITY rstt(word,pos) ON true ) SELECT DISTINCT search_words, (char_length(sentence) - char_length(replace(sentence, search_words, ''))) / NULLIF(char_length(search_words),0) AS nb_occurence FROM x It's also possible to define a minimal number of word accepted