Slow down dev database transactions/second for testing?
Hi,

I maintain a Django webapp that uses PostgreSQL and can create inefficient
queries if I'm not careful. I'm looking for ways to mimic a congested db
server in development to expose these queries.

The configuration of PostgreSQL is complicated - is there a simple method
by which I could, for example, limit the number of transactions/second to
a certain level by adjusting postgresql.conf?

Many thanks!

-- 
R. Richard Brockie
Real-time bicycle race management - www.ontheday.net
Re: Slow down dev database transactions/second for testing?
> On 05/02/2023 23:17 CET Richard Brockie wrote:
>
> I maintain a Django webapp that uses PostgreSQL and can create inefficient
> queries if I'm not careful. I'm looking for ways to mimic a congested db
> server in development to expose these queries.

pgbench is what you're looking for:
https://www.postgresql.org/docs/current/pgbench.html

You can run custom statements with the --file option. Get the statements
that Django generates and let pgbench run those to analyze the bottlenecks.
Or let pgbench create load for some time (see option --time) while you
debug your Django app.

> The configuration of PostgreSQL is complicated - is there a simple method
> by which I could, for example, limit the number of transactions/second to
> a certain level by adjusting postgresql.conf?

No. Postgres will execute as fast as possible with the available resources.

-- 
Erik
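For concreteness, a minimal sketch of that workflow (the script contents,
table and column names, and the database name "devdb" are placeholders,
not from this thread):

    -- queries.sql: statements captured from Django, e.g. from the
    -- PostgreSQL log or django.db.connection.queries with DEBUG on
    SELECT * FROM race_results WHERE rider_name LIKE '%smith%';

Replaying the script under concurrent load then looks like this
(--no-vacuum is needed because the target database lacks the standard
pgbench tables):

    pgbench --no-vacuum --client=20 --time=60 --file=queries.sql devdb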
Question regarding UTF-8 data and "C" collation on definition of field of table
Hello all,

I have a question regarding the definition of the type of a character
field in a table, and more specifically about its collation and UTF-8
characters and strings.

Let's say that the definition is, for example, as follows:

    name character varying(8) COLLATE pg_catalog."C" NOT NULL

and also assume that the database default encoding is UTF8 and that the
Collate and Ctype are "C". I plan to store strings of various languages in
this field.

Are these the correct settings that I should have used on creation of the
database?

Thank you in advance!

Kindest regards,

Dionisis Kontominas
Re: Question regarding UTF-8 data and "C" collation on definition of field of table
Dionisis Kontominas writes:
> Let's say that the definition is, for example, as follows:
> name character varying(8) COLLATE pg_catalog."C" NOT NULL
> and also assume that the database default encoding is UTF8 and that the
> Collate and Ctype are "C". I plan to store strings of various languages
> in this field.

> Are these the correct settings that I should have used on creation of
> the database?

Well, it won't crash or anything, but sorting will be according to
byte-by-byte values. So the sort order of non-ASCII text is likely to look
odd. How much do you care about that?

regards, tom lane
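A quick sketch of what "odd" looks like in practice (the values are purely
illustrative):

    -- under "C", non-ASCII characters sort by their UTF-8 byte values,
    -- so 'Éclair' (É = bytes 0xC3 0x89) lands after every ASCII string:
    SELECT w
    FROM (VALUES ('apple'), ('Éclair'), ('zebra')) AS t(w)
    ORDER BY w COLLATE "C";
    -- returns: apple, zebra, Éclair

A linguistic collation such as en_US would instead give: apple, Éclair,
zebra.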
Re: Question regarding UTF-8 data and "C" collation on definition of field of table
Hello Tom,

Thank you for your response.

I suppose that affects the outcome of ORDER BY clauses on the field, along
with the content of the indexes. Is this right?

Assuming that the requirement exists to store UTF-8 characters, possibly
from multiple languages, in a field, and that the database default encoding
is UTF8, which is the right thing I suppose (please verify), what do you
think the values of the Collation and Ctype should be for the database to
behave correctly?

I could not find something specific in the documentation. What I did find
interesting though is the statement below, from section 24.2.2.1, "Standard
Collations":

"Additionally, the SQL standard collation name ucs_basic is available for
encoding UTF8. It is equivalent to C and sorts by Unicode code point."

Is this the right collation for the creation of the database in this use
case? If so, what would be the corresponding suitable Ctype?

Regards,
Dionisis

On Mon, 6 Feb 2023 at 00:24, Tom Lane wrote:
> Dionisis Kontominas writes:
> > Let's say that the definition is, for example, as follows:
> > name character varying(8) COLLATE pg_catalog."C" NOT NULL
> > and also assume that the database default encoding is UTF8 and that the
> > Collate and Ctype are "C". I plan to store strings of various languages
> > in this field.
>
> > Are these the correct settings that I should have used on creation of
> > the database?
>
> Well, it won't crash or anything, but sorting will be according to
> byte-by-byte values. So the sort order of non-ASCII text is likely to
> look odd. How much do you care about that?
>
> regards, tom lane
Re: Question regarding UTF-8 data and "C" collation on definition of field of table
Dionisis Kontominas writes:
> I suppose that affects the outcome of ORDER BY clauses on the field,
> along with the content of the indexes. Is this right?

Yeah.

> Assuming that the requirement exists to store UTF-8 characters, possibly
> from multiple languages, in a field, and that the database default
> encoding is UTF8, which is the right thing I suppose (please verify),
> what do you think the values of the Collation and Ctype should be for
> the database to behave correctly?

Um ... so define "correct". If you have a mishmash of languages in the
same column, it's likely that they have conflicting rules about sorting,
and there may be no ordering that's not surprising to somebody.

If there's a predominant language in the data, selecting a collation
matching that seems like your best bet. Otherwise, maybe you should just
shrug your shoulders and stick with C collation. It's likely to be faster
than any alternative.

regards, tom lane
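A collation for the predominant language can also be pinned per column or
per query rather than database-wide. A sketch, assuming a Greek-dominant
column and that the named locale exists on the system (the table is
hypothetical; check pg_collation for the locale names actually available):

    CREATE TABLE riders (
        name varchar(50) COLLATE "el_GR.utf8" NOT NULL
    );

    -- or apply it only where linguistic order matters, keeping "C" elsewhere:
    SELECT name FROM riders ORDER BY name COLLATE "el_GR.utf8";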
Re: Question regarding UTF-8 data and "C" collation on definition of field of table
Hi Tom,

1. Regarding the different languages in the same column, that is normal if
the column is a UTF-8 one, i.e. it should be able to hold, for example,
English, Greek and Chinese characters. In this case, what is the best
approach to define the collation and lctype of the column? Either C or
ucs_basic maybe, or something else, or does it not matter, and the
characters would be stored correctly as long as the database is UTF8
encoding?

2. Is there a real difference between C and ucs_basic collation/lctype? I
have looked at the documentation and on the internet, and most information
converges on them being actually the same as far as behavior is concerned;
actually no difference. What I suspect might differ is that C sorts the
characters after the ASCII part according to byte value, whereas ucs_basic
sorts those characters according to the Unicode code point, which may be
different, I suppose. Can you confirm?

3. In case the previous is correct, I think that the most suitable choice
for Unicode columns would be to set the collation to ucs_basic and the
lctype accordingly, if it exists.

Regards,
Dionisis

On Mon, 6 Feb 2023 at 01:19, Tom Lane wrote:
> Dionisis Kontominas writes:
> > I suppose that affects the outcome of ORDER BY clauses on the field,
> > along with the content of the indexes. Is this right?
>
> Yeah.
>
> > Assuming that the requirement exists to store UTF-8 characters,
> > possibly from multiple languages, in a field, and that the database
> > default encoding is UTF8, which is the right thing I suppose (please
> > verify), what do you think the values of the Collation and Ctype
> > should be for the database to behave correctly?
>
> Um ... so define "correct". If you have a mishmash of languages in the
> same column, it's likely that they have conflicting rules about sorting,
> and there may be no ordering that's not surprising to somebody.
>
> If there's a predominant language in the data, selecting a collation
> matching that seems like your best bet. Otherwise, maybe you should just
> shrug your shoulders and stick with C collation. It's likely to be
> faster than any alternative.
>
> regards, tom lane
Re: Question regarding UTF-8 data and "C" collation on definition of field of table
Why are you specifying the collation to be "C" when the default db encoding
is UTF8, and UTF-8 has Greek, Chinese and English encodings?

On 2/5/23 17:08, Dionisis Kontominas wrote:
> Hello all,
>
> I have a question regarding the definition of the type of a character
> field in a table, and more specifically about its collation and UTF-8
> characters and strings.
>
> Let's say that the definition is, for example, as follows:
>
>     name character varying(8) COLLATE pg_catalog."C" NOT NULL
>
> and also assume that the database default encoding is UTF8 and that the
> Collate and Ctype are "C". I plan to store strings of various languages
> in this field.
>
> Are these the correct settings that I should have used on creation of
> the database?
>
> Thank you in advance!
>
> Kindest regards,
>
> Dionisis Kontominas

-- 
Born in Arizona, moved to Babylonia.
Re: Question regarding UTF-8 data and "C" collation on definition of field of table
Because if I don't specify the collation/lctype, it seems to get the
default from the OS, which in my case is English_Netherlands.1252 (database
encoding UTF8). That might not be best for truly Unicode content columns,
so I investigated the "C" option, which also seems not to work, and might
be worse.

To reframe my question: when you expect multilingual data in a column and
the database encoding is UTF8, which seems to accommodate the need for
storage, what could be considered best practice (if one can really exist)
for collation and lctype?

On Mon, 6 Feb 2023 at 01:57, Ron wrote:
> Why are you specifying the collation to be "C" when the default db
> encoding is UTF8, and UTF-8 has Greek, Chinese and English encodings?
>
> On 2/5/23 17:08, Dionisis Kontominas wrote:
> > Hello all,
> >
> > I have a question regarding the definition of the type of a character
> > field in a table, and more specifically about its collation and UTF-8
> > characters and strings.
> >
> > Let's say that the definition is, for example, as follows:
> >
> >     name character varying(8) COLLATE pg_catalog."C" NOT NULL
> >
> > and also assume that the database default encoding is UTF8 and that
> > the Collate and Ctype are "C". I plan to store strings of various
> > languages in this field.
> >
> > Are these the correct settings that I should have used on creation of
> > the database?
> >
> > Thank you in advance!
> >
> > Kindest regards,
> >
> > Dionisis Kontominas
>
> -- 
> Born in Arizona, moved to Babylonia.
Re: Question regarding UTF-8 data and "C" collation on definition of field of table
On Sun, Feb 5, 2023 at 4:19 PM Tom Lane wrote:
> If there's a predominant language in the data, selecting a collation
> matching that seems like your best bet. Otherwise, maybe you should just
> shrug your shoulders and stick with C collation. It's likely to be
> faster than any alternative.

FWIW there are certain "compromise locales" supported by ICU/CLDR. These
include "English (Europe)" and, most notably, EOR (European Ordering
Rules):

https://en.wikipedia.org/wiki/European_ordering_rules

EOR seems to have been standardized by the EU or by an adjacent
institution, but I'm not sure how widely used it really is.

It's also possible to use a custom collation with ICU, which is almost
infinitely flexible:

http://www.unicode.org/reports/tr10/#Customization

As an example, the rules about the relative ordering of each script can be
changed this way. There is also something called merged tailorings. The OP
should see the Postgres ICU docs for hints on how to use these facilities
to make a custom collation that matches whatever their requirements are:

https://www.postgresql.org/docs/current/collation.html#COLLATION-MANAGING

-- 
Peter Geoghegan
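A sketch of the script-reordering case, assuming PostgreSQL was built with
ICU support (the collation name here is made up; "kr" is the ICU locale
keyword for reorder codes):

    -- sort the Greek script before Latin
    CREATE COLLATION greek_first (
        provider = icu,
        locale = 'und-u-kr-grek-latn'
    );

    -- hypothetical usage on a hypothetical table:
    SELECT name FROM people ORDER BY name COLLATE greek_first;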
Re: Question regarding UTF-8 data and "C" collation on definition of field of table
Dionisis Kontominas writes:
> 1. Regarding the different languages in the same column, that is normal
> if the column is a UTF-8 one, i.e. it should be able to hold, for
> example, English, Greek and Chinese characters. In this case, what is
> the best approach to define the collation and lctype of the column?
> Either C or ucs_basic maybe, or something else, or does it not matter,
> and the characters would be stored correctly as long as the database is
> UTF8 encoding?

The characters will be stored correctly, yes. Collation only affects sort
order. Ctype affects some other functions like upper/lowercase folding and
what is considered a "letter" in regexps. If you use "C" for lc_ctype then
only ASCII letters will be folded or recognized as letters.

> 2. Is there a real difference between C and ucs_basic collation/lctype?

No, at least not in Postgres.

regards, tom lane
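A quick illustration of that lc_ctype effect (a sketch; "en_US.utf8" is an
assumed system locale, so the exact name may differ):

    SELECT upper('déjà vu' COLLATE "C");
    -- 'DéJà VU' : only the ASCII letters are folded

    SELECT upper('déjà vu' COLLATE "en_US.utf8");
    -- 'DÉJÀ VU' : the accented letters are folded too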
Re: Slow down dev database transactions/second for testing?
On Sun, Feb 5, 2023 at 3:05 PM Erik Wienhold wrote:
> > On 05/02/2023 23:17 CET Richard Brockie wrote:
> >
> > I maintain a Django webapp that uses PostgreSQL and can create
> > inefficient queries if I'm not careful. I'm looking for ways to mimic
> > a congested db server in development to expose these queries.
>
> pgbench is what you're looking for:
> https://www.postgresql.org/docs/current/pgbench.html
>
> You can run custom statements with the --file option. Get the statements
> that Django generates and let pgbench run those to analyze the
> bottlenecks. Or let pgbench create load for some time (see option
> --time) while you debug your Django app.

Great - thanks for the suggestion.

> > The configuration of PostgreSQL is complicated - is there a simple
> > method by which I could, for example, limit the number of
> > transactions/second to a certain level by adjusting postgresql.conf?
>
> No. Postgres will execute as fast as possible with the available
> resources.

Understood - thanks again!

-- 
R. Richard Brockie
Real-time bicycle race management - www.ontheday.net