RE: text fields and performance for ETL
Hi David. > We’re entitled to a bit of self-centeredness here, especially when the > broader world is so varied One can tell ;-) Current text: [cid:image001.png@01D7D233.09B9B710] Suggested current text addendum: But, if you consider doing ETL from Postgres database to some outer target environment and you seek performance in such interfaces, follow the logic of limited size (varchar or char) data types in your database model. Because if you make a dataflow of known size types, interface code can take a big block of data while for text fields you need to check each record. Optional additional remark: Another good aspect of known data sizes is easier understanding of field content and implicit data (length) control. Best regards Grega From: David G. Johnston Sent: Thursday, November 4, 2021 2:51 PM To: Grega Jesih Cc: Tom Lane ; Pg Docs Subject: Re: text fields and performance for ETL On Thursday, November 4, 2021, Grega Jesih mailto:grega.je...@actual-it.si>> wrote: So why would it then be published ? Because inexperienced programmers take your statement that "it's the same performance" from a wrong perspective, so it would be fair to note, that the remark "it's the same performance" is meant "within any operation inside Postgres database". In the moment we want to take this data elsewhere, the problematic lack of model design comes out. If you wish to supply an actual patch for consideration I’d review it. Absent that the documentation serves the vast majority of readers well as-is. We’re entitled to a bit of self-centeredness here, especially when the broader world is so varied. David J. NOTICE - NOT TO BE REMOVED. This e-mail and any attachments are confidential and may contain legally privileged information and/or copyright material of Actual I.T. or third parties. If you are not an authorised recipient of this e-mail, please contact Actual I.T. immediately by return email or by telephone or facsimile on the above numbers. You should not read, print, re-transmit, store or act in reliance on this email or any attachments and you should destroy all copies of them.
Re: text fields and performance for ETL
On Fri, 2021-11-05 at 09:51 +, Grega Jesih wrote: > Suggested current text addendum: > > But, if you consider doing ETL from Postgres database to some outer target > environment and you seek performance in such interfaces, follow the logic > of limited size (varchar or char) data types in your database model. > Because if you make a dataflow of known size types, interface code can take > a big block of data while for text fields you need to check each record. I am opposed to that. It is not our business to discuss the limitations of a certain third-party software product. If that were something wide-spread, perhaps. But I myself have never seen a problem with "text", as long as the actual size of the data is moderate. > Optional additional remark: > > Another good aspect of known data sizes is easier understanding of field > content and implicit data (length) control. Something like that makes more sense to me. Perhaps, right before the tip you quoted, something like that: If your use case requires a length limit on character data, or compliance with the SQL standard is important, use "character varying". Otherwise, you are usually better off with "text". Yours, Laurenz Albe
Re: text fields and performance for ETL
On Fri, Nov 5, 2021 at 03:15:35PM +0100, Laurenz Albe wrote: > On Fri, 2021-11-05 at 09:51 +, Grega Jesih wrote: > > Suggested current text addendum: > > > > But, if you consider doing ETL from Postgres database to some outer target > > environment and you seek performance in such interfaces, follow the logic > > of limited size (varchar or char) data types in your database model. > > Because if you make a dataflow of known size types, interface code can take > > a big block of data while for text fields you need to check each record. > > I am opposed to that. > > It is not our business to discuss the limitations of a certain third-party > software product. > If that were something wide-spread, perhaps. But I myself have never seen a > problem > with "text", as long as the actual size of the data is moderate. Agreed. > > Optional additional remark: > > > > Another good aspect of known data sizes is easier understanding of field > > content and implicit data (length) control. > > Something like that makes more sense to me. > > Perhaps, right before the tip you quoted, something like that: > > If your use case requires a length limit on character data, or compliance > with the SQL standard is important, use "character varying". > Otherwise, you are usually better off with "text". I can support that if others think it is valuable. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: vacuumdb --analyze-in-stages
On 2021-Oct-26, nikolai.berkoff wrote: > Given all the suggestions I've tried to combine them into one patch again. OK, I pushed something very similar but not exactly your patch ... please have a look if you care to suggest improvements. ... oh, I just noticed I made a typo -- omitted "with" in "running this option". I'll wait before pushing a fix in case there are other rewording suggestions. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: text fields and performance for ETL
On Friday, November 5, 2021, Bruce Momjian wrote: > > > > > Perhaps, right before the tip you quoted, something like that: > > > > If your use case requires a length limit on character data, or > compliance > > with the SQL standard is important, use "character varying". > > Otherwise, you are usually better off with "text". > > I can support that if others think it is valuable. > > The motivating complaint is that we should be encouraging people to use varchar(4000) instead of text so external tools can optimize. If we are not going to do that I really don’t see the pointing in changing away from out current position of “only use text”. True length limit requirements for data are rare, and better done in constraints along with all other the other constraint that may exist for the data. I believe comments with respect to the SQL standard are already present and adequate. David J.
Re: text fields and performance for ETL
On Fri, Nov 5, 2021 at 07:32:12AM -0700, David G. Johnston wrote: > On Friday, November 5, 2021, Bruce Momjian wrote: > > > > > > Perhaps, right before the tip you quoted, something like that: > > > > If your use case requires a length limit on character data, or > compliance > > with the SQL standard is important, use "character varying". > > Otherwise, you are usually better off with "text". > > I can support that if others think it is valuable. > > > > The motivating complaint is that we should be encouraging people to use > varchar > (4000) instead of text so external tools can optimize. If we are not going to > do that I really don’t see the pointing in changing away from out current > position of “only use text”. True length limit requirements for data are > rare, > and better done in constraints along with all other the other constraint that > may exist for the data. I believe comments with respect to the SQL standard > are already present and adequate. Agreed. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: text fields and performance for ETL
On Fri, 2021-11-05 at 11:27 -0400, Bruce Momjian wrote: > On Fri, Nov 5, 2021 at 07:32:12AM -0700, David G. Johnston wrote: > > On Friday, November 5, 2021, Bruce Momjian wrote: > > > > > > > > > > Perhaps, right before the tip you quoted, something like that: > > > > > > If your use case requires a length limit on character data, or > > compliance > > > with the SQL standard is important, use "character varying". > > > Otherwise, you are usually better off with "text". > > > > I can support that if others think it is valuable. > > > > > > > > The motivating complaint is that we should be encouraging people to use > > varchar > > (4000) instead of text so external tools can optimize. If we are not going > > to > > do that I really don’t see the pointing in changing away from out current > > position of “only use text”. True length limit requirements for data are > > rare, > > and better done in constraints along with all other the other constraint > > that > > may exist for the data. I believe comments with respect to the SQL standard > > are already present and adequate. > > Agreed. +1, so let's leave it as it is. Yours, Laurenz Albe