RE: text fields and performance for ETL

2021-11-05 Thread Grega Jesih
Hi David.

> We’re entitled to a bit of self-centeredness here, especially when the 
> broader world is so varied

One can tell ;-)

Current text:

[cid:image001.png@01D7D233.09B9B710]


Suggested current text addendum:

But, if you consider doing ETL from Postgres database to some outer target 
environment and you seek performance in such interfaces, follow the logic of 
limited size (varchar or char) data types in your database model. Because if 
you make a dataflow of known size types, interface code can take a big block of 
data while for text fields you need to check each record.

Optional additional remark:

Another good aspect of known data sizes is easier understanding of field 
content and implicit data (length) control.


Best regards
Grega

From: David G. Johnston 
Sent: Thursday, November 4, 2021 2:51 PM
To: Grega Jesih 
Cc: Tom Lane ; Pg Docs 
Subject: Re: text fields and performance for ETL

On Thursday, November 4, 2021, Grega Jesih 
mailto:grega.je...@actual-it.si>> wrote:

So why would it then be published ? Because inexperienced programmers take your 
statement that "it's the same performance" from a wrong perspective, so it 
would be fair to note, that the remark "it's the same performance" is meant 
"within any operation inside Postgres database". In the moment we want to take 
this data elsewhere,
the problematic lack of model design comes out.
If you wish to supply an actual patch for consideration I’d review it.  Absent 
that the documentation serves the vast majority of readers well as-is.  We’re 
entitled to a bit of self-centeredness here, especially when the broader world 
is so varied.

David J.


NOTICE - NOT TO BE REMOVED.
This e-mail and any attachments are confidential and may contain legally 
privileged information and/or copyright material of Actual I.T. or third 
parties. If you are not an authorised recipient of this e-mail, please contact 
Actual I.T. immediately by return email or by telephone or facsimile on the 
above numbers.
You should not read, print, re-transmit, store or act in reliance on this email 
or any attachments and you should destroy all copies of them.


Re: text fields and performance for ETL

2021-11-05 Thread Laurenz Albe
On Fri, 2021-11-05 at 09:51 +, Grega Jesih wrote:
> Suggested current text addendum:
> 
> But, if you consider doing ETL from Postgres database to some outer target
> environment and you seek performance in such interfaces, follow the logic
> of limited size (varchar or char) data types in your database model.
> Because if you make a dataflow of known size types, interface code can take
> a big block of data while for text fields you need to check each record.

I am opposed to that.

It is not our business to discuss the limitations of a certain third-party 
software product.
If that were something wide-spread, perhaps.  But I myself have never seen a 
problem
with "text", as long as the actual size of the data is moderate.
 
> Optional additional remark:
> 
> Another good aspect of known data sizes is easier understanding of field
> content and implicit data (length) control.

Something like that makes more sense to me.

Perhaps, right before the tip you quoted, something like that:

  If your use case requires a length limit on character data, or compliance
  with the SQL standard is important, use "character varying".
  Otherwise, you are usually better off with "text".

Yours,
Laurenz Albe





Re: text fields and performance for ETL

2021-11-05 Thread Bruce Momjian
On Fri, Nov  5, 2021 at 03:15:35PM +0100, Laurenz Albe wrote:
> On Fri, 2021-11-05 at 09:51 +, Grega Jesih wrote:
> > Suggested current text addendum:
> > 
> > But, if you consider doing ETL from Postgres database to some outer target
> > environment and you seek performance in such interfaces, follow the logic
> > of limited size (varchar or char) data types in your database model.
> > Because if you make a dataflow of known size types, interface code can take
> > a big block of data while for text fields you need to check each record.
> 
> I am opposed to that.
> 
> It is not our business to discuss the limitations of a certain third-party 
> software product.
> If that were something wide-spread, perhaps.  But I myself have never seen a 
> problem
> with "text", as long as the actual size of the data is moderate.

Agreed.

> > Optional additional remark:
> > 
> > Another good aspect of known data sizes is easier understanding of field
> > content and implicit data (length) control.
> 
> Something like that makes more sense to me.
> 
> Perhaps, right before the tip you quoted, something like that:
> 
>   If your use case requires a length limit on character data, or compliance
>   with the SQL standard is important, use "character varying".
>   Otherwise, you are usually better off with "text".

I can support that if others think it is valuable.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: vacuumdb --analyze-in-stages

2021-11-05 Thread Alvaro Herrera
On 2021-Oct-26, nikolai.berkoff wrote:

> Given all the suggestions I've tried to combine them into one patch again.

OK, I pushed something very similar but not exactly your patch ...
please have a look if you care to suggest improvements.

... oh, I just noticed I made a typo -- omitted "with" in "running this
option".  I'll wait before pushing a fix in case there are other
rewording suggestions.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: text fields and performance for ETL

2021-11-05 Thread David G. Johnston
On Friday, November 5, 2021, Bruce Momjian  wrote:

>
> >
> > Perhaps, right before the tip you quoted, something like that:
> >
> >   If your use case requires a length limit on character data, or
> compliance
> >   with the SQL standard is important, use "character varying".
> >   Otherwise, you are usually better off with "text".
>
> I can support that if others think it is valuable.
>
>
The motivating complaint is that we should be encouraging people to use
varchar(4000) instead of text so external tools can optimize.  If we are
not going to do that I really don’t see the pointing in changing away from
out current position of “only use text”.  True length limit requirements
for data are rare, and better done in constraints along with all other the
other constraint that may exist for the data.  I believe comments with
respect to the SQL standard are already present and adequate.

David J.


Re: text fields and performance for ETL

2021-11-05 Thread Bruce Momjian
On Fri, Nov  5, 2021 at 07:32:12AM -0700, David G. Johnston wrote:
> On Friday, November 5, 2021, Bruce Momjian  wrote:
> 
> 
> >
> > Perhaps, right before the tip you quoted, something like that:
> >
> >   If your use case requires a length limit on character data, or
> compliance
> >   with the SQL standard is important, use "character varying".
> >   Otherwise, you are usually better off with "text".
> 
> I can support that if others think it is valuable.
> 
> 
> 
> The motivating complaint is that we should be encouraging people to use 
> varchar
> (4000) instead of text so external tools can optimize.  If we are not going to
> do that I really don’t see the pointing in changing away from out current
> position of “only use text”.  True length limit requirements for data are 
> rare,
> and better done in constraints along with all other the other constraint that
> may exist for the data.  I believe comments with respect to the SQL standard
> are already present and adequate.

Agreed.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: text fields and performance for ETL

2021-11-05 Thread Laurenz Albe
On Fri, 2021-11-05 at 11:27 -0400, Bruce Momjian wrote:
> On Fri, Nov  5, 2021 at 07:32:12AM -0700, David G. Johnston wrote:
> > On Friday, November 5, 2021, Bruce Momjian  wrote:
> > 
> > 
> >     >
> >     > Perhaps, right before the tip you quoted, something like that:
> >     >
> >     >   If your use case requires a length limit on character data, or
> >     compliance
> >     >   with the SQL standard is important, use "character varying".
> >     >   Otherwise, you are usually better off with "text".
> > 
> >     I can support that if others think it is valuable.
> > 
> > 
> > 
> > The motivating complaint is that we should be encouraging people to use 
> > varchar
> > (4000) instead of text so external tools can optimize.  If we are not going 
> > to
> > do that I really don’t see the pointing in changing away from out current
> > position of “only use text”.  True length limit requirements for data are 
> > rare,
> > and better done in constraints along with all other the other constraint 
> > that
> > may exist for the data.  I believe comments with respect to the SQL standard
> > are already present and adequate.
> 
> Agreed.

+1, so let's leave it as it is.

Yours,
Laurenz Albe