Re: Large scale reliable software system

2023-06-27 Thread Saurabh Agrawal
>  Correct. It’s a tragically wrong piece of folk wisdom that’s pretty
general across web development communities.

So, what's your advice, and is there a book / resource to get up to speed?

Thanks!

On Tue, Jun 27, 2023 at 10:38 AM Guyren Howe  wrote:

> Correct. It’s a tragically wrong piece of folk wisdom that’s pretty
> general across web development communities.
>
> On Jun 26, 2023, at 21:32, Michael Nolan  wrote:
>
> It's not just Ruby, dumb databases are preferred in projects like
> WordPress, Drupal and Joomla, too.
>
> Now, if it's because they're used to using MySQL, well maybe that's
> not so hard to understand.  :-)
>
> On Mon, Jun 26, 2023 at 8:05 PM Guyren Howe  wrote:
>
>
> This is a reasonable answer, but I want to offer a caveat.
>
> Likely because of the influence of the originator of Ruby on Rails, it is
> close to holy writ in the web development community that the database must
> be treated as a dumb data bucket and all business logic must be implemented
> in the Ruby or Python or whatever back end code.
>
> This heuristic is nearly always mostly wrong.
>
> Guyren G Howe
> On Jun 26, 2023 at 17:48 -0700, Adrian Klaver ,
> wrote:
>
> On 6/26/23 16:48, B M wrote:
>
> Dear all,
>
> Greetings,
>
> I taught myself PostgreSQL and developed a small-scale
> experimental software system using PostgreSQL in the back-end.
>
> I would like your advice on developing a large-scale, reliable
> software system using PostgreSQL in the back-end, through which I can
> share the storage with the different system users, who log in to
> the system through the web application front-end with different
> usernames and passwords; preserve the privacy of each user's data; improve
> overall system security and performance; achieve fast response; make
> backups; and protect the stored data from loss. The system will be hosted on
> a cloud.
>
>
> https://www.djangoproject.com/
>
>
> Thank you in advance.
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>
>
>
>


Re: 2 master 3 standby replication

2023-06-27 Thread Mohsin Kazmi
I have also searched for this type of architecture, but
unfortunately PostgreSQL does not offer such a solution to date.

Regards,


On Fri, Jun 23, 2023 at 1:40 PM Pavel Stehule 
wrote:

> Hi
>
> On Fri, 23 Jun 2023 at 10:37, Atul Kumar 
> wrote:
>
>> Hi,
>>
>> Please help me with the query I raised.
>>
>>
> Currently there is no community-based multi-master solution.
>
> Regards
>
> Pavel Stehule
>
>
>>
>> Regards.
>>
>> On Fri, 23 Jun 2023, 00:12 Atul Kumar,  wrote:
>>
>>> Hi,
>>>
>>> Do we have any solution to configure a replication architecture
>>> with 2 master nodes and 3 standby nodes, replicating the data from either
>>> of the 2 masters?
>>>
>>>
>>> Please let me know if you have any link/ dedicated document.
>>>
>>>
>>>
>>> Regards,
>>> Atul
>>>
>>>
>>>
>>>
>>>

-- 
Best Regards
Mohsin Shah


Re: Large scale reliable software system

2023-06-27 Thread Tony Shelver
On Tue, 27 Jun 2023 at 07:08, Guyren Howe  wrote:

> Correct. It’s a tragically wrong piece of folk wisdom that’s pretty
> general across web development communities.
>
> On Jun 26, 2023, at 21:32, Michael Nolan  wrote:
>
> It's not just Ruby, dumb databases are preferred in projects like
> WordPress, Drupal and Joomla, too.
>
> Now, if it's because they're used to using MySQL, well maybe that's
> not so hard to understand.  :-)
>
> On Mon, Jun 26, 2023 at 8:05 PM Guyren Howe  wrote:
>
>
> This is a reasonable answer, but I want to offer a caveat.
>
> Likely because of the influence of the originator of Ruby on Rails, it is
> close to holy writ in the web development community that the database must
> be treated as a dumb data bucket and all business logic must be implemented
> in the Ruby or Python or whatever back end code.
>
> This heuristic is nearly always mostly wrong.
>
> Guyren G Howe
> On Jun 26, 2023 at 17:48 -0700, Adrian Klaver ,
> wrote:
>
> On 6/26/23 16:48, B M wrote:
>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>
>
>
The accepted front-end developer wisdom of treating the DB as a dumb data
store works only under certain conditions: for example, that the DB will
never be accessed from a different ORM / framework, and that the performance
characteristics of using an ORM with 'standard' data structures are
acceptable.

The moment you need to plug in something like reporting tools, or access
from a different system / API / framework / language / ORM or whatever, the
approach of not having rules / views / procedures / whatever built into the
database falls apart.

Other things to consider are performance / load / overhead:  we have one
system that involves processing large amounts of data for reports /
queries.  Shipping all that back through the ORM / DB interface (ODBC /
JDBC / psycopg2 / whatever) for resolution / filtering in the front-end
application, when SQL / procedures / views could do that in the DB and just
ship back the required data, seems counterproductive.
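
As a minimal sketch of that last point (all names hypothetical), a view can
do the heavy lifting in the DB and ship back only the summary:

CREATE VIEW daily_sales_report AS
SELECT sale_date, count(*) AS num_sales, sum(amount) AS total_amount
FROM   sales
GROUP  BY sale_date;

-- any client (ORM, reporting tool, psql) then pulls only the summarized rows:
SELECT * FROM daily_sales_report WHERE sale_date >= date '2023-06-01';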

Tony Shelver

>


When will trusted PL/Python be supported?

2023-06-27 Thread Bowen Shi
Dears,

It seems that we have had untrusted PL/Python for a long time, but
we still do not support trusted PL/Python.

I'd like to know: is supporting trusted PL/Python still on the
roadmap? What is the reason for the current lack of support, and is
there any relevant email discussion?

Regards
Bowen Shi




Re: When will trusted PL/Python be supported?

2023-06-27 Thread Karsten Hilbert
> It seems that we have had untrusted PL/Python for a long time, but
> we still do not support trusted PL/Python.
>
> I'd like to know: is supporting trusted PL/Python still on the
> roadmap? What is the reason for the current lack of support, and is
> there any relevant email discussion?

AFAIR the consensus was that Python cannot really be (made to be) trusted
in the PG sense which is why plpython/trusted was *removed*.

Karsten




Re: Large scale reliable software system

2023-06-27 Thread Avin Kavish
Well, seeing as Postgres isn't designed to serve HTTP requests or to run
general-purpose code that doesn't involve databases (which you can express
elegantly in Python), to answer the OP's question my vote is for the original
answer: Django. It's got everything out of the box: authentication, file
storage, etc.

Once you get the application running you can enhance the database as
necessary.

On Tue, Jun 27, 2023 at 4:19 PM Tony Shelver  wrote:

>
> On Tue, 27 Jun 2023 at 07:08, Guyren Howe  wrote:
>
>> Correct. It’s a tragically wrong piece of folk wisdom that’s pretty
>> general across web development communities.
>>
>> On Jun 26, 2023, at 21:32, Michael Nolan  wrote:
>>
>> It's not just Ruby, dumb databases are preferred in projects like
>> WordPress, Drupal and Joomla, too.
>>
>> Now, if it's because they're used to using MySQL, well maybe that's
>> not so hard to understand.  :-)
>>
>> On Mon, Jun 26, 2023 at 8:05 PM Guyren Howe  wrote:
>>
>>
>> This is a reasonable answer, but I want to offer a caveat.
>>
>> Likely because of the influence of the originator of Ruby on Rails, it is
>> close to holy writ in the web development community that the database must
>> be treated as a dumb data bucket and all business logic must be implemented
>> in the Ruby or Python or whatever back end code.
>>
>> This heuristic is nearly always mostly wrong.
>>
>> Guyren G Howe
>> On Jun 26, 2023 at 17:48 -0700, Adrian Klaver ,
>> wrote:
>>
>> On 6/26/23 16:48, B M wrote:
>>
>>
>>
>> --
>> Adrian Klaver
>> adrian.kla...@aklaver.com
>>
>>
>>
>>
>>  The accepted front-end developer wisdom of treating the DB as a dumb
> data store works only under certain conditions: for example, that the DB
> will never be accessed from a different ORM / framework, and that the
> performance characteristics of using an ORM with 'standard' data structures
> are acceptable.
>
> The moment you need to plug in something like reporting tools, or access
> from a different system / API / framework / language / ORM or whatever, the
> approach of not having rules / views / procedures / whatever built into the
> database falls apart.
>
> Other things to consider are performance / load / overhead:  we have one
> system that involves processing large amounts of data for reports /
> queries.  Shipping all that back through the ORM / DB interface (ODBC /
> JDBC / psycopg2 / whatever) for resolution / filtering in the front-end
> application, when SQL / procedures / views could do that in the DB and just
> ship back the required data, seems counterproductive.
>
> Tony Shelver
>
>>


Re: Helping planner to chose sequential scan when it improves performance

2023-06-27 Thread Jeff Janes
On Sun, Jun 25, 2023 at 3:48 PM David Rowley  wrote:

> On Wed, 14 Jun 2023 at 07:28, Patrick O'Toole 
> wrote:
> > Maybe we are barking up the wrong tree with the previous questions. Are
> there other configuration parameters we should consider first to improve
> performance in situations like the one illustrated?
>
> random_page_cost and effective_cache_size are the main settings which
> will influence plan A vs plan B.  Larger values of
> effective_cache_size will have the planner apply more seq_page_costs
> to the index scan.


Squeezing otherwise-random page costs towards seq_page_costs is what bitmap
scans do, and what large index scans with high pg_stats.correlation do.
But effective_cache_size does something else, it squeezes the per page
costs towards zero, not towards seq_page_costs. This is surely not
accurate, as the costs of locking the buffer mapping partition, finding the
buffer or reading it from the kernel cache if not found, maybe faulting the
buffer from main memory into on-CPU memory, pinning the buffer, and
read-locking it are certainly well above zero, even if not nearly as high
as seq_page_cost.  I'd guess they are truly about 2 to 5 times a
cpu_tuple_cost per buffer.  But zero is what they currently get; there is
no knob to twist to change that.


>   Lower values of effective_cache_size will mean
> more pages will be assumed to cost random_page_cost.
>


Sure, but it addresses the issue only obliquely (as does raising
random_page_cost), not directly.  So the change you need to make to them
will be large, and will likely make other things worse.
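
A minimal psql sketch for experimenting with those settings per-session,
before changing anything globally (the query is just a hypothetical
stand-in):

BEGIN;
SET LOCAL effective_cache_size = '8GB';
SET LOCAL random_page_cost = 1.1;  -- closer to seq_page_cost, e.g. on SSDs
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM some_table WHERE created_at > now() - interval '1 day';
ROLLBACK;  -- the SET LOCAL changes vanish with the transaction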

Cheers,

Jeff


Query regarding managing Replication

2023-06-27 Thread Ashok Patil
Hello Sir/Madam,

I have to perform a database replication task. For testing purposes I have
installed PostgreSQL 14 on two different machines. One is the primary (we
can call it the server) and the other is the secondary (standby).

Below are the steps I have performed:

*On Primary*

1. Update postgresql.conf with the settings below:
wal_level = hot_standby
full_page_writes = on
wal_log_hints = on
max_wal_senders = 6
max_replication_slots = 6
hot_standby = on
hot_standby_feedback = on

2. Update pg_hba.conf:
host replication 172.20.32.63/32 scram-sha-256
host replication 172.20.32.43/32 scram-sha-256

3. Create a repl_user
psql  -d postgres   -U postgres   -c "CREATE ROLE repl_user LOGIN
REPLICATION ENCRYPTED PASSWORD 'xxx';"

4. Create the replication slot using the command below:
psql -d postgres -U postgres  -c "SELECT * FROM
pg_create_physical_replication_slot('standby1', true);"

*On Standby*

5. Perform a base backup of the primary to the standby:
pg_ctl -D ..\data. stop -mi

6. After stopping the cluster, delete the data directory:
rmdir /s ..\data

7. Run pg_basebackup on the standby to copy the primary's data directory to it:
pg_basebackup -D ..\data -Fp -R -Xs -c fast -l 'initial_clone' -P -v -h
172.20.32.63 y -U repl_user

but here I am getting an error:
pg_basebackup: error: connection to server at "172.20.32.63", port 5432
failed: Connection timed out (0x274C/10060)
Is the server running on that host and accepting TCP/IP connections?

Will you please let me know which step is wrong?

Also, are the proper steps given in any document? If yes, will you
please share it?

Thanks in advance.

Regards,
Ashok


Re: Query regarding managing Replication

2023-06-27 Thread Romain MAZIÈRE

Hello,

You can have a look at the listen_addresses parameter in the file
postgresql.conf.

By default the value is localhost.
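
A minimal sketch, assuming you want the primary to accept remote
connections (listen_addresses only takes effect after a server restart):

ALTER SYSTEM SET listen_addresses = '*';  -- or a specific list of addresses
-- restart the server, then verify:
SHOW listen_addresses;

Also check that port 5432 is open in any firewall between the two machines,
since a connection timeout often points there as well.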

Regards

Romain MAZIÈRE
romain.mazi...@sigmaz-consilium.fr
+33.535.545.085
+33.781.46.36.96
https://sigmaz-consilium.fr

On 27/06/2023 at 16:04, Ashok Patil wrote:

Hello Sir/Madam,

I have to perform a database replication task. For testing purposes
I have installed PostgreSQL 14 on two different machines. One is the
primary (we can call it the server) and the other is the secondary
(standby).

Below are the steps I have performed:

*On Primary*

1. Update postgresql.conf with the settings below:
wal_level = hot_standby
full_page_writes = on
wal_log_hints = on
max_wal_senders = 6
max_replication_slots = 6
hot_standby = on
hot_standby_feedback = on

2. Update pg_hba.conf:
host replication 172.20.32.63/32  scram-sha-256
host replication 172.20.32.43/32  scram-sha-256

3. Create a repl_user:
psql  -d postgres   -U postgres   -c "CREATE ROLE repl_user LOGIN
REPLICATION ENCRYPTED PASSWORD 'xxx';"

4. Create the replication slot using the command below:
psql -d postgres -U postgres  -c "SELECT * FROM
pg_create_physical_replication_slot('standby1', true);"

*On Standby*

5. Perform a base backup of the primary to the standby:
pg_ctl -D ..\data. stop -mi

6. After stopping the cluster, delete the data directory:
rmdir /s ..\data

7. Run pg_basebackup on the standby to copy the primary's data directory
to it:
pg_basebackup -D ..\data -Fp -R -Xs -c fast -l 'initial_clone' -P -v
-h 172.20.32.63 y -U repl_user

but here I am getting an error:
pg_basebackup: error: connection to server at "172.20.32.63", port
5432 failed: Connection timed out (0x274C/10060)
        Is the server running on that host and accepting TCP/IP
connections?

Will you please let me know which step is wrong?

Also, are the proper steps given in any document? If yes, will you
please share it?

Thanks in advance.

Regards,
Ashok



Re: When will trusted PL/Python be supported?

2023-06-27 Thread Adrian Klaver

On 6/27/23 04:46, Bowen Shi wrote:

Dears,

It seems that we have had untrusted PL/Python for a long time, but
we still do not support trusted PL/Python.

I'd like to know: is supporting trusted PL/Python still on the
roadmap? What is the reason for the current lack of support, and is
there any relevant email discussion?


https://www.postgresql.org/docs/7.4/release-7-4.html

Release date: 2003-11-17

...

"
Make PL/Python an untrusted language, now called plpythonu (Kevin
Jacobs, Tom)


The Python language no longer supports a restricted execution 
environment, so the trusted version of PL/Python was removed. If this 
situation changes, a version of PL/Python that can be used by 
non-superusers will be re-added.

"

The situation has not changed.



Regards
Bowen Shi




--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Large scale reliable software system

2023-06-27 Thread Merlin Moncure
On Mon, Jun 26, 2023 at 6:49 PM B M  wrote:

> Dear all,
>
> Greetings,
>
> I taught myself PostgreSQL and developed a small-scale experimental
> software system using PostgreSQL in the back-end.
>
> I would like your advice on developing a large-scale, reliable
> software system using PostgreSQL in the back-end, through which I can share
> the storage with the different system users, who log in to the system
> through the web application front-end with different usernames and
> passwords; preserve the privacy of each user's data; improve overall system
> security and performance; achieve fast response; make backups; and protect
> the stored data from loss. The system will be hosted on a cloud.
>
> Thank you in advance.
>

* your SQL is code; treat it as such, check it into git etc
* a robust deployment strategy is essential to scaling the team
* write a lot of tests
* become intimate with pg_stat_statements
* keep your transactions as short as possible while preserving safety
* avoid developers who advocate for keeping business logic out of the
database religiously (controversial)
* try to avoid assumptions that only one technical stack interacts with
your database
* do not waste time hand-drawing ERDs; use a tool that generates them (I like
schemaspy)
* test your DR strategy before disaster strikes
* think about security model up front
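
For the pg_stat_statements item above, a starter query (a minimal sketch;
assumes the extension is installed and preloaded; column names as of PG 13+):

SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM   pg_stat_statements
ORDER  BY total_exec_time DESC
LIMIT  10;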

merlin


Re: Large scale reliable software system

2023-06-27 Thread Adrian Klaver

On 6/27/23 07:58, Merlin Moncure wrote:



On Mon, Jun 26, 2023 at 6:49 PM B M  wrote:


Dear all,

Greetings,

I taught myself PostgreSQL and developed a small-scale
experimental software system using PostgreSQL in the back-end.

I would like your advice on developing a large-scale, reliable
software system using PostgreSQL in the back-end, through which I
can share the storage with the different system users, who log in
to the system through the web application front-end with different
usernames and passwords; preserve the privacy of each user's
data; improve overall system security and performance; achieve
fast response; make backups; and protect the stored data from loss.
The system will be hosted on a cloud.

Thank you in advance.


* your SQL is code; treat it as such, check it into git etc
* a robust deployment strategy is essential to scaling the team
* write a lot of tests
* become intimate with pg_stat_statements
* keep your transactions as short as possible while preserving safety
* avoid developers who advocate for keeping business logic out of the
database religiously (controversial)
* try to avoid assumptions that only one technical stack interacts with
your database
* do not waste time hand-drawing ERDs; use a tool that generates them (I like
schemaspy)
* test your DR strategy before disaster strikes
* think about security model up front



+1



merlin



--
Adrian Klaver
adrian.kla...@aklaver.com





typical active table count?

2023-06-27 Thread Jeremy Schneider
Question for other PostgreSQL users

On your moderately busy DB, how many different tables might receive at
least one change/DML during a 10 second window?

10? 50? 100? More? Ballpark guess off the top of your head.

I'm in a discussion and there are questions about whether it's unusual to
have more than 10 or so. The answer isn't clear to me.

Probably worthwhile to call out partitioning explicitly (i.e., if this
factors into your answer, then mention that fact).

-Jeremy


-- 
http://about.me/jeremy_schneider



Re: When will trusted PL/Python be supported?

2023-06-27 Thread Tom Lane
Adrian Klaver  writes:
> On 6/27/23 04:46, Bowen Shi wrote:
>> I'd like to know: is supporting trusted PL/Python still on the
>> roadmap? What is the reason for the current lack of support, and is
>> there any relevant email discussion?

https://www.postgresql.org/message-id/flat/20030525224833.GO31407%40tummy.com

https://www.postgresql.org/message-id/flat/Pine.LNX.4.44.0306182125590.17051-10%40penguin.theopalgroup.com

There's been a little bit of discussion of using RestrictedPython:

https://www.postgresql.org/message-id/flat/9d1f8d830808041008v50104fd8p6181d5ddce85a6a%40mail.gmail.com

but it doesn't seem to have gone anywhere.

regards, tom lane




Re: When will trusted PL/Python be supported?

2023-06-27 Thread Jeffrey Walton
On Tue, Jun 27, 2023 at 12:17 PM Tom Lane  wrote:
>
> Adrian Klaver  writes:
> > On 6/27/23 04:46, Bowen Shi wrote:
> >> I'd like to know: is supporting trusted PL/Python still on the
> >> roadmap? What is the reason for the current lack of support, and is
> >> there any relevant email discussion?
>
> https://www.postgresql.org/message-id/flat/20030525224833.GO31407%40tummy.com
>
> https://www.postgresql.org/message-id/flat/Pine.LNX.4.44.0306182125590.17051-10%40penguin.theopalgroup.com
>
> There's been a little bit of discussion of using RestrictedPython:
>
> https://www.postgresql.org/message-id/flat/9d1f8d830808041008v50104fd8p6181d5ddce85a6a%40mail.gmail.com
>
> but it doesn't seem to have gone anywhere.

Off-topic, but I am amazed folks like Tom and Adrian remember those
conversations and can actually find them in the archives.

Jeff




Requesting Information for GSSAPI

2023-06-27 Thread Lingesan Jeyapandy
Team,

We have configured a Postgres GSSAPI setup on a Linux server. We have a huge
number of AD domain users in our org.

But we are looking to limit access to specific AD distribution groups. Is
there any way to control access so that only users in an AD distribution
list can access the GSSAPI-configured Postgres environment?

Our postgres version is 15.


Regards,

Lingesan Jeyapandy




Re: typical active table count?

2023-06-27 Thread Ben Chobot
We certainly have databases where far more than 100 tables are updated 
within a 10 second period. Is there a specific concern you have?


Jeremy Schneider wrote on 6/27/23 9:01 AM:

Question for other PostgreSQL users

On your moderately busy DB, how many different tables might receive at 
least one change/DML during a 10 second window?


10? 50? 100? More? Ballpark guess off the top of your head.

I'm in a discussion and there are questions about whether it's unusual to
have more than 10 or so. The answer isn't clear to me.


Probably worthwhile to call out partitioning explicitly (i.e., if this
factors into your answer, then mention that fact).


-Jeremy

--
http://about.me/jeremy_schneider




Re: pb with join plan

2023-06-27 Thread Kirk Wolak
On Wed, Jun 21, 2023 at 12:10 PM Marc Millas  wrote:

> Marc MILLAS
> On Wed, Jun 21, 2023 at 12:43 PM Tomas Vondra <
> tomas.von...@enterprisedb.com> wrote:
>
>> On 6/21/23 00:26, Marc Millas wrote:
>> >
>> >
>> > On Tue, Jun 20, 2023 at 11:19 PM David Rowley  wrote:
>> >
>> > On Wed, 21 Jun 2023 at 08:34, Marc Millas  wrote:
>> > >
>> > > On Tue, Jun 20, 2023 at 10:14 PM David Rowley  wrote:
>> > >>
>> > >> On Wed, 21 Jun 2023 at 07:42, Marc Millas  wrote:
>> > >> > But if I do the same with clause one OR clause 2, I have to
>> > kill the request after an hour, seeing the filesystem showing more
>> > than 140 Mb of increased usage.
>>
>> It's a bit weird the "victor" table is joined seemingly without any join
>> conditions, leading to a cross join (which massively inflates the cost
>> for joins above it). Maybe the anonymized plan mangles it somehow.
>>
>
> So I did try to simplify my problem.
> I created a table with the result of the first 3 joins.
> That table has 15M lines; all tables have been VACUUM ANALYZEd.
>
> Now if I do an EXPLAIN ANALYZE of a simple join between that table and my
> original table 4
> using a simple = clause, I get a result in about one second, and the
> planner's row estimates seem in line with the observed values.
> If I use substr(table1.a) = table2.b, the EXPLAIN ANALYZE gets a result in
> 21 seconds and the planner estimates a 65M-row result set while the
> observed is 330k rows,
> so here it's 20 times slower and the discrepancy between the planner's row
> estimate and reality is a factor of 200.
>
> Now, if I try an EXPLAIN ANALYZE with a join on a=b or substr(c)=d or e=f,
> then... I kill the query after a quarter of an hour without any answer.
> If I just EXPLAIN the query, the planner's row estimate becomes more
> than 2 billion.
> The extremely simple query and plan are here, without automatic obfuscation:
> https://explain.depesz.com/s/b8Ll
>

First, I am not sure why you cannot send us the explain analyze.  But
moving on...

substr() is a function that mutilates a value such that the index becomes
useless...
If you are looking for the LEFT() of the value, then an INDEX can be used.
I have COLLATION "C" and when I query:
WHERE fld LIKE fld_b || '%'

The optimizer constructs a query that uses the index on "fld"...
But when I try:

WHERE fld LIKE concat_ws('', fld_b, '%')
It doesn't use the index (because the function call is too
complicated to see through).

When using functions in where clauses, indexes either have to be made on
those functions, or often times the index cannot be used.
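
For example, a minimal sketch of an expression index (names hypothetical;
the predicate must use the identical expression for the planner to match it):

CREATE INDEX t1_a_prefix_idx ON table1 (substr(a, 1, 8));
ANALYZE table1;  -- gathers statistics on the indexed expression
-- a clause written the same way can now use the index:
-- ... WHERE substr(table1.a, 1, 8) = table2.b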

BTW, I noted the COLLATION.  That turned out to be important, because my
first DB test did NOT use that collation, and the result
of the LIKE was the non-indexed version...

I hope you find something useful in here.

Also, WHERE fld <> 72... (unless you have a heavily skewed set of
statistics, I read that as: SCAN everything and check later,
because this should filter out very few rows), whereas fld = 72 will be
blazingly fast.

Kirk


Re: bug or lacking doc hint

2023-06-27 Thread Kirk Wolak
On Mon, Jun 26, 2023 at 4:21 PM Marc Millas  wrote:

> On Mon, Jun 26, 2023 at 4:05 PM Ron  wrote:
>
>> On 6/26/23 07:22, Marc Millas wrote:
>>
>> On Mon, Jun 26, 2023 at 5:47 AM Avin Kavish  wrote:
>>
>>> Sounds like the problem you are having is that the server is running out
>>> of temporary resources for the operation that users are trying to do. So,
>>> according to Tom, on the Postgres side the operation cannot be optimized
>>> further.
>>>
>>> I think you have a few choices here:
>>>   - See if increasing the resources of the server will allow them to run
>>> the operation
>>>   - Ask users not to do that operation
>>>   - Use a extension like citus to scale horizontally
>>>
>>> But I'm wondering why a massively inefficient join is needed in the first
>>> place. Shouldn't joins follow keyed relationships? So ideally a
>>> unique indexed column, but at the very least an indexed column. Why is a
>>> join required on a dynamically calculated substring? Can it be made into a
>>> static computed value and indexed? Substring sounds like an op that should
>>> be in the filter stage.
>>>
>>> Can you describe your data model? Maybe we can give some specific advice.
>>>
>>
>> There is a set of big tables containing fine-grained health data. The DB is
>> accessed by people doing research on various kinds of illnesses. So, by
>> nature, all columns (I mean ALL) can be used in every kind of SQL,
>> including, obviously, lots of intricate joins.
>>
>>
>> This sounds like the kind of problem normally solved by data warehouses.
>> Is your schema designed like a DW, or is it in 3NF?
>>
>
> It's, indeed, some kind of DWH,
> but it's neither a star nor a snowflake, at least not used like those
> standard schemas.
> In one of the big tables (10 billion+ rows), there are around 60 columns
> describing one event: some person had a given illness, got a given
> medication, etc.
> The problem is that it's not one simple event with a set of dimensions;
> the people using that DB are NOT looking for an event according to
> various criteria, they are looking for correlations between each of the
> 60+ columns.
> As a consequence very few indexes are used, as most requests end in some
> kind of huge sequential read.
> The machine was built for this and performs well, but some requests are
> posing problems and we must find solutions/workarounds.
> One of the users did rewrite the request using a SELECT DISTINCT matched
> with LEFT JOIN(s) and a set of table.a IS NOT NULL conditions.
> Looks crazy, but it does work. I'll get the request tomorrow.
>
Marc,
  Something we did for cases like this... We actually created views that
handled the complex joining.
Then we trained users to select from the views (as opposed to the tables).

  The upside of this approach is that you can really optimize the views for
the joins.  And then the results get filtered by
the where clauses they provide.

  Usually a DBA or heavy DB user creates the views, and gets the permission
to create the indexes that speed them up.

  The DOWNSIDE of this approach is that if you are constantly adding
columns, you have to recreate your views.
And please be careful with views that depend on views. While it works, it
creates downsides when you attempt to change a view, often having to drop
all of the downstream views and recreate them.
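
As a minimal sketch of the idea (all names hypothetical):

CREATE VIEW patient_events_v AS
SELECT e.event_id, e.event_date, p.patient_code, d.diagnosis_label
FROM   events e
JOIN   patients  p ON p.patient_id = e.patient_id
JOIN   diagnoses d ON d.diagnosis_id = e.diagnosis_id;

GRANT SELECT ON patient_events_v TO research_role;

Users then add their own WHERE clauses against the view, and the planner
still sees the underlying tables and their indexes.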

HTH

Kirk

>


Re: typical active table count?

2023-06-27 Thread Jeremy Schneider
On 6/27/23 9:32 AM, Ben Chobot wrote:
> We certainly have databases where far more than 100 tables are updated
> within a 10 second period. Is there a specific concern you have?
> 

Thanks, Ben. Not a concern, but I'm trying to better understand how common
this might be. And I think sharing general statistics about how people
use PostgreSQL is a great help to the developers who build and maintain it.

One really nice thing about PostgreSQL is that with two quick copies of
pg_stat_all_tables you can easily see this sort of info.

If you have a database where more than 100 tables are updated within a
10 second period - this seems really uncommon to me - I'm very curious
about the workload.

For example:

1) Is the overall total number of tables for this database in the
thousands, 10s of thousands or 100s of thousands?

2) How many CPUs or cores does the server have?

3) Are you using partitions and counting each one? What's the number if
you count each partitioned table as a single table?

4) Would you characterize this database as SaaS, i.e., many copies of a
similar schema?  Or is it one very large schema of many different tables?

-Jeremy


-- 
http://about.me/jeremy_schneider





regex failing

2023-06-27 Thread Zahir Lalani
Hi All

Got a weird one. I am using the regex below to match numeric-only values in a CASE expression.

'^([0-9]+[.]?[0-9]*)$'

This works well by and large, but not for a value like “1234:567”. This seems
to match positively, and then the subsequent cast fails because the value is
not an actual number.
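
For reference, the match-then-cast pattern in question looks roughly like
this (table and column names hypothetical):

SELECT CASE WHEN val ~ '^([0-9]+[.]?[0-9]*)$'
            THEN val::numeric
       END AS val_num
FROM   some_table;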

Any ideas? (PG13)

Z



Re: typical active table count?

2023-06-27 Thread Ron

On 6/27/23 13:47, Jeremy Schneider wrote:

On 6/27/23 9:32 AM, Ben Chobot wrote:

We certainly have databases where far more than 100 tables are updated
within a 10 second period. Is there a specific concern you have?


Thank Ben, not a concern but I'm trying to better understand how common
this might be. And I think sharing general statistics about how people
use PostgreSQL is a great help to the developers who build and maintain it.

One really nice thing about PostgreSQL is that two quick copies of
pg_stat_all_tables and you can easily see this sort of info.

If you have a database where more than 100 tables are updated within a
10 second period - this seems really uncommon to me - I'm very curious
about the workload.


100 table updates just means a /possibly complicated schema/, not necessarily
high volume.


More important is the number of tables updated in a single transaction.  Are 
you updating (really /modifying/: inserts, updates, deletes) lots of rows in 
all 100 tables in a single transaction, or are multiple users performing one 
of 20 separate transactions, each modifying 5 tables?  Because that makes a
huge difference.


And honestly, 100 tables in 10 seconds is 10 tables/second.  If each gets 
one insert, that's a laughably slow transaction rate.  (Unless of course 
there's 85 indices per table, and foreign keys don't have supporting indices.)



For example:

1) Is the overall total number of tables for this database in the
thousands, 10s of thousands or 100s of thousands?

2) How many CPUs or cores does the server have?

3) Are you using partitions and counting each one? What's the number if
you count each partitioned table as a single table?

4) Would you characterize this database as SaaS, i.e., many copies of a
similar schema?


Why not multiple databases with the same definition?


   Or is it one very large schema of many different tables?



--
Born in Arizona, moved to Babylonia.

Re: typical active table count?

2023-06-27 Thread Adrian Klaver

On 6/27/23 11:47 AM, Jeremy Schneider wrote:

On 6/27/23 9:32 AM, Ben Chobot wrote:

We certainly have databases where far more than 100 tables are updated
within a 10 second period. Is there a specific concern you have?



Thank Ben, not a concern but I'm trying to better understand how common
this might be. And I think sharing general statistics about how people
use PostgreSQL is a great help to the developers who build and maintain it.


Given that Postgres is used up into the petabyte range, it is reasonable
to assume that it handles large numbers of tables. This
of course is based on sufficient hardware and proactive tuning.
Personally I think you are getting into the range of premature
optimization. There are so many ways to use Postgres that unless you
provide a detailed example of how you want to use it, the survey you seem
to be requesting will likely have more cases that do not apply than
those that do. To me the way forward is to create a plan for what you
want to accomplish and then ask specific questions based on that, or build a
test/dev setup that implements the plan and deal with the deviations, if
any, from the plan.



-Jeremy





--
Adrian Klaver
adrian.kla...@aklaver.com




Re: regex failing

2023-06-27 Thread David G. Johnston
On Tue, Jun 27, 2023 at 11:59 AM Zahir Lalani 
wrote:

> Hi All
>
>
>
> Got a weird one. I am using the regex below to match numeric-only
> values in a CASE expression.
>
>
>
> '^([0-9]+[.]?[0-9]*)$'
>
>
>
> This works well by and large, but not for a value like “1234:567”. This
> seems to match positively, and then the subsequent cast fails because the
> value is not an actual number.
>
>
>
Works (returns false) here:
select '1234:567' ~ '^([0-9]+[.]?[0-9]*)$';


David J.


Re: regex failing

2023-06-27 Thread Steve Baldwin
Probably not much help but it seems to work ok for me (unless I'm doing
something stupid). You didn't actually say how/where you were doing the
regex match, but the '~' operator seems to work ok.

[~/git/caas-sqs-consumer] psql
psql (15.3 (Homebrew), server 13.7)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384,
compression: off)
Type "help" for help.

bcaas=> select '1234:56' ~ '^([0-9]+[.]?[0-9]*)$';
 ?column?
--
 f
(1 row)

bcaas=> select '1234.56' ~ '^([0-9]+[.]?[0-9]*)$';
 ?column?
--
 t
(1 row)

On Wed, Jun 28, 2023 at 4:59 AM Zahir Lalani 
wrote:

> Hi All
>
>
>
> Got a weird one. I am using the regex below to match numeric-only
> values in a CASE expression.
>
>
>
> '^([0-9]+[.]?[0-9]*)$'
>
>
>
> This works well by and large, but not for a value like “1234:567”. This
> seems to match positively, and then the subsequent cast fails because the
> value is not an actual number.
>
>
>
> Any ideas? (PG13)
>
>
>
> Z
>
>
>


Re: regex failing

2023-06-27 Thread Sergey Fukanchik
On Tue, 27 Jun 2023 at 14:59, Zahir Lalani  wrote:
>
> Hi All
>
>
>
> Got a weird one. I am using the regex below to match numeric-only values in a CASE expression.
>
>
>
> '^([0-9]+[.]?[0-9]*)$'
>
>
>
> This works well by and large, but not for a value like “1234:567”. This
> seems to match positively, and then the subsequent cast fails because the
> value is not an actual number.
>
>
>
> Any ideas? (PG13)
>
>
>
> Z
>
>
When executed from psql it works OK. What tool did you use to run your
query? I would suspect that the tool interprets square brackets
somehow and your regex actually becomes '^([0-9]+.?[0-9]*)$', which
matches any character between digits.

If you enable query logging in your system - either uncomment
log_statement = 'all' in postgresql.conf or run ALTER SYSTEM SET
log_statement = 'all'; and then reload the configuration - you will find
the actual query in the server's log file:
2023-06-27 16:12:35.221 EDT [23580] LOG:  statement: select '1234:56'
~ '^([0-9]+[.]?[0-9]*)$';

make sure it remains the same.
--
Sergey




Re: pb with join plan

2023-06-27 Thread Marc Millas
On Tue, Jun 27, 2023 at 8:12 PM Kirk Wolak  wrote:

> On Wed, Jun 21, 2023 at 12:10 PM Marc Millas 
> wrote:
>
>> Marc MILLAS
>> On Wed, Jun 21, 2023 at 12:43 PM Tomas Vondra <
>> tomas.von...@enterprisedb.com> wrote:
>>
>>> On 6/21/23 00:26, Marc Millas wrote:
>>> >
>>> >
>>> > On Tue, Jun 20, 2023 at 11:19 PM David Rowley  wrote:
>>> >
>>> > On Wed, 21 Jun 2023 at 08:34, Marc Millas  wrote:
>>> > >
>>> > > On Tue, Jun 20, 2023 at 10:14 PM David Rowley  wrote:
>>> > >>
>>> > >> On Wed, 21 Jun 2023 at 07:42, Marc Millas  wrote:
>>> > >> > But if I do the same with clause one OR clause 2, I have to
>>> > kill the request after an hour, seeing the filesystem showing more
>>> > than 140 Mb of increased usage.
>>>
>>> It's a bit weird the "victor" table is joined seemingly without any join
>>> conditions, leading to a cross join (which massively inflates the cost
>>> for joins above it). Maybe the anonymized plan mangles it somehow.
>>>
>>
>> So I did try to simplify my problem.
>> I created a table with the result of the first 3 joins.
>> That table has 15M lines; all tables have been VACUUM ANALYZEd.
>>
>> Now if I do an EXPLAIN ANALYZE of a simple join between that table and my
>> original table 4,
>> using a simple = clause, I get a result in about one second, and the
>> planner's row estimates seem in line with the observed values.
>> If I use substr(table1.a) = table2.b, the EXPLAIN ANALYZE gets a result
>> in 21 seconds and the planner estimates a 65M-row result set while the
>> observed is 330k rows,
>> so here it's 20 times slower and the discrepancy between the planner's
>> row estimate and reality is a factor of 200.
>>
>> Now, if I try an EXPLAIN ANALYZE with a join on a=b or substr(c)=d or e=f,
>> then... I kill the query after a quarter of an hour without any answer.
>> If I just EXPLAIN the query, the planner's row estimate becomes more
>> than 2 billion.
>> The extremely simple query and plan are here, without automatic
>> obfuscation:
>> https://explain.depesz.com/s/b8Ll
>>
>
> First, I am not sure why you cannot send us the explain analyze.  But
> moving on...
>
Kirk, the EXPLAIN ANALYZE, with the SQL query, is directly accessible on
the explain.depesz link.

>
> substr() is a function that mutilates a value such that the index becomes
> useless...
> If you are looking for the LEFT() of the value, then an INDEX can be used.
> I have COLLATION "C" and when I query:
> WHERE fld LIKE fld_b || '%'
>

There are NO indexes on those columns. One of the reasons is that the
simplest index on one column is 380 GB on disk.
So to put indexes on each criterion, I must add around 1 TB of disk just
for ttt.
The full scan is not a problem; it's fast. The problem is the nested loop,
which compares each of the 15M lines of ttt to each of the 30K lines of
inc_pha_r.
It's an operation done 450,000,000,000 times, so if each comparison is 1
microsecond long, the nested loop is 125 hours long.
And I am not sure that the comparison is done in 1 microsecond...

>
> The optimizer constructs a query that uses the index on "fld"...
> But when I try:
>
> WHERE fld LIKE concat_ws('', fld_b, '%')
> It doesn't use the index (because the function call is too
> complicated to see through).
>
> When using functions in where clauses, indexes either have to be made on
> those functions, or often times the index cannot be used.
>
> BTW, I noted the COLLATION.  That turned out to be important, because my
> first DB test did NOT use that collation, and the result
> of the LIKE was the non-indexed version...
>
> I hope you find something useful in here.
>
Thanks for trying

>
> Also, WHERE fld <> 72... (unless you have a heavily skewed set of
> statistics, I read that as: SCAN everything and check later,
> because this should filter out very few rows), whereas fld = 72 will be
> blazingly fast.
>
> Kirk
>


Re: typical active table count?

2023-06-27 Thread Jeremy Schneider
On 6/27/23 12:08 PM, Ron wrote:
> On 6/27/23 13:47, Jeremy Schneider wrote:
>> On 6/27/23 9:32 AM, Ben Chobot wrote:
>>> We certainly have databases where far more than 100 tables are updated
>>> within a 10 second period. Is there a specific concern you have?
>>>
>> Thanks, Ben. Not a concern, but I'm trying to better understand how common
>> this might be. And I think sharing general statistics about how people
>> use PostgreSQL is a great help to the developers who build and maintain it.
>>
>> One really nice thing about PostgreSQL is that with two quick copies of
>> pg_stat_all_tables you can easily see this sort of info.
>>
>> If you have a database where more than 100 tables are updated within a
>> 10 second period - this seems really uncommon to me - I'm very curious
>> about the workload.
> 
> 100 tables updates just means /possibly complicated schema/, not
> necessarily high volume.
> 
> ...
> 
> And honestly, 100 tables in 10 seconds is 10 tables/second.  If each
> gets one insert, that's a laughably slow transaction rate.  (Unless of
> course there's 85 indices per table, and foreign keys don't have
> supporting indices.)

I don't think the math actually works this way on highly concurrent
systems. In fact, this morning I connected with a coworker who works on
Amazon fulfillment center DBs and there was almost no difference in the
number of tables with insert/update/delete regardless of whether you
looked at a 10 second window or a 2 second window. I was also able to
chat with another coworker at Amazon who got numbers from a couple of
their PG databases, and connected with one person on Slack at a different
company who passed along numbers, and got a few emails from Oracle folks.

The numbers reported back to me ranged from 29 to over a hundred.
Obviously there are also lots of small databases behind WordPress
websites with much less activity, but I found this to be an interesting
measure of some respectably busy systems.

The original context was a conversation related to logical replication
of DB changes.

But then I got interested in the general question and topic - and
someone on the Oracle side mentioned system tables which is a really
good point that hadn't occurred to me yet. The original conversation was
concerned with user tables and not system ones, but there would be a
fair amount of ongoing system table activity too.

Besides partitioning, another interesting dimension of the conversation
has been thinking about different categories of workloads. For example:
SaaS or multitenant applications with many copies of a similar schema,
ISVs, ERPs, or large enterprise databases with lots of development
history. All of these categories can easily ramp up the counts.

I'm still interested in more data - if anyone reading this can grab a
couple snapshots of pg_stat_all_tables and report back numbers for a 10
second window and a 2 second window, that would be amazing!
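
A minimal sketch of the snapshot-diff (10-second window; adjust pg_sleep
for the 2-second one):

CREATE TEMP TABLE snap AS
SELECT relid, n_tup_ins + n_tup_upd + n_tup_del AS dml
FROM   pg_stat_all_tables;

SELECT pg_sleep(10);

SELECT count(*) AS tables_with_dml
FROM   pg_stat_all_tables s
JOIN   snap USING (relid)
WHERE  s.n_tup_ins + s.n_tup_upd + s.n_tup_del > snap.dml;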

-Jeremy


-- 
http://about.me/jeremy_schneider





connect postgres using url

2023-06-27 Thread Atul Kumar
HI,

Could you please help me by explaining how I can connect to my Postgres
database using an HTTP URL?

What steps need to be taken care of?

Please share a reference link, if possible.



Regards,
Atul


Re: connect postgres using url

2023-06-27 Thread Adrian Klaver

On 6/27/23 16:14, Atul Kumar wrote:

HI,

Could you please help me by explaining how I can connect to my Postgres
database using an HTTP URL?


Are you actually talking about a Web-based URL, or a URI as in:

https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING

If you are talking about a URL, then that would point to some Web service
that connects to the database in the background. How that is done
is going to depend on the Web service.




What steps need to be taken care of?

Please share a reference link, if possible.



Regards,
Atul


--
Adrian Klaver
adrian.kla...@aklaver.com





Re: connect postgres using url

2023-06-27 Thread Adrian Klaver

On 6/27/23 16:30, Adrian Klaver wrote:

On 6/27/23 16:14, Atul Kumar wrote:

HI,

Could you please help me by explaining how I can connect to my Postgres
database using an HTTP URL?


Are you actually talking about a Web-based URL, or a URI as in:

https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING

If you are talking about a URL, then that would point to some Web service
that connects to the database in the background. How that is done
is going to depend on the Web service.



I realized that this was not a good explanation, especially as a URL is a
specialized case of a URI. What I should have been asking is: do you
want a URI that connects directly to the database, or a URI
in the form of a URL that points at a Web resource that in turn, in the
background, connects to the database?
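
For the direct case, a libpq connection URI looks like this (all values
hypothetical):

psql 'postgresql://app_user:secret@db.example.com:5432/appdb?sslmode=require'

The same string works anywhere libpq connection strings are accepted.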






What steps need to be taken care of?

Please share a reference link, if possible.



Regards,
Atul




--
Adrian Klaver
adrian.kla...@aklaver.com





RE: regex failing

2023-06-27 Thread Zahir Lalani
> -Original Message-
> >
> > Hi All
> >
> >
> >
> > Got a weird one. I am using the regex below to match numeric-only
> > values in a CASE expression.
> >
> >
> >
> > '^([0-9]+[.]?[0-9]*)$'
> >
> >
> >
> > This works well by and large, but not for a value like “1234:567”. This
> > seems to match positively, and then the subsequent cast fails because
> > the value is not an actual number.
> >
> >
> >
> > Any ideas? (PG13)
> >
> >
> >
> > Z
> >
> >

My apologies for wasting your time ☹

The issue was in an unrelated field!!

Thank you

Z