Re: Enforce primary key on every table during dev?
Jeremy Finzel writes: > We want to enforce a policy, partly just to protect those who might forget, > for every table in a particular schema to have a primary key. This can't > be done with event triggers as far as I can see, because it is quite > legitimate to do: > > BEGIN; > CREATE TABLE foo (id int); > ALTER TABLE foo ADD PRIMARY KEY (id); > COMMIT; > > It would be nice to have some kind of "deferrable event trigger" or some > way to enforce that no transaction commits which added a table without a > primary key. > I think you would be better off having an automated report which alerts you to tables lacking a primary key and dealing with that policy through other means. Using triggers in this way often leads to unexpected behaviour and difficult-to-identify bugs. The policy is a management policy and probably should be dealt with via management channels rather than technical ones. Besides, the likely outcome is that your developers will just adopt the practice of adding a serial column to every table, which in itself doesn't really add any value. Tim -- Tim Cross
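A minimal sketch of the kind of report query suggested above, assuming the schema being policed is named 'dev' (adjust to suit); a non-empty result from a cron or monitoring job is the alert:

SELECT n.nspname AS schema_name, c.relname AS table_name
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname = 'dev'
  AND NOT EXISTS (SELECT 1
                  FROM pg_constraint con
                  WHERE con.conrelid = c.oid
                    AND con.contype = 'p');  -- 'p' = primary key constraint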
Re: Enforce primary key on every table during dev?
Ron Johnson writes: > On 03/01/2018 02:08 PM, marcelo wrote: >> >> >> On 01/03/2018 16:42 , Ron Johnson wrote: >>> On 03/01/2018 01:11 PM, marcelo wrote: >>>> >>>> On 01/03/2018 16:00 , Ron Johnson wrote: >>> [snip] >>>>> If your only unique index is a synthetic key, then you can insert the >>>>> same "business data" multiple times with different synthetic keys. >>>>> >>>>> >>>>> -- >>>>> Angular momentum makes the world go 'round. >>>> IMHO, business logic can and must preclude "garbage insertion". Unless >>>> you are inserting data directly to the database using SQL, any n-tier >>>> architecture will be checking data validity. >>> >>> Any n-tier architecture that's bug-free. >>> >> Do you know about unit testing? > > Way Back When Dinosaurs Still Roamed The Earth and I first learned the > trade, the focus was on proper design instead of throwing crud against the > wall and hoping tests caught any bugs. Because, of course, unit tests are > only as good as your imagination in devising tests. +1. And a good test of your underlying data model is whether you can identify a natural primary key. If you can't, chances are your model is immature/flawed and needs more analysis. -- Tim Cross
Re: Best options for new PG instance
David Gauthier writes: > Hi: > > I'm going to be requesting a PG DB instance (v9.6.7) from an IT dept in a > large corp setting. I was wondering if anyone could comment on the > pros/cons of getting this put on a virtual machine vs hard metal ? Locally > mounted disk vs nfs ? > This is a hard question to answer as there are too many variables. However, I would say that my experience has been that most large organisations are pretty much using VMs for everything, so asking for something on a 'bare metal' basis is likely to result in your request needing special attention and justification. On the other hand, if you make a standard request, it will likely be fulfilled more rapidly. Sys admins are likely to be resistant to a bare metal setup if their infrastructure is based around VMs due to the additional work and maintenance overheads. All our databases are running on VMs. Some of them are storing fairly large amounts of data (e.g. one application stores large amounts of weather data - adding 650 million records a day with a consolidation after every 100 days. Currently, the DB is using about 6Tb, so not huge, but not insignificant). Disk storage is via SAN. Getting the right performance will require tweaking of memory, CPUs etc. The good news is that adding memory and CPUs is relatively trivial. For our situation, VMs have been fine and there have been some advantages with SAN storage infrastructure, such as fast snapshots for backups etc. In general, I usually find it best to work with the system admins and follow their recommendations. Provide them with details of your performance requirements and where you feel resource demands may peak and let them propose what they feel would be best suited given whatever infrastructure they have. Tim -- Tim Cross
Re: JDBC connectivity issue
My interpretation of that error is slightly different. I think what it is saying is that for the first IP address, it thinks the port number is the full string "5432,10.16.10.13:5432" and fails when it tries to convert that into a port number. Are you positive the URL format you are trying to use is supported by the JDBC driver you're using? On 9 March 2018 at 08:12, chris wrote: > Hi, > > I have a JDBC client attempting to connect to 2 postgres nodes in a hot standby > configuration. Running postgres 9.4 on centos6. > > What we are trying to accomplish is, in the event of a failover, to first > try to connect to the master. If that fails, to then reach out to the > standby. > > I looked online and found the suggested way to do this, but it's not > working. > > This is the JDBC line we have: > > *jdbc.url=jdbc:postgresql://10.16.10.12:5432,10.16.10.13:5432/app_db* > > Here is the error we are getting: > > *2018-03-07 13:54:36,994 ERROR:com.zaxxer.hikari.util.PropertyElf - > failed to set properly port number on target class > org.postgresql.ds.PGSimpleDataSource java.lang.NumberFormatException: for > input string: "5432,10.16.10.13:5432"* > > It looks like the first IP address has disappeared or is not set right. > > Thanks in advance for the help. > > > > -- regards, Tim -- Tim Cross
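For what it's worth, multi-host URLs are only understood by relatively recent versions of the pgjdbc driver (the 9.4.12xx series onwards, if memory serves), so checking the driver version is the first step. Assuming a new-enough driver, a URL along these lines - with targetServerType telling the driver to prefer the node that accepts writes - should behave as intended:

jdbc.url=jdbc:postgresql://10.16.10.12:5432,10.16.10.13:5432/app_db?targetServerType=master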
Prepared statements
A simple question I wasn't able to get a clear answer on. It is generally considered best practice to use prepared statements and parameters rather than concatenated strings to build SQL statements, as mitigation against SQL injection. However, in some databases I've used, there is also a performance advantage. For example, the planner may be able to more easily recognise a statement and reuse an existing plan rather than re-planning the query. I wasn't sure what the situation is with postgres - is there a performance benefit in using prepared statements over a query string where the values are just concatenated into the string? thanks, Tim -- Tim Cross
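For anyone wanting to experiment, the server-side form is easy to try directly in psql. The planner uses custom plans for the first few executions of a prepared statement and may then switch to a cached generic plan, so there can be a saving on parse/plan time for statements executed repeatedly in the same session (table and column names here are hypothetical):

PREPARE recent_orders (int) AS
  SELECT * FROM orders WHERE customer_id = $1 ORDER BY created_at DESC LIMIT 10;

EXECUTE recent_orders(42);
EXECUTE recent_orders(42);  -- re-uses the prepared statement; no re-parsing

Note that prepared statements are per-session, so the benefit only accrues when a connection (or pool) re-uses them.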
Re: Postgresql 9.3 Server will not start after Ubuntu Upgrade
Ken Beck writes: > I recently upgraded my OS from Ubuntu 14.04 LTS to 16.04 LTS and since, > postgresql refuses to re-start. I am concerned that if I poke around too > much, I may lose my old data. I have consulted various web postings here > and in ubuntu forums, and have not found an obvious solution. But please > excuse me, I am not a Linux expert, and I had a friend help me with > setup issues several years ago. They are no longer available, and I am > having trouble getting things to run. > This is likely a Ubuntu/Debian related issue. I vaguely remember that following an upgrade of a Ubuntu system, the upgrade issued a warning about additional steps that needed to be taken in order to upgrade postgres. I'm pretty sure that the default postgres for 16.04 was 9.5, so it is possible that you need to upgrade your postgres installation to 9.5 or take other action to continue using 9.3. It might be worthwhile looking in /usr/share/doc/postgresql-common to see if there is some relevant documentation about upgrading etc. I also suspect there may be some startup changes you will need to verify. I think Ubuntu 16.04 uses systemd while 14.04 used upstart? After a Ubuntu upgrade, there is usually an upgrade log you can also check to see if anything failed or provided warnings regarding additional actions required. HTH Tim -- Tim Cross
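On a Debian/Ubuntu system the postgresql-common tools are the usual way to check and migrate clusters; a hedged sketch of the typical sequence, assuming the old cluster is the default one named 'main':

pg_lsclusters                     # list installed clusters, their versions and status
sudo pg_upgradecluster 9.3 main   # migrate the 9.3 'main' cluster to the newest installed version

The old cluster is left stopped and can be removed with pg_dropcluster once the upgrade has been verified.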
Re: Rationale for aversion to the central database?
On 9 April 2018 at 07:39, Guyren Howe wrote: > I am a Rails developer at a medium-large size company. I’ve mostly worked > at smaller companies. I’ve some exposure to other web development > communities. > > When it comes to databases, I have universally encountered the attitude > that one should treat the database as a dumb data bucket. There is a *very* > strong aversion to putting much of any business logic in the database. I > encounter substantial aversion to have multiple applications access one > database, or even the reverse: all abstraction should be at the application > layer. > > My best theory is that these communities developed at a time when Windows > was more dominant, and just generally it was *significantly* easier to use > MySQL than Postgres for many, particularly new, developers. And it is > pretty reasonable to adopt an aversion to sophisticated use of the database > in that case. > > This attitude has just continued to today, even as many of them have > switched to Postgres. > > This is only a hypothesis. I am now officially researching the issue. I > would be grateful for any wisdom from this community. > > > Aside: it is rare to find a situation in life or anywhere where one widely > adopted thing is worse in *every way* than another thing, but this > certainly was and largely still continues to be the case when one compares > MySQL and Postgres. So why do folks continue to use MySQL? I find this > mystifying. > It is interesting looking at many of the responses to this thread. I see a lot at each extreme - either put lots of stuff in the database or use the database as just a 'dumb' store and put everything in the application code. I think the real solution is somewhere in the middle. I've lost count of the number of applications where the application code is jumping through all sorts of hoops to do basic data operations which would be far better handled in the database and can easily be done using just ANSI SQL (so is portable). It drives me crazy when people tell me the database is slow when they are doing 'select * from table' and then filtering and sorting the data in their application. Applications should take advantage of what the database does well. Unfortunately, I see far too many developers who are uncomfortable with SQL, don't know how to structure their queries efficiently (lots of nested subqueries, Cartesian joins etc). At the other extreme are those who tend to put almost everything in the database - including business policy and business 'rules' which are probably better categorised as current business strategy. First, I think it is nearly always a mistake to try and enforce business policy with technology. Policies change too often and should be dealt with via administrative measures. Technology can certainly be used to raise alerts regarding policy breaches, but should not be used to enforce policies. Likewise, some business rules are more akin to strategies than being actual static rules and can change with little notice, rhyme or reason. These probably should not be 'hard coded' into the database. Other rules are more stable and unlikely to ever change and are likely good candidates for being encoded in the database as either functions or constraints. I do feel that often the big problem is with management who fail to understand the time and effort needed to develop a good data model. Developers are put under pressure to deliver functionality and as long as it looks correct at the interface level, all is good. 
Little thought is really put into long term maintenance or performance. From a developer perspective, time put into becoming an expert in React, Angular, Node, Python etc is probably going to earn them more bonus points than time spent on developing skills in defining good data models or understanding of the power/functionality of the underlying database engine. Of course, this does tend to be short-sighted, as a good data model will make it easier to add/enhance an application and understanding your database system will make changes and enhancements less daunting. For me, the sign of a good developer is the ability to get the balance right. They understand the strengths and weaknesses of ALL the components involved and are able to select the technology mix which suits the problem domain and are able to get the right balance between business responsiveness to change and long term maintenance/viability. Unfortunately, such developers are rare, so it will usually mean there are a team of people with different skills and what will matter is how well they are able to work together as a team and come up with an architecture which satisfies the business requirements. -- regards, Tim -- Tim Cross
Re: Rationale for aversion to the central database?
Peter J. Holzer writes: > In my applications I use SQL heavily. RDBMs are good at processing > queries, so use them for that. If all you want is a key-value store, > don't use PostgreSQL. I'm not very fond of ORMs. I know what I want to > do and can express it in SQL. An ORM makes me translate that into a > different (and usually inferior) query language, which is then > translated back into SQL. That doesn't make things easier for me. > Could not agree more! My experience has been that ORMs just get in the way. Worse yet, when I've investigated performance problems raised by developers, I've often found it is due to the ORM layer, which is unable to map more complex queries efficiently. The only 'layer' I've ever used which I liked was HugSQL. I quite liked this approach as you write the queries in SQL and these get exposed to the application layer as high level functions, which gives a nice clean interface. > > I come from Oracle, not MySQL, but I have also used MySQL, and I guess > the very wide gap in capabilities between Oracle and MySQL made me > cautious about putting too much into the database. There is also the > expectation that you should be able to use a different database engine > (SQL is a standard, right?) just like you should be able to use a > different C compiler, but in practice that never works. And of course I > wasn't very impressed with PL/SQL. (PostgreSQL gives you a much wider > range of languages for stored procedures than Oracle, but PL/PerlU still > isn't quite the same as Perl (And I suspect it's the same for Python). > > hp Again, totally agree. Nice in theory and reminds me of the 'write once, run everywhere' dream. Very few of the places I've worked have actually maintained cross database functionality for long, if at all. The problem is that while SQL may have a standard, how that standard is implemented is very different. When I have worked at places which tried to be database neutral, they inevitably gave up as they found that, in the end, they needed to maintain separate SQL or have separate database maintenance teams anyway. You will only get seamless SQL across different databases if your SQL is very basic, in which case, you probably don't need a full-blown RDBMS anyway. Most of the time, your choice of database will be dictated by the dominant platform in the market your application targets. -- Tim Cross
Re: Commands linked to pg_wrapper not working with non-root users
Jonathan Leroy - Inikup writes: > Hi, > > I'm using multiple versions of PostgreSQL from the postgresql.org > repository (http://apt.postgresql.org/) on Debian Jessie, on multiple > servers. > Each postgresql-client-XX package depends on postgresql-client-common, > which provides the pg_wrapper script > (/usr/share/postgresql-common/pg_wrapper). > > A lot of PostgreSQL commands are linked to pg_wrapper. E.g.: > /usr/bin/psql -> ../share/postgresql-common/pg_wrapper > > Here's my issue: when I'm logged in as a user which is not root or > postgresql, I can't use any of the commands linked to pg_wrapper: > > user1@server1:~ $ /usr/bin/psql --version > Error: Invalid data directory > > > However, everything works with postgres or root user: > > root@server1:~ # /usr/bin/psql --version > psql (PostgreSQL) 9.4.16 > > > Also, everything works fine if I bypass pg_wrapper: > > user1@server1:~ $ /usr/lib/postgresql/9.4/bin/psql --version > psql (PostgreSQL) 9.4.16 > > Am I missing something? > > Thanks, Check your settings in /etc/postgresql-common/user_clusters. The wrapper script uses that file to determine which databases to connect to and which database cluster is the default for a given user. It can also be overridden with a local ~/.postgresqlrc, so check there are no old settings there as well. Tim -- Tim Cross
Re: Postgresql database encryption
Vikas Sharma writes: > Hello Guys, > > Could someone throw light on the postgresql instance wide or database wide > encryption please? Is this possible in postgresql and been in use in > production? > > This is a requirement in our production implementation. > This sounds like a lazy management requirement specified for 'security' purposes by people with little understanding of either technology or security. I suspect it comes from a conversation that went along the lines of "There has been lots in the news about cyber threats" "Yes, we need our system to be secure" "I know, let's make one of the requirements that everything must be encrypted, that will stop them" "Great idea, I'll add it as requirement 14". This is a very poor requirement because it is not adequately specified, but more critically, because it is specifying a 'solution' rather than articulating the requirement in a way which would allow those with the necessary expertise to derive an appropriate solution - one which may or may not involve encryption or hashing of data and which may or may not be at the database level. What you really need to do is go back to your stakeholders and ask them a lot of questions to extract what the real requirement is. Try to find out what risk they are trying to mitigate with encryption. Once this is understood, then look at what the technology can do and work out the design/implementation from there. It is extremely unlikely you just want all the data in the database encrypted. When you think about it, such an approach really doesn't make sense. In basic terms, if the data is encrypted, the database engine will need to be able to decrypt it in order to operate (consider how a where clause needs to be able to interpret the data etc). If the db can read the data, the keys must be in the database. If the keys are in the database and your database is compromised, then your keys are compromised. So provided you protect your database from compromise, you achieve the same level of security as you do with full data encryption EXCEPT for access to the underlying data files outside of the database system. For this, you will tend to use some sort of file system encryption, which is typically managed at the operating system level. Again, for the operating system to be able to read the file system, the OS must have access to the decryption keys, so if your OS is compromised, then that level of protection is lost as well (well, that is over simplified, but you get the idea). What this level of protection does give you is data at rest protection - if someone is able to access your disks through some other means, they cannot read the data. This is the same principle most people should be using with their laptops. Protect the OS with a password and have the data on disk encrypted. Provided nobody can login to your laptop, they cannot read your data. Without this encryption, you can just take the disk out of the laptop, mount it on another system and you have full access. With disk encryption, you cannot do that. Same basic principle with the server. At the database level, a more typical approach is to use one-way hashing for some sensitive data (i.e. passwords) and possibly column level encryption on a specific column (much rarer) or just well structured security policies and user roles that restrict who has access to various tables/columns. To implement this successfully, you need details regarding the domain, sensitivity of various data elements and the threats you need to protect against. 
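As a purely illustrative sketch of those last two techniques using the pgcrypto extension (table, column and key names are hypothetical, and in practice the key would be supplied by the application rather than embedded in SQL):

CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- one-way hash for a password; verify by re-hashing with the stored value as the salt
UPDATE app_user SET pw_hash = crypt('s3cret', gen_salt('bf')) WHERE login = 'alice';
SELECT (pw_hash = crypt('s3cret', pw_hash)) AS password_ok
FROM app_user WHERE login = 'alice';

-- symmetric encryption of a single sensitive column (stored as bytea)
INSERT INTO customer (name, tax_id_enc)
VALUES ('Alice', pgp_sym_encrypt('123-456-789', 'key-held-by-application'));
SELECT name, pgp_sym_decrypt(tax_id_enc, 'key-held-by-application') AS tax_id
FROM customer;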
If you cannot get this information because your stakeholders don't really know what their risks are and have not done a proper assessment, and what you are really dealing with is bureaucracy which just has a dumb "data must be encrypted" policy, then just use full disk encryption, state that "all data is encrypted on disk", and you're done. Tim -- Tim Cross
Re: Postgresql database encryption
Ron writes: > On 04/20/2018 03:55 PM, Vick Khera wrote: >> On Fri, Apr 20, 2018 at 11:24 AM, Vikas Sharma wrote: >> > > Someone really needs to explain that to me. My company-issued laptop has > WDE, and that's great for when the machine is shut down and I'm carrying it > from place to place, but when it's running, all the data is transparently > decrypted for every process that wants to read the data, including malware, > industrial spies, > It really depends on the architecture. In many server environments these days, some sort of network storage is used. Having the 'disk' associated with a specific server encrypted can provide some level of protection against another machine, which also has access to the underlying infrastructure, being able to read that data. The other level of protection is for when disks are disposed of. There have been many cases where data has been retrieved off disks which have been sent for disposal. Finally, the basic physical protection. Someone cannot just access your data centre, remove a disk from the SAN and then access the data. Then of course there is the bureaucratic protection - "Yes boss, all our data is encrypted on disk." Tim -- Tim Cross
Re: Postgresql database encryption
Ron writes: > On 04/20/2018 06:11 PM, Stephen Frost wrote: >> Greetings, >> >> * Ron (ronljohnso...@gmail.com) wrote: >>> On 04/20/2018 03:55 PM, Vick Khera wrote: >>>> On Fri, Apr 20, 2018 at 11:24 AM, Vikas Sharma >>> For anyone to offer a proper solution, you need to say what purpose your >>>> encryption will serve. Does the data need to be encrypted at rest? Does it >>>> need to be encrypted in memory? Does it need to be encrypted at the >>>> database level or at the application level? Do you need to be able to >>>> query the data? There are all sorts of scenarios and use cases, and you >>>> need to be more specific. >>>> >>>> For me, using whole-disk encryption solved my need, which was to ensure >>>> that the data on disk cannot be read once removed from the server. >>> Someone really needs to explain that to me. My company-issued laptop has >>> WDE, and that's great for when the machine is shut down and I'm carrying it >>> from place to place, but when it's running, all the data is transparently >>> decrypted for every process that wants to read the data, including malware, >>> industrial spies, >>> >>> Thus, unless you move your DB server on a regular basis, I can't see the >>> usefulness of WDE on a static machine. >> The typical concern (aka, attack vector) isn't around moving the DB >> server on a regular basis or about someone breaking into your data >> center and stealing your drives, it's making sure that disposal of >> equipment doesn't result in valuable data being retained on the >> drives when they leave the data center for replacement or disposal. > > That makes some sense, but years of added CPU overhead to mitigate a problem > that could be solved by writing zeros to the disk as a step in the decomm > process seems more than a bit wasteful. Problem is that decomm process relies on someone actually following the process. Too often, this part fails. The overhead with WDE these days is minimal anyway. Good security is always about layers of protection and should never just rely on a single control. Tim -- Tim Cross
Postgres and fsync
Hi all, the recent article in LWN regarding issues with fsync and error reporting in the Linux kernel and the potential for lost data has prompted me to ask 2 questions. 1. Is this issue low level enough that it affects all potentially supported sync methods on Linux? For example, if you were concerned about this issue and you had a filesystem which supports open_sync or open_datasync etc, is switching to one of these options something which should be considered or is this issue low level enough that all sync methods are impacted? 2. If running under xfs as the file system, what is the preferred sync method or is this something which really needs to be benchmarked to make a decision? For background, one of our databases is large - approximately 7Tb, with some tables which have large numbers of records and a high insert rate, i.e. approx 1,600,000,000 new records added per day and a similar number deleted (no updates), maintaining a table size of about 3Tb, though it is expected we will be increasing the number of retained records and will see the table grow to about 6Tb. This represents a fair amount of I/O and we want to ensure we have the fastest I/O we can achieve with the highest data reliability we can get. The columns in the table are small i.e. 7 double precision, 2 integer, 1 date and 2 timestamp. Platform is RHEL, Postgres 9.6.8, filesystem xfs backed by HP SAN. Current wal_sync_method is fsync. Tim -- Tim Cross
Re: Postgres and fsync
Andres Freund writes: > Hi, > > On 2018-04-23 08:30:25 +1000, Tim Cross wrote: >> the recent article in LWN regarding issues with fsync and error >> reporting in the Linux kernel and the potential for lost data has >> prompted me to ask 2 questions. > > Note that you need to have *storage* failures for this to > happen. I.e. your disk needs to die, and there's no raid or such to fix > the issue. > > >> 1. Is this issue low level enough that it affects all potentially >> supported sync methods on Linux? For example, if you were concerned >> about this issue and you had a filesystem which supports open_sync or >> open_datasync etc, is switching to one of these options something which >> should be considered or is this issue low level enough that all sync >> methods are impacted? > > No, the issue is largely about datafiles whereas the setting you refer > to is about the WAL. > > Greetings, > > Andres Freund OK, thanks. -- Tim Cross
Re: Long running INSERT+SELECT query
Steven Lembark writes: > On Fri, 27 Apr 2018 19:38:15 +0300 > Vitaliy Garnashevich wrote: > >> We're going to try using "SELECT 1 FROM table FOR KEY SHARE" for each of >> the tables, which are referenced by results, before running the big >> query. That should be up to a million rows in total. It will probably >> not cover the case when a record is INSERT'ed and then DELETE'd after >> the calculation has begun, but such cases should be even more rare than >> the DELETE's we're currently facing. > > Think about using a couple of Materialized Views for the worst > part of it. +1 re: materialised views - I have found them to be extremely useful for situations where you want a snapshot of data and need to present it in a way which is easier to process, especially when the underlying data is changing faster than your reporting process can generate the report. -- Tim Cross
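A minimal sketch of that approach (names are hypothetical); the unique index is what allows the refresh to run without blocking readers of the view:

CREATE MATERIALIZED VIEW order_totals AS
  SELECT customer_id, sum(amount) AS total
  FROM orders
  GROUP BY customer_id;

CREATE UNIQUE INDEX ON order_totals (customer_id);

-- whenever a fresh snapshot is needed:
REFRESH MATERIALIZED VIEW CONCURRENTLY order_totals;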
Re: Index/trigger implementation for accessing latest records
Alastair McKinley writes: > Hi, > > > I have a table that stores a location identifier per person which will be > appended to many times. > > However, for many queries in this system we only need to know the most recent > location per person, which is limited to about 1000 records. > > > Is the following trigger/index strategy a reasonable and safe approach to > fast access to the latest location records per person? > > > 1. A boolean column (latest_record default true) to identify the latest > record per person > 2. A before insert trigger that updates all other records for that person > to latest_record = false > 3. A partial index on the latest_record column where latest_record is true > > > Aside from performance, is it safe to update other records in the table from > the insert trigger in this way? > > > Minimal example is shown below: > > > create table location_records > > ( > > id bigserial, > > person_id bigint, > > location_id bigint, > > latest_record boolean not null default true > > ); > > > create function latest_record_update() returns trigger as > > $$ > > BEGIN > > update location_records set latest_record = false where person_id = > new.person_id and latest_record is true and id != new.id; > > return new; > > END; > > $$ language plpgsql; > > > create trigger latest_record_trigger before insert on location_records > > for each row execute procedure latest_record_update(); > > > create index latest_record_index on location_records(latest_record) where > latest_record is true; > > > insert into location_records(person_id,location_id) values (1,1); > > insert into location_records(person_id,location_id) values (1,2); > > insert into location_records(person_id,location_id) values (1,3); > > > insert into location_records(person_id,location_id) values (2,3); > > insert into location_records(person_id,location_id) values (2,4); > > > select * from location_records; > My personal bias will come out here - I don't think using a trigger is a good solution. Although very powerful, the problem with triggers is that they are a 'hidden' side effect which is easily overlooked and often adds an additional maintenance burden which could be avoided using alternative approaches. Consider a few months down the road and you're on holidays. One of your colleagues is asked to add a new feature which involves inserting records into this table. During testing, they observe an odd result - a field changing which according to the SQL they wrote should not. The simple new feature now takes twice as long to develop as your colleague works out there is a trigger on the table. Worse yet, they don't notice and put their changes into production and then issues start getting raised about communications going to the wrong location for customers etc. Triggers often become a lot more complicated than they will initially appear. In your example, what happens for updates as opposed to inserts? What happens if the 'new' location is actually the same as a previously recorded location etc. In your case, I would try to make what you're doing more explicit and avoid the trigger. There are a number of ways to do this such as - A function to insert the record. The function could check to see if that customer has any previous records and if so, set the boolean flag to false for all existing records and true for the new one. 
You might even want to break it up into two functions so that you have one which just sets the flag based on a unique key parameter - this would provide a way of resetting the current location without having to do an insert. - Use a timestamp instead of a boolean and change your logic to select the current location as the record with the latest timestamp (see the sketch after this post). - Keep the two actions as separate SQL - one to insert a record and one to set the current location. This has the advantage of making actions clear and easier to maintain and can be useful in domains where people move between locations (for example, I've done this for a University where the data represented the student's current address, which would change in and out of semester periods, but often cycle between two addresses, their college and their parental home). The downside of this approach is that applications which insert this information must remember to execute both SQL statements. If you have multiple interfaces, this might become a maintenance burden (one of the advantages of using a DB function). Tim -- Tim Cross
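A minimal sketch of the timestamp-based alternative mentioned above, assuming a recorded_at column is added in place of the boolean flag:

ALTER TABLE location_records ADD COLUMN recorded_at timestamptz NOT NULL DEFAULT now();
CREATE INDEX ON location_records (person_id, recorded_at DESC);

-- latest location per person
SELECT DISTINCT ON (person_id) person_id, location_id
FROM location_records
ORDER BY person_id, recorded_at DESC;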
Re: issues when installing postgres
On 10 May 2018 at 09:45, Adrian Klaver wrote: > On 05/09/2018 02:47 PM, Antonio Silva wrote: > >> Hello Adrian >> >> Are you using the Ubuntu or Postgres repos? >> > I'm using the Ubuntu repos >> >> Can you connect to Postgres using psql? >> > No I cannot >> > > What does ps ax | grep post show? > > My guess is you are going to have to reinstall Postgres. > > >> Thanks >> >> Antonio >> >> >> 2018-05-09 10:36 GMT-03:00 Adrian Klaver: >> >> >> On 05/08/2018 05:54 PM, Antonio Silva wrote: >> >> Hello! >> >> >> Comments inline. >> >> I bought a new computer and I installed Ubuntu 18.04 and after >> PostgreSQL. >> sudo apt install postgresql postgresql-contrib pgadmin3 >> >> >> Are you using the Ubuntu or Postgres repos? >> >> >> Nevertheless I had some issues with configuration files and >> decided to >> uninstall it completely >> sudo apt purge postgresql postgresql-contrib pgadmin3 >> >> When I installed it again I noticed that postgresql.conf and >> pg_hba.conf >> were the old files. Then I uninstalled Postgres once more and >> removed the >> directory /etc/postgresql/ >> >> After a new install I noticed that the directory >> /etc/postgresql/ was >> completely empty - it was not created again. There are no more >> postgresql.conf and pg_hba.conf files. I could find only >> postgresql.conf.sample and pg_hba.conf.sample at >> /usr/share/postgresql/10 >> >> /etc/init.d/postgresql status says that Postgres is running fine >> >> >> Can you connect to Postgres using psql? >> >> >> ● postgresql.service - PostgreSQL RDBMS >> Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; >> vendor >> preset: enabled) >> Active: active (exited) since Tue 2018-05-08 10:43:23 -03; 1h >> 55min ago >> Process: 6451 ExecStart=/bin/true (code=exited, status=0/SUCCESS) >> Main PID: 6451 (code=exited, status=0/SUCCESS) >> >> >> To be running the server would need its conf files somewhere. >> >> >> What should I have to do to have the folder /etc/postgresql/10/ >> ... and all >> its files again? >> >> >> If you are going to use the packages then yes. >> >> >> >> >> I really appreciate any help. Thanks in advance. >> >> All the best >> >> -- Antônio Olinto Ávila da Silva >> >> >> >> -- Adrian Klaver >> adrian.kla...@aklaver.com >> >> >> > > -- > Adrian Klaver > adrian.kla...@aklaver.com > > Probably not relevant, but I noticed you're also installing pgadmin3. I don't believe pgadmin3 will work with Postgres 10. You need pgadmin4, which isn't available as a package on ubuntu 18.04. Also, be aware that Ubuntu has also been pushing 'snaps', so make sure that Postgres hasn't been installed as a snap package (I think the command is snap list). I recall when I tried to install postgres in 17.10, which I did from the 'software centre' on ubuntu, it initially installed it as a snap. I had to remove the snap and then use apt to get the normal deb package install. The snap package system does not use the normal locations for config files. This could also be a ubuntu 18.04 issue. This version was only released a couple of weeks ago and it is the first version which has Postgres 10 as the default. I would try the following: 1. Use systemctl to stop the postgresql service. 2. Remove all postgres packages, making sure all config files are also removed. 3. Use synaptic to make sure all postgres packages and associated config files have been removed. 4. Reboot. 5. Run apt update and then apt upgrade. 6. Re-install using apt (not the software centre). 
-- regards, Tim -- Tim Cross
Re: When use triggers?
hmidi slim writes: > Hi, > > I'm working on a microservice application and I avoid using triggers > because they will not be easy to maintain and need an experienced person > in database administration to manage them. So I prefer to manage the work > in the application using ORM and javascript. > However I want to get some opinions and advice about using triggers: when > should I use them? How to manage them when there are some problems? I think triggers are generally best avoided. They do have a purpose, but like regular expressions and Lisp style macros, they are abused more often than used appropriately. When used correctly, they can help to ensure your code is robust, clear and easy to maintain. The big issue with triggers is that they are really a side effect of some other action. As such, they are obscure, easily missed, difficult to debug and often frustrating to maintain. In nearly 30 years of working with different databases, I've rarely found triggers necessary. As mentioned by others in this thread, they can be useful when you need low level auditing and like all guidelines, there are always exceptions, but in general, they should usually be the last choice, not the first. Database functions on the other hand are extremely useful and probably something more developers should take advantage of. There are far too many applications out there which are doing things within external application code which could be handled far more efficiently and consistently as a database function. The challenge is in getting the right balance. My rule of thumb is to develop under the assumption that someone else will have this dumped on them to maintain. I want the code to be as easy to understand and follow as possible and I want to make it as easy to make changes and test those changes as possible. Therefore I prefer my code to consist of simple units of functionality which can be tested in isolation and have a purpose which can be understood without requiring a knowledge of hidden actions or unexpected side effects. If a function cannot be viewed in a single screen, it is probably too big and trying to do too many different things which should be broken up into smaller functions. regards, Tim -- Tim Cross
Re: pgdg-keyring (or apt-key) failure on fresh 9.6 install
Moreno Andreo writes: > Hi folks, > I'm trying to install Postgresql 9.6 on a test machine in Google Cloud > Platform > After a fresh install with Debian 9 (just after the instance has been > created) I follow steps from here > > https://wiki.postgresql.org/wiki/Apt > > (instead of pg 10 I install pg 9.6) > > During the installation process i encounter the following strange > warnings that, even if that's a test machine, make me think twice before > going ahead. > > [...] > Processing triggers for man-db (2.7.6.1-2) ... > Setting up pgdg-keyring (2017.3) ... > Removing apt.postgresql.org key from trusted.gpg: Warning: The postinst > maintainerscript of the package pgdg-keyring > Warning: seems to use apt-key (provided by apt) without depending on > gnupg or gnupg2. > Warning: This will BREAK in the future and should be fixed by the > package maintainer(s). > Note: Check first if apt-key functionality is needed at all - it > probably isn't! > OK > Setting up xml-core (0.17) ... > [...] > > I have to say that installation is successfully and database server goes > up and apparently with no problems at all. > This looks like a warning for the package maintainers regarding ensuring the package depends on either gnupg or gnupg2 and nothing you need to worry about unless you are building/maintaining deb packages for postgres. The Debian package manager, apt, uses gpg keys to verify the authenticity of packages it downloads. My guess is that previously, you only needed to ensure the package had a dependency on apt-key and now apt has/is changing such that you need to have an explicit dependency on either gnupg or gnupg2. -- Tim Cross
Re: pgdg-keyring (or apt-key) failure on fresh 9.6 install
Moreno Andreo writes: > Hi Tim, > > Il 29/05/2018 00:06, Tim Cross ha scritto: >> Moreno Andreo writes: >> >>> Hi folks, >>> I'm trying to install Postgresql 9.6 on a test machine in Google Cloud >>> Platform >>> After a fresh install with Debian 9 (just after the instance has been >>> created) I follow steps from here >>> >>> https://wiki.postgresql.org/wiki/Apt >>> >>> (instead of pg 10 I install pg 9.6) >>> >>> During the installation process i encounter the following strange >>> warnings that, even if that's a test machine, make me think twice before >>> going ahead. >>> >>> [...] >>> Processing triggers for man-db (2.7.6.1-2) ... >>> Setting up pgdg-keyring (2017.3) ... >>> Removing apt.postgresql.org key from trusted.gpg: Warning: The postinst >>> maintainerscript of the package pgdg-keyring >>> Warning: seems to use apt-key (provided by apt) without depending on >>> gnupg or gnupg2. >>> Warning: This will BREAK in the future and should be fixed by the >>> package maintainer(s). >>> Note: Check first if apt-key functionality is needed at all - it >>> probably isn't! >>> OK >>> Setting up xml-core (0.17) ... >>> [...] >>> >>> I have to say that installation is successfully and database server goes >>> up and apparently with no problems at all. >>> >> This looks like a warning for the package maintainers regarding ensuring >> the package depends on either gnupg or gnupg2 and nothing you need to >> worry about unless you are building/maintaining deb packages for postgres. > Brilliant. That's what I needed to know. Just to avoid bitter surprises > in the future... :-) >> >> The Debian package manager, apt, uses gpg keys to verify the >> authenticity of packages it downloads. My guess is that previously, you >> only needed to ensure the package had a dependency on apt-key and now >> apt has/is changing such that you need to have an explicit dependency on >> either gnupg or gnupg2. >> > ... so if I update/upgrade this instance in the future it will be > automatically fixed (and there shouldn't be issues), right? > Thanks a lot! > Right. In fact, there are no issues now. That warning is from the Debian package management system and about the package management system, so nothing to do with Postgres. When you upgrade in the future, provided the new Postgres packages have been created with the dependency for gnupg/gnupg2, there will be no warnings. Personally, I tend to prefer using the packages which come with the particular flavour of Linux you're installing as they are often more in-line with the current version of the package management system being used. I only grab packages from the specific Postgres repo if the package is not in the current version of the distribution I'm installing. -- Tim Cross
Re: LDAP authentication slow
We use a sync service to synchronise passwords across AD and LDAP. Attributes in both systems are managed by MS Forefront. Our environment has a large mix of technologies - servers are roughly evenly split between Linux and MS - still probably more Linux, though MS has been increasing in recent years. Databases are a mix of Oracle and Postgres plus a smattering of MySQL. Staff numbers are around 3k with about 60% on MS, 35% OSX and 5% Linux. Client base is about 80k. The reason we use both openLDAP and AD is because there are differences between the two which are important for some of our applications (for example, attributes which are single valued under LDAP standards but can be multi-valued under AD) and because we need additional schemas which are easy to implement in standards compliant LDAP, but difficult in AD. We also found that when just requiring LDAP functionality, openLDAP outperformed AD. How easily this can be done in your environment will depend on your identity management solution. Depending on what that is, it may be as easy as just adding another downstream target and a few mapping rules. In this case, it would probably be an overall win. However, if your IAM system cannot manage things easily, this is probably not practical. There has been another thread regarding LDAP performance where the issue looks like it could be a DNS related problem. It seems establishing connections is slow when the LDAP address is specified using a name and faster when just an IP address is used. This could be something else you should look at. We had an issue a while back where our central IT provider made changes to DNS to improve security and enable better handling of misbehaving clients - essentially, it was a DNS throttling configuration which would temporarily block requests from an IP if the number of requests being made was above a specified threshold. This caused some initial problems for us as we found some application libraries did not perform proper DNS caching and would regularly exceed the threshold. It also took some trial and error to get the right watermark for the throttling. A simple test like using an IP address rather than a name would likely help to identify if DNS related issues could be the cause or whether it is just an AD specific issue. Definitely check AD logs as well - the issue could be simply that adding a new system has increased demand sufficiently to degrade performance of AD (though I would expect there would be complaints from others outside the DB area if this was the case). The GSSAPI approach is not as complicated as it sounds, but it can be affected by environment/infrastructure architecture and it will be critical to ensure you have good time synchronisation. This can be somewhat challenging in hybrid environments where you have a mix of local and remote services. When it all works, it is great, but when you do have a problem, diagnosis can be challenging. The overall approach of having one identity with one password per entity is IMO the right approach and your only hope for good password policy application. However, getting to that point can be very challenging. -- Tim Cross
Re: Whither 1:1?
Olivier Gautherot writes: > On Fri, Jun 1, 2018 at 12:52 PM, Guyren Howe wrote: > >> It’s come to my attention that what seems an obvious and useful database >> design pattern — 1:1 relations between tables by having a shared primary >> key — is hardly discussed or used. >> >> It would seem to be a very simple pattern, and useful to avoid storing >> nulls or for groups of fields that tend to be used together. >> >> Thoughts? Is there some downside I can’t see? >> > > You will get a benefit in terms of space only if the optional fields in the > second table exist in a reduced number of instances - and the second table > is significantly wider. This can make a difference on big tables but this > gain may be offset by the cost of the join. In this perspective, I don't > think that there is a clear benefit or drawback: it should be evaluated on > a case-by-case basis. > Well said. Like many database design decisions, there are more guidelines than rules. Nearly all decisions have pros and cons. Therefore, you need to assess on a per project basis. In addition to the structure of data (size, null frequency etc), you also need to consider how the data is used. It is also important to consider who will be using the data, how they will access it and what level of sophistication/understanding they have. The clarity of your data model is also important as future decisions may be made by others and the better they understand the design/model, the higher the likelihood the system will evolve in a sane and maintainable manner. The other point to remember is that all designs often have an element of surprise - you make a decision under an assumption which turns out not to hold due to variables you didn't consider or don't have control over. Sometimes there are alternatives which may actually perform better because they are optimised within the system - for example, some of the benefits of this approach could be realised using partitions. I have used this technique, but from memory, this was done as the system evolved and we found there was a benefit from having a smaller 'main' table. It isn't a design decision I recall adopting during an initial modelling of the system, but it may be something to consider once you find a performance problem (along with other options) which needs to be addressed. I'm not aware of any guideline or school of thought which rules out this as an option. Possibly the reason it appears to be used infrequently is because it doesn't realise the benefits you might expect or is simply not a problem in a majority of use cases. Tim -- Tim Cross
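For anyone unfamiliar with the pattern under discussion, a minimal sketch (hypothetical tables): the detail table shares the parent's primary key, so the wide, rarely-used columns live in their own table and the main table carries no NULLs for them:

CREATE TABLE person (
    id   bigserial PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE person_detail (
    id        bigint PRIMARY KEY REFERENCES person (id),
    biography text,
    portrait  bytea
);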
Re: LDAP authentication slow
Jeff Janes writes: > On Thu, May 31, 2018 at 8:23 AM, C GG wrote: > > In the meantime, I did what I promised Adrian Klaver I would do and I added >> the AD servers to the /etc/hosts file. That had an immediate and dramatic >> effect on the performance. That confirms (at least to me) that DNS >> resolution was playing a large part in the slowdown. I'm going to >> experiment with running a local DNS caching server to see if that will give >> the same effect. >> > > I had this problem at one site, and with the same solution. As best as I > could tell, Windows was not using DNS as the main way of resolving > hostnames. (People assure me that NetBIOS and WINS are almost completely > dead, but WireShark tells a different tale--I don't recall the exact name, > but it was certainly something other than DNS). So the fact that AD's > built in DNS sucks was not a problem for Windows users, which means there > was no impetus on the Windows admin to fix it. And the delay on resolution > was always 5 seconds plus a very small handful of milliseconds. So it was > clearly some kind of designed throttling or timeout, there is no way random > congestion could get you so close to 5.00 every time. > We had particular problems with a specific client. In the end, it turned out that this client used a DNS resolver library which, unlike most other libraries, did not use the local DNS cache. Every name lookup involved a full DNS resolution process. The temporary solution was to use IP addresses until admins/devs could work out how to fix the client or find a more robust solution where the address isn't 'hard coded'. At the time, the admins responsible for managing the DNS were getting frustrated as their service was being blamed for a local client problem. Key take-away: this stuff can be complex to diagnose and a systematic, evidence-based investigation is often required - the problem is, that takes time and is seldom welcomed. -- Tim Cross
Re: Microsoft buys GitHub, is this a threat to open-source
Joshua D. Drake writes: > On 06/04/2018 10:31 AM, Rich Shepard wrote: >> On Mon, 4 Jun 2018, Joshua D. Drake wrote: >> >>> No but it does show why using non open source platforms for open source >>> projects is an inherently bad idea. >> >> Joshua, >> >> Sourceforge seems to be out of favor, too, so are there any open source >> platforms that provide services that sourceforge and github do? > > Gitlab which can also be self hosted, the one GNU does (I don't recall > the name). > I find GitLab to be a pretty good alternative. However, I don't think there is any need to panic. While it is possible (likely?) that MS will change the terms and conditions in ways which favour maintaining their profitability, which may cause some problems for particularly large open source projects, nothing is going to happen overnight or so quickly that projects won't have an opportunity to find an alternative. There is an alternative perspective to the MS purchase of GitHub which is a little more positive. The challenge for open source is that at some point, there is a cost associated with storage, collaboration and sharing of source code. This cost has to be paid for by someone. While we can hope for philanthropic donations and gifts to pay this cost, it probably isn't a sustainable solution. If on the other hand, there is a profitable business which can maintain profitability while incorporating open source support as part of core business, then we may have a more sustainable and reliable solution. I am no MS fan and have little experience in the MS suite of products, but I think most would have to acknowledge that MS has started to embrace open source far more than it did in the past. Consider for example their VS Code editor or the fact Windows now comes with a Bash shell and more integrated support for Linux. I suspect that we will see little change in GitHub in the short term and provided MS can maintain long term profitability, we may see little long-term change as well. Of course, this only holds for the very relaxed definition of open source. RMS would/will be using this as a clear example of MS destroying open source and the weakness of the general open source movement when it fails to emphasise freedom. For a strict open source definition which emphasises freedom rather than just 'openness', GitHub would likely already be ruled out due to their restrictive terms and conditions regarding ownership and licenses. However, the subtleties of RMS's concerns are often misunderstood and incompatible with our tendency to focus on short-term, low-friction solutions. For now, I'll just take a leaf out of 'the Guide', grab my towel and not panic! Tim -- Tim Cross
Re: Load data from a csv file without using COPY
Ravi Krishna writes: > In order to test a real life scenario (and use it for benchmarking) I want to > load a large amount of data from csv files. > The requirement is that the load should happen like an application writing to > the database (that is, no COPY command). > Is there a tool which can do the job? Basically parse the csv file and > insert it into the database row by row. > Not clear what you mean by 'real world scenario', but you could possibly use PG's foreign table support and define a csv file as a foreign table and then have scripts which read from there and do whatever insert/update etc you need. However, this has a high level of 'fakery' going on and is probably not testing what you really want. There are lots of ways that applications write to the database - different drivers (e.g. jdbc, odbc, pg etc), different commit and transaction strategies and even different ways to handle things like an update or insert process. You can even use streams and copy from an application. To get 'real world' equivalence, you really need to use the same interface as the application you are comparing. Most languages have support for processing CSV files, so you may be better off writing a small 'wrapper' app which uses the same drivers and, assuming your database connectivity has been abstracted into some sort of module/library/class, uses the same interface to write to the database that the application uses. Tim -- Tim Cross
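If the foreign-table route is of interest, a minimal sketch using the file_fdw extension (file path, column list and target table are all hypothetical):

CREATE EXTENSION IF NOT EXISTS file_fdw;
CREATE SERVER csv_source FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE staging_rows (id int, name text, amount numeric)
  SERVER csv_source
  OPTIONS (filename '/path/to/data.csv', format 'csv', header 'true');

-- drive whatever load strategy you want to test from the staging table
INSERT INTO target_table SELECT * FROM staging_rows;

Note the file must be readable by the server process, and creating the foreign table needs appropriate privileges.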
Re: Using COPY to import large xml file
Anto Aravinth writes: > Thanks for the response. I'm not sure, how long does this tool takes for > the 70GB data. > > I used node to stream the xml files into inserts.. which was very slow.. > Actually the xml contains 40 million records, out of which 10Million took > around 2 hrs using nodejs. Hence, I thought will use COPY command, as > suggested on the internet. > > Definitely, will try the code and let you know.. But looks like it uses the > same INSERT, not copy.. interesting if it runs quick on my machine. > > On Sun, Jun 24, 2018 at 9:23 PM, Adrien Nayrat > wrote: > >> On 06/24/2018 05:25 PM, Anto Aravinth wrote: >> > Hello Everyone, >> > >> > I have downloaded the Stackoverflow posts xml (contains all SO questions >> till >> > date).. the file is around 70GB.. I wanna import the data in those xml >> to my >> > table.. is there a way to do so in postgres? >> > >> > >> > Thanks, >> > Anto. >> >> Hello Anto, >> >> I used this tool : >> https://github.com/Networks-Learning/stackexchange-dump-to-postgres >> If you are using nodejs, then you can easily use the pg-copy-streams module to insert the records into your database. I've been using this for inserting large numbers of records from NetCDF files. It takes between 40 and 50 minutes to insert 60 Million+ records and we are doing additional calculations on the values, not just inserting them, plus we are inserting over the network into a database which is also performing other processing. We found a significant speed improvement with COPY over blocks of insert transactions, which was faster than just individual inserts. The only downside with using COPY is that it either completely works or completely fails and when it fails, it can be tricky to work out which record is causing the failure. A benefit of using blocks of transactions is that you have more fine grained control, allowing you to recover from some errors or providing more specific detail regarding the cause of the error. Be wary of what indexes you're defining on your table. Depending on the type and number, these can have significant impact on insert times as well. -- Tim Cross
Re: Using COPY to import large xml file
On Mon, 25 Jun 2018 at 11:38, Anto Aravinth wrote: > > > On Mon, Jun 25, 2018 at 3:44 AM, Tim Cross wrote: > >> >> Anto Aravinth writes: >> >> > Thanks for the response. I'm not sure, how long does this tool takes for >> > the 70GB data. >> > >> > I used node to stream the xml files into inserts.. which was very slow.. >> > Actually the xml contains 40 million records, out of which 10Million >> took >> > around 2 hrs using nodejs. Hence, I thought will use COPY command, as >> > suggested on the internet. >> > >> > Definitely, will try the code and let you know.. But looks like it uses >> the >> > same INSERT, not copy.. interesting if it runs quick on my machine. >> > >> > On Sun, Jun 24, 2018 at 9:23 PM, Adrien Nayrat < >> adrien.nay...@anayrat.info> >> > wrote: >> > >> >> On 06/24/2018 05:25 PM, Anto Aravinth wrote: >> >> > Hello Everyone, >> >> > >> >> > I have downloaded the Stackoverflow posts xml (contains all SO >> questions >> >> till >> >> > date).. the file is around 70GB.. I wanna import the data in those >> xml >> >> to my >> >> > table.. is there a way to do so in postgres? >> >> > >> >> > >> >> > Thanks, >> >> > Anto. >> >> >> >> Hello Anto, >> >> >> >> I used this tool : >> >> https://github.com/Networks-Learning/stackexchange-dump-to-postgres >> >> >> >> If you are using nodejs, then you can easily use the pg-copy-streams >> module to insert the records into your database. I've been using this >> for inserting large numbers of records from NetCDF files. Takes between >> 40 to 50 minutes to insert 60 Million+ records and we are doing >> additional calculations on the values, not just inserting them, >> plus we are inserting into a database over the network and into a >> database which is >> also performing other processing. >> >> We found a significant speed improvement with COPY over blocks of insert >> transactions, which was faster than just individual inserts. The only >> downside with using COPY is that it either completely works or >> completely fails and when it fails, it can be tricky to work out which >> record is causing the failure. A benefit of using blocks of transactions >> is that you have more fine grained control, allowing you to recover from >> some errors or providing more specific detail regarding the cause of the >> error. >> > > Sure, let me try that.. I have a question here, COPY usually works when > you move data from files to your postgres instance, right? Now in node.js, > processing the whole file, can I use COPY > programmatically like COPY Stackoverflow ? > Because from doc: > > https://www.postgresql.org/docs/9.2/static/sql-copy.html > > I don't see its possible. May be I need to convert the files to copy > understandable first? > > Anto. > >> >> > Yes. Essentially what you do is create a stream and feed whatever information you want to copy into that stream. PG sees the data as if it was seeing each line in a file, so you push data onto the stream where each item is separated by a tab (or whatever). Here is the basic low level function I use (Don't know how the formatting will go!) 
// Assumes something like the following in scope (the original message did not
// include the requires): const { Readable } = require("stream");
// const copyFrom = require("pg-copy-streams").from; plus assert, VError and
// local getClient()/releaseClient() connection-pool helpers and a moduleName
// constant used in log messages.
async function copyInsert(sql, stringifyFN, records) {
  const logName = `${moduleName}.copyInsert`;
  var client;

  assert.ok(Array.isArray(records), "The records arg must be an array");
  assert.ok(typeof(stringifyFN) === "function", "The stringifyFN arg must be a function");
  return getClient()
    .then(c => {
      client = c;
      return new Promise(function(resolve, reject) {
        var stream, rs;
        var idx = 0;

        function done() {
          releaseClient(client);
          client = undefined;
          resolve(idx + 1);
        }

        function onError(err) {
          if (client !== undefined) {
            releaseClient(client);
          }
          reject(new VError(err, `${logName}: COPY failed at record ${idx}`));
        }

        // Pushes one stringified record into the readable stream per _read call,
        // then signals end-of-data by pushing null.
        function arrayRead() {
          if (idx === records.length) {
            rs.push(null);
          } else {
            let rec = records[idx];
            rs.push(stringifyFN(rec));
            idx += 1;
          }
        }

        rs = new Readable;
        rs._read = arrayRead;
        rs.on("error", onError);
        stream = client.query(copyFrom(sql));
        stream.on("error", onError);
        stream.on("end", done);
        // The original message was truncated at this point; the remainder is a
        // reconstruction of the obvious ending - pipe the records into the COPY
        // stream and surface any setup failure.
        rs.pipe(stream);
      });
    })
    .catch(err => {
      throw new VError(err, logName);
    });
}
Re: Using COPY to import large xml file
Anto Aravinth writes: > Thanks a lot. But I do got lot of challenges! Looks like SO data contains > lot of tabs within itself.. So tabs delimiter didn't work for me. I thought > I can give a special demiliter but looks like Postrgesql copy allow only > one character as delimiter :( > > Sad, I guess only way is to insert or do a through serialization of my data > into something that COPY can understand. > The COPY command has a number of options, including setting what is used as the delimiter - it doesn't have to be tab. You need to also look at the logs/output to see exactly why the copy fails. I'd recommend first pre-processing your input data to make sure it is 'clean' and all the fields actually match with whatever DDL you have used to define your db tables etc. I'd then select a small subset and try different parameters to the copy command until you get the right combination of data format and copy definition. It may take some effort to get the right combination, but the result is probably worth it given your data set size i.e. difference between hours and days. -- Tim Cross
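For illustration, a minimal sketch of the kind of COPY invocation described above; the posts table, column list and file path are hypothetical. CSV mode quotes embedded tabs and newlines, and the delimiter, quote and NULL markers can all be overridden:

  COPY posts (id, title, body, tags)
  FROM '/path/to/posts.csv'
  WITH (FORMAT csv, DELIMITER ',', QUOTE '"', ESCAPE '"', NULL '');

  -- Test the settings against a small sample of rows first, before
  -- running against the full 70GB extract.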
Re: FK v.s unique indexes
Rafal Pietrak writes: > > In particular, contrary to what the ERROR says, the target table *does > have* a "unique constraint matching given keys", admittedly only > partial. Yet, why should that matter at all? A unique index, partial or > not, always yield a single row, and that's all what matters for FK. Right? > Is that correct? I would have thought that if you have a multi-key unique index and you only provide values for some of the keys in the index, you would have no guarantee of a single row being returned. If this was true, then the additional keys are superfluous. Have you tried doing the same thing where the fk keys and remote unique index keys are equal in number? -- Tim Cross
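A small sketch (hypothetical tables) of why the error appears: a foreign key has to reference all the columns of a full, non-partial unique constraint.

  CREATE TABLE target (
      a integer NOT NULL,
      b integer NOT NULL,
      UNIQUE (a, b)                      -- full (non-partial) unique constraint
  );

  -- Works: the FK columns match the unique constraint exactly.
  CREATE TABLE ref_ok (
      a integer,
      b integer,
      FOREIGN KEY (a, b) REFERENCES target (a, b)
  );

  -- Fails with "there is no unique constraint matching given keys for
  -- referenced table": only part of the unique constraint is referenced,
  -- and partial unique indexes cannot be used as FK targets at all.
  CREATE TABLE ref_bad (
      a integer,
      FOREIGN KEY (a) REFERENCES target (a)
  );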
Re: Disabling/Enabling index before bulk loading
Ravi Krishna writes: > We recently did a test on COPY and found that on large tables (47 million > rows , 20GB of raw data) the > difference in COPY with 16 indexes and COPY without any index is 1:14. That > is, COPY is 14 times slower > when data is ingested with all indexes as opposed to COPY first without index > and then create all index. > This is very system dependent. On our system, when we tested a similar approach, we found that the time saved through dropping the indexes before copy was lost when rebuilding the indexes afterwards. In fact, it ended up being slightly slower. I suspect a lot depends on the number and types of indexes you have. Your example had a lot more indexes than ours. We were loading 22 batches with 2.5M records per batch. While copy was significantly faster than transaction based inserts (minutes vs hours), the differences between indexes and no indexes was measured in minutes. We only had 3 or 4 indexes. > I googled for earlier posting on this and it looks like this has been asked > before too. > > This is what I am thinking to do: > > 1 - Extract index definition and save it as a SQL somewhere, either a file or > a table. > 2 - Drop all indexes. > 3 - Ingest data via COPY > 4 - Recreate all indexes saved in (1). > > Is there a generic sql or script or tool to accomplish (1). > We are loading data via Javascript using pg and pg-copy-streams modules. It is pretty straight forward to drop the indexes and recreate them afterwards via sql, so we didn't look for a tool as such. As data is only inserted into this table and only by this process, we also turned off autovacuum for this table, performing vacuum and analyze manually after load. Tim -- Tim Cross
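For step (1), the index definitions can be pulled straight from the catalogs. A rough sketch of the whole cycle, with table, index and file names being hypothetical:

  -- 1. Save the definitions (write them to a file with \o or \copy if preferred).
  SELECT indexdef || ';' FROM pg_indexes
  WHERE  schemaname = 'public' AND tablename = 'big_table';

  -- 2. Drop the indexes and optionally pause autovacuum on the table.
  DROP INDEX big_table_col1_idx, big_table_col2_idx;
  ALTER TABLE big_table SET (autovacuum_enabled = false);

  -- 3. Bulk load.
  COPY big_table FROM '/path/to/data.csv' WITH (FORMAT csv);

  -- 4. Re-run the saved CREATE INDEX statements, then re-enable autovacuum
  --    and analyze manually.
  ALTER TABLE big_table SET (autovacuum_enabled = true);
  VACUUM ANALYZE big_table;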
Re: Open Source tool to deploy/promote PostgreSQL DDL
y of scripts over the constraints of a GUI. Given the high level of variability in environments, you are probably best off developing the process and scripts rather than trying to find an existing tool. Putting a web front end is likely easier than finding a tool flexible enough to fit with the environment which avoids situations where the tool begins to dictate how you operate (tail wagging the dog). Tim -- Tim Cross
Re: Do we need yet another IDE (SQL development assistant) for PostgreSQL?
Dmitry Igrishin writes: > Sun, 15 Jul 2018 at 22:42, Chuck Davis : > >> If you decide to proceed on this project there's no need to reinvent the >> wheel. >> >> I use Netbeans for my development. it has quite a good facility for >> working with databases and I use it regularly with Postgres. Since >> Netbeans is now licensed under Apache 2 you might find useful code >> there. Be assured it uses JDBC for access but JDBC is universally >> available and the folks at Postgresql have done quite a nice job with >> JDBC drivers. Of course, this already works on all platforms. The >> implementation is basic but very useful: i.e. a good starting point. >> > Thank you for the point. I'm the C++ programmer and I'm author of the > C++ client library for PostgreSQL - Pgfe and I'm going to use it in this > project. But I'm not sure about the cross-platform GUI toolkit. The cross-platform GUI toolkit will be the challenge. Your idea to make it integrate with user's preferred editor is a good idea as editors are like opinions and certain anatomical parts - everyone has one! Finding an appropriate API to do this will be a challenge. I seem to remember reading somewhere that Oracle was going to remove Swing from the core Java library. I've always been a little disappointed with Java UIs and found they don't give the cross-platform support that Java originally promised, plus OSX/macOS has not made Java as welcome as it used to be. If you do choose Java, it will need to work under OpenJDK as this is what most Linux users will have installed. Tim -- Tim Cross
Re: User documentation vs Official Docs
Joshua D. Drake writes: > -general. > > Over the last year as I have visited many meetups and interacted with > people at conferences etc... There are three prevailing issues that > continue to come up in contributing to the community. This email is > about one of them. Where is the "user" documentation? The official > documentation is awesome, if you know what you are doing. It is not > particularly useful for HOWTO style docs. There is some user > documentation in the wiki but let's be honest, writing a > blog/article/howto in a wiki is a pain in the butt. > > What does the community think about a community run, community > organized, sub project for USER documentation? This type of > documentation would be things like, "10 steps to configure replication", > "Dumb simple Postgres backups", "5 things to NEVER do with Postgres". I > imagine we would sort it by version (9.6/10.0 etc...) as well as break > it down via type (Administration, Tuning, Gotchas) etc... > > What do we think? > I think encouraging user developed docs is a great idea. However, I'm not sure how your proposal really addresses the issue. How would your proposal deal with the "but let's be honest, writing a blog/article/howto in a wiki is a pain in the butt" issue? Writing decent documentation or clear examples is hard and the only thing worse than no documentation is misleading or confusing documentation. My only real concern would be to further fracture the PG user base. If there are barriers preventing users from adding documentation to the existing documents or wiki, perhaps it would be better to try and address those first? Tim -- Tim Cross
Re: Do we need yet another IDE (SQL development assistant) for PostgreSQL?
Dmitry Igrishin writes: > Mon, 16 Jul 2018 at 1:14, Tim Cross : > >> >> Your idea to make it integrate with user's preferred editor is a good >> idea as editors are like opinions and certain anatomical parts - >> everyone has one! Finding an appropriate API to do this will be a >> challenge. >> > I see two options here: the core of the tool acts as a long-lived server or > as a short-lived > console application which communicates with the editor's plugin via > stdin/stdout. > Btw, what the text editor do you prefer? :-) > Most of the time, I use Emacs on either Linux or macOS. With the support it has for running a psql process, it works pretty well for most things. There are pretty reasonable packages for writing SQL and 'static' completion. Getting things set up can take a bit of effort, but once it is working, it tends to work pretty well. The two areas where it falls short are dynamic completion, i.e. completing on objects the user has created such as table names and column names/function names etc., and decent result formatting. >> >> I seem to remember reading somewhere that Oracle was going to remove >> Swing from the core Java library. I've always been a little disappointed >> with Java UIs and found they don't give the cross-platform support that >> Java originally promised, plus OSX/macOS has not made Java as welcome as >> it used to be. If you do choose Java, it will need to work under OpenJDK >> as this is what most Linux users will have installed. >> > For now, the possible options for the GUI part are Qt, wxWidgets or FLTK, > or even Electron. I would look at either Qt or even Electron (I believe Visual Studio Code is written using Electron, which is the other editor I use from time to time). There was an Emacs project called Eclaim (I think) which interfaced with Eclipse services in order to provide dynamic completion when doing Java. That could be worth checking out for ideas to borrow. Tim -- Tim Cross
Re: User documentation vs Official Docs
Peter Eisentraut writes: > On 17.07.18 02:13, Joshua D. Drake wrote: >> On 07/16/2018 05:08 PM, Alvaro Herrera wrote: >>> >>> Sounds like wiki pages could solve need this pretty conveniently. If >>> and when the content is mature enough and migrates to the tutorial main >>> documentation pages, the wiki pages can be replaced with redirects to >>> those. >> >> Anyone who writes a lot is going to rebel against using a wiki. They are >> one of the worst to write in from a productivity perspective. I would >> rather write in Docbook, at least then I can template everything and we >> could have a standard xsl sheet etc... > > I don't really buy that. The wiki seems just fine for writing short to > medium size how-to type articles. We already have good content of that > sort in the wiki right now. It's not like there isn't going to be > anyone who will rebel against any of the other tool chains that have > been mentioned. If using web widgets to author content on the wiki is the main impediment for contributing content, maybe we should see if the wiki provides alternative access methods. I've used wikis in the past which allowed users to upload content via xmlrpc, api etc. Perhaps something similar could be made available for those making significant contributions or to a select few 'curators' who could accept content from others. If it is the interface that is the problem, we should try to address that first rather than simply switching to something new which will have its own problems. However, I don't know if this is the case or not. Tim -- Tim Cross
Re: User documentation vs Official Docs
Peter J. Holzer writes: > On 2018-07-18 08:09:35 +1000, Tim Cross wrote: >> If using web widgets to author content on the wiki is the main >> impediment for contributing content, maybe we should see if the wiki >> provides alternative access methods. I've used wikis in the past which >> allowed users to upload content via xmlrpc, api etc. Perhaps something >> similar could be made available for those making significant >> contributions or to a select few 'curators' who could accept content >> from others. > > There are also browser plugins like It's all text, textern, wasavi, etc. > which allow the user to use a real text editor instead of a text area. > > hp +1. Should have remembered that option, given I have such a plugin myself and use it often! -- Tim Cross
Re: User documentation vs Official Docs
Our University provides access to a Linux server for any student (not just those in data science etc) or staff member and that computer has Postgres available for anyone who want to use it. The server is also accessible remotely (80% of our student base is remote/on-line). You also get a shell account and can install any software which can be installed and run from that account. At another University I do some work for, they have moved to a virtual environment, where students are able to spin up a virtual computer on demand and have full access to install whatever software they like (though there are some constraints on what can be setup to 'persist' across instances. You could install PG, but I'm not sure if it would be restored next time you spin up hyour virtual server). >From personal experience, I can say that when I was a student, a $60 book was very difficult to justify/afford and I greatly valued on-line resources at that time. I made extensive use of the library, but obtaining specific books was not as easy as asking for them - the library has limited space and can only maintain collections on a demand basis, so you were unlikely to get a book just based on request. A further aspect about on-line resources not yet mentioned is the accessibility aspect. As a blind programmer, I know the huge benefits of electronic resources compered to dead trees! Tim On Fri, 20 Jul 2018 at 11:03, Melvin Davidson wrote: > > > On Thu, Jul 19, 2018 at 8:54 PM, Adrian Klaver > wrote: > >> On 07/19/2018 05:43 PM, Melvin Davidson wrote: >> >>> >>> >>> >> >>> >>> > Then again people might use shared, university or library computers >>> Would you please be so kind as to inform us which university or library >>> allows users to install software on a _shared_ computer. >>> >> >> Pretty sure Ken was referring to looking up documentation, not running >> Postgres. >> >> >>> BTW, since you mention library, that is an excellent way to have the >>> books ordered and shared.>FOR FREE<. AFAIK, all that is required is for >>> someone to request the library purchase the book, to be used for shared >>> learning. >>> >>> >>> -- >>> *Melvin Davidson** >>> Maj. Database & Exploration Specialist** >>> Universe Exploration Command – UXC*** >>> Employment by invitation only! >>> >> >> >> -- >> Adrian Klaver >> adrian.kla...@aklaver.com >> > > > Pretty sure Ken was referring to looking up documentation, not running > Postgres. > That does not correlate. To have the need to look up documentation implies > that the user has a computer running PostgreSQL. > As universities DO NOT ALLOW software to be installed on shared computers, > and this is the case especially in a library, it implies > the user has their own computer. As libraries allow users/citizens to > request books be purchased >at no cost to the user/citizen, the > argument that someone cannot afford a book is now a moot point. > > -- > *Melvin Davidson* > *Maj. Database & Exploration Specialist* > *Universe Exploration Command – UXC* > Employment by invitation only! > -- regards, Tim -- Tim Cross
Re: How to revoke privileged from PostgreSQL's superuser
bejita0...@yahoo.co.jp writes: > Hello, > > I am a newbie DBA. > > I have a request for revoking the access to user's data from DBA-user. > I think the request is right because users should be the only ones can access > their data. > But DBA-user also need full access to the other data? It means that DBA-user > also needs to be a superuser. > > So I conclude the request that how to revoke privileged from superuser in > postgres. > > As my knowledge, the superuser in PostgreSQL bypasses all the permission > check. > So that, there is no way to do it in PostgreSQL, is that right? > > Is there some DBAs are faced with this before? > There are certainly DBAs who have had very similar requests. Often, they are generated by non-technical people who don't really understand how the technology works and have concerns over who has access to the data (a common one is people who are concerned about who has access to their email - we had a similar challenge from our Chief legal officer who was paranoid sys admins were reading his highly sensitive email, this is despite the fact 1 week previously, I was on a flight sitting in the seat behind him while he read his email on his iPad, which I could (but didn't) easily read over his shoulder!). The key to handling this sort of request is to dig deeper to understand what the real risk is that they want addressed and work out how you can do that within the constraints of the technology and what makes sense within your context. I'm sure someone will respond to this thread with all sorts of highly restrictive and powerful controls that will restrict access to the data, but if they are not appropriate for your business context, will likely cripple the very process you are trying to protect. All controls/restrictions cause some level of inconvenience - the challenge is in getting the balance right so that the identified risk is mitigated with the least level of inconvenience to normal business operations. The reality is that at various times, humans will need the ability to access the data in ways which will limit, if not completely prevent, your ability to restrict access. This is particularly relevant for system and database administrators. It is pretty much 100% impossible to guarantee that a sys admin or DBA cannot access data. However, what you can do is approach the problem slightly differently and look at ways to make this access harder and more importantly, make sure that all access is logged appropriately and can be audited, ensuring the logging/auditing system is also protected from deletion or modification. Other posts in the thread include some good pointers on what you can do to help with this. The principals are pretty straight forward. Possibly the most important thing to do is ensure there is no 'anonymous' access e.g. you cannot login to the database as 'postgres' or some other generic account which multiple people have access to. Instead, ensure that everyone with any level of administrator privilege has to login using an account which is specific to them and not shared. The second thing to do is ensure the logging level is appropriate and that all logging is also stored/recorded on a system which the administrator does not have access to and ensure the level of privileges every individual has is at the minimum they require to get the job done. 
It is also important that logs and audit trails are regularly reviewed to ensure nobody is abusing the system and all controls are still appropriate (things change, new systems come on line, old ones a retired, business processes change etc). If necessary, consider controls which restrict access to accounts with extended privileges to certain hosts e.g. DBA 'Phil' can only log into the database from server xxx.xxx.xxx.xxx and he can only log into that server between 9am and 5pm Mon - Fri etc. Maybe he has to use a hardware token etc. In most cases, provided you can give strong guarantee that unauthorised data access can be identified, you will satisfy the security requirements and this is often far more feasible than outright blocking of access. Finally, it is also important that all staff are aware of the organisations policies, procedures and controls regarding data access. They need to know what is expected of them and what is unacceptable. -- Tim Cross
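As a rough sketch of the logging side of this (the parameter values are only examples and need tuning to your audit requirements), with the resulting logs shipped to a host the DBAs cannot modify:

  ALTER SYSTEM SET log_connections = on;
  ALTER SYSTEM SET log_disconnections = on;
  ALTER SYSTEM SET log_statement = 'ddl';   -- or 'mod'/'all' depending on audit needs
  ALTER SYSTEM SET log_line_prefix = '%m [%p] user=%u db=%d app=%a ';
  SELECT pg_reload_conf();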
Re: Safe operations?
On Mon, 13 Aug 2018 at 11:24, Adrian Klaver wrote: > On 08/12/2018 05:41 PM, Samuel Williams wrote: > > I wish the documentation would include performance details, i.e. this > > operation is O(N) or O(1) relative to the number of rows. > > > > I found renaming a table was okay. > > > > How about renaming a column? Is it O(1) or proportional to the amount of > > data? > > > > Is there any documentation about this? > > https://www.postgresql.org/docs/10/static/sql-altertable.html > > "RENAME > > The RENAME forms change the name of a table (or an index, sequence, > view, materialized view, or foreign table), the name of an individual > column in a table, or the name of a constraint of the table. There is no > effect on the stored data. > " > > Just wondering - what about the case when the column being renamed is also > referenced in an index or check constraint? (I would guess you cannot > rename a column used in a check constraint without first removing it, but > for an index, would this result in the index being rebuilt, or do you have > to take care of that manually, or are such references abstracted such that > the column name "text" is irrelevant to the actual structure of the > index?). > > > -- > Adrian Klaver > adrian.kla...@aklaver.com > > -- regards, Tim -- Tim Cross
Re: Safe operations?
On Mon, 13 Aug 2018 at 12:23, Olivier Gautherot wrote: > On Sun, Aug 12, 2018 at 11:06 PM, Tim Cross wrote: > >> >> On Mon, 13 Aug 2018 at 11:24, Adrian Klaver >> wrote: >> >>> On 08/12/2018 05:41 PM, Samuel Williams wrote: >>> > I wish the documentation would include performance details, i.e. this >>> > operation is O(N) or O(1) relative to the number of rows. >>> > >>> > I found renaming a table was okay. >>> > >>> > How about renaming a column? Is it O(1) or proportional to the amount >>> of >>> > data? >>> > >>> > Is there any documentation about this? >>> >>> https://www.postgresql.org/docs/10/static/sql-altertable.html >>> >>> "RENAME >>> >>> The RENAME forms change the name of a table (or an index, sequence, >>> view, materialized view, or foreign table), the name of an individual >>> column in a table, or the name of a constraint of the table. There is no >>> effect on the stored data. >>> " >>> >>> Just wondering - what about the case when the column being renamed is >>> also referenced in an index or check constraint? (I would guess you cannot >>> rename a column used in a check constraint without first removing it, but >>> for an index, would this result in the index being rebuilt (or do you have >>> to take care of that manually or are such references abstracted such that >>> the column name "text" is irrelevant tot he actual structure of the >>> index?). >> >> > Tim, as far as I know, names are only an attribute tagged to an OID. > Internal relations are though these OIDs, not names, so renaming a column > is really one-shot. Names are mainly a more convenient way of referring to > objects. > > Olivier > thanks Olivier, that is what I suspected and your explanation fits with my mental model. I had assumed table/column names are convenience for humans and that the system would use OIDs etc for internal references. -- regards, Tim -- Tim Cross
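A quick way to see that behaviour for yourself (hypothetical table; the commented line shows roughly what pg_indexes reports):

  CREATE TABLE sensor (id integer, reading numeric);
  CREATE INDEX sensor_reading_idx ON sensor (reading);

  ALTER TABLE sensor RENAME COLUMN reading TO value;

  -- The index definition follows the column automatically, because the
  -- index references the column by number, not by name:
  SELECT indexdef FROM pg_indexes WHERE tablename = 'sensor';
  --  CREATE INDEX sensor_reading_idx ON public.sensor USING btree (value)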
Re: vPgSql
Vlad ABC writes: > On Fri, 2018-08-17 at 15:45 +0300, Dmitry Igrishin wrote: >> >> Looking nice! Thank you. But I unable to start it on Ubuntu, because >> there is no bash(1) in /usr/bin. > > Thank you, i'll fix it. > I think pretty much all *nix systems put core shells like sh, bash, zsh etc in /bin (as it is guaranteed to be available immediately at boot, while /usr is not - it could be a separate partition which isn't available until later in the boot process). A way to avoid platform differences is to use /usr/bin/env e.g. #!/usr/bin/env bash . >> Also, it is open source? > > No, it is freeware oh well, too bad. -- Tim Cross
Re: AW: CTE with JOIN of two tables is much faster than a regular query
kpi6...@gmail.com writes: >> -Ursprüngliche Nachricht- >> Von: Ravi Krishna >> Gesendet: Samstag, 18. August 2018 18:25 >> >> > What can I do to improve the performance of the regular query without >> using a CTE? >> >> Why do you care ? When I find that I can write a SQL 3 different ways, I >> will >> go for the most efficient one. So why not accept the CTE version of this >> SQL. >> Just curious. > > We're using object mapping / entity frameworks (e.g. XPO, Entity Framework > Core). These frameworks support regular queries out-of-the box; a CTEs > require additional effort and are more difficult to maintain. > Ah, another reason to avoid object mapping/entity frameworks! I guess really the same reason - loss of flexibility and expressive power. Sorry, having a similar battle with some developers who are insisting on using a particular framework because it makes maintenance easier as it 'automates' creation of controllers (MVC). However, they are frustrated by performance and I'm frustrated as the framework also fails to pass additional information, such as PGAPPNAME, which would make some analysis easier. Part of the reason for the performance issues is because the developers are doing things with result sets within the client that would be far more efficient performed within the database. One way I have resolved this in the past is to create database procedures which present a 'mapped' view back to the framework layer which hides the SQL from the framework. Works well, with the only main downside being you now have SQL in a different (another) place, which can make some people uncomfortable and can be a maintenance issue if all your developers are just front-end devs who treat a database as just a key/value repository. . Tim -- Tim Cross
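For illustration, one way the 'mapped view' approach can look. This is only a sketch with made-up table names: the ORM maps order_summary like any other read-only entity, while the CTE stays inside the database.

  CREATE VIEW order_summary AS
  WITH totals AS (
      SELECT order_id, sum(quantity * unit_price) AS order_total
      FROM   order_line
      GROUP  BY order_id
  )
  SELECT o.id, o.customer_id, o.placed_at, t.order_total
  FROM   orders o
  JOIN   totals t ON t.order_id = o.id;

The same idea works with set-returning functions when parameters are needed.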
Re: Can Pg somehow recognize/honor linux groups to control user access ?
Dimitri Maziuk writes: > On 08/22/2018 11:38 AM, Ravi Krishna wrote: >> >> In fact DBAs don't even need to get involved when a new user needs DB >> access. Sysadmin had to just add that user in a group and we are done. > > How is that different from giving your grants to a database role and > just telling the new user the name and password of that role to connect as? I hope you're not suggesting that multiple users use the same login credentials to access the database? This would totally destroy the important security principles of attribution and non-repudiation. I have not looked at PAM for PG, but that is certainly something worth checking out. I have used it in other contexts and it is very powerful. Depending on the implementation, theoretically, it should be possible to have PAM provide the information to determine what database roles to give to a login. What you will probably need is some additional component to act as the 'meta-directory' so that you can have consistent usernames, uid/gid across servers/systems (i.e. LDAP/AD). Something else to look at is openLDAP. Again, not yet looked at it in the PG context, but have used it successfully to manage access to other systems (including databases in other RDBMS) before. Currently, we use openLDAP and PAM to manage access on Linux servers. One of the things on my 'todo' list is to look at it in a PG context, just haven't got there yet. Avoid any solution which requires a 'generic' account with shared passwords. Apart from the security implications, you will almost certainly run into problems with auditors and many regulatory standards. -- Tim Cross
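To make the 'no shared credentials' point concrete, a minimal sketch (role, schema and database names are hypothetical; authentication itself would be handled by PAM or LDAP entries in pg_hba.conf):

  -- A NOLOGIN group role carries the privileges...
  CREATE ROLE chip_design NOLOGIN;
  GRANT CONNECT ON DATABASE proddb TO chip_design;
  GRANT USAGE ON SCHEMA public TO chip_design;
  GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA public TO chip_design;

  -- ...and each person gets their own login role, so every action is attributable.
  CREATE ROLE alice LOGIN IN ROLE chip_design;
  CREATE ROLE bob   LOGIN IN ROLE chip_design;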
Re: Can Pg somehow recognize/honor linux groups to control user access ?
Joshua D. Drake writes: > On 08/22/2018 08:56 AM, David Gauthier wrote: >> Hi: >> >> The title says it all. I need to be control who can gain access to a >> DB based on a linux user group. I can set up a generic role and >> password, but also want to prevent users who are not in a specific >> linux group from accessing the DB. For code that works with the DB, >> this is easy (just chmod the group on the code file(s)). But is there >> a way to add an additional gauntlet that checks membership in the >> linux group if, for example, they were trying to get in using psql at >> the linux prompt ? >> >> There are a couple hundred in the linux group and the list of names >> changes constantly. I suppose creating a DB role per user in the linux >> group may be possible if something like a cron was maintaining this >> (creating/dropping uid based roles as the group membership changes) >> then give everyone the same password. But does that prevent someone >> outside the linux group from just logging in with someone else's uid >> and the generic password? >> I'm hoping that this is a common need and that someone has a good >> solution. >> >> Thanks in Advance for any help! > > You could probably write a pam module to do it but it seems to be your > are inverting the problem and should be looking at this from a Postgres > not Linux perspective. Perhaps consider using an SSO solution for both > Linux and Postgres. > I think this is more an identity management problem rather than SSO (SSO can be considered a sub-topic within identity management). In this case, you are probably looking for some mix of 'single sign on' and 'same sign on', depending on business requirements. This is a large and complex problem which needs a 'whole of business' approach. Essentially, you need a single directory which is a 'source of truth' for all accounts. It manages usernames, user and group IDs, passwords and attributes representing what services and resources each individual is entitled to access. Downstream systems, such as PG, use the information in this central repository to manage authentication (username/password) and authorisation (which services and what roles within those services). A very common architecture is to have synchronised openLDAP and Active Directory as the centralised directory (source of truth) and some application which sits on top to manage the data in the directory servers (MS FIM/MIM, Oracle IDM, Dell Quest etc). How complex and large this ends up being really depends on the particulars of the environment i.e. size, mix of platforms, etc. I would start with either openLDAP or Active Directory (depending on environment) and then PAM if more fine grained control is required that cannot be satisfied via ldap/ad. Tim -- Tim Cross
Re: unorthodox use of PG for a customer
David Gauthier writes: > Hi Everyone: > > I'm going to throw this internal customer request out for ideas, even > though I think it's a bit crazy. I'm on the brink of telling him it's > impractical and/or inadvisable. But maybe someone has a solution. > > He's writing a script/program that runs on a workstation and needs to write > data to a DB. This process also sends work to a batch system on a server > farm external to the workstation that will create multiple, parallel > jobs/processes that also have to write to the DB as well. The workstation > may have many of these jobs running at the same time. And there are 58 > workstation which all have/use locally mounted disks for this work. > > At first blush, this is easy. Just create a DB on a server and have all > those clients work with it. But he's also adamant about having the DB on > the same server(s) that ran the script AND on the locally mounted disk. He > said he doesn't want the overhead, dependencies and worries of anything > like an external DB with a DBA, etc... . He also wants this to be fast. > I would agree the customers proposed architecture has problems. It is likely to be fragile and difficult to maintain. At some point, there willl likely be a need to consolidate the data in all these separate databases, which could lead to other problems. It sounds like there is some previous experience which has caused problems for your customer and they are trying to avoid a repeat by defining the technical solution rather than asking for a solution. The first step is to spend more time talking to your customer and getting to understand the underlying reasons why he is proposing those technical/architecture constraints. I think once you have full details regarding his requirements and the risks as he perceives them, you will likely be able to come up with a much more workable solution which will both address his/her concerns and be an architecture which is solid and maintainable. There is a good chance all the reasons will not be purely technical. My wild guess is that previously, there has been problems with central IT services - probably bureaucracy and poor response times or communication or there was misalignment with regards to expectations. I often encounter this type of problem working with researchers who are very smart and informed in their local area of expertise, but less so when it comes to understanding the complexities, maintenance overheads and other activities associated with providing reliable services (backups, upgrades, tuning etc). The two areas (IT and customer) frequently speak different languages even when using the same words. It can be extremely challenging to get clear, consistent and agreed requirements. For example, what does 'fast' mean? The 'fast' requirement and the desire to have things run locally could indicate a concern regarding network performance. I find performance is often blamed on the network, but this is less often the case in modern networks. More often than not it is poor algorithm design, badly tuned databases or badly designed database schemas and CPU or memory limits. Pointing out the maintenance overheads and possible failure points in his proposed architecture may help. Being able to collect real metrics to demonstrate where bottlenecks and performance issues reside will also help going forward - good metrics and hard numbers can often circumvent circular arguments regarding problem causes. Also consider your internal business processes. 
I've seen too many good architecture solutions becoming perceived as failures because the other non-technical stuff failed - poor communications, failure to align technical maintenance with business requirements, unnecessary delays or poor responses to enquiries and questions and inability to adapt to changing business requirements in a timely manner. This is often the most frustrating part - you can be an excellent technical person able to define and implement really good technical solutions, but if the customer is unable to use the solution effectively, it will be seen as a technical failure. Tim -- Tim Cross
Re: using a plpgsql function argument as a table column.
Off the top of my head, I think you could do this using dynamic (execute) SQL in a function. However, it is going to be messy, possibly slow and likely fragile. You would need to query the catalogue to get the column names in the table and then build the SQL dynamically 'on the fly'. Without having more detail, my spider sense tells me you have the wrong table/relationship design. While you may be able to get it to work, it is likely you will run into constant problems and additional complexity that could be avoided with a different design. You really want a design where your queries are driven by the data in your tables and not by the names of columns. I would seriously consider re-examining your schema design, look at how your design fits in with the normal forms and adapt as necessary. Tim On Wed, 29 Aug 2018 at 15:10, ss wrote: > I have a table with many years as columns. y1976, y2077, .. , > y2019,y2020 I want to dynamically return a column from a function. > > > select * from FUNCTION('y2016') . > > select t1.cola t1.colb, t1.colc, t2.y2016 from . Where t2.y2016 != 0; > > or if I select year y2012 I want FUNCTION('y2012') > > select t1.cola t1.colb, t1.colc, t2.y2012 from . Where t2.y2012 != 0; > > > to generalize > > select * from FUNCTION( year_column ) > > select t1.cola t1.colb, t1.colc, t2.year_column from . Where > t2.year_column != 0; > > is it possible? if so how? > > > -- regards, Tim -- Tim Cross
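If you do go down the dynamic SQL path anyway, the sketch looks something like this; t2 and the yNNNN column names are taken from the question, and the %I placeholder in format() safely quotes the identifier:

  CREATE OR REPLACE FUNCTION rows_for_year(year_col text)
  RETURNS SETOF t2 AS $$
  BEGIN
      RETURN QUERY EXECUTE format('SELECT * FROM t2 WHERE %I <> 0', year_col);
  END;
  $$ LANGUAGE plpgsql;

  -- SELECT * FROM rows_for_year('y2016');

But a normalised design with a year column removes the need for this entirely.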
Re: Ways to deal with large amount of columns;
a <372660...@qq.com> writes: > Hi all: > > > I need to make a table contains projected monthly cashflow for multiple > agents (10,000 around). > > > Therefore, the column number would be 1000+. > > Not sure your data model is correct. Typically, with something like this, increasing the number of agents would result in tables with more rows rather than more columns. Tables with large numbers of columns are often a 'code smell' and indicate the underlying data model needs to be reviewed. Designs which result in new columns being required because you are adding new data sources are almost certainly an indication of the need to review the data model. Postgres (and many other databases) have lots of functionality to help deal with tables that have large numbers of rows, but there is much less functionality and fewer options for working with tables that have large numbers of columns. While it is very tempting to create a table and then start coding, you will almost always get a much better result and simpler code if you spend some initial time to really analyse your domain, understand the data elements and how they relate to each other, map them out into a data model and then start development. Have a look at https://en.wikipedia.org/wiki/Database_normalization for some background on the normal forms and why they are useful. HTH Tim -- Tim Cross
Re: Ways to deal with large amount of columns;
On Fri, 31 Aug 2018 at 10:47, a <372660...@qq.com> wrote: > Thank you very much. Creating a function seems to be a good idea :) > > > -- Original message -- > *From:* "David G. Johnston"; > *Sendtime:* Thursday, Aug 30, 2018 8:31 PM > *To:* "a"<372660...@qq.com>; > *Cc:* "pgsql-general"; > *Subject:* Re: Ways to deal with large amount of columns; > > On Thursday, August 30, 2018, a <372660...@qq.com> wrote: > >> Hi all: >> >> I need to make a table contains projected monthly cashflow for multiple >> agents (10,000 around). >> >> Therefore, the column number would be 1000+. >> >> I would need to perform simple aggregate function such as count, sum or >> average on each cashflow projected. >> >> So if there is anyway of doing this? Will there be anything like define a >> macro in C that I can manipulate multiple columns by simple word that >> representing them. >> > > Better to design a data model that doesn't have so many columns. Otherwise > generating dynamic SQL via the for,at function and loops and such is your > best bet. Can be down in pl/pgsql or in your preferred programming > language. Psql variables can maybe be useful too. > > David J. > > Perhaps post your proposed table design/definition. There is nothing in what you have described so far which would indicate a necessity to have more columns as you increase the number of agents. It would be normal to have something like | agent_id | year | cash_on_hand | bank | creditors | debtors | and queries like select sum(cash_on_hand) from table where agent_id = 'agent1' and year = 2018; to get the sum of cash on hand for agent1 in 2018, instead of something like | agent1_cash2018 | agent2_cash2017 | which will not work well. Tim -- regards, Tim -- Tim Cross
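As a rough sketch of what that normalised layout could look like in DDL (column names are only illustrative):

  CREATE TABLE cashflow (
      agent_id integer NOT NULL,
      month    date    NOT NULL,   -- e.g. first day of the projected month
      amount   numeric NOT NULL,
      PRIMARY KEY (agent_id, month)
  );

  -- Aggregates are then ordinary row-wise queries, no 1000+ columns needed:
  SELECT agent_id, sum(amount)
  FROM   cashflow
  WHERE  month BETWEEN '2018-01-01' AND '2018-12-31'
  GROUP  BY agent_id;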
Re: WARNING: could not flush dirty data: Function not implemented
Tom Lane writes: > Andres Freund writes: >> On 2018-09-02 19:29:49 -0400, Tom Lane wrote: >>> If this is on Ubuntu, I don't understand why you're talking >>> about Windows. > >> The OP said "Ubuntu 18.04 as Windows bash" - so I assume this is >> postgres compiled as a linux binary is running on MS's new-ish linux >> emulation. > > Whee ... so you get to cope with all the bugs/idiosyncrasies of three > operating system layers, not just one. That comment has made my day - thanks Tom! -- Tim Cross
Re: WARNING: could not flush dirty data: Function not implemented
Tom Lane writes: > Ravi Krishna writes: >>> Whee ... so you get to cope with all the bugs/idiosyncrasies of three >>> operating system layers, not just one. I concur that running Postgres >>> in the underlying Windows O/S is probably a much better idea. > >> Me too, but this is purely for learning and I am much more use to Linux >> stack then ... gasp Windows :-) > > Hmm, so maybe you should install Ubuntu as the native O/S, and when > you need Windows, run it inside a VM? > > regards, tom lane This is what I do and it works well, with one exception: if you don't run the Windows VM very often (like me), when you do, start it before lunch or that next long meeting. The updates will grind things to a crawl. When you run it every day or fairly frequently, you don't notice them, but if you only run once every 4+ weeks, it can have a big impact and take ages. Linux as the host and using the VM is still better than the weird idiosyncrasies of Windows as the main workstation client (IMO). Where office policy has insisted on Windows as the workstation, I have had reasonable success with running virtualbox with Linux, though these days, the Windows environment is often too locked down to allow this. I've not yet experimented with the virtual Linux layer in w10. -- Tim Cross
Re: Converting to number with given format
Gabriel Furstenheim Milerud writes: > Sorry, > So basically what I'm trying to achieve is the following. There is an input > file from the user and a configuration describing what is being inserted. > For example, he might have Last activity which is '-MM-DD HH:mi:ss' and > Join date which is only '-MM-DD' because there is no associated timing. > For dates this works perfectly and it is possible to configure what the > input from the user will be. Think it is as a dropdown where the user says, > this is the kind of data that I have. > > Maybe that is not possible with numbers? To say in a format something like > "my numbers have comma as decimal separator and no thousands separators" or > "my numbers are point separated and have comma as thousands separator" > > Nice thing of having a string for the format is that I can use it as a > parameter for a prepared statement. > I think this is normally something much better dealt with at the client level. Things like comma separator/grouping in numbers are really just a 'human' thing and are very locale dependent. The values 9,999 and 9999 are the same value. Things can quickly become complicated as you can have locale information at both the server and client end and they may not be the same. As you should always be sanitising your data before inserting into the database anyway, you may as well just add this as another check at the client end. Tim -- Tim Cross
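If the cleanup does end up happening in SQL rather than in the client, a simple, hedged example of normalising a comma-decimal value before casting (the literal is only an illustration):

  -- '1.234,56' (dot as thousands separator, comma as decimal) becomes 1234.56
  SELECT replace(replace('1.234,56', '.', ''), ',', '.')::numeric;

to_number() with the G and D format patterns is another option, but its behaviour depends on the lc_numeric locale setting, which is exactly the server/client mismatch mentioned above.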
Re: Weird procedure question
digimer writes: > Hi all, > > I've got an interesting use case that I am stuck on. It's a bit of a > complicated environment, but I'll try to keep it simple. > > In short; I have a history schema that has tables that match the > public schema, plus one 'history_id' column that has a simple sequential > bigserial value. Then I have a procedure and trigger that, on UPDATE or > INSERT, copies the data to history. Example use case is that I can > UPDATE a sensor value in the public table and it's also INSERTs the data > into history. So public shows just the most recent values, but I can see > changes over time in the history schema. > > I have built my system to support writing to one or more DBs. I keep > a list of connected DBs and send INSERT/UPDATE calls to a method that > then runs the UPDATE/INSERT against all connected databases, as a form > of redundancy. This all works fine. > > The problem I've hit is that the 'history_id' differs between the > various databases. So I want to switch this to 'history_uuid' and use > UUIDs instead of bigserial. > > Now the question; > > Can I tell a produce to use a specific UUID? > > The idea is to generate a UUID for 'history_uuid' so that I have > consistency across databases. Of course, if an UPDATE will change > multiple rows, then I'll need to predefine multiple UUIDs. This is where > things start to get really complicated I think... Maybe I could pass an > array of UUIDs? I don't care if I find out which UUID was used for which > record, just that the same UUID was used for the same record when the > procedure is (re)run on other DBs. > > The databases are not clustered, on purpose. I've been trying to > handle all the HA stuff in my application for various reasons. > > If it helps, here is an example pair of tables, the procedure and the > trigger I currently use; > > > CREATE TABLE host_variable ( > host_variable_uuid uuid not null primary key, > host_variable_host_uuid uuid not null, > host_variable_name text not null, > host_variable_value text not null, > modified_date timestamp with time zone not null > ); > ALTER TABLE host_variable OWNER TO admin; > > CREATE TABLE history.host_variable ( > history_id bigserial, > host_variable_uuid uuid, > host_variable_host_uuid uuid, > host_variable_name text, > host_variable_value text, > modified_date timestamp with time zone not null > ); > ALTER TABLE history.host_variable OWNER TO admin; > > CREATE FUNCTION history_host_variable() RETURNS trigger > AS $$ > DECLARE > history_host_variable RECORD; > BEGIN > SELECT INTO history_host_variable * FROM host_variable WHERE > host_uuid = new.host_uuid; > INSERT INTO history.host_variable > (host_variable_uuid, > host_variable_host_uuid, > host_variable_name, > host_variable_value, > modified_date) > VALUES > (history_host_variable.host_variable_uuid, > history_host_variable.host_variable_host_uuid, > history_host_variable.host_variable_name, > history_host_variable.host_variable_value, > history_host_variable.modified_date); > RETURN NULL; > END; > $$ > LANGUAGE plpgsql; > ALTER FUNCTION history_host_variable() OWNER TO admin; > > CREATE TRIGGER trigger_host_variable > AFTER INSERT OR UPDATE ON host_variable > FOR EACH ROW EXECUTE PROCEDURE history_host_variable(); > > > I know this might sound odd, but I didn't want to complicate things > with how my system works. However, if it would help solve the problem, > I'm happy to dig into more detail. > > Thanks! 
I think James has probably given you the input you need - basically, don't allow the system to automatically set the modified time - make that a parameter to your function or set that value before the copy to the history tables - content would then be the same, so uuid v3 should work. However, I do think you have another big problem lurking in the shadows. What happens if any of your connected databases are unavailable or unreachable for a period of time? I suspect you're going to run into update anomalies and depending on your setup/environment, possibly even partitioning problems (depending on number of clients and topology etc). These are well-known problems in distributed or replication systems. You appear to be implementing a 'poor man's' replication system. There are lots of complex issues to deal with and I wonder why you want to take them on when PG has already got well-tested and robust solutions for this that would simplify your architecture and avoid the need to re-implement functionality which already exists? regards, Tim -- Tim Cross
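For the deterministic UUID part, a sketch using the uuid-ossp extension; which columns you feed into the name string is a design choice, the ones below are simply taken from the posted table:

  CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

  -- Same namespace + same input text always yields the same UUID, so every
  -- connected database derives an identical history_uuid for the same record.
  SELECT uuid_generate_v3(
             uuid_ns_url(),
             host_variable_uuid::text || host_variable_value || modified_date::text
         ) AS history_uuid
  FROM host_variable;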
Re: Detecting which columns a query will modify in a function called by a trigger
stan writes: > On Mon, Mar 02, 2020 at 11:02:54AM -0800, Adrian Klaver wrote: >> On 3/2/20 10:59 AM, stan wrote: >> > I need to implement a fairly fine grained security model. Probably a bit >> > finer that I can do with the standard ownership functionality. >> > >> > My thinking on this is to create a table that contains the users, and a >> > "permission bit" for each function that they may want to do, vis a vi >> > altering an existing row,or rows, or inserting new rows. >> > >> > Looks relatively straight forward, if fairly time consuming to do. But I >> > would need to know which column(s) a given query would add..alter from the >> > function to implement this via a trigger. looks like I see most of what I >> > need t do this in the docs, but I can't quite figure out if I can get this >> > down to what column(s) a given trigger will modify. Is this possible? >> >> Before you get too far into this I would look at RLS: >> >> https://www.postgresql.org/docs/12/ddl-rowsecurity.html >> > Thanks for pointing that out. > > Using that functionality was my original plan, but let me describe why I do > not think it > can do what I need. This may be an indication of my weakness in design > though. > > Envision a table with a good many columns. This table represents the "life > history" of a part on a project. Some of the columns need to be > created/modified by the engineer. Some need to be created/modified by the > purchasing agent, some of the columns need to be created by the receiving > department, some of the columns need to be created/modified by the accounts > payable department. > > Make sense? When you speak of columns needing to be created/modified, do you really mean columns or rows? It would be a very unusual approach to allow multiple different 'agencies' to create/modify underlying table design. If this is the case, then you are in an impossible position and have no hope of implementing anything that will be maintainable and you will never be able to manage security. I'm hoping you mean different agencies which need to add/modify rows within the tables? -- Tim Cross
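If it does turn out to be columns (in the sense of different departments filling in different fields of the same row), plain column-level privileges may already cover much of this without a trigger-based permission table. A hedged sketch with hypothetical column and role names:

  GRANT SELECT ON part_history TO engineer, purchasing, receiving;
  GRANT UPDATE (design_spec, design_rev)     ON part_history TO engineer;
  GRANT UPDATE (po_number, unit_cost)        ON part_history TO purchasing;
  GRANT UPDATE (received_date, qty_received) ON part_history TO receiving;

RLS then handles the 'which rows' side if that is also needed.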
Re: Improve COPY performance into table with indexes.
times, multiple but shorter performance impacts can be more acceptable than one long one). - Make sure you have good logging enabled and check for things like overly frequent writing of WALs. This can have significant impact on performance. If your rows are large, you may be adversely impacting performance writing the WAL cache etc. -- Tim Cross
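A small, hedged example of the kind of logging that helps spot WAL/checkpoint pressure during a bulk load (the values are only starting points):

  ALTER SYSTEM SET log_checkpoints = on;
  ALTER SYSTEM SET checkpoint_warning = '30s';
  SELECT pg_reload_conf();
  -- If the log then complains that checkpoints are occurring too frequently,
  -- raising max_wal_size for the duration of the load is a common response.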
Re: Estimated resources for a 500 connections instance (VM)
David Gauthier writes: > After looking at some of the factors that can affect this, I think it may > be important to know that most of the connections will be almost idle (in > terms of interacting with the DB). The "users" are perl/dbi scripts which > connect to the DB and spend the vast majority of the time doing things > other than interacting with the DB. So a connection is consumed, but it's > not really working very hard with the DB per-se. I am cleaning up some of > that code by strategically connecting/disconnecting only when a DB > interaction is required. But for my edification, is it roughly true that 2 > connections working with the DB 100% of the time is equivalent to 20 > connections @ 10% = 200 connections @ 1 % (if you know what I mean) ? Based on that additional info, I would definitely follow Laurenz's suggestion. Long time since I used Perl DBI, but I'm pretty sure there is is support for connection pools or you can use one of the PG connection pooling solutions. There is a fixed memory allocation per connection, so 2 connections at 100% is not the same as 20 connections @ 10%. Using a connection pool is usually the first thing I will setup. If additional connections are still required, then I would increase the limit in small jumps - definitely would not go from 100 to 500. BTW running PG on a virtual is not an issue in itself - this is very common these days. However, I would ensure you are up-to-date wrt latest minor release for that version and would use clients with the same version as the master. -- Tim Cross
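A quick way to see how many of those connections are actually doing work at any moment (run as a suitably privileged user):

  SELECT state, count(*)
  FROM   pg_stat_activity
  WHERE  datname = current_database()
  GROUP  BY state;

Mostly 'idle' rows are a strong hint that a connection pooler will serve you better than a higher max_connections.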
Re: timestamp and timestamptz
Niels Jespersen writes: > Hello all > > > > We have some data that have entered a timestamp column from a csv. The data > in the csv are in utc. We want to access the data in > our native timezone (CET). > > > > I am considering a few alternatives: > > > > 1. Early in the process, convert to timestamptz and keep this datatype. > > 2. Early in the process, convert to timestamp as understood in CET. > This will imply by convention that the data in the timestamp > column represents CET. Users will need to be told that data represents CET, > even if data is somwhere in the future kept in another > country in another timezone. > > > > I probably should choose 1 over 2. But I am a bit hesitant, probably because > we almost never have used timestamptz. > > > > Can we agree that the below query is selecting both the original utc > timestamps and 2 and 1 (as decribed above)? > > > > set timezone to 'cet'; > > select read_time read_time_utc, (read_time at time zone 'utc')::timestamp > read_time_cet, (read_time at time zone 'utc')::timestamptz > read_time_tz from t limit 10; > > > > We are on Postgres 12. > > > Keep your life simple - just go with option 1. Keep all timestamps in UTC and let clients deal with it in whatever way they need to. This will also help deal with issues associated with daylight savings time (DST can be a real pain as different locations have it and others don't and the 'switchover' date is subject to political whims and can change). Your option 2 will cause all sorts of issues and keep in mind that most larger countries have multiple timezones, so even if your data is all associated with a single country, you can potentially have multiple conversion routines required. On most *nix systems, clock time is UTC as well, so having everything in UTC really helps when you want to do diagnosis across your database and system log files etc. -- Tim Cross
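For illustration, a minimal sketch of option 1; the table name and file path are hypothetical:

  CREATE TABLE readings (read_time timestamptz, val numeric);

  -- The CSV values are UTC, so load with the session timezone set to UTC:
  SET timezone = 'UTC';
  COPY readings FROM '/path/to/readings.csv' WITH (FORMAT csv);

  -- CET clients just set their own timezone; the stored values are untouched:
  SET timezone = 'CET';
  SELECT read_time FROM readings LIMIT 1;                      -- rendered in CET
  SELECT read_time AT TIME ZONE 'UTC' FROM readings LIMIT 1;   -- rendered as UTC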
Re: Using unlogged tables for web sessions
Stephen Carboni writes: > Hello. > > I was wondering if anyone was using unlogged tables for website > sessions in production. I'm interested if it breaks the prevailing > opinion that you don't put sessions in PG. This really depends on what you define as website session data and what benefit you would see compared to the additional overhead of maintaining this session information remotely (from the client). Depending on your application, there is often some session information which makes more sense stored on the back end server rather than in the client - notably, data used by your server API to modify responses or possibly encrypted data to handle 'remember me' type functionality. However, you probably don't want to store session data used by the client API e.g. browser Javascript as this would introduce additional network overheads, latency, load on web and db server, increased web and db server API complexity and possibly additional data privacy/security concerns you will need to manage. This can be hard to justify when you have good client data storage facilities available. I have not come across a use case where it made sense to store ALL session data remotely in the database. I have seen situations with a very specialised application where having a more full featured LOCAL (to the client) database server to record session information can be useful, but this is rare. -- Tim Cross
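If server-side session state is genuinely needed, the unlogged table itself is straightforward. A sketch with hypothetical columns; just remember that unlogged tables are truncated after a crash and are not replicated, which is usually acceptable for sessions:

  CREATE UNLOGGED TABLE web_session (
      session_id text PRIMARY KEY,
      user_id    integer,
      data       jsonb,
      expires_at timestamptz NOT NULL
  );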
Re: Unable to connect to the database: TypeError: net.Socket is not a constructor
uestions/40599069/node-js-net-socket-is-not-a-constructor > that > "there are no plain TCP sockets in the browser, so that is why trying to > use `net.Socket` in the browser (via webpack, browserify, etc.) won't work" > > Environment Info: > > System: > OS: Linux 5.3 Ubuntu 18.04.4 LTS (Bionic Beaver) > CPU: (8) x64 Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz > Binaries: > Node: 12.15.0 - ~/.nvm/versions/node/v12.15.0/bin/node > Yarn: 1.22.4 - ~/.nvm/versions/node/v12.15.0/bin/yarn > npm: 6.14.4 - ~/.nvm/versions/node/v12.15.0/bin/npm > Browsers: > Chrome: 81.0.4044.92 > Firefox: 75.0 > npmGlobalPackages: > @vue/cli: 4.2.3 > > So... how to solve the problem? > Looking forward to your kind help. > Marco This has nothing to do with postgres. This is a purely JavaScript issue arising from differences between the browser JavaScript API and the Node.js API. Bottom line - if you're trying to connect directly to a PG database from within JavaScript code running inside the browser, it won't work. The browser does not have the 'net' library. You can only do this from within node.js. Likewise, references to express are also red herrings - you don't have a web server inside the browser either. If you're not trying to develop inside the browser but are in fact developing at the node.js level, then you don't need webpack. I think what you really need to do is step back and look more closely at your architecture. Typically, you would put all your database interaction stuff in the web server using node.js. A common design pattern would be to use one of the node.js web servers, like express (but there are others) and have something like nginx as a proxy server in front of it. You would then wrap your database interaction in a node.js API which you would then call from your client browser using http/https or web sockets. -- Tim Cross
Re: Lock Postgres account after X number of failed logins?
Wolff, Ken L writes: > As Stephen states, even some basic functionality in this regard would go a > long way. Perhaps something could be built into the postgresql-contrib RPM? > Right now the only way I see is to write a hook, which involves changing > source code, which then puts us into the situation of (1) maintaining our own > code tree and (2) figuring out how to produce a new set of RPMs. > > I realize Postgres is a community project and that there are a great number > of other valuable feature/enhancement requests in the queue. Just adding my > $.02 here. > The problem here is that everyone has valid points. Tom is quite correct that this sort of security policy really needs to be implemented in a single central location, such as LDAP, AD or some other IAM middleware. Having security policies implemented separately in different systems is where failures creep in and why maintenance becomes a problem. Where Tom's solution fails is with smaller companies that cannot afford this level of infrastructure. They can still fall victim to the same level of regulatory bureaucracy, but without the necessary level of technical resources of larger organisations. For these organisations, basic facilities, like the ability to lock an account after a certain number of failed login attempts for a period of time is a very useful feature. My suggestion would be to develop the basic requirements and contribute the result to Postgres. This would give back to the community and eliminate the need to maintain separate code in the long-term. The cost of paying for extra resources to do this development and maintenance is still going to be less than the licensing costs for that commercial competitor. Just requesting the facility is unlikely to result in any acceptable outcome within any reasonable time frame. If your security people are really on top of their game, they will be providing you with a security architecture which fulfils the enterprise architecture requirements and which centralises IAM management. This is really the only truly secure solution which guarantees access is removed from all system in a timely manner, enables effective logging and auditing of access, ensures consistent application of security policy and allows consistent response to security incidents and events. While requiring additional resources to establish, it does tend to result in reduced maintenance costs in the longer term. -- Tim Cross
Re: Lock Postgres account after X number of failed logins?
Geoff Winkless writes: > On Wed, 6 May 2020 at 00:05, Tim Cross wrote: >> Where Tom's solution fails is with smaller companies that cannot afford >> this level of infrastructure. > > Is there an objection to openldap? It's lightweight (so could > reasonably be run on the same hardware without significant impact), > BSD-ish and mature, and (with the password policy overlay) should > provide exactly the functionality the OP requested. > OpenLDAP is certainly the way I would go. However, for a number of reasons, smaller companies seem somewhat resistant to that level of integration. I suspect it is primarily because LDAP skills are less prevalent amongst admins in these areas. Often, these companies don't really have a planned architecture - things have grown organically and got to the point where existing resources are fully allocated just trying to keep all the bits running. It can be hard to sell the idea, especially as those making the decisions are not across the issues and from where they sit, it all looks to be working and your asking for more resources when it doesn't seem to be broken. The IT guys often fail to sell the benefits because they focus on the technical aspects rather than on the business aspects. One client I helped had admins who had been trying to move everything over to a centralised LDAP solution for ages and failing. They had presented great justification for why it was needed, but it focused on the technical benefits rather than the business continuity, process improvement and security benefits. Once we put together a new business case which focused on improved processes for managing access, reduced security audit costs and improved security controls, they were sold and made the project a priority. Based on additional info I saw from the OP and plans to roll out many databases, I think a centralised directory service approach is really their only saleable and maintainable solution. In fact, they probably need to look at their overall identity management architecture. Failure to get that basic service correct will result in major support issue blow out as they increase their customer base. -- Tim Cross
Re: Best way to use trigger to email a report ?
David G. Johnston writes: > On Fri, May 8, 2020 at 9:26 AM David Gauthier > wrote: > >> psql (9.6.0, server 11.3) on linux >> >> Looking for ideas. I want a trigger to... >> 1) compose an html report based on DB content >> 2) email the report to a dist list (dl = value of a table column) >> >> If this will involve hybrid coding, I prefer PL/perl. The linux env has >> both "mail" and "mutt" (if this is of any help). >> >> The idea is to send a report to the list when all the data has been >> collected for a particular job and the final status of the job is updated >> as a col of a rec of a certain table. Probably a post update trigger. >> > > I'd probably limit the trigger to checking for the completion of the data > collection and inserting a record into a "email job" table. Then I'd have > cron on a linux machine periodically run a script that queries the "email > job" table for work, perform the work, and then either flag the work as > done or remove the job record. > Yep, exactly how I would do it as well. Triggers are my last choice. In some situations, they are the right choice and when you do need them, keep them as small and simple as possible. Others have mentioned the issues of using external calls inside a trigger. In addition to the problem with waiting on external processes to complete, you also have all the messy external work things to take care of (like network down, server unavailable, etc). The solution I've used in the past is to have procedures in the database which generate the email report and insert that data into an email table. I then have a shell level script (could be Perl, JavaScript, Java, whatever) which looks in this table periodically and if it finds a report which has not been sent, extracts it, optionally formats it and sends it. On successful completion, set a 'sent' flag on the report record in the DB (or just delete it - I prefer to set a flag so that if something failed unexpectedly, you still have the report). There are ways you can trigger periodic activity in the database, but to be honest, cron is easy and reliable and avoids needing to add additional extensions etc to the DB. Turning off the report, changing the time etc, is a simple crontab edit. -- Tim Cross
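A minimal sketch of that queue-and-cron pattern follows; the jobs table, its status and dist_list columns, and the email_jobs table are all invented names for illustration (the real schema from the thread is not known):

  CREATE TABLE email_jobs (
      id         bigserial   PRIMARY KEY,
      job_id     integer     NOT NULL,
      recipients text        NOT NULL,
      created_at timestamptz NOT NULL DEFAULT now(),
      sent_at    timestamptz            -- stays NULL until the cron script sends it
  );

  CREATE OR REPLACE FUNCTION queue_report_email() RETURNS trigger AS $$
  BEGIN
      -- queue an email only when the job has just reached its final status
      IF NEW.status = 'complete' AND OLD.status IS DISTINCT FROM 'complete' THEN
          INSERT INTO email_jobs (job_id, recipients) VALUES (NEW.id, NEW.dist_list);
      END IF;
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER jobs_queue_report_email
      AFTER UPDATE OF status ON jobs
      FOR EACH ROW EXECUTE FUNCTION queue_report_email();

The cron-driven script then selects rows WHERE sent_at IS NULL, formats and sends them, and sets sent_at (or deletes the row) on success. EXECUTE FUNCTION needs PostgreSQL 11 or later (the server in this thread is 11.3); older servers use EXECUTE PROCEDURE.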
Re: Enforcing uniqueness on [real estate/postal] addresses
Peter Devoy writes: > Hi list > > I need to store addresses for properties (as in real estate) so in my > naivety I created a unique constraint like this: > > ALTER TABLE properties > ADD CONSTRAINT is_unique_address > UNIQUE ( > description, --e.g. Land north of Foo Cottage > address_identifier_general, > street, > postcode > ); > > Of course, if any of the fields are NULL (which they often are) I end > up with duplicates. > > One solution may be to add NOT NULL constraints and use empty strings > instead of NULL values but, until asking around today, I thought this was > generally considered bad practice. > > Please can anyone recommend a way of approaching this? Perhaps empty strings > are pragmatic in this situation? > > Kind regards > > Hi Peter, Personally, I don't like the idea of using empty strings just to avoid having nulls. This is probably a personal preference, but for me null and '' are quite different. A null indicates an unknown - we don't know what the value is. An empty string i.e. '' means there is no value (i.e. we know it has no value). The difference is quite subtle and may not seem relevant. It may not be, or it may be, or it may become relevant in the future. General rule of thumb for me is that my model should reflect the known information and should always avoid any data transformation or mapping which reduces the known information. I would step back a bit and think about why/what constraint you really need and what needs to be unique. The first field which jumps out for me is description. Is this really a unique value? Would it be possible to have two properties with the same description? Does it matter if two properties have the same description? Does the description really affect property uniqueness? If two records have the same street, postcode and address_identifier_general, but different descriptions, are they really two different records? Will description change over time? As description is a fairly subjective value, I would be tempted to not include it in your unique constraint at all. In fact, I would probably keep description in a separate table as it may be reasonable to have multiple descriptions for a property. If you want just a single description, then you can leave it in this table. I would not put a unique or not null constraint on it. This would leave you with address_identifier_general, street and postcode. None of those will be unique by themselves. You will only get uniqueness when you combine all 3. Can any of them be null? I would suspect not, so I would define them with not null constraints. I would then probably add a composite unique index using all 3 values to enforce uniqueness. Depending on your application needs, I would probably add a unique property_id field to the table as well (which would be the value I would use to link records in other tables, such as a property_description table). Of course, there are other constraints you could consider - post code probably has a set format which you might want to enforce or perhaps you can access a complete listing of valid postcodes and import that into your system as a postcode table. In that case, your postcode field might be better defined as a foreign key constraint into the postcode table. When defining your constraints, it is important to consider what information is known at the point of initial data entry. Is there a need to enter partial data (for example, you might know the street and postcode, but not the address_identifier_general).
Is it expected or reasonable to allow entry of this sort of partial data? If so, how will that work with your uniqueness constraints? (it may be quite reasonable to require all 3 fields be known). The point is, you need to know how the system will be used and what the expectations of the users are. Maybe there is a legitimate business case to allow partial data entry, in which case, you may need a different approach or a way to identify partial/incomplete records etc. -- Tim Cross
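A sketch of the constraints described above, assuming the properties table gets (or already has) a surrogate id primary key; constraint and table names here are invented:

  ALTER TABLE properties DROP CONSTRAINT is_unique_address;

  -- requires any existing NULLs in these columns to be cleaned up first
  ALTER TABLE properties
      ALTER COLUMN address_identifier_general SET NOT NULL,
      ALTER COLUMN street   SET NOT NULL,
      ALTER COLUMN postcode SET NOT NULL;

  -- uniqueness on the three fields that actually identify a property
  ALTER TABLE properties
      ADD CONSTRAINT properties_address_uniq
      UNIQUE (address_identifier_general, street, postcode);

  -- descriptions kept separately, allowing more than one per property
  CREATE TABLE property_descriptions (
      property_id integer NOT NULL REFERENCES properties (id),
      description text    NOT NULL
  );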
Re: Ubuntu 20.04: apt.postgresql.org/pub/repos/apt focal-pgdg InRelease' doesn't support architecture 'i386'
Hugh writes: > Hi, > > While this doesn't appear to be a bug that causes problems of any kind, I do > have a question about its cause. > > The "error" listed in the Subject: line is basically what I'm seeing. The > entire message is below, particularly the 'N:' at the end. Is there a repo > setting I should change to prevent the request for '386' architecture? Thank > you in advance for your assistance. > > user@ubuntu:~$ sudo apt update > > Hit:1 http://us.archive.ubuntu.com/ubuntu focal InRelease > Get:2 http://us.archive.ubuntu.com/ubuntu focal-updates InRelease [107 kB] > > Get:3 http://security.ubuntu.com/ubuntu focal-security InRelease [107 kB] > > Get:4 http://us.archive.ubuntu.com/ubuntu focal-backports InRelease [98.3 kB] > > Get:5 http://us.archive.ubuntu.com/ubuntu focal-updates/main amd64 DEP-11 > Metadata [90.4 kB] > Hit:6 http://apt.postgresql.org/pub/repos/apt focal-pgdg InRelease > > Get:7 http://us.archive.ubuntu.com/ubuntu focal-updates/universe amd64 DEP-11 > Metadata [21.4 kB] > Get:8 http://us.archive.ubuntu.com/ubuntu focal-backports/universe amd64 > DEP-11 Metadata [532 B] > Get:9 http://security.ubuntu.com/ubuntu focal-security/main amd64 DEP-11 > Metadata [16.6 kB] > Get:10 http://security.ubuntu.com/ubuntu focal-security/universe amd64 DEP-11 > Metadata [208 B] > Fetched 441 kB in 1s (367 kB/s) > Reading package lists... Done > Building dependency tree > Reading state information... Done > All packages are up to date. > N: Skipping acquire of configured file 'main/binary-i386/Packages' as > repository 'http://apt.postgresql.org/pub/repos/apt focal-pgdg InRelease' > doesn't support architecture 'i386' > This error is because by default the Debian package manager is trying to download details on all supported architectures, but failing to find one for i386. You can add an architecture tag to the repository definition in the source list file i.e. /etc/apt/sources.list or /etc/apt/sources.list.d/postgres.list (or whatever you have called it). Try adding the arch option to both lines, e.g.

deb [arch=amd64] http://apt.postgresql.org/pub/repos/apt focal-pgdg main
deb-src [arch=amd64] http://apt.postgresql.org/pub/repos/apt focal-pgdg main

This should tell apt to only look for the amd64 packages. -- Tim Cross
Re: Should I use JSON?
stan writes: > Working on a small project, and have been doing a lot of Perl scripting to > parse various types of files to populate the database. Now I need to get > data from a cloud services provider (time-keeping). They have a REST API > that returns data in a JSON format. > > So here is the question, should I just manually parse this data, as I have > been doing to insert into appropriate entities into the database? Or should I > insert the JSON data, and use some queries in the database to populate my > tables from the JSON tables? Given you plan to store your data in 'normal' tables and you are already using a scripting language to get the data from the remote API and you're already processing data in various forms using Perl, I would not bother. All you will really do is add another layer of complexity and skill requirement (i.e. JSON in the database and writing JSON queries using PG's SQL JSON support). -- Tim Cross
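For reference, the in-database alternative being advised against here would look something along these lines; the staging table, target table and JSON key names are all invented for illustration:

  CREATE TABLE timesheet_raw (payload jsonb);

  -- after the Perl script inserts the raw API response into timesheet_raw:
  INSERT INTO timesheet_entries (employee, hours)
  SELECT e->>'employee', (e->>'hours')::numeric
  FROM   timesheet_raw,
         jsonb_array_elements(payload->'entries') AS e;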
Re: GPG signing
Marc Munro writes: > I need to be able to cryptographically sign objects in my database > using a public key scheme. > > Is my only option to install plpython or some such thing? Python > generally makes me unhappy as I'm never convinced of its stability or > the quality of its APIs, and it is not obvious to me which of the many > different gpg-ish packages I should choose. > > Any other options? Am I missing something? > This is something you would normally implement at the application level, using the database as just the store for the data and signatures or signed digests. Main reason for this is to allow for secure key management. It is very difficult to implement a secure key management solution at the database level unless it is designed into the fundamental architecture of the rdbms. It is the same age old problem - how can you encrypt data AND have the keys for the encrypted data in the same place? The main reason for encryption is so that if your store gets compromised, the data cannot be read. However, if your key is also in the store, then when you're compromised, your key is compromised and your encryption becomes a moot point. If on the other hand you handle the encryption/signing at the application level, you can separate your key store and data store so that compromise of one doesn't also compromise the other. This of course does create other issues - most notably being that now you have an additional mandatory layer between you and your data (for example, you can use psql to query your database, but all you can see is encrypted objects). In your case, this may not be as big an issue because you state you want to sign rather than encrypt. You could, for example, design your application so that the data is in one column and the signature is in the other (or use json or other 'object' types that allow attributes/properties). This would allow simple querying of the data and verification of data integrity to be performed as separate operations. All you then need to ensure is that every time data is modified, a new signature is generated. I would also verify you really do need full cryptographic signing rather than just some less rigid integrity verification, like a basic checksum hash. Crypto signing is most useful when you want to both verify the integrity of something and its source. i.e. this data has not been changed and was signed by X. In this case, each source is encrypted/signed with a different private/secret key. If on the other hand you just want to know that the data has not been modified, you can generate a checksum/hash of the data when it is inserted/updated and store that in a separate column. This data may or may not be encrypted depending on your use case. In this situation, you only need one key, the key used to encrypt the column or no keys if you don't actually need to encrypt it. While non-encrypted checksums are not as secure, not all applications need that level of security. In some cases, having a separate checksum is sufficient. If someone wants to modify the data 'unofficially', in addition to compromising the db and modifying the data, they have to also generate a new hash of the modified data and store that in the appropriate column. If you just want to protect against accidental modification of the data or have reasonable confidence (for some measure of reasonable), just having a checksum hash may be sufficient. 
The big advantage with the simpler case with no actual data encryption is that other clients can access/use the data and not require access to the encryption key. -- Tim Cross
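A sketch of the checksum-column idea using the built-in md5() function; the documents table is hypothetical, and EXECUTE FUNCTION needs PostgreSQL 11 or later (older servers use EXECUTE PROCEDURE):

  CREATE TABLE documents (
      id       bigserial PRIMARY KEY,
      body     text NOT NULL,
      body_md5 text NOT NULL
  );

  CREATE OR REPLACE FUNCTION documents_set_md5() RETURNS trigger AS $$
  BEGIN
      -- recompute the hash on every insert/update so it always matches the data
      NEW.body_md5 := md5(NEW.body);
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER documents_md5
      BEFORE INSERT OR UPDATE ON documents
      FOR EACH ROW EXECUTE FUNCTION documents_set_md5();

  -- integrity check: rows whose stored hash no longer matches the data
  SELECT id FROM documents WHERE body_md5 <> md5(body);

For actual cryptographic signing, the equivalent step would happen in the application with its own key store, writing the signature into a column alongside the data.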
Re: AW: Linux Update Experience
Zwettler Markus (OIZ) writes: > Hi Marco, > > > > How do you handle these conflicts? No longer updating that regularly or not > at all anymore? > Not doing the updates is a poor option due to the potential security vulnerabilities this may lead to. Likewise, delaying the application of updates is going to increase risks as well. In fact, we have found such approaches can make the situation worse. Delaying updates tends to result in more updates being applied at once, which makes it harder to identify problems when they do occur. I think the best approach is to apply updates as soon as possible. Apply the updates to a test or uat environment (or your dev environment if you don't have a test/uat/staging one). If there are issues, resolve them before applying the updates in prod. We have found it rare for updates to be an issue if you're running the packages from the distribution. Problems seem to be more common when you're running packages sourced from an external repository which may lag behind changes made by the distribution. When issues do occur, we look at the type of update e.g. security, bug fix, bump in dependency versions, new version etc and make a call as to the priority and respond accordingly. This may mean delaying applying the update, actively working to resolve the issue with investigations and debugging, raising an issue/bug with the package maintainers etc. We also classify all our systems, services, databases etc according to whether they are core business processes or supporting processes. We apply updates to supporting systems before core systems. This also affects when we apply updates. For example, we would not apply updates to a core system on Friday afternoon. In fact, we may apply updates to core systems outside main business hours. If issues are encountered when applying updates to core systems, resolution of those issues is the highest priority. For secondary systems, we are more likely to do the updates during business hours, will accept longer outages/down times and may not make resolution of issues the highest priority.
Re: Oracle vs. PostgreSQL - a comment
er the bugs you seem to have unless I wandered off into their 'add-ons'. Sticking with the base RDBMS, I found it to be pretty solid and reliable. However, I prefer Postgres. The main reason is that with Oracle, you really need to choose one road or the other - either be a developer or be a DBA. This is especially true with the introduction of things like DataGuard, GoldenGate etc. Oracle takes a lot more administration than Postgres and there is a lot of knowledge to stay on top of. The Oracle installation process is horrible. In addition to all the library crap, you also commonly run into bugs in their scripts. Worse still, some of those bugs have been there for 20 years and are just 'known' issues experienced DBAs deal with. Their documentation site is also horrible. As a developer, provided you stick with basic database functionality, i.e. SQL and PL/SQL and avoid their extensions/add-ons, like their various pub/sub, rules engine, PSP or anything they have obtained by purchasing a company and 'integrating' it, it is pretty good. I think they have one of the best locking models out there. The way they handle indexes and updates is also much faster than Postgres and you have to worry less about the structure of your queries with respect to performance. Still, I prefer Postgres. The main reason is that although I may need to think about how I structure queries, updates and indexes a bit more, on the whole, it gets out of my way and does what I want - provide a reliable data store that I can use and get the job done without having to spend hours caught up in DBA tasks. Updates are easy and the basic architecture is easy. The biggest challenge when migrating from Oracle to Postgres is recognising they are completely different and while they may both provide a compliant SQL implementation, the similarities stop there. If I'm in an environment where someone else is responsible for all the DBA stuff, Oracle is nice to work with. However, you tend to only be in that situation when you're working in a large, usually bureaucratic, environment, which tends to detract from the whole experience in other ways. If you're unlucky enough to also be using any of the Oracle 'value add' extensions, development frameworks, application layers etc, it is really horrible and mind numbing. Apart from this, Oracle licensing is an absolute nightmare. Apart from the expense, the complexity is unbelievable and it is almost impossible to know with any certainty what you will be paying in 12, 24 or more months. -- Tim Cross
Re: Oracle vs. PostgreSQL - a comment
Paul Förster writes: > and then, some day, a developer approaches a DBA with a query which is > generated and, if printed out in a 11pt. sized font, can fill a billboard on > a street, to optimize it or search for what's wrong with it, or why it > performs so slow... That's usually when I play BOFH because I'm not willing > to debug 10 pages which its creator hasn't even cared to take a look at > first. :-P :-) > > Same goes for the app guys sending me 10 MB of Java stack trace by email > containing one single line of ORA-x. They should send only that line > along with a (approximate) time when it occurred. If I get the full stack > trace, I send it back to them telling them they should come back when they > find the line containing the ORA message. They usually don't come back > because they don't know how to grep. :-) Some do, though, and those are the > ones I try to help. > >> If I'm in an environment where someone else is responsible for all the DBA >> stuff, Oracle is nice to work with. > > yes, leave the cardiac arrest to us DBAs. :-P > Yes, even after a longer time doing Oracle, I still never felt as comfortable or across things as much as I do with PG. Started with Oracle 7 and stayed until 11g and each year, it got worse rather than better. After working as a DBA, I know exactly what you mean. Sometimes, DBA has to equal "Don't Bother Asking". As a developer, I have to admit being somewhat embarrassed by the frequently poor understanding amongst many developers regarding the technology they are using. I've never understood this. I come across developers all the time who are completely clueless once outside their IDE or editor. Too often, they have little understanding of the hosting environment, the base protocols they are using, the RDBMS or even basic SQL. I don't understand how you can develop anything of quality if you don't have a thorough understanding of all the technology involved. I'm probably just a dinosaur - I also prefer vi and Emacs as my primary development environments and will use psql and sqlplus before Toad, pgAdmin, SQL Developer etc. Tim P.S. for moving Oracle databases, we used to just use sed and change the paths in the control file. Worked remarkably well. Often used this technique to 'refresh' our dev or testing systems to current prod data. -- Tim Cross
Re: Should I enforce ssl/local socket use?
Michel Pelletier writes: > Hello, > > I'm the author of the pgsodium cryptography library. I have a question > about a best practice I'm thinking of enforcing. Several functions in > pgsodium generate secrets, I want to check the Proc info to enforce that > those functions can only be called using a local domain socket or an ssl > connection. If the connection isn't secure by that definition, secret > generating functions will fail. > > If someone really wants to point the gun at their foot, they can connect > with an unsecured proxy. My goal would be to make bypassing the check > annoying. > > Any thoughts? Is this an insufferably rude attitude? Are there scenarios > where one can foresee needing to generate secrets not over ssl or a domain > socket? > I'm never very fond of enforcing a particular behaviour as it assumes we understand all environments and use cases. Far better to make this the default behaviour, but allow users to disable it if they want and clearly document that option as insecure. I also suspect that without the ability to somehow disable the checks, people will find elaborate ways to work around them which are almost certainly going to be even worse from a security perspective. -- Tim Cross
Re: Table partitioning with sequence field in postgresql12
Srinivasa T N writes: > Hi, >Partitioning of a table with sequence id as one of its fields is > supported in postgresql12? > > Regards, > Seenu. A sequence is really just an 'atomic' number generator, you get the next value, which is guaranteed to be larger than the last 'nextval' (up until maxvalue). It is unaware of the use i.e. whether it will be used in a insert or what table that insert is against. So I'm not sure what your concern with a partitioned table is? Can you elaborate? -- Tim Cross
Re: Table partitioning with sequence field in postgresql12
Srinivasa T N writes: > Hi, >I have a parent table with one of the field as ""gid" int4 DEFAULT > nextval('"ami_smart_new".aoi_boundary_gid_seq'::regclass)". > >I create child tables which inherit parent and use hash partition. When > I directly insert into child tables, will there be any race condition > causing two child tables getting the same sequence value for gid? > Assuming all inserts use the default e.g. nextval from the same sequence, you won't get duplicates. You could get a duplicate if an insert sets an explicit value for gid of course or if rows in any table were inserted with a gid which was not obtained from the same sequence using nextval i.e. parent and children use same sequence. The sequence is just a counter with the property that no two calls to nextval from that sequence will have the same value. You cannot make any additional assumptions e.g. cannot assume gid values will be inserted in order or there won't be 'gaps ' etc. -- Tim Cross
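A small illustration of that shared-sequence behaviour with declarative hash partitioning (PostgreSQL 11+); the table layout, the extra name column and the choice of gid as partition key are only for the example:

  CREATE SEQUENCE aoi_boundary_gid_seq;

  CREATE TABLE aoi_boundary (
      gid  integer NOT NULL DEFAULT nextval('aoi_boundary_gid_seq'),
      name text
  ) PARTITION BY HASH (gid);

  CREATE TABLE aoi_boundary_p0 PARTITION OF aoi_boundary
      FOR VALUES WITH (MODULUS 2, REMAINDER 0);
  CREATE TABLE aoi_boundary_p1 PARTITION OF aoi_boundary
      FOR VALUES WITH (MODULUS 2, REMAINDER 1);

  INSERT INTO aoi_boundary (name) VALUES ('a'), ('b'), ('c');

  -- rows land in different partitions, but every gid came from the same
  -- sequence via nextval(), so none of them collide
  SELECT tableoid::regclass AS partition, gid FROM aoi_boundary ORDER BY gid;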
Re: Persistent Connections
Bee.Lists writes: > I have an issue with a server (v10) that’s seeing increasing connections > until it’s maxxed-out. > > max_connections for my 4-core server is set to 12. > > I’ve installed pg_stat_activity and pg_stat_statements. > > I access this server through a web app, and another client on two machines. > I also send queries through SSH on the LAN. psql queries indicate there are > too many clients already. I’ve been getting errors from my workstation > through a Ruby gem that says dropped connections happen. Essentially, the > server isn’t giving up connections and clients from all sides are showing > some issues. > > pg_stat_activity has shown a handful of idle queries that are quite simple. > I’ve tested those queries and they seem fine on a workstation client. I’m > assuming these queries somehow don’t finish and leave the connection open, > but I could be wrong. All of this is new to me. > > It was suggested on Slack that it sounds like my workstation had some TCP > issues with these connections, and that it was a firewall. I disabled the > firewall and the same happened. The firewall has been restarted. > > I am running no connection pool tool of any sort, and I’ve never used one. > > At this point I don’t know what to even be watching for. Connections > increase and connections “drop”, according to the gem I’m using. I have > simple queries that are idle and won’t disappear for some reason. > > How can I figure this out so the connections remain within the > max_connections limit, and connections are not dropped? > > Any insight appreciated. > Sounds like your web app may not be closing connections once it has finished with them? The fact you're seeing idle connections would seem to support this. I would be verifying there isn't a code path in your client application which is failing to close a connection correctly. Seeing connections go up and down in the gem may not be telling you the full story - could be that your client connection objects are being destroyed in your app, but are not closing the connection correctly so PG is unaware the client has finished. Typically, due to the overhead of making a connection, you don't want your client app/web app to create a new connection for every query. Instead, you would use some type of connection pool. Many development languages support some form of pooling as part of their db connection library (don't know about Ruby, but JS, Python, PHP, Java, Perl all do) and there are external 3rd party solutions like pgbouncer which sit between your client and the database and work a bit like a connection broker which will manage a pool of connections. From the description of what you're doing, I would first look to see what level of connection pooling your development language supports. This will likely be easier to configure and use than setting up a whole additional bit of infrastructure which is probably far more powerful than you need. I would also go through your code and make sure that every time you create a database connection, there is some code which is closing that connection once you're finished with it. This is a frequent source of problems, especially during development when your code might throw an error and the code you have to disconnect does not get executed or you simply forget to issue a disconnect when you're finished. A connection pool can help in tracking down such issues as well. Most pooling solutions will allow you to set a max pool size. 
In addition to enabling you to 'reserve' a set number of connections for a client, you will know which client seems to be running out of connections, helping to identify the culprit. -- Tim Cross
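A query along these lines against pg_stat_activity can show which clients are sitting on idle connections and for how long, which helps narrow down the code path that is not disconnecting:

  SELECT pid, usename, application_name, client_addr,
         now() - state_change AS idle_for,
         left(query, 60)      AS last_query
  FROM   pg_stat_activity
  WHERE  state = 'idle'
  ORDER  BY idle_for DESC;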
Re: UPDATE on 20 Million Records Transaction or not?
Jason Ralph writes: > Hello List, > PostgreSQL 11.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 > (R > ed Hat 4.4.7-23), 64-bit > > I am planning an update on a table with 20Million records, I have been > researching the best practices. I will remove all indexes and foreign keys > prior to the update, however I am not sure if I should use a transaction or > not. > My thought process is that a transaction would be easier to recover if > something fails, however it would take more time to write to the WAL log in a > transaction. > > Would it make sense to make a back up of the table then execute update > without a transaction? How would you guys do it? > This is really the sort of thing you need to test in a development environment. There are just too many unknowns to provide a definitive answer. You need to run this with minimal 'adjustments' in a dev scenario to get a baseline and then decide what, if anything, you need to do. Just because you have 20M records, it doesn't mean that you have to do anything 'special'. A lot will depend on how many of that 20M need to be updated, the size of the records and fields being updated, how the updated records are spread through the table, what other processes are updating or querying the table during the operation, which indexes are used and which are unnecessary or just update overhead etc, etc. Trying to optimise this operation without knowing where the bottlenecks are is extremely difficult. Unless you know that every one of the 20M records will be updated, I wouldn't rush into removing all indexes. I would definitely look at breaking up the update into smaller transactions. As to the question about whether to use a transaction or not, it really depends on your data and how easily you can recover from a partial transaction. Assume the update is going to be interrupted before completion. If you can recover from that situation easily, then not using a full transaction with commit/rollback may be acceptable. otherwise, I would avoid it. At any rate, baseline first as you may be surprised. I was recently working on an application using PG 9.6 which had numerous web clients and a backend process which ran multiple times a day and which updated millions of rows each run in a table with over 100M records. The full update would take around 20 - 30 min, but consisted of multiple transactions. The data was part of a weather information system used for predictive modelling. The data consisted of rainfall, max/min temp, humidity and vapour pressure for 9am and 3pm and solar radiation for grid points 5km apart covering all of Australia. The table had records for every day for the last 4+ years and as yet, does not use partitioning. This is a lot of records, but each record is quite small. While at some point, we will need to look at additional optimisations like partitioning, the performance is currently within acceptable limits. The only optimisation we have done is basic db tuning. When I did this tuning, performance saw a significant improvement, but none of it was specific to this large table or the update process. The key point is that large numbers of records doesn't necessarily mean that an update will be slow (or more specifically too slow for acceptable performance, whatever that is). Assuming you will need to take lots of special action may be premature - you may need to do none or only a couple of things. Get a baseline first and you will know how big of an issue you have. 
You will also be able to determine if what you try has any benefit. -- Tim Cross
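If the baseline does show a single statement is too slow or holds locks for too long, one way to break the work into smaller transactions is a loop like the following; big_table, id and status are placeholders, and COMMIT inside a DO block needs PostgreSQL 11+ with the block run outside an explicit transaction:

  DO $$
  DECLARE
      batch_size integer := 100000;
      max_id     bigint;
      lo         bigint := 0;
  BEGIN
      SELECT max(id) INTO max_id FROM big_table;
      WHILE lo <= max_id LOOP
          UPDATE big_table
          SET    status = 'migrated'
          WHERE  id > lo AND id <= lo + batch_size;
          lo := lo + batch_size;
          COMMIT;   -- each batch becomes its own transaction
      END LOOP;
  END $$;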
Re: Persistent Connections
Bee.Lists writes: >> On Jun 23, 2020, at 8:09 PM, Tim Cross wrote: >> >> Sounds like your web app may not be closing connections once it has >> finished with them? The fact your seeing idle connections would seem to >> support this. I would be verifying there isn't a code path in your >> client application which is failing to close a connection correctly. Seeing >> connections go up and down in the gem may not be telling you the full story >> - could >> be that your client connection objects are being destroyed in your app, >> but are not closing the connection correctly so PG is unaware the client >> has finished. > > Hi Tim. I can’t speak for the gem. I’m assuming its garbage collection is > working. But yes, it does look that way. I found someone else who was > having similar issues as myself: > > https://stackoverflow.com/questions/60843123/djangopostgres-fatal-sorry-too-many-clients-already > > I’m also seeing the connection count rise overnight from crontabs. > > For some clarity, the gem is Sequel, which is on top of the PG gem (Ruby). > I’ve spoken to the Sequel author and he says everything is fine. I have some > evidence it’s a connection issue and the gem is doing its job, as I’m seeing > it’s happening elsewhere with crontabs and other clients. > Rather than a problem with the libraries, I would be looking for a problem with the code which uses those libraries. If it was a problem with either the Sequel or PG gems (or with Postgres for that matter), it would be a lot more wide spread and you would be seeing a lot more reports. I'm not familiar with Sequel and haven't used Ruby for nearly 20 years, but have used plenty of other PG libraries. You mention garbage collection and I'm sure that is working fine in Ruby. However, you cannot rely on that to correctly cleanup your PG connections. Somewhere in your code, there has to be code the developer writes which tells the library you are finished with the connection. For example, the JS PG package has the command 'disconnect'. Essentially, your code needs to tell the remote PG server you have finished with the connection so that it knows it can clean up things on its end. If your code is not issuing explicit disconnect commands, what is happening is that the connection on your client side are being cleanup when the connection object goes out of scope and the garbage collector kicks in. However, on the PG side, the connections hang around until PG times them out, which takes much longer and could easily cause you to hit the limit, especially as you have such a very small limit. I think you need to verify that in all your client code, somewhere there is explicit code being called which is telling PG you are disconnecting the connection. Don't assume this is automagically happening as part of GC. > >> Typically, due to the overhead of making a connection, you don't want >> your client app/web app to create a new connection for every query. >> Instead, you would use some type of connection pool. Many development >> languages support some form of pooling as part of their db connection >> library (don't know about Ruby, but JS, Python, PHP, Java, Perl all do) >> and there are external 3rd party solutions like pgbouncer which sit >> between your client and the database and work a bit like a connection >> broker which will manage a pool of connections. > > That’s why I’m thinking installing a connection pooler would solve all of > this. pgbouncer is what I’m looking at now. > I doubt this will solve your problem. 
It might hide the problem or it might lengthen the time between failures, but it is very unlikely to solve the problem. It may help identify the source of the problem. Have you verified the PG gem doesn't support pooling? If it does, that would be far easier to configure and use than installing pgbouncer. Based on your description of the app and the small number of connections you have PG configured for, adding pgbouncer is like putting a fighter jet engine in a family SUV. >> From the description of what your doing, I would first look to see what >> level of connection pooling your development language supports. This >> will likely be easier to configure and use than setting up a whole >> additional bit of infrastructure which is probably far more powerful >> than you need. > > I will do that. This is all new. > >> I would also go through your code and make sure that every time you >> create a database connection, there is some code which is closing that >> connection once your finished with it. This is a frequent source of >> problems, especially during development when your code might throw an >> error and the code you hav
Re: Persistent Connections
Peter J. Holzer writes: > On 2020-06-24 13:55:00 -0400, Bee.Lists wrote: >> On Jun 24, 2020, at 6:47 AM, Peter J. Holzer wrote: > Does "I have 37 queries" mean you have seen 37 queries of this type in > some time window (e.g. the last day or hour) or does it mean you are > currently seeing 37 connections where the last query was of this type? > > If it's the latter, you very obviously have at least 37 (more likely > 37 + 5 = 42) connections. So you web app is configured to open dozens of > connections concurrently. You might want to look into that. > I just had another thought. Based on a VERY brief scan of the Sequel API, I suspect it uses a connection pool by default. So it is quite likely that the expectations on when the connections are closed is incorrect. It could easily be that the web app creates a connection pool as soon as it is started and keeps that pool open until either the web server is closed down or a pool timeout kicks in (some connection pools use a max lifetime setting for connections and will close a connection after that period, replacing it with a new connection). It is also quite likely that the Sequel GEM creates a connection pool with a default number of connections if not explicitly defined by the developer. This default could be close to or even exceed the number set for max connections within PG (especially as the OP has indicated it is very small). I have to say, I do hate ORMs. They always reduce the flexibility and power offered by SQL, tend to result in code where more processing is done in the client which would have been faster and more efficiently done by the SQL engine and hides details which make troubleshooting even harder. However, the worst issue is that it also results in developers not understanding the power of the underlying RDMS and encourages poor DB schema design. Those who support such technologies typically point to the benefits of database neutrality such systems can provide. In over 30 years of DB work, I have yet to see such neutrality actually work. It is a pipe dream. -- Tim Cross
Re: Efficiently advancing a sequence without risking it going backwards.
Christopher Browne writes: > On Thu, 9 Jul 2020 at 12:59, Jeremy Schneider > wrote: > >> >> > On Jul 6, 2020, at 19:06, Paul McGarry wrote: >> > >> > I don't think I can use setval(), because it risks making sequences go >> backwards, eg: >> > >> > 1) Check values >> > DB1sequence: 1234 >> > DB2sequence: 1233 (1 behind) >> > 2) setval('DB2sequence',1234); >> > >> > but if between (1) and (2) there are 2 nextval(DB2sequence) calls on >> another process, (2) would take the sequence back from 1235 to 1234 and I >> would end up trying to create a duplicate key ID from the sequence. >> >> An ability to “lock” the sequence momentarily would give you the tool you >> need, but I don’t think it’s there. >> >> Total hack, but if your application or users can retry when the rare error >> is encountered then one idea is to rename the sequence momentarily while >> you do the setval() then rename it back. Do an initial check without >> renaming, then re-check after renaming and before the setval() call. >> >> If you put retry logic into your application then make sure to include >> back-off logic so you don’t get an outage induced by thundering herd. >> > > This is increasingly looking like a set of attempts to intentionally abuse > what sequences were designed for. > > The use-case where you need a lock on the value so that there can't > possibly be a hole in the sequence points at the notion of having some > other kind of a function that takes out a lock on a table, and serially > gives out "MAX+1" as the next value. > > That isn't a very difficult function to write; the problem with it is that > that sort of function will forcibly serialize all inserts through the > function+table lock that is giving out "MAX+1" values. That's going to be > WAY slower than using a sequence object, and about 98% of the time, people > will prefer the sequence object, particularly because it's about 98% faster. > > I'm not quite sure if anyone has put out there a standard-ish idiom for > this; that seems like a not TOO difficult "exercise for the user." > > There will definitely be more failure cases, and *wildly* more fighting, in > a concurrent environment, over tuple locks. > > - An obvious failure is that if one connection asks for the new MAX+1, gets > it, and then the transaction fails, for some later, out-of-relevant-scope, > reason, you'll still potentially get some "holes" in the series of values. > > - If there are 10 connections trying to get MAX+1 concurrently, only one > can get it at a time, and that connection can't relinquish the lock until > its transaction has completed, and the 9 must wait, regardless of how much > work the "winner" still has to do. > > These are amongst the reasons why people conclude they *don't* want that > kind of functionality. > > It makes me think that the problem needs to be taken back to that initial > point of "I think I need some somewhat coordinated sequences", and poke at > what the *real* requirement is there, and why someone thinks that the > values should be "somewhat coordinated." Something seems off there. I agree and was going to write something similar. All the 'solutions' are problematic in one way or the other and seem to be due to a misconception about the role for sequences or some requirement which needs to be re-examined. -- Tim Cross
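For concreteness, the sort of "MAX+1 behind a table lock" function described above might be sketched like this against a hypothetical invoices table; as noted, it serialises every insert behind the lock and is far slower than a sequence:

  CREATE TABLE invoices (id bigint PRIMARY KEY, amount numeric NOT NULL);

  CREATE OR REPLACE FUNCTION insert_invoice(p_amount numeric) RETURNS bigint AS $$
  DECLARE
      v_id bigint;
  BEGIN
      -- every caller queues here until the previous transaction commits or rolls back
      LOCK TABLE invoices IN EXCLUSIVE MODE;
      SELECT coalesce(max(id), 0) + 1 INTO v_id FROM invoices;
      INSERT INTO invoices (id, amount) VALUES (v_id, p_amount);
      RETURN v_id;
  END;
  $$ LANGUAGE plpgsql;

  SELECT insert_invoice(42.00);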
Re: Issue with timestamp without time zone datatype with default value as now()/localtimestamp
Deepika S Gowda writes: > Hi, > > On postgres 11.7 Master/Slave node, there is column named "createddate" > with datatype "timestamp without time zone" with default value as "now()"; > > Column Name | Date Type | Default value > createddate |timestamp without time zone|Now() > > > Issue: From the java application , data is getting loaded into this table > where we expect column value should be today's date with timestamp( > "2020-07-10 10:56:43.21"). But, out of 3K records, 100 records are loaded > as "2019-07-10 10:56:43.21" (change in Year). > > What could be the issue? we tried changing the default value to > "localtimestamp". > My bet would be you have some SQL statements which include a value for 'createddate', so the default is not being used. -- Tim Cross
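A quick way to see the difference; the orders table and customer column are invented, while createddate is the column from the thread:

  -- column omitted: the default fires and createddate gets the insert-time timestamp
  INSERT INTO orders (customer) VALUES ('acme');

  -- column supplied: whatever the application sends is stored and the default is ignored
  INSERT INTO orders (customer, createddate) VALUES ('acme', '2019-07-10 10:56:43.21');

Checking the application's generated SQL (or temporarily setting log_statement = 'all' on the server) for INSERTs that mention createddate should confirm it.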
Re: How bad is using queries with thousands of values for operators IN or ANY?
Thorsten Schöning writes: > Hi all, > > I have lots of queries in which I need to restrict access to rows > using some decimal row-ID and am mostly doing so with using the > operator IN in WHERE-clauses. Additionally I'm mostly embedding the > IDs as ","-seperated list into the query directly, e.g. because I > already hit a limitation of ~32k parameters of the JDBC-driver[1] for > Postgres. > > I really thought that in most cases simply sending a large amount of > IDs embedded into the query is better than looping, because it safes > roundtrips to access the DB, the planner of the DB has all pieces of > information it needs to decide best strategies etc. OTOH, with recent > tests and an increased number of IDs of about factor 100, I have > additional load in Tomcat before actually sending the query to the DB > already and in the DB itself as well of course. I've attached an > example query and plan. > >> -> Hash (cost=242592.66..242592.66 rows=6825 width=39) (actual >> time=91.117..91.117 rows=40044 loops=3) >> Buckets: 65536 (originally 8192) Batches: 1 (originally 1) Memory >> Usage: 3016kB >> -> Hash Join (cost=137.57..242592.66 rows=6825 width=39) (actual >> time=10.194..82.412 rows=40044 loops=3) >> Hash Cond: (meter.meter_bcd = meter_bcd.id) >> -> Index Scan using pk_meter on meter (cost=0.42..242237.10 >> rows=40044 width=25) (actual time=9.350..71.276 rows=40044 loops=3) >> Index Cond: (id = ANY ('{[...]}'::integer[])) >> -> Hash (cost=135.73..135.73 rows=113 width=22) (actual >> time=0.830..0.830 rows=113 loops=3) > > Do you know of any obvious limitations of the JDBC-driver of handling > such large queries? In the end, the query is mostly large text with > only very few bind parameters. > > Do you know of any obvious problem in Postgres itself with that query, > when parsing it or alike? Do things simply take how long they take and > are mostly comparable to looping or is there some additional overhead > the larger the query itself gets? From my naive expectation, comparing > IDs shouldn't care if things get looped or transmitted at once. > > I'm just trying to collect some input for where to look at to optimize > things in the future. Thanks! > > [1]: https://github.com/pgjdbc/pgjdbc/issues/90 > > Mit freundlichen Grüßen, > > Thorsten Schöning It would help to see the query as well as the plan. Where are these 100s of IDs coming from? I sometimes find this a sign you could be re-structuring your query to be a join between two tables where one table contains the IDs of interest rather than trying to embed them into the query as part of a where clause. -- Tim Cross
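As a sketch of that restructuring, the embedded ID list can become a joined relation instead, using the meter table from the plan above; the wanted_ids temp table is invented:

  CREATE TEMP TABLE wanted_ids (id integer PRIMARY KEY);
  -- populate from the application with COPY or batched INSERTs, then:
  ANALYZE wanted_ids;

  SELECT m.*
  FROM   meter m
  JOIN   wanted_ids w ON w.id = m.id;

A variant that also sidesteps the JDBC bind-parameter limit is to pass all the IDs as a single array parameter and join against unnest($1::int[]) rather than a temp table.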
Re: How to keep format of views source code as entered?
Adrian Klaver writes: > On 1/8/21 12:38 AM, Markhof, Ingolf wrote: >> Thanks for your comments and thoughts. >> >> I am really surprised that PostgreSQL is unable to keep the source text >> of a view. Honestly, for me the looks like an implementation gap. >> Consider software development. You are writing code in C++ maybe on a >> UNIX host. And whenever you feed you source code into the compiler, it >> will delete it, keeping the resulting executable, only. And you could >> not even store your source code on the UNIX system. Instead, you'd be >> forced to do so in a separate system, like GitHub. Stupid, isn't it? >> Right. There are good reasons to store the source code on GitHub or >> alike anyhow. Especially when working on larger project and when >> collaborating with many people. But in case of rather small project with >> a few people only, this might be an overkill. > > The projects I work on are my own personal ones and I find an > independent version control solution the way to go for the following > reasons: > > 1) It is easy. > a) mkdir project_src > b) cd project_src > c) git init > Now you are set. > > 2) Even my simple projects generally have multiple layers. > a) Database > b) Middleware > c) UI > And also multiple languages. It makes sense to me to keep all > that information in one repo then having each layer operate independently. > > 3) It allows me to work on test and production code without stepping on > each other. > > 4) It serves as an aid to memory. Answers the question; What was I > thinking when I did that? More important it helps anyone else that might > have to deal with the code. > > FYI, the program I use to manage database changes is > Sqitch(https://sqitch.org/). > >> This is essentially my workflow as well. I have even used sqitch too. While this has worked well for my projects, attempts to introduce the discipline necessary to use such a workflow in a team has largely failed. This seems to be due to 2 main reasons - 1. Lack of SCCM support built into common tools. There are very few tools which have version control support built in (I believe the jet brains product does). In particular, pgAdmin would benefit here (maybe pgadmin4 does, I've not tried it in a few years). 2. Poor SCCM and DBMS Understanding. Despite it being 2021 and both version control and databases being two very common technologies you need to interact with as a developer, I'm still surprised at how poorly many developers understand these tools. I still frequently come across really bad workflows and practices with version control and code which uses the database as little more than a bit bucket, which re-implement searching and sorting at the client level (and then often moan about poor performance issues). My editor has good support for psql and psql has always been my goto tool for PG. As my editor also has good git support, my workflow works well. However, most people I've worked with prefer things like pgadmin. Tom Lane responded in this thread to point out some of the complexities which make it difficult to maintain current code source within the database itself. This is definitely something which should be kept in version control. The problem is, if your tool does not support the version control system, it is too easy to forget/bypass that stage. When you use something like pgadmin, it is far too easy to modify the source definitions in the database without ever updating the sources on disk in the version control working directory and the changes get lost. 
The other big challenge is dependency management. Keeping track of what is affected by a change to a table definition can be a challenge within a complex system. I've yet to find a good solution to that issue. It is probably something which needs to be built into a tool. In the past, I've used a modified sqitch approach that also maintains a small 'dbadm' schema containing metadata to track dependencies. Although this worked OK, especially if you understood how all the bits fit together, it still had many corner cases and to some extent highlighted the complexities involved. -- Tim Cross
Re: How to keep format of views source code as entered?
oracle, so you can load the code and run it without having to leave your editor. Having the code in the database can be useful. I've used this in oracle to provide enhanced debugging support when developing stored procedures and packages and enhanced reporting in unit testing. However, you really don't want this to be the definitive master for your source code. Your source code benefits from being in a location which makes backup, restoration and migration easy/fast, where changes can be tracked, analysed and rolled back, where re-factoring can use advanced tools and can work across projects, not just on a single script, where code sharing is easy and where people can use their preferred tool rather than being forced to use something which understands the database. -- Tim Cross
Re: Do we need a way to moderate mailing lists?
Hemil Ruparel writes: > Exactly my point. We need to raise the bar of the behavior we tolerate. > This should not be tolerated. We need to set an example. The person in > question clearly understood English and I have never seen a person who > could use mailing lists but not google. So that's out of the question. > > We are not free consultants. And you are not entitled to shit. You are > probably being paid to work on that project. We are not. Your problem. Fix > it yourself. Or at least have the courtesy to google it. > While I can understand your frustration, I disagree with your position. It is too subjective and difficult to apply/adopt such a strong position and could too easily backfire, resulting in a perception of an elitist, unwelcoming and unfriendly community. Banning should be reserved for the most serious and abusive cases. Banning because someone appears to be acting entitled or lazy is hard to assess in a non-biased manner and likely has too much cultural variation to be applied consistently. Someone you feel is being entitled or lazy might be someone I feel is frustrated, may lack good communication and/or social skills or might simply be immature and in need of some guidance and education. My response may also differ depending on my own state of mind and mood at the time when I read the message. I've been on the postgres lists for some years now and to be honest, have not noticed this type of issue very often. There are occasionally rude and lazy individuals who may appear to be acting entitled, but they soon go away. In some respects, the list is self-moderating because people who do act poorly soon get ignored and their messages die out with no responses. The great benefit of lists like these is that you can just ignore anyone you think is rude, entitled or fails to put in the effort you believe is warranted before their question/issue needs attention. Many mail clients will even allow you to 'block' specific senders. I have done this once with someone from a different list. I don't know if they are still behaving badly as now I never see their messages. My advice would be to just delete and move on, a luxury you don't have when you are employed and paid to deal with such messages, which is one reason I don't like or have the temperament to fulfil the difficult service/support desk roles which are too often maligned and fail to get the recognition they deserve. Tim
Re: Do we need a way to moderate mailing lists?
Michael Nolan writes: > There's so much garbage in a Google search any more that they're becoming > nearly useless. Between 'sponsored' hits and ones that have little or no > relevance but throw in words to get included, I find as often as not that > IF Google finds what I'm looking for, it'll be several pages in. > There is certainly a bit of 'art' or 'black magic' involved when doing a google to find relevant information and the amount of noise in the signal has certainly gotten worse. I find putting the key terms early in your search string can help. However, when dealing with an unfamiliar topic, knowing what those key terms are can be challenging. This is one reason I rarely tell people to 'just google for the answer' or assume they haven't tried when the answer seems quite obvious and easily found for me. The change I've noticed over the last decade or so is the amount of completely wrong or misleading information that is easily found. I rarely use stack overflow sites these days because too often, the accepted or most popular answer is wrong or gets a result, but in a poor manner that is likely to introduce other issues. The one thing I wish people did was provide clear and concise meta data with the information they post. Often, I find it difficult to know, for example, how old the information is or which version of the software it applies to. When it comes to PG, I think we are very lucky. In general, I find the official documentation to be of the highest quality. Sometimes, I can be a little dense and a few more examples would be useful, but I understand how hard getting the balance between enough examples and concise information can be. It is often in this forum where I find some of the most useful information and ideas. I really appreciate those contributors who not only provide an answer to a question, but also include URLs to other sources which frequently contain more background or details. Pointers to such valuable resources from those more knowledgeable can save hours of googling and wading through ill informed and misguided advice. -- Tim Cross
Re: Do we need a way to moderate mailing lists?
Paul Förster writes: > Hi Adrian, > > I'm on a Mac w/ Big Sur (macOS 11.1) and use Apple Mail. I've been on Macs > since 2003. Apple Mail is simple to use and I love it for exactly that. But > Apple Mail has everything I expect a mail client to have, it does not allow a > couple of things which other mail clients might have. Some people hate it for > exactly that. I set Mail to always compose in plain text but there is no way > of manipulating incoming mails other than that automatic displaying remote > content (HTML links, etc.) can (and should) be turned off. > > So I sometimes resort to either hit cmd-opt-u to see the mail text raw > source, or better yet, just hit reply and then drop the reply after reading. > As I set composing to plain text, it will convert any quoted parts. > Sometimes, I just copy/paste the whole mail for reading over to TextMate, > which is also sub-optimal but obviously also gives me non-proportional font > reading. > > Still, this is somewhat cumbersome as I have to do that for each mail > individually. Thank god, this doesn't happen too often. Yet, it's still > annoying enough. > I've used a number of GUI mail clients, including Apple Mail. However, I still find the fastest, most feature-rich and powerful client is the text-based client mutt. My other favourite is mu4e (Emacs client). While Apple Mail has reasonable keyboard shortcuts, mutt and mu4e can be fully keyboard driven and both have great (but different) abilities for customisation and dealing with large amounts of mail. The thing I hate most (and there is a lot to hate) with Outlook is the dependence on using the mouse for many operations. Being able to preview, sort, move and delete messages and threads just using the keyboard makes mail much easier to deal with. Having a client which can do sophisticated sorting, flagging and searching messages/threads is essential and being able to easily automate where possible really helps. Highly recommend a mutt and IMAP combination. You're not locked into any particular mail folder format, can still access things via mobile devices and can process messages fast and efficiently. >> That is a learning curve thing. Many people don't know that copy and paste >> exists for terminals/GUI's/etc. Most people, once they are pointed in the >> right direction, will change that habit. That is why I would not advocate >> dropping non plain text attachments. Take this as a teaching moment and >> explain the reason why text is a benefit. > > I guess, they only change their behavior because copying/pasting some text is > easier to do than creating a windowshot with aligning the frame manually, > etc. But whatever the reason, thank god, some people are willing to learn > that if being told. > It is easy to forget the different experience levels and sophistication of users. I once had to help resolve a problem a developer was having with a database. I asked him to send me the exact error message. He moaned and said that was a real hassle. I couldn't understand why he found that so difficult to do. I decided to get him to show me his workflow. When the error occurred, he would take a screen shot of his window, send it to the printer, wait for the printer to send back a PDF and then send the issue with the PDF attached. He was amazed when I showed him all he needed to do was highlight the error message, copy it and paste it into the message. This guy was one of the senior developers on the team. I switched employers a few weeks later. -- Tim Cross
Re: Do we need a way to moderate mailing lists?
Paul Förster writes: > Hi raf, > >> On 17. Jan, 2021, at 02:59, raf wrote: >> >> I once wrote a program to do that very thing: >> >> http://raf.org/textmail/ >> https://github.com/raforg/textmail/ > > thanks very much for the nice offer but I mostly read my Mails on a Mac, > sometimes Windows, but never Linux. I have no mail access on Linux. At home I > use Macs and at work I (have to :-() use Windows as desktops. So textmail is > not an option for me. > There is nothing stopping you from using a text mail program, like mutt, on macOS. -- Tim Cross
Re: ransomware
Marc Millas writes: > Hi, > > I know its quite general. It is as I dont know what approaches may exist. > > Requirement is extremely simple: Is there anyway, from a running postgres > standpoint, to be aware that a ransomware is currently crypting your data ? > > answer can be as simple as: when postgres do crash. > > something else ? > > Marc MILLAS > Senior Architect > +33607850334 > www.mokadb.com > > > Ransomware tends to work at the disk level rather than the application level. There is too much work/effort required to target ransomware at the application level, given the amount of variation in applications and versions, for it to be profitable. This means any form of detection you may try to implement really needs to be at the disk level, not the application level. While it could be possible to add some sort of monitoring for encryption/modification of the underlying data files, by the time this occurs, it will likely be too late (and unless your monitoring is running on a different system, the binaries/scripts are likely also encrypted and won't run either). The best protection from ransomware is a reliable, regular and TESTED backup and restoration solution which runs frequently enough that any lost data is acceptable from a business continuity position and which keeps multiple backup versions in case your ransomware infection occurs some time before it is actually triggered i.e. in case your most recent backups are already infected. Backups should be stored in multiple locations. For large data sets, this can often mean having the ability to take fast filesystem snapshots as more traditional 'copy' approaches are often too slow to perform backups frequently enough to meet business continuity requirements. By far, the most common failure in backup solutions is failure to test the restoration component. I've seen way too many places where they thought they had adequate backups, only to find when they needed to perform a restoration that key data was missing. This can greatly increase the time it takes to perform a restoration and in extreme cases can mean restoration is not possible. Regular testing of restoration processes is critical to any reliable backup solution. As it is also a good idea to have some sort of testing/staging environment for testing code/configuration changes, new versions etc, it can make sense to use your backups as part of your staging/testing environment 'refresh' process. A regular refresh of your staging/testing environment from backups then provides you with assurances your backups are working and that your testing etc is being performed on systems with data most similar to your production systems. Tim
Re: Encryption of Data Specific to a Tenant in PostgreSQL database | General Idea
Jagmohan Kaintura writes: > HI All, > > For POstgreSQL database to store data for multiple tenants, the approach > decided was to have > Shared Database (Holding data for all tenants) > => Data would be segregated on basis of some additional column > (tennatid,different tenants having different tenantId) >=> Data would be accessed through Views on the basis of tenantId > value. > > This is the basic process of most of the customers who are trying to > implement multiple tenants in PostgreSQL, rather than choosing > separate databases for each tenant. > > Now we need to encrypt the data related to a tenantId, so that now one > knows this data belongs to which tenant even from Operations group. > Is there a method in POstgreSQL for encrypting data with different keys > with respect to different values in a single column. Moreover pg_crypto > will impose a single key on the column. > > Please share your thoughts in which direction i can start analysing this > area for encryption of data specific to a tenant. > The decision to have all tenants in a single database seems rather unusual to me. Isolating one tenant from adversely impacting another would seem complicated and I'm not sure how you would implement a clear security model. Your model has effectively bypassed all the provided PG facilities for isolation of data. Disaster recovery and business continuity planning under this model must be a nightmare! I doubt you can adopt a solution which is solely within the database. How would the database know which key to use for which rows of data? How would you select the data for your tenant views if all that data is encrypted with different keys? How would you manage these keys in a secure manner? With the model you have adopted, I would be looking at performing encryption/decryption at the client level. However, depending on your data types, this could be challenging. This is really a requirement which should have been factored into the initial architecture design. Anything you try to bolt on now is likely to be complex and have a significant performance impact, and that is assuming you can re-interpret the requirement to make the objective feasible. -- Tim Cross
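To make the key management problem concrete, here is a minimal sketch of what per-tenant, in-database encryption tends to look like with pgcrypto (the table, key and values below are made up purely for illustration). Note that the tenant's key has to be supplied with every statement, so anyone who can see the SQL, the logs or the key store can also get at the data:

CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE TABLE customer_data (
    tenant_id int NOT NULL,
    payload   bytea            -- output of pgp_sym_encrypt()
);

-- the application has to pass the tenant's key into the database
INSERT INTO customer_data (tenant_id, payload)
VALUES (42, pgp_sym_encrypt('sensitive value', 'tenant-42-secret'));

-- and every view/query over the data needs the same key available server side
SELECT pgp_sym_decrypt(payload, 'tenant-42-secret')
FROM customer_data
WHERE tenant_id = 42;

This is part of why I would push the encryption/decryption out to the client, where the keys can at least be kept away from the operations group.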
Re: Insert into on conflict, data size upto 3 billion records
Karthik K writes: > exactly, for now, what I did was, as the table is already partitioned, I > created 50 different connections and tried updating the target table by > directly querying from the source partition tables. Are there any other > techniques that I can use to speed this up? also when we use on conflict > statement for both insert and update does Postgres uses batching internally > (committing for every 1 records etc) or will it update all records at > once, in that case, does it create a version for each record and do swap > all at once? I'm wondering how atomicity is guaranteed, also if I have to > do batching other than selecting from individual partitions does doing it > batches of 1 records help? > I have had pretty good success with the following strategy. However, you will need to profile/test each assumption as things vary greatly depending on data and table structure. A bit of trial and error is usually required. 1. Use the \COPY command to upload the batch data into a temporary table or at least a table with logging turned off. 2. Run a stored procedure which first does updates for existing rows, then one which does inserts for non-existing rows into your final table. 3. If your table is partitioned, pre-process your batches into separate batches that are divided by the partition key, so instead of one big batch, multiple smaller batches. If this is not possible, break your upload up into multiple batches rather than one huge batch. 4. Optimise the update/insert statement to suit your data and table structure, dropping any unnecessary indexes and re-building them once finished (the whole upload). Note that this will need profiling as, depending on the index and index structure, dropping and re-creating can be overall slower than leaving the index in place. 5. Determine the best times to run analyze to update table stats. You probably want to do it after each update and insert run, but sometimes it may be overall faster to just do it after each 'job' (update + insert). 6. Don't forget to check the logs and watch for WAL writes being too frequent etc. Often things are tuned for 'normal' operation (outside bulk uploads) and are very poor for the bulk uploads. You need to make sure it is the right balance.
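As a rough sketch of steps 1 and 2 above (the table and column names are made up and you would tune the details to your own schema):

-- 1. load the batch quickly into an unlogged staging table
CREATE UNLOGGED TABLE staging_batch (LIKE target_tbl INCLUDING DEFAULTS);
\copy staging_batch from 'batch.csv' with (format csv)

-- 2. update existing rows first ...
UPDATE target_tbl t
SET val = s.val
FROM staging_batch s
WHERE t.id = s.id;

-- ... then insert only the rows which do not exist yet
INSERT INTO target_tbl (id, val)
SELECT s.id, s.val
FROM staging_batch s
WHERE NOT EXISTS (SELECT 1 FROM target_tbl t WHERE t.id = s.id);

ANALYZE target_tbl;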
Re: checkpointer and other server processes crashing
Joe Abbate writes: > Hello, > > We've been experiencing PG server process crashes about every other week > on a mostly read only website (except for a single insert/update on page > access). Typical log entries look like > > LOG: checkpointer process (PID 11200) was terminated by signal 9: Killed > LOG: terminating any other active server processes > > Other than the checkpointer, the server process that was terminated was > either doing a "BEGIN READ WRITE", a "COMMIT" or executing a specific > SELECT. > > The database is always recovered within a second and everything else > appears to resume normally. We're not certain about what triggers this, > but in several instances the web logs show an external bot issuing > multiple HEAD requests on what is logically a single page. The web > server logs show "broken pipe" and EOF errors, and PG logs sometimes > shows a number of "incomplete startup packet" messages before the > termination message. > > This started roughly when the site was migrated to Go, whose web > "processes" run as "goroutines", scheduled by Go's runtime (previously > the site used Python and Gunicorn to serve the pages, which probably > isolated the PG processes from a barrage of nearly simultaneous requests). > > As I understand it, the PG server processes doing a SELECT are spawned > as children of the Go process, so presumably if a "goroutine" dies, the > associated PG process would die too, but I'm not sure I grasp why that > would cause a recovery/restart. I also don't understand where the > checkpointer process fits in the picture (and what would cause it to die). > A signal 9 typically means something is explicitly killing processes. I would check your system logs in case something is killing processes due to running out of some resource (like memory). If it is a fairly recent Debian system, journalctl might be useful for checking. -- Tim Cross
Re: Syntax checking DO blocks and ALTER TABLE statements?
David G. Johnston writes: > On Tue, Feb 16, 2021 at 3:43 PM Ron wrote: > >> >> How does one go about syntax checking this? >> >> (There are 222 ALTER TABLE ADD FOREIGN KEY statements that I'm wrapping in >> similar DO blocks, and want to make sure the statements are clean.) >> >> > Begin a transaction, execute the DO, capture an error if there is one, > rollback the transaction. > As David points out, wrapping the whole thing in a transaction will at least guarantee it all succeeds or it is all rolled back. This can be frustrating if the statements are slow and there are a lot of them, as it can result in a very tedious do-run-fix cycle. Something which can help is using an editor with good syntax highlighting and parsing support. One interesting area I've not yet looked at is the development of LSP (Language Server Protocol) servers for SQL. I've used LSP for other languages with great success. The challenge with databases is that there is enough variation between different vendor implementations to make accurate parsing and validation tedious to implement, so most solutions only focus on ANSI compliance. Still, that can be very useful. See https://github.com/lighttiger2505/sqls for one example of an LSP server for SQL and https://microsoft.github.io/language-server-protocol/ for more background on LSP and what it can provide. Many editors, including VSCode, VI, Emacs, TextMate etc now have some support for LSP. -- Tim Cross
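In its simplest form, the check David suggests looks something like the following (the table names here are only placeholders):

BEGIN;

DO $$
BEGIN
    ALTER TABLE child_tbl
        ADD FOREIGN KEY (parent_id) REFERENCES parent_tbl (id);
END;
$$;

-- we only wanted to know whether it runs cleanly, so throw the work away
ROLLBACK;

Run the file with psql -f and any error will be reported, while a clean run leaves the database untouched.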
Re: Syntax checking DO blocks and ALTER TABLE statements?
David G. Johnston writes: > On Tue, Feb 16, 2021 at 4:28 PM Tim Cross wrote: > >> >> David G. Johnston writes: >> >> > On Tue, Feb 16, 2021 at 3:43 PM Ron wrote: >> > >> >> >> >> How does one go about syntax checking this? >> >> >> >> (There are 222 ALTER TABLE ADD FOREIGN KEY statements that I'm wrapping >> in >> >> similar DO blocks, and want to make sure the statements are clean.) >> >> >> >> >> > Begin a transaction, execute the DO, capture an error if there is one, >> > rollback the transaction. >> > >> >> As David points out, wrapping the whole thing in a transaction will at >> least guarantee it all succeeds or it is all rolled back. This can be >> frustrating if the statements are slow and there are a lot of them as it >> can result in a very tedious do-run-fix cycle. >> >> > I do presume that someone wanting to test their code in this manner would > be doing so in a test environment and an empty database. Which makes the > execution time very small. > True. However, it never ceases to amaze me how many places don't have such environments. Far too often, my first task when commencing a new engagement is to sort out environments and procedures to manage change. > I personally would also solve the "lot of them" problem by using dynamic > SQL, so one pretty much only has to test the code generator instead of all > the actual executions - which can simply be confirmed fairly quickly once > on a test database without the need for transactions. > Given the number, I think I would do the same. A good example of why being 'lazy' can be a virtue. Faster and easier to write a procedure to generate dynamic SQL than to write out all those alter statements manually, or even to write it using a scripting language and ODBC if there is sufficient variation in the statements to make writing it in plpgsql 'messy'. -- Tim Cross
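For what it's worth, a minimal sketch of the dynamic SQL approach; the control table and its columns are entirely hypothetical and would be populated from wherever the constraint definitions live:

CREATE TEMP TABLE fk_defs (
    child_table     text,
    constraint_name text,
    child_cols      text,
    parent_table    text,
    parent_cols     text
);

-- populate fk_defs, then generate and execute the statements
DO $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT * FROM fk_defs LOOP
        EXECUTE format(
            'ALTER TABLE %I ADD CONSTRAINT %I FOREIGN KEY (%s) REFERENCES %I (%s)',
            r.child_table, r.constraint_name, r.child_cols,
            r.parent_table, r.parent_cols);
    END LOOP;
END;
$$;

Only the generator then needs testing; the 222 individual statements become mechanical.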
Re: Syntax checking DO blocks and ALTER TABLE statements?
Ron writes: > On 2/16/21 5:44 PM, Tim Cross wrote: >> Given the number, I think I would do the same. A good example of why >> being 'lazy' can be a virtue. Faster and easier to write a procedure to >> generate dynamic SQL than write out all those alter statements manually >> or even write it using a scripting language and ODBC if there is >> sufficient variation in the statements to make writing it in plsql >> 'messy'. > > In my case, the statements are generated by Ora2Pg, and the DO blocks are > generated by a bash script I wrote. Input data can be messy, so want to > verify things before running. > > Sure, vim is great at highlighting some problems, but certainly not all. > > What I'm really looking for the PostgreSQL version of SQL Server's Ctrl-F5: > it runs just the parser and then stops, Sounds like exactly what LSP aims to provide. Don't know if the LSP SQL servers available are mature enough yet, but that is definitely the objective. Real benefit is that it is editor agnostic. Once your editor has LSP support, all you need to do is configure the server details and you get parsing, completion, re-factoring, definition lookup etc. -- Tim Cross
Re: Script checking to see what database it's connected to
Rob Sargent writes: >>> >>> Take it up a notch? Write a script which takes the dbname and the >>> script name: >>> >>> /pcode/ >>> >>> #!/bin/bash -e >>> if [[ $# -ne 2 ]]; then echo "missing arg(s)"; exit 2; fi >>> dbn=$1; shift; >>> sql=$1; shift; >>> psql --dbname $dbn --file $sql >>> >>> /pcode/ >> >> I thought of that, yet so earnestly want avoid Yet Another Tiny Script. >> > Isn't it a toss-up with putting the check in every sql script? > Or make it /really/ fancy: use proper arg parsing; check for existence > of the sql script; add a usage function; split stdout/stderr... No end > of fun. That would be my approach. A general purpose 'launcher' script that does argument checking, logging and reporting. Stick it in your bin directory and then call that instead of psql directly. Now all your SQL scripts are just DDL/DML statements. The nice thing is you can do it in whatever scripting language you're most comfortable with - bash, perl, ruby, python, whatever - and it is available for whatever project you're working on. -- Tim Cross
Re: Oracle vs PG
Ravi Krishna writes: >> Again, pretty much content-free. For all you know some application was >> creating savepoints, needlessly: > >> https://www.postgresql.org/docs/10/static/sql-savepoint.html > > I have hardly used savepoints in any application, but if I understand it > correctly, isn't it something which is typically used > in a persistent connection. I wonder how it is applicable in a web based > stateless application like Amazon.com, unless > even web based application have database level state. No, savepoints and persistent connections are not necessarily related. Savepoints are really just a way of managing rollback segments. For example, if you were doing a large number of inserts/updates, things can become slow if the rollback segment grows really large. One way around this is to set savepoints, which will allow you to commit more frequently and prevent the rollback size from growing too large (there are other benefits as well, such as allowing other transactions to see partial changes sooner rather than not seeing any change until after a long running insert/update has completed etc). I think that article is really just about headline click bait and lacks any real details. I'm not even convinced that comparison of Oracle and PG really makes sense anyway - both databases have their pros and cons. IMO Oracle is a very good database (though most of the 'add ons' are less beneficial). However, it is extremely expensive, both to license and to administer. For certain applications, it would be my first choice (assuming available budget). However, I prefer PG for the majority of what I need, partially due to the cost, but mainly because it is rock solid and much, much easier to administer and sufficient for what I need. As usual, it is more about requirements than brand and choosing the right tool for the right job. Tim -- Tim Cross
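As a very small sketch of the mechanics (the table and values are made up):

BEGIN;
INSERT INTO orders VALUES (1, 'widget');

SAVEPOINT batch_1;
INSERT INTO orders VALUES (2, 'gadget');

-- if something goes wrong after the savepoint, only that work is undone
ROLLBACK TO SAVEPOINT batch_1;

COMMIT;  -- the first insert is still committed

The whole thing happens inside one ordinary transaction, so there is nothing here that requires a long-lived or stateful application connection beyond that single transaction.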
Re: Add columns to table; insert values based on row
Rich Shepard writes: > On Thu, 1 Nov 2018, Adrian Klaver wrote: > >>> alter table stations add column start_date date; >>> alter table stations add column end_date date; >>> alter table stations add column howmany integer; >>> alter table stations add column bin_col char(8); >>> >>> insert into stations (start_date, end_date, howmany, bin_col) values ( ) >>> select site_nbr from stations >>> where site_nbr = ' '; > >> Are you trying to add new data to existing records? > > Adrian, > >I am adding four columns to an existing table that already contains four > columns. > >> If so where is the new data coming from? > >I have a text file and will fill each insert statement by hand if there's > not a more efficient way to do this. > > Regards, > > Rich Like others, I'm not clear on exactly what you're after here, but did want to point out: 1. If you're doing it by hand, you don't have to do a separate 'full' insert statement for every row i.e. insert into blah (x, y, z) values (...), (...), (...); is valid syntax. You don't need to do a full "insert into blah () values (...)" for each insert. 2. If it really is an insert you want to do and you already have the data in a file e.g. CSV or similar, then you can use the \copy command to process the file, which is very fast. 3. Is it really insert or update you need? -- Tim Cross
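For example, using the stations table from earlier in the thread (the values and file name below are just placeholders):

-- one INSERT, many rows
INSERT INTO stations (start_date, end_date, howmany, bin_col)
VALUES ('2018-01-01', '2018-06-30', 10, 'bin_a'),
       ('2018-07-01', '2018-12-31', 12, 'bin_b');

-- or load straight from a CSV file using psql
\copy stations (start_date, end_date, howmany, bin_col) from 'stations.csv' with (format csv, header)

If it is existing rows you need to populate, then an UPDATE (possibly driven from a staging table loaded with \copy) is the better fit.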
Re: Manage PostgreSQL Database for GITLAB Application?
een the environments? >>> >>> >>> Thanks again for your help. >>> >>> Regards, >>> >>> Karin >>> >>> >>> *From:* Stephen Frost >>> *Sent:* Monday, January 21, 2019 1:53:00 PM >>> *To:* Hilbert, Karin >>> *Cc:* pgsql-general@lists.postgresql.org >>> *Subject:* Re: Manage PostgreSQL Database for GITLAB Application? >>> Greetings, >>> >>> * Hilbert, Karin (i...@psu.edu) wrote: >>>> Does anyone manage a PostgreSQL database for a GITLAB application? >>> >>> Yes. >>> >>>> I have PostgreSQL v9.6 installed on my server & we are trying to migrate a >>>> GITLAB database there. >>>> >>>> The developer says that we need to use the public schema instead of the >>>> schema of the same name as the application user. >>> >>> Not sure this is really required but it also shouldn't hurt anything >>> really- I'd definitely have the database be dedicated to gitlab. >>> >>>> The schema that he provided me to restore also is revoking all privileges >>>> from the database owner & instead granting all privileges to PUBLIC. >>> >>> That's terrible. >>> >>>> Has anyone else run across this? I always thought that granting >>>> privileges to PUBLIC is a bad security thing to do? >>> >>> Yes, that's bad from a security perspective and shouldn't be necessary. >>> GRANT rights to the user(s) the application logs into, don't just grant >>> them to PUBLIC- that would allow anyone on the system to have access. >>> >>>> If anyone can offer any thoughts regarding this, it would be greatly >>>> appreciated. >>> >>> Is this developer the only one who is going to be using this gitlab >>> instance..? Sounds like maybe they want direct database access which >>> would only make sense if they're the one running it and should have full >>> access- but even then, I'd create a role and grant access to that role >>> and then grant them that role, if that's the requirement. GRANT'ing >>> things to public isn't a good idea if you're at all concerned about >>> security. >>> >>> Thanks! >>> >>> Stephen >> >> >> -- >> Adrian Klaver >> adrian.kla...@aklaver.com Joining discussion late, so apologise in advance if I repeat information already provided. Just wanted to mention that unfortunately, there are some libraries out there which provide a layer of abstraction for working with databases and postgres in particular, but which do not handle database schemas at all well. I'm not defending or criticising such libraries, but want to point out that sometimes, a developer, who is required to use specific libraries or modules, may not have the freedom to fully use the power of database schemas and that sometimes, limitations/restrictions are not necessarily at the DB level. As DBA's we need to recognise such restrictions exist, even if they seem misguided. As an example, there have been issues with at least one of the commonly used db interface libs/modules used by the Javascript SAILS framework (which I believe was addressed in later versions) that made working with different schemas very difficult. I would agree that the description provided regarding changes to permissions does raise concerns and hints of a developer under pressure to make something work with insufficient understanding of the underlying DB security and access control model. It is likely the developer needs guidance in this area. I also would argue that the PUBLIC schema is not in itself a security risk. The problem is with inappropriate use of that schema. It depends heavily on how the database is used. 
A database used for a single application has a completely different security and risk profile from a database used by multiple users for different applications. Arbitrary rules such as 'you won't use PUBLIC' are almost always wrong and often just make both developer and DBA lives more complicated and harder to maintain. Complexity is where things go wrong and where security tends to break down. Rather than requiring the developer to use a specific schema, I would: 1. Ask them why they believe they have to use the PUBLIC schema. 2. If the need to use the PUBLIC schema is confirmed, then work with the developer to understand what the access requirements are and develop an appropriate model. 3. If there is no dependency on using the PUBLIC schema, work with the developer to assist them to resolve their access issues. Depending on the size of the organisation and complexity of the environment, the choice of libraries and modules is not always straightforward. It may not be easy to switch to another library/module with better support for schemas etc or even to upgrade to a new version. Often, such changes will need to be managed in stages and over time. Work with the developers as complex environments will frequently require a united voice in order to get changes approved or prioritised. Tim -- Tim Cross
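If it does come down to tightening things up, the usual pattern is a dedicated login role with explicit grants rather than anything granted to PUBLIC. A rough sketch, where the role, database and password are of course placeholders:

CREATE ROLE gitlab_app LOGIN PASSWORD 'change_me';
CREATE DATABASE gitlab OWNER gitlab_app;

-- connected to the gitlab database
REVOKE ALL ON SCHEMA public FROM PUBLIC;
GRANT USAGE, CREATE ON SCHEMA public TO gitlab_app;

That keeps the application working in the public schema, which is what the developer asked for, without opening the objects up to every role in the cluster.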
Re: Revoke SQL doesn't take effect
On Wed, 30 Jan 2019 at 07:49, Jason W wrote: > I have two postgresql accounts created by someone else who I do not know > (So I do not know setting for those accounts and tables created). One is > read only account e.g. read_only_user (This can perform select operations > only). The other is admin account e.g. admin_user (This can perform grant, > revoke, CRUD, and so on operations). > > The read only account can query (select sql) a table (suppose it's called > table1) under a specific schema (suppose it's schema1). For instance select > * from schema1.table1. Now I received a request to revoke select for that > read only account on table1. So I execute > > revoke select on schema1.table1 from read_only_user > > psql returns REVOKE string (or something similar showing the sql execution > was successful) on console. However, when check with read_only_user > account. I am still able to query table1. Searching the internet, [1] looks > like the closest to my problem. But I do not find solution in that thread. > > So my question: > What steps do I need to perform in order to exactly revoke select from > read only user account for a particular table? So the read only user > account wont' be able query that specific table with select permission > revoke (psql should returns info like permission denied). > > Thanks > > [1]. > https://www.postgresql.org/message-id/00ce01cbef04%24910606b0%24b3121410%24%40yahoo.com > > It is likely that permissions for the user are being granted via a role rather than granted directly to the user (think of a role as a user account which does not have the login permission). First thing to check would be to look at what roles have been granted to the read_only user and if one of those grants select on schema1.table1, revoke/remove it from the role. There may be other complications, such as roles which do a grant select on all tables in a schema, so getting the order of things correct is important. First step, understanding how permissions are granted, then you should be able to revoke them effectively. Tim -- regards, Tim -- Tim Cross
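If it helps, this is the sort of thing to look at first (adjust the names to suit; the role name in the final statement is just a placeholder for whatever the earlier queries show):

-- which roles is the read only account a member of?
SELECT r.rolname AS granted_role
FROM pg_auth_members m
JOIN pg_roles r ON r.oid = m.roleid
JOIN pg_roles u ON u.oid = m.member
WHERE u.rolname = 'read_only_user';

-- in psql, show the table's access privileges (grantee=privileges/grantor)
\dp schema1.table1

-- if SELECT turns out to be granted via one of those roles, revoke it there
REVOKE SELECT ON schema1.table1 FROM some_granted_role;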
Re: Where to store Blobs?
I don't think there is a suitable 'one size fits all' answer to this question. A lot will depend on how you intend to use the blobs and what sort of hardware architecture, especially storage systems, you have. At first glance, sticking everything in the DB seems like an easy choice. However, that can result in very large databases, which in turn can lead to issues with respect to backup, replication etc. If all you're after is storage, then sometimes you're better off using the file system for the blobs and keeping the metadata in the db. It can potentially be faster and easier to serve up blobs from the file system compared to the db if that is the main use case, but if the blobs are more dynamic or you use collections of blobs to build a master blob etc, the db has some advantages. If you really need database-like functionality, given the relative cheapness of storage and the wealth of options available, storing the blobs in the database can have advantages. However, it will be important to select the most appropriate datatype. What some people think of as a 'blob' is just an array of bytes to many DBs and as usual, you need to make the decision as to what is the best storage representation for your requirements, keeping in mind that the more general 'blob' like storage type you choose often represents a loss in functionality but an increase in flexibility with respect to what can be inserted, compared to more precise data types, which will be more restrictive about what can be inserted but offer more functionality regarding what you can do with it (at the db level). Tim Thomas Güttler writes: > Some days ago I asked "Where **not** to use PostgreSQL?" on this mailing list. > > Now I realized: Nobody talked about Blobs. > > I guess most people do not store Blobs in PostgresSQL. > > Where do you store Blobs? > > (In my case Blobs are PDF/image files with size up to 20 MByte. > I do not talk about very big blobs which are several hundret MByte) -- Tim Cross
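As a small sketch of the two approaches (all names here are made up):

CREATE TABLE document (
    id           bigserial PRIMARY KEY,
    filename     text NOT NULL,
    mime_type    text NOT NULL,
    uploaded_at  timestamptz NOT NULL DEFAULT now(),
    content      bytea,   -- option 1: the blob lives in the database
    content_path text     -- option 2: the blob lives on the file system, the db keeps metadata and a path
);

For PDF/image files up to a few tens of MBytes, bytea (which is TOASTed out of line automatically) is usually sufficient; the large object facility mainly becomes interesting if you need streaming or random access within a single blob.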
Re: Forks of pgadmin3?
Jeff Janes writes: > On Fri, Mar 22, 2019 at 8:04 AM Steve Atkins wrote: > >> >> >> > On Mar 22, 2019, at 10:56 AM, Christian Henz >> wrote: >> > >> >> There's the BigSQL fork, which had at least some minimal support >> for 10. I've no idea whether it's had / needs anything for 11 > > > I just installed BigSQL's v11 of the database to get the pgAdmin3 that > comes with it (I couldn't get the Windows installer to install just > pgAdmin, I had to take the entire server installation along with it) . > Even though it comes with v11, when you start it says it only supports up > to v10, and then gives a series of warnings about catalogs and system admin > functions not being as expected. Once you are past the warnings, it does > work at least on the surface, but I have to think some features aren't > going to work. > > Cheers, > > Jeff I think you have little choice other than to give up on pgAdmin3 and any of the forks. The old pgAdmin3 had become difficult to maintain and I doubt any fork will be able to avoid this. I completely understand your frustration with pgAdmin4, though I have observed significant improvement over the last 12 months. I'm in the position where I have been prevented from upgrading our databases because nobody on our team likes pgAdmin4 and they don't want to give up on pgAdmin3. The proverbial tail wagging the dog if you ask me. I have looked at some alternatives and have found that: 1. DBeaver https://dbeaver.io/download/ is not too bad and is free. 2. DataGrip from JetBrains is pretty good, but has a paid license. 3. Most of our developers use Visual Studio Code as their editor and it has some pretty reasonable extensions which make doing basic database queries and display of results pretty reasonable and provide OK code completion support. DataGrip and Visual Studio Code also have git integration, which is good if you're keen on DDL stuff being tracked and versioned in git. Based on the improvements I've seen in pgAdmin4, I suspect it will get to a usable and stable state eventually and will likely be a pretty good replacement for pgAdmin3. However, currently, I find it still a little too unstable. Personally, I'm pleased I spent the time to get my Emacs and psql integration working to the point that I do 90% of what I need in psql. -- Tim Cross
Re: AW: Forks of pgadmin3?
kpi6...@gmail.com writes: > Thanks for the link but we're very reluctant to use Java based programs. > The main reason is that we need to do some works on servers whose admins > simply do not allow to install Java. > The screenshots look very promises, however. > > Regards > Klaus > >> -Original Message- >> From: Thomas Kellerer >> Sent: Monday, 25 March 2019 12:06 >> To: pgsql-general@lists.postgresql.org >> Subject: Re: Forks of pgadmin3? >> >> kpi6...@gmail.com wrote on 22.03.2019 at 17:25: >> > 95% of my time I use pgadminIII just to type select and update >> > statements and review the output rows. >> > >> > I know that I can do this in psql but it’s not handy with many >> > columns. >> >> An alternative you might want to try is SQL Workbench/J: https://www.sql- >> workbench.eu/ >> >> Full disclosure: I am the author of that tool. >> >> It's a cross DBMS tool, but my primary focus is Postgres. >> >> It focuses on running SQL queries rather then being a DBA tool. >> >> Regards >> Thomas You may not need to install anything on the server. GUI based tools like dbeaver (also Java), and I suspect this one, just run on your desktop/laptop. You connect to the remote DB as normal i.e. port 5432. If your network environment is locked down to only allow connections to port 5432 from specific servers and localhost (i.e. the server), then SSH can work. You use an SSH tunnel to tunnel traffic for port 5432 to a local port and then configure the connection as a local connection using that port, e.g. in one terminal: ssh -L 3330:localhost:5432 db.server then, in the local software tool, configure the connection with host localhost and port 3330. It may also be necessary to set up proxy connections if the host you're allowed to connect to is not the db host, but many tools support this as well as it is a common restriction. You can also use ssh here; it is a little more complicated, but the same principles apply. BTW, the blanket restriction on the Java runtime is IMO misguided. There is nothing inherently more risky about the Java runtime than anything else (python, perl, ruby, node, etc). In fact, the JVM is a pretty decent bit of kit. The Java language is horrible to work with, but that is a different issue. There are some bloody awful Java applications out there, but this really means assess on a per-app basis, not a blanket ban on all of them. There are insecure and poorly written apps in every language. Tim -- Tim Cross
Re: PG version recommendation
I would find out if the IT team who will maintain the system are running a specific Linux distribution, such as RHEL, and just go with the PG version that is on that distribution. Large corps rarely have sufficient IT resources. Unless you specifically need a particular PG version (which does not seem to be the case based on your info), you are better off sticking with the version provided by whatever distro they use. This will ensure reasonable updates and patches. In corp environments, where IT resources are thin on the ground, any custom install often results in poor patching and update cycles because it falls outside 'standard procedures'. With respect to hardware specifications, it really depends a lot on what the infrastructure is. Typically, you would be better off specifying the performance and storage (size) you require and leaving it to the IT team to work out how to best satisfy those requirements e.g. support x concurrent connections, y Tb/Gb of storage, backup requirements i.e. how frequent, how many versions and retention requirements. Include details of any additional PG packages you may need/want and how/where the database will need to be accessed from. As you indicate the host will be a VM, it should be relatively easy to scale up/down cpus or memory as required, unless you have special requirements (very complex queries, very large data sets, complex data models involving GIS, XML, etc that may exceed resources available in their VM infrastructure). From your description, your database sounds fairly standard with no unusual requirements. The number of concurrent users is low and it sounds like it may be a new application where you probably don't yet know where performance or resource bottlenecks will be. A standard Linux server with 16+Gb memory and a couple of Gb for storage running PG 9.6 or higher is likely to be a reasonable starting point. It would also be a good idea to speak to the IT team and see if they have any procedures/policies for requesting resources. Ask them what info they need to know and then gather that. It is unlikely to help if you specify hardware requirements they cannot easily support, especially if those requirements are really just arbitrary and based on external recommendations from people who don't know what the infrastructure is. Nothing frustrates IT teams more than being required to provide over-specified systems which consume valuable resources that are never used or demand custom configurations which are unnecessary and just add to their maintenance overheads. Tim David Gauthier writes: > Hi: > > I'm going to be requesting a PG instance supported by an IT team in a large > corp. They will be creating the server as a VM. We will be loading the DB > using scripts (perl/dbi) on linux, possibly using bulk loading techniques > if that's required. Queries will come from both linux and the web, but > typically the number of concurrent users will be on the order of 10 reads, > maybe a couple writers. < 1T total disk, no partitioning. I will be > requesting PITR. > > I need to pick a PG version in my request. I want something that will be > stable and reliable while, of course, being able to perform well. What > would be a good choice for PG version ? > > Also, since the server will be a VM, are there any special > recommendations/suggestions might I forward in the request (install > options, tuning options, other) ? > > Thanks ! -- Tim Cross