Re: Multitenant architecture
Hi guys - apologies for jumping into the middle of this, but it touches on precisely one of the topics I'll have to deal with soon.

We're standing up a pharmacovigilance / drug safety platform managing ICSRs (HL7 "individual case safety reports" that must be FDA/ICH compliant) and will be dealing with multi-level privacy and compliance (HIPAA, GDPR, PHI, CCPA, etc.) at levels such as Patient, Doctor, and Pharmacist. This first part deals with adverse drug events in the "post-marketing" realm, i.e., drugs already approved and in the marketplace. Our model will also expand to support pre-marketing work, i.e., clinical trials and the like.

It is far easier to have one coherent data model and tackle compliance as a multi-tenant / multi-level-secure database issue. We just have to make sure we do this so it's rock solid and also facilitates the business / use cases that need to be executed. I'd love to compare notes with anyone else doing something along those lines.

@Vasu - it sounds like you're smack in the middle of this. Let me know if you'd be open to connecting and comparing notes on this topic. (Anyone else as well!)

Thanks,

- Jon

On Sat, Jun 6, 2020 at 1:21 PM Michel Pelletier wrote:
>
> On Sat, Jun 6, 2020 at 3:14 AM Vasu Madhineni wrote:
>
>> Hi Rob,
>>
>> Our environment is medical clinical data, so each clinic is a tenant.
>> Approximately 500+ tenants with 6TB of data.
>>
>> Thank you in advance.
>>
> There's a good article on the AWS blog on multi-tenancy with Postgres:
>
> https://aws.amazon.com/blogs/database/multi-tenant-data-isolation-with-postgresql-row-level-security/
>
> This is similar to Laurenz's second suggestion.
>
> -Michel
>
>> Regards,
>> Vasu Madhineni
>>
>> On Fri, Jun 5, 2020 at 6:09 PM Rob Sargent wrote:
>>
>>> On Jun 5, 2020, at 2:54 AM, Vasu Madhineni wrote:
>>>
>>> If the data size is more than 6TB, which approach is better?
>>>
>>> On Fri, Jun 5, 2020 at 2:57 PM Laurenz Albe wrote:
>>> On Thu, 2020-06-04 at 23:52 +0800, Vasu Madhineni wrote:
>>> > We are planning a POC on multitenant architecture in Postgres. Could you please
>>> > help us with steps for multitenant using a schema for each application model.
>>>
>>> For a few tenants, you can keep identical tables in several schemas and set
>>> "search_path" to select a tenant.
>>>
>>> With many tenants, you are better off with one table that holds the data for
>>> all clients. You can use Row Level Security to have each tenant see only his
>>> or her data, and it might be a good idea to use list partitioning on the tenant ID.
>>>
>>> Yours,
>>> Laurenz Albe
>>> --
>>> Cybertec | https://www.cybertec-postgresql.com
>>>
>>> The question is: how many separate data owners?
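To make Laurenz's second suggestion concrete, here is a minimal sketch of the shared-table approach (one table, Row Level Security, list partitioning on the tenant ID). The table, column, and setting names are hypothetical illustrations, not taken from the thread, and the policy assumes the application connects as a non-superuser role that does not own the table (table owners bypass RLS unless it is forced):

[code]
-- Hypothetical shared table, list-partitioned on the tenant ID.
CREATE TABLE case_report (
    tenant_id  int    NOT NULL,
    report_id  bigint NOT NULL,
    payload    jsonb,
    PRIMARY KEY (tenant_id, report_id)
) PARTITION BY LIST (tenant_id);

CREATE TABLE case_report_t42 PARTITION OF case_report FOR VALUES IN (42);

-- Row Level Security: each session sees only its own tenant's rows
-- when it queries through the parent table.
ALTER TABLE case_report ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON case_report
    USING (tenant_id = current_setting('app.tenant_id')::int);

-- The application sets the tenant for its connection before querying:
-- SET app.tenant_id = '42';

-- (The few-tenant alternative Laurenz mentions is identical tables in
--  per-tenant schemas, selected with SET search_path.)
[/code]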
Re: How to write such a query
Are you looking to arbitrarily update the field in the fifth row, or can the row that needs to be updated be isolated by some add'l attribute? What's the use case?

- Jon

<https://www.linkedin.com/in/jonstrong/> <https://www.jonathanrstrong.com>

*Jonathan Strong*
CIO / CTO / Consultant
*P:* 609-532-1715
*E:* jonathanrstr...@gmail.com
*Quora Top Writer <https://www.quora.com/profile/Jonathan-R-Strong>*

On Fri, Sep 18, 2020 at 1:27 PM Igor Korot wrote:
> Hi,
> Consider the following:
>
> [code]
> CREATE TABLE X(id INTEGER PRIMARY KEY, field1 char(50), field2 int);
> CREATE TABLE Y(id INTEGER PRIMARY KEY, field1 char, field2 double(10, 2));
> SELECT X.field1, Y.field2 from X, Y WHERE X.id = Y.id;
> [/code]
>
> Assuming that the SELECT returns 10 rows, I want to update X.field1
> in row 5.
>
> How do I write a WHERE clause in the
>
> [code]
> UPDATE X.field1 SET X.field1 = '' WHERE
> [/code]
>
> Thank you.
Re: How to write such a query
@Adrian - Using a unique key value or otherwise isolating a specific record via selection against values in its attributes is certainly preferable to choosing a row to update via its position in a result set, unless the use case actually makes use of that position info as a meaningful descriptor of the data in some fashion. - Jon <https://www.linkedin.com/in/jonstrong/> <https://www.jonathanrstrong.com> *Jonathan Strong* CIO / CTO / Consultant *P:* 609-532-1715 *E:* jonathanrstr...@gmail.com *Quora Top Writer <https://www.quora.com/profile/Jonathan-R-Strong>* On Fri, Sep 18, 2020 at 1:58 PM Adrian Klaver wrote: > On 9/18/20 10:46 AM, Igor Korot wrote: > > Hi, Johnathan, > > > > On Fri, Sep 18, 2020 at 12:34 PM Jonathan Strong > > mailto:jonathanrstr...@gmail.com>> wrote: > > > > Are you looking to arbitrarily update the field in the fifth row, or > > can the row that needs to be updated be isolated by some add'l > > attribute? What's the use case? > > > > > > What do you mean? > > I don't have any other attributes. > > > > I want to understand how to emulate MS Access behavior, where you have a > > form > > with the arbitrary query, then you can go to any record in that form and > > update any field. > > > > Is it even possible from the "pure SQL" POV? Or Access is doing some > > VBA/DB/4GL magic? > > > > When you are updating a record in a form the framework(Access in your > case) is using some identifier from that record to UPDATE that > particular record in the database. From when I used Access, I seem to > remember it would not give you INSERT/UPDATE capability on a form unless > you had specified some unique key for the records. So you need to find > what the key(generally a PRIMARY KEY) is and use that to do the UPDATE. > > > Thank you. > > > > > > -- > Adrian Klaver > adrian.kla...@aklaver.com >
Re: How to write such a query
Yes...absolutely. Short of using ORDER BY, the order of a multi-row result set can be arbitrary, with "row position" having no inherent meaning. This gets back to understanding set theory, the relational model, and the various types of keys (primary, candidate, foreign, etc.). It is truly crucial to understand the model in order to write correctly functioning and reliable code.

- Jon

<https://www.linkedin.com/in/jonstrong/> <https://www.jonathanrstrong.com>

*Jonathan Strong*
CIO / CTO / Consultant
*P:* 609-532-1715
*E:* jonathanrstr...@gmail.com
*Quora Top Writer <https://www.quora.com/profile/Jonathan-R-Strong>*

On Fri, Sep 18, 2020 at 2:17 PM Thomas Kellerer wrote:
> Igor Korot wrote on 18.09.2020 at 19:29:
> > [code]
> > CREATE TABLE X(id INTEGER PRIMARY KEY, field1 char(50), field2 int);
> > CREATE TABLE Y(id INTEGER PRIMARY KEY, field1 char, field2 double(10, 2));
> > SELECT X.field1, Y.field2 from X, Y WHERE X.id = Y.id;
> > [/code]
> >
> > Assuming that the SELECT return 10 rows, I want to update X.field1
> > in row 5.
>
> There is no such thing as "row 5" in a relational database.
>
> Rows in a table have no inherent sort order. The only way you can identify
> a row is by the value of its primary (or unique) key. Not by "position".
>
> The only way you can identify "row 5" is if you use an ORDER BY to
> define a sort order on the result - but that position is only valid
> for that _result_; it has no meaning for the actual table data.
>
> Which brings us back to the fact that the only way to (uniquely) identify
> a row in a table is to specify its primary key value in the WHERE clause.
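To make Thomas's point concrete, here is a hedged sketch using the OP's tables: "row 5" only exists relative to some ordering, so you first pin down which key occupies that position and then update by key. The ORDER BY choice is an assumption, since the OP's query has none:

[code]
-- "Row 5" is only defined for a specific ordering; here we assume ORDER BY X.id.
UPDATE x
SET    field1 = ''
WHERE  id = (
    SELECT x.id
    FROM   x
    JOIN   y ON y.id = x.id
    ORDER  BY x.id
    OFFSET 4      -- skip the first four rows of that ordering
    LIMIT  1
);
[/code]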
Re: How to write such a query
Yes - 100% - Jon <https://www.linkedin.com/in/jonstrong/> <https://www.jonathanrstrong.com> *Jonathan Strong* CIO / CTO / Consultant *P:* 609-532-1715 *E:* jonathanrstr...@gmail.com *Quora Top Writer <https://www.quora.com/profile/Jonathan-R-Strong>* On Fri, Sep 18, 2020 at 2:22 PM Adrian Klaver wrote: > On 9/18/20 11:13 AM, Jonathan Strong wrote: > > @Adrian - > > > > Using a unique key value or otherwise isolating a specific record via > > selection against values in its attributes is certainly preferable to > > choosing a row to update via its position in a result set, unless the > > use case actually makes use of that position info as a meaningful > > descriptor of the data in some fashion. > > > The bigger issue is deciding what attribute of the selected row is be > used to do the UPDATE. Unless it is the PRIMARY KEY(or other UNIQUE > key(s)) then you very likely are going to UPDATE more then you bargained > for. > > > > > - Jon > > > > <https://www.linkedin.com/in/jonstrong/> < > https://www.jonathanrstrong.com> > > > > > > > > *Jonathan Strong* > > > > > -- > Adrian Klaver > adrian.kla...@aklaver.com >
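A small sketch of the pitfall Adrian describes, using the OP's table X with made-up values: a WHERE clause on a non-unique column can touch many rows, while the primary key pins down exactly one.

[code]
-- Hypothetical data in the OP's table X.
INSERT INTO x (id, field1, field2) VALUES
    (1, 'a', 42), (2, 'b', 42), (3, 'c', 7);

UPDATE x SET field1 = '' WHERE field2 = 42;  -- UPDATE 2: non-unique predicate hits two rows
UPDATE x SET field1 = '' WHERE id = 3;       -- UPDATE 1: primary key hits exactly one row
[/code]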
Re: How to write such a query
A pretty good read / intro to the concept of keys in the relational model: https://www.red-gate.com/simple-talk/sql/learn-sql-server/primary-key-primer-for-sql-server/ - Jon <https://www.linkedin.com/in/jonstrong/> <https://www.jonathanrstrong.com> *Jonathan Strong* CIO / CTO / Consultant *P:* 609-532-1715 *E:* jonathanrstr...@gmail.com *Quora Top Writer <https://www.quora.com/profile/Jonathan-R-Strong>* On Fri, Sep 18, 2020 at 3:08 PM Adrian Klaver wrote: > On 9/18/20 11:49 AM, Igor Korot wrote: > > Hi, Adrian, > > > > On Fri, Sep 18, 2020 at 12:58 PM Adrian Klaver > > wrote: > >> > >> On 9/18/20 10:46 AM, Igor Korot wrote: > >>> Hi, Johnathan, > >>> > >>> On Fri, Sep 18, 2020 at 12:34 PM Jonathan Strong > >>> mailto:jonathanrstr...@gmail.com>> wrote: > >>> > >>> Are you looking to arbitrarily update the field in the fifth row, > or > >>> can the row that needs to be updated be isolated by some add'l > >>> attribute? What's the use case? > >>> > >>> > >>> What do you mean? > >>> I don't have any other attributes. > >>> > >>> I want to understand how to emulate MS Access behavior, where you have > a > >>> form > >>> with the arbitrary query, then you can go to any record in that form > and > >>> update any field. > >>> > >>> Is it even possible from the "pure SQL" POV? Or Access is doing some > >>> VBA/DB/4GL magic? > >>> > >> > >> When you are updating a record in a form the framework(Access in your > >> case) is using some identifier from that record to UPDATE that > >> particular record in the database. From when I used Access, I seem to > >> remember it would not give you INSERT/UPDATE capability on a form unless > >> you had specified some unique key for the records. So you need to find > >> what the key(generally a PRIMARY KEY) is and use that to do the UPDATE. > > > > But now the question becomes > > > > How to find what the primary key (or UNIQUE identifier) value is > > for row 5 in the recordset? > > You defined them: > > CREATE TABLE X(id INTEGER PRIMARY KEY, field1 char(50), field2 int); > CREATE TABLE Y(id INTEGER PRIMARY KEY, field1 char, field2 double(10, 2)); > > How you fetch that value is going to depend on where the record set is > being presented and how the record to be UPDATEd is selected? If you are > using some sort of framework/form to display the records it will have > some mechanism to gather the information(data) on select and then you > can use the PRIMARY KEY value from that data to do the UPDATE. If you > want a more precise answer then you will need to provide a complete > example of what you are doing. > > > > > Thank you. > > > >> > >>> Thank you. > >>> > >> > >> > >> > >> -- > >> Adrian Klaver > >> adrian.kla...@aklaver.com > > > -- > Adrian Klaver > adrian.kla...@aklaver.com >
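If the displayed result set itself is what defines "row 5", one way to express Adrian's advice in a single statement is to number the joined rows and drive the UPDATE off the key. This is only a sketch: the ORDER BY is an assumption, since the OP's query has none.

[code]
WITH numbered AS (
    SELECT x.id,
           row_number() OVER (ORDER BY x.id) AS rn   -- position within the assumed ordering
    FROM   x
    JOIN   y ON y.id = x.id
)
UPDATE x
SET    field1 = ''
FROM   numbered
WHERE  x.id = numbered.id
  AND  numbered.rn = 5;
[/code]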
Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?
Joshua - adding to what Fabio said (with which I agree wholeheartedly!):

It will be worthwhile for you to itemize the attributes and needed features before making a final decision about the architecture of the application you plan to deploy. While you are familiar with PostgreSQL and like its features, it may not wind up being the best match for what you are trying to accomplish.

I'm not sure your concerns about SQLite are true roadblocks. If you are looking to deploy an executable supported by one or more DLLs but no external processes, you won't find many mature and truly embeddable DBMSs out there. SQLite is widely used and well supported, and some of the functionality you'd normally consider putting in stored procedures might fit in with the idea of creating custom functions. It takes a few more steps than creating a PostgreSQL stored procedure or function, but perhaps this approach would make SQLite a reasonable choice for you. See:

https://stackoverflow.com/questions/2108870/how-to-create-custom-functions-in-sqlite

Back to the question about attributes and features of the DBMS, and Fabio's question to you re: your definition of an embedded database. An embedded database is typically used by a single user and is only available within the context of the current application and user / memory space. It should also be self-tuning and self-managing, as a typical end user can't and shouldn't be expected to also handle any DBA tasks. Several databases are considered "embedded", but the criteria used for this label may not match your own. If you can spell out your specific expectations (e.g., the app and database can be deployed in a single exe, etc.), this will help tremendously in homing in on the right choice.

- Jon

<https://www.linkedin.com/in/jonstrong/> <https://www.jonathanrstrong.com>

*Jonathan Strong*
CIO / CTO / Consultant
*P:* 609-532-1715
*E:* jonathanrstr...@gmail.com
*Quora Top Writer <https://www.quora.com/profile/Jonathan-R-Strong>*

On Tue, Sep 22, 2020 at 10:18 AM Fabio Ugo Venchiarutti <f.venchiaru...@ocado.com> wrote:
>
> On 21/09/2020 17:53, Joshua Drake wrote:
> > 3. The ability to embed PG to run in an automatic, quiet manner as part
> > of something else. I know about SQLite, but it's extremely limited to
> > the point of being virtually useless IMO, which is why I cannot use that
> > for anything nontrivial. I want my familiar PostgreSQL, only not require
> > it to be manually and separately installed on the machine where it is to
> > run as part of some "application". If I could just "embed" it, this
> > would allow me to create a single EXE which I can simply put on a
> > different machine to run my entire "system" which otherwise takes *tons*
> > of tedious, error-prone manual labor to install, set up and maintain. Of
> > course, this is probably much easier said than done, but I don't
> > understand why PG's architecture necessarily dictates that PG must be a
> > stand-alone, separate thing. Or rather, why some "glue" cannot enable it
> > to be used just like SQLite from a *practical* perspective, even if it
> > still is a "server-client model" underneath the hood. (Which doesn't
> > matter at all to me, nor should it matter to anyone else.)
>
> It depends what you mean by "embedded".
> If you want sqlite's linked library approach, where the database calls
> literally run in your process' address space, then that's a no-go, as
> postgres is a multi-user database server with its own process hierarchy.
>
> However, postgres also is a rather agnostic command that does not detach
> from the parent's terminal/stdio unless instructed to, so nothing stops
> your bespoke application from launching and managing its own postmaster
> as a directly managed child process - started as part of application
> initialisation - and having that listen on a local socket only available
> to the application itself; this is what we implemented in some of our
> installations where postgres is enslaved to the cluster control system -
> the configuration is re-generated at every restart (IIRC Patroni does
> something similar minus the unix socket part).
>
> A plethora of systems are built around the notion of programs calling
> other programs and managing the process' life cycle. The limiting factor
> in such an architecture tends to be the OS's process control semantics and
> API (notoriously weaker or more contrived on non-unix-like OSs), but
> that's not postgres' fault.
>
> --
> Regards
>
> Fabio Ugo Venchiarutti
> OSPCFC Network Engineering Dpt.
> Ocado Technology
Re: Rows removed on child table when updating parent partitioned table.
I've been away from coding for several years, but I'm dusting off my chops and getting back up to speed with PostgreSQL (love it!), so please forgive me if my early answers here come off as naive.

My understanding of this suggests that you shouldn't be using "update" on a serial field. I'm guessing that under the covers things are getting confused because your update doesn't also address the sequence that's implicitly created when you define a field as "serial". If you use "update", I'm guessing that nextval in the corresponding sequence is *not* updated accordingly. Have you tried this with setval() or nextval() rather than update? You can compare the difference between these and "update" by checking currval() after each.

Again - I apologize for incomplete knowledge here, but I'm speculating that use of "update" on an auto-incrementing serial field is outside intended / supported behavior, and it may well just be that it winds up being handled differently under the covers when the data table and/or associated sequence are partitioned.

- Jon

<https://www.linkedin.com/in/jonstrong/> <https://www.jonathanrstrong.com>

*Jonathan Strong*
CIO / CTO / Consultant
*P:* 609-532-1715
*E:* jonathanrstr...@gmail.com
*Quora Top Writer <https://www.quora.com/profile/Jonathan-R-Strong>*

On Thu, Oct 1, 2020 at 1:00 PM Eduard Català wrote:
> Hi developers,
> We have a strange case where some rows are removed. I think it is a bug,
> but before reporting it I prefer to ask here where I am wrong.
>
> *Postgres 12*
> Given the following structure:
>
> create table parent ( id serial, constraint parent_pkey primary key (id))
>     partition by range (id);
> create table parent_10 partition of parent for values from (0) to (10);
> create table parent_20 partition of parent for values from (11) to (20);
>
> create table child (
>     id serial,
>     parent_id int constraint parent_id_fk references parent(id)
>         on update cascade on delete cascade);
>
> -- Notice the on update cascade on delete cascade.
>
> insert into parent values(0);
> insert into child values(1,0);
>
> -- Here are the rows
>
> postgres=# table parent;
>  id
> ----
>   0
> (1 row)
>
> postgres=# table child;
>  id | parent_id
> ----+-----------
>   1 |         0
> (1 row)
>
> *-- Update the parent table id, with a value contained in the same partition*
> update parent set id = 5;
>
> postgres=# table parent;
>  id
> ----
>   5
> (1 row)
>
> postgres=# table child;
>  id | parent_id
> ----+-----------
>   1 |         5
> (1 row)
>
> *-- Update the parent table, with a value contained in another partition*
> update parent set id = 15;
>
> postgres=# update parent set id = 15;
> UPDATE 1
> postgres=# table parent;
>  id
> ----
>  15
> (1 row)
>
> *postgres=# table child;
>  id | parent_id
> ----+-----------
> (0 rows)*
>
> No error or warning was thrown. The rows in the child table were removed.
> I think what has happened is: the update caused a DELETE in the table
> parent_10 (removing the rows from the child table) and then an INSERT into
> parent_20.
>
> We've checked the documentation but didn't find anything about this
> unexpected behaviour.
>
> Trying without the "on delete cascade" clause throws a "parent key not found"
> error.
>
> Thank you!
Re: Rows removed on child table when updating parent partitioned table.
On Thu, Oct 1, 2020 at 2:02 PM David G. Johnston wrote: > The convention on these lists is to inline or bottom-post, please do not > top-post. > > On Thu, Oct 1, 2020 at 10:41 AM Jonathan Strong > wrote: > >> I've been away from coding for several years, but dusting off my chops >> and getting back up to speed with PostgreSQL (love it!). So please forgive >> me if my early answers here come off as naive. But my understanding of this >> suggests that you shouldn't be using "update" on a serial field. >> > > Yes Jonathan, your present understanding is flawed. The OP has provided > a self-contained simple test case for the problem at hand - which even if > not "best practice" is indeed valid to do and demonstrates the problem > quite clearly. Without actually testing it out I would say that this is > likely indeed an oversight in the partition row movement feature - it > didn't take into account the ON UPDATE/ON DELETE clause. > Understood - thx. I'll watch and learn for a while. Thanks as well for the list etiquette advice. I'll endeavor to follow convention. - Jon
Re: What's your experience with using Postgres in IoT-contexts?
On Wed, Oct 14, 2020 at 8:49 AM Peter J. Holzer wrote:
> On 2020-10-13 06:55:52 +0200, chlor wrote:
> > > I want to have long term storage and access to individual telegrams
> >
> > An IOT is not designed for that. It is used for control or delivery of
> > data to a server.
>
> That's a rather dogmatic and narrow-minded point of view. "IOT" means
> "Internet of things". There are many things which could benefit
> from network connectivity and don't necessarily need a central server or
> may even act as servers for other "things".
>
> It all depends on the application and the "thing".
>
> > Long term storage also means backup and recovery and I don't think you
> > have that planned for your IOT.
>
> That depends on how valuable those data are.
>
> hp
>
> --
> _ | Peter J. Holzer |

Indeed. IoT architecture also raises the question of "when" detailed historical data may be needed, and how Edge Computing can factor into the overall solution model. Detailed transactions may live "at the edge" while the needed aggregate / extract info is communicated to a central server to support real-time response. But those left-behind detailed transactions may (or may not) follow later on via a lower-priority / non-real-time path if relevant and eventually valuable.

Some examples I've had to work with:

When calculating real-time Equity / Security Index values, you might capture tick-by-tick data for each security in an index valuation formula. Just one security in an index (e.g., MSFT) could easily generate more than 100,000 ticks per day. One of the Large Cap indices currently has about 3,500 stocks in it. They might not all trade as frequently as MSFT, but you might see anywhere from 10 million to 100 million data points in a day. While this differs from IoT in that the data sources aren't physically separated and as numerous as individual IoT devices, the challenge is similar in that a good real-time architecture makes use of the needed data at various stages in the process flow (and data network flow) and defers functions that can wait, including perhaps committing full details of every transaction to centralized long-term storage, as long as the computed Index value can be published in real time.

Years ago we developed an online gaming platform supporting hundreds of thousands of concurrent users who came in from numerous countries around the world. Real-time scoring and chat communications posed challenges similar to the Equity Index solution above. We needed to be able to accept play data from thousands of concurrent players and have a game (or chat room) respond in near real time, but full detailed data could be queued up and gradually transmitted, processed, assimilated and committed to long-term storage.

In health care data collection we see similar challenges: real-time IoT biosensors may capture blood oximetry, glucose, lactate, heart rate, etc. Some of this may be critical for real-time monitoring and processing. Some gets processed "at the Edge" - aggregated, filtered, interpreted, etc. - before getting to central / long-term storage. Deciding the level of detail that actually has to reach centralized long-term storage - and when - is typically a non-trivial exercise.

When you look at examples like monitoring jet engines, gas turbines, or an air conditioner manufacturer and service company (one of my past clients) monitoring hundreds of thousands of HVAC units distributed around the country, data samples go past terabytes to petabytes, exabytes and more. While you need to figure out how to trim the raw data to amounts that can reasonably be stored and managed, I've seen too many cases of being overly aggressive in discarding data thought to be superfluous; thoughtful analysis is critical here.

- Jon
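For what it's worth, the "keep the rollup hot, defer the detail" pattern described above maps naturally onto PostgreSQL partitioning. A minimal sketch, with hypothetical table and column names (not from any of the systems mentioned), assuming raw readings are range-partitioned by time so old detail can be detached or archived cheaply while consumers read a small, separately maintained aggregate:

[code]
-- Raw readings land in a range-partitioned table; old partitions can be
-- detached/archived instead of deleted row by row.
CREATE TABLE reading (
    device_id  int          NOT NULL,
    ts         timestamptz  NOT NULL,
    value      double precision,
    PRIMARY KEY (device_id, ts)
) PARTITION BY RANGE (ts);

CREATE TABLE reading_2020_10 PARTITION OF reading
    FOR VALUES FROM ('2020-10-01') TO ('2020-11-01');

-- Real-time consumers read a small hourly rollup, refreshed asynchronously
-- (e.g., by a scheduled job), rather than the raw firehose.
CREATE TABLE reading_hourly AS
SELECT device_id,
       date_trunc('hour', ts) AS hour,
       avg(value)             AS avg_value,
       count(*)               AS samples
FROM   reading
GROUP  BY device_id, date_trunc('hour', ts);
[/code]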