Re: Multitenent architecture

2020-06-06 Thread Jonathan Strong
Hi guys - apologies for jumping into the middle of this, but it touches on
precisely one of the topics I'll have to deal with soon. We're standing up
a pharmacovigilance / drug safety platform managing ICSRs (HL7 "individual
case safety reports" that must be FDA/ICH compliant) and will be dealing
with multi-level privacy and compliance (HIPAA, GDPR, PHI, CCPA, etc.) at
levels such as Patient, Doctor, Pharmacist. This first part deals with
adverse drug events in the "post marketing" realm, i.e., drugs already
approved and in the marketplace. Our model will also expand to support
pre-marketing, i.e., clinical trials and the like.

Far easier to have one coherent data model and tackle compliance as a
multi-tenant / multi-level-secure database issue. We just have to make sure
we do this so it's rock solid and also facilitates the business / use cases
that need to be executed.

I'd love to compare notes with anyone else doing something along those
lines. @Vasu - it sounds like you're smack in the middle of this. Let me
know if you'd be open to connecting and comparing notes on this topic.
(anyone else as well!)

Thanks,

- Jon


On Sat, Jun 6, 2020 at 1:21 PM Michel Pelletier 
wrote:

>
> On Sat, Jun 6, 2020 at 3:14 AM Vasu Madhineni 
> wrote:
>
>> Hi Rob,
>>
>> Our environment is medical clinical data, so each clinic as a tenant.
>> Approximately 500+ tenants with 6TB data.
>>
>> Thank you in advance.
>>
>>
> There's a good article on the AWS blog on multi tenancy with postgres:
>
>
> https://aws.amazon.com/blogs/database/multi-tenant-data-isolation-with-postgresql-row-level-security/
>
> This is similar to Laurenz's second suggestion.
>
> -Michel
>
>
>> Regards,
>> Vasu Madhineni
>>
>> On Fri, Jun 5, 2020 at 6:09 PM Rob Sargent  wrote:
>>
>>>
>>>
>>> On Jun 5, 2020, at 2:54 AM, Vasu Madhineni 
>>> wrote:
>>>
>>> 
>>> If the data size is more than 6TB, which approach better?
>>>
>>> On Fri, Jun 5, 2020 at 2:57 PM Laurenz Albe 
>>> wrote:
>>>
 On Thu, 2020-06-04 at 23:52 +0800, Vasu Madhineni wrote:
 > We are planning a POC on multitenant architecture in Postgres, Could
 you please
 > help us with steps for multitenant using schema for each application
 model.

 For few tenants, you can keep identical tables in several schemas and
 set "search_path" to select a tenant.

 With many tenants, you are better off with one table that holds the
 data for all clients.  You can use Row Level Security to have each
 tenant see only his or her data, and it might be a good idea to
 use list partitioning on the tenant ID.

 Yours,
 Laurenz Albe
 --
 Cybertec | https://www.cybertec-postgresql.com

 The question is How many separate data owners?
>>>
>>


Re: How to write such a query

2020-09-18 Thread Jonathan Strong
Are you looking to arbitrarily update the field in the fifth row, or can
the row that needs to be updated be isolated by some add'l attribute?
What's the use case?

- Jon

<https://www.linkedin.com/in/jonstrong/>
<https://www.jonathanrstrong.com>

*Jonathan Strong*

CIO / CTO / Consultant

*P:* 609-532-1715 *E:* jonathanrstr...@gmail.com

*Quora Top Writer <https://www.quora.com/profile/Jonathan-R-Strong>*


On Fri, Sep 18, 2020 at 1:27 PM Igor Korot  wrote:

> Hi,
> Consider following
>
> [code]
> CREATE TABLE X(id INTEGER PRIMARY KEY, field1 char(50), field2 int);
> CREATE TABLE Y(id INTEGER PRIMARY KEY, field1 char, field2 double(10, 2));
> SELECT X.field1, Y.field2 from X, Y WHERE X.id = Y.id;
> [/code]
>
> Assuming that the SELECT return 10 rows, I want to update X.field1
> in row 5.
>
> How do I write a WHERE clause in the
>
> [code]
> UPDATE X.field1 SET X.field1 = '' WHERE
> [/code]
>
> Thank you.
>
>
>


Re: How to write such a query

2020-09-18 Thread Jonathan Strong
@Adrian -

Using a unique key value or otherwise isolating a specific record via
selection against values in its attributes is certainly preferable to
choosing a row to update via its position in a result set, unless the use
case actually makes use of that position info as a meaningful descriptor of
the data in some fashion.

- Jon

<https://www.linkedin.com/in/jonstrong/>
<https://www.jonathanrstrong.com>

*Jonathan Strong*

CIO / CTO / Consultant

*P:* 609-532-1715 *E:* jonathanrstr...@gmail.com

*Quora Top Writer <https://www.quora.com/profile/Jonathan-R-Strong>*


On Fri, Sep 18, 2020 at 1:58 PM Adrian Klaver 
wrote:

> On 9/18/20 10:46 AM, Igor Korot wrote:
> > Hi, Johnathan,
> >
> > On Fri, Sep 18, 2020 at 12:34 PM Jonathan Strong
> > mailto:jonathanrstr...@gmail.com>> wrote:
> >
> > Are you looking to arbitrarily update the field in the fifth row, or
> > can the row that needs to be updated be isolated by some add'l
> > attribute? What's the use case?
> >
> >
> > What do you mean?
> > I don't have any other attributes.
> >
> > I want to understand how to emulate MS Access behavior, where you have a
> > form
> > with the arbitrary query, then you can go to any record in that form and
> > update any field.
> >
> > Is it even possible from the "pure SQL" POV? Or Access is doing some
> > VBA/DB/4GL magic?
> >
>
> When you are updating a record in a form the framework(Access in your
> case) is using some identifier from that record to UPDATE that
> particular record in the database. From when I used Access, I seem to
> remember it would not give you INSERT/UPDATE capability on a form unless
> you had specified some unique key for the records. So you need to find
> what the key(generally a PRIMARY KEY) is and use that to do the UPDATE.
>
> > Thank you.
> >
>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: How to write such a query

2020-09-18 Thread Jonathan Strong
Yes...absolutely. Short of using ORDER BY, the order of a multi-row result
set can be arbitrary, with "row position" having no significant meaning.

This gets back to understanding set theory, the relational model, the
various types of keys (primary, candidate, foreign, etc.). Truly crucial to
understand the model in order to write correctly functioning and reliable
code.

- Jon

<https://www.linkedin.com/in/jonstrong/>
<https://www.jonathanrstrong.com>

*Jonathan Strong*

CIO / CTO / Consultant

*P:* 609-532-1715 *E:* jonathanrstr...@gmail.com

*Quora Top Writer <https://www.quora.com/profile/Jonathan-R-Strong>*


On Fri, Sep 18, 2020 at 2:17 PM Thomas Kellerer  wrote:

> Igor Korot schrieb am 18.09.2020 um 19:29:
> > [code]
> > CREATE TABLE X(id INTEGER PRIMARY KEY, field1 char(50), field2 int);
> > CREATE TABLE Y(id INTEGER PRIMARY KEY, field1 char, field2 double(10,
> 2));
> > SELECT X.field1, Y.field2 from X, Y WHERE X.id = Y.id;
> > [/code]
> >
> > Assuming that the SELECT return 10 rows, I want to update X.field1
> > in row 5.
>
> There is no such thing as "row 5" in a relational database.
>
> Rows in a table have no inherent sort order. The only way you can identify
> a row, is by the value of its primary (or unique) key. Not by "position".
>
> The only way you can identify "row 5" is, if you use an ORDER BY to
> define a sort order on the result - but that position is only valid
> for that _result_, it has no meaning for the actual table data.
>
> Which brings us back to the fact, that the only way to (uniquely) identify
> a row in a table is to specify its primary key value in the WHERE clause
>
>
>
>
>
>


Re: How to write such a query

2020-09-18 Thread Jonathan Strong
Yes - 100%

- Jon

<https://www.linkedin.com/in/jonstrong/>
<https://www.jonathanrstrong.com>

*Jonathan Strong*

CIO / CTO / Consultant

*P:* 609-532-1715 *E:* jonathanrstr...@gmail.com

*Quora Top Writer <https://www.quora.com/profile/Jonathan-R-Strong>*


On Fri, Sep 18, 2020 at 2:22 PM Adrian Klaver 
wrote:

> On 9/18/20 11:13 AM, Jonathan Strong wrote:
> > @Adrian -
> >
> > Using a unique key value or otherwise isolating a specific record via
> > selection against values in its attributes is certainly preferable to
> > choosing a row to update via its position in a result set, unless the
> > use case actually makes use of that position info as a meaningful
> > descriptor of the data in some fashion.
>
>
> The bigger issue is deciding what attribute of the selected row is be
> used to do the UPDATE. Unless it is the PRIMARY KEY(or other UNIQUE
> key(s)) then you very likely are going to UPDATE more then you bargained
> for.
>
> >
> > - Jon
> >
> > <https://www.linkedin.com/in/jonstrong/> <
> https://www.jonathanrstrong.com>
> >
> >
> >
> > *Jonathan Strong*
> >
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: How to write such a query

2020-09-18 Thread Jonathan Strong
A pretty good read / intro to the concept of keys in the relational model:

https://www.red-gate.com/simple-talk/sql/learn-sql-server/primary-key-primer-for-sql-server/

- Jon

<https://www.linkedin.com/in/jonstrong/>
<https://www.jonathanrstrong.com>

*Jonathan Strong*

CIO / CTO / Consultant

*P:* 609-532-1715 *E:* jonathanrstr...@gmail.com

*Quora Top Writer <https://www.quora.com/profile/Jonathan-R-Strong>*


On Fri, Sep 18, 2020 at 3:08 PM Adrian Klaver 
wrote:

> On 9/18/20 11:49 AM, Igor Korot wrote:
> > Hi, Adrian,
> >
> > On Fri, Sep 18, 2020 at 12:58 PM Adrian Klaver
> >  wrote:
> >>
> >> On 9/18/20 10:46 AM, Igor Korot wrote:
> >>> Hi, Johnathan,
> >>>
> >>> On Fri, Sep 18, 2020 at 12:34 PM Jonathan Strong
> >>> mailto:jonathanrstr...@gmail.com>> wrote:
> >>>
> >>>  Are you looking to arbitrarily update the field in the fifth row,
> or
> >>>  can the row that needs to be updated be isolated by some add'l
> >>>  attribute? What's the use case?
> >>>
> >>>
> >>> What do you mean?
> >>> I don't have any other attributes.
> >>>
> >>> I want to understand how to emulate MS Access behavior, where you have
> a
> >>> form
> >>> with the arbitrary query, then you can go to any record in that form
> and
> >>> update any field.
> >>>
> >>> Is it even possible from the "pure SQL" POV? Or Access is doing some
> >>> VBA/DB/4GL magic?
> >>>
> >>
> >> When you are updating a record in a form the framework(Access in your
> >> case) is using some identifier from that record to UPDATE that
> >> particular record in the database. From when I used Access, I seem to
> >> remember it would not give you INSERT/UPDATE capability on a form unless
> >> you had specified some unique key for the records. So you need to find
> >> what the key(generally a PRIMARY KEY) is and use that to do the UPDATE.
> >
> > But now the question becomes
> >
> > How to find what the primary key (or UNIQUE identifier) value is
> > for row 5 in the recordset?
>
> You defined them:
>
> CREATE TABLE X(id INTEGER PRIMARY KEY, field1 char(50), field2 int);
> CREATE TABLE Y(id INTEGER PRIMARY KEY, field1 char, field2 double(10, 2));
>
> How you fetch that value is going to depend on where the record set is
> being presented and how the record to be UPDATEd is selected? If you are
> using some sort of framework/form to display the records it will have
> some mechanism to gather the information(data) on select and then you
> can use the PRIMARY KEY value from that data to do the UPDATE. If you
> want a more precise answer then you will need to provide a complete
> example of what you are doing.
>
> >
> > Thank you.
> >
> >>
> >>> Thank you.
> >>>
> >>
> >>
> >>
> >> --
> >> Adrian Klaver
> >> adrian.kla...@aklaver.com
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

2020-09-22 Thread Jonathan Strong
Joshua - adding to what Fabio said (with which I agree wholeheartedly!) -

It will be worthwhile for you to itemize the attributes and needed features
for making a final decision about the architecture of the application you
plan to deploy.

While you are familiar with PostgreSQL and like its features, it may not
wind up being the best match for what you are trying to accomplish. I'm not
sure your concerns about SQLite are true roadblocks. If you are looking to
deploy an executable supported by one or more DLLs but no external
processes, you won't find many mature and truly embeddable DBMSs out there.
SQLite is widely used and well supported, and some of the functionality
you'd normally consider putting in stored procedures might fit in with the
idea of creating custom functions. A few more steps than creating a
PostgreSQL stored procedure or function, but perhaps this approach would
make SQLite a reasonable choice for you. See:

https://stackoverflow.com/questions/2108870/how-to-create-custom-functions-in-sqlite

Back to the question about attributes and features of the DBMS, and Fabio's
question to you re: your definition of an embedded database. An embedded
database is typically used by a single user and is only available within
the context of the current application and user / memory space. It should
also be self-tuning and self-managing, as a typical end user can't and
shouldn't be expected to also handle any DBA tasks. Several databases are
considered "embedded", but the criteria used for this label may not match
your own. If you can spell out your specific expectations (e.g., the app
and database can be deployed in a single exe, etc.), this will help
tremendously in homing in on the right choice.

- Jon

<https://www.linkedin.com/in/jonstrong/>
<https://www.jonathanrstrong.com>

*Jonathan Strong*

CIO / CTO / Consultant

*P:* 609-532-1715 *E:* jonathanrstr...@gmail.com

*Quora Top Writer <https://www.quora.com/profile/Jonathan-R-Strong>*


On Tue, Sep 22, 2020 at 10:18 AM Fabio Ugo Venchiarutti <
f.venchiaru...@ocado.com> wrote:

>
> On 21/09/2020 17:53, Joshua Drake wrote:
> > 3. The ability to embed PG to run in an automatic, quiet manner as part
> > of something else. I know about SQLite, but it's extremely limited to
> > the point of being virtually useless IMO, which is why I cannot use that
> > for anything nontrivial. I want my familiar PostgreSQL, only not require
> > it to be manually and separately installed on the machine where it is to
> > run as part of some "application". If I could just "embed" it, this
> > would allow me to create a single EXE which I can simply put on a
> > different machine to run my entire "system" which otherwise takes *tons*
> > of tedious, error-prone manual labor to install, set up and maintain. Of
> > course, this is probably much easier said than done, but I don't
> > understand why PG's architecture necessarily dictates that PG must be a
> > stand-alone, separate thing. Or rather, why some "glue" cannot enable it
> > to be used just like SQLite from a *practical* perspective, even if it
> > still is a "server-client model" underneath the hood. (Which doesn't
> > matter at all to me, nor should it matter to anyone else.)
>
> It depends what you mean by "embedded".
> If you want sqlite's linked library approach, where the database calls
> literally run your process' address space, then that's a no go, as
> postgres is a multi-user database server with its own process hierarchy.
>
>
> However, postgres also is a rather agnostic command that does not detach
> from the parent's terminal/stdio unless instructed to, so nothing stops
> your bespoke application from launching and managing its own postmaster
> as a directly managed child process - started as part of application
> initialisation - and having that listen on a local socket only available
> to the application itself; this is what we implemented in some of our
> installations where postgres is enslaved to the cluster control system -
> the configuration is re-generated at every restart (IIRC Patroni does
> something similar minus the unix socket part).
>
>
> A plethora of systems are built around the notion of programs calling
> other programs and managing the process' life cycle. The limiting factor
> to such architecture tends to be the OS's process control semantics and
> API (notoriously weaker or more contrived on non-unix-like OSs), but
> that's not postgres' fault.
>
>
>
>
>
> --
> Regards
>
> Fabio Ugo Venchiarutti
> OSPCFC Network Engineering Dpt.
> Ocado Technology
>
> --
>
>
> Notice:
> This email i

Re: Rows removed on child table when updating parent partitioned table.

2020-10-01 Thread Jonathan Strong
I've been away from coding for several years, but dusting off my chops and
getting back up to speed with PostgreSQL (love it!). So please forgive me
if my early answers here come off as naive. But my understanding of this
suggests that you shouldn't be using "update" on a serial field. I'm
guessing that under the covers things are getting confused because your
update doesn't also address the sequence that's implicitly created when you
define a field as "serial". If you use "update" I'm guessing that nextval
in the corresponding sequence is *not* updated accordingly.

Have you tried this with setval() or nextval() rather than update? You can
compare the difference between these and "update" by checking currval()
after each. Again - I apologize for incomplete knowledge here, but I'm
speculating that use of "update" on an auto-incrementing serial field is
outside intended / supported behavior, and it may well just be that it
winds up being handled differently under the covers when the data table
and/or associated sequence are partitioned.



- Jon

<https://www.linkedin.com/in/jonstrong/>
<https://www.jonathanrstrong.com>

*Jonathan Strong*

CIO / CTO / Consultant

*P:* 609-532-1715 *E:* jonathanrstr...@gmail.com

*Quora Top Writer <https://www.quora.com/profile/Jonathan-R-Strong>*


On Thu, Oct 1, 2020 at 1:00 PM Eduard Català 
wrote:

> Hi developers,
> We have a strange case where some rows are removed. I think it is a bug,
> but before notifying it I prefer to ask here where I am wrong.
>
> *Postgres 12*
> Given the following structure:
>
> create table parent ( id serial, constraint parent_pkey primary key (id))
> partition by range (id);
> create table parent_10 partition of parent for values from (0) to (10);
> create table parent_20 partition of parent for values from (11) to (20);
>
>
> create table child (
>  id serial,
>  parent_id int constraint parent_id_fk references parent(id) on update
> cascade on delete cascade);
>
> -- Notice the on update cascade on delete cascade.
>
> insert into parent values(0);
> insert into child values(1,0);
>
> -- Here are the rows
>
> postgres=# table parent;
>  id
> 
>   0
> (1 row)
>
> postgres=# table child;
>  id | parent_id
> +---
>   1 | 0
> (1 row)
>
>
> *-- Update the parent table id, with a value contained in the same
> partition*
> update parent set id = 5;
>
> postgres=# table parent;
>  id
> 
>   5
> (1 row)
>
> postgres=# table child;
>  id | parent_id
> +---
>   1 | 5
> (1 row)
>
>
> *-- Update the parent table, with a value contained into other partition*
> update parent set id = 15;
>
> postgres=# update parent set id = 15;
> UPDATE 1
> postgres=# table parent;
>  id
> 
>  15
> (1 row)
>
>
>
>
> *postgres=# table child; id | parent_id+---(0 rows)*
>
> No error or warning was thrown. The rows in the child table were removed.
> I think what has happened is: The update caused a DELETE in the table
> parent_10 (removing the rows from child table) and then the INSERT into
> parent_20.
>
> We've checked the documentation but didn't find something about this
> unexpected behaviour.
>
> Trying without "on delete cascade" clause throws a "parent key not found
> error".
>
> Thank you!
>
>
>
>
>


Re: Rows removed on child table when updating parent partitioned table.

2020-10-01 Thread Jonathan Strong
On Thu, Oct 1, 2020 at 2:02 PM David G. Johnston 
wrote:

> The convention on these lists is to inline or bottom-post, please do not
> top-post.
>
> On Thu, Oct 1, 2020 at 10:41 AM Jonathan Strong 
> wrote:
>
>> I've been away from coding for several years, but dusting off my chops
>> and getting back up to speed with PostgreSQL (love it!). So please forgive
>> me if my early answers here come off as naive. But my understanding of this
>> suggests that you shouldn't be using "update" on a serial field.
>>
>
> Yes Jonathan, your present understanding is flawed.  The OP has provided
> a self-contained simple test case for the problem at hand - which even if
> not "best practice" is indeed valid to do and demonstrates the problem
> quite clearly.  Without actually testing it out I would say that this is
> likely indeed an oversight in the partition row movement feature - it
> didn't take into account the ON UPDATE/ON DELETE clause.
>

 Understood - thx. I'll watch and learn for a while. Thanks as well for the
list etiquette advice. I'll endeavor to follow convention.

- Jon


Re: What's your experience with using Postgres in IoT-contexts?

2020-10-14 Thread Jonathan Strong
On Wed, Oct 14, 2020 at 8:49 AM Peter J. Holzer  wrote:

> On 2020-10-13 06:55:52 +0200, chlor wrote:
> > > I want to have long term storage and access to individual telegrams
> >
> > An IOT is not designed for that. It is used for control or delivery of
> > data to a server.
>
> That's a rather dogmatic and narrow-minded point of view. "IOT" means
> "Internet of things". There are many things which which could benefit
> from network connectivity and don't necessarily need a central server or
> may even act as servers for other "things".
>
> It all depends on the application and the "thing".
>
> > Long term storage also means backup and recovery and I don't think you
> > have that planned for your IOT.
>
> That depends on how valuable those data are.
>
> hp
>
> --
>_  | Peter J. Holzer|
>
>
Indeed. IoT architecture also begs the question of "when" detailed
historical data may be needed, and how Edge Computing can factor into the
overall solution model. Detailed transactions may live "at the edge" while
needed aggregate / extract info is communicated to a central server to
support real time response. But those left-behind detailed transactions may
(or may not) follow later on via a lower priority / non- real time path if
relevant and eventually valuable. Some examples I've had to work with:

When calculating real-time Equity / Security Index values, you might
capture tick by tick data for each security in an index valuation formula.
Just one security in an index (e.g., MSFT) could easily generate more than
100,000 ticks per day. One of the Large Cap indices currently has about
3,500 stocks in it. They might not all trade as frequently as MSFT, but you
might see anywhere from 10 million to 100 million data points in a day.
While this differs from IoT in that data sources aren't physically
separated and as numerous as individual IoT devices, the challenge is
similar in that a good real time architecture makes use of needed data at
various stages in the process flow (and data network flow) and defers
functions that can wait, including perhaps committing full details of every
transaction to centralized long term storage, as long as the computed Index
value can be published in real time.

Years ago we developed an online gaming platform supporting hundreds of
thousands of concurrent users who came in from numerous countries around
the world. Real-time scoring and chat communications posed challenges
similar to the Equity Index solution above. We needed to be able to accept
play data from thousands of concurrent players and have a game (or chat
room) respond in near real time, but full detailed data could be queued up
and gradually transmitted, processed, assimilated and committed to long
term storage.

In health care data collection we see similar challenges: real time IoT
biosensors may capture blood oximetry, glucose, lactate info, heart rate,
etc. Some of this may be critical for real time monitoring and processing.
Some gets processed "at the Edge" - aggregated, filtered, interpreted, etc.
before getting to central / long term storage.

Deciding the level of detail that actually has to reach centralized long
term storage - and when - is typically a non-trivial exercise. When you
look at examples like monitoring a jet engine, gas turbines, or an air
conditioner manufacturer and service company (one of my past clients)
monitoring hundreds of thousands of HVAC units distributed around the
country, data samples go past terabytes to petabytes, exabytes and more.

While you need to figure out how to trim the raw data to amounts that can
reasonably be stored and managed, I've seen too many cases of being overly
aggressive in discarding data thought to be superfluous; thoughtful
analysis is critical here.

- Jon