Re: Rationale for aversion to the central database?

2018-04-08 Thread g...@luxsci.net

 
 
On April 8, 2018 02:40:46 pm PDT, "Guyren Howe"  wrote:
 
I am a Rails developer at a medium-large size company. I’ve mostly 
worked at smaller companies. I’ve some exposure to other web 
development communities.


When it comes to databases, I have universally encountered the attitude 
that one should treat the database as a dumb data bucket. There is a 
*very* strong aversion to putting much of any business logic in the 
database. I encounter substantial aversion to have multiple 
applications access one database, or even the reverse: all abstraction 
should be at the application layer.


My best theory is that these communities developed at a time when 
Windows was more dominant, and just generally it was *significantly* 
easier to use MySQL than Postgres for many, particularly new, 
developers. And it is pretty reasonable to adopt an aversion to 
sophisticated use of the database in that case.


This attitude has just continued to today, even as many of them have 
switched to Postgres.


This is only a hypothesis. I am now officially researching the issue. I 
would be grateful for any wisdom from this community.


Aside: it is rare to find a situation in life or anywhere where one 
widely adopted thing is worse in *every way* than another thing, but 
this certainly was and largely still continues to be the case when one 
compares MySQL and Postgres. So why do folks continue to use MySQL? I 
find this mystifying.


===

Hi there. This issue is close to my heart and I'm with you.  I am 
however very comfortable with using psql and PL/pgSQL and I am very 
opinionated.
I feel *very* strongly that a database that actually matters and where 
RI is critical, i.e., any PG db I handle, should make sense on its own 
and be *highly* usable on its own. It should not be dependent on some 
particular external application code to use it or make sense of things. 
It follows that I think nonintuituve exceptions/gotchas should be 
*clear* at a db level, likely using functions to encapsulate that 
information.


Sure, PL/pgSQL may possibly be slow at some things like doing lots of 
bigint math, but I would probably use C and ECPG for the appropriate 
cases.
Not a large percentage of programmers these days know how fast db tasks 
can be because they are used to working with relatively slow tools and 
frameworks. ( Yes, typical Python.)


I am also highly mystified by the dumbstore approach and frankly, I 
think that folks should KNOW their tools better. Not knowing how to use 
your database effectively typically results in unnecessary and often 
very convoluted application code, from my experience.


I keep hearing about db portability but I have yet to see cases where 
db logic was an issue. But to be honest, I haven't seen many migrations 
at all. Why? Because I think that it rarely ever happens. If I had to 
do it, I sure as heck hope that the  db was "clean" and understandable 
without
having to review some probably awful app. code. Why would anyone 
migrate *away* from PG anyway? :)


One advantage to using logic and functions in  the db is that you can 
fix things immediately without having to make new application builds. 
That in itself is a huge advantage, IMO.


Cheers,
-g



Re: Rationale for aversion to the central database?

2018-04-27 Thread g...@luxsci.net

 
 
On April 24, 2018 07:27:59 am PDT, "Sam Gendler" 
 wrote:

 
 
On Sun, Apr 8, 2018 at 15:37 [1]g...@luxsci.net <[2]g...@luxsci.net> wrote:

   
   
  On April 8, 2018 02:40:46 pm PDT, "Guyren Howe" <[3]guy...@gmail.com> wrote:
  One advantage to using logic and functions in  the db is that you 
can fix things immediately without having to make new application 
builds. That in itself is a huge advantage, IMO.


 
 
I doubt most of us would consider this any kind of advantage outside of 
the momentary temptation to do it when an app is completely broken and 
needs to be up in a hurry. Application changes, whether in the dB or in 
application logic, need to be tested, and they need to be revision 
controlled and released in a manner that can be easily rolled back in 
an automated manner. The fact that putting logic in the database can 
effectively allow developers to make unreleased changes to production 
apps is specifically one of the problems that I am trying to avoid when 
I keep most logic in the app instead of the dB. It’s a whole lot 
harder to make arbitrary manual changes to code in the app, whether 
interpreted or compiled, if it is running inside a container that 
cannot be updated. Even if you go in with a shell and update an 
interpreted file, the next time that container is launched the change 
will be lost, which is usually sufficient motivation to keep devs from 
doing that kind of thing.

 
I’ll put some things in the db, either for performance or because I 
want that logic to be built into the data and not be part of the 
application, but I choose those contexts carefully and I write them in 
as portable a manner as possible. And for those who say migrations 
don’t happen, I’ve certainly been through a few, usually as part of 
an acquisition or the like, but sometimes simply because another dB 
server better meets our needs after a time. And migrating stored procs 
can be really difficult. Such code usually has less complete unit and 
integration tests, which makes validating those changes more difficult, 
too.

 
But the biggest reason is that databases often have to scale up rather 
than out, so keeping as much logic in the application code allows my 
scaling requirements for the dB server to be as minimal as possible. 
Sure, there are workloads where pushing raw data across the wire will 
be more work than processing it in the dB, and in those cases, I may do 
that, but I consider it premature optimization to just assume that is 
necessary without hard evidence from production examples to suggest 
otherwise.

 
Finally, there’s the consistency argument. I want to find all of the 
logic in one place. Either entirely in the source code or entirely in 
the dB. Having to trace things from the code to the dB and back again 
can make it a whole lot harder to see, at a glance, what is happening 
in the code. Having logic in the dB also means it can be difficult or 
impossible to have two releases talking to the same schema version at 
the same time - so canary builds and rolling deployments can be 
difficult. Of course, schema changes can cause this problem, regardless 
of whether there are stored procs, but the more of your logic that 
lives in the db, the more likely it is that your releases will conflict 
over the db. So I’m more likely to be able to do a rolling release if 
I keep the db as a dumb data store and keep logic in the application 
code.


===

I could have worded that better but I think that we're coming at it 
from different directions. You think of your application as the 
"master" operator. I think of a PG db as the "master", not a slave. I 
believe that we shouldn't _have_ to use an external application for the 
database to be useful and coherent.  I like to think of external 
applications as subservient to the db and not the other way around. 
Yeah, I know, probably not a popular viewpoint.


Sorry, I don't really understand why it would be so hard to migrate, 
say pl/pgsql functions. You can maybe expect to write some likely 
convoluted application code, though. :) Reusable functions in the db 
that are solid also means that developers don't have to reinvent the 
wheel in whatever language and debugging also becomes simpler.


Perhaps I'm extreme. In my ideal world, developers might not even know 
table names! I'm kidding ,sorta...


Thanks much for your comments,
Gerry
 

References

Visible links
1. mailto:g...@luxsci.net
2. mailto:g...@luxsci.net
3. mailto:guy...@gmail.com



Re: Rationale for aversion to the central database?

2018-04-28 Thread g...@luxsci.net

 
 
On April 28, 2018 11:18:02 am PDT, "Peter J. Holzer"  wrote:

On 2018-04-28 09:54:27 -0500, Steven Lembark wrote:

On Sat, 28 Apr 2018 08:02:21 +0200
"Peter J. Holzer"  wrote:

> On 2018-04-27 22:52:39 +, g...@luxsci.net wrote:
> > Perhaps I'm extreme. In my ideal world, developers might not even
> > know table names! I'm kidding ,sorta... 
>
> If they don't know the table names, how can they write those stored
> procedures?

One of the main reasons DBA's need to be pigs. Much spaghetti can be
avoided in ORM frameworks with well-designed views.


Apparently my remark was too short (and perhaps with too little context)
to be clear.

As I understood g...@luxsci.net, they were arguing for writing lots of
stored procedures so that developers wouldn't even have to know the
table names (they would just call the procedures) [Although I now see
that it was somebody else who proposed that as the only interface].

But those stored procedures don't write themselves. Somebody has to
write them and a person who writes code is called a developer. So there
needs to be at least one developer who knows the table names - the
developer of the stored procedures. And that developer should better be
good - API design is hard, and if you take away SQL from your fellow
developers you should give them something better (for the task at hand),
not something worse.

===

Yes, sorry, I meant application developers. Well, actually I meant 
developers who only look at the db through an ORM and who like it that 
way. At my current position, the ones called "developers" all use 
Django, and they practically never look at ( let alone interact with) 
the db using psql. Forget about writing and using pgsql functions. 
There are no DBAs or database developers. I am trying to change this 
culture a bit but I may give up soon. Tragic, really, given the mostly 
terrible performance that they get with naive Django code. Some of 
these developers have even gone so far as to proclaim that PG is too 
slow ( they didn't even configure it !) for their "big" data, which 
really aren't big, and move on to something like ES, which they really 
don't need. For example, they have code that could benefit from simple 
FTS operations, and instead of taking seconds or minutes, could be done 
in milliseconds. Or perhaps they could not execute certain queries for 
user interfaces quickly
enough with their Django code so they think that they need ES for those 
queries. So now they have to manage and design their ES 
implementations, activities which frankly should _not_ be taken 
lighlty, and manage their data in the PG db which they eventually come 
back to for at least some sane data integrity. Now they have to perform 
very long and convoluted processes to keep both in sync. Ouch. Know 
your tools!


But there is no all or nothing, as others have pointed out. Some things 
are better done and more easily managed outside of the DB, at least for 
me.
For example, I tried doing lots of bigint math in pgsql and I moved to 
external C code for that.
That C code still called tried and true pgsql convenience functions for 
getting at the correct data however, and these were the same functions 
that my Java code used for other activities.


Gerry