Re: Rationale for aversion to the central database?
On April 8, 2018 02:40:46 pm PDT, "Guyren Howe" wrote:

I am a Rails developer at a medium-large size company. I've mostly worked at smaller companies. I've had some exposure to other web development communities.

When it comes to databases, I have universally encountered the attitude that one should treat the database as a dumb data bucket. There is a *very* strong aversion to putting much of any business logic in the database. I encounter substantial aversion to having multiple applications access one database, or even the reverse: all abstraction should be at the application layer.

My best theory is that these communities developed at a time when Windows was more dominant, and it was generally *significantly* easier to use MySQL than Postgres for many developers, particularly new ones. It is pretty reasonable to adopt an aversion to sophisticated use of the database in that case. This attitude has just continued to today, even as many of them have switched to Postgres.

This is only a hypothesis. I am now officially researching the issue. I would be grateful for any wisdom from this community.

Aside: it is rare to find a situation in life or anywhere else where one widely adopted thing is worse in *every way* than another, but this certainly was, and largely still is, the case when one compares MySQL and Postgres. So why do folks continue to use MySQL? I find this mystifying.

===

Hi there. This issue is close to my heart and I'm with you. I am, however, very comfortable with using psql and PL/pgSQL, and I am very opinionated.

I feel *very* strongly that a database that actually matters and where referential integrity is critical, i.e., any PG db I handle, should make sense on its own and be *highly* usable on its own. It should not depend on some particular external application's code to use it or to make sense of things. It follows that I think nonintuitive exceptions and gotchas should be *clear* at the db level, likely using functions to encapsulate that information.

Sure, PL/pgSQL may be slow at some things, like doing lots of bigint math, but I would probably use C and ECPG for those cases. Not a large percentage of programmers these days know how fast db tasks can be, because they are used to working with relatively slow tools and frameworks. (Yes, typical Python.)

I am also highly mystified by the dumb-store approach and, frankly, I think that folks should KNOW their tools better. In my experience, not knowing how to use your database effectively typically results in unnecessary and often very convoluted application code.

I keep hearing about db portability, but I have yet to see a case where db logic was the issue. To be honest, I haven't seen many migrations at all. Why? Because I think it rarely ever happens. If I had to do one, I sure as heck would hope that the db was "clean" and understandable without having to review some probably awful application code. Why would anyone migrate *away* from PG anyway? :)

One advantage to using logic and functions in the db is that you can fix things immediately without having to make new application builds. That in itself is a huge advantage, IMO.

Cheers,
-g
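P.S. A rough sketch of the kind of thing I mean about encapsulating gotchas in the db itself. The tables, columns, and the rule here are all made up; the point is that the non-obvious rule is spelled out once, in the database, so it holds no matter which application writes to it.

CREATE TABLE account (
    account_id bigint PRIMARY KEY,
    closed_on  date          -- NULL while the account is open
);

CREATE TABLE customer_order (
    order_id   bigserial PRIMARY KEY,
    account_id bigint NOT NULL REFERENCES account,
    ordered_on date   NOT NULL DEFAULT current_date
);

CREATE FUNCTION check_order_account() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- The gotcha, stated once: no orders dated on or after the day
    -- the account was closed.
    IF EXISTS (SELECT 1
               FROM account a
               WHERE a.account_id = NEW.account_id
                 AND a.closed_on IS NOT NULL
                 AND NEW.ordered_on >= a.closed_on) THEN
        RAISE EXCEPTION 'account % is closed; cannot take an order dated %',
              NEW.account_id, NEW.ordered_on;
    END IF;
    RETURN NEW;
END;
$$;

CREATE TRIGGER customer_order_check
    BEFORE INSERT OR UPDATE ON customer_order
    FOR EACH ROW EXECUTE PROCEDURE check_order_account();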
Re: Rationale for aversion to the central database?
On April 24, 2018 07:27:59 am PDT, "Sam Gendler" wrote:

On Sun, Apr 8, 2018 at 15:37 g...@luxsci.net wrote:

On April 8, 2018 02:40:46 pm PDT, "Guyren Howe" <guy...@gmail.com> wrote:
One advantage to using logic and functions in the db is that you can fix things immediately without having to make new application builds. That in itself is a huge advantage, IMO.

I doubt most of us would consider this any kind of advantage outside of the momentary temptation to do it when an app is completely broken and needs to be back up in a hurry. Application changes, whether in the db or in application logic, need to be tested, and they need to be revision controlled and released in a manner that can be easily rolled back in an automated way. The fact that putting logic in the database can effectively allow developers to make unreleased changes to production apps is specifically one of the problems I am trying to avoid when I keep most logic in the app instead of the db. It's a whole lot harder to make arbitrary manual changes to code in the app, whether interpreted or compiled, if it is running inside a container that cannot be updated. Even if you go in with a shell and update an interpreted file, the next time that container is launched the change will be lost, which is usually sufficient motivation to keep devs from doing that kind of thing.

I'll put some things in the db, either for performance or because I want that logic to be built into the data and not be part of the application, but I choose those contexts carefully and I write them in as portable a manner as possible.

And for those who say migrations don't happen: I've certainly been through a few, usually as part of an acquisition or the like, but sometimes simply because another db server better meets our needs after a time. And migrating stored procs can be really difficult. Such code usually has less complete unit and integration tests, which makes validating those changes more difficult, too.

But the biggest reason is that databases often have to scale up rather than out, so keeping as much logic as possible in the application code keeps my scaling requirements for the db server as minimal as possible. Sure, there are workloads where pushing raw data across the wire will be more work than processing it in the db, and in those cases I may do that, but I consider it premature optimization to just assume that is necessary without hard evidence from production examples to suggest otherwise.

Finally, there's the consistency argument. I want to find all of the logic in one place: either entirely in the source code or entirely in the db. Having to trace things from the code to the db and back again can make it a whole lot harder to see, at a glance, what is happening in the code.

Having logic in the db also means it can be difficult or impossible to have two releases talking to the same schema version at the same time, so canary builds and rolling deployments can be difficult. Of course, schema changes can cause this problem regardless of whether there are stored procs, but the more of your logic lives in the db, the more likely it is that your releases will conflict over the db. So I'm more likely to be able to do a rolling release if I keep the db as a dumb data store and keep the logic in the application code.

===

I could have worded that better, but I think that we're coming at it from different directions. You think of your application as the "master" operator. I think of the PG db as the "master", not a slave.
I believe that we shouldn't _have_ to use an external application for the database to be useful and coherent. I like to think of external applications as subservient to the db and not the other way around. Yeah, I know, probably not a popular viewpoint.

Sorry, I don't really understand why it would be so hard to migrate, say, PL/pgSQL functions. You can, however, expect to write some likely convoluted application code to replace them. :)

Reusable, solid functions in the db also mean that developers don't have to reinvent the wheel in whatever language, and debugging becomes simpler too. Perhaps I'm extreme. In my ideal world, developers might not even know table names! I'm kidding, sorta...

Thanks much for your comments,
Gerry
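P.S. A rough sketch of the kind of reusable, db-side interface I have in mind. All of the names are invented, and the commented grants at the end are just one way (with SECURITY DEFINER) to make the functions the only way in for the application role.

CREATE TABLE customer (
    customer_id bigserial PRIMARY KEY,
    name        text NOT NULL,
    email       text NOT NULL UNIQUE
);

CREATE FUNCTION create_customer(p_name text, p_email text)
RETURNS bigint
LANGUAGE plpgsql SECURITY DEFINER AS $$
DECLARE
    v_id bigint;
BEGIN
    -- Normalisation and validation live here, once, for every application.
    INSERT INTO customer (name, email)
    VALUES (p_name, lower(p_email))
    RETURNING customer_id INTO v_id;
    RETURN v_id;
END;
$$;

CREATE FUNCTION find_customer_by_email(p_email text)
RETURNS SETOF customer
LANGUAGE sql STABLE SECURITY DEFINER AS $$
    SELECT * FROM customer WHERE email = lower(p_email);
$$;

-- SECURITY DEFINER means the functions run with the owner's rights,
-- so a hypothetical app_role needs no privileges on the table itself:
-- REVOKE ALL ON customer FROM PUBLIC;
-- GRANT EXECUTE ON FUNCTION create_customer(text, text),
--                           find_customer_by_email(text) TO app_role;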
Re: Rationale for aversion to the central database?
On April 28, 2018 11:18:02 am PDT, "Peter J. Holzer" wrote:

On 2018-04-28 09:54:27 -0500, Steven Lembark wrote:
> On Sat, 28 Apr 2018 08:02:21 +0200, "Peter J. Holzer" wrote:
> > On 2018-04-27 22:52:39 +, g...@luxsci.net wrote:
> > > Perhaps I'm extreme. In my ideal world, developers might not even
> > > know table names! I'm kidding, sorta...
> >
> > If they don't know the table names, how can they write those stored
> > procedures?
>
> One of the main reasons DBAs need to be pigs. Much spaghetti can be
> avoided in ORM frameworks with well-designed views.

Apparently my remark was too short (and perhaps had too little context) to be clear. As I understood g...@luxsci.net, they were arguing for writing lots of stored procedures so that developers wouldn't even have to know the table names; they would just call the procedures. [Although I now see that it was somebody else who proposed that as the only interface.]

But those stored procedures don't write themselves. Somebody has to write them, and a person who writes code is called a developer. So there needs to be at least one developer who knows the table names: the developer of the stored procedures. And that developer had better be good. API design is hard, and if you take away SQL from your fellow developers you should give them something better (for the task at hand), not something worse.

===

Yes, sorry, I meant application developers. Well, actually I meant developers who only look at the db through an ORM and who like it that way.

At my current position, the ones called "developers" all use Django, and they practically never look at (let alone interact with) the db using psql. Forget about writing and using PL/pgSQL functions. There are no DBAs or database developers. I am trying to change this culture a bit, but I may give up soon. Tragic, really, given the mostly terrible performance that they get with naive Django code.

Some of these developers have even gone so far as to proclaim that PG is too slow (they didn't even configure it!) for their "big" data, which really isn't big, and have moved on to something like ES, which they really don't need. For example, they have code that could benefit from simple FTS operations and, instead of taking seconds or minutes, could run in milliseconds. Or perhaps they could not execute certain queries for user interfaces quickly enough with their Django code, so they think that they need ES for those queries. So now they have to manage and design their ES implementations, activities which frankly should _not_ be taken lightly, and also manage their data in the PG db, which they eventually come back to for at least some sane data integrity. Now they have to run very long and convoluted processes to keep the two in sync. Ouch. Know your tools!

But it's not all or nothing, as others have pointed out. Some things are better done and more easily managed outside of the db, at least for me. For example, I tried doing lots of bigint math in PL/pgSQL and moved to external C code for that. That C code still called tried-and-true PL/pgSQL convenience functions for getting at the correct data, however, and these were the same functions that my Java code used for other activities.

Gerry
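P.S. A rough sketch of the kind of built-in FTS I mean; the table and all names are invented. A stored tsvector column maintained by a trigger, plus a GIN index, turns naive substring scans into indexed lookups that typically come back in milliseconds at these data sizes.

CREATE TABLE document (
    doc_id bigserial PRIMARY KEY,
    title  text NOT NULL,
    body   text NOT NULL,
    tsv    tsvector
);

CREATE FUNCTION document_tsv_update() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- Title matches are weighted higher than body matches.
    NEW.tsv := setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A')
            || setweight(to_tsvector('english', coalesce(NEW.body,  '')), 'B');
    RETURN NEW;
END;
$$;

CREATE TRIGGER document_tsv_trg
    BEFORE INSERT OR UPDATE ON document
    FOR EACH ROW EXECUTE PROCEDURE document_tsv_update();

CREATE INDEX document_tsv_idx ON document USING gin (tsv);

-- Typical ranked query:
-- SELECT doc_id, title, ts_rank(tsv, q) AS rank
-- FROM document, to_tsquery('english', 'postgres & performance') AS q
-- WHERE tsv @@ q
-- ORDER BY rank DESC
-- LIMIT 20;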