Hello Jigar, I think if someone wants to do a fair comparison of performance and cost, a detailed analysis of multiple cases, users, usage patterns and statistics if needed - rather than anecdotal evidence and single installation done not following the best practices (i.e. pgbouncer).
The statement "cost is higher" is an interesting anecdotal evidence of a single case, and there are multiple options of running Postgres and MysQL, so comparing costs is rather difficult. It might or might not be true in concrete cases. I generally try to avoid making judgement based on very limited data and generalise based on a single case - it's often misleading and dangerous to draw such conclusions. And publicly stating them as "facts" is a bit of misinformation. That's why we avoid making a "definite" statement like "Postgres is slower" and "Postgres is not good at all". We try to judge stats and signals we have and draw conclusions when we have statistically significant data - that makes it more likely we make better judgment. I find it quite amusing how someone could reliably draw such conclusions based on a single example. Sounds a bit early and skewed and I'd recommend avoiding it. On the other hand the locking problems are based on statistical comparison of our issues that are reported and surveys we run every year. Last one - https://docs.google.com/forms/d/1wYm6c5Gn379zkg7zD7vcWB-1fCjnOocT0oZm-tjft_Q/viewanalytics - where 75% of our users use Postgres. If you look at a number of locking issues in Postgres compared to MySQL (similar to what Damian saw) - you can get your numbers and it's about 10:1 for MySQL - so if you combine the two, the average ratio of Mysql vs. Postgres is 40:1 - quite high to be statistically significant, so the statement "MySQL has problems with locking" is pretty justified. But If you would like to lead that effort and do it Jigar - if there is no-one who has done such statistics, it would be great if you lead it and get such statistics to back-up your statements. That would be an awesome decision point for the community if someone had done such reliable statistically significant analysis. J. On Mon, Oct 14, 2024 at 9:19 PM Jigar Parekh <ji...@vizeit.com> wrote: > I have observed that allocating right amount of memory-cpu helps to avoid > issues with MySQL > > Jigar > > > On Oct 14, 2024, at 11:51 AM, Damian Shaw <ds...@striketechnologies.com> > wrote: > > > > I can't speak to general performance between MySQL and PostgreSQL, but > I can tell you a real-world specific issue I we have faced being on MySQL > for Airflow. > > > > MySQL locking and Airflow's rendering of task fields do not play nice, > and we see many errors (the relevant GitHub issue: > https://github.com/apache/airflow/issues/32342). It has led us to avoid > using Dynamic Task Mapping, as well as nesting of Task Groups, while we are > still on MySQL, which is a shame because these features would be a natural > fit to solve some of our problems. > > > > Damian > > > > > > > > -----Original Message----- > > From: Jigar Parekh <ji...@vizeit.com> > > Sent: Monday, October 14, 2024 2:35 PM > > To: dev@airflow.apache.org > > Subject: Re: [DISCUSSION] Plans for Database backend > > > > In my understanding, the slower performance of PostgreSQL is a known > behavior for write intensive applications. PGBouncer used for connection > pooling cannot change/improve that. And Airflow with multiple DAGs and/or > dynamic tasks with heavy workload will be write intensive. I have done > extensive tests and have collected database statistics that show the > bottleneck. I can share the details if you would like to see. I am > wondering if there are any load tests or comparisons done so far by anyone > in the community. Unless I am doing something totally wrong, PostgreSQL > actually does not seem to be a right choice at all; slower performance and > higher cost (due to PGBouncer process) compared to MySQL > > > > Jigar > > > >> On Oct 14, 2024, at 2:41 AM, Jarek Potiuk <ja...@potiuk.com> wrote: > >> > >> MySQL is not going away. You can use it if you want. We have no plans > >> to remove it. > >> > >> The advice did not change. Postgres is generally more stable that's > >> why we recommend it. MySQL has much worse locking behaviour that is > >> somewhat unpredictable and - especially when you use backfills - it is > >> known to generate occasional deadlocks. This is likely why you got the > >> advice (quite likely by me). > >> > >> But in general it's fully supported and works and we have no plans to > >> get rid of it. > >> > >> Re: cost/price - it's up to you to choose and pay for services you use > >> - we do not compare pricing of all possible available services. > >> > >> Also if you follow the best practices for Postgres (use pgbouncer: > >> https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-dat > >> abase.html#setting-up-a-postgresql-database), > >> it's unlikely you will have slower postgres with similar machine/setup. > >> > >> J. > >> > >> > >>>> On Mon, Oct 14, 2024 at 7:38 AM Jigar Parekh <ji...@vizeit.com> > wrote: > >>> > >>> Back in June’24, I had a discussion on Slack about a database issue. > >>> My database backend for the Airflow instance is MySQL. It was > >>> recommended to migrate to PostgreSQL to resolve such issues. I was > >>> also told that MySQL may not be supported in the future versions. I > >>> configured PostgreSQL and performed few tests to compare both the DBs > >>> with the type of heavy workload I was expecting for my airflow > >>> instance in the production environment. The test results did not show > >>> a reason to switch from MySQL to PostgreSQL. In fact, PostgreSQL > >>> performed slower and airflow configuration cost more compared to > >>> MySQL. I wanted to start this discussion to find out if others have > >>> any similar observations about Postgres and what is Airflow community > planning to do about MySQL support in the future versions? > >>> > >>> Jigar > >>> --------------------------------------------------------------------- > >>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > >>> For additional commands, e-mail: dev-h...@airflow.apache.org > >>> > >>> > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > > For additional commands, e-mail: dev-h...@airflow.apache.org > > > > ________________________________ > > Strike Technologies, LLC (“Strike†) is part of the GTS family of > companies. Strike is a technology solutions provider, and is not a broker > or dealer and does not transact any securities related business directly > whatsoever. This communication is the property of Strike and its > affiliates, and does not constitute an offer to sell or the solicitation of > an offer to buy any security in any jurisdiction. It is intended only for > the person to whom it is addressed and may contain information that is > privileged, confidential, or otherwise protected from disclosure. > Distribution or copying of this communication, or the information contained > herein, by anyone other than the intended recipient is prohibited. If you > have received this communication in error, please immediately notify Strike > at i...@striketechnologies.com, and delete and destroy any copies hereof. > > ________________________________ > > > > CONFIDENTIALITY / PRIVILEGE NOTICE: This transmission and any > attachments are intended solely for the addressee. This transmission is > covered by the Electronic Communications Privacy Act, 18 U.S.C ''2510-2521. > The information contained in this transmission is confidential in nature > and protected from further use or disclosure under U.S. Pub. L. 106-102, > 113 U.S. Stat. 1338 (1999), and may be subject to attorney-client or other > legal privilege. Your use or disclosure of this information for any purpose > other than that intended by its transmittal is strictly prohibited, and may > subject you to fines and/or penalties under federal and state law. If you > are not the intended recipient of this transmission, please DESTROY ALL > COPIES RECEIVED and confirm destruction to the sender via return > transmittal. > > > B‹KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKCB• È > [œÝXœØÜšX™K K[XZ[ ˆ ]‹][œÝXœØÜšX™P Z\™› ݢ\ XÚ K›Ü™ÃB‘›Üˆ Y ] [Û˜[ > ÛÛ[X[™ Ë K[XZ[ ˆ ]‹Z [ Z\™› ݢ\ XÚ K›Ü™ÃB > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > For additional commands, e-mail: dev-h...@airflow.apache.org > >