I have shared only observations based on my load test, I am happy to discuss any specifics if anyone is interested
Jigar > On Oct 14, 2024, at 12:51 PM, Jarek Potiuk <ja...@potiuk.com> wrote: > > Hello Jigar, > > I think if someone wants to do a fair comparison of performance and cost, a > detailed analysis of multiple cases, users, usage patterns and statistics > if needed - rather than anecdotal evidence and single installation done not > following the best practices (i.e. pgbouncer). > > The statement "cost is higher" is an interesting anecdotal evidence of a > single case, and there are multiple options of running Postgres and MysQL, > so comparing costs is rather difficult. It might or might not be true in > concrete cases. > > I generally try to avoid making judgement based on very limited data and > generalise based on a single case - it's often misleading and dangerous to > draw such conclusions. And publicly stating them as "facts" is a bit of > misinformation. That's why we avoid making a "definite" statement like > "Postgres is slower" and "Postgres is not good at all". We try to judge > stats and signals we have and draw conclusions when we have > statistically significant data - that makes it more likely we make better > judgment. > > I find it quite amusing how someone could reliably draw such conclusions > based on a single example. Sounds a bit early and skewed and I'd recommend > avoiding it. > > On the other hand the locking problems are based on statistical > comparison of our issues that are reported and surveys we run every year. > Last one - > https://docs.google.com/forms/d/1wYm6c5Gn379zkg7zD7vcWB-1fCjnOocT0oZm-tjft_Q/viewanalytics > - where 75% of our users use Postgres. If you look at a number of locking > issues in Postgres compared to MySQL (similar to what Damian saw) - you can > get your numbers and it's about 10:1 for MySQL - so if you combine the two, > the average ratio of Mysql vs. Postgres is 40:1 - quite high to be > statistically significant, so the statement "MySQL has problems with > locking" is pretty justified. > > But If you would like to lead that effort and do it Jigar - if there is > no-one who has done such statistics, it would be great if you lead it and > get such statistics to back-up your statements. That would be an awesome > decision point for the community if someone had done such reliable > statistically significant analysis. > > J. > >> On Mon, Oct 14, 2024 at 9:19 PM Jigar Parekh <ji...@vizeit.com> wrote: >> >> I have observed that allocating right amount of memory-cpu helps to avoid >> issues with MySQL >> >> Jigar >> >>> On Oct 14, 2024, at 11:51 AM, Damian Shaw <ds...@striketechnologies.com> >> wrote: >>> >>> I can't speak to general performance between MySQL and PostgreSQL, but >> I can tell you a real-world specific issue I we have faced being on MySQL >> for Airflow. >>> >>> MySQL locking and Airflow's rendering of task fields do not play nice, >> and we see many errors (the relevant GitHub issue: >> https://github.com/apache/airflow/issues/32342). It has led us to avoid >> using Dynamic Task Mapping, as well as nesting of Task Groups, while we are >> still on MySQL, which is a shame because these features would be a natural >> fit to solve some of our problems. >>> >>> Damian >>> >>> >>> >>> -----Original Message----- >>> From: Jigar Parekh <ji...@vizeit.com> >>> Sent: Monday, October 14, 2024 2:35 PM >>> To: dev@airflow.apache.org >>> Subject: Re: [DISCUSSION] Plans for Database backend >>> >>> In my understanding, the slower performance of PostgreSQL is a known >> behavior for write intensive applications. PGBouncer used for connection >> pooling cannot change/improve that. And Airflow with multiple DAGs and/or >> dynamic tasks with heavy workload will be write intensive. I have done >> extensive tests and have collected database statistics that show the >> bottleneck. I can share the details if you would like to see. I am >> wondering if there are any load tests or comparisons done so far by anyone >> in the community. Unless I am doing something totally wrong, PostgreSQL >> actually does not seem to be a right choice at all; slower performance and >> higher cost (due to PGBouncer process) compared to MySQL >>> >>> Jigar >>> >>>> On Oct 14, 2024, at 2:41 AM, Jarek Potiuk <ja...@potiuk.com> wrote: >>>> >>>> MySQL is not going away. You can use it if you want. We have no plans >>>> to remove it. >>>> >>>> The advice did not change. Postgres is generally more stable that's >>>> why we recommend it. MySQL has much worse locking behaviour that is >>>> somewhat unpredictable and - especially when you use backfills - it is >>>> known to generate occasional deadlocks. This is likely why you got the >>>> advice (quite likely by me). >>>> >>>> But in general it's fully supported and works and we have no plans to >>>> get rid of it. >>>> >>>> Re: cost/price - it's up to you to choose and pay for services you use >>>> - we do not compare pricing of all possible available services. >>>> >>>> Also if you follow the best practices for Postgres (use pgbouncer: >>>> https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-dat >>>> abase.html#setting-up-a-postgresql-database), >>>> it's unlikely you will have slower postgres with similar machine/setup. >>>> >>>> J. >>>> >>>> >>>>>> On Mon, Oct 14, 2024 at 7:38 AM Jigar Parekh <ji...@vizeit.com> >> wrote: >>>>> >>>>> Back in June’24, I had a discussion on Slack about a database issue. >>>>> My database backend for the Airflow instance is MySQL. It was >>>>> recommended to migrate to PostgreSQL to resolve such issues. I was >>>>> also told that MySQL may not be supported in the future versions. I >>>>> configured PostgreSQL and performed few tests to compare both the DBs >>>>> with the type of heavy workload I was expecting for my airflow >>>>> instance in the production environment. The test results did not show >>>>> a reason to switch from MySQL to PostgreSQL. In fact, PostgreSQL >>>>> performed slower and airflow configuration cost more compared to >>>>> MySQL. I wanted to start this discussion to find out if others have >>>>> any similar observations about Postgres and what is Airflow community >> planning to do about MySQL support in the future versions? >>>>> >>>>> Jigar >>>>> --------------------------------------------------------------------- >>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >>>>> For additional commands, e-mail: dev-h...@airflow.apache.org >>>>> >>>>> >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >>> For additional commands, e-mail: dev-h...@airflow.apache.org >>> >>> ________________________________ >>> Strike Technologies, LLC (“Strike†) is part of the GTS family of >> companies. Strike is a technology solutions provider, and is not a broker >> or dealer and does not transact any securities related business directly >> whatsoever. This communication is the property of Strike and its >> affiliates, and does not constitute an offer to sell or the solicitation of >> an offer to buy any security in any jurisdiction. It is intended only for >> the person to whom it is addressed and may contain information that is >> privileged, confidential, or otherwise protected from disclosure. >> Distribution or copying of this communication, or the information contained >> herein, by anyone other than the intended recipient is prohibited. If you >> have received this communication in error, please immediately notify Strike >> at i...@striketechnologies.com, and delete and destroy any copies hereof. >>> ________________________________ >>> >>> CONFIDENTIALITY / PRIVILEGE NOTICE: This transmission and any >> attachments are intended solely for the addressee. This transmission is >> covered by the Electronic Communications Privacy Act, 18 U.S.C ''2510-2521. >> The information contained in this transmission is confidential in nature >> and protected from further use or disclosure under U.S. Pub. L. 106-102, >> 113 U.S. Stat. 1338 (1999), and may be subject to attorney-client or other >> legal privilege. Your use or disclosure of this information for any purpose >> other than that intended by its transmittal is strictly prohibited, and may >> subject you to fines and/or penalties under federal and state law. If you >> are not the intended recipient of this transmission, please DESTROY ALL >> COPIES RECEIVED and confirm destruction to the sender via return >> transmittal. >>> >> B‹KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKCB• È >> [œÝXœØÜšX™K K[XZ[ ˆ ]‹][œÝXœØÜšX™P Z\™› ݢ\ XÚ K›Ü™ÃB‘›Üˆ Y ] [Û˜[ >> ÛÛ[X[™ Ë K[XZ[ ˆ ]‹Z [ Z\™› ݢ\ XÚ K›Ü™ÃB >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org >> For additional commands, e-mail: dev-h...@airflow.apache.org >> >> --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org For additional commands, e-mail: dev-h...@airflow.apache.org