Clint Byrum <cl...@fewbar.com> wrote:
> So I think I didn't speak clearly enough here. The benchmarks are of
> course needed, but there's a tipping point when write activity gets to
> a certain level where it's cheaper to let it get a little skewed and
> correct asynchronously. This is not unique to SQL, this is all large
> scale distributed systems. There's probably a super cool formula for it
> too, but roughly it is
>
> (num_trans_per_s * cost_of_fk_check_per_trans)
>
> versus
>
> (error_cost * error_rate)+(cost_find_all_errors/seconds_to_find_all_errors)

Well, the error cost here would be a database that is “corrupted”, meaning it has rows which no longer refer to things that exist, and the database is now in a state where it may very well be unusable by the application without being rolled back to some known state. If Keystone truly doesn’t care about ACID, it might want to consider MyISAM tables, which are faster for read-heavy workloads, though these aren’t compatible with Galera.

> So it's not really something I think one can blindly accept as "better",
> but rather something that one needs to calculate for themselves. You say
> cost_of_fk_check_per_trans is negligible, but that has been measured as
> not true in the past:
>
> http://www.percona.com/blog/2010/09/20/instrumentation-and-the-cost-of-foreign-keys/

That’s not a surprising case, because the “parent” row being modified is being referred to by a “child” row that’s still within an open transaction. This is an implementation detail of the ACID guarantees one gets when using a relational database (see the sketch further below). If Keystone’s relational backend in fact has a performance bottleneck due to an operation like this, it should be visited individually. But I think it’s extremely unlikely this is actually the case.

> That article demonstrates that the FK adds lock contention in
> InnoDB. There's more. With NDB (MySQL cluster) it's an 18% performance
> hit on raw throughput:
>
> http://johanandersson.blogspot.com/2013/06/benchmarking-performance-impact-of.html

For NDB cluster, foreign key support was only added to that system two years ago, in version 5.6.10 in 2013. This is clearly not a system designed to support foreign keys in the first place; the feature is entirely bleeding edge for that specific system, and performance like that is entirely atypical for database systems outside of NDB cluster. Specifically with OpenStack, the clustering solution usually used is Galera, which has no such performance issue.

So sure, if you’re using NDB cluster, FOREIGN KEY support is bleeding edge and you may very well want to disable constraints, since you’re using a system that wasn’t designed with this use case in mind. But because using a relational database is somewhat pointless if you don’t need ACID, I’d probably use Galera instead.

> Now, where that point is with Keystone I don't know. The point is, if you
> write the code relying on the existence, Keystone becomes a vertically
> scaling app that cannot ever scale out beyond whatever that limit is.

There seems to be a misunderstanding that using foreign keys to enforce referential integrity implies that the application is now dependent on these constraints being in place. I’d also note that the conversation was originally talking a bit about allowing rows to be deleted using CASCADE, whereas my original question referred to the notion of foreign key use *overall*, not specifically as a means to offer automatic deletion of related rows with CASCADE.
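To make the Percona case above concrete, here is a rough sketch (not from either article) of the locking interaction, assuming a plain MySQL/InnoDB database at a made-up URL with hypothetical parent(id, counter) and child(id, parent_id REFERENCES parent(id)) tables: the FK check on an INSERT of a child row takes a shared lock on the referenced parent row, so a concurrent UPDATE of that same row has to wait.

    # hypothetical illustration of the FK check's shared lock; assumes a
    # MySQL/InnoDB database with tables parent(id, counter) and
    # child(id, parent_id REFERENCES parent(id)), and an existing parent row id=1
    from sqlalchemy import create_engine, text

    engine = create_engine("mysql+pymysql://user:pass@localhost/test")

    conn1 = engine.connect()
    conn2 = engine.connect()

    trans1 = conn1.begin()
    # transaction 1: the FK check on this INSERT takes a shared lock
    # on the referenced parent row (id=1)
    conn1.execute(
        text("INSERT INTO child (id, parent_id) VALUES (100, 1)"))

    trans2 = conn2.begin()
    try:
        # transaction 2: updating that same parent row needs an exclusive
        # lock, so this statement blocks; since transaction 1 never commits
        # in this sketch, InnoDB eventually raises a lock wait timeout
        conn2.execute(
            text("UPDATE parent SET counter = counter + 1 WHERE id = 1"))
    except Exception as err:
        print("blocked on the FK check's shared lock:", err)

    trans1.rollback()
    trans2.rollback()

As soon as the first transaction commits or rolls back, the UPDATE proceeds as usual; the contention is the FK check doing its job within ACID, not a defect in the schema.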
The use of foreign key constraints in OpenStack applications does not imply an unbreakable reliance upon them at all, for two reasons.

First, foreign keys first and foremost offer nothing more than an integrity guarantee: they prevent a particular row from being deleted, or having its primary key modified, in a way that would leave other rows referring to that primary key with an invalid reference. At this level, you can have an application that is working perfectly, shut off the foreign key constraints entirely, and the application will continue to work perfectly with no change; the only difference is that if the application at some point fails, due to bugs present or newly introduced, or if an end user tinkers with the database directly, operations which would leave referencing rows hanging will no longer be blocked. So the use of foreign keys at this level has nothing to do with the application relying upon their existence; they are nothing more than integrity guarantees which can be silently removed.

Second, within the realm of ON DELETE CASCADE specifically, Keystone, like all other OpenStack applications, uses SQLAlchemy for relational persistence, and SQLAlchemy handles foreign key constraints with or without ON DELETE CASCADE transparently. Whether or not a foreign key can be relied upon to delete dependent rows is nothing more than a configuration option in the mappings: SQLAlchemy will continue to emit DELETE statements for rows that are locally mirrored in memory, but if it knows the foreign key has ON DELETE CASCADE set up, it will let the unloaded rows be deleted automatically. With the flag off, it will attempt to locate those rows as well. So in this situation, the use of ON DELETE CASCADE *improves* performance, by reducing database round trips.
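To make that last point concrete, here is a minimal sketch of the mapping switch described above. The Project / Assignment models are hypothetical, not Keystone’s actual mappings:

    # hypothetical models illustrating the ON DELETE CASCADE switch; these
    # are not Keystone's real mappings
    from sqlalchemy import Column, ForeignKey, Integer, String
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import relationship

    Base = declarative_base()

    class Project(Base):
        __tablename__ = "project"
        id = Column(Integer, primary_key=True)
        name = Column(String(64))

        # cascade="all, delete-orphan" handles children already loaded in
        # memory; passive_deletes=True tells the ORM not to load the rest,
        # trusting the FK's ON DELETE CASCADE to remove them in the database
        assignments = relationship(
            "Assignment",
            cascade="all, delete-orphan",
            passive_deletes=True,
        )

    class Assignment(Base):
        __tablename__ = "assignment"
        id = Column(Integer, primary_key=True)
        project_id = Column(
            Integer,
            ForeignKey("project.id", ondelete="CASCADE"),
            nullable=False,
        )

With this configuration, session.delete(project) followed by a commit emits DELETE statements only for the project row and any assignments already loaded in memory, letting the database remove the unloaded ones; with passive_deletes left at its default, the ORM first SELECTs the remaining assignment rows and deletes them itself. Flipping the flag (and the ondelete argument) is purely a mapping change; no application code changes either way.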