On 2023-01-16 09:12:30 +1300, dn via Python-list wrote: > On 16/01/2023 08.36, Weatherby,Gerard wrote: > > I think any peformance improvements would have to come from a language > > change or better indexing of the data. > Expanding on @Peter's post: databases (relational or not) are best organised > according to use.
I probably wasn't very clear here. I think one of the main advantages of relational databases over earlier models (especially hierarchical databases but also network databases) is that you *don't* have to do that. In a hierarchical database you have to decide what's higher or lower in the hierarchy (e.g., does a department have employees, or do employees have a department?) and that has a dramatic effect on performance so you must structure the database on which queries are expected to be more common. In a relational database employees and departments are at the same level and you have a relationship between them. You don't need to know whether you'll have to look up all employees of a given department or the department of a given employee more often. Both will be similarly fast with appropriate indexes. (Of course you should still have an idea what the data is used for when designing your data model. But semantics are much more important than use cases at this stage and you don't have to redesign your entire database just because you need a new query.) This flexibility comes with a cost, though: A relational database is almost always good enough, but almost never optimal for any given use. > Postgres and MySQL (for example) enable the establishment of multiple and > sophisticated indices/indexes, and the aptly-named "views" of data. > > If the queries can be grouped according to the manner in which the data must > be accessed, a view could be built for each. At which time, even if every > row must be accessed, the retrieval will be made more efficient and/or the > response better-organised. Nitpick: A view will not be more efficient (unless it's a materialized view which is basically just a table). To retrieve that data, the RDBMS has to perform exactly the same operations as for the underlying query. Views make life simpler for the programmer (or analyst) by letting them *write* simpler queries, but under the surface nothing changes. In fact the performance may be worse, since the perceived simplicity of the view may cause the human to write queries which are hard to optimize. (There is another important use case for views: Access control and enforcement of business rules. But I fear we are straying far from the original topic now.) > Thus, if we have a DB of people. Many retrievals are likely to utilise an > index on 'name'. However, if at times interest is limited to place or > suburb, an index and view of such will speed things from O(n). Similarly, if > a query is only to return people with a dog license. I don't see where a view would help here - but maybe you are thinking of a more complex database than you describe. > Some programmers don't realise that SQL can also be used for calculations, > eg the eponymous COUNT(), which saves (CPU-time and coding-effort) over > post-processing in Python. Agreed. Aggregate and window functions can do a lot of work in the database. hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | h...@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
signature.asc
Description: PGP signature
-- https://mail.python.org/mailman/listinfo/python-list