Philip Semanchuk wrote:
So there *may* be some evidence that joins are indeed bad in practice.
If someone has smth specific/interesting on the subject, please post.
It's an unprovable assertion, or a meaningless one depending on how one
defines the terms. You could also say "there *may* be some evidence that
Python lists are bad in practice". Python lists and SQL JOINs are like
any part of a language or toolkit. They're good tools for solving
certain classes of problems. They can also be misapplied to problems
that they're not so good at. Sometimes they're a performance bottleneck,
even when solving the problems for which they're best. Sometimes the
best way to solve a performance bottleneck is to redesign your
app/system so you don't need to solve that kind of problem anymore
(hence the join-less databases). Other times, the cure is worse than the
disease and you're better off throwing hardware at the problem.
Look, I completely agree with what you're saying, but: that doesn't
change the possibility that joins may be expensive in comparison to
other SQL operations. This is the phrase I should have used perhaps;
'expensive in comparison with other SQL operations' instead of 'bad'.
Example from my app, where I behaved "by the book" (I hope) and
normalized my data:
$ time echo "\c hrs;
SELECT hosts.ip, reservation.start_date, architecture.architecture,
os_kind.os_kind, os_rel.os_rel, os_version.os_version, project.project,
email.email FROM hosts
INNER JOIN project ON project.id = hosts.project_id
INNER JOIN architecture ON hosts.architecture_id = architecture.id
INNER JOIN os_kind ON os_kind.id = hosts.os_kind_id
INNER JOIN os_rel ON hosts.os_rel_id = os_rel.id
INNER JOIN os_version ON hosts.os_version_id = os_version.id
INNER JOIN reservation_hosts ON hosts.id = reservation_hosts.host_id
INNER JOIN reservation on reservation.id =
reservation_hosts.reservation_id
INNER JOIN email ON reservation.email_id = email.id
;" | psql > /dev/null
real 0m0.099s
user 0m0.015s
sys 0m0.005s
$ time echo "\c hrs;
> SELECT hosts.ip FROM hosts;
> SELECT reservation.start_date FROM reservation;
> SELECT architecture.architecture FROM architecture;
> SELECT os_rel.os_rel FROM os_rel;
> SELECT os_version.os_version FROM os_version;
> SELECT project.project FROM project;
> SELECT email.email FROM email;
> " | psql > /dev/null
real 0m0.046s
user 0m0.008s
sys 0m0.004s
Note: I've created indexes on those tables, both on data columns like
hosts.ip and on .id columns.
So yes, joins apparently are at least twice as expensive as simple
selects without joins, on a small dataset. Not a drastic increase in
cost, but smth definitely shows.
It would be interesting to see what happens when row numbers increase to
large numbers, but I have no such data.
Regards,
mk
--
http://mail.python.org/mailman/listinfo/python-list