On Nov 18, 7:16 pm, Malcolm Tredinnick <[EMAIL PROTECTED]>
wrote:
> On Mon, 2008-11-17 at 11:07 -0800, Gustavo Picón wrote:
>
> [...]
>
> > So the usual recommendation is:
>
> >  - if you're going to insert a lot more than you read, use adjacency
> > list
> >  - if, as is the most common case, you're going to read your tree more
> > than you insert nodes, use nested sets or materialized path
>
> Size matters, though. Your numbers are for a lot more than 500 nodes and
> I'll repeat my earlier claim that for something that small you just
> aren't going to lose a lot of performance (in fact, depending on what
> you're doing with the data, you'll often gain speed) by doing a naive
> select of almost everything (or everything restricted by parent name,
> say) and then just working Python. You cut out a bunch of database level
> comparisons at that point.

That's debatable. For instance, for such a small table, both nested
sets and
materialized path trees will return sets ordered by an index that is
so
small that will fit completely in RAM. Databases are VERY good at
this, and
both nested sets and materialized paths take advantage of this fact.
There is no need for database level comparisons.

And I agree that you could order your trees in python, but you'll
waste
resources. Granted, not much for 500 records, but if you use this
approach
in a site with traffic you will be dependant on caching much earlier
than
would be really needed if you did things _right_ (just look at the
resource
hog wordpress is right now for design decisions like this).

A good suggestion would be, as usual: MEASURE, know your problem and
set
goals. That's why I published the results of these benchmarks (and the
benchmark sources themselves): to help people decide.


> Your benchmarks match what other people see with those various
> implementation strategies, so they look believable (and the comparisons
> between the various database engines look believable, too), so I think
> they're good measures, but at various ends of the scale (and, let's be
> serious here: 500 is small), alternative strategies work very well.

The comparison between the database engines would be better if I could
make mysql run entirely on RAM, like I did with postgres82/83 and
sqlite.
Any suggestion on how to do this is very welcome.


> Don't misunderstand me: I think treebeard is an interesting project and
> having multiple implementations is a good idea both for drop-in
> practicality and as reference code. I think it's great what you've done.
> Trees are a fascinating storage structure and there are lots of
> trade-offs involved in code complexity and performance (insert vs.
> different retrieval patterns vs number of nodes, for example), so
> anybody picking just one implementation is going to be disappointed at
> some point.

Agreed, and that's kinda why treebeard exists: I needed a materialized
path
tree, but the only tree library available for django was MPTT (that
uses
nested sets). So I wrote mine. Later I added AL and NS trees.

I made some decisions while writing these backends, there are useful
optimizations that I just didn't apply because each one has a
drawback,
for instance:

 - Adding a `level` field to AL nodes, would speed up some reads, but
   would make writes slower, and writes are the ONLY advantage of
using
   AL trees.
 - Adding holes between nodes (spread) in a nested sets tree. This
makes
   writes/deletes faster (these are _very_ expensive in NS trees), but
then
   some read operations would be slower.

So I decided that I would just write a classic backend for these, and
in the
future write the optimizations as subclasses from the original AL/NS
trees.


> (By the way, are those benchmark numbers averages for each run over the
> 1000 nodes, or totals for the "several times" that you did the
> retrievals?)

It's the time taken to retrieve every node in the tree (1000 nodes) 10
times.

Regards.

- tabo
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to