Hi all,

in a desperate attempt to make up for my lack of a CS education I've started taking some online classes about algorithms and data structures. While I'm still far from knowledgeable, I've at least started to get an idea about the *questions* and the relevance of the topic (for example that 20 lines of code, or even just the choice of a better built-in data structure, can make the difference between getting the result in a few seconds or never in my lifetime). And I've started trying to transfer these ideas to my use of LilyPond.

Of course more questions of this kind will follow, but for now I've stumbled over one issue. Recently I implemented a first version of a setting where all grobs have an 'id attached (this was done externally, with data converted from Sibelius files). In addition I created an alist that takes an id as key and some property overrides (again an alist) as value. In an after-line-breaking callback I check whether the alist has an entry for the current grob's id and, if so, apply the overrides. So at the after-line-breaking stage this iterates over all grobs and performs an alist lookup for each of them. It worked surprisingly well for injecting overrides into grobs by their id.

However, my recent learning leaves me with the following question. What I *do* know: iterating over all grobs is necessary and takes linear time (to avoid that I would have to make sure that only those grobs get an id that I actually want to affect; unfortunately that's not possible). And since an alist is a linked list, each lookup also takes time linear in the number of entries in the worst case. So the running time of this computation is O(n*m), with n being the number of ids and m the number of override entries. It could be improved by using a different data structure for the overrides, e.g. a (sorted) array or a hash table. Schematically, the current approach and the hash-table variant look like the two sketches below (not my actual code; ids, properties and helper names are made up).
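First, the alist version, roughly as described above (assuming the id lives in the grob's 'id property; my-overrides and apply-id-overrides are hypothetical names):

  ;; Hypothetical override table: id -> alist of (property . value) pairs.
  (define my-overrides
    '(("note-23" . ((color . (1.0 0.0 0.0))
                    (font-size . -2)))
      ("slur-7"  . ((thickness . 3)))))

  ;; after-line-breaking callback: look up the grob's id in the alist
  ;; and apply any overrides found.  assoc is linear in the number of
  ;; entries, and a miss traverses the whole list.
  (define (apply-id-overrides grob)
    (let* ((id (ly:grob-property grob 'id))
           (entry (assoc id my-overrides)))
      (if (pair? entry)
          (for-each
           (lambda (ov)
             (ly:grob-set-property! grob (car ov) (cdr ov)))
           (cdr entry)))))

attached with something like \override NoteHead.after-line-breaking = #apply-id-overrides.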
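And this is roughly what the hash-table variant could look like. Guile's built-in hash tables give expected constant-time lookup, so the per-grob cost no longer grows with the number of override entries:

  ;; The same data in a Guile hash table, built once from the alist above.
  (define my-overrides-table (make-hash-table))

  (for-each
   (lambda (entry)
     (hash-set! my-overrides-table (car entry) (cdr entry)))
   my-overrides)

  ;; Expected constant-time lookup per grob instead of a list traversal.
  (define (apply-id-overrides/hash grob)
    (let ((overrides (hash-ref my-overrides-table
                               (ly:grob-property grob 'id)
                               #f)))
      (if overrides
          (for-each
           (lambda (ov)
             (ly:grob-set-property! grob (car ov) (cdr ov)))
           overrides))))

Populating the table is a one-time linear cost; after that each lookup is independent of the number of entries.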
What I do *not* know is: would that be worth it in real life (apart from the educational value)? In that project I had a song of a few pages containing about 5,600 IDs. It's not clear how many overrides it will get when finished, but let's assume it grows to a few hundred and take 100 as a hypothetical reference. That makes about 5,600 alist lookups in a list of 100 elements. In 5,500 of those cases no match is found, which means the list is traversed fully, so that makes 550,000 individual comparisons, plus the 100 matches, where we can take m/2 as the average (= 5,000 more comparisons). My gut feeling is that's really a lot of unnecessary work, and one should try to improve where possible. OTOH I don't really have substantial experience with this topic and don't know how it relates to what's going on in the whole compilation process.

And what if I consider using such a setting for a full symphony, with maybe 100,000 grobs and 1,000 overrides? Would I then have to switch to a better data structure? Would I then have to think about a totally different approach?

Any food for thought, educated guesses, advice?

TIA
Urs

--
u...@openlilylib.org
https://openlilylib.org
http://lilypondblog.org