Re: SP-GiST confusing introductory paragraph

Alex Matchneer Tue, 17 Oct 2023 03:59:35 -0700

I realized later that "these data structures" was referring to
"non-balanced data structures, such as quad-trees, k-d trees, and radix
trees (tries)" from the previous paragraph, and not SP-GiST, which isn't a
data structure per se. I feel a bit silly about this confusion and am not
sure others would have the same confusion I did.


Alex

On Mon, Oct 16, 2023 at 11:09 PM Alvaro Herrera <alvhe...@alvh.no-ip.org>
wrote:

> On 2023-Oct-16, PG Doc comments form wrote:
>
> > I'm confused by this paragraph:
> >
> > > These popular data structures were originally developed for in-memory
> > > usage. In main memory, they are usually designed as a set of
> dynamically
> > > allocated nodes linked by pointers. This is not suitable for direct
> storing
> > > on disk, since these chains of pointers can be rather long which would
> > > require too many disk accesses. In contrast, disk-based data structures
> > > should have a high fanout to minimize I/O. The challenge addressed by
> > > SP-GiST is to map search tree nodes to disk pages in such a way that a
> > > search need access only a few disk pages, even if it traverses many
> nodes.
> >
> > In particular, "These popular datastructures" is ambiguous -- based on
> how
> > the previous paragraph ends, it sounds like the "popular datastructures"
> or
> > SP-GiSTs, but then it goes on to say they were designed for in-memory,
> and
> > then it mentions that SP-GiST's space partitioning (with high fanout) is
> > more appropriate to minimize disk access. I think maybe the solution here
> > would be to replace "These popular data structures" with something like
> > "Classic Postgres indexes such as B-tree indexes..." or something like
> > that.
>
> Yeah, this paragraph is a rewording of Oleg's[1]
>
>    SP-GiST is an abbreviation of space-partitioned GiST - the search
>    tree, which allows to implement a wide range of different
>    non-balanced disk-based data structures, such as quadtree, kd-tree,
>    tries - popular data structures, originally developed for memory
>    storage. Main memory access structures usually designed as a set of
>    dynamically allocated nodes linked by pointers, which is not suitable
>    for direct storing on disk, since these chains of pointers can be
>    rather long and require too many disk accesses. In opposite, disk
>    based data structures have a high fanout to minimize I/O. The
>    challenge is to map nodes of tree to disk pages in such a way, so
>    search algorithm accesses only a few disk pages, even if it traverse
>    many nodes.
>
> where the point is (IMO) much clearer.
>
> [1] http://www.sai.msu.su/~megera/wiki/spgist_dev
>
> > Also, I think a short parenthetical definition of "fanout" would be
> useful
> > here, something like "high fanout (i.e. where each node has a potentially
> > large number of child nodes that reside in the same disk page)". Took me
> > some googling to realize what fanout meant in this context.
>
> Hmm.  We also use the term (hypenated as fan-out) in the reference to
> recursive_worktable_factor.  Maybe we should list it in the glossary in
> a way that works for both.
>
> --
> Álvaro Herrera         PostgreSQL Developer  —
> https://www.EnterpriseDB.com/
> "La primera ley de las demostraciones en vivo es: no trate de usar el
> sistema.
> Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen)
>

Re: SP-GiST confusing introductory paragraph

Reply via email to