Another solution, much simpler IMO:

you need an upper limit of active entries that your system can handle, say MAX. You define a vector of MAX elements, each element contains only a pointer and a length.

The pointer is zero, if no item is allocated (never used or deleted) and in the other case,
it points to the item. The length is zero or the actual length of the item.

The storage for the items is allocated with malloc (I suppose, you are using C) or a similar mechanism of other programming language, in any case: LE. This way, you have no concerns about fragmentation; this is done perfectly be LE storage allocation. You simply do malloc (etc.), when you insert an item
and free, when you delete it.

This said: when scanning the table and deleting items, you only move small (8 Byte) elements to "close the gaps", but not large items. New items are added at the end of the list of remaining items.

A very simple solution IMO, and you never move items. The storage of the items is reused because LE does it
(using the free ... malloc sequence).

See this link (PDF): http://bernd-oppolzer.de/stackheap.pdf
it explains the heap implementation of LE. The interesting part starts at page 39.

Kind regards

Bernd


Am 30.12.2024 um 16:09 schrieb Bernd Oppolzer:
The obvious question which comes to mind:

why not use DB2 global temp tables? They work exactly like I outlined it before, but you don't have to code it. They live in main storage, as long as the DB2 buffer pools are large enough ...

Of course, we didn't talk about environments and programming languages and so on ... don't know, if you have DB2 available etc. and if you have the permission to use DB2 temp tables.
We did it in the past, and we made best experiences.

Kind regards

Bernd


Am 30.12.2024 um 15:59 schrieb Bernd Oppolzer:
Some errors ... one line was erroneously doubled ... and some other corrections ...

Feel free to ask if there are question remaining or doubts about how the algorithm works ... IMO this is much the same what DB2 does to keep the movement of data in the buffers to a minimum ... the records are moved only when the fragmentation of free space in the buffer pages requires it ... and even then the external RIDs (or row ids) consisting of page number and slot index (that is,
position of the row in the items vector) remain constant.

It must be clear from the content in the items vector if a certain row is present (active) or not (deleted). If the beginning of the page is used by certain statistic information, a row offset of zero is not valid, so a zero content in the items vector could be used to mark a "deleted" or "never used" record.

HTH, regards

Bernd


Am 30.12.2024 um 15:47 schrieb Bernd Oppolzer:
Ok, I now understand better.

To avoid the movement of the items, I would use a design much like the DB2 data pages.

The items would be stored in a series of buffers of fixed size (say: 4k or 8k or 16k). The buffers are chained or addressed by a pointer vector ... if chained, the number of buffers is unlimited; if pointer vector, the overall size depends on the size of this pointer vector.

Every buffer has its own local management, which consists of
- the number of bytes occupied (total)
- the number of bytes free (total)
- a vector of relative offsets for the individual items (say: 256 items max)
- the pointer (offset) to the first free area
- in every free area the pointer to the next free area (if there is one)

Now:

if an item is deleted, its area is added to the free area queue and its entry in the individual items vector is zeroed; the overall free bytes number is incremented etc.

If an item is to be added to this area, the overall free bytes number is checked; if is it too low, another area has to be allocated and used. If not AND if none of the individual free areas is large enough, a reorganization of the page has to be done; that is: the individual free areas which are too short have to be combined by moving the used items to the top of the page, giving one large free area at the bottom.

The algorithm can be optimized in several ways,
for example, the pointer to the largest free area can be recorded (instead of just a pointer to ANY free area) - or: the free area queue can be built such that the areas appear in ascending or
descending size order ... etc. etc.

If all items have the same size, there will be not much compression activity; the area used by deleted items will be reused smoothly be new incoming items. But if the items differ
significantly, it will be different ...

Hope this helps,
kind regards

Bernd


Am 30.12.2024 um 15:22 schrieb Binyamin Dissen:
There is external data that determines if a record is no longer needed. The records indicate a request sent to an external system that cannot reply.
Periodically the external system is queries as to completed items.

That is my reason for a second area where the incomplete requests are saved
while new requests are also recorded. The areas can be switched in a
synchronous instruction at which point the previous area is serialized and can
be processed.

I was hoping for an idea to avoid the data movement if possible while not losing a lot of storage. If I use multiple areas I can keep the incomplete records in their original area and hope that they will be complete by the time
I cycle all the areas. Otherwise I will need to copy them.


On Mon, 30 Dec 2024 15:11:35 +0100 Bernd Oppolzer <bernd.oppol...@t-online.de>
wrote:

:>some more questions, sorry ...
:>
:>if there are (many) records added and removed at the same time (removed
:>by scanning the existing records):
:>what is the condition that leads to the deletion of existing records?
:>Can it be computed on the existing records alone
:>or does it depend on the incoming records? Is the a certain ordering on
:>the existing or on the incoming records
:>(a key sequence) or a matching logic between the incoming and the
:>existing records?
:>
:>If not, why not first delete (and compress) all the existing records
:>that can be deleted (on their "own local" conditions);
:>and then - in a second step - add the new incoming records at the end of
:>the area?
:>
:>Please explain what I'm still missing ...
:>
:>Thanks and regards
:>
:>Bernd
:>
:>
:>Am 30.12.2024 um 14:57 schrieb Binyamin Dissen:
:>> 1. Only access (other than INSERT) is the periodic scan.
:>>
:>> 2. Issue is that additional records are being inserted at the same time.  So :>> may mover and move and move again. My thought was a header which had a pointer :>> to the next available area and using a synchronized instruction to take the
:>> next slot,
:>>
:>> 3. Unknown, but probably <1K. Potential large number of records.
:>>
:>>
:>> On Mon, 30 Dec 2024 14:39:54 +0100 Bernd Oppolzer <bernd.oppol...@t-online.de>
:>> wrote:
:>>
:>> :>I have (at least) two or three questions for this requirement, to
:>> :>understand it better and to give a better advice:
:>> :>
:>> :>1. is it necessary to access the individual items directly from outside,
:>> :>that is: do they need a pointer or
:>> :>a key or an ID or something, or is the sequential scan the only way to
:>> :>access them (after the insertion)?
:>> :>A pointer is no good solution for that, because it will change, when
:>> :>items are moved (see 2.) ... even
:>> :>if there is no second area. But there are easy solutions to overcome
:>> :>this problem ... if an external "key"
:>> :>is really needed.
:>> :>
:>> :>2. if you are scanning the area sequentially and remove the items that
:>> :>are no longer needed, you don't
:>> :>need a second area, IMO. You simply move the remaining items "nearer to
:>> :>the top", and eliminate the "holes".
:>> :>
:>> :>3. can you give an upper limit to the size of the items (which then
:>> :>gives us the possibility to define a
:>> :>reasonable upper limit for the size of the complete area or structure)?
:>> :>
:>> :>For me, several designs come to mind:
:>> :>
:>> :>- sort of "records" with length information, like VSAM records in an
:>> :>ESDS control interval
:>> :>
:>> :>- something like DB2 data pages with row ids (fixed size - 4k, 8k, 32k -
:>> :>with a row pointer vector
:>> :>at the beginning or at the end; this would allow for external IDs
:>> :>remaining constant,
:>> :>even if the items were moved inside the page)
:>> :>
:>> :>Kind regards
:>> :>
:>> :>Bernd
:>> :>
:>> :>
:>> :>Am 29.12.2024 um 21:51 schrieb Binyamin Dissen:
:>> :>> The data structure consists of various items of different lengths. The items :>> :>> are added serially and periodically the area is scanned where most items are :>> :>> removed. The remaining items are likely to be removed in a later scan (with
:>> :>> new items).
:>> :>>
:>> :>> My thought was to have two data areas, and during the scan insert the :>> :>> undeleted items into the alternate area. It will be a bit of data moves but is
:>> :>> simple.
:>> :>>
:>> :>> Does Knuth talk about such a data structure?
:>> :>>
:>> :>> --
:>> :>> Binyamin Dissen <bdis...@dissensoftware.com>
:>> :>> http://www.dissensoftware.com
:>> :>>
:>> :>> Director, Dissen Software, Bar & Grill - Israel
:>> :>>
:>> :>> ---------------------------------------------------------------------- :>> :>> For IBM-MAIN subscribe / signoff / archive access instructions, :>> :>> send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
:>> :>
:>> :>----------------------------------------------------------------------
:>> :>For IBM-MAIN subscribe / signoff / archive access instructions,
:>> :>send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
:>>
:>> --
:>> Binyamin Dissen <bdis...@dissensoftware.com>
:>> http://www.dissensoftware.com
:>>
:>> Director, Dissen Software, Bar & Grill - Israel
:>>
:>> ----------------------------------------------------------------------
:>> For IBM-MAIN subscribe / signoff / archive access instructions,
:>> send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
:>
:>----------------------------------------------------------------------
:>For IBM-MAIN subscribe / signoff / archive access instructions,
:>send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN

--
Binyamin Dissen <bdis...@dissensoftware.com>
http://www.dissensoftware.com

Director, Dissen Software, Bar & Grill - Israel

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN

Reply via email to