Ok, I now understand better.
To avoid the movement of the items, I would use a design much like the
DB2 data pages.
The items would be stored in a series of buffers of fixed size (say: 4k
or 8k or 16k).
The buffers are chained or addressed by a pointer vector ... if chained,
the number of buffers
is unlimited; if pointer vector, the overall size depends on the size of
this pointer vector.
Every buffer has its own local management, which consists of
- the number of bytes occupied (total)
- the number of bytes free (total)
- a vector of relative offsets for the individual items (say: 256 items max)
- the pointer (offset) to the first free area
- in every free area the pointer to the next free area (if there is one)
- a vector of relative offsets for the individual items (say: 256 items max)
Now:
if an item is deleted, its area is added to the free area queue and its
entry in
the individual items vector is zeroed; the overall free bytes number is
incremented etc.
If an item is to be added to this area, the overall free bytes number is
checked;
if is it too low, another area has to be allocated and used. If not AND
if none of the
individual free areas is large enough, a reorganization of the page has
to be done;
that is: the individual free areas which are too short have to be
combined by
moving the used items to the top of the page, giving one large free area
at the bottom.
The algorithm can be optimized in several ways,
for example, the pointer to the largest free area can be recorded
(instead of just a pointer to
ANY free area) - or: the free area queue can be built such that the
areas appear in ascending or
descending size ... etc. etc.
If all items have the same size, there will be not much compression
activity; the area used by
deleted items will be reused smoothly be new incoming items. But if the
items differ
significantly, it will be different ...
Hope this helps,
kind regards
Bernd
Am 30.12.2024 um 15:22 schrieb Binyamin Dissen:
There is external data that determines if a record is no longer needed. The
records indicate a request sent to an external system that cannot reply.
Periodically the external system is queries as to completed items.
That is my reason for a second area where the incomplete requests are saved
while new requests are also recorded. The areas can be switched in a
synchronous instruction at which point the previous area is serialized and can
be processed.
I was hoping for an idea to avoid the data movement if possible while not
losing a lot of storage. If I use multiple areas I can keep the incomplete
records in their original area and hope that they will be complete by the time
I cycle all the areas. Otherwise I will need to copy them.
On Mon, 30 Dec 2024 15:11:35 +0100 Bernd Oppolzer <bernd.oppol...@t-online.de>
wrote:
:>some more questions, sorry ...
:>
:>if there are (many) records added and removed at the same time (removed
:>by scanning the existing records):
:>what is the condition that leads to the deletion of existing records?
:>Can it be computed on the existing records alone
:>or does it depend on the incoming records? Is the a certain ordering on
:>the existing or on the incoming records
:>(a key sequence) or a matching logic between the incoming and the
:>existing records?
:>
:>If not, why not first delete (and compress) all the existing records
:>that can be deleted (on their "own local" conditions);
:>and then - in a second step - add the new incoming records at the end of
:>the area?
:>
:>Please explain what I'm still missing ...
:>
:>Thanks and regards
:>
:>Bernd
:>
:>
:>Am 30.12.2024 um 14:57 schrieb Binyamin Dissen:
:>> 1. Only access (other than INSERT) is the periodic scan.
:>>
:>> 2. Issue is that additional records are being inserted at the same time. So
:>> may mover and move and move again. My thought was a header which had a
pointer
:>> to the next available area and using a synchronized instruction to take the
:>> next slot,
:>>
:>> 3. Unknown, but probably <1K. Potential large number of records.
:>>
:>>
:>> On Mon, 30 Dec 2024 14:39:54 +0100 Bernd Oppolzer
<bernd.oppol...@t-online.de>
:>> wrote:
:>>
:>> :>I have (at least) two or three questions for this requirement, to
:>> :>understand it better and to give a better advice:
:>> :>
:>> :>1. is it necessary to access the individual items directly from outside,
:>> :>that is: do they need a pointer or
:>> :>a key or an ID or something, or is the sequential scan the only way to
:>> :>access them (after the insertion)?
:>> :>A pointer is no good solution for that, because it will change, when
:>> :>items are moved (see 2.) ... even
:>> :>if there is no second area. But there are easy solutions to overcome
:>> :>this problem ... if an external "key"
:>> :>is really needed.
:>> :>
:>> :>2. if you are scanning the area sequentially and remove the items that
:>> :>are no longer needed, you don't
:>> :>need a second area, IMO. You simply move the remaining items "nearer to
:>> :>the top", and eliminate the "holes".
:>> :>
:>> :>3. can you give an upper limit to the size of the items (which then
:>> :>gives us the possibility to define a
:>> :>reasonable upper limit for the size of the complete area or structure)?
:>> :>
:>> :>For me, several designs come to mind:
:>> :>
:>> :>- sort of "records" with length information, like VSAM records in an
:>> :>ESDS control interval
:>> :>
:>> :>- something like DB2 data pages with row ids (fixed size - 4k, 8k, 32k -
:>> :>with a row pointer vector
:>> :>at the beginning or at the end; this would allow for external IDs
:>> :>remaining constant,
:>> :>even if the items were moved inside the page)
:>> :>
:>> :>Kind regards
:>> :>
:>> :>Bernd
:>> :>
:>> :>
:>> :>Am 29.12.2024 um 21:51 schrieb Binyamin Dissen:
:>> :>> The data structure consists of various items of different lengths. The
items
:>> :>> are added serially and periodically the area is scanned where most
items are
:>> :>> removed. The remaining items are likely to be removed in a later scan
(with
:>> :>> new items).
:>> :>>
:>> :>> My thought was to have two data areas, and during the scan insert the
:>> :>> undeleted items into the alternate area. It will be a bit of data moves
but is
:>> :>> simple.
:>> :>>
:>> :>> Does Knuth talk about such a data structure?
:>> :>>
:>> :>> --
:>> :>> Binyamin Dissen <bdis...@dissensoftware.com>
:>> :>> http://www.dissensoftware.com
:>> :>>
:>> :>> Director, Dissen Software, Bar & Grill - Israel
:>> :>>
:>> :>> ----------------------------------------------------------------------
:>> :>> For IBM-MAIN subscribe / signoff / archive access instructions,
:>> :>> send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
:>> :>
:>> :>----------------------------------------------------------------------
:>> :>For IBM-MAIN subscribe / signoff / archive access instructions,
:>> :>send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
:>>
:>> --
:>> Binyamin Dissen <bdis...@dissensoftware.com>
:>> http://www.dissensoftware.com
:>>
:>> Director, Dissen Software, Bar & Grill - Israel
:>>
:>> ----------------------------------------------------------------------
:>> For IBM-MAIN subscribe / signoff / archive access instructions,
:>> send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
:>
:>----------------------------------------------------------------------
:>For IBM-MAIN subscribe / signoff / archive access instructions,
:>send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
--
Binyamin Dissen <bdis...@dissensoftware.com>
http://www.dissensoftware.com
Director, Dissen Software, Bar & Grill - Israel
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN