Re: Ideas of managing a data structure

Binyamin Dissen Mon, 30 Dec 2024 11:07:09 -0800

Doubt that DB2 will provide efficiency for this case.

On Mon, 30 Dec 2024 16:09:56 +0100 Bernd Oppolzer <bernd.oppol...@t-online.de>
wrote:


:>The obvious question which comes to mind:
:>
:>why not use DB2 global temp tables? They work exactly like I outlined it 
:>before, but you don't have to code it.
:>They live in main storage, as long as the DB2 buffer pools are large 
:>enough ...
:>
:>Of course, we didn't talk about environments and programming languages 
:>and so on ...
:>don't know, if you have DB2 available etc. and if you have the 
:>permission to use DB2 temp tables.
:>We did it in the past, and we made best experiences.
:>
:>Kind regards
:>
:>Bernd
:>
:>
:>Am 30.12.2024 um 15:59 schrieb Bernd Oppolzer:
:>> Some errors ... one line was erroneously doubled ... and some other 
:>> corrections ...
:>>
:>> Feel free to ask if there are question remaining or doubts about how 
:>> the algorithm works ...
:>> IMO this is much the same what DB2 does to keep the movement of data 
:>> in the buffers to a minimum ...
:>> the records are moved only when the fragmentation of free space in the 
:>> buffer pages requires it ...
:>> and even then the external RIDs (or row ids) consisting of page number 
:>> and slot index (that is,
:>> position of the row in the items vector) remain constant.
:>>
:>> It must be clear from the content in the items vector if a certain row 
:>> is present (active) or not (deleted).
:>> If the beginning of the page is used by certain statistic information, 
:>> a row offset of zero is not valid,
:>> so a zero content in the items vector could be used to mark a 
:>> "deleted" or "never used" record.
:>>
:>> HTH, regards
:>>
:>> Bernd
:>>
:>>
:>> Am 30.12.2024 um 15:47 schrieb Bernd Oppolzer:
:>>> Ok, I now understand better.
:>>>
:>>> To avoid the movement of the items, I would use a design much like 
:>>> the DB2 data pages.
:>>>
:>>> The items would be stored in a series of buffers of fixed size (say: 
:>>> 4k or 8k or 16k).
:>>> The buffers are chained or addressed by a pointer vector ... if 
:>>> chained, the number of buffers
:>>> is unlimited; if pointer vector, the overall size depends on the size 
:>>> of this pointer vector.
:>>>
:>>> Every buffer has its own local management, which consists of
:>>> - the number of bytes occupied (total)
:>>> - the number of bytes free (total)
:>>> - a vector of relative offsets for the individual items (say: 256 
:>>> items max)
:>>> - the pointer (offset) to the first free area
:>>> - in every free area the pointer to the next free area (if there is one)
:>>>
:>>> Now:
:>>>
:>>> if an item is deleted, its area is added to the free area queue and 
:>>> its entry in
:>>> the individual items vector is zeroed; the overall free bytes number 
:>>> is incremented etc.
:>>>
:>>> If an item is to be added to this area, the overall free bytes number 
:>>> is checked;
:>>> if is it too low, another area has to be allocated and used. If not 
:>>> AND if none of the
:>>> individual free areas is large enough, a reorganization of the page 
:>>> has to be done;
:>>> that is: the individual free areas which are too short have to be 
:>>> combined by
:>>> moving the used items to the top of the page, giving one large free 
:>>> area at the bottom.
:>>>
:>>> The algorithm can be optimized in several ways,
:>>> for example, the pointer to the largest free area can be recorded 
:>>> (instead of just a pointer to
:>>> ANY free area) - or: the free area queue can be built such that the 
:>>> areas appear in ascending or
:>>> descending size order ... etc. etc.
:>>>
:>>> If all items have the same size, there will be not much compression 
:>>> activity; the area used by
:>>> deleted items will be reused smoothly be new incoming items. But if 
:>>> the items differ
:>>> significantly, it will be different ...
:>>>
:>>> Hope this helps,
:>>> kind regards
:>>>
:>>> Bernd
:>>>
:>>>
:>>> Am 30.12.2024 um 15:22 schrieb Binyamin Dissen:
:>>>> There is external data that determines if a record is no longer 
:>>>> needed. The
:>>>> records indicate a request sent to an external system that cannot 
:>>>> reply.
:>>>> Periodically the external system is queries as to completed items.
:>>>>
:>>>> That is my reason for a second area where the incomplete requests 
:>>>> are saved
:>>>> while new requests are also recorded. The areas can be switched in a
:>>>> synchronous instruction at which point the previous area is 
:>>>> serialized and can
:>>>> be processed.
:>>>>
:>>>> I was hoping for an idea to avoid the data movement if possible 
:>>>> while not
:>>>> losing a lot of storage. If I use multiple areas I can keep the 
:>>>> incomplete
:>>>> records in their original area and hope that they will be complete 
:>>>> by the time
:>>>> I cycle all the areas. Otherwise I will need to copy them.
:>>>>
:>>>>
:>>>> On Mon, 30 Dec 2024 15:11:35 +0100 Bernd Oppolzer 
:>>>> <bernd.oppol...@t-online.de>
:>>>> wrote:
:>>>>
:>>>> :>some more questions, sorry ...
:>>>> :>
:>>>> :>if there are (many) records added and removed at the same time 
:>>>> (removed
:>>>> :>by scanning the existing records):
:>>>> :>what is the condition that leads to the deletion of existing records?
:>>>> :>Can it be computed on the existing records alone
:>>>> :>or does it depend on the incoming records? Is the a certain 
:>>>> ordering on
:>>>> :>the existing or on the incoming records
:>>>> :>(a key sequence) or a matching logic between the incoming and the
:>>>> :>existing records?
:>>>> :>
:>>>> :>If not, why not first delete (and compress) all the existing records
:>>>> :>that can be deleted (on their "own local" conditions);
:>>>> :>and then - in a second step - add the new incoming records at the 
:>>>> end of
:>>>> :>the area?
:>>>> :>
:>>>> :>Please explain what I'm still missing ...
:>>>> :>
:>>>> :>Thanks and regards
:>>>> :>
:>>>> :>Bernd
:>>>> :>
:>>>> :>
:>>>> :>Am 30.12.2024 um 14:57 schrieb Binyamin Dissen:
:>>>> :>> 1. Only access (other than INSERT) is the periodic scan.
:>>>> :>>
:>>>> :>> 2. Issue is that additional records are being inserted at the 
:>>>> same time.  So
:>>>> :>> may mover and move and move again. My thought was a header which 
:>>>> had a pointer
:>>>> :>> to the next available area and using a synchronized instruction 
:>>>> to take the
:>>>> :>> next slot,
:>>>> :>>
:>>>> :>> 3. Unknown, but probably <1K. Potential large number of records.
:>>>> :>>
:>>>> :>>
:>>>> :>> On Mon, 30 Dec 2024 14:39:54 +0100 Bernd Oppolzer 
:>>>> <bernd.oppol...@t-online.de>
:>>>> :>> wrote:
:>>>> :>>
:>>>> :>> :>I have (at least) two or three questions for this requirement, to
:>>>> :>> :>understand it better and to give a better advice:
:>>>> :>> :>
:>>>> :>> :>1. is it necessary to access the individual items directly 
:>>>> from outside,
:>>>> :>> :>that is: do they need a pointer or
:>>>> :>> :>a key or an ID or something, or is the sequential scan the 
:>>>> only way to
:>>>> :>> :>access them (after the insertion)?
:>>>> :>> :>A pointer is no good solution for that, because it will 
:>>>> change, when
:>>>> :>> :>items are moved (see 2.) ... even
:>>>> :>> :>if there is no second area. But there are easy solutions to 
:>>>> overcome
:>>>> :>> :>this problem ... if an external "key"
:>>>> :>> :>is really needed.
:>>>> :>> :>
:>>>> :>> :>2. if you are scanning the area sequentially and remove the 
:>>>> items that
:>>>> :>> :>are no longer needed, you don't
:>>>> :>> :>need a second area, IMO. You simply move the remaining items 
:>>>> "nearer to
:>>>> :>> :>the top", and eliminate the "holes".
:>>>> :>> :>
:>>>> :>> :>3. can you give an upper limit to the size of the items (which 
:>>>> then
:>>>> :>> :>gives us the possibility to define a
:>>>> :>> :>reasonable upper limit for the size of the complete area or 
:>>>> structure)?
:>>>> :>> :>
:>>>> :>> :>For me, several designs come to mind:
:>>>> :>> :>
:>>>> :>> :>- sort of "records" with length information, like VSAM records 
:>>>> in an
:>>>> :>> :>ESDS control interval
:>>>> :>> :>
:>>>> :>> :>- something like DB2 data pages with row ids (fixed size - 4k, 
:>>>> 8k, 32k -
:>>>> :>> :>with a row pointer vector
:>>>> :>> :>at the beginning or at the end; this would allow for external IDs
:>>>> :>> :>remaining constant,
:>>>> :>> :>even if the items were moved inside the page)
:>>>> :>> :>
:>>>> :>> :>Kind regards
:>>>> :>> :>
:>>>> :>> :>Bernd
:>>>> :>> :>
:>>>> :>> :>
:>>>> :>> :>Am 29.12.2024 um 21:51 schrieb Binyamin Dissen:
:>>>> :>> :>> The data structure consists of various items of different 
:>>>> lengths. The items
:>>>> :>> :>> are added serially and periodically the area is scanned 
:>>>> where most items are
:>>>> :>> :>> removed. The remaining items are likely to be removed in a 
:>>>> later scan (with
:>>>> :>> :>> new items).
:>>>> :>> :>>
:>>>> :>> :>> My thought was to have two data areas, and during the scan 
:>>>> insert the
:>>>> :>> :>> undeleted items into the alternate area. It will be a bit of 
:>>>> data moves but is
:>>>> :>> :>> simple.
:>>>> :>> :>>
:>>>> :>> :>> Does Knuth talk about such a data structure?
:>>>> :>> :>>
:>>>> :>> :>> --
:>>>> :>> :>> Binyamin Dissen <bdis...@dissensoftware.com>
:>>>> :>> :>> http://www.dissensoftware.com
:>>>> :>> :>>
:>>>> :>> :>> Director, Dissen Software, Bar & Grill - Israel
:>>>> :>> :>>
:>>>> :>> :>> 
:>>>> ----------------------------------------------------------------------
:>>>> :>> :>> For IBM-MAIN subscribe / signoff / archive access instructions,
:>>>> :>> :>> send email to lists...@listserv.ua.edu with the message: 
:>>>> INFO IBM-MAIN
:>>>> :>> :>
:>>>> :>> 
:>>>> :>----------------------------------------------------------------------
:>>>> :>> :>For IBM-MAIN subscribe / signoff / archive access instructions,
:>>>> :>> :>send email to lists...@listserv.ua.edu with the message: INFO 
:>>>> IBM-MAIN
:>>>> :>>
:>>>> :>> --
:>>>> :>> Binyamin Dissen <bdis...@dissensoftware.com>
:>>>> :>> http://www.dissensoftware.com
:>>>> :>>
:>>>> :>> Director, Dissen Software, Bar & Grill - Israel
:>>>> :>>
:>>>> :>> 
:>>>> ----------------------------------------------------------------------
:>>>> :>> For IBM-MAIN subscribe / signoff / archive access instructions,
:>>>> :>> send email to lists...@listserv.ua.edu with the message: INFO 
:>>>> IBM-MAIN
:>>>> :>
:>>>> :>---------------------------------------------------------------------- 
:>>>>
:>>>> :>For IBM-MAIN subscribe / signoff / archive access instructions,
:>>>> :>send email to lists...@listserv.ua.edu with the message: INFO 
:>>>> IBM-MAIN
:>>>>
:>>>> -- 
:>>>> Binyamin Dissen <bdis...@dissensoftware.com>
:>>>> http://www.dissensoftware.com
:>>>>
:>>>> Director, Dissen Software, Bar & Grill - Israel
:>>>>
:>>>> ----------------------------------------------------------------------
:>>>> For IBM-MAIN subscribe / signoff / archive access instructions,
:>>>> send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
:>>>
:>>> ----------------------------------------------------------------------
:>>> For IBM-MAIN subscribe / signoff / archive access instructions,
:>>> send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
:>
:>----------------------------------------------------------------------
:>For IBM-MAIN subscribe / signoff / archive access instructions,
:>send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN

--
Binyamin Dissen <bdis...@dissensoftware.com>
http://www.dissensoftware.com

Director, Dissen Software, Bar & Grill - Israel

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN

Re: Ideas of managing a data structure

Reply via email to