Doubt that DB2 will provide efficiency for this case. On Mon, 30 Dec 2024 16:09:56 +0100 Bernd Oppolzer <bernd.oppol...@t-online.de> wrote:
:>The obvious question which comes to mind: :> :>why not use DB2 global temp tables? They work exactly like I outlined it :>before, but you don't have to code it. :>They live in main storage, as long as the DB2 buffer pools are large :>enough ... :> :>Of course, we didn't talk about environments and programming languages :>and so on ... :>don't know, if you have DB2 available etc. and if you have the :>permission to use DB2 temp tables. :>We did it in the past, and we made best experiences. :> :>Kind regards :> :>Bernd :> :> :>Am 30.12.2024 um 15:59 schrieb Bernd Oppolzer: :>> Some errors ... one line was erroneously doubled ... and some other :>> corrections ... :>> :>> Feel free to ask if there are question remaining or doubts about how :>> the algorithm works ... :>> IMO this is much the same what DB2 does to keep the movement of data :>> in the buffers to a minimum ... :>> the records are moved only when the fragmentation of free space in the :>> buffer pages requires it ... :>> and even then the external RIDs (or row ids) consisting of page number :>> and slot index (that is, :>> position of the row in the items vector) remain constant. :>> :>> It must be clear from the content in the items vector if a certain row :>> is present (active) or not (deleted). :>> If the beginning of the page is used by certain statistic information, :>> a row offset of zero is not valid, :>> so a zero content in the items vector could be used to mark a :>> "deleted" or "never used" record. :>> :>> HTH, regards :>> :>> Bernd :>> :>> :>> Am 30.12.2024 um 15:47 schrieb Bernd Oppolzer: :>>> Ok, I now understand better. :>>> :>>> To avoid the movement of the items, I would use a design much like :>>> the DB2 data pages. :>>> :>>> The items would be stored in a series of buffers of fixed size (say: :>>> 4k or 8k or 16k). :>>> The buffers are chained or addressed by a pointer vector ... if :>>> chained, the number of buffers :>>> is unlimited; if pointer vector, the overall size depends on the size :>>> of this pointer vector. :>>> :>>> Every buffer has its own local management, which consists of :>>> - the number of bytes occupied (total) :>>> - the number of bytes free (total) :>>> - a vector of relative offsets for the individual items (say: 256 :>>> items max) :>>> - the pointer (offset) to the first free area :>>> - in every free area the pointer to the next free area (if there is one) :>>> :>>> Now: :>>> :>>> if an item is deleted, its area is added to the free area queue and :>>> its entry in :>>> the individual items vector is zeroed; the overall free bytes number :>>> is incremented etc. :>>> :>>> If an item is to be added to this area, the overall free bytes number :>>> is checked; :>>> if is it too low, another area has to be allocated and used. If not :>>> AND if none of the :>>> individual free areas is large enough, a reorganization of the page :>>> has to be done; :>>> that is: the individual free areas which are too short have to be :>>> combined by :>>> moving the used items to the top of the page, giving one large free :>>> area at the bottom. :>>> :>>> The algorithm can be optimized in several ways, :>>> for example, the pointer to the largest free area can be recorded :>>> (instead of just a pointer to :>>> ANY free area) - or: the free area queue can be built such that the :>>> areas appear in ascending or :>>> descending size order ... etc. etc. :>>> :>>> If all items have the same size, there will be not much compression :>>> activity; the area used by :>>> deleted items will be reused smoothly be new incoming items. But if :>>> the items differ :>>> significantly, it will be different ... :>>> :>>> Hope this helps, :>>> kind regards :>>> :>>> Bernd :>>> :>>> :>>> Am 30.12.2024 um 15:22 schrieb Binyamin Dissen: :>>>> There is external data that determines if a record is no longer :>>>> needed. The :>>>> records indicate a request sent to an external system that cannot :>>>> reply. :>>>> Periodically the external system is queries as to completed items. :>>>> :>>>> That is my reason for a second area where the incomplete requests :>>>> are saved :>>>> while new requests are also recorded. The areas can be switched in a :>>>> synchronous instruction at which point the previous area is :>>>> serialized and can :>>>> be processed. :>>>> :>>>> I was hoping for an idea to avoid the data movement if possible :>>>> while not :>>>> losing a lot of storage. If I use multiple areas I can keep the :>>>> incomplete :>>>> records in their original area and hope that they will be complete :>>>> by the time :>>>> I cycle all the areas. Otherwise I will need to copy them. :>>>> :>>>> :>>>> On Mon, 30 Dec 2024 15:11:35 +0100 Bernd Oppolzer :>>>> <bernd.oppol...@t-online.de> :>>>> wrote: :>>>> :>>>> :>some more questions, sorry ... :>>>> :> :>>>> :>if there are (many) records added and removed at the same time :>>>> (removed :>>>> :>by scanning the existing records): :>>>> :>what is the condition that leads to the deletion of existing records? :>>>> :>Can it be computed on the existing records alone :>>>> :>or does it depend on the incoming records? Is the a certain :>>>> ordering on :>>>> :>the existing or on the incoming records :>>>> :>(a key sequence) or a matching logic between the incoming and the :>>>> :>existing records? :>>>> :> :>>>> :>If not, why not first delete (and compress) all the existing records :>>>> :>that can be deleted (on their "own local" conditions); :>>>> :>and then - in a second step - add the new incoming records at the :>>>> end of :>>>> :>the area? :>>>> :> :>>>> :>Please explain what I'm still missing ... :>>>> :> :>>>> :>Thanks and regards :>>>> :> :>>>> :>Bernd :>>>> :> :>>>> :> :>>>> :>Am 30.12.2024 um 14:57 schrieb Binyamin Dissen: :>>>> :>> 1. Only access (other than INSERT) is the periodic scan. :>>>> :>> :>>>> :>> 2. Issue is that additional records are being inserted at the :>>>> same time. So :>>>> :>> may mover and move and move again. My thought was a header which :>>>> had a pointer :>>>> :>> to the next available area and using a synchronized instruction :>>>> to take the :>>>> :>> next slot, :>>>> :>> :>>>> :>> 3. Unknown, but probably <1K. Potential large number of records. :>>>> :>> :>>>> :>> :>>>> :>> On Mon, 30 Dec 2024 14:39:54 +0100 Bernd Oppolzer :>>>> <bernd.oppol...@t-online.de> :>>>> :>> wrote: :>>>> :>> :>>>> :>> :>I have (at least) two or three questions for this requirement, to :>>>> :>> :>understand it better and to give a better advice: :>>>> :>> :> :>>>> :>> :>1. is it necessary to access the individual items directly :>>>> from outside, :>>>> :>> :>that is: do they need a pointer or :>>>> :>> :>a key or an ID or something, or is the sequential scan the :>>>> only way to :>>>> :>> :>access them (after the insertion)? :>>>> :>> :>A pointer is no good solution for that, because it will :>>>> change, when :>>>> :>> :>items are moved (see 2.) ... even :>>>> :>> :>if there is no second area. But there are easy solutions to :>>>> overcome :>>>> :>> :>this problem ... if an external "key" :>>>> :>> :>is really needed. :>>>> :>> :> :>>>> :>> :>2. if you are scanning the area sequentially and remove the :>>>> items that :>>>> :>> :>are no longer needed, you don't :>>>> :>> :>need a second area, IMO. You simply move the remaining items :>>>> "nearer to :>>>> :>> :>the top", and eliminate the "holes". :>>>> :>> :> :>>>> :>> :>3. can you give an upper limit to the size of the items (which :>>>> then :>>>> :>> :>gives us the possibility to define a :>>>> :>> :>reasonable upper limit for the size of the complete area or :>>>> structure)? :>>>> :>> :> :>>>> :>> :>For me, several designs come to mind: :>>>> :>> :> :>>>> :>> :>- sort of "records" with length information, like VSAM records :>>>> in an :>>>> :>> :>ESDS control interval :>>>> :>> :> :>>>> :>> :>- something like DB2 data pages with row ids (fixed size - 4k, :>>>> 8k, 32k - :>>>> :>> :>with a row pointer vector :>>>> :>> :>at the beginning or at the end; this would allow for external IDs :>>>> :>> :>remaining constant, :>>>> :>> :>even if the items were moved inside the page) :>>>> :>> :> :>>>> :>> :>Kind regards :>>>> :>> :> :>>>> :>> :>Bernd :>>>> :>> :> :>>>> :>> :> :>>>> :>> :>Am 29.12.2024 um 21:51 schrieb Binyamin Dissen: :>>>> :>> :>> The data structure consists of various items of different :>>>> lengths. The items :>>>> :>> :>> are added serially and periodically the area is scanned :>>>> where most items are :>>>> :>> :>> removed. The remaining items are likely to be removed in a :>>>> later scan (with :>>>> :>> :>> new items). :>>>> :>> :>> :>>>> :>> :>> My thought was to have two data areas, and during the scan :>>>> insert the :>>>> :>> :>> undeleted items into the alternate area. It will be a bit of :>>>> data moves but is :>>>> :>> :>> simple. :>>>> :>> :>> :>>>> :>> :>> Does Knuth talk about such a data structure? :>>>> :>> :>> :>>>> :>> :>> -- :>>>> :>> :>> Binyamin Dissen <bdis...@dissensoftware.com> :>>>> :>> :>> http://www.dissensoftware.com :>>>> :>> :>> :>>>> :>> :>> Director, Dissen Software, Bar & Grill - Israel :>>>> :>> :>> :>>>> :>> :>> :>>>> ---------------------------------------------------------------------- :>>>> :>> :>> For IBM-MAIN subscribe / signoff / archive access instructions, :>>>> :>> :>> send email to lists...@listserv.ua.edu with the message: :>>>> INFO IBM-MAIN :>>>> :>> :> :>>>> :>> :>>>> :>---------------------------------------------------------------------- :>>>> :>> :>For IBM-MAIN subscribe / signoff / archive access instructions, :>>>> :>> :>send email to lists...@listserv.ua.edu with the message: INFO :>>>> IBM-MAIN :>>>> :>> :>>>> :>> -- :>>>> :>> Binyamin Dissen <bdis...@dissensoftware.com> :>>>> :>> http://www.dissensoftware.com :>>>> :>> :>>>> :>> Director, Dissen Software, Bar & Grill - Israel :>>>> :>> :>>>> :>> :>>>> ---------------------------------------------------------------------- :>>>> :>> For IBM-MAIN subscribe / signoff / archive access instructions, :>>>> :>> send email to lists...@listserv.ua.edu with the message: INFO :>>>> IBM-MAIN :>>>> :> :>>>> :>---------------------------------------------------------------------- :>>>> :>>>> :>For IBM-MAIN subscribe / signoff / archive access instructions, :>>>> :>send email to lists...@listserv.ua.edu with the message: INFO :>>>> IBM-MAIN :>>>> :>>>> -- :>>>> Binyamin Dissen <bdis...@dissensoftware.com> :>>>> http://www.dissensoftware.com :>>>> :>>>> Director, Dissen Software, Bar & Grill - Israel :>>>> :>>>> ---------------------------------------------------------------------- :>>>> For IBM-MAIN subscribe / signoff / archive access instructions, :>>>> send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN :>>> :>>> ---------------------------------------------------------------------- :>>> For IBM-MAIN subscribe / signoff / archive access instructions, :>>> send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN :> :>---------------------------------------------------------------------- :>For IBM-MAIN subscribe / signoff / archive access instructions, :>send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN -- Binyamin Dissen <bdis...@dissensoftware.com> http://www.dissensoftware.com Director, Dissen Software, Bar & Grill - Israel ---------------------------------------------------------------------- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN