Re: [capnproto] [Feature Request] Putting List() object at last and allow reuse of the MallocMessageBuilder

'Kenton Varda' via Cap'n Proto Mon, 05 Jun 2023 10:05:31 -0700

Cross-segment references are in fact expressed as a pair of a segment
number and an offset today. So with sufficiently complex allocation logic,
you could indeed allocate the side objects in a separate segment from the
list itself, and thus allow the list to be resized continuously.


But the C++ implementation, at least, was never intended to be used this
way. The main purpose of segments in C++ is to allow more space to be
allocated without relocating the existing data in memory. Each time an
allocation is unable to be satisfied by the existing segments, a new
segment is allocated, usually twice the size of the previous one.

Perhaps it would not take too much effort to expose some APIs such that you
can designate a particular list must live in its own segment, so that it
can be grown, and the segment itself can even be re-allocated in a
different location to accommodate when the list outgrows the previous
allocation. This of course means that all Reader and Builder objects
pointing into the list would be invalidated any time it grows. But I
suppose a carefully-written application could handle all that.

There is another problem with this, though, which is that "far pointers"
(that point across segment boundaries) have more overhead than in-segment
pointers. An in-segment pointer is always 8 bytes. Far pointers typically
require 8 bytes in the source segment *and* 8 bytes in the destination
segment, for a total of 16 bytes. (And when the destination segment doesn't
have space to allocate this, 16 bytes will be needed in some other segment,
for a total of 24 bytes). In a large list of small items this overhead
could be substantial.

-Kenton

On Mon, Jun 5, 2023 at 11:26 AM Jonathan Shapiro <s...@buttonsmith.com>
wrote:

> Chiming in here in relative ignorance…
>
> Offhand, it seems to me that the advantage to placing strings last is that
> you can buffer them in another segment making a single pass over the struct
> array and then appending them without a second touch. It plays fairly
> nicely with the cache, and writev() becomes your friend when the time comes
> to do your I/O.
>
> Offhand, it seems to me that you could equally well buffer the structs,
> but it depends on the on-wire representation for references.
>
> Logical truncate and physical truncate are two different things. I’d have
> to look at the wire representation to be sure, but I suspect it’s okay to
> truncate an already buffered array in the sense that the recipient will not
> be told about the (now garbage) extra bytes. Not efficient from a
> transmission perspective, and I can’t think of a good use case.
>
> Kevin: I have two ignorant design questions here:
>
> 1. Given that messages are already divided into segments, did you consider
> an on-wire representation for references taking the form segment::offset
> coupled with an in-memory segment table somewhere? I expect you did, and
> I’m wondering what the issues are/were? Wide adoption of writev() is
> relatively recent, which seems relevant here, but using a byte offset on
> the wire precludes both out-of-order segment serialization and out-of order
> segment writes.
>
> 2. If there were a reason to, does the current API expose the on-wire
> reference representation? Is the current representation effectively part of
> the API?
>
>
> Jonathan
>
>
>
> On Mon, Jun 5, 2023 at 7:32 AM 'Kenton Varda' via Cap'n Proto <
> capnproto@googlegroups.com> wrote:
>
>> Hi Gokul,
>>
>> There's currently no way to "inline" one struct into another.
>>
>> The fundamental problem you have here is this: Say you have a list
>> containing pointer values, and the list is actually at the end of the
>> message. So, you can resize it without fragmentation. You add one new
>> element. Then you populate the element, by allocating the pointer values
>> that go into it. These new allocations are added to the end of the message,
>> therefore the list is no longer at the end, and so can no longer be resized
>> without fragmentation. Yes, in theory, if you could remove all of the
>> objects allocated past the end of the list, then you could perhaps resize
>> it again. But, then you'd have to bring those object back again, added to
>> the end of the list. Each time you added a new element, the number of
>> objects that you are rolling back and forward every time increases, so the
>> overall time complexity ends up being O(n^2).
>>
>> You might want to consider an alternative design: Instead of a list,
>> maybe you want a stream. In a streaming approach, you would append each
>> item to your byte stream as a new encoded message. The down side of this
>> approach is that you cannot easily jump to a random location in the list;
>> you must parse the items in order. But maybe that's OK for your use case.
>>
>> A third approach could combine the two: You could create a stream of
>> messages, and separately maintain an index into that stream. The index
>> would store the starting byte offset of each message in the stream, so you
>> can seek to it. The elements of the index are fixed-size (just an integer),
>> so you could maintain the index as a Cap'n Proto List(UInt64) or the like.
>>
>> -Kenton
>>
>> On Wed, May 31, 2023 at 10:14 AM Gokul Rajiv <gokul.raji...@gmail.com>
>> wrote:
>>
>>> Hi Kenton,
>>>
>>> As a follow up to the case of structs containing pointer/non-primitive
>>> types, might there currently exist workarounds to being able to resize
>>> lists of these structs (like the one below) at the back of a message:
>>> perhaps removing the allocated non-primitive data or embedding them inline
>>> in the struct?
>>>
>>> ```
>>> struct FourPoints {
>>>     pt1 @0 :import "vector2.capnp".Vector2f; # bottom-left
>>>     pt2 @1 :import "vector2.capnp".Vector2f; # bottom-right
>>>     pt3 @2 :import "vector2.capnp".Vector2f; # top-right
>>>     pt4 @3 :import "vector2.capnp".Vector2f; # top-left
>>> }
>>>
>>> # the size of TagDetection is known
>>> struct TagDetection {
>>>     id @0 :UInt32;
>>>     hammingDistance @1 :UInt8;
>>>     tagSize @2 :Float32;
>>>     areaPixel @3 :Float32; # number of pixels the tag occupies in the
>>> image
>>>
>>>     tagCenter @4 :import "vector2.capnp".Vector2f;
>>>
>>>     # corners
>>>     pointsPolygon @5 :FourPoints;
>>>     pointsUndistortedPolygon @6 :FourPoints;
>>>
>>>     poseInCameraFrame @7 :import "se3.capnp".Se3; # in RDF camera frame
>>> }
>>> ```
>>>
>>> Of course, I could manually list the primitive fields of the constituent
>>> non-primitive types of the struct, but I'm trying to avoid having to do
>>> this.
>>>
>>> - Gokul
>>> On Monday, December 12, 2022 at 9:22:22 PM UTC+8 Hui min wrote:
>>>
>>>> Hi Kenton,
>>>>
>>>> Got it, it makes sense. So if I remove `labelString` data from the
>>>> definition, I could use truncate function to operate my detections @4
>>>> :List(Detection2d); object. Thanks!
>>>>
>>>> On Tuesday, 6 December 2022 at 10:43:17 pm UTC+8 ken...@cloudflare.com
>>>> wrote:
>>>>
>>>>> Hi Hui,
>>>>>
>>>>> Again sorry for the slow reply.
>>>>>
>>>>> In fact, the functionality you suggest does exist today, in the form
>>>>> of `capnp::Orphan<T>::truncate()`, which can be used to resize a list
>>>>> (truncate *or* extend), doing so in-place if possible. If the list is at
>>>>> the end of the message, it's possible it can be extended in-place. To use
>>>>> it, you would do:
>>>>>
>>>>> ```
>>>>> auto orphan = reader.disownDetections();
>>>>> orphan.truncate(newSize);
>>>>> reader.adoptDetections(kj::mv(orphan));
>>>>> ```
>>>>>
>>>>> HOWEVER, there is a problem that might apply to your sample use case:
>>>>> Your `Detection2d` struct contains `labelString @1 :Text`, which is a
>>>>> pointer type. When you set this field to a string value, the string is
>>>>> allocated at the end of the message. This means your list is *not* at the
>>>>> end of the message anymore, so you are no longer able to resize the list 
>>>>> to
>>>>> make it longer. To be able to extend your list, you will need the list to
>>>>> contain only primitive types, so that it stays at the end of the message.
>>>>>
>>>>> -Kenton
>>>>>
>>>>> On Sat, Nov 5, 2022 at 9:16 PM Hui min <cgz...@sina.com> wrote:
>>>>>
>>>>>> Hi Cap'n Proto Team,
>>>>>>
>>>>>> Thanks for the amazing tool you have created so far.
>>>>>>
>>>>>> Understand that with the current design of arena style memory
>>>>>> allocation, we cannot simply reuse the message, and re-init List() object
>>>>>> to send it again. This will cause the message to grow every time we send
>>>>>> (memory leak).
>>>>>>
>>>>>> However, sending variable length data is still pretty common
>>>>>> practice. If we have to reallocate a brand new heap for new message, it 
>>>>>> is
>>>>>> quite wasteful. In most cases however, the variable length list is just
>>>>>> one. So I have an idea.
>>>>>>
>>>>>> ```
>>>>>> struct Detection2d {
>>>>>>
>>>>>>     labelIdx @0 :UInt32;
>>>>>>     labelString @1 :Text;
>>>>>>     xmin @2 :Float32;
>>>>>>     xmax @3 :Float32;
>>>>>>     ymin @4 :Float32;
>>>>>>     ymax @5 :Float32;
>>>>>>     confidence @6 :Float32;
>>>>>>
>>>>>> }
>>>>>>
>>>>>> struct Detections2d {
>>>>>>
>>>>>>     header @0 :import "header.capnp".Header;
>>>>>>
>>>>>>     imageData @1 :Data; # should be RGB
>>>>>>
>>>>>>     width @2 :UInt32;
>>>>>>     height @3 :UInt32;
>>>>>>
>>>>>>    detections @4 :List(Detection2d);
>>>>>> }
>>>>>> ```
>>>>>>
>>>>>> In this case, I have put the variable length object at the vary last.
>>>>>> That means everything in front is fix length. Is there a way i could 
>>>>>> force
>>>>>> the Capnproto to discard the memory of the old List and create a new one
>>>>>> directly from its old memory location, with out leaking a chunk of memory
>>>>>> in the arena?
>>>>>>
>>>>>> If it is not currently possible, do you think it is a convenient
>>>>>> function to be added? Thanks!
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "Cap'n Proto" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to capnproto+...@googlegroups.com.
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/capnproto/c500ece7-1871-4015-ba78-f456365a8bc3n%40googlegroups.com
>>>>>> <https://groups.google.com/d/msgid/capnproto/c500ece7-1871-4015-ba78-f456365a8bc3n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>>
>>>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Cap'n Proto" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to capnproto+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/capnproto/1b3e3492-ab37-4a42-8b00-dd9e4871a0fbn%40googlegroups.com
>>> <https://groups.google.com/d/msgid/capnproto/1b3e3492-ab37-4a42-8b00-dd9e4871a0fbn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Cap'n Proto" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to capnproto+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/capnproto/CAJouXQ%3DwwJ0_1-4hUjXOK3u0nEfy8o6VPEj3TDm3jWkgiahaVQ%40mail.gmail.com
>> <https://groups.google.com/d/msgid/capnproto/CAJouXQ%3DwwJ0_1-4hUjXOK3u0nEfy8o6VPEj3TDm3jWkgiahaVQ%40mail.gmail.com?utm_medium=email&utm_source=footer>
>> .
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to capnproto+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/capnproto/CAJouXQmUudu9vshdiOvPkTF0uu6HTtDzYCd%3Dq6%3Dub_CVKGQHMA%40mail.gmail.com.

Re: [capnproto] [Feature Request] Putting List() object at last and allow reuse of the MallocMessageBuilder

Reply via email to