FWIW capnp messages already encode their own size at the start of the
message (or, rather, they encode a segment table, which you can sum up to
get the total size).

This might be useful:
https://github.com/sandstorm-io/capnproto/blob/master/c++/src/capnp/serialize.h#L111
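
A minimal Python sketch of summing the segment table, assuming the standard unpacked stream framing from the Cap'n Proto encoding spec: a little-endian uint32 holding the segment count minus one, then one uint32 per segment giving its size in 8-byte words, then padding up to a word boundary. The name `framed_message_size` is made up for this example and is not part of pycapnp.

```python
import struct

WORD = 8  # Cap'n Proto measures segment sizes in 8-byte words


def framed_message_size(buf, offset=0):
    """Return the total byte length (segment table + segments) of the
    unpacked, stream-framed message that starts at buf[offset].

    Raises ValueError if the buffer ends before the segment table does.
    """
    if len(buf) - offset < 4:
        raise ValueError("truncated segment count")
    (count_minus_one,) = struct.unpack_from("<I", buf, offset)
    segment_count = count_minus_one + 1

    table_bytes = 4 * (1 + segment_count)
    if len(buf) - offset < table_bytes:
        raise ValueError("truncated segment table")
    sizes = struct.unpack_from("<%dI" % segment_count, buf, offset + 4)

    # The table is padded so that the first segment starts on a word boundary.
    padded_table = (table_bytes + WORD - 1) // WORD * WORD
    return padded_table + WORD * sum(sizes)
```

For a single-segment 24-byte message this gives an 8-byte table plus two words of segment data. A size read out of a corrupted region is itself untrustworthy, so it is best treated as a hint rather than proof.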

-Kenton

On Fri, Apr 14, 2017 at 1:17 PM, <[email protected]> wrote:

> Thanks for the reply. Option 1 seems pretty reasonable to me. I would
> probably go as far as to frame the messages with magic + message size; that
> way, if there's another magic (or end of file) at current position +
> message size, the message is probably correct.
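
A minimal sketch of the magic + message size check described above. The marker value `MAGIC`, the 4-byte little-endian length field, and the function names are all assumptions made for this example; none of them come from Cap'n Proto or pycapnp.

```python
import struct

MAGIC = b"CAPNFRM1"        # hypothetical marker, chosen only for this example
HEADER = len(MAGIC) + 4    # marker + little-endian uint32 payload length


def encode_frame(payload):
    """Prefix one serialized message with the marker and its length."""
    return MAGIC + struct.pack("<I", len(payload)) + payload


def read_frames(data):
    """Yield (offset, payload) for every frame that passes the check.

    A frame is accepted only if it is followed directly by another marker
    or by the end of the data. Anything else is treated as broken and
    skipped by scanning forward to the next marker.
    """
    pos = data.find(MAGIC)
    while pos != -1:
        start = pos + HEADER
        if start <= len(data):
            (size,) = struct.unpack_from("<I", data, pos + len(MAGIC))
            end = start + size
            if end == len(data) or data.startswith(MAGIC, end):
                yield pos, data[start:end]
                pos = data.find(MAGIC, end)
                continue
        # Broken frame: resynchronise at the next marker after this one.
        pos = data.find(MAGIC, pos + 1)
```

With two frames written, the last 8 bytes cut off, and a third frame appended, `read_frames` would return the first and third payloads along with their offsets. A payload that happens to contain the marker bytes can still cause a false resync, which is why a fixed marker is weaker than a per-file random one.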
>
> On Friday, April 14, 2017 at 1:08:55 PM UTC-7, Kenton Varda wrote:
>>
>> Hi Stepan,
>>
>> No, there's no easy way to detect the corruption you describe. In fact,
>> for most serialization formats, there's no solution to this problem. Once
>> you've lost track of message boundaries, it's impossible to tell the
>> difference between the start of a new message vs. data in the previous
>> message, since any message can contain arbitrary byte blobs (e.g. via the
>> `Data` type).
>>
>> If what you describe is a requirement for your use case, you could
>> accomplish it with an additional framing layer.
>>
>> Option 1: Choose a 128-bit unguessable random number before you start
>> writing. Write that number before each message. Now you can scan the bytes
>> of the file looking for this 128-bit sequence and, if you see it, you can
>> be fairly certain (false-positive p ~= 2^-128) that a new message starts after it. You
>> have to use a new random number for every file in case you ever embed a
>> whole file into another file.
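
Option 1 could be sketched roughly as follows in Python. This assumes the writer emits the marker followed by the serialized message for every message, so the first 16 bytes of the file are the marker itself and are intact; the function names are hypothetical.

```python
import os

MARKER_LEN = 16  # 128 bits


def new_marker():
    """Pick a fresh, unguessable 128-bit marker; use one per file."""
    return os.urandom(MARKER_LEN)


def message_offsets(data):
    """Return the offsets at which a message probably starts.

    Assumption: the file begins with the marker, so it can be recovered
    from the first MARKER_LEN bytes instead of being stored elsewhere.
    """
    marker = data[:MARKER_LEN]
    if len(marker) < MARKER_LEN:
        return []
    offsets = []
    pos = 0
    while pos != -1:
        offsets.append(pos + MARKER_LEN)
        # Continue the scan after this marker so matches never overlap.
        pos = data.find(marker, pos + MARKER_LEN)
    return offsets
```

Each candidate message then runs from its offset to the start of the next marker (or the end of the file). A span that is too short for the size implied by its segment table is probably the truncated one.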
>>
>> Option 2: Choose a magic number to write before each message, *and* scan
>> the contents of each message for this number, replacing it with an "escape
>> sequence" if seen. Do the opposite transformation while reading. This
>> allows you to detect boundaries "perfectly" (zero probability of false
>> positive) but you lose the benefits of zero-copy due to the need to process
>> escape sequences.
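
One way to sketch Option 2 is classic byte stuffing in the style of PPP/HDLC framing. Note the substitution: this uses a one-byte marker (`FLAG`) rather than a longer magic number, because that keeps the escaping rule easy to verify. The constants and function names are assumptions for illustration only.

```python
FLAG = 0x7E  # frame delimiter (the "magic number", one byte in this sketch)
ESC = 0x7D   # escape byte
XOR = 0x20   # an escaped byte is stored as ESC followed by (byte ^ XOR)


def escape(payload):
    """Rewrite the payload so that FLAG never appears inside it."""
    out = bytearray()
    for b in payload:
        if b in (FLAG, ESC):
            out += bytes((ESC, b ^ XOR))
        else:
            out.append(b)
    return bytes(out)


def unescape(stuffed):
    """Inverse of escape(); raises ValueError on a dangling escape byte."""
    out = bytearray()
    it = iter(stuffed)
    for b in it:
        if b == ESC:
            nxt = next(it, None)
            if nxt is None:
                raise ValueError("dangling escape byte")
            out.append(nxt ^ XOR)
        else:
            out.append(b)
    return bytes(out)


def encode_frame(payload):
    return bytes((FLAG,)) + escape(payload)


def split_frames(data):
    """Split on FLAG and undo the escaping of each chunk."""
    frames = []
    for chunk in data.split(bytes((FLAG,)))[1:]:
        try:
            frames.append(unescape(chunk))
        except ValueError:
            continue  # chunk was cut off in the middle of an escape pair
    return frames
```

Boundaries found this way are exact, but every payload byte is copied on both write and read, which is the zero-copy cost mentioned above. A truncated chunk can still unescape cleanly, so each recovered payload would still need to be validated before use.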
>>
>> -Kenton
>>
>> On Fri, Apr 14, 2017 at 12:35 PM, <[email protected]> wrote:
>>
>>> I have a message that serializes into 24 bytes. I write two messages to
>>> a file, resulting in a file that's 48 bytes long. Now I truncate the file to
>>> 40 bytes and write one message, so the file now looks like this: 1 full
>>> message, 1 broken message, 1 full message. Is there any way to iterate over
>>> the file and, when encountering the broken message, detect that it is broken
>>> and skip directly to the second full message? I've been using Python to read
>>> such a file with the following code:
>>>
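>>> import capnp  # pycapnp; must be imported before any *_capnp schema module
>>> import date_capnp
>>>
>>> def main():
>>>     # Serialized messages are binary, so open the file in binary mode.
>>>     with open('dates.txt', 'rb') as fp:
>>>         for date in date_capnp.Date.read_multiple(fp):
>>>             print(date)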
>>>
>>> But it fails with the following message:
>>>
>>> Message contains non-struct pointer where struct pointer was expected
>>>
>>> Also, if it's possible to detect such a message, is it possible to get
>>> its position and length? Thank you.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Cap'n Proto" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> Visit this group at https://groups.google.com/group/capnproto.
>>>
>>
