The protobuf binary format doesn't provide any mechanism for determining
where a message begins or ends, so I don't think this is possible. Maybe
the only way to do it would be to introduce your own metadata header spaced
out at regular intervals (e.g. every 1 GiB), and have this special header
indicate where the next block begins.
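A minimal sketch of the periodic-header idea follows. It assumes each block is framed as a 4-byte big-endian length followed by its content, and that an 8-byte sync header is written at every physical offset that is a multiple of a fixed interval. The names `write_with_sync`, `read_split`, `SYNC_FMT` and the header layout are all hypothetical; none of this is part of protobuf or Spark. Each sync header records how many payload bytes after it still belong to a block that was cut by the boundary. A split reader can therefore seek back to the nearest header at or before its start, jump to the next block boundary, and read sequentially from there. A block belongs to the split that contains the byte offset where its length prefix starts, so every block is read exactly once across all splits.

```python
import bisect
import io
import struct

# Hypothetical layouts chosen for this sketch.
SYNC_FMT = ">Q"  # 8-byte sync header: payload bytes until the next block start
SYNC_SIZE = struct.calcsize(SYNC_FMT)
LEN_FMT = ">I"  # 4-byte big-endian block length prefix
LEN_SIZE = struct.calcsize(LEN_FMT)


def write_with_sync(blocks, interval):
    """Frame each block as [length][content] and insert a sync header at
    every physical offset that is a multiple of `interval`."""
    if interval <= SYNC_SIZE:
        raise ValueError("interval must be larger than the sync header")
    take = interval - SYNC_SIZE  # payload bytes per interval
    frames = [struct.pack(LEN_FMT, len(b)) + b for b in blocks]
    starts, off = [], 0  # logical (header-free) offset of each frame
    for frame in frames:
        starts.append(off)
        off += len(frame)
    logical = b"".join(frames)

    out = bytearray()
    pos = 0
    while pos < len(logical):
        i = bisect.bisect_left(starts, pos)
        next_start = starts[i] if i < len(starts) else len(logical)
        # Record how far the next block boundary is from here; it may be
        # several intervals away if a large block spans the boundary.
        out += struct.pack(SYNC_FMT, next_start - pos)
        out += logical[pos:pos + take]
        pos += take
    return bytes(out)


def _physical(pos, interval):
    """Map a logical offset to its physical offset in the file."""
    k, r = divmod(pos, interval - SYNC_SIZE)
    return k * interval + SYNC_SIZE + r


def _read_logical(f, pos, n, interval):
    """Read n payload bytes starting at logical offset pos, skipping sync headers."""
    take = interval - SYNC_SIZE
    buf = bytearray()
    while len(buf) < n:
        k, r = divmod(pos, take)
        f.seek(k * interval + SYNC_SIZE + r)
        chunk = f.read(min(n - len(buf), take - r))
        if not chunk:
            raise EOFError("truncated block")
        buf += chunk
        pos += len(chunk)
    return bytes(buf)


def read_split(f, split_start, split_end, interval):
    """Return the blocks whose length prefix starts in [split_start, split_end)."""
    take = interval - SYNC_SIZE
    size = f.seek(0, io.SEEK_END)
    n_headers = -(-size // interval)  # ceil(size / interval)
    logical_size = size - n_headers * SYNC_SIZE

    k = split_start // interval  # nearest sync header at or before split_start
    if k * interval >= size:
        return []
    f.seek(k * interval)
    (skip,) = struct.unpack(SYNC_FMT, f.read(SYNC_SIZE))
    pos = k * take + skip  # logical offset of the first block start after it

    blocks = []
    while pos < logical_size:
        start = _physical(pos, interval)
        if start >= split_end:
            break  # this block and all later ones belong to later splits
        (length,) = struct.unpack(LEN_FMT, _read_logical(f, pos, LEN_SIZE, interval))
        if start >= split_start:  # earlier blocks belong to the previous split
            blocks.append(_read_logical(f, pos + LEN_SIZE, length, interval))
        pos += LEN_SIZE + length
    return blocks
```

The header stores its distance in payload bytes rather than physical bytes so that blocks larger than one interval still work. With `interval` at something like 1 GiB, each split rewinds at most one interval before its own start; a tiny interval such as 16 bytes is only useful for testing, e.g. calling `read_split(f, start, start + split_size, 16)` for each `start` in `range(0, file_size, split_size)` and concatenating the results.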

On Sun, Oct 4, 2020 at 10:16 AM Angel Cervera Claudio <
[email protected]> wrote:

>
> I'm trying to read chunks of a file that contains a sequence of PB blocks. Is
> there a way to detect where a block starts?
>
> A little bit of context:
> It is a huge file (around 60 GB).
> The file format is a sequence of [[Block header][Block content]]. In
> reality, it is a little bit more complex, but this is enough as an example.
> The [Block header] contains the length of the next [block content].
> So the way to read it is sequentially.
>
> I wrote a Spark connector. The first version reads the file
> sequentially as well.
>
> In the next version, I want to process the file in splits, as Spark
> provides it, so I will get chunks of the file.
> I need to find where a [block header] starts, to be able to read
> sequentially from that point.
> So, how do I find this first block? Any ideas?
>
> --
> You received this message because you are subscribed to the Google Groups
> "Protocol Buffers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/protobuf/01bd0fbf-cc13-476d-ab3a-c50a278f81aen%40googlegroups.com
> <https://groups.google.com/d/msgid/protobuf/01bd0fbf-cc13-476d-ab3a-c50a278f81aen%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
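The sequential reading described above can be sketched as follows, assuming the block header is just a 4-byte big-endian length (the real header is more complex); `read_blocks` is a hypothetical name. The second call shows why a split reader can't simply scan for a header: starting at the wrong offset, the content bytes decode as a perfectly valid header and the parse still reaches end of file without any error, so nothing in the bytes reveals that reading started in the wrong place.

```python
import io
import struct
from typing import BinaryIO, Iterator

# Assumption: the block header is a plain 4-byte big-endian length.
LEN_FMT = ">I"
LEN_SIZE = struct.calcsize(LEN_FMT)


def read_blocks(f: BinaryIO) -> Iterator[bytes]:
    """Yield block contents by following length prefixes from the current position."""
    while True:
        header = f.read(LEN_SIZE)
        if not header:
            return  # clean end of file
        if len(header) < LEN_SIZE:
            raise EOFError("truncated block header")
        (length,) = struct.unpack(LEN_FMT, header)
        content = f.read(length)
        if len(content) < length:
            raise EOFError("truncated block content")
        yield content


# A one-block file whose content happens to look like another framed block.
inner = struct.pack(LEN_FMT, 2) + b"hi"
data = struct.pack(LEN_FMT, len(inner)) + inner

print(list(read_blocks(io.BytesIO(data))))      # [b'\x00\x00\x00\x02hi']
print(list(read_blocks(io.BytesIO(data[4:]))))  # [b'hi'], also a "valid" parse
```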
