The protobuf binary format has no framing: nothing in the encoded bytes marks where a message begins or ends, so I don't think you can find a block boundary by scanning from an arbitrary offset. The only way I can see to do it is to introduce your own metadata header at regular intervals (e.g. every 1 GiB) and have that special header indicate where the next block begins.
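A minimal Python sketch of the periodic-header idea follows. It assumes a hypothetical layout, not the asker's real format: each block is a 4-byte big-endian length followed by the payload. A length value of 0xFFFFFFFF is reserved as an escape meaning "sync record follows", and that escape is followed by a random 16-byte marker that is also stored at the very start of the file. The names `write_blocks`, `read_split`, `ESCAPE` and `MARKER_LEN` are illustrative only. This is similar in spirit to the sync markers used by Avro object container files and Hadoop SequenceFiles.

```python
import os
import struct
from typing import Iterator, List

# Hypothetical layout (an assumption for illustration):
#   file        = [16-byte random marker][record][record]...
#   block       = [4-byte big-endian length][payload]
#   sync record = [4-byte ESCAPE][16-byte marker]
ESCAPE = 0xFFFFFFFF  # length value reserved to mean "sync record follows"
MARKER_LEN = 16


def write_blocks(blocks: List[bytes], sync_interval: int = 1 << 30) -> bytes:
    """Serialize blocks, inserting a sync record roughly every sync_interval bytes."""
    marker = os.urandom(MARKER_LEN)  # random, so unlikely to appear inside payloads
    out = bytearray(marker)          # file header: the marker itself
    last_sync = len(out)
    for payload in blocks:
        if len(payload) >= ESCAPE:
            raise ValueError("block too large for a 4-byte length")
        if len(out) - last_sync >= sync_interval:
            last_sync = len(out)
            out += struct.pack(">I", ESCAPE) + marker
        out += struct.pack(">I", len(payload)) + payload
    return bytes(out)


def read_split(data, start: int, end: int) -> Iterator[bytes]:
    """Yield the blocks owned by the byte range [start, end).

    `data` may be bytes or an mmap.mmap of the file; both support find()
    and slicing. A split owns everything from the first sync record at or
    after `start` up to (not including) the first sync record at or after
    `end`, so adjacent splits neither overlap nor skip a block.
    """
    marker = bytes(data[:MARKER_LEN])
    sync = struct.pack(">I", ESCAPE) + marker
    if start == 0:
        pos = MARKER_LEN                 # the first split starts right after the header
    else:
        pos = data.find(sync, start)     # scan forward for the next sync point
        if pos < 0:
            return                       # no sync point at or after start: nothing to do
    size = len(data)
    while pos < size:
        (length,) = struct.unpack_from(">I", data, pos)
        if length == ESCAPE:
            if pos >= end:
                return                   # this sync point belongs to the next split
            pos += 4 + MARKER_LEN        # skip over the sync record
            continue
        pos += 4
        yield bytes(data[pos:pos + length])
        pos += length
```

Each Spark-style split would call `read_split(data, split_start, split_end)` independently. A split may read past its end offset to finish the blocks before the next sync point, which is the usual convention for splittable formats. Because the marker is random, a false match inside a payload is very unlikely but not impossible.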
On Sun, Oct 4, 2020 at 10:16 AM Angel Cervera Claudio <[email protected]> wrote:
>
> I'm trying to read chunks of a file that contains a sequence of PB blocks. Is
> there a way to detect where a block starts?
>
> A little bit of context:
> It is a huge file (around 60 GB).
> The file format is a sequence of [[Block header][Block content]]. In
> reality it is a little more complex, but this is enough as an example.
> The [Block header] contains the length of the next [Block content],
> so the way to read it is sequentially.
>
> I wrote a Spark connector. The first version also reads the file
> sequentially.
>
> In the next version, I want to process the file in splits, the way Spark
> provides it, so I will get chunks of the file.
> I need to find where a [Block header] starts, to be able to read
> sequentially from that point.
> So, how do I find this first block? Any idea?
>
> --
> You received this message because you are subscribed to the Google Groups
> "Protocol Buffers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/protobuf/01bd0fbf-cc13-476d-ab3a-c50a278f81aen%40googlegroups.com
