The 2 GB serialization limit comes from protobuf itself, so there is nothing gRPC can do about it. You will need logic that splits your payload at arbitrary byte offsets, not just at layer boundaries, so that every chunk stays at or under 2 GB, and then reassembles the chunks on the receiver side in your service code.
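A minimal sketch of that byte-level chunking and reassembly, assuming each chunk would be carried as the `bytes` field of a hypothetical proto message (e.g. `message LayerChunk { bytes data = 1; }`) sent over a streaming RPC; the helper names and the 64 MB chunk size are illustrative, not part of gRPC:

```python
# Byte-level chunking: split an arbitrarily large serialized layer into
# pieces that each fit well under protobuf's 2 GB message limit. In a
# real service, each chunk would be wrapped in a proto message and sent
# as one element of a gRPC client- or server-streaming call.

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB per chunk; any size < 2 GB works

def chunk_bytes(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Yield successive slices of `data`, each at most `chunk_size` bytes."""
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]

def reassemble(chunks) -> bytes:
    """Receiver side: concatenate the streamed chunks back into one blob."""
    return b"".join(chunks)

if __name__ == "__main__":
    layer = b"\x00" * (10 * 1024 * 1024)  # stand-in for a huge serialized layer
    chunks = list(chunk_bytes(layer, chunk_size=4 * 1024 * 1024))
    assert reassemble(chunks) == layer
```

Because the split happens on the raw serialized bytes, it works the same whether a "chunk" holds five small layers or one slice of a single >2 GB layer.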
On Monday, January 27, 2025 at 1:54:26 PM UTC+5:30 Saurav Pawar wrote:

> Just a follow up.
>
> On Friday, January 24, 2025 at 10:57:51 PM UTC+4 Saurav Pawar wrote:
>
>> Hello, hope everything is well with you.
>>
>> I am currently using gRPC to communicate LLMs (large language models)
>> having 1 to 7 billion parameters. I know that there is a 2 GB
>> serialization limit, and that is why I have used chunking. So basically in
>> my chunking, let's say I have a model with 100 layers, where each layer
>> is less than 2 GB. I send `batch_size` layers in a single go. For
>> example, if I can send 5 layers at a time, I will need 20 rounds to
>> communicate the whole 100 layers. Also, this chunking is done in Python,
>> not in gRPC.
>>
>> But right now I have a model in which a single layer is itself larger
>> than 2 GB, so in that case I am not sure how to proceed.
>>
>> Can anyone please give me some info on how I can leverage gRPC chunking
>> for this issue?
>>
>> Kind regards,
>> Saurav

--
You received this message because you are subscribed to the Google Groups "grpc.io" group.
To view this discussion visit https://groups.google.com/d/msgid/grpc-io/1e1e2a34-5525-416f-a064-822801548cb8n%40googlegroups.com.