Hello, hope everything is well with you. I am currently using gRPC to transfer the weights of LLMs (large language models) with 1 to 7 billion parameters. I know protobuf has a 2 GB serialization limit, which is why I chunk the weights. My chunking works like this: say a model has 100 layers, each under 2 GB; I send `batch_size` layers in a single call. For example, if I can fit 5 layers in one message, I need 20 rounds to transfer all 100 layers. Note that this chunking is implemented in my own Python application code, not with gRPC streaming. A rough sketch of my current approach is below.
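
For reference, this is roughly what my batching looks like (a simplified sketch; `WeightsBatch` and `stub.SendWeights` stand in for my actual proto message and RPC, which I have not reproduced here):

```python
import numpy as np  # layers are assumed to be numpy ndarrays

BATCH_SIZE = 5  # layers per message, tuned so each message stays under 2 GB

def iter_layer_batches(layers, batch_size=BATCH_SIZE):
    """Yield lists of `batch_size` serialized layers at a time."""
    for i in range(0, len(layers), batch_size):
        # each layer is under 2 GB, so a batch of raw byte blobs
        # still fits inside a single protobuf message
        yield [layer.tobytes() for layer in layers[i:i + batch_size]]

# e.g. 100 layers / 5 per batch -> 20 unary calls:
# for blobs in iter_layer_batches(model_layers):
#     stub.SendWeights(WeightsBatch(layer_blobs=blobs))
```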
But now I have a model in which a single layer is itself larger than 2 GB, so batching whole layers no longer works, and I am not sure how to proceed. Can anyone please give me some info on how I can leverage gRPC chunking for this case? One direction I was considering is sketched below.
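
My rough idea is to slice the raw bytes of the oversized layer and send the slices over a client-streaming RPC, letting the server reassemble them by offset. Here is a sketch of the client side (`LayerChunk` and `UploadLayer` are hypothetical names I made up, not real gRPC APIs), in case it helps frame the question:

```python
CHUNK_BYTES = 1024 * 1024  # 1 MiB per message, well under gRPC's default 4 MB limit

def iter_byte_chunks(layer_bytes: bytes, chunk_size: int = CHUNK_BYTES):
    """Yield (offset, slice) pairs covering one oversized layer."""
    for offset in range(0, len(layer_bytes), chunk_size):
        yield offset, layer_bytes[offset:offset + chunk_size]

# each pair would be wrapped in a (hypothetical) LayerChunk proto and sent
# over a client-streaming RPC; the server concatenates the slices by offset:
# stub.UploadLayer(LayerChunk(name="giant_layer", offset=o, data=d)
#                  for o, d in iter_byte_chunks(layer.tobytes()))
```

Is this the right way to go about it, or is there a more idiomatic gRPC approach?

Kind regards,
Saurav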