Hello, hope everything is well with you.

I am currently using gRPC to transfer LLMs (large language models) with 1 
to 7 billion parameters. I know there is a 2 GB serialization limit, which 
is why I chunk the model myself. Say a model has 100 layers, each smaller 
than 2 GB: I send `batch_size` layers per message, so if 5 layers fit in a 
single message, the whole model takes 20 rounds. Note that this chunking 
is done in my Python code, not by gRPC itself.
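For context, here is a minimal sketch of my current batching logic (simplified and illustrative, not my exact code; it assumes each layer is already serialized to a `bytes` blob):

```python
from typing import Iterator, List, Sequence

def batch_layers(layers: Sequence[bytes], batch_size: int) -> Iterator[List[bytes]]:
    # Yield `batch_size` pre-serialized layers at a time; batch_size is
    # chosen so that a whole batch stays under the 2 GB message limit.
    for i in range(0, len(layers), batch_size):
        yield list(layers[i:i + batch_size])

# 100 layers at 5 per message -> 20 rounds, as described above.
# Each batch then goes out in one RPC (e.g. a hypothetical
# `stub.SendLayers(batch)` call against my own proto).
```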

But now I have a model where a single layer is by itself larger than 2 GB, 
so batching whole layers no longer helps: even one layer per message 
exceeds the serialization limit.
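What I am considering is splitting the raw bytes of such a layer and sending them over a client-streaming RPC, roughly like this (the `model_pb2.LayerChunk` message and `UploadLayer` RPC are hypothetical names for illustration, not from any real proto):

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MiB per message, well under the limit

def layer_to_chunks(layer_bytes: bytes, layer_id: int):
    # Split one serialized layer (possibly > 2 GB) into fixed-size byte
    # chunks; each chunk carries its offset so the server can reassemble
    # the layer by writing the chunks back in order.
    # (`model_pb2` would be generated from a hypothetical .proto that
    # defines a LayerChunk message with layer_id, offset, and data fields.)
    for offset in range(0, len(layer_bytes), CHUNK_SIZE):
        yield model_pb2.LayerChunk(
            layer_id=layer_id,
            offset=offset,
            data=layer_bytes[offset:offset + CHUNK_SIZE],
        )

# With a client-streaming RPC, the generated stub accepts the generator:
# response = stub.UploadLayer(layer_to_chunks(big_layer, layer_id=0))
```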

Can anyone please give me some pointers on how to leverage gRPC chunking 
or streaming for this, or confirm whether the sketch above is the right 
direction?

Kind regards,
Saurav
