Just a follow-up.

On Friday, January 24, 2025 at 10:57:51 PM UTC+4 Saurav Pawar wrote:

> Hello, I hope everything is well with you.
>
> I am currently using gRPC to transfer LLMs (large language models) with 
> 1 to 7 billion parameters. I know that there is a 2 GB serialization 
> limit per message, which is why I chunk the model. Concretely: say a 
> model has 100 layers, each smaller than 2 GB. I send `batch_size` 
> layers per message; for example, if 5 layers fit in one message, I 
> need 20 rounds to transfer all 100 layers. Note that this chunking is 
> implemented on the Python side, not by gRPC itself.
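>
> For reference, here is a minimal sketch of my current Python-side 
> batching, assuming each layer has already been serialized to bytes. 
> The `model_pb2` module, the `LayerBatch` message, and the 
> `UploadModel` client-streaming RPC are made-up names from my own 
> .proto, not part of gRPC itself:
>
>     import model_pb2  # hypothetical module generated from my .proto
>
>     def layer_batches(layers, batch_size=5):
>         """Yield successive groups of `batch_size` layers.
>
>         100 layers with batch_size=5 gives the 20 rounds above.
>         """
>         for i in range(0, len(layers), batch_size):
>             yield layers[i:i + batch_size]
>
>     def send_model(stub, layers, batch_size=5):
>         # batch_size is chosen so that each LayerBatch message stays
>         # under the 2 GB serialization limit.
>         requests = (
>             model_pb2.LayerBatch(layers=list(batch))
>             for batch in layer_batches(layers, batch_size)
>         )
>         return stub.UploadModel(requests)  # client-streaming RPC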
>
> But now I have a model in which a single layer is, by itself, larger 
> than 2 GB, and in that case I am not sure how to proceed.
>
> Can anyone please give me some info on how I can leverage gRPC 
> chunking for this issue?
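>
> For concreteness, this is roughly what I imagine for the oversized 
> layer, though I am not sure it is the right approach (again, the 
> `LayerChunk` message and `UploadLayer` RPC are made-up names from the 
> same hypothetical `model_pb2`):
>
>     CHUNK_SIZE = 64 * 1024 * 1024  # e.g. 64 MiB, far below 2 GB
>
>     def layer_chunks(raw_bytes):
>         """Split the serialized bytes of ONE > 2 GB layer into chunks."""
>         view = memoryview(raw_bytes)  # avoid copying multi-GB slices
>         for offset in range(0, len(view), CHUNK_SIZE):
>             yield model_pb2.LayerChunk(
>                 data=bytes(view[offset:offset + CHUNK_SIZE]),
>                 offset=offset,  # lets the server reassemble in order
>             )
>
>     # response = stub.UploadLayer(layer_chunks(big_layer_bytes))
>     # UploadLayer would be a client-streaming RPC; the server would
>     # concatenate the chunks by offset to rebuild the single layer.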
>
> Kind regards,
> Saurav
>
