Just a follow-up. On Friday, January 24, 2025 at 10:57:51 PM UTC+4 Saurav Pawar wrote:
> Hello, hope everything is well with you.
>
> I am currently using gRPC to communicate LLMs (large language models) with 1 to 7 billion parameters. I know there is a 2 GB serialization limit, which is why I use chunking. Say I have a model with 100 layers, each smaller than 2 GB: I send `batch_size` layers per call. For example, if I can send 5 layers at a time, I need 20 rounds to transfer all 100 layers. This chunking is done in Python, not by gRPC itself.
>
> But now I have a model in which a single layer is itself larger than 2 GB, and in that case I am not sure how to proceed.
>
> Can anyone give me some pointers on how I can leverage gRPC chunking for this issue?
>
> Kind regards,
> Saurav
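For the single-layer case, the usual approach is to chunk at the byte level rather than the layer level: serialize the layer to bytes, split those bytes into fixed-size pieces, and send them over a client-streaming RPC, reassembling on the server. Below is a minimal sketch of that idea. The `WeightTransfer` service, the `LayerChunk`/`UploadAck` messages, and the generated `weights_pb2` / `weights_pb2_grpc` modules are all illustrative assumptions, not something from the original post.

import grpc
import weights_pb2        # hypothetical generated module (assumed .proto below)
import weights_pb2_grpc   # hypothetical generated module (assumed .proto below)

# Assumed service definition, roughly:
#
#   service WeightTransfer {
#     rpc UploadLayer(stream LayerChunk) returns (UploadAck);
#   }
#   message LayerChunk {
#     string layer_name = 1;
#     bytes data = 2;   // one slice of the serialized layer
#   }

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MiB per message, far below the 2 GB cap

def layer_chunks(layer_name, payload):
    # Split the serialized layer bytes into fixed-size slices and wrap
    # each slice in its own small protobuf message.
    for offset in range(0, len(payload), CHUNK_SIZE):
        yield weights_pb2.LayerChunk(
            layer_name=layer_name,
            data=payload[offset:offset + CHUNK_SIZE],
        )

def upload_layer(stub, layer_name, payload):
    # Client-streaming call: gRPC sends each yielded message as its own
    # frame, so no single protobuf ever approaches the 2 GB limit.
    return stub.UploadLayer(layer_chunks(layer_name, payload))

# Usage sketch (assuming a matching server is running):
# channel = grpc.insecure_channel("localhost:50051")
# stub = weights_pb2_grpc.WeightTransferStub(channel)
# ack = upload_layer(stub, "decoder.layer_0", serialized_layer_bytes)

On the receiving side, the servicer would iterate over the request stream and concatenate (or incrementally write out) the `data` fields before deserializing the layer. Note that simply raising `grpc.max_send_message_length` / `grpc.max_receive_message_length` on the channel will not get around the single-layer problem, since protobuf itself caps any one message at 2 GB regardless of the channel options.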