original-brownbear commented on PR #13610: URL: https://github.com/apache/lucene/pull/13610#issuecomment-2256450538
@mikemccand It's admittedly the result of design decisions that are each individually small; we've simply navigated ourselves into a situation where, at least temporarily, we have far too many segments per node. The duplicated objects are only ~a couple hundred bytes each in the case I looked into, but that adds up quickly when you have lots of segments as well as many fields per segment. And yes, the data really is duplicated: it's not the same instance, unfortunately. The values are read from the stream twice in two separate reads, the same way we duplicate them on the write side. I'm not sure what the right strategy is for fixing the serialization side of things, but I assume that would require a new version of the format, so it makes sense to do a PR like this one first to save memory for existing segments?
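The deduplication idea being discussed could be sketched roughly like this (hypothetical names, not Lucene's actual API): an intern map so that two separate reads which produce equal values end up sharing one canonical instance instead of holding two copies on the heap per segment.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: intern values that are deserialized twice so that
// equal-but-distinct instances collapse to a single canonical one.
final class Interner<T> {
    private final Map<T, T> canonical = new ConcurrentHashMap<>();

    // Returns the first instance ever interned for this value; later
    // equal values are discarded in favor of the canonical instance.
    T intern(T value) {
        T existing = canonical.putIfAbsent(value, value);
        return existing == null ? value : existing;
    }
}

public class InternDemo {
    public static void main(String[] args) {
        Interner<String> interner = new Interner<>();
        // Simulate two separate reads of the same serialized bytes,
        // which would otherwise yield two distinct instances:
        String first = interner.intern(new String("someFieldAttribute"));
        String second = interner.intern(new String("someFieldAttribute"));
        // The two reads now share identity, not just equality.
        System.out.println(first == second); // prints true
    }
}
```

This only saves memory for already-written segments; removing the duplicate write (and read) entirely would still need a format version bump, as the comment notes.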
