Hi,

On 2019-09-27 10:53:48 -0400, Robert Haas wrote:
> A lot of that is because they hit the 1GB allocation limit, and I
> wonder whether we shouldn't be trying harder to avoid imposing that
> limit in multiple places.
> It's reasonable - and necessary - to impose
> a limit on the size of an individual datum, but when that same limit
> is imposed on other things, like the worst-case size of the encoding
> conversion, the size of an individual message sent via the wire
> protocol, etc., you end up with a situation where users have trouble
> predicting what the behavior is going to be. >=1GB definitely won't
> work, but it'll probably break at some point before you even get that
> far depending on a bunch of complex factors that are hard to
> understand, not really documented, and mostly the result of applying
> a 1GB limit to every single memory allocation across the whole backend
> without really thinking about what that does to the user-visible
> behavior.

+1. That will be a long, piecemeal project, I think... But deciding that
we should do so is a good first step.

Note that one of the additional reasons for the 1GB limit is that it
protects against int overflows. I'm somewhat unconvinced that that's a
sensible approach, but ...

I wonder if we shouldn't make StringInfos use size_t lengths, btw. Only
supporting INT32_MAX (not even UINT32_MAX) seems weird these days. But
we'd presumably have to make it opt-in.

> One approach I think we should consider is, for larger strings,
> actually scan the string and figure out how much memory we're going to
> need for the conversion and then allocate exactly that amount (and
> fail if it's >=1GB). An extra scan over the string is somewhat costly,
> but allocating hundreds of megabytes of memory on the theory that we
> could hypothetically have needed it is costly in a different way.

My proposal for this is something like
https://www.postgresql.org/message-id/20190924214204.mav4md77xg5u5wq6%40alap3.anarazel.de
which should avoid the overallocation without a second pass, and
hopefully without losing much efficiency.
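For comparison, the two-pass "scan first, then allocate exactly" approach Robert describes can be sketched roughly as below. This is a minimal illustration under stated assumptions, not PostgreSQL code: Latin-1 to UTF-8 is used only because its per-byte growth (one byte below 0x80, two bytes otherwise) is trivial to compute; latin1_to_utf8(), latin1_to_utf8_len() and SKETCH_MAX_ALLOC are hypothetical names; plain malloc() stands in for palloc(), and a NULL return stands in for ereport(ERROR). The cap value mirrors the 0x3fffffff used for MaxAllocSize, and the length is tallied in size_t with an early exit at the cap, which sidesteps the int overflow concern mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cap mirroring the 1GB (0x3fffffff) allocation limit. */
#define SKETCH_MAX_ALLOC ((size_t) 0x3fffffff)

/*
 * Pass 1: compute the exact UTF-8 length of a Latin-1 string.
 * Bytes below 0x80 stay one byte; bytes 0x80..0xFF need two.
 * Counting in size_t and stopping at the cap means the tally can
 * neither overflow nor keep scanning once the answer is "too big".
 */
static size_t latin1_to_utf8_len(const unsigned char *src, size_t len)
{
    size_t need = 0;

    for (size_t i = 0; i < len; i++)
    {
        need += (src[i] < 0x80) ? 1 : 2;
        if (need >= SKETCH_MAX_ALLOC)
            return SKETCH_MAX_ALLOC;
    }
    return need;
}

/*
 * Pass 2: allocate exactly need + 1 bytes and convert.
 * Returns a NUL-terminated malloc'd string, or NULL when the result would
 * reach the cap or malloc fails (stand-ins for ereport(ERROR)).
 */
char *latin1_to_utf8(const unsigned char *src, size_t len, size_t *out_len)
{
    size_t need = latin1_to_utf8_len(src, len);
    unsigned char *dst;
    unsigned char *p;

    /* "fail if it's >=1GB", checked before anything is allocated */
    if (need >= SKETCH_MAX_ALLOC)
        return NULL;

    dst = malloc(need + 1);
    if (dst == NULL)
        return NULL;

    p = dst;
    for (size_t i = 0; i < len; i++)
    {
        unsigned char c = src[i];

        if (c < 0x80)
            *p++ = c;
        else
        {
            /* U+0080..U+00FF encode as 110xxxxx 10xxxxxx */
            *p++ = (unsigned char) (0xC0 | (c >> 6));
            *p++ = (unsigned char) (0x80 | (c & 0x3F));
        }
    }
    /* The second pass must write exactly what the first pass predicted. */
    assert((size_t) (p - dst) == need);
    *p = '\0';

    if (out_len != NULL)
        *out_len = need;
    return (char *) dst;
}
```

The price of this design is the extra read of the input in pass 1; whether that beats reserving a worst-case buffer up front is the trade-off being debated in this thread.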
It's worth noting that additional passes over data are often quite
expensive; memory latency hasn't shrunk that much in the last decade or
so. I have frequently seen all the memcpys from one StringInfo/char*
into another StringInfo show up in profiles.

Greetings,

Andres Freund