I’m curious what you learned. I’ve been tripping over the buffer allocation issue both when splitting input text into chunks and when aggregating chunks in selects. I’ve decided that I need to move this to the client side.
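
Concretely, I’m thinking along the lines of the JDBC sketch below. It’s untested; the class and helper names are placeholders of mine, the chunk size matches the function in the quoted thread, and the associated table and its columns are the ones from my earlier message below. Because it streams from a Reader, neither the client nor any single wire message ever holds the whole text.

import java.io.IOException;
import java.io.Reader;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class ChunkedTextWriter {
    // 1M characters per chunk, matching the server-side function quoted
    // below and staying far under the 1 GB wire-message limit.
    private static final int CHUNK_SIZE = 1_000_000;

    // Streams a large sequence into the chunk table one INSERT at a time.
    // Offsets are 0-based and half-open: chunk i covers [text_start, text_end).
    // Note: the quoted schema declares the offsets as integer, which caps the
    // total length around 2^31 characters; bigint columns would lift that.
    public static int insertText(Connection conn, int keyId, Reader source)
            throws IOException, SQLException {
        String sql = "INSERT INTO associated "
                   + "(key_id, chunk, text_start, text_end, text_chunk) "
                   + "VALUES (?, ?, ?, ?, ?)";
        char[] buf = new char[CHUNK_SIZE];
        int chunk = 0;
        long offset = 0;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int n;
            while ((n = readFully(source, buf)) > 0) {
                ps.setInt(1, keyId);
                ps.setInt(2, chunk);
                ps.setLong(3, offset);
                ps.setLong(4, offset + n);
                ps.setString(5, new String(buf, 0, n));
                ps.executeUpdate();
                chunk++;
                offset += n;
            }
        }
        return chunk;
    }

    // Reader.read() may return short counts, so keep filling the buffer
    // until it is full or the stream ends.
    private static int readFully(Reader r, char[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = r.read(buf, total, buf.length - total);
            if (n < 0) break;
            total += n;
        }
        return total;
    }
}

Fed with, say, a FileReader over the sequence text, memory stays flat at one chunk; wrapping the loop in a transaction (and batching the inserts) would cut the round-trip cost.
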
The use case is genomics. Extracting substrings is common, so going to chunked storage makes sense. And we have a menagerie of code to deal with: legacy Perl for loading, and clients in Perl, Java, Node, and Python. A whole zoo.

Thanks

> On Apr 11, 2023, at 10:51 AM, Rob Sargent <robjsarg...@gmail.com> wrote:
>
>> On 4/11/23 11:41, Joe Carlson wrote:
>> Hello,
>>
>> I’ve recently encountered the issue of trying to insert more than 1 GB into
>> a TEXT column. While the docs say TEXT is of unlimited length, I had been
>> unaware of the 1 GB buffer size limitation.
>>
>> We can debate whether or not saving something this big in a single column is
>> a good idea (spoiler: it isn’t. But it’s not my design and, in fairness, this
>> was not anticipated when the schema was designed.). I’d like to implement
>> something that is not a major disruption and keep the mods on the server
>> side. My first idea is to have an associated chunk table (in pseudo code):
>>
>> CREATE TABLE associated(
>>     key_id     integer references main_table(key_id),
>>     chunk      integer,
>>     text_start integer,
>>     text_end   integer,
>>     text_chunk TEXT
>> );
>>
>> and to define functions for inserting and selecting by dividing into 1 MB
>> chunks:
>>
>> CREATE FUNCTION insertText(INTEGER, TEXT) RETURNS INTEGER AS $$
>> DECLARE
>>     chunk INTEGER := 0;
>>     key_id ALIAS FOR $1;
>>     the_text ALIAS FOR $2;
>>     text_chunk TEXT;
>> BEGIN
>>     LOOP
>>         -- substr() is 1-based, hence the +1 on the start position
>>         text_chunk := substr(the_text, chunk * 1000000 + 1, 1000000);
>>         IF length(text_chunk) = 0 THEN
>>             EXIT;
>>         END IF;
>>         INSERT INTO associated(key_id, chunk, text_start, text_end, text_chunk)
>>         VALUES (key_id, chunk, chunk * 1000000,
>>                 chunk * 1000000 + length(text_chunk), text_chunk);
>>         chunk := chunk + 1;
>>     END LOOP;
>>     RETURN chunk;
>> END;
>> $$ LANGUAGE plpgsql;
>>
>> This apparently runs into the same buffer size issue: I get an ‘invalid
>> message length’ error in the log file and the insert fails. I can see from
>> adding notices in the code that I never enter the LOOP; I assume having
>> function arguments > 1 GB is also a bad thing.
>>
>> I’d like to continue to keep the modifications on the server side, and I’d
>> like to believe someone else has had this problem before. Any suggestions
>> other than having the client do the chunking? Can I use a different language
>> binding and get around the argument length limitation?
>>
>> Thanks
>>
>
> I’ve hit this same limitation in Java (with a write to the db). What is your
> stack in this case? Not sure my solution applies.
>
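
One more thought on the extraction side, since substring access is our common operation: it doesn’t have to pull a whole entry back and trip the same limit on the way out. Only the chunks overlapping the requested range need to be read, and the server can stitch them together in one query. Another untested sketch in the same spirit (names are placeholders; it assumes the 0-based, half-open [text_start, text_end) convention from the writer above):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ChunkedTextReader {
    // Returns the 0-based substring [from, from + len) of the stored text.
    // The WHERE clause keeps only chunks overlapping the requested range,
    // string_agg(... ORDER BY chunk) stitches them back together in order,
    // and the outer substr() trims the partial first and last chunks.
    public static String getSubstring(Connection conn, int keyId, long from, int len)
            throws SQLException {
        String sql =
            "SELECT substr(string_agg(text_chunk, '' ORDER BY chunk), "
          + "              (? - min(text_start) + 1)::int, ?) "
          + "  FROM associated "
          + " WHERE key_id = ? AND text_start < ? AND text_end > ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, from);       // requested 0-based start
            ps.setInt(2, len);         // requested length
            ps.setInt(3, keyId);
            ps.setLong(4, from + len); // chunk starts before the requested end...
            ps.setLong(5, from);       // ...and ends after the requested start
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null; // null if nothing overlaps
            }
        }
    }
}

The selects that burned me were the ones aggregating whole entries; a bounded-range query like this stays well under the message limit as long as the requested length is sane.
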