I’m curious what you learned. I’ve been tripping over the buffer allocation 
issue when either splitting input text into chunks or aggregating chunks in 
selects. I’ve decided that I need to move this to the client side.
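For what it's worth, the client-side split I have in mind looks roughly like 
this (a sketch, not tested against a live server; it assumes the `associated` 
table and 1 MB chunk size from the schema quoted below, and the cursor is a 
placeholder for whatever driver each client uses):

```python
CHUNK = 1_000_000  # 1 MB of text per row, comfortably under the 1 GB limit

def chunks(text, size=CHUNK):
    """Yield (chunk_no, start, end, piece) with 0-based, half-open offsets."""
    for n in range(0, len(text), size):
        piece = text[n:n + size]
        yield n // size, n, n + len(piece), piece

def insert_text(cur, key_id, text):
    """Insert one logical TEXT value as many small rows.

    `cur` is assumed to be a DB-API cursor (e.g. psycopg2); each execute
    carries at most ~1 MB, so no single message approaches the 1 GB limit.
    """
    for chunk_no, start, end, piece in chunks(text):
        cur.execute(
            "INSERT INTO associated (key_id, chunk, text_start, text_end, text_chunk) "
            "VALUES (%s, %s, %s, %s, %s)",
            (key_id, chunk_no, start, end, piece),
        )
```

The same loop is easy to mirror in the Perl and Java clients, since it is just 
fixed-width string slicing plus one parameterized INSERT per chunk.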

The use case is genomics. Extracting substrings is common. So going to chunked 
storage makes sense.
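Since substring extraction is the common operation, the chunk arithmetic for a 
range query is simple: find the chunks overlapping the requested range, fetch 
just those rows, and trim the first and last pieces. A rough sketch, assuming 
1 MB chunks with 0-based, half-open text_start/text_end offsets (matching the 
schema quoted below; the SELECT in the docstring is illustrative):

```python
CHUNK = 1_000_000  # must match the chunk size used at load time

def covering_chunks(start, end, size=CHUNK):
    """Chunk numbers whose rows overlap the half-open range [start, end).

    Assumes end > start.
    """
    return range(start // size, (end - 1) // size + 1)

def substring(pieces, start, end, size=CHUNK):
    """Reassemble [start, end) from ordered (chunk_no, text_chunk) rows,
    e.g. the result of something like:

      SELECT chunk, text_chunk FROM associated
      WHERE key_id = %s AND chunk BETWEEN %s AND %s ORDER BY chunk
    """
    out = []
    for chunk_no, piece in pieces:
        base = chunk_no * size  # offset of this chunk in the full text
        out.append(piece[max(start - base, 0):end - base])
    return "".join(out)
```

The point is that a substring query only ever touches ceil(range / 1 MB) + 1 
rows, so pulling a few kilobases out of a chromosome-sized value never moves 
more than a couple of chunks over the wire.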

And we have a menagerie of code to deal with: legacy Perl for loading, and 
clients in Perl, Java, Node, and Python. A whole zoo.

Thanks

> On Apr 11, 2023, at 10:51 AM, Rob Sargent <robjsarg...@gmail.com> wrote:
> 
> 
>> On 4/11/23 11:41, Joe Carlson wrote:
>> Hello,
>> 
>> I’ve recently encountered the issue of trying to insert more than 1 GB into 
>> a TEXT column. While the docs say TEXT is unlimited in length, I had been 
>> unaware of the 1 GB buffer size limitation.
>> 
>> We can debate whether saving something this big in a single column is a good 
>> idea (spoiler: it isn’t, but it’s not my design and, in fairness, wasn’t 
>> anticipated when the schema was designed). Setting that aside, I’d like to 
>> implement something that is not a major disruption and keep the mods on the 
>> server side. My first idea is to have a chunked associated table (in pseudo 
>> code)
>> My first idea is to have a chunked associated table (in pseudo code)
>> 
>> CREATE TABLE associated(key_id integer references main_table(key_id), chunk 
>> integer, text_start integer, text_end integer, text_chunk TEXT);
>> 
>> And define functions for inserting and selecting by dividing into 1 MB chunks
>> 
>> CREATE FUNCTION insertText(INTEGER,TEXT) RETURNS INTEGER AS $$
>> DECLARE
>>   chunk INTEGER := 0;
>>   key_id ALIAS for $1;
>>   the_text ALIAS for $2;
>>   text_chunk TEXT;
>> BEGIN
>>   LOOP
>>     text_chunk := substr(the_text, chunk*1000000 + 1, 1000000);  -- substr is 1-based
>>     IF length(text_chunk) = 0 THEN
>>       EXIT;
>>     END IF;
>>     INSERT INTO associated(key_id,chunk,text_start,text_end,text_chunk) 
>> VALUES 
>> (key_id,chunk,chunk*1000000,(chunk*1000000+length(text_chunk)),text_chunk);
>>     chunk := chunk + 1;
>>   END LOOP;
>>   RETURN chunk;
>> END;
>> $$ LANGUAGE plpgsql;
>> 
>> This apparently runs into the same buffer-size issue: I get an ‘invalid 
>> message length’ error in the log file and the insert fails. I can see from 
>> adding notices in the code that I never enter the LOOP; I assume passing 
>> function arguments > 1 GB is also a bad thing.
>> 
>> I’d like to continue to keep the modifications on the server side. And I’d 
>> like to believe someone else has had this problem before. Any suggestions 
>> other than having the client do the chunking? Can I use a different language 
>> binding and get around the argument length limitation?
>> 
>> Thanks
>> 
>> 
>> 
> I've hit this same limitation in Java (writing to the db).  What is your 
> stack in this case?  Not sure my solution applies.
> 
> 
