Re: TEXT column > 1Gb

2023-04-12 Thread Rob Sargent
On 4/12/23 15:03, Joe Carlson wrote: On Apr 12, 2023, at 12:21 PM, Rob Sargent wrote: On 4/12/23 13:02, Ron wrote: /Must/ the genome all be in one big file, or can you store them one line per table row? The assumption in the schema I’m using is 1 chromosome per record. Chromosomes are ty

Re: TEXT column > 1Gb

2023-04-12 Thread Joe Carlson
> On Apr 12, 2023, at 12:21 PM, Rob Sargent wrote: > > On 4/12/23 13:02, Ron wrote: >> Must the genome all be in one big file, or can you store them one line per >> table row? The assumption in the schema I’m using is 1 chromosome per record. Chromosomes are typically strings of continuous s

Re: TEXT column > 1Gb

2023-04-12 Thread Ron
On 4/12/23 14:21, Rob Sargent wrote: On 4/12/23 13:02, Ron wrote: /Must/ the genome all be in one big file, or can you store them one line per table row? Not sure what OP is doing with plant genomes (other than some genomics) but the tools all use files and pipeline of sub-tools. In and out o

Re: TEXT column > 1Gb

2023-04-12 Thread Benedict Holland
Yea. For ease of use, out of the box solutions that will just work, large objects. You might know them as BLOBs in other SQL varieties. If you are dealing with that much data, I'm going to assume that storage isn't really your concern. I wouldn't even waste time compressing. I use them frequently t

Re: TEXT column > 1Gb

2023-04-12 Thread Rob Sargent
On 4/12/23 13:02, Ron wrote: /Must/ the genome all be in one big file, or can you store them one line per table row? Not sure what OP is doing with plant genomes (other than some genomics) but the tools all use files and pipeline of sub-tools.  In and out of tuples would be expensive.  Very,v

Re: TEXT column > 1Gb

2023-04-12 Thread Ron
/Must/ the genome all be in one big file, or can you store them one line per table row? On 4/12/23 12:19, Joe Carlson wrote: I’ve certainly thought about using a different representation. A factor of 2x would be good, for a while anyway. For nucleotide sequence, we’d need to consider a 10 cha

Re: TEXT column > 1Gb

2023-04-12 Thread Rob Sargent
On 4/12/23 11:24, Benedict Holland wrote: For documents that long I would seriously consider using large objects and referencing them with their OIDs. Text fields get put in a special location within the database. It's similar (possibly exactly) to using large objects. Also, you can potentially c

Re: TEXT column > 1Gb

2023-04-12 Thread Benedict Holland
For documents that long I would seriously consider using large objects and referencing them with their OIDs. Text fields get put in a special location within the database. It's similar (possibly exactly) to using large objects. Also, you can potentially compress them to save space on write and read.

Re: TEXT column > 1Gb

2023-04-12 Thread Joe Carlson
I’ve certainly thought about using a different representation. A factor of 2x would be good, for a while anyway. For nucleotide sequence, we’d need to consider a 10-character alphabet (A, C, G, T, N and the lower case forms when representing ‘soft masked’ sequence*). So it would be 2 bases/byte.
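A 10-symbol alphabet needs 4 bits per base, which is where the 2 bases/byte figure comes from. The sketch below packs such a sequence into nibbles. The particular code assignment (`ACGTNacgtn` mapped to 0..9) is a hypothetical choice for illustration, not a standard format.

```python
# Hypothetical 4-bit codes for the 10-symbol alphabet described above:
# upper-case A, C, G, T, N plus lower-case (soft-masked) forms.
ALPHABET = "ACGTNacgtn"
CODE = {base: i for i, base in enumerate(ALPHABET)}


def pack_nibbles(seq: str) -> bytes:
    """Pack a sequence at two bases per byte, first base in the high nibble."""
    out = bytearray((len(seq) + 1) // 2)
    for i, base in enumerate(seq):
        code = CODE[base]
        if i % 2 == 0:
            out[i // 2] = code << 4
        else:
            out[i // 2] |= code
    return bytes(out)


def unpack_nibbles(data: bytes, length: int) -> str:
    """Inverse of pack_nibbles. The length is required because an
    odd-length sequence leaves the final low nibble unused."""
    bases = []
    for i in range(length):
        byte = data[i // 2]
        code = byte >> 4 if i % 2 == 0 else byte & 0x0F
        bases.append(ALPHABET[code])
    return "".join(bases)
```

For example, `pack_nibbles("ACGTNacgtnA")` yields 6 bytes for 11 bases, and `unpack_nibbles` with length 11 restores the original string, case and all.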

Re: TEXT column > 1Gb

2023-04-12 Thread Mark Dilger
> On Apr 12, 2023, at 7:59 AM, Joe Carlson wrote: > > The use case is genomics. Extracting substrings is common. So going to > chunked storage makes sense. Are you storing nucleotide sequences as text strings? If using the simple 4-character (A,C,G,T) alphabet, you can store four bases per

Re: TEXT column > 1Gb

2023-04-12 Thread Rob Sargent
On 4/12/23 08:59, Joe Carlson wrote: I’m curious what you learned. I’ve been tripping over the buffer allocation issue when either splitting input text into chunks or aggregating chunks in selects. I’ve decided that I need to move this to client side. The use case is genomics. Extracting subs

Re: TEXT column > 1Gb

2023-04-12 Thread Joe Carlson
I’m curious what you learned. I’ve been tripping over the buffer allocation issue when either splitting input text into chunks or aggregating chunks in selects. I’ve decided that I need to move this to client side. The use case is genomics. Extracting substrings is common. So going to chunked s
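With chunked storage, one sequence is split into fixed-size pieces, for example one piece per row. A substring request then becomes a matter of working out which chunks overlap the requested range and trimming the joined result on the client, which keeps every individual value well under the 1 GB limit. The sketch below simulates this in memory. The chunk size and helper names are hypothetical, and the first/last chunk numbers stand in for whatever row filter an actual schema would use.

```python
CHUNK_SIZE = 1_000_000  # hypothetical; anything far below the 1 GB limit


def split_into_chunks(seq: str, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split a sequence into fixed-size pieces (the last may be shorter)."""
    return [seq[i:i + chunk_size] for i in range(0, len(seq), chunk_size)]


def chunk_range(start: int, end: int, chunk_size: int = CHUNK_SIZE) -> tuple[int, int]:
    """Map a 0-based half-open range [start, end) to inclusive chunk numbers."""
    if end <= start:
        raise ValueError("end must be greater than start")
    return start // chunk_size, (end - 1) // chunk_size


def extract(chunks: list[str], start: int, end: int, chunk_size: int = CHUNK_SIZE) -> str:
    """Assemble [start, end) from only the chunks that overlap it."""
    first, last = chunk_range(start, end, chunk_size)
    joined = "".join(chunks[first:last + 1])  # stands in for fetching rows first..last
    offset = start - first * chunk_size
    return joined[offset:offset + (end - start)]
```

With a toy chunk size of 4, `"ACGTACGTTTGGCCAA"` splits into four chunks, and `extract(chunks, 6, 11, 4)` touches only chunks 1 and 2 to return `"GTTTG"`.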

Re: TEXT column > 1Gb

2023-04-11 Thread Pavel Stehule
Hi, On Tue, Apr 11, 2023 at 19:42, Joe Carlson wrote: > Hello, > > I’ve recently encountered the issue of trying to insert more than 1 Gb > into a TEXT column. While the docs say TEXT is unlimited length, I had been > unaware of the 1Gb buffer size limitations. > I think so this is some mis

Re: TEXT column > 1Gb

2023-04-11 Thread Rob Sargent
On 4/11/23 11:41, Joe Carlson wrote: Hello, I’ve recently encountered the issue of trying to insert more than 1 Gb into a TEXT column. While the docs say TEXT is unlimited length, I had been unaware of the 1Gb buffer size limitations. We can debate whether or not saving something this big in

TEXT column > 1Gb

2023-04-11 Thread Joe Carlson
Hello, I’ve recently encountered the issue of trying to insert more than 1 Gb into a TEXT column. While the docs say TEXT is unlimited length, I had been unaware of the 1Gb buffer size limitations. We can debate whether or not saving something this big in a single column is a good idea (spoile