On 2021-05-26 18:43, Alan Gauld via Python-list wrote:
> On 26/05/2021 14:09, Tim Chase wrote:
>>> If so, doesn't that introduce a pretty big storage overhead for
>>> large strings?
>>
>> Yes. Though such large strings tend to be more rare, largely
>> because they become unwieldy for other reasons.
>
> I do have some scripts that work on large strings - mainly produced
> by reading an entire text file into a string using file.read().
> Some of these are several MB long so potentially now 4x bigger than
> I thought. But you are right, even a 100MB string should still be
> OK on a modern PC with 8GB+ RAM!...
If you don't decode it upon reading it in, it should still be 100MB
because it's a stream of encoded bytes. It would only double or
quadruple in size if you decoded it (either as a parameter of how you
opened the file, or if you later took those bytes and decoded them
explicitly, though now you have the original 100MB byte-string *plus*
the 100/200/400MB decoded unicode string).

You don't specify what you then do with this humongous string, but for
most of my large files like this, I end up iterating over them
piecewise rather than f.read()'ing them all in at once. Or even if the
whole file does end up in memory, it's usually chunked and split into
useful pieces. That could mean that each line is its own string, almost
all of which are one-byte-per-char, with a couple strings at sporadic
positions in the list-of-strings where they are 2/4 bytes per char.
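To make that concrete, here's a rough sketch ("big.txt" is just a
stand-in name, and I'm assuming UTF-8 data) showing the bytes vs.
decoded-string cost, and the line-by-line alternative:

    import sys

    # hypothetical file name; substitute your own path/encoding
    with open("big.txt", "rb") as f:
        raw = f.read()              # bytes: roughly the on-disk size
    print(sys.getsizeof(raw))

    text = raw.decode("utf-8")      # str: 1, 2, or 4 bytes per char,
                                    # sized by the widest char present
    print(sys.getsizeof(text))      # and until `raw` is dropped, peak
                                    # memory is the sum of the two

    # the piecewise alternative: each line is its own str, so only
    # lines that actually contain wide characters pay the 2x/4x cost
    with open("big.txt", encoding="utf-8") as f:
        for line in f:
            pass                    # process one line at a time

The exact numbers from sys.getsizeof() include a little per-object
overhead, but the 1x/2x/4x pattern shows up clearly.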
-tkc


--
https://mail.python.org/mailman/listinfo/python-list