I work with Postgres and wonder whether, for my purposes, there is a 
good-enough reason to upgrade one of these days.

I’m an editor working with some 60,000 Early Modern texts, many of them in 
need of some editorial attention. The texts are XML-encoded documents. Each 
word is wrapped in a <w> element with attributes for various linguistic 
metadata. Typically a given type of error occurs several or many times, and 
at the margins the instances need individual attention. I use Python scripts 
to extract tokens from the main corpus (sometimes dozens, sometimes thousands 
or millions), turn them into keyword-in-context entries, and import them into 
Postgres. I basically use Postgres as a giant spreadsheet. Its excellent 
string-handling routines make it relatively easy to perform the search and 
sort operations that identify tokens in need of correction. Once the 
corrections are made in Postgres, typically as batch updates, I move them as 
a data frame into Python, and from Python I move them back into the texts.
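
For concreteness, the round trip looks roughly like the sketch below 
(a minimal sketch assuming lxml, psycopg2, and pandas; the element 
name, the table layout, and the connection string are simplified 
stand-ins rather than my actual setup):

    from lxml import etree
    import pandas as pd
    import psycopg2
    from psycopg2.extras import execute_values

    WINDOW = 5  # words of context on either side of the keyword

    def kwic_rows(path):
        """Yield (filename, left context, token, right context) per <w>."""
        tree = etree.parse(path)
        # Add a namespace map here if the corpus uses TEI namespaces.
        words = [(w.text or "").strip() for w in tree.iterfind(".//w")]
        for i, token in enumerate(words):
            left = " ".join(words[max(0, i - WINDOW):i])
            right = " ".join(words[i + 1:i + 1 + WINDOW])
            yield (path, left, token, right)

    conn = psycopg2.connect("dbname=corpus")  # stand-in connection string
    with conn, conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS kwic (
                id        serial PRIMARY KEY,
                filename  text,
                before    text,
                token     text,
                after     text,
                corrected text  -- filled in later by batch updates
            )""")
        execute_values(cur,
            "INSERT INTO kwic (filename, before, token, after) VALUES %s",
            list(kwic_rows("sample.xml")))

    # After batch updates in SQL, corrected rows come back as a data frame.
    df = pd.read_sql_query(
        "SELECT filename, token, corrected FROM kwic"
        " WHERE corrected IS NOT NULL", conn)
    conn.close()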

I do this on a recent Mac with 64 GB of memory and a 6-core i7 processor. I 
use Data Studio as an editing interface.

Unless a more recent version of Postgres has additional string-handling 
routines, or indexing routines that speed up working with tables with rows 
in the low millions, or other features that are likely to speed up 
operations, I don’t see any reason to upgrade.
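
By “indexing routines” I mean things like the trigram indexes in the 
pg_trgm contrib extension, which let wildcard searches over a token 
column use an index rather than a sequential scan. A sketch, again 
against the hypothetical kwic table from above:

    import psycopg2

    conn = psycopg2.connect("dbname=corpus")  # stand-in connection string
    with conn, conn.cursor() as cur:
        cur.execute("CREATE EXTENSION IF NOT EXISTS pg_trgm")
        # LIKE and regex searches can use a trigram GIN index whenever
        # the pattern contains at least three consecutive literal
        # characters.
        cur.execute("""CREATE INDEX IF NOT EXISTS kwic_token_trgm
                       ON kwic USING gin (token gin_trgm_ops)""")
        # e.g. hunting for tokens where the printer used 'vv' for 'w'
        cur.execute("SELECT count(*) FROM kwic WHERE token LIKE %s",
                    ("%vvor%",))
        print(cur.fetchone()[0])
    conn.close()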

I could imagine a table that has up to 40 million rows.  That would be pretty 
sluggish on my current equipment, which handles up to 10 million rows quite 
comfortably.

Am I right in thinking that, given my tasks and equipment, it would be a 
waste of time to upgrade? Or is there something I’m missing?

Martin Mueller
Professor emeritus of English and Classics
Northwestern University
