Re: [GENERAL] Recursive Arrays 101

Rob Sargent Mon, 26 Oct 2015 14:01:00 -0700

On 10/26/2015 02:51 PM, David Blomstrom wrote:

I'm focusing primarily on vertebrates at the moment, which have a total of (I think) about 60,000-70,000 rows for all taxons (species, families, etc.). My goal is to create a customized database that does a really good job of handling vertebrates first, manually adding a few key invertebrates and plants as needed.
I couldn't possibly repeat the process with invertebrates or plants, which are simply overwhelming. So, if I ever figure out the Catalogue of Life's database, then I'm simply going to modify its tables so they work with my system. My vertebrates database will override their vertebrate rows (except for any extra information they have to offer).
As for "hand-entry," I do almost all my work in spreadsheets. I spent a day or two copying scientific names from the Catalogue of Life into my spreadsheet. Common names and slugs (common names in a URL format) is a project that will probably take years. I might type a scientific name or common name into Google and see where it leads me. If a certain scientific name is associated with the common name "yellow birch," then its slug becomes yellow-birch. If two or more species are called yellow birch, then I enter yellow-birch in a different table ("Floaters"), which leads to a disambiguation page.
For organisms with two or more popular common names - well, I haven't really figured that out yet. I'll probably have to make an extra table for additional names. Catalogue of Life has common names in its database, but they all have upper case first letters - like American Beaver. That works fine for a page title but in regular text I need to make beaver lowercase without changing American. So I'm just starting from square one and recreating all the common names from scratch.
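
For what it's worth, the slug rule described above (one slug per common name, with shared names going to a "Floaters" table) could be sketched roughly like this in Python. The function names slugify and split_floaters are made up for illustration and are not part of the system described in this thread; the sketch also assumes plain ASCII common names.

```python
import re
from collections import defaultdict


def slugify(common_name):
    """Turn a common name such as 'yellow birch' into 'yellow-birch'."""
    # Lowercase, then collapse each run of characters outside a-z/0-9
    # into a single hyphen. Assumption: names are plain ASCII.
    return re.sub(r"[^a-z0-9]+", "-", common_name.lower()).strip("-")


def split_floaters(rows):
    """Split (scientific_name, common_name) rows into slugs and floaters.

    Returns (slugs, floaters):
      slugs    maps scientific name -> slug, for names used by one species
      floaters maps slug -> list of species sharing it (these would need a
               disambiguation page)
    """
    by_slug = defaultdict(list)
    for scientific_name, common_name in rows:
        by_slug[slugify(common_name)].append(scientific_name)

    slugs, floaters = {}, {}
    for slug, species in by_slug.items():
        if len(species) == 1:
            slugs[species[0]] = slug
        else:
            floaters[slug] = species
    return slugs, floaters
```

Run over the whole spreadsheet export, something like this would at least flag which common names collide before they are typed into the Floaters table by hand.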

Multiple names can be handled in at least two ways. One is a child table of species which has a species id and one alternate name per record; then you can get all the other names back by species id. Of course, going from an alternate name back to species may get you multiple species. Or, welcome to postgres' arrays-as-column: you can have one column, maybe called aliases, which is an array of strings.
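
A minimal sketch of the child-table option, written in Python against an in-memory SQLite database so it runs anywhere; SQLite is only a stand-in for Postgres here. The table names (species, species_alias), the column names and the helper functions are all assumptions for illustration.

```python
import sqlite3

# In-memory SQLite stands in for Postgres; all names below are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE species (
        id              INTEGER PRIMARY KEY,
        scientific_name TEXT NOT NULL UNIQUE
    );
    -- Child table: one row per (species, alternate name) pair.
    CREATE TABLE species_alias (
        species_id INTEGER NOT NULL REFERENCES species(id),
        alt_name   TEXT NOT NULL,
        PRIMARY KEY (species_id, alt_name)
    );
""")


def add_species(scientific_name, aliases):
    """Insert a species plus its alternate names; return the new species id."""
    cur = conn.execute(
        "INSERT INTO species (scientific_name) VALUES (?)", (scientific_name,)
    )
    species_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO species_alias (species_id, alt_name) VALUES (?, ?)",
        [(species_id, alias) for alias in aliases],
    )
    return species_id


def aliases_for(species_id):
    """All alternate names for one species, sorted alphabetically."""
    rows = conn.execute(
        "SELECT alt_name FROM species_alias WHERE species_id = ? ORDER BY alt_name",
        (species_id,),
    )
    return [row[0] for row in rows]


def species_for(alt_name):
    """Every species carrying this alternate name (there may be several)."""
    rows = conn.execute(
        """SELECT s.scientific_name
             FROM species s
             JOIN species_alias a ON a.species_id = s.id
            WHERE a.alt_name = ?
            ORDER BY s.scientific_name""",
        (alt_name,),
    )
    return [row[0] for row in rows]
```

Note that species_for returns a list, since one alternate name can belong to more than one species. With the array option in Postgres, the child table would instead be a column such as aliases text[] on the species table, and the reverse lookup would be a query along the lines of WHERE 'some name' = ANY(aliases).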

It gets still more complicated when you get into "specialist names." ;) But the system I've set up so far seems to be working pretty nicely.

On Mon, Oct 26, 2015 at 1:41 PM, Rob Sargent wrote:


    On 10/26/2015 02:29 PM, David Blomstrom wrote:

        Sorry for the late response. I don't have Internet access at
        home, so I only post from the library or a WiFi cafe.

        Anyway, where do I begin?

        Regarding my "usage patterns," I use spreadsheets (Apple's
        Numbers program) to organize data. I then save it as a CSV
        file and import it into a database table. It would be very
        hard to break with that tradition, because I don't know of any
        other way to organize my data.

        On the other hand, I have a column (Rank) that identifies
        different taxonomic levels (kingdom, class, etc.). So I can
        easily sort a table into specific taxonomic levels and save
        one level at a time for a database table.

        There is one problem, though. I can easily put all the
        vertebrate orders and even families into a table. But genera
        might be harder, and species probably won't work; there are
        simply too many. My spreadsheet program is almost overwhelmed
        by fish species alone. The only solution would be if I could
        import Mammals.csv, then import Birds.csv, Reptiles.csv, etc.
        But that might be kind of tedious, especially if I have to
        make multiple updates.

    Yes, I suspect your spreadsheet will be limited in rows, but of
    course you can send all the spreadsheets to a single table in the
    database, if that's what you want.  You don't have to, but you see
    mention of tables with millions of records routinely.  On the other
    hand, if performance becomes an issue with the single-table
    approach you might want to look at "partitioning".  But I would be
    surprised if you had to go there.
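
To make the "all the spreadsheets to a single table" idea concrete, here is a rough Python sketch that appends several CSV exports (Mammals.csv, Birds.csv, and so on) to one table. It uses SQLite so it can run without a server; against Postgres the COPY command would do the bulk load. The table name taxon, the CSV headers (scientific_name, taxon_rank, parent) and the function name load_csv_files are assumptions, not the schema from this thread.

```python
import csv
import sqlite3


def load_csv_files(conn, paths):
    """Append every CSV file in paths to one taxon table; return rows read."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS taxon (
            scientific_name TEXT PRIMARY KEY,
            taxon_rank      TEXT NOT NULL,
            parent          TEXT
        )
    """)
    total = 0
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            # An empty parent cell (e.g. for a top-level taxon) becomes NULL.
            rows = [
                (r["scientific_name"], r["taxon_rank"], r["parent"] or None)
                for r in reader
            ]
        # INSERT OR REPLACE lets a re-exported spreadsheet overwrite the
        # rows it loaded earlier instead of failing on the primary key.
        conn.executemany(
            "INSERT OR REPLACE INTO taxon (scientific_name, taxon_rank, parent)"
            " VALUES (?, ?, ?)",
            rows,
        )
        total += len(rows)
    conn.commit()
    return total
```

Because rows are keyed on the scientific name, re-importing an updated Mammals.csv replaces the old mammal rows and leaves the birds and reptiles alone, which should take some of the tedium out of repeated updates.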

    What is your data source?  How much hand-entry are you doing?
    There are tools which (seriously) upgrade the basic 'COPY into
    <table>' command.


        As for "attributes," I'll post my table's schema, with a
        description, next.





--
David Blomstrom
Writer & Web Designer (Mac, M$ & Linux)
www.geobop.org <http://www.geobop.org>
