Here's a summary of the many responses I got. Looks like Access is
the most common suggestion, but it sounds like Excel can be tweaked
to do a lot of what we are looking for. And the Specify software
sounds like an interesting option.
David
On Fri, Dec 7, 2012 at 4:43 PM, David Inouye
<[email protected]> wrote:
I'm collaborating with colleagues from Dartmouth College on an
ongoing long-term study combining my 40 years of data on the
phenology and abundance of flowering by about 120 wildflower species
near the Rocky Mountain Biological Laboratory with three years of
data on the phenology and abundance of bees (derived from a
combination of pan traps, netting, and observation). Data for the
flowers are stored in a combination of Excel spreadsheets (one for
each plot/year) and a (now very large) R matrix with all the data.
Data for the bees are also in Excel files, but with over 12,000
entries after three years we're starting to think it would be
preferable to have database software for the bees with:
-pull-down menus to reduce data entry errors
-ability to use the same information for multiple specimens (batch entry)
-ability to replace temporary names (sp. X, sp. Y) with the species
name once specimens are identified
-ability to query the system
One option is to get some help writing custom software for this
project, and another would be to use some existing database software.
If you have recommendations, I'd be glad to hear them.
***********************************
We have thousands of Triatominae (Chagas disease vectors) with lots
of associated data, and for many of the bugs we combine it with
household socioeconomic surveys of where the bugs were collected.
We are just starting to use Specify 6 for organizing the data. It's
designed for museum collections (supported by NSF funding), is free,
can upload data from Excel sheets, and can cross-reference the bug
and house data. It will do all the things you list below.
http://specifysoftware.org/
**********************
MS Access can do all of those things and there are classes that you
can take (even at places like Office Max) that will allow you to
design a database to suit your needs. Access is incredibly
powerful--you just have to have someone that knows how to use it
properly. Good luck! -Christa
**********************
I can highly recommend the research database project REDCap
(http://project-redcap.org/), initiated and maintained by Vanderbilt
University. It can do all the things you mentioned - and many more
besides. Individual data capture projects are highly customizable,
data are securely stored online, and projects can easily be shared
between different institutions and sites with variable data access
rights. As far as I know, it is also free to use.
I have only worked with medical/psychiatric data on this so far at
one of Vanderbilt's existing institutional partners, so I don't know
how easy it is for individuals to set this up at their own
institutions (I'm not the official REDCap administrator there). For
most ecological projects, especially at smaller universities and
colleges, this may not be useful, which is why I didn't mention it to
the whole listserv. But for larger, long-term, and multi-site studies
I think this would be a tremendous asset to have. If you think it
would be useful information for the whole listserv, please feel free
to share it with everyone or let me know and I can re-post it to all.
I am working with several research projects that use REDCap at the
moment, just in case you have any questions about how to use it or
what it can do.
**********************
We have custom-built databases for data entry and data access for
researchers working with long-term data collected in animal
populations (meerkats, red deer, Soay sheep), built by Sunadal Data
Solutions. The database is in Access, seems to work reasonably well
for field technicians entering the data, and also works well for us
when retrieving the data using queries. I quite like it, even though
I would prefer not to work with Access.
Website:
http://www.sunadal.co.uk/
******************************
I believe that Microsoft Access would have all the characteristics
that you're looking for. We use it for similar data entry needs at
the Idaho Department of Fish and Game research office.
*****************************
I highly recommend MS Access. Not only can it handle Excel
spreadsheets well, but it also has much greater capacity (storage,
technical capabilities, queries, joins, reports, etc.).
The only drawback is the learning curve; however, there are YouTube
tutorials available.
*********************************
This may not be exactly what you want (and perhaps you're already
aware of it), but Excel does have the capability of generating
drop-down menus. You can write formulas corresponding to those
drop-down menus to "autofill" various cells based on the selection.
Querying may not be as effective in Excel as it is in Access, but I
suppose it depends on what you're interested in doing with the data.
By using filters, pivot tables, and the like, I've been able to use
Excel for almost everything I do, including some rather sophisticated
data analysis, without needing to get involved with Access or other
database software. I find this useful, as most of my colleagues are
familiar with Excel and can follow formulas to understand my logic.
*********************************
The easiest approach would be to push the data into a web-based open
source database (e.g. MySQL) and set up a simple interface (i.e. a
website) where you can query/edit the data or even upload/download
data. For this you'd need a web server. Best is the LAMP stack
(Linux, Apache, MySQL, PHP). Then, with the data available from the
web (i.e. anywhere), you have more flexibility, security, and
redundancy.
We've done similar things for some clients (on a not-for-profit
base). Let me know if you have more questions.
************************************
Interesting question you've posed to the list, and one I found myself
asking almost two years ago when I began a similar project. My
research aims to identify environmental drivers of plankton phenology
for about 50 plankton species from a 23-year long-term plankton time
series in the Gulf of Naples. There are 344 species of phytoplankton
and 111 species of zooplankton. I typically work with species lists
by date and abundances by date as input files, and then I manipulate
the matrices in R. I am still taking the extra step of cleaning data
files rather than using a query-based approach to extract from a
master database.
When I began this project (I am 2 years into a 3-year fellowship), I
considered using MySQL for its compatibility with Microsoft systems,
etc. It can be queried directly from R and also allows export back
into Excel and R file formats. Unfortunately, I opted not to go this
route because I knew the setup would be a lot of work (no formal
database exists at the moment), and I realized it would not be used
after my project was completed.
Therefore, if you can project long into the future, i.e. that your
data will need to be accessed for similar work, especially of a
time-series nature, then I definitely suggest using a SQL-type
approach. Don't pay for commercial software; it's certainly available
on open platforms.
And with that I will close, saying that my expertise does not lie in
MySQL at the moment, so I'm sure you'll receive better advice.
However, as someone using the same tools and trying to make them work
seamlessly and efficiently, my experience is that a query-based
approach would have helped from day 1.
********************************
I'm working on a bass dispersal study on Lake Champlain following
catch and release tournaments. We have two years of data on several
thousand fish. I set up the data in a relational database using
Microsoft Access. Each fish has original capture and recapture
locations. During data entry we control for data quality by using
drop-down menus for location. Further controls include limits on fish
length and weight, keeping them within reasonable (and legal) limits
for tournament-caught black bass on Lake Champlain (i.e. no fish
under 300 mm). The relational database allows me to assign each location a
latitude, longitude, average depth, and distance from the tournament
weigh-in. There's a bit of a learning curve with Access, but the
functionality and customization allowed are worth the effort. I've
attached a few screen shots of our database. Feel free to email me if
you have further questions on database set up!
*************************************
You absolutely should be using a database for this. We did similar
work in the 90's on Great Lakes phytoplankton and zooplankton. We
used a product called INFO, which eventually morphed into the
underlying database for ArcGIS. You could use other relational
databases such as Access, Oracle, or MySQL, among many others.
If you would like additional assistance, please feel free to contact
me. I have included a couple of references for examples of the use of
relational databases to analyze large data sets.
Chandler-Misener Award for the best article in the Journal of Great
Lakes Research in 2000-2001: Makarewicz, J.C., P.E. Bertram and
T.W. Lewis. 2000. Chemistry of the offshore surface waters of Lake
Erie: pre- and post-Dreissena introduction (1983-1993). J. Great
Lakes Res. 26:82-93. http://www.iaglr.org/as/cm.php
Makarewicz, J.C., P.E. Bertram and T.W. Lewis. 1991. Phytoplankton
and zooplankton composition, abundance and distribution and trophic
interactions: offshore region of Lakes Erie, Lake Huron and Lake
Michigan, 1985. U.S.E.P.A. Great Lakes National Program Office, Chicago, IL.
*************************************
I am a former database manager. You can do all of these things simply
by switching from Excel to an Access relational database - this is a
Microsoft product that imports and exports Excel, easing re-import
into a stats program. However, any commercially available relational
database will suit; there are pros and cons to each. Access may
already be part of your MS Office suite. If you have no staff already
familiar with creating a relational database with these features, I'm
certain a graduate student or postdoc could take a short class for a
nominal fee and quickly be able to set you up. In my experience,
Access can easily handle well in excess of 65K individual records - I
don't recall the limit off the top of my head, but a quick Google
search could give you that information.
**************************************
I've been collecting seawater and sediment quality data along with
benthic invertebrate and plankton community data for over 15 years in
my study area. We've compiled these data into an Access database.
With a little bit of simple programming, you can develop data entry
and reporting tables. The relational structure of the database allows
one to link between tables and develop queries.
One thing that I've found useful is the inclusion of a "species"
table. I include a species code, and the taxonomic hierarchy to
species level. The code is used in all data tables where I reference
species. I have many unidentified species. When I obtain
identification of previously unidentified species, all I need to do
is to update one cell in the species table, and all subsequent
queries will provide the correct name. This makes it much easier to
identify/correct species names or spelling.
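The species-code pattern described above is easy to sketch in any
relational database. Here is a minimal illustration using Python's
built-in sqlite3 module; the table and column names (and the bee
species) are invented for the example, not taken from any actual
database:

```python
import sqlite3

# In-memory database for illustration; a real project would use a
# file or a server-based RDBMS such as Access or MySQL.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE species (
        code TEXT PRIMARY KEY,
        name TEXT NOT NULL      -- temporary name until identified
    );
    CREATE TABLE observations (
        obs_id INTEGER PRIMARY KEY,
        species_code TEXT REFERENCES species(code),
        count INTEGER
    );
""")
con.executemany("INSERT INTO species VALUES (?, ?)",
                [("BEE001", "sp. X"), ("BEE002", "Bombus bifarius")])
con.executemany(
    "INSERT INTO observations(species_code, count) VALUES (?, ?)",
    [("BEE001", 4), ("BEE001", 7), ("BEE002", 2)])

# Once "sp. X" is identified, update ONE cell in the species table...
con.execute("UPDATE species SET name = ? WHERE code = ?",
            ("Bombus flavifrons", "BEE001"))

# ...and every subsequent query reports the correct name.
for name, total in con.execute("""
        SELECT s.name, SUM(o.count)
        FROM observations o JOIN species s ON o.species_code = s.code
        GROUP BY s.name ORDER BY s.name"""):
    print(name, total)
```

Because every data table stores only the code, renaming a species (or
fixing a spelling) touches exactly one row, just as described above.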
*****************************************
Have you looked into Access 2010? I'm currently building a database,
and I find Access very user-friendly. I've even started to do a
little bit of VBA coding, and it allows me to do pretty much anything
I can think of. It's a very flexible program, and it communicates
well with Excel. I've also heard you can use it easily with R,
although I haven't tried that yet.
I didn't know anything about databases when I started, so I took an
online course from this website:
http://www.simonsezit.com/courses/microsoft/learn_microsoft_access_2010_training_tutorials/
They also offer an Advanced course and it only costs $25 a month for
as many courses as you like (you could complete both courses within a
month if you have the time). I'm midway through the Advanced course,
and I'm pretty sure I could do all of the things you mentioned in
your wishlist.
Anyway, Access is probably your simplest answer, and you know what
they say about simple answers!
************************************
Have you considered using Excel, with its familiar and fairly
powerful user interface, as a front end to a relational database such
as Access, MySQL, or Postgres? The VBA language (Visual Basic for
Applications) that is built into Excel can serve as the bridge to the
database. I should think this would be simplest with Access, because
both are Microsoft products and presumably play well together. On the
other hand, MySQL and Postgres are both very powerful and are free.
One nice aspect of this approach is that the relational database
would be directly accessible via SQL for constructing complex
queries, or for importing or exporting data more readily than might
be possible from the Excel interface.
*************************************
I assume your data have a strong spatial component, and therefore I
would recommend you consider an ArcGIS database, or a database engine
that understands spatial data and connects well with ArcGIS or
another GIS application. If you have a little money for programming,
Python would be a good choice for building your forms.
Also, FWIW: a good logical analysis of your data and their
interrelationships would be a better place to start than looking for
database software. Most people build a conceptual model in their
head, then skip directly to a physical model implemented in Excel
spreadsheets. When the resulting multiple Excel spreadsheets are
discovered to be inadequate, they start looking for database
software, again skipping the logical analysis. The result is often
disappointment in the selected database, and three or four iterations
through the physical design process. The Wikipedia article on the
Logical Data Model
(http://en.wikipedia.org/wiki/Logical_data_model) is not too bad.
***************************************
You could probably do this fairly easily with Microsoft Access (part
of the standard MS Office suite). It's basically a pumped-up version
of Excel, with the ability to write queries, link data tables, create
forms, etc. The functionality you want could easily be built as data
entry forms, and it already has a "query wizard" built in.
I currently work as a database manager/programmer for the US EPA, but
I also do some freelance work on the side. If you're looking to hire
someone to set up the Access databases and forms, I would be happy to
help -- just provide me with a little more information, and I could
give you a quote. I would be happy to provide some samples so you
could get an idea of the look and feel of an Access database, and
what can be done with it. As it is, your project sounds like it
should only take a few hours.
*******************************************
If he were local, I'd get my Spring class to build him a (custom)
HiPerCiC application, but we're still getting established on campus,
and it's too early to try to do something elsewhere.
But at least some customization would be desirable, since he needs
data entry menus, reuse of entry fields, etc. There may be good
integrated systems for everything from reliable and convenient data
entry to data transformation (such as substituting species names or
other info that is unknown at first) to data retrieval (I wonder what
kinds of queries he wants to do), but I don't track that kind of
thing. If he can find the right disciplinary package, all the better.
Otherwise, he might do well to think of a two-stage process: data
entry and transformation into a repository (perhaps a SQL database or
an R table), then further transformation, analysis, and retrieval
from that repository. Google Forms, and I'm guessing new features in
the latest Excel, might make it relatively easy for a graduate
student to build something for data entry and initial transformation.
The choice of repository technology depends on what types of analysis
and retrieval he wants. If it's mostly statistical analysis,
something like R would fit well; if he's looking for sophisticated
retrieval, involving accessing and operating on the data according to
logical combinations of its properties, an SQL database might be a
great fit.
****************************
Hi David. This might not be the direction you want to go, but I work
with a group of database developers. I had been told by Access
developers and other database developers that the data I had were
impossible (too big) to handle, but the guys I now work with on these
issues did it with no problem. They're based in Louisiana and I am in
Florida, so we do this remotely. They are very good. Every time I
start a new project, I work with them from the start to develop a
database for my needs. They build in querying, etc. - whatever you
need - and can export to any packages you'd like: SAS, R, Matlab.
Their products always look nice and are super user-friendly. I say
this might not be the direction you want to go because it's not
software you buy and start using -- they develop products to fit your
needs, so it costs.
If you are interested, you can contact Craig Conzelmann
([email protected]).
************************
Does your current Excel implementation use VBA coding? It is quite a
powerful programming language (even though it has annoying
idiosyncrasies). You can make Excel sit up and beg, so to speak. I
once wrote a 16,000-line application in Excel VBA that presented a
very detailed user interface and did major calculations. Among the
other tools it provides, you can make popup input boxes with
drop-down lists, input slots, check boxes, radio buttons, etc. These
can be pre-populated when the data are available, and can collect new
data (with error checking) more conveniently than typing directly
into spreadsheet cells. The application can be entirely
self-contained within a single spreadsheet, or can talk to multiple
sheets in the same workbook, to other spreadsheets on the same
computer (or, for all I know, over a network), and of course to an
outside relational database.
Another thing you should consider is using the free open-source
equivalents in OpenOffice. There you can get very high functionality
without being tied to a possibly capricious vendor.
******************************
This is a bit of a non-answer to your question, but you can do a lot
of what you describe below in Excel.
-pull-down menus to reduce data entry errors
I can't think of any way to do this directly in Excel, but you could
set up a second column that checks whether the column you are
entering into has predefined names, using nested IF functions. See
the attached Excel example. If you have more than about 6 nestings,
you need to start another column. Sometimes, once the formula has
been used, it's helpful to copy it and paste-special it as values to
keep it from using up a lot of memory (I usually keep a copy of the
formula in a comment in the cell header).
-ability to use the same information for multiple specimens (batch entry)
Copy and paste (Ctrl+C and Ctrl+V) works well for some of this. You
can also sort by whatever feature needs to be done in batches to make
it easier (if you have a particular order that needs to be retained,
just add a sort-order column that you can sort by to restore your
original order).
-ability to replace temporary names (sp. X, sp. Y) with the species
name once specimens are identified
Find and replace. In the sample Excel sheet, if you hit Ctrl+H, you
can find something and replace it. This works for correcting typos as
well. You can also recode data this way (e.g., find all instances of
Apis mellifera).
-ability to query the system
Pivot tables are awesome for this! Under the Insert tab, there is a
pivot table option (Insert menu in older versions of Excel).
Highlight the data you want, and have it create a pivot table in a
new sheet. You can then manipulate the data in a variety of ways. It
is really easy to get summary statistics (in the sample Excel
worksheet, I've got it set up to compute the average of counts by bee
species). Note that if you needed different counts by year, you could
set the years up as columns. Where Excel beats Access (to my limited
knowledge of Access) is when you need the data by year and by
something else: you can do that in Excel, whereas I could only get
one grouping to work in a crosstab query in Access (but my knowledge
of Access is limited, as I find it less intuitive and far more
restrictive than Excel).
The pivot table is also a helpful way to find typos in the data
entry. There are a lot of options for the pivot table, and some of
them take some finding (when in doubt, right-click; sometimes knowing
what to right-click is a challenge).
You can also join data from different Excel workbooks using VLOOKUP functions.
I think I have tutorial notes on pivot tables and on lookup functions
that I used for an informal brownbag lying around somewhere; I can
look for them if this is of interest to you.
So, not database software, but tips on how to make Excel function
more as a database. I'd be happy to provide more input on the subject
if desired.
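For anyone working outside Excel, the pivot-style summary described
above (average counts by bee species, then totals by species and
year) can be sketched in a few lines of Python using only the
standard library. The records and field layout here are hypothetical,
purely for illustration:

```python
from collections import defaultdict

# Hypothetical rows as they might come off a bee data sheet:
# (species, year, count)
rows = [
    ("sp. X", 2010, 4),
    ("sp. X", 2011, 7),
    ("Bombus bifarius", 2010, 2),
    ("Bombus bifarius", 2011, 6),
]

# Pivot 1: average count per species (what the pivot table computes).
counts = defaultdict(list)
for species, year, count in rows:
    counts[species].append(count)
averages = {sp: sum(c) / len(c) for sp, c in counts.items()}
print(averages)   # {'sp. X': 5.5, 'Bombus bifarius': 4.0}

# Pivot 2: "by year AND by something else" - group on a compound key.
by_species_year = defaultdict(int)
for species, year, count in rows:
    by_species_year[(species, year)] += count
```

Grouping on a compound key like `(species, year)` is the same idea as
putting years in the pivot table's columns.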
*******************************
A colleague of mine has forwarded your message to ECOLOG-L to me,
because we are working on the type of software you mentioned.
You can take a closer look at the software's features at
http://www.earthcape.com.
We are currently just starting up and working with selected projects
(e.g. http://www.safeproject.net) to build their databases and bring
them online.
If you would like to consider working with us, I would be happy to
answer any questions and share more details on our products and
services.
Looking forward to hearing from you,
http://www.earthcape.com
http://www.twitter.com/eartcape
http://www.facebook.com/earthcape
************************************
It sounds like MS Access might meet your needs. It is fairly
accessible (it comes with MS Office packages), so your cost would be
low (hopefully none). With some VBA (Visual Basic) you can customize
your front end (forms for data entry and queries) to a great extent.
Being an MS product, it interfaces easily with Excel, should feel
familiar, and there are R packages to move data in and out as well.
I would caution that Access isn't ideal for all databases; you'll
find database managers who hate it. In my limited experience this is
mainly when you have more advanced needs or want to use something
else for the front end, like a website that can be queried
dynamically, or spatial data better displayed in ArcMap.
Other things you might want to look into (open source):
MySQL
PostgreSQL
These, along with Access, are used widely, and there is a strong DB
community and good resources for each.
***********************************
David, Excel and R can do all the functions you require. As an
ecological analyst, my main wish would be that the data are kept in a
single spreadsheet, which decreases data management time a bit by
lessening the chance that the spreadsheets are differentially
formatted. Also, the number of records is not large compared to some
ecological files I have worked with.
Without fully understanding your concerns, I would recommend
restricting variable values in Excel for data entry. The batch entry
is somewhat context-dependent - you could use either R or Excel to
fill in records; I would probably use R. Use Excel for find/replace.
For querying, use either pivot tables in Excel, or embark on the
journey of R via Hadley Wickham's plyr package to rearrange data to
your heart's content. Best wishes, and thanks for tending the
listserv,
*********************************
It sounds like you do need a database! For other suggestions see an
earlier similar discussion, but if you're prepared to pay, FileMaker
Pro will do everything you want, is very easy to set up and use, and
is fully compatible with Excel import/export if required.
For the batch entry you'd probably have to enter the records first
and then fill in the common values afterwards (trivial to do, and you
could set up a button to automate it if required). The other features
you need are inherent to FM.
************************************
Did you get a response to your inquiry yet? It is not a complicated
task to structure a relational database to hold these data. As part of
that process one could create some queries and export functions to
structure the data for other applications like R. You might even think
of it as four separate tasks: 1) designing the database structure; 2)
building the database using particular relational database software
(MS Access, Oracle, etc.); 3) populating the database with your data;
and 4) developing additional tools within the database to extract
standard information about your data. There will always be a fifth
step, though, of developing queries to extract and format data for
specific questions.
The initial design is very important for ease of extracting data
later.
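As a rough illustration of steps 1 through 3 (plus a step-5 query),
here is a minimal sketch using Python's built-in sqlite3 module. The
schema, table names, and sample values are invented for the example
and would of course be replaced by a design matched to the real data:

```python
import sqlite3

con = sqlite3.connect(":memory:")  # step 2: build (in-memory demo)

# Step 1: design - one table per entity, linked by keys
# (hypothetical schema).
con.executescript("""
    CREATE TABLE plots (plot_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE captures (
        capture_id INTEGER PRIMARY KEY,
        plot_id INTEGER REFERENCES plots(plot_id),
        year INTEGER,
        n_bees INTEGER
    );
""")

# Step 3: populate (e.g. bulk-loaded from the existing Excel sheets).
con.execute("INSERT INTO plots VALUES (1, 'Meadow A')")
con.executemany(
    "INSERT INTO captures(plot_id, year, n_bees) VALUES (?, ?, ?)",
    [(1, 2010, 120), (1, 2011, 95)])

# Step 5: an ad-hoc query, flat enough to export to R or elsewhere.
result = con.execute("""
    SELECT p.name, c.year, c.n_bees
    FROM captures c JOIN plots p USING (plot_id)
    ORDER BY c.year
""").fetchall()
```

Getting the table design (step 1) right up front is what makes the
later queries this simple.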
***************************************
Specify 6 looks pretty good. I like that it's open source and already
set up for geographic data.
The limits on Access are on the number of objects, not the number of
entries. I've worked with tables with hundreds of thousands of
records with no problems. That said, if the support and tutorials for
Specify 6 look better, it seems like a good option.
Thanks for replying. I had not heard of Specify 6 and will be checking it out.