Here's a summary of the many responses I got. Looks like Access is the most common suggestion, but it sounds like Excel can be tweaked to do a lot of what we are looking for. And the Specify software sounds like an interesting option.

David

On Fri, Dec 7, 2012 at 4:43 PM, David Inouye <[email protected]> wrote: I'm collaborating with colleagues from Dartmouth College on an ongoing long-term study combining my 40 years of data on the phenology and abundance of flowering by about 120 wildflower species near the Rocky Mountain Biological Laboratory with three years of data on phenology and abundance of bees (derived from a combination of pan traps, netting, and observation). Data for the flowers are stored in a combination of Excel spreadsheets (one for each plot/year) and a (now very large) R matrix with all the data. Data for the bees are also in Excel files, but with over 12,000 entries after three years we're starting to think it would be preferable to have database software for the bees with:
-pull-down menus to reduce data entry errors
-ability to use the same information for multiple specimens (batch entry)
-ability to replace temporary names (sp. X, sp. Y) with the species name once specimens are identified
-ability to query the system
One option is to get some help writing custom software for this project, and another would be to use some existing database software. If you have recommendations, I'd be glad to hear them.

***********************************
We have thousands of Triatominae (Chagas disease vectors) with lots of associated data, and for many of the bugs we combine it with household socioeconomic surveys of where the bugs were collected.

We are just starting to use Specify 6 for organizing the data. It's designed for museum collections (supported by NSF funding), is free, can upload data from Excel sheets, and can cross-reference the bug and house data. It will do all the things you list below.

http://specifysoftware.org/


**********************

MS Access can do all of those things, and there are classes you can take (even at places like Office Max) that will allow you to design a database to suit your needs. Access is incredibly powerful--you just have to have someone who knows how to use it properly. Good luck! -Christa

**********************

I can highly recommend the research database project REDCap (http://project-redcap.org/), initiated and maintained by Vanderbilt University. It can do all the things you mentioned - and many more besides. Individual data capture projects are highly customizable, data are securely stored online, and projects can easily be shared between different institutions and sites with variable data access rights. As far as I know, it is also free to use.

I have only worked with medical/psychiatric data on this so far at one of Vanderbilt's existing institutional partners, so I don't know how easy it is for individuals to set this up at their own institutions (I'm not the official REDCap administrator there). For most ecological projects, especially at smaller universities and colleges, this may not be useful, which is why I didn't mention it to the whole listserv. But for larger, long-term, and multi-site studies I think this would be a tremendous asset to have. If you think it would be useful information for the whole listserv, please feel free to share it with everyone or let me know and I can re-post it to all.

I am working with several research projects that use REDCap at the moment, just in case you have any questions about how to use it or what it can do.

**********************
We have custom-built databases, developed by Sunadal Data Solutions, for data entry and data access for researchers working with long-term data collected in animal populations (meerkats, red deer, Soay sheep). The database is in Access, seems to work reasonably well for field technicians entering the data, and also works well for us when retrieving the data using queries. I quite like it, even though I would generally prefer not to work with Access.

Website:
http://www.sunadal.co.uk/

******************************

I believe that Microsoft Access would have all the characteristics that you're looking for. We use it for similar data entry needs at the Idaho Department of Fish and Game research office.

*****************************
I highly recommend MS Access. Not only can it handle Excel spreadsheets well, but it also has much greater capacity (storage, technical capabilities, queries, joins, reports, etc.). The only drawback is that it has a learning curve; however, there are YouTube tutorials available.

*********************************
This may not be exactly what you want (and perhaps you're already aware of this), but Excel does have the capability to generate drop-down menus. You can write formulas corresponding to those drop-down menus to have it "autofill" various cells based on the selection of drop-down menus. Querying may not be as effective in Excel as it is in Access, but I suppose it depends what you're interested in doing with the data. By using filters, pivot tables, and the like, I've been able to use Excel for most everything I do, including some rather sophisticated data analysis, without needing to get involved in Access or other database software. I find this useful as most of my colleagues are familiar with Excel and can follow formulas to understand my logic.

*********************************
The easiest would be to push the data into a web-based open-source database (e.g. MySQL) and set up a simple interface (i.e. a website) where you can query/edit the data or even upload/download data. For this you'd need a web server. Best is the LAMP architecture (Linux, Apache, MySQL, PHP). Then, with the data being available from the web (i.e. anywhere), you have more flexibility, security, and redundancy. We've done similar things for some clients (on a not-for-profit basis). Let me know if you have more questions.

************************************
Interesting question you've posed to the list, and one which I found myself asking almost 2 years ago when I began a similar project. My research aims to identify environmental drivers of plankton phenology for about 50 plankton species from a 23-year plankton time series in the Gulf of Naples. There are 344 species of phytoplankton and 111 species of zooplankton. I typically work with species lists by dates and abundance by dates as input files, and then I manipulate the matrices in R. I am still taking the extra step of cleaning data files rather than using a query-based approach to extract from a master database.

When I began this project (I am 2 years into a 3-year fellowship) I considered using MySQL for its compatibility with Microsoft systems, etc. It can be queried directly from R and also allows export back into Excel and R file formats. Unfortunately, I opted not to go this route because I knew the set-up would be a lot of work (no formal database exists at the moment) and I realized it would not be utilized after my project was completed.

Therefore, if you can project long into the future, i.e. that your data will need to be accessed for similar work, especially of a time-series nature, then I definitely suggest using an SQL-type approach. Do not pay for any commercial software; the same functionality is certainly available on open platforms.

And with that, I will close and say my expertise does not lie with the use of MySQL at the moment, so I'm sure you'll receive better advice. However, as someone using the same tools and trying to make them work seamlessly and efficiently, my experience is that a query-based approach would have helped from day 1.

********************************

I'm working on a bass dispersal study on Lake Champlain following catch and release tournaments. We have two years of data on several thousand fish. I set up the data in a relational database using Microsoft Access. Each fish has original capture and recapture locations. During data entry we control for data quality by using drop-down menus for location. Further controls include limits on fish length and weight to be within reasonable (and legal) limits for tournament-caught black bass on Lake Champlain (i.e. no fish under 300mm). The relational database allows me to assign each location a latitude, longitude, average depth, and distance from the tournament weigh-in. There's a bit of a learning curve with Access, but the functionality and customization allowed are worth the effort. I've attached a few screen shots of our database. Feel free to email me if you have further questions on database set-up!
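To sketch the data-quality idea in portable terms: the same pattern can be expressed in any relational database as a lookup table plus a CHECK constraint. The example below is a hypothetical sketch in Python with SQLite (table names, columns, and coordinates are invented; this is not the actual Access schema described above):

```python
import sqlite3

# Hypothetical sketch: a lookup table for locations plus a CHECK
# constraint that rejects out-of-range fish lengths at entry time.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE location (
    loc_id    INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    latitude  REAL,
    longitude REAL
);
CREATE TABLE fish (
    fish_id   INTEGER PRIMARY KEY,
    loc_id    INTEGER NOT NULL REFERENCES location(loc_id),
    length_mm REAL CHECK (length_mm >= 300)  -- no fish under 300 mm
);
""")
con.execute("INSERT INTO location VALUES (1, 'Weigh-in site', 44.5, -73.3)")
con.execute("INSERT INTO fish VALUES (1, 1, 410)")      # accepted
try:
    con.execute("INSERT INTO fish VALUES (2, 1, 250)")  # rejected by CHECK
except sqlite3.IntegrityError:
    print("row rejected")
```

In Access the same effect comes from a validation rule on the field; the point is that the limit lives in the database, not in the data-entry person's head.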

*************************************
You absolutely should be using a database for this. We did similar work in the '90s on Great Lakes phytoplankton and zooplankton. We used a product called INFO, which eventually morphed into the base database for ArcGIS. You could use other relational databases such as Access, Oracle, and MySQL, among many others.

If you would like additional assistance, please feel free to contact me. I have included a couple of references for examples of the use of relational databases to analyze large data sets.

Chandler-Misener Award for the best article in the Journal of Great Lakes Research in 2000-2001 for Makarewicz, J.C., P.E. Bertram and T.W. Lewis. 2000. Chemistry of the offshore surface waters of Lake Erie: Pre- and post-Dreissena introduction (1983-1993). J. Great Lakes Res. 26:82-93. http://www.iaglr.org/as/cm.php

Makarewicz, J.C., P.E. Bertram and T.W. Lewis. 1991. Phytoplankton and zooplankton composition, abundance and distribution and trophic interactions: offshore region of Lakes Erie, Lake Huron and Lake Michigan, 1985. U.S.E.P.A. Great Lakes National Program Office, Chicago, IL.

*************************************

I am a former database manager. You can do all of these things simply by switching from Excel to an Access relational database - this is a Microsoft product that imports and exports Excel files to ease your re-importation into a stats program. However, any commercially available relational database will suit - there are pros and cons to each. It may already be part of your MS Office suite. If you have no staff already familiar with creating a relational database with these features for you, I'm certain a graduate student or postdoc could take a short class for a nominal fee and quickly be able to set you up. In my experience Access can easily handle in excess of 65K individual records - I don't recall the limit off the top of my head, but a quick Google search could give you that information.

**************************************
I've been collecting seawater and sediment quality data along with benthic invertebrate and plankton community data for over 15 years in my study area. We've compiled these data into an Access database. With a little bit of simple programming, you can develop data entry and reporting tables. The relational structure of the database allows one to link between tables and develop queries.

One thing that I've found useful is the inclusion of a "species" table. I include a species code, and the taxonomic hierarchy to species level. The code is used in all data tables where I reference species. I have many unidentified species. When I obtain identification of previously unidentified species, all I need to do is to update one cell in the species table, and all subsequent queries will provide the correct name. This makes it much easier to identify/correct species names or spelling.
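The species-table idea above can be sketched in a few lines of SQL. In this hypothetical example (table, column, and code names are invented for illustration), all data rows carry only a species code; updating one cell in the species table changes what every downstream query reports:

```python
import sqlite3

# Minimal sketch of the "species table" pattern: records reference a
# code, and the code-to-name mapping lives in exactly one place.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE species (code TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE record  (rec_id INTEGER PRIMARY KEY,
                      code   TEXT REFERENCES species(code),
                      n      INTEGER);
""")
con.executemany("INSERT INTO species VALUES (?, ?)",
                [("BEE01", "sp. X"), ("BEE02", "Bombus bifarius")])
con.executemany("INSERT INTO record VALUES (?, ?, ?)",
                [(1, "BEE01", 4), (2, "BEE01", 7), (3, "BEE02", 2)])

# Once "sp. X" is identified, update ONE cell in the species table...
con.execute("UPDATE species SET name = 'Osmia lignaria' WHERE code = 'BEE01'")

# ...and every query that joins on the code now reports the real name.
for name, total in con.execute(
        "SELECT s.name, SUM(r.n) FROM record r "
        "JOIN species s USING (code) GROUP BY s.code"):
    print(name, total)
```

This is exactly the "replace temporary names once specimens are identified" feature from the original wishlist, done with a join instead of a find-and-replace.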

*****************************************
Have you looked into Access 2010? I'm currently building a database, and I find Access very user friendly. I've even started to do a little bit of VBA coding and it allows me to do pretty much anything I can think of. It's a very flexible program, and it communicates well with Excel. I've also heard you can use it easily with R, although I haven't tried yet.

I didn't know anything about databases when I started, so I took an online course from this website:
http://www.simonsezit.com/courses/microsoft/learn_microsoft_access_2010_training_tutorials/

They also offer an Advanced course and it only costs $25 a month for as many courses as you like (you could complete both courses within a month if you have the time). I'm midway through the Advanced course, and I'm pretty sure I could do all of the things you mentioned in your wishlist.

Anyway, Access is probably your simplest answer, and you know what they say about simple answers!

************************************
Have you considered using Excel and its familiar and fairly powerful user interface as a front end to a relational database, such as Access, MySQL, or Postgres? The VBA language (Visual Basic for Applications) that is built into Excel can serve as the bridge to the database. I should think this would be simplest with Access, because both are Microsoft products and presumably play well together. On the other hand, MySQL and Postgres are both very powerful and are free. One nice aspect of this approach is that the relational database would be directly accessible via SQL for constructing complex queries, or for importing or exporting data more readily than might be possible from the Excel interface.

*************************************

I assume your data have a strong spatial component and, therefore, I would recommend you consider an ArcGIS database, or a database engine that understands spatial data and connects well with ArcGIS or another GIS application. If you have a little money for programming, Python would be a good choice for building your forms.

Also, FWIW: a good logical analysis of your data and their interrelationships would be a better place to start than looking for database software. Most people build a conceptual model in their head, then skip directly to a physical model that is implemented in Excel spreadsheets. When the resulting multiple Excel spreadsheets are discovered to be inadequate, they start looking for database software, again skipping the logical analysis. The result is often disappointment in the selected database, and three or four iterations through the physical design process. The Wikipedia article on the Logical Data Model (http://en.wikipedia.org/wiki/Logical_data_model) is not too bad.
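As a toy illustration of writing the logical model down before choosing software: the entities and relationships can be captured as a schema in a few lines. Everything below is invented, loosely inspired by the flower/bee study in the original post, and the same DDL would port to Access, MySQL, or Postgres:

```python
import sqlite3

# Toy logical model (all names invented): a site has many surveys, a
# survey yields many specimens, and each specimen points at a species.
# The relationships are decided here, before any software is picked.
schema = """
CREATE TABLE site     (site_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE survey   (survey_id INTEGER PRIMARY KEY,
                       site_id   INTEGER REFERENCES site(site_id),
                       date      TEXT);
CREATE TABLE species  (species_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE specimen (specimen_id INTEGER PRIMARY KEY,
                       survey_id   INTEGER REFERENCES survey(survey_id),
                       species_id  INTEGER REFERENCES species(species_id));
"""
con = sqlite3.connect(":memory:")
con.executescript(schema)
tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)
```

The point is that the one-to-many links (site-to-survey, survey-to-specimen) are explicit, which is exactly what gets lost when the physical model starts life as one spreadsheet per plot/year.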

***************************************
You could probably do this fairly easily with Microsoft Access (part of the standard MS Office suite). It's basically a pumped-up version of Excel, with the ability to write queries, link data tables, and create forms (etc.). The functionalities that you want could easily be created as data entry forms, and it already has a "query wizard" built in.

I currently work as a database manager/programmer for the US EPA, but I also do some freelance work on the side. If you're looking to hire someone to set up the Access databases and forms, I would be happy to help -- just provide me with a little more information, and I could give you a quote. I would be happy to provide some samples so you could get an idea of the look and feel of an Access database, and what can be done with it. As it is, your project sounds like it should only take a few hours.

*******************************************
If he were local, I'd get my Spring class to build him a (custom) HiPerCiC application, but we're still getting established on campus, and it's too early to try to do something elsewhere.

But at least some customization would be desirable, since he needs data entry menus, reuse of entry fields, etc. There may be good integrated systems for everything from reliable and convenient data entry to data transformation (such as substituting species names or other info that is unknown at first) to data retrieval (I wonder what kinds of queries he wants to do), but I don't track that kind of thing. If he can find the right disciplinary package, all the better.

Otherwise, he might do well to think of a 2-stage process: data entry and transformation into a repository (perhaps an SQL database or an R table), then further transformation, analysis, and retrieval from that repository. Google Forms and, I'm guessing, new features in the latest Excel might make it relatively easy for a graduate student to build something for data entry and initial transformation. The choice of repository technology depends on what types of analysis and retrieval he wants. If it's mostly statistical analysis, something like R would fit well; if he's looking for sophisticated retrieval, involving accessing and operating on the data according to logical combinations of its properties, an SQL database might be a great fit.

****************************
Hi David. This might not be the direction you want to go, but I work with a group of database developers. I had been told by Access and other database developers that the data I had were impossible (too big) to handle. But the guys I now work with on these issues did it with no problem. They're based in Louisiana and I am in Florida so we do this remotely. They are very good. Every time I start a new project, I work with them from the start to develop a database for my needs. They build in querying, etc... whatever you need. And can export for any packages you'd like... SAS, R, Matlab. Their products always look nice and are super user friendly. I say this might not be the direction you want to go because it's not software to buy and start using -- they develop products to fit your needs, so it costs. If you are interested, you can contact Craig Conzelmann [email protected]

************************

Does your current Excel implementation use VBA coding? It is a quite powerful programming language (even though it has annoying idiosyncrasies). You can make Excel sit up and beg, so to speak. I once wrote a 16,000-line application in Excel VBA that presented a very detailed user interface and did major calculations. Among the other tools it provides, you can make popup input boxes with drop-down lists, input slots, check boxes, radio buttons, etc. They can be pre-populated when the data are available and can collect new data (with error checking) in a way that is more convenient than typing directly into spreadsheet cells. The application can be entirely self-contained within a single spreadsheet, or can talk to multiple sheets in the same workbook, or to other spreadsheets on the same computer (or, for all I know, over a network), and of course to an outside relational database.

Another thing you should consider is the free, open-source alternative, OpenOffice. There you can get very high functionality without being tied to a possibly capricious vendor.

******************************
This is a bit of a non-answer to your question, but you can do a lot
of what you describe below in Excel.

-pull-down menus to reduce data entry errors

I can't think of any way to directly do this in Excel, but you could
set up a second column that checks whether the column you are entering
into has predefined names, using nested IF functions. See the attached
Excel example. If you have more than about 6 nestings, you need to
start another column. Sometimes, once the formula has been used, it's
helpful to copy it and paste-special it as values to keep it from
using up a lot of memory (I usually keep a copy of the formula in a
comment in the cell header).
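The second-column check amounts to validating each entry against a list of allowed names. A minimal sketch of the same logic in Python (the species names are just examples; in Excel this would be the nested IFs, or a COUNTIF against a named range):

```python
# Flag entries that are not in the list of allowed names, so a typo
# during data entry is caught immediately.
allowed = {"Apis mellifera", "Bombus bifarius", "sp. X"}

def check(name):
    """Return 'ok' for a recognized name, 'CHECK' for a likely typo."""
    return "ok" if name in allowed else "CHECK"

entries = ["Apis mellifera", "Apis melifera", "sp. X"]
print([check(n) for n in entries])  # the misspelled middle entry gets flagged
```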

-ability to use the same information for multiple specimens (batch entry)
Copy and paste (control-C + control-V) works well for some of this.
You can also sort by whatever feature it is that needs to be done in
batches to make it easier (if you have a particular order that needs
to be retained, just add a sort order column that you can sort by to
get your original order)

-ability to replace temporary names (sp. X, sp. Y) with the species
name once specimens are identified
Find and replace. In the sample Excel sheet, if you hit control + H,
you can find something and replace it. This works for correcting
typos as well. You can also recode data this way (e.g., find all
instances of Apis mellifera and replace them with a new code).

-ability to query the system
Pivot tables are awesome for this! Under the Insert tab, there is a
pivot table option (Insert menu in older versions of Excel).
Highlight the data you want, and have it create a pivot table in a new
sheet. You can then manipulate the data in a variety of ways. It is
really easy to get summary statistics (in the sample Excel worksheet,
I've got it set up to compute the average of counts by bee species).
Note that if you needed different counts by year, you could have the
years set up as columns. Where Excel beats Access (to my limited
knowledge of Access) is when you need the data by year and by
something else: you can do that in a pivot table, whereas I could
only get one thing to work in a crosstab query in Access, which I
find less intuitive and far more restrictive than Excel.

The pivot table is also a helpful way to find typos in the data entry.

There are a lot of options for the pivot table, and some of them take
some finding (when in doubt, right-click; sometimes knowing what to
right-click is a challenge).

You can also join data from different Excel workbooks using vlookup functions.
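The VLOOKUP join can be sketched outside Excel too. This hypothetical Python version (field names, codes, and species invented) treats one "workbook" as the specimen rows and the other as the code-to-name table:

```python
# A VLOOKUP-style join: fill in each specimen's species name by looking
# up its code in a second table, with "#N/A" for missing codes, the same
# behavior as =VLOOKUP(code, Species!A:B, 2, FALSE) in Excel.
specimens = [{"specimen_id": 1, "code": "BEE01"},
             {"specimen_id": 2, "code": "BEE03"}]
species = {"BEE01": "Bombus bifarius", "BEE02": "sp. X"}

joined = [dict(s, name=species.get(s["code"], "#N/A")) for s in specimens]
print(joined)
```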

I think I have tutorial notes on pivot tables and on lookup functions
that I used for an informal brownbag lying around somewhere that I can
look for, if this is of interest to you.

So, not database software, but tips on how to make Excel function
more like a database. I'd be happy to provide more input on the
subject if desired.

*******************************
A colleague of mine has forwarded your message to ECOLOG-L to me because we are working on the type of software you have mentioned. You can take a closer look at software features at http://www.earthcape.com.

We are currently only starting up, working with selected projects (e.g. http://www.safeproject.net) to build their databases and bring them online.

If you would like to consider working with us I would be happy to answer any questions and share more details on our products and services.

Looking forward to hearing from you,

http://www.earthcape.com
http://www.twitter.com/eartcape
http://www.facebook.com/earthcape


************************************

It sounds like MS Access might meet your needs. It is fairly accessible (it comes with MS Office packages), so your cost would be low (hopefully none). With some VBA (Visual Basic) you can customize your front end (forms for data entry and queries) to a great extent. Being an MS product, it interfaces easily with Excel and should feel more familiar, and there are R packages to move data in and out as well.

I would caution that Access isn't ideal for all databases; you'll find database managers who hate it. In my limited experience this is mainly when you have more advanced needs or want to use something else for the front end, like a website that can be queried dynamically, or spatial data better displayed in ArcMap.

Other things you might want to look into (open source):
MySQL
PostgreSQL

All three of these (Access, MySQL, and PostgreSQL) are used widely, and there is a strong DB community and plenty of resources for each.

***********************************

David, Excel and R can do all the functions you require. As an ecological analyst, my main wish would be that the data are kept in a single spreadsheet, which decreases data management time a bit by lessening the chance that the spreadsheets are differentially formatted. Also, the number of records is not large compared to some ecological files I have worked with.

Without fully understanding your concerns, I would recommend restricting variable values in Excel for data entry. The batch entry is kind of context-dependent... you could use either R or Excel to fill in records. I would probably use R. Use Excel for find/replace. For querying, use either pivot tables in Excel, or embark on the journey of R through Hadley Wickham's plyr package to rearrange data to your heart's content. Best wishes, and thanks for tending the listserv,
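The pivot-table/plyr style of querying boils down to split-apply-combine. A minimal sketch with the Python standard library (the counts and species are invented data, just to show the shape of the operation):

```python
from collections import defaultdict

# Pivot-table-style summary: average count per species,
# i.e. split by species, apply mean, combine into one table.
rows = [("Bombus bifarius", 4), ("Bombus bifarius", 6), ("sp. X", 3)]

by_species = defaultdict(list)
for sp, n in rows:
    by_species[sp].append(n)
means = {sp: sum(v) / len(v) for sp, v in by_species.items()}
print(means)  # {'Bombus bifarius': 5.0, 'sp. X': 3.0}
```

In R, plyr's `ddply` (or a pivot table in Excel, or a GROUP BY query in SQL) performs this same operation over a data frame.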

*********************************

It sounds like you do need a database! For other suggestions see an
earlier similar discussion, but if you're prepared to pay, FileMaker
Pro will do everything you want, is very easy to set up and use, and
is fully compatible with Excel import/export if required.

For the batch entry you'd probably have to first enter the records and then fill the common values afterwards (trivial to do, and you could set up a button to automate it if required). The other features
you need are inherent to FM.

************************************

Did you get a response to your inquiry yet? This is not a complicated
task to structure a relational database to hold this data. As part of that
process one could create some queries and export functions to structure the
data for other applications like R. You might even think of it as four
separate tasks: 1) designing the database structure; 2) building the
database using particular relational database software (MS Access, Oracle,
etc.); 3) populating the database with your data; and 4) developing
additional tools within the database to extract standard information about
your data. There will always be a fifth step, though, of developing queries
to extract data and format data for specific questions.

        The initial design is very important for ease of extracting data
later.

***************************************

Specify 6 looks pretty good. I like that it's open source and already set up for geographic data.

The limits in Access are on the number of objects, not the number of entries. I've worked with tables with hundreds of thousands of records with no problems. That said, if the support and tutorials for Specify 6 look better, it looks like a good option.

Thanks for replying. I had not heard of Specify 6 and will be checking it out.
