On Thu, May 7, 2020 at 4:16 PM Thomas Petzoldt <t...@simecol.de> wrote:
>
> On 07.05.2020 at 11:19 Deepayan Sarkar wrote:
> > On Thu, May 7, 2020 at 12:58 AM Thomas Petzoldt <t...@simecol.de> wrote:
> >>
> >> Sorry if I'm joining a little bit late.
> >>
> >> I put some related links and scripts together a few weeks ago. Then I
> >> stopped, because there is so much.
> >>
> >> The data format employed by Johns Hopkins CSSE was sort of a big surprise
> >> to me.
> >
> > Why? I find it quite convenient to drop the first few columns and
> > extract the data as a matrix (using data.matrix()).
> >
> > -Deepayan
>
> Many thanks for the hint to use data.matrix().
>
> My aim was not to say that it is difficult, especially as R has all the
> tools for data mangling.
>
> My surprise was that "wide tables" and non-ISO dates as column names are
> not the "database way" that we generally teach to our students.
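Deepayan's data.matrix() hint, combined with the ts() representation he recommends, could look roughly like the base-R sketch below. It is not his published script: it assumes the JHU CSSE layout of four ID columns (Province/State, Country/Region, Lat, Long) followed by one m/d/yy column per day, and it uses a tiny made-up data frame `jhu` (hypothetical numbers, not real case counts) so that it runs offline. Summing provinces with rowsum() is likewise an assumption of this sketch. When reading the real CSV with read.csv(), pass check.names = FALSE so that the date headers are not rewritten as X1.22.20 and so on.

```r
## Toy stand-in for the JHU CSSE wide layout: four ID columns, then one
## column per day. Hypothetical numbers only, NOT real case counts.
jhu <- data.frame(
  "Province/State" = c(NA, "Hubei", "Beijing"),
  "Country/Region" = c("Italy", "China", "China"),
  Lat  = c(0, 0, 0),                 # placeholder coordinates
  Long = c(0, 0, 0),
  "1/22/20" = c(0, 5, 1),
  "1/23/20" = c(1, 8, 2),
  "1/24/20" = c(2, 13, 4),
  check.names = FALSE, stringsAsFactors = FALSE
)

## Drop the four ID columns and keep the counts as a numeric matrix
counts <- data.matrix(jhu[, -(1:4)])

## Sum rows belonging to the same country (here: two Chinese provinces);
## rowsum() returns one row per country, sorted by name
counts <- rowsum(counts, group = jhu[["Country/Region"]])

## Parse the m/d/yy column names into Date objects
dates <- as.Date(colnames(counts), format = "%m/%d/%y")

## Transpose so that rows are days and columns are countries, then build a
## multiple time series indexed by day number
cases <- ts(t(counts), start = 1, frequency = 1)
```

plot(cases) then draws one panel per country. Because ts() stores only a numeric time index, the `dates` vector is kept alongside for labelling.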
Well, I am all for long-format data when it makes sense, but I would
disagree that it is always the "right approach". In the case of regular
multiple time series, as in this context, a matrix-like structure seems
much more natural (and is nicely handled by ts() in R), and I wouldn't
even bother reshaping the data in the first place. See, for example,

https://github.com/deepayan/deepayan.github.io/blob/master/covid-19/deaths.rmd

and

https://deepayan.github.io/covid-19/deaths.html

-Deepayan

> With reshape2::melt() or tidyr::gather() (or its successor pivot_longer()),
> conversion is quite easy, whether or not one wants to use the tidyverse;
> see the example below.
>
> Again, thanks, Thomas
>
>
> library("dplyr")
> library("readr")
> library("tidyr")
>
> file <-
>   "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
>
> dat <- read_delim(file, delim = ",")
> names(dat)[1:2] <- c("Province_State", "Country_Region")
> dat2 <-
>   dat %>%
>   ## summarize Country/Region duplicates
>   group_by(Country_Region) %>% summarise_at(vars(-(1:4)), sum) %>%
>   ## make it a long table
>   pivot_longer(cols = -Country_Region, names_to = "time") %>%
>   ## convert to ISO 8601 date
>   mutate(time = as.POSIXct(time, format = "%m/%e/%y"))
>
> >
> >
> >> An opposite approach was taken in Germany, where the data were
> >> organized as big JSON trees.
> >>
> >> Fortunately, both can be "tidied" with R and represent good didactic
> >> examples for our students.
> >>
> >> Here is yet another repo linking to the data:
> >>
> >> https://github.com/tpetzoldt/covid
> >>
> >>
> >> Thomas
> >>
> >>
> >> On 04.05.2020 at 20:48 James Spottiswoode wrote:
> >>> Sure. The COVID-19 Data Repository by the Center for Systems Science and
> >>> Engineering (CSSE) at Johns Hopkins University is available here:
> >>>
> >>> https://github.com/CSSEGISandData/COVID-19
> >>>
> >>> All in CSV format.
> >>>
> >>>
> >>>> On May 4, 2020, at 11:31 AM, Bernard McGarvey
> >>>> <mcgarvey.bern...@comcast.net> wrote:
> >>>>
> >>>> Just curious: does anyone know of a website that has data available in a
> >>>> format that R can download and analyze?
> >>>>
> >>>> Thanks
> >>>>
> >>>>
> >>>> Bernard McGarvey
> >>>>
> >>>>
> >>>> Director, Fort Myers Beach Lions Foundation, Inc.
> >>>>
> >>>>
> >>>> Retired (Lilly Engineering Fellow).
> >>>>
> >>>> ______________________________________________
> >>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> PLEASE do read the posting guide
> >>>> http://www.R-project.org/posting-guide.html
> >>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>
> >>>
> >>> James Spottiswoode
> >>> Applied Mathematics & Statistics
> >>> (310) 270 6220
> >>> jamesspottiswoode Skype
> >>> ja...@jsasoc.com
> >>>
>
> --
> Dr. Thomas Petzoldt
> senior scientist
>
> Technische Universitaet Dresden
> Faculty of Environmental Sciences
> Institute of Hydrobiology
> 01062 Dresden, Germany
>
> https://tu-dresden.de/Members/thomas.petzoldt
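Thomas's point that the wide-to-long conversion does not require the tidyverse can be illustrated with a base-R counterpart of his pipeline. This is a sketch, not his code: it uses a small hypothetical toy table (made-up numbers, not real case counts) in place of the downloaded CSV, assumes the same four leading ID columns, swaps in rowsum() for group_by()/summarise_at() and rep()/as.vector() for pivot_longer(), and parses the dates with as.Date(), giving a Date rather than the POSIXct date-time his version produces.

```r
## Toy JHU-style wide table; hypothetical numbers, NOT real case counts
jhu <- data.frame(
  "Province/State" = c(NA, "Hubei", "Beijing"),
  "Country/Region" = c("Italy", "China", "China"),
  Lat  = c(0, 0, 0),                 # placeholder coordinates
  Long = c(0, 0, 0),
  "1/22/20" = c(0, 5, 1),
  "1/23/20" = c(1, 8, 2),
  "1/24/20" = c(2, 13, 4),
  check.names = FALSE, stringsAsFactors = FALSE
)

## Aggregate provinces into countries (the group_by()/summarise_at() step)
counts <- rowsum(data.matrix(jhu[, -(1:4)]), group = jhu[["Country/Region"]])
dates  <- as.Date(colnames(counts), format = "%m/%d/%y")

## Stack into long format: one row per country and day, grouped by country.
## t() makes each country's days contiguous when flattened column-wise.
long <- data.frame(
  Country_Region   = rep(rownames(counts), each = ncol(counts)),
  time             = rep(dates, times = nrow(counts)),
  value            = as.vector(t(counts)),
  stringsAsFactors = FALSE
)
```

The columns Country_Region, time and value match the names pivot_longer() gives in his version (names_to = "time" plus the default values_to = "value"), and the rows are likewise grouped by country.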