On Thu, 29 Sep 2022, Nick Wray writes:

> ---------- Forwarded message ---------
> From: Nick Wray <nickmw...@gmail.com>
> Date: Thu, 29 Sept 2022 at 15:32
> Subject: Re: [R] Reading very large text files into R
> To: Ben Tupper <btup...@bigelow.org>
>
>
> Hi Ben
> Beneath is an example of the text (also in an attachment) and it's the "B",
> of which there are quite a few scattered throughout the text doc which
> causes the reading in error message (btw I don't need the "RAIN" column or
> the 1's after it or the last four elements). I have also attached the
> snippet as text file
>
> 1980-01-01 10:00, 225620, RAIN, 1, 1, WAHRAIN, 5091, 1001, 0, , 9, 0, , ,
> 1980-01-01 10:00, 226918, RAIN, 1, 1, WAHRAIN, 5124, 1001, 0, , 9, 0, , ,
> 1980-01-01 10:00, 228562, RAIN, 1, 1, WAHRAIN, 491, 1001, 0, , 9, 0, , ,
> 1980-01-01 10:00, 231581, RAIN, 1, 1, WAHRAIN, 5213, 1001, 0, , 9, 0, , ,
> 1980-01-01 10:00, 232671, RAIN, 1, 1, WAHRAIN, 487, 1001, 0, , 9, 0, , ,
> 1980-01-01 10:00, 232913, RAIN, 1, 1, WAHRAIN, 5243, 1001, 0, , 9, 0, , ,
> 1980-01-01 10:00, 234362, RAIN, 1, 1, WAHRAIN, 5265, 1001, 0, , 10009, 0, , , B
> 1980-01-01 10:00, 234682, RAIN, 1, 1, WAHRAIN, 5271, 1001, 0, , 9, 0, , ,
> 1980-01-01 10:00, 235389, RAIN, 1, 1, WAHRAIN, 5279, 1001, 0, , 9, 0, , ,
> 1980-01-01 10:00, 236466, RAIN, 1, 1, WAHRAIN, 497, 1001, 0, , 9, 0, , ,
> 1980-01-01 10:00, 243350, RAIN, 1, 1, SREW, 484, 1001, 0, , 9, 0, , ,
> 1980-01-01 10:00, 243350, RAIN, 1, 1, WAHRAIN, 484, 1001, 0, 0, 9, 9, , ,
>
> Thanks Nick
>
> On Thu, 29 Sept 2022 at 15:12, Ben Tupper <btup...@bigelow.org> wrote:
>
>> Hi Nick,
>>
>> It's hard to know without seeing at least a snippet of the data.
>> Could you do the following and paste the result into a plain text
>> email? If you don't set your email client to plain text (from rich
>> text or html) then we are apt to see a jumble of output on our email
>> clients.
>>
>>
>> ## start
>> x <- readLines(filename, n = 20)
>> cat(x, sep = "\n")
>> ## end
>>
>> Cheers,
>> Ben
>>
>>
>> On Thu, Sep 29, 2022 at 9:54 AM Nick Wray <nickmw...@gmail.com> wrote:
>> >
>> > Hello I may be offending the R purists with this question but it is
>> > linked to R, as will become clear. I have very large data sets from the
>> > UK Met Office in notepad form. Unfortunately, I can’t read them directly
>> > into R because, for some reason, although most lines in the text doc
>> > consist of 15 elements, every so often there is a sixteenth one and R
>> > doesn’t like this and gives me an error message because it has assumed
>> > that every line has 15 elements and doesn’t like finding one with more.
>> > I have tried playing around with the text document, inserting an extra
>> > element into the top line etc, but to no avail.
>> >
>> > Also unfortunately you need access permission from the Met Office to get
>> > the files in question so this link probably won’t work:
>> >
>> > https://catalogue.ceda.ac.uk/uuid/bbd6916225e7475514e17fdbf11141c1
>> >
>> > So what I have done is simply to copy and paste the text docs into excel
>> > csv and then read them in, which is time-consuming but works. However
>> > the later datasets are over the excel limit of 1048576 lines. I can paste
>> > in the first 1048576 lines but then trying to isolate the remainder of
>> > the text doc to paste it into a second csv doc is proving v difficult –
>> > the only way I have found is to scroll down by hand and that’s taking
>> > ages. I cannot find another way of editing the notepad text doc to get
>> > rid of the part which I have already copied and pasted.
>> >
>> > Can anyone help with a) ideally being able to simply read the text tables
>> > into R or b) suggest a way of editing out the bits of the text file I
>> > have already pasted in without laborious scrolling?
>> >
>> > Thanks Nick Wray
>> >
[...]
>>
>> --
>> Ben Tupper (he/him)
>> Bigelow Laboratory for Ocean Science
>> East Boothbay, Maine
>> http://www.bigelow.org/
>> https://eco.bigelow.org
>>
>

Maybe I have missed it, but could you please show how
you tried to read the table? When I use your file with

    read.table("sample text.txt", header = FALSE, sep = ",")

I get

##                 V1     V2   V3 V4 V5      V6   V7   V8 V9 V10   V11 V12 V13 V14 V15
## 1 1980-01-01 10:00 225620 RAIN  1  1 WAHRAIN 5091 1001  0  NA     9   0  NA  NA
## 2 1980-01-01 10:00 226918 RAIN  1  1 WAHRAIN 5124 1001  0  NA     9   0  NA  NA
##
## .....
## 7 1980-01-01 10:00 234362 RAIN  1  1 WAHRAIN 5265 1001  0  NA 10009   0  NA  NA   B
## 8 1980-01-01 10:00 234682 RAIN  1  1 WAHRAIN 5271 1001  0  NA     9   0  NA  NA

-- 
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net
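
For the case described in the original question, where some rows deep in the file carry one more field than the rest, the following is a minimal base-R sketch, not a tested recipe for the Met Office files. read.table() decides the number of columns from the first five lines of input unless col.names is longer, so a longer row further down can trigger the "did not have N elements" error. The sketch counts the fields per line first and then reads with fill = TRUE and enough column names. The third sample row (with the extra "X" field) is hypothetical and was made up for illustration; the first two rows are from the snippet in the thread.

```r
# Two rows copied from the snippet in the thread, plus one HYPOTHETICAL row
# with an extra trailing field ("X") to mimic the problem described.
sample_lines <- c(
  "1980-01-01 10:00, 225620, RAIN, 1, 1, WAHRAIN, 5091, 1001, 0, , 9, 0, , ,",
  "1980-01-01 10:00, 234362, RAIN, 1, 1, WAHRAIN, 5265, 1001, 0, , 10009, 0, , , B",
  "1980-01-01 10:00, 234682, RAIN, 1, 1, WAHRAIN, 5271, 1001, 0, , 9, 0, , , , X"
)

# Write the rows to a temporary file so the example is self-contained;
# with real data, 'tmp' would be the path to the text file.
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# Step 1: count comma-separated fields on every line to see which
# lines are longer than the others.
n_fields <- count.fields(tmp, sep = ",")
print(table(n_fields))
max_fields <- max(n_fields)

# Step 2: read with as many column names as the longest line has fields.
# fill = TRUE pads shorter rows with blank fields instead of raising an error;
# strip.white = TRUE drops the space that follows each comma.
dat <- read.table(tmp, header = FALSE, sep = ",",
                  fill = TRUE, strip.white = TRUE,
                  col.names = paste0("V", seq_len(max_fields)),
                  stringsAsFactors = FALSE)
str(dat)
```

Unwanted columns can then be dropped by name or position, e.g. dat[, -(3:5)] for the "RAIN" column and the two 1's after it. On very large files count.fields() reads the whole file once before read.table() reads it again, so this doubles the I/O; whether that matters depends on the file size.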