On 23/09/2023 6:55 p.m., Parkhurst, David wrote:
> With help from several people, I used file.choose() to get my file name, and
> read.csv() to read in the file as KurtzData. Then when I print KurtzData, the
> last several lines look like this:
> 39 5/31/22 16.0 341 1.75525 0.0201 0.0214 7.00
> 40 6/28/22 2:00 PM 0.0 215 0.67950 0.0156 0.0294 NA
> 41 7/25/22 11:00 AM 11.9 1943.5 NA NA 0.0500 7.80
> 42 8/31/22 0 220.5 NA NA 0.0700 30.50
> 43 9/28/22 0.067 10.9 NA NA 0.0700 10.20
> 44 10/26/22 0.086 237 NA NA 0.1550 45.00
> 45 1/12/23 1:00 PM 36.26 24196 NA NA 0.7500 283.50
> 46 2/14/23 1:00 PM 20.71 55 NA NA 0.0500 2.40
> 47 NA NA NA NA
> 48 NA NA NA NA
> 49 NA NA NA NA
> Then the NA's go down to one numbered 973. Where did those extras likely come
> from, and how do I get rid of them? I assume I need to get rid of all the
> lines after #46, to do calculations and graphics, no?

Many Excel spreadsheets have a lot of garbage outside the range of the
data. Sometimes it is visible if you know where to look, sometimes it
is blank cells. Perhaps at some point you (or the file creator)
accidentally entered a number in line 973. Then Excel will think the
sheet has 973 lines. I don't know the best way to tell Excel that those
lines are pure garbage.
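
A small sketch of how such stray cells show up on the R side, assuming the
sheet was exported to CSV before being read. The column names and values
here are made up for illustration, not taken from the actual file:

```r
# Hypothetical CSV text: two real rows, then two rows that contain only
# field separators, as a spreadsheet export may produce for "used" empty rows.
csv_text <- "Date,Flow,pH
1/12/23,36.26,283.5
2/14/23,20.71,2.4
,,
,,
"

# Default read: the separator-only lines are kept as rows.
d_all <- read.csv(text = csv_text)
print(d_all)

# Reading only the rows known to hold data (nrows is passed to read.table).
d_top <- read.csv(text = csv_text, nrows = 2)
print(nrow(d_top))
```

Lines that contain only separators are not blank as far as read.csv() is
concerned, so each one becomes a row: NA in the numeric columns and an empty
string in the character columns.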
That's why old fogies like me recommend that you do as little as
possible in Excel. Get the data into a reliable form as soon as possible.
Once it is an R dataframe, you can delete lines using negative indices.
In this case use
fixed <- KurtzData[-(47:nrow(KurtzData)), ]
which will create a new dataframe with only rows 1 to 46.
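
If you would rather not hard-code the row number, one alternative is to drop
every row whose cells are all empty. This is only a sketch on a toy data
frame; the column names and the is_empty() helper are hypothetical, not part
of KurtzData or of base R:

```r
# Toy stand-in for KurtzData (hypothetical columns and values).
toy <- data.frame(
  Date = c("1/12/23", "2/14/23", "", ""),
  Time = c("1:00 PM", "1:00 PM", "", ""),
  Flow = c(36.26, 20.71, NA, NA),
  pH   = c(283.5, 2.4, NA, NA),
  stringsAsFactors = FALSE
)

# A cell counts as empty if it is NA or a blank/whitespace-only string.
is_empty <- function(x) is.na(x) | trimws(as.character(x)) == ""

# TRUE for rows in which every column is empty.
empty_row <- Reduce(`&`, lapply(toy, is_empty))

# Keep only the rows that have at least one non-empty cell.
fixed_toy <- toy[!empty_row, , drop = FALSE]
print(fixed_toy)
```

Note that this drops entirely empty rows anywhere in the data, not just at
the end. Also, the 47:nrow(KurtzData) form assumes there are at least 47
rows; with exactly 46 rows, 47:46 counts down and row 46 would be removed.
head(KurtzData, 46) avoids that problem.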
Duncan Murdoch