Re: [R] duplicate values

Prof Brian Ripley Sun, 16 Nov 2008 10:44:26 -0800

Is the question 'duplicated next to each other' or 'duplicated anywhere later'? I read it as the latter, so would use


dup <- duplicated(x$dt)


or

dup <- duplicated(x[c("Date", "time")])
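
To make the two readings concrete, here is a minimal sketch on a made-up vector (the name dt and its values are hypothetical, not taken from the poster's data). The first timestamp recurs at the end, but not next to itself, so the two checks disagree on the last element:

```r
# Hypothetical timestamps: the first value recurs later, but not adjacently.
dt <- as.POSIXct(c("2008-06-01 03:00:00",
                   "2008-06-01 04:00:00",
                   "2008-06-01 04:00:00",
                   "2008-06-01 03:00:00"), tz = "UTC")

# TRUE wherever a value has already appeared anywhere earlier in the vector.
dup_anywhere <- duplicated(dt)

# TRUE only where a value equals the one immediately before it.
dup_adjacent <- c(FALSE, dt[-1] == dt[-length(dt)])

dup_anywhere  # FALSE FALSE  TRUE  TRUE
dup_adjacent  # FALSE FALSE  TRUE FALSE
```

If the data are sorted by time, as they appear to be in the original question, the two approaches should give the same answer.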

Also, be very careful, as date-time values like this can be duplicated and refer to different times on days when DST ends. E.g. there are both


"2008-10-26 02:30:00 CEST"
"2008-10-26 02:30:00 CET"

in the timezone of Germany (at least with the names my system gives me in English).
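
A small sketch of that pitfall, assuming the system's time zone database knows the name "Europe/Berlin" (the variable names are made up for illustration). Two instants an hour apart print with the same wall-clock reading, so a duplicate check on the strings and one on the underlying times disagree:

```r
# Two distinct instants, one hour apart, given in UTC.
a <- as.POSIXct("2008-10-26 00:30:00", tz = "UTC")
b <- as.POSIXct("2008-10-26 01:30:00", tz = "UTC")

# Rendered as German local time, both show the same wall-clock reading.
fmt <- "%Y-%m-%d %H:%M:%S"
local_a <- format(a, fmt, tz = "Europe/Berlin")
local_b <- format(b, fmt, tz = "Europe/Berlin")

local_a == local_b               # TRUE: the strings collide
a == b                           # FALSE: the instants differ
duplicated(c(local_a, local_b))  # flags the second string as a duplicate
duplicated(c(a, b))              # flags nothing
```

So whether such a pair counts as a duplicate depends on whether you compare the character columns or the converted times.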


On Sun, 16 Nov 2008, jim holtman wrote:

This should do it for you:

x <- read.table(textConnection("Date time Temperature
1        2008-6-1 00:00:00      5
2        2008-6-1 02:00:00      5
3        2008-6-1 03:00:00      6
4        2008-6-1 03:00:00      0
5        2008-6-1 04:00:00      6
6        2008-6-1 04:00:00      0
7        2008-6-1 05:00:00      7
8        2008-6-1 06:00:00      7"), header=TRUE)

closeAllConnections()
# create datetime
x$dt <- as.POSIXct(paste(x$Date, x$time))
# create list of duplicate values next to each other
dup <- c(FALSE, diff(x$dt) == 0)
# remove
x[!dup,]

     Date     time Temperature                  dt
1 2008-6-1 00:00:00           5 2008-06-01 00:00:00
2 2008-6-1 02:00:00           5 2008-06-01 02:00:00
3 2008-6-1 03:00:00           6 2008-06-01 03:00:00
5 2008-6-1 04:00:00           6 2008-06-01 04:00:00
7 2008-6-1 05:00:00           7 2008-06-01 05:00:00
8 2008-6-1 06:00:00           7 2008-06-01 06:00:00
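
For anyone wanting to rerun this without the console prompts, the following is a self-contained sketch of the same idea. Two things here are assumptions and not part of the code above: it uses the text argument of read.table, which needs a reasonably recent R, and it fixes tz = "UTC" so that no DST shift can apply to the sample data.

```r
# Same sample data as above; the header has one field fewer than the rows,
# so read.table() treats the first column as row names.
x <- read.table(text = "Date time Temperature
1 2008-6-1 00:00:00 5
2 2008-6-1 02:00:00 5
3 2008-6-1 03:00:00 6
4 2008-6-1 03:00:00 0
5 2008-6-1 04:00:00 6
6 2008-6-1 04:00:00 0
7 2008-6-1 05:00:00 7
8 2008-6-1 06:00:00 7", header = TRUE, stringsAsFactors = FALSE)

# Assumption: interpret the timestamps as UTC to sidestep DST ambiguity.
x$dt <- as.POSIXct(paste(x$Date, x$time), tz = "UTC")

# Keep the first row of each run of equal timestamps.
keep <- c(TRUE, x$dt[-1] != x$dt[-nrow(x)])
x[keep, ]
```

This should keep rows 1, 2, 3, 5, 7 and 8, matching the output above.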


On Sun, Nov 16, 2008 at 1:10 PM, Antje Nöthlich <[EMAIL PROTECTED]> wrote:

Hei R Users,

I have the following data frame:

         Datetime                      Temperature             and many more columns
1        2008-6-1 00:00:00      5
2        2008-6-1 02:00:00      5
3        2008-6-1 03:00:00      6
4        2008-6-1 03:00:00      0
5        2008-6-1 04:00:00      6
6        2008-6-1 04:00:00      0
7        2008-6-1 05:00:00      7
8        2008-6-1 06:00:00      7
.            .                                .
.            .                                .
.            .                                .
3000  2008-8-31 00:00:00    3


The problem is that rows 3 & 4 and rows 5 & 6 have the same "Datetime" value but 
differ in the values of the "Temperature" column.
Now, for the whole data frame, I would like to delete rows that have the same 
"Datetime" value as the prior row.
I have tried unique(dataframe), but it does not work here because the rows are 
not exact duplicates of each other.
Thanks in advance for your help!

Antje



--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
