Thanks Marc, but see below!

On 2016-02-08 19:26, Marc Schwartz wrote:

On Feb 8, 2016, at 11:26 AM, Göran Broström <goran.brost...@umu.se> wrote:

I have a data frame with dates as integers:

summary(persons[, c("foddat", "doddat")])
     foddat             doddat
Min.   :16790000   Min.   :18000000
1st Qu.:18760904   1st Qu.:18810924
Median :19030426   Median :19091227
Mean   :18946659   Mean   :19027233
3rd Qu.:19220911   3rd Qu.:19310526
Max.   :19660124   Max.   :19691228
NA's   :624        NA's   :207570

After converting the dates to Date format ('as.Date') I get:

summary(per[, c("foddat", "doddat")])
    foddat               doddat
Min.   :1679-07-01   Min.   :1800-01-26
1st Qu.:1876-09-04   1st Qu.:1881-09-24
Median :1903-04-26   Median :1909-12-27
Mean   :1895-02-04   Mean   :1903-02-22
3rd Qu.:1922-09-10   3rd Qu.:1931-05-26
Max.   :1966-01-24   Max.   :1969-12-28

My question is: Why are the numbers of missing values not printed in the second 
case? 'is.na' gives the correct (same) numbers.

Can I somehow force 'summary' to print NA's? I found no clues in the 
documentation.


Hi,

Two things:

1. We are going to need to see the exact call to as.Date() that you used. as.Date() will 
take a numeric vector as input, but it then treats each number as the number of days 
since an origin, which needs to be specified explicitly. If you instead coerced the 
numeric vector to character first, presuming a "%Y%m%d" format, then you need to check 
both how that coercion was done and what it produced, since as.character() can return 
scientific notation (e.g. "1.8e+07") that will not parse as a date.

2. Your second call is to a data frame called 'per', which may or may not have 
the same content as 'persons' in your first call.
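
For illustration only, here is one way the character route could be made safer. This is a minimal sketch, not necessarily what was done to build 'per': the vector 'x' and the names 'x_chr' and 'd' are made up for the example, and it assumes the dates are coded as yyyymmdd numbers.

```r
# Integer-coded dates (yyyymmdd), stored as doubles; hypothetical example data.
x <- c(18000000, 18810924, 19091227, NA)

# Going through integer first keeps all eight digits in the string form,
# instead of risking scientific notation from as.character() on a double.
x_chr <- as.character(as.integer(x))

# Strings that do not parse as a valid date yield NA rather than an error.
d <- as.Date(x_chr, format = "%Y%m%d")
print(d)
```

Comparing sum(is.na(x)) with sum(is.na(d)) afterwards shows how many values were lost in the conversion rather than missing from the start.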


If I do the following, taking some of your numeric values from above:

x <- c(18000000, 18810924, 19091227, 19027233, 19310526, 19691228, NA)

DF <- data.frame(x)

summary(DF)
        x
  Min.   :18000000
  1st Qu.:18865001
  Median :19059230
  Mean   :18988523
  3rd Qu.:19255701
  Max.   :19691228
  NA's   :1

as.character(DF$x)
[1] "1.8e+07"  "18810924" "19091227" "19027233" "19310526" "19691228"
[7] NA

DF$x.Date <- as.Date(as.character(DF$x), format = "%Y%m%d")

DF
          x     x.Date
1 18000000       <NA>
2 18810924 1881-09-24
3 19091227 1909-12-27
4 19027233       <NA>
5 19310526 1931-05-26
6 19691228 1969-12-28
7       NA       <NA>

summary(DF)
        x                x.Date
  Min.   :18000000   Min.   :1881-09-24
  1st Qu.:18865001   1st Qu.:1902-12-04
  Median :19059230   Median :1920-09-10
  Mean   :18988523   Mean   :1923-04-12
  3rd Qu.:19255701   3rd Qu.:1941-01-17
  Max.   :19691228   Max.   :1969-12-28
  NA's   :1          NA's   :3

But:

> summary(DF[, "x.Date", drop = FALSE])
     x.Date
 Min.   :1881-09-24
 1st Qu.:1902-12-04
 Median :1920-09-10
 Mean   :1923-04-12
 3rd Qu.:1941-01-17
 Max.   :1969-12-28

No NA's. But again:

> summary(DF[, "x.Date"])
        Min.      1st Qu.       Median         Mean      3rd Qu.         Max.
"1881-09-24" "1902-12-04" "1920-09-10" "1923-04-12" "1941-01-17" "1969-12-28"
        NA's
         "3"


So summary does support the reporting of NA's for Dates, using summary.Date().

Not always, as it seems. Strange. (The 'persons' vs. 'per' is a red herring.)
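
Whatever summary() chooses to print, the counts can always be obtained directly with is.na(). A minimal sketch, rebuilding the small DF from the example above so that it runs on its own (the name 'na_counts' is just for illustration):

```r
# Rebuild the example data frame from the thread.
x <- c(18000000, 18810924, 19091227, 19027233, 19310526, 19691228, NA)
DF <- data.frame(x)
DF$x.Date <- as.Date(as.character(DF$x), format = "%Y%m%d")

# Count missing values per column, independently of summary()'s printing.
na_counts <- vapply(DF, function(col) sum(is.na(col)), integer(1))
print(na_counts)
#      x x.Date
#      1      3
```

This does not explain why the NA's row is dropped for the one-column data frame, but it gives the numbers without relying on the print method.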

Göran Broström


Regards,

Marc Schwartz


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
