Thanks to all.
Steven Yen
At 06:18 PM 9/30/2014, Nordlund, Dan (DSHS/RDA) wrote:
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> project.org] On Behalf Of Steven Yen
> Sent: Tuesday, September 30, 2014 2:04 PM
> To: r-help
> Subject: [R] Reading text file with fortran format
>
> Hello
>
> I read data with fortran format:
> mydata<-read.fortran('foo.txt',
> c("4F10.4","F8.3","3F3.0","20F2.0"))
> colnames(mydata)<-c("q1","q2","q3","q4","income","hhsize",
> "weekend","dietk","quart1","quart2","quart3","male","age35",
> "age50","age65","midwest","south","west","nonmetro",
> "suburb","black","asian","other","hispan","hhtype1",
> "hhtype2","hhtype3","emp_stat")
> dstat(mydata,digits=6)
>
> I produced the following sample statistics for the first 4
> variables (q1,q2,q3,q4):
>
>         Mean  Std.dev Min      Max  Obs
> q1 0.000923 0.002509   0 0.035245 5649
> q2 0.000698 0.001681   0 0.038330 5649
> q3 0.000766 0.002138   0 0.040100 5649
> q4 0.000373 0.001140   0 0.026374 5649
>
> The correct sample statistics are:
> Variable|      Mean  Std.Dev.   Minimum   Maximum
> --------+----------------------------------------
>       Q1|  9.227632  25.09311       0.0  352.4508
>       Q2|  6.983078  16.80984       0.0  383.2995
>       Q3|  7.657381  21.38337       0.0  400.9950
>       Q4|  3.727952  11.40446       0.0  263.7398
>   INCOME|  16.01603  13.70296       0.0     100.0
>   HHSIZE|  2.586475  1.464282       1.0      16.0
>
> In other words, values for q1-q4 were scaled down by a factor of
> 10,000.
> My raw data look like (with proper format)
>
> 0.0000 0.0000 0.0000 0.0000 48.108...
> 0.0000 0.0000 0.0000 0.0000 11.640...
> 35.3450 0.0000 95.7656 0.0000 4.667...
> 0.0000 0.0000 0.0000 0.0000 9.000...
> 84.0000 4.8038 0.0000 3.1886 2.923...
> 0.0000 0.0000 0.0000 1.1636 10.000...
> 0.0000 10.7818 109.7884 0.0000 17.000...
> 0.0000 7.9528 0.0000 4.7829 35.000...
>
> It is true that the data shown here are space delimited, but I also
> need to read other files whose fields are not space delimited.
>
> Any idea/suggestion would be appreciated.
>
The read.fortran function appears to work differently from FORTRAN
when the numbers in the file already contain decimal points. If
memory serves, FORTRAN ignores the decimal portion of the format (the
d in Fw.d) when the field it reads contains an explicit decimal
point. The read.fortran function, by contrast, appears to read the
number 'as is' and then multiply it by 10^-d, where d is the number
of decimal places in the format. Since your data already contain
decimal points, you should specify the format with 0 decimal places,
i.e.
c("4F10.0","F8.0","3F3.0","20F2.0")
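To see the effect in isolation, here is a minimal sketch. It assumes a small made-up file of two records with two 10-character fields each (the values are borrowed from the sample rows above; the temporary file and the object names scaled and asis are only for illustration). It reads the same file once with d = 4 and once with d = 0:

```r
# Two fixed-width records, each with two 10-character fields whose
# values already contain explicit decimal points (illustrative data).
lines <- c("   35.3450   95.7656",
           "   84.0000    4.8038")
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# d = 4 in the format: the parsed values are rescaled by 10^-4.
scaled <- read.fortran(tmp, "2F10.4")

# d = 0 in the format: the values are kept as written in the file.
asis <- read.fortran(tmp, "2F10.0")

print(scaled)
print(asis)

# The two results should differ by exactly a factor of 10^4.
print(all.equal(scaled * 1e4, asis))

unlink(tmp)
```

If this behaves as described, the d = 4 read returns values 10,000 times too small, which is the same factor that separates the two sets of summary statistics above (for example 9.227632 versus 0.000923 for q1).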
Hope this is helpful,
Dan
Daniel J. Nordlund, PhD
Research and Data Analysis Division
Services & Enterprise Support Administration
Washington State Department of Social and Health Services