Re: [R] How do I paste double quotes arround a character string?

Philip James Smith Thu, 03 Jul 2008 06:37:42 -0700

R Community:

At the risk of getting my hands slapped for posting "too much" on the forum, I've described below my strategy for reading only certain portions of huge .csv files.

I think this could well be of interest to others; I'm sure I'm not alone in needing to read only certain variables (i.e., columns) from VERY large .csv files.

Charles Berry, Ted Harding, and Brian Ripley have suggested using the Unix "cut" command along with the R pipe() function. Their advice has been invaluable.

With the code as I've written it so far, I'm finding that the "cut" command is not reading the file properly, or at least not in the manner that I expect.
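A likely cause, offered as an assumption because the data file isn't shown: `cut -d,` splits on every comma, including commas inside double-quoted fields, whereas read.csv() honours the quotes, so any record with a quoted comma has its later fields shifted. The sketch below uses three made-up lines (the NAME column and its values are hypothetical) and emulates `cut -d, -f3` with strsplit() so that it runs without a shell.

```r
# Made-up CSV lines: the second record has a comma inside a quoted field
lines <- c('ID,NAME,ESTIAP07',
           '1,"Smith, Jr.",12',
           '2,Jones,7')

# CSV-aware parse: read.csv() keeps "Smith, Jr." as a single field,
# so ESTIAP07 comes back as 12 and 7
csv <- read.csv(text = lines, stringsAsFactors = FALSE)
csv$ESTIAP07

# Emulation of `cut -d, -f3` (an assumption about its behaviour on this input):
# split each data line on every comma and keep the third piece.
# For the quoted record the third piece is ' Jr."' instead of 12.
naive <- sapply(strsplit(lines[-1], ",", fixed = TRUE), `[`, 3)
naive
```

If this is what is happening, the shifted pieces land in ESTIAP07 as stray values, which would fit the extra categories that appear only in the cut-based table.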


Here was my strategy:

*STEP 1. Read the whole huge file (almost impossible, even with a very good computer!).*
*STEP 2. Use the pipe and cut commands to read only the desired columns of the file.*
*STEP 3. Compare results by tabulating a variable from the whole file against the file obtained in STEP 2.*


I found that the comparison gave different tabulations!  :-(
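Assuming quoted commas are the culprit, the affected records can be located by comparing two field counts per line: count.fields() with sep = "," and quote = "\"" honours quotes the way read.csv() does, while counting raw commas mimics what cut sees. The function name find.quoted.commas() and the sample lines are hypothetical; on the real file the lines could come from readLines(), read in chunks if the file is too large to hold at once.

```r
# Hypothetical helper: indices of lines whose quote-aware field count differs
# from the raw comma count, i.e. lines that `cut -d,` would split differently
find.quoted.commas <- function(lines) {
  # Quote-aware count; keep blank lines and disable comment handling so
  # there is one result per input line
  csv.fields <- count.fields(textConnection(lines), sep = ",", quote = "\"",
                             blank.lines.skip = FALSE, comment.char = "")
  # Naive count: number of commas on the line plus one
  raw.fields <- nchar(gsub("[^,]", "", lines)) + 1L
  which(csv.fields != raw.fields)
}

lines <- c('ID,NAME,ESTIAP07',
           '1,"Smith, Jr.",12',
           '2,Jones,7')
find.quoted.commas(lines)   # only line 2 has a comma inside quotes
```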

I've provided my code below. I'd be quite grateful for suggestions on how to fix this.


My sincere thanks to all who have or will provide guidance on this problem.

Phil Smith
Duluth, GA

*## STEP 1: read the whole huge file*
##
## read the whole file
##
   your.file    <-    "/home/philipsmith/mydata.csv"
   dat        <-    read.csv( file = your.file )

##
## read the names from the 1st line of the whole file
## that line contains all of the variable names
##

col.namz <- c( scan( your.file , what=character(0), nlines=1 ,sep=",") )


##
## check to see whether  all of the column names from the whole file
## are the same as in col.namz
##
    all( col.namz == names(dat))

##
## they are!! :-)
##

*## STEP 2: use the pipe and cut commands to read only the desired columns of the file*

##
## designate which variable names are to be read
## using the unix command "cut" and the function pipe()
##
   colz    <-    c("ESTIAP07" )

##
## find the column numbers in the whole file that correspond to
## the variables designated to be read by the unix command
## and specified in the colz vector
##

   col.pos     <-     match( colz , col.namz , nomatch=0 )
   ##
   ## the following line is commented out,
   ## since for this example the number of designated variables
   ## by colz is only 1 variable
   ##
   ## col.pos        <-    paste( col.pos , collapse=',' )

##
## character string of file name for unix read with cut function
##
   fn        <-    c("/home/philipsmith/mydata.csv")

##
## create a character vector of the unix command
##

unix.cmd <- paste( "cut -d, -f" , col.pos , " " , fn , sep ='' )
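The commented-out paste(..., collapse = ',') line is what extends this to several columns. A sketch of that as a hypothetical helper, make.cut.cmd(): it sorts the positions because cut writes fields in their file order whatever order -f lists them in, it stops with an error rather than building an invalid -f0 (which is what nomatch = 0 yields for a misspelled name), and it shell-quotes the file name. The extra column names are made up, and this does nothing about quoted commas inside fields.

```r
# Hypothetical helper: build a `cut` command line for the named columns
make.cut.cmd <- function(file, header, cols) {
  pos <- match(cols, header)               # NA for names not in the header
  if (anyNA(pos))
    stop("column(s) not found: ", paste(cols[is.na(pos)], collapse = ", "))
  # cut emits fields in file order, so sort the positions to match
  paste0("cut -d, -f", paste(sort(pos), collapse = ","), " ",
         shQuote(file, type = "sh"))
}

cmd <- make.cut.cmd("/home/philipsmith/mydata.csv",
                    header = c("ID", "ESTIAP07", "AGE"),
                    cols   = c("AGE", "ESTIAP07"))
cmd
# then, for example: gnu.dat <- read.csv(pipe(cmd))
```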


##
## read the designated columns, only, from the whole file
## using pipe() and the unix command cut
##
   gnu.dat        <-    read.csv( pipe ( description=unix.cmd ) )
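A different technique, named plainly as an alternative to cut rather than a fix for it: read.csv() skips any column whose colClasses entry is "NULL", and it parses quoted fields properly. It still reads every line of the file, but the skipped columns are never stored, so memory use should be far lower than reading everything. The helper name read.selected.columns() and the demo file are hypothetical.

```r
# Hypothetical helper: read only the named columns of a CSV file
read.selected.columns <- function(file, cols) {
  # Column names from the first line only, as in STEP 1 above
  header <- scan(file, what = character(0), nlines = 1, sep = ",",
                 quiet = TRUE)
  # NA = let read.csv() pick the type; "NULL" = skip the column entirely
  classes <- ifelse(header %in% cols, NA_character_, "NULL")
  read.csv(file, colClasses = classes)
}

# Demo on a small temporary file with a quoted comma in a skipped column
tmp <- tempfile(fileext = ".csv")
writeLines(c('ID,NAME,ESTIAP07',
             '1,"Smith, Jr.",12',
             '2,Jones,7'), tmp)
d <- read.selected.columns(tmp, "ESTIAP07")
table(d$ESTIAP07)
unlink(tmp)
```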

*## STEP 3. compare results by tabulating a variable from the whole file with the file obtained in (2)*

##
## tabulate the designated variable from the whole file
##
   table( dat$ESTIAP07 )

##
## tabulate the designated variable from the file
## that has the designated columns, only
##
   table( gnu.dat$ESTIAP07 )

> table( dat$ESTIAP07 )

  1   2   4   5   6   7   8  10  11  12  13  14  16  17  18  19  20  22  24  25
340 278 304 319 334 295 405 342 519 474 413 476 511 322 517 393 364 377 447 425
 27  28  29  30  31  34  35  36  37  38  40  41  44  46  47  49  50  51  52  53
462 382 368 502 385 494 454 497 484 385 360 419 355 466 461 369 372 431 384 331
 54  55  56  57  58  59  60  61  62  63  64  65  66  68  69  72  73  74  75  76
478 468 348 323 363 287 322 364 317 363 423 337 409 312 370 360 348 309 244 300
 77  79  80 773
307 454 445 340
>
> ##
> ## tabulate the designated variable from the file
> ## that has the designated columns, only
> ##
> table( gnu.dat$ESTIAP07 )

  1   2   3   4   5   6   7   8  10  11  12  13  14  16  17  18  19  20  22  24
342 291   1 308 319 334 295 405 341 518 471 413 476 511 322 517 393 363 377 446
 25  27  28  29  30  31  34  35  36  37  38  40  41  44  46  47  49  50  51  52
425 461 382 368 502 385 494 454 496 483 385 360 419 354 466 461 369 371 431 384
 53  54  55  56  57  58  59  60  61  62  63  64  65  66  68  69  72  73  74  75
331 478 467 348 322 363 287 320 364 317 363 423 337 408 312 368 360 347 309 243
 76  77  79  80 157 773
300 307 454 445   1 340
> ?pipe
>
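Comparing two long printed tables by eye is error-prone. A hedged sketch that lists only the values whose counts differ; compare.tables() is a hypothetical name, rows come out in order of first appearance, and NA values are left out just as table() leaves them out by default. With the objects above it would be called as compare.tables(dat$ESTIAP07, gnu.dat$ESTIAP07).

```r
# Hypothetical helper: values whose frequencies differ between x and y
compare.tables <- function(x, y) {
  # Common set of observed values; NA dropped, as table() does by default
  lev <- unique(c(as.character(x), as.character(y)))
  lev <- lev[!is.na(lev)]
  # Tabulate both on the same levels so the counts line up position by position
  tx <- as.vector(table(factor(x, levels = lev)))
  ty <- as.vector(table(factor(y, levels = lev)))
  differ <- tx != ty
  data.frame(value = lev[differ], whole = tx[differ], cut = ty[differ],
             stringsAsFactors = FALSE)
}

# Small made-up example
x <- c(1, 1, 2, 3, NA)
y <- c(1, 2, 2, 3, 157)
compare.tables(x, y)
```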

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
