from:"Alexandra Catena"

[R] How to download and unzip data in a loop

2015-02-04 Thread Alexandra Catena

Hi All,

I need to loop through and download the past 10 years of met data to a
temporary directory.  I then need to unzip it and place it into another
directory.


year = (2005:2015)

for (i in year)
  tmpdir = tempdir()
  file[i] = file.path(tmpdir, sprintf('724927-23285-%4i.gz', i))
  url = sprintf('ftp://ftp.ncdc.noaa.gov/pub/data/noaa/%4i/724927-23285-%4i.gz', i, i)
  #file = basename(url)
  download.file(url, file[i])
  files = dir(tmpdir, '*.gz', full.names=FALSE)
  read.table(gzfile('files'))



'file' ends up with 2015 elements, with "/tmp/RtmpKvB4Wz/724927-23285-2015.gz"
at index 2015, and 'files' returns 724927-23285-2015.gz.  However, when I try
to unzip the gz file using the last line, it says it cannot open the
connection and the probable reason is that there is no such file or
directory.



Thanks,
Alexandra

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to download and unzip data in a loop

2015-02-05 Thread Alexandra Catena

Thank you guys for the response.

I'm trying to download the last ten years of meteorology data from a
weather station in Livermore from the URL:
ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2015/724927-23285-2015.gz
The Livermore station code is 724927-23285.  If I wanted to download data
from 2005, the URL would be:
ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2005/724927-23285-2005.gz

Once I download the data into a temporary file, I want to unzip it and
store it into another directory where I can access it.

Also, why are there 2015 indices instead of just 10 when I'm only looping
through 2005:2015?

Thanks,
Alexandra

On Thu, Feb 5, 2015 at 3:11 AM, Jon Skoien 
wrote:

> In addition to following Jim's suggestion, you should probably also use
> full.names = TRUE, otherwise you will try to open a connection to files in
> your current directory, not in tmpdir.
> Another thing is that the unzipped files appear irregular with respect to
> columns, so read.table might not work too well.
>
> Jon
>
>
> On 2/5/2015 11:30 AM, jim holtman wrote:
>
>> try taking the quotes off of 'files'
>>
>>
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
> --
> Jon Olav Skøien
> Joint Research Centre - European Commission
> Institute for Environment and Sustainability (IES)
> Climate Risk Management Unit
>
> Via Fermi 2749, TP 100-01,  I-21027 Ispra (VA), ITALY
>
> jon.sko...@jrc.ec.europa.eu
> Tel:  +39 0332 789205
>
> Disclaimer: Views expressed in this email are those of the individual and
> do not necessarily represent official views of the European Commission.
>
>
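
Putting Jim's and Jon's suggestions together, the loop might look like the
sketch below.  It is not from the thread: noaa_url and fetch_year are
hypothetical helper names, and the actual download is left commented out
because it needs network access.  The last lines illustrate why 'file' had
2015 entries: assigning at index 2015 extends a vector to length 2015, so
indexing by position rather than by the year value keeps one entry per year.

```r
station <- "724927-23285"   # Livermore station code from the post
years   <- 2005:2015

# Hypothetical helper: URL of one station-year file on the NOAA FTP server.
noaa_url <- function(year, station) {
  sprintf("ftp://ftp.ncdc.noaa.gov/pub/data/noaa/%d/%s-%d.gz",
          year, station, year)
}

# Hypothetical helper: download one year into destdir and return its lines.
fetch_year <- function(year, station, destdir = tempdir()) {
  url  <- noaa_url(year, station)
  dest <- file.path(destdir, basename(url))   # full path, not just the name
  download.file(url, dest, mode = "wb")       # binary mode for .gz files
  con <- gzfile(dest, "r")                    # decompresses while reading
  on.exit(close(con))
  readLines(con)
}

# Needs network access, so it is commented out here:
# met <- lapply(years, fetch_year, station = station)
# names(met) <- years

# Why 'file' had 2015 entries: assigning at index 2015 pads the vector.
x <- character(0)
x[2015] <- "724927-23285-2015.gz"
length(x)                   # 2015
length(seq_along(years))    # 11, one slot per year
```

Returning the lines from a function and collecting them with lapply avoids
the shared 'file' and 'files' variables altogether.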


[R] How to unzip a .gz file

2015-02-10 Thread Alexandra Catena

Hello,

Can someone help me with unzipping a .gz file.  I used:

readLines(gzfile('/home/file.gz'))


I also found that I could use gunzip, but after trying to install it, it
says:

 "package ‘gunzip’ is not available (for R version 2.15.1)"


Thanks,
Alexandra
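
A minimal sketch of unzipping with base R only, assuming the goal is a plain
text copy of the file in another location.  The small .gz file written at the
top is a stand-in for the real download, and gunzip_to is a hypothetical
helper name.  As far as I know there is no CRAN package called 'gunzip'; a
gunzip() function is provided by the R.utils package instead.

```r
# Create a small gzip file to work with (stand-in for the real data).
src <- tempfile(fileext = ".gz")
con <- gzfile(src, "w")
writeLines(c("line one", "line two"), con)
close(con)

# Hypothetical helper: read a .gz file and write an uncompressed copy.
gunzip_to <- function(gz_path, out_path) {
  con <- gzfile(gz_path, "r")        # gzfile() decompresses while reading
  on.exit(close(con))
  writeLines(readLines(con), out_path)
  invisible(out_path)
}

out <- file.path(tempdir(), "unzipped.txt")
gunzip_to(src, out)
readLines(out)
```

This reads the whole file into memory, which should be fine for yearly
station files but may not suit very large archives.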


[R] Help with looping

2015-02-17 Thread Alexandra Catena

Hi,

I need help with a for loop and printing data.  I want to loop through a
few years and print the data from each year stacked on top of each other.
For example,

for (i in 2000:2003){
#script for downloading each year
Data = readLines(sprintf('file/%4i', i))
}

It only prints out the data from the last year.  Also, I tried

Data[i] = readLines(sprintf('file/%4i', i))

but it says:

"number of items to replace is not a multiple of replacement length"

How do I get it to not replace each year of data? I have R version 2.15.1

Thanks,
Alexandra
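
One common way to keep every year is to store each year's lines in its own
list element and combine them after the loop.  The sketch below assumes one
text file per year; the temporary files it writes are stand-ins for the real
downloads, and the names met_dir and pieces are made up for illustration.

```r
years <- 2000:2003

# Stand-in data: one small text file per year in a temporary directory.
met_dir <- tempfile("met")
dir.create(met_dir)
for (y in years) {
  writeLines(sprintf("%d record %d", y, 1:2),
             file.path(met_dir, as.character(y)))
}

# One list element per year, indexed by position rather than by year value.
pieces <- vector("list", length(years))
for (k in seq_along(years)) {
  pieces[[k]] <- readLines(file.path(met_dir, as.character(years[k])))
}

# Stack the years on top of each other in loop order.
Data <- unlist(pieces)
length(Data)
```

Data = ... inside the loop overwrites the previous year on every pass, and
Data[i] = ... tries to squeeze many lines into a single vector element, which
is what the "not a multiple of replacement length" message is about.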


[R] Replacing 9999 and 999 values with NA

2015-02-20 Thread Alexandra Catena

Hello All,

I have a data frame of two columns for wind.  The first column is for wind
speed and the second wind direction.  I'm trying to replace the 9999 values
in the first column and the 999 values in the second column with NA.  I
tried to use the function ltdl.fix.df but it doesn't seem to do anything.

> ltdl.fix.df(windMV, zero2na = FALSE, coded = 999)

  n = 9432 by p = 4 matrix checked, 0 NA(s) present

  0 factor variable(s) present

  5675 value(s) coded 999 set to NA

  0 -ve value(s) set to +ve half the negative value


I have R version 3.1.1

Thanks,
Alexandra
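
A minimal sketch with plain logical indexing, assuming the columns are named
ws and wd (the post does not give the column names) and using made-up values.
Note also that R functions generally return a modified copy rather than
changing their argument, so a call whose result is not assigned back to a
variable leaves the original data frame untouched.

```r
# Made-up example data; ws and wd are assumed column names.
windMV <- data.frame(ws = c(36, 9999, 72, 46),
                     wd = c(290, 999, 280, 290))

# Replace the missing-value codes column by column.
windMV$ws[windMV$ws == 9999] <- NA
windMV$wd[windMV$wd == 999]  <- NA

windMV
colSums(is.na(windMV))   # number of NAs per column
```

Treating the two columns separately keeps a genuine reading of 999 in one
column from being wiped out by the code meant for the other.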


[R] Error with using windRose function from the open air package

2015-02-23 Thread Alexandra Catena

Hello All,

I have a data frame called windSFO of four columns, wind speed, wind
direction, station number, and date (yyyymmdd).  I downloaded the gz
data from a site online and then unzipped it using readLines. I then
concatenated these four columns from the unzipped data into a
dataframe using cbind.

windSFO = data.frame(cbind(ws,wd,stn,yearSite))

Here are the first four rows as an example:

   ws  wd          stn yearSite
1  36 290 724940-23234 20090101
2  77 280 724940-23234 20090101
3  72 290 724940-23234 20090101
4  46 290 724940-23234 20090101


I'm trying to make a wind rose using the windRose function but I keep
getting an error that I don't understand. I type in:

windRose(windSFO,ws='ws',wd='wd')

I then get the error:

Error in Summary.factor(c(27L, 35L, 34L, 29L, 28L, 25L, 25L, 24L, 24L,  :
  max not meaningful for factors
In addition: Warning messages:
1: In Ops.factor(mydata[[wd]], 10) : %% not meaningful for factors
2: In Ops.factor(mydata[[wd]], angle) : / not meaningful for factors

Can anyone tell me what this means/what I'm doing wrong?

Also, I have R version 3.1.1

Thank you!
Alexandra
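
The messages say that ws and wd are factors rather than numbers.  A likely
cause is the cbind() call: binding numeric and character vectors gives a
character matrix, and data.frame() in R 3.1.1 turns character columns into
factors by default.  The sketch below uses made-up vectors shaped like the
rows shown above and builds the data frame without cbind(); it does not call
windRose itself, so openair is not needed to run it.

```r
# Made-up stand-ins for the vectors extracted from the readLines() output.
ws       <- c("36", "77", "72", "46")
wd       <- c("290", "280", "290", "290")
stn      <- rep("724940-23234", 4)
yearSite <- rep("20090101", 4)

# Build the data frame directly and convert the measurements to numbers.
windSFO <- data.frame(ws = as.numeric(ws),
                      wd = as.numeric(wd),
                      stn = stn,
                      yearSite = yearSite,
                      stringsAsFactors = FALSE)
str(windSFO)

# If a column is already a factor, go through character first;
# as.numeric() on a factor returns the internal level codes instead.
f <- factor(c("290", "280"))
as.numeric(as.character(f))

# With numeric columns the original call should get past this error:
# windRose(windSFO, ws = 'ws', wd = 'wd')
```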


[R] Plotting using tapply function output

2015-03-30 Thread Alexandra Catena

Hello,

I am trying to plot the hourly standard deviation of wind speeds from
13 different measured locations over many years. I imported the data
using readLines and into a dataframe called finalData. Using tapply, I
determined the standard deviation of the windspeed (ws) for each hour
(hour) from every location (stn) using this command line:

statHour = tapply(finalData$ws,list(finalData$stn,finalData$hour),sd)

I want to plot the standard deviation for each hour of the day, with
hours as the x-axis and the standard deviation for the y-axis, and
each station as a different color.  I've managed to get a boxplot of
this, but ideally, I'd like a scatter plot to determine the variations
between each instrument throughout the day.  The boxplot command is
this:

boxplot(statHour, names=colnames(statHour), xlab='Hour of the Day',
        ylab='Standard Deviation of Wind Speed')

I also tried to make a dataframe of the tapply output but it ends up
using the hours as the column names instead of putting it into the
dataframe.  Please help!!

I have R version 3.1.1

Thanks a lot,
Alexandra
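
A sketch of one way to plot this, assuming statHour has stations as rows and
hours as columns, which is what the tapply call above produces.  The
finalData frame here is simulated, and the names cols and statLong are made
up.  matplot() draws one series per column, so the matrix is transposed to
get one series per station; as.data.frame(as.table(...)) gives a long data
frame with the hour as a regular column.

```r
# Simulated stand-in for finalData: 3 stations, 24 hours, 10 values each.
set.seed(1)
finalData <- data.frame(stn  = rep(c("A", "B", "C"), each = 240),
                        hour = rep(0:23, times = 30),
                        ws   = runif(720, 0, 20))

statHour <- tapply(finalData$ws, list(finalData$stn, finalData$hour), sd)

# Long format: one row per station and hour instead of hours as column names.
statLong <- as.data.frame(as.table(statHour))
names(statLong) <- c("stn", "hour", "sd")
statLong$hour <- as.numeric(as.character(statLong$hour))

# One coloured series per station, hours on the x-axis.
hours <- as.numeric(colnames(statHour))
cols  <- seq_len(nrow(statHour))
matplot(hours, t(statHour), type = "b", pch = 1, lty = 1, col = cols,
        xlab = "Hour of the Day",
        ylab = "Standard Deviation of Wind Speed")
legend("topright", legend = rownames(statHour), col = cols, pch = 1, lty = 1)
```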


[R] Finding values in a dataframe at a specified hour

2015-04-10 Thread Alexandra Catena

Hello,

I have a large dataframe (windHW) of wind speeds (ws) at each hour
from many days over a set of years.  Some of these values are
obviously wrong (600 m/s) and I want to get rid of all the values that
are larger than 5*sigma for each hour.  The 5*sigma (variable name
sigma5) values are located in different dataframes for each season,
with each dataframe titled as a season.  For example, in the
dataframe, spring, the 5*sigma value is 79.6 m/s for hour 1.

So my question is as follows: how can I find all the wind speed values
in the dataframe windHW at a specific hour that are higher than the
5*sigma value for that hour?
For example, I would like to find if any of the wind speed values at
hour 1 are higher than 79.6 m/s, and if so, then replace that value
with NA.

I have something like this but I can't seem to figure out how to get
it for specific hours:

windHW$ws[windHW$ws>=spring$sigma5] <- NA

I imported the data using readLines and into the dataframe windHW.  I
also have R version 3.1.1

Any help would be appreciated!

Thanks,
Alexandra
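
The comparison windHW$ws >= spring$sigma5 lines the two vectors up by
position (recycling the 24 thresholds), not by hour.  A sketch of a
vectorised alternative, assuming both data frames have a column named hour
with values 0:23 and using made-up numbers: match() looks up the threshold
row for each observation's hour.

```r
# Made-up stand-ins; both frames are assumed to have an 'hour' column.
spring <- data.frame(hour = 0:23, sigma5 = rep(79.6, 24))
windHW <- data.frame(ws   = c(5, 600, 12, 80, NA),
                     hour = c(1, 1, 2, 23, 0))

# Threshold that applies to each row of windHW, looked up by hour.
threshold <- spring$sigma5[match(windHW$hour, spring$hour)]

# which() skips rows where ws is already NA.
windHW$ws[which(windHW$ws >= threshold)] <- NA
windHW
```

The same lookup could be repeated per season by subsetting windHW to the
relevant dates first; how seasons are identified is not shown in the post.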


Re: [R] Finding values in a dataframe at a specified hour

2015-04-10 Thread Alexandra Catena

Update:

I have this so far.  * The first column of windHW is the wind speed.
The 5th column of the dataframe, spring, is the 5*sigma value of every
hour.  hourRow gives out all the rows of wind speed at a given hour.

for (i in 0:23){
  hourRow = which(windHW$hour==i,arr.ind=TRUE)
  for (h in hourRow){
    if (windHW[h,1]>=spring[spring$hour==i,5]){
      windHW[h,1]<-NA
    }
  }
}

This then gives the error: Error in if (windHW[h, 1] >=
spring[spring$hour == i, 5]) { : argument is of length zero

*Note: The dataframe for each of the seasons have 24 rows
corresponding to each hour of the day 0:23.

Thanks,
Alexandra



Re: [R] Finding values in a dataframe at a specified hour

2015-04-10 Thread Alexandra Catena

Hi Jim,

Thanks for the response, but unfortunately it results in the same
error.  I think it is something wrong with the if statement.  I tried
it out manually for the first row and hour that it's testing and
indeed, the wind speed is not higher than the 5*sigma value.  Since it
is not higher than the 5*sigma value, I would think it would just pass
to the next loop, yet it doesn't. I will keep trying!

Thanks,
Alexandra

On Fri, Apr 10, 2015 at 3:43 PM, Jim Lemon  wrote:
> Hi Alexandra,
> The error probably comes from the first iteration of i in 0:23. As indexing
> in R begins at 1, there is no element 0. Try using:
>
> for(i in 1:24) {
> ...
>
> and see what happens.
>
> Jim
>
>
>
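
"argument is of length zero" means the condition inside if() was empty, not
FALSE.  One plausible cause, and this is an assumption since the thread never
shows str(spring), is that spring has no column called hour: spring$hour is
then NULL, the row selection returns nothing, and the comparison has length
zero.  The sketch below reproduces that situation with a made-up data frame
whose hour column is named hr.

```r
# Hypothetical: the hour column is called 'hr', so spring$hour is NULL.
spring <- data.frame(hr = 0:23, sigma5 = rep(79.6, 24))
i <- 0

limit <- spring[spring$hour == i, 2]
length(limit)    # 0: no threshold was found for this hour

# The same kind of if() as in the loop, with the error message captured.
msg <- tryCatch(if (5 >= limit) "too high" else "ok",
                error = function(e) conditionMessage(e))
msg
```

Checking names(spring) and length(spring[spring$hour == 0, 5]) before the
loop would show whether this is what happens with the real data.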
