Problem in defining multidimensional array matrix and regression

2017-11-19 Thread shalu . ashu50
Hi, All,

I have 6 variables in a CSV file. One is rainfall (the dependent variable, on 
the y-axis) and the others are predictors (on the x-axis). I want to do 
multiple regression and create a correlation matrix between rainfall (y) and 
the predictors (x; n1=5). So I want to read rainfall as a separate variable and 
the others as separate columns, so that I can apply the algorithm. However, I 
am not able to build a proper matrix from them. 

Here are my data and code.
Please suggest how to do this.
I am new to Python.

RF       P1     P2      P3     P4      P5
120.235  0.234  -0.012  0.145  21.023  0.233
200.14   0.512  -0.021  0.214  22.21   0.332
185.362  0.147  -0.32   0.136  24.65   0.423
201.895  0.002  -0.12   0.217  30.25   0.325
165.235  0.256  0.001   0.22   31.245  0.552
198.236  0.012  -0.362  0.215  32.25   0.333
350.263  0.98   -0.85   0.321  38.412  0.411
145.25   0.046  -0.36   0.147  39.256  0.872
198.654  0.65   -0.45   0.224  40.235  0.652
245.214  0.47   -0.325  0.311  26.356  0.632
214.02   0.18   -0.012  0.242  22.01   0.745
147.256  0.652  -0.785  0.311  18.256  0.924

import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
import csv

with open("pcp1.csv", "r") as csvfile:
    readCSV = csv.reader(csvfile)

    rainfall = []
    csvFileList = []

    for row in readCSV:
        if len(row) != 0:   # skip blank lines before indexing into the row
            Rain = row[0]
            rainfall.append(Rain)
            csvFileList = csvFileList + [row]

print(csvFileList)
print(rainfall)

Any suggestions would be appreciated.
Thanks

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Problem in defining multidimensional array matrix and regression

2017-11-19 Thread shalu . ashu50
Hello Peter,

Many thanks for your suggestion. 
I am now using pandas and have already read the data that way, but now I need 
to build a multi-dimensional array that holds all the predictor variables (5 in 
this case) together on the x side, so that I can perform multiple regression 
analysis. 

I do not understand how to bring all the variables onto one axis (e.g. the 
x-axis).

Thanks
Vishal

On Sunday, 19 November 2017 22:32:06 UTC+5:30, Peter Otten  wrote:
> shalu.ash...@gmail.com wrote:
> 
> > Hi, All,
> > 
> > I have 6 variables in CSV file. One is rainfall (dependent, at y-axis) and
> > others are predictors (at x). I want to do multiple regression and create
> > a correlation matrix between rainfall (y) and predictors (x; n1=5). Thus I
> > want to read rainfall as a separate variable and others in separate
> > columns, so I can apply the algo. However, I am not able to make a proper
> > matrix for them.
> > 
> > Here are my data and codes?
> > Please suggest me for the same.
> > I am new to Python.
> > 
> > RF       P1     P2      P3     P4      P5
> > 120.235  0.234  -0.012  0.145  21.023  0.233
> > 200.14   0.512  -0.021  0.214  22.21   0.332
> > 185.362  0.147  -0.32   0.136  24.65   0.423
> > 201.895  0.002  -0.12   0.217  30.25   0.325
> > 165.235  0.256  0.001   0.22   31.245  0.552
> > 198.236  0.012  -0.362  0.215  32.25   0.333
> > 350.263  0.98   -0.85   0.321  38.412  0.411
> > 145.25   0.046  -0.36   0.147  39.256  0.872
> > 198.654  0.65   -0.45   0.224  40.235  0.652
> > 245.214  0.47   -0.325  0.311  26.356  0.632
> > 214.02   0.18   -0.012  0.242  22.01   0.745
> > 147.256  0.652  -0.785  0.311  18.256  0.924
> > 
> > import numpy as np
> > import statsmodels.api as sm
> > import statsmodels.formula.api as smf
> > import csv
> > 
> > with open("pcp1.csv", "r") as csvfile:
> >     readCSV = csv.reader(csvfile)
> > 
> >     rainfall = []
> >     csvFileList = []
> > 
> >     for row in readCSV:
> >         if len(row) != 0:   # skip blank lines before indexing into the row
> >             Rain = row[0]
> >             rainfall.append(Rain)
> >             csvFileList = csvFileList + [row]
> > 
> > print(csvFileList)
> > print(rainfall)
> 
> You are not the first to read tabular data from a file; therefore numpy (and 
> pandas) offer high-level functions to do just that. Once you have the complete 
> table, extracting a specific column is easy. For instance:
> 
> $ cat rainfall.txt 
> RF       P1     P2      P3     P4      P5
> 120.235  0.234  -0.012  0.145  21.023  0.233
> 200.14   0.512  -0.021  0.214  22.21   0.332
> 185.362  0.147  -0.32   0.136  24.65   0.423
> 201.895  0.002  -0.12   0.217  30.25   0.325
> 165.235  0.256  0.001   0.22   31.245  0.552
> 198.236  0.012  -0.362  0.215  32.25   0.333
> 350.263  0.98   -0.85   0.321  38.412  0.411
> 145.25   0.046  -0.36   0.147  39.256  0.872
> 198.654  0.65   -0.45   0.224  40.235  0.652
> 245.214  0.47   -0.325  0.311  26.356  0.632
> 214.02   0.18   -0.012  0.242  22.01   0.745
> 147.256  0.652  -0.785  0.311  18.256  0.924
> $ python3
> Python 3.4.3 (default, Nov 17 2016, 01:08:31) 
> [GCC 4.8.4] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import numpy
> >>> rf = numpy.genfromtxt("rainfall.txt", names=True)
> >>> rf["RF"]
> array([ 120.235,  200.14 ,  185.362,  201.895,  165.235,  198.236,
>         350.263,  145.25 ,  198.654,  245.214,  214.02 ,  147.256])
> >>> rf["P3"]
> array([ 0.145,  0.214,  0.136,  0.217,  0.22 ,  0.215,  0.321,  0.147,
>         0.224,  0.311,  0.242,  0.311])
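
Building on the genfromtxt example quoted above, here is a minimal sketch of 
how the five predictor columns could be placed side by side in one 2-D array 
and regressed against RF. It assumes the table from the first post is 
whitespace-separated; the names DATA, table, X, y, A and coef are illustrative 
only, and plain numpy least squares stands in for whatever regression routine 
is finally chosen.

```python
import io

import numpy as np

# The table from the original post, inlined so the sketch is self-contained.
DATA = """RF P1 P2 P3 P4 P5
120.235 0.234 -0.012 0.145 21.023 0.233
200.14 0.512 -0.021 0.214 22.21 0.332
185.362 0.147 -0.32 0.136 24.65 0.423
201.895 0.002 -0.12 0.217 30.25 0.325
165.235 0.256 0.001 0.22 31.245 0.552
198.236 0.012 -0.362 0.215 32.25 0.333
350.263 0.98 -0.85 0.321 38.412 0.411
145.25 0.046 -0.36 0.147 39.256 0.872
198.654 0.65 -0.45 0.224 40.235 0.652
245.214 0.47 -0.325 0.311 26.356 0.632
214.02 0.18 -0.012 0.242 22.01 0.745
147.256 0.652 -0.785 0.311 18.256 0.924
"""

# skip_header=1 drops the name row and returns a plain (12, 6) float array.
table = np.genfromtxt(io.StringIO(DATA), skip_header=1)

y = table[:, 0]    # rainfall (RF), shape (12,)
X = table[:, 1:]   # predictors P1..P5 side by side, shape (12, 5)

# Correlation matrix of all six variables; row 0 holds corr(RF, each column).
corr = np.corrcoef(table, rowvar=False)

# Ordinary least squares: prepend a column of ones for the intercept.
A = np.column_stack([np.ones(len(y)), X])
coef, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)

print("corr(RF, P1..P5):", corr[0, 1:])
print("intercept and slopes:", coef)
```

The same X and y could also be handed to statsmodels, for example 
sm.OLS(y, sm.add_constant(X)).fit() after import statsmodels.api as sm, if a 
full regression summary is wanted.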



How to make code space variant with gridded data (lat/long)

2018-03-28 Thread shalu . ashu50
Hello all,

This code is written for multivariate (multiple independent variables 
x1,x2,x3..xn and a dependent variable y) time series analysis using logistic 
regression (correlation and prediction). 

# Import libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Import dataset
dataset = pd.read_csv('precipitation.csv')
x = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

# Split into training set and test set
# (train_test_split lives in sklearn.model_selection;
#  sklearn.cross_validation is deprecated)
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)

# Feature scaling
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
x_train = sc_X.fit_transform(x_train)
x_test = sc_X.transform(x_test)

# Train the logistic model
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()
classifier.fit(x_train, y_train)

# Predict the test set result
y_pred = classifier.predict(x_test)


This code works on a dataset for a single point location (one lat/long). 
Suppose I have gridded datasets (with many points/locations, lat/long, varying 
in space and time): how would I apply this code to them? I am not an expert in 
Python. Could somebody help me with this, or give me an example or an idea, so 
that I can adapt this code to my requirement?

Thank you in advance. 

Vishu
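
One possible way to go from a single point to a grid is to keep the per-point 
code as a function and call it once per grid cell. The sketch below assumes 
the gridded variables are numpy arrays of shape (time, lat, lon); the arrays 
x1, x2, y and the helper fit_cell are hypothetical, and a plain least-squares 
fit stands in for the scaler plus LogisticRegression steps of the post so that 
the example needs nothing beyond numpy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical gridded data: 2 predictors and 1 target on a (time, lat, lon) grid.
n_time, n_lat, n_lon = 50, 3, 4
x1 = rng.normal(size=(n_time, n_lat, n_lon))
x2 = rng.normal(size=(n_time, n_lat, n_lon))
y = 2.0 * x1 - 1.0 * x2 + 0.1 * rng.normal(size=(n_time, n_lat, n_lon))


def fit_cell(X, target):
    """Fit one model to a single cell's time series (least-squares stand-in)."""
    A = np.column_stack([np.ones(len(target)), X])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return coef


# One coefficient vector (intercept, b1, b2) per grid cell.
coefs = np.empty((n_lat, n_lon, 3))
for i in range(n_lat):
    for j in range(n_lon):
        # Time series of both predictors at this cell, shape (time, 2).
        X_cell = np.column_stack([x1[:, i, j], x2[:, i, j]])
        coefs[i, j] = fit_cell(X_cell, y[:, i, j])

print(coefs.shape)
```

The body of fit_cell is where the scaling, train/test split and classifier 
from the single-point script would go; the two loops and the [:, i, j] 
indexing are the only grid-specific parts.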


How to apply LR over gridded time series datasets ?

2018-03-28 Thread shalu . ashu50
Hello all,

This code is written for multivariate (multiple independent variables 
x1,x2,x3..xn and a dependent variable y) time series analysis using logistic 
regression (correlation and prediction). 

# Import libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Import dataset
dataset = pd.read_csv('precipitation.csv')
x = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

# Split into training set and test set
# (train_test_split lives in sklearn.model_selection;
#  sklearn.cross_validation is deprecated)
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)

# Feature scaling
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
x_train = sc_X.fit_transform(x_train)
x_test = sc_X.transform(x_test)

# Train the logistic model
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()
classifier.fit(x_train, y_train)

# Predict the test set result
y_pred = classifier.predict(x_test)


This code works on a dataset for a single point location (one lat/long). 
Suppose I have gridded datasets (with many points/locations, lat/long, varying 
in space and time): how would I apply this code to them? I am not an expert in 
Python. Could somebody help me with this, or give me an example or an idea, so 
that I can adapt this code to my requirement?

Thank you in advance. 

Vishu


Installing NETCDF4 in windows using python 3.4

2018-04-12 Thread shalu . ashu50
Hi All,

I have downloaded the netCDF4 module from https://pypi.python.org/pypi/netCDF4, 
namely netCDF4-1.3.1-cp34-cp34m-win_amd64.whl

I installed it with pip install netCDF4-1.3.1-cp34-cp34m-win_amd64.whl

from the command prompt in Spyder. It installed successfully. 

C:\python3>pip install netCDF4-1.3.1-cp34-cp34m-win_amd64.whl
Processing c:\python3\netcdf4-1.3.1-cp34-cp34m-win_amd64.whl
Requirement already satisfied: numpy>=1.7 in 
c:\python3\winpython-64bit-3.4.4.5qt5\python-3.4.4.amd64\lib\site-packages 
(from netCDF4==1.3.1)
Installing collected packages: netCDF4
  Found existing installation: netCDF4 1.3.2
Uninstalling netCDF4-1.3.2:
  Successfully uninstalled netCDF4-1.3.2
Successfully installed netCDF4-1.3.1


But when I try to import it, I get an error:

import netCDF4 as nc4
Traceback (most recent call last):

  File "", line 1, in <module>
    import netCDF4 as nc4

  File "C:\python3\WinPython-64bit-3.4.4.5Qt5\python-3.4.4.amd64\lib\site-packages\netCDF4\__init__.py", line 3, in <module>
    from ._netCDF4 import *

  File "netCDF4\_netCDF4.pyx", line 2988, in init netCDF4._netCDF4

AttributeError: type object 'netCDF4._netCDF4.Dimension' has no attribute '__reduce_cython__'

How can I fix it? Suggestions would be appreciated.


How to save xarray data to csv

2018-04-16 Thread shalu . ashu50
Hello All,

I have used xarray to merge several netcdf files into one file and then I 
subset the data of my point of interest using lat/long. Now I want to save this 
array data (time,lat,long) into csv file but I am getting an error with my code:

dsmerged = xarray.open_mfdataset('F:/NTU_PDF__Work/1_Codes/1_Python/testing/netcdf/*.nc')

# save to netcdf file
dsmerged.to_netcdf('combine.nc')

# Extraction of data as per given coordinates
# values (lat/long with their upper and lower bound)

fin = netCDF4.Dataset("combine.nc", "r")
# print all the variables in this file
print(fin.variables)
# print the dimensions of the variable
print(fin.variables["clt"].shape)
#Out: (20075, 90, 144)
# retrieve time step from the variable
clt0 = fin.variables["clt"]
# extract a subset of the full dataset contained in the file
clt0sub = clt0[10:30, 20:30]
# xarray to numpy array
clt1 = numpy.array(clt0sub)
# saving data into csv file
with open('combine11.csv', 'wb') as f:
    writer = csv.writer(f, delimiter=',')
    writer.writerows(enumerate(clt1))

I am getting this error: TypeError: a bytes-like object is required, not 'str'. 
When I remove the "b", the error disappears, but the data are saved in the 
wrong format, for example:

0,"[[  99.93312836   99.99977112  100. ...,   98.53624725
99.98111725   99.9799881 ]
 [  99.95301056   99.99489594   99.8474 ...,   99.999870399.99951172
99.97265625]
 [  99.67852783   99.96372986   99.9237 ...,   99.96694946   99.9842453
99.96450806]
 ..., 
 [  78.29571533   45.00857544   24.39345932 ...,   90.86527252
84.48490143   62.53995895]
 [  42.03381348   46.169670122.71044922 ...,   80.88492584
71.15007019   50.95384216]
 [  34.75331879   49.99913025   17.66173935 ...,   57.12231827
62.56645584   40.6435585 ]]"

1,"[[ 100.  100.  100. ...,   99.93876648
99.98928833  100.]
 [  99.9773941   100.   99.4933548  ...,   97.93031311
97.36623383   97.07974243]
 [  98.593490699.7242   99.44548035 ...,   79.59191132
85.77541351   94.40919495]
 ..., 

suggestions would be appreciated
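
For context, in Python 3 csv.writer expects a file opened in text mode, which 
is why 'wb' raises the TypeError; the csv documentation recommends 
open(..., 'w', newline=''). The sketch below assumes the goal is one CSV row 
per time step with that step's lat/lon grid flattened into columns. The small 
array clt1 and the temporary path are hypothetical stand-ins for the real 
data.

```python
import csv
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the extracted subset: (time, lat, lon) = (4, 2, 3).
clt1 = np.arange(24, dtype=float).reshape(4, 2, 3)

# Flatten each time step's lat/lon grid into one row: (4, 2, 3) -> (4, 6).
rows = clt1.reshape(clt1.shape[0], -1)

path = os.path.join(tempfile.mkdtemp(), "combine11.csv")

# Text mode plus newline="" is what csv.writer expects in Python 3.
with open(path, "w", newline="") as f:
    writer = csv.writer(f, delimiter=",")
    for t, values in enumerate(rows):
        # tolist() turns the numpy row into plain Python floats, one per cell.
        writer.writerow([t, *values.tolist()])

# Read the file back to confirm that every number landed in its own cell.
with open(path, newline="") as f:
    written = list(csv.reader(f))

print(written[0])
```

Writing enumerate(clt1) directly hands the writer a whole 2-D array per row, 
so the cell ends up holding the array's printed form, elisions included. 
Flattening first avoids that.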


Re: How to save xarray data to csv

2018-04-16 Thread shalu . ashu50
On Tuesday, 17 April 2018 02:01:19 UTC+8, Chris Angelico  wrote:
> On Tue, Apr 17, 2018 at 3:50 AM, Rhodri James  wrote:
> > You don't say, but I assume you're using Python 2.x
> >
> > [snip]
> >
> >> getting this error - TypeError: a bytes-like object is required, not 'str'
> 
> Actually, based on this error, I would suspect Python 3.x.
Yes Chris, I am using 3.x only

 But you're
> right that (a) the Python version should be stated for clarity
> (there's a LOT of difference between Python 3.3 and Python 3.7), and
> (b) the full traceback is very helpful.
> 
> ChrisA



Re: How to save xarray data to csv

2018-04-16 Thread shalu . ashu50

On Tuesday, 17 April 2018 01:56:25 UTC+8, Rhodri James  wrote:
> On 16/04/18 15:55, shalu.ash...@gmail.com wrote:
> > Hello All,
> > 
> > I have used xarray to merge several netcdf files into one file and then I 
> > subset the data of my point of interest using lat/long. Now I want to save 
> > this array data (time,lat,long) into csv file but I am getting an error 
> > with my code:
> 
> 
> You don't say, but I assume you're using Python 2.x

Hi James, I am using WinPython Spyder 3.6. 
> 
> [snip]
> 
> > # xarray to numpy array
> > clt1=numpy.array(clt0sub)
> > # saving data into csv file
> > with open('combine11.csv', 'wb') as f:
> >  writer = csv.writer(f, delimiter=',')
> >  writer.writerows(enumerate(clt1))
> > 
> > getting this error - TypeError: a bytes-like object is required, not 'str'
> 
> Copy and paste the entire traceback please if you want help.  We have 
> very little chance of working out what produced that error without it.
> 
> > when I am removing "b" the error disappears
Here I mean the "b" in [with open('combine11.csv', 'wb') as f:], i.e. "wb" for 
writing binary. If I use "wb", I get "TypeError: a bytes-like object is 
required, not 'str'".

If I remove the "b" and use only "w", the error disappears, but the txt/csv 
file then contains just what I see in my console window. I have 20045 time 
steps, but I get output like "100. ..." with the elisions, as shown previously, 
not the full set of time steps. It is like a screenshot of my Python console.

My question is: how can I save multi-dimensional (3-D: time series values, lat, 
long) data (xarray) to csv? 

Thanks

> 
> Which "b"?  Don't leave us guessing, we might guess wrong.
> 
> > but the data saving in wrong format
> 
> Really?  It looks to me like you are getting exactly what you asked for. 
>   What format were you expecting?  What are you getting that doesn't 
> belong.  I suspect that you don't want the "enumerate", but beyond that 
> I have no idea what you're after.
> 
> -- 
> Rhodri James *-* Kynesim Ltd
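
One layout that may match what is being asked for is a "long" table with one 
row per (time, lat, lon) combination. This is only a sketch under that 
assumption: the tiny cube clt and the coordinate lists times, lats and lons 
are made up, and an in-memory buffer is used in place of a real file.

```python
import csv
import io

import numpy as np

# Hypothetical small cube standing in for the real (time, lat, lon) data.
times = ["1950-01-01", "1950-01-02"]
lats = [-13.0, -11.0]
lons = [91.25, 93.75, 96.25]
clt = np.arange(12, dtype=float).reshape(len(times), len(lats), len(lons))

buf = io.StringIO()  # swap for open("clt_long.csv", "w", newline="") to write a file
writer = csv.writer(buf)
writer.writerow(["time", "lat", "lon", "clt"])

# np.ndindex walks every (t, i, j) position of the cube in row-major order.
for t, i, j in np.ndindex(clt.shape):
    writer.writerow([times[t], lats[i], lons[j], float(clt[t, i, j])])

lines = buf.getvalue().splitlines()
print(len(lines))
```

With the real data this gives 20075 x 22 x 25 rows, so the file is large, but 
every value carries its own time and position, and no enumerate is needed.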



Problem in extracting and saving multi-dimensional time series data from netcdf file to csv file

2018-04-17 Thread shalu . ashu50
Hi All,

I am using WinPython Spyder 3.6. I am trying to extract a variable with its 
time series values (daily from 1950 to 2004). The data structure is as follows:


Dimensions:    (bnds: 2, lat: 90, lon: 144, time: 20075)
Coordinates:
  * lat        (lat) float64 -89.0 -87.0 -85.0 -83.0 -81.0 -79.0 -77.0 ...
  * lon        (lon) float64 1.25 3.75 6.25 8.75 11.25 13.75 16.25 18.75 ...
  * time       (time) datetime64[ns] 1950-01-01T12:00:00 ...
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) datetime64[ns] ...
    lat_bnds   (time, lat, bnds) float64 ...
    lon_bnds   (time, lon, bnds) float64 ...
    clt        (time, lat, lon) float32 ...

Now I am extracting the "clt" variable values for my area of interest using a 
lat/long box:

latbounds = [ -13.0 , 31.0 ]  # 22 grid numbers
lonbounds = [ 89.75 , 151.25 ]  # 26 grid numbers

My code is here:

import netCDF4
import xarray as xr
import numpy as np
import csv
import pandas as pd
from pylab import *
import datetime

# NetCDF4-Python can read a remote OPeNDAP dataset or a local NetCDF file:
nc = netCDF4.Dataset('clt_day_GFDL-CM3_historical_r1i1p1_19500101-20041231.nc.nc')
nc.variables.keys()


lat = nc.variables['lat'][:]
lon = nc.variables['lon'][:]
time_var = nc.variables['time']
dtime = netCDF4.num2date(time_var[:],time_var.units)

lat_bnds, lon_bnds = [-13.0 , 31.0], [89.75 , 151.25]
# determine what longitude convention is being used [-180,180], [0,360]
print (lon.min(),lon.max())
print (lat.min(),lat.max())

# latitude lower and upper index
latli = np.argmin( np.abs( lat - lat_bnds[0] ) )
latui = np.argmin( np.abs( lat - lat_bnds[1] ) ) 


# longitude lower and upper index
lonli = np.argmin( np.abs( lon - lon_bnds[0] ) )
lonui = np.argmin( np.abs( lon - lon_bnds[1] ) )  
print(lat)

clt_subset = nc.variables['clt'][:,latli:latui , lonli:lonui]

Up to here I am able to extract the data, but I am not able to save these 
values to a csv file. I can save the values for one location, but when I try 
it with the multi-dimensional extracted values it gives an error.
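
A side note on the index lookup used above: a Python slice stops before its 
end index, so latli:latui leaves out the grid point nearest the upper bound. 
The sketch below rebuilds the coordinate axes from the dataset summary 
(assuming a regular 2 degree by 2.5 degree grid, which is what the printed 
values suggest) and shows the inclusive form. The helper nearest_index is a 
hypothetical name.

```python
import numpy as np

# Coordinate axes matching the dataset summary in the post (2 deg x 2.5 deg grid).
lat = np.arange(-89.0, 90.0, 2.0)      # 90 values: -89, -87, ..., 89
lon = np.arange(1.25, 360.0, 2.5)      # 144 values: 1.25, 3.75, ..., 358.75

lat_bnds, lon_bnds = [-13.0, 31.0], [89.75, 151.25]


def nearest_index(axis, value):
    """Index of the grid point closest to value."""
    return int(np.argmin(np.abs(axis - value)))


latli, latui = nearest_index(lat, lat_bnds[0]), nearest_index(lat, lat_bnds[1])
lonli, lonui = nearest_index(lon, lon_bnds[0]), nearest_index(lon, lon_bnds[1])

# A Python slice stops before its end index, so add 1 to include the upper bound.
lat_sel = lat[latli:latui + 1]
lon_sel = lon[lonli:lonui + 1]

print(lat_sel[0], lat_sel[-1], len(lat_sel))
print(lon_sel[0], lon_sel[-1], len(lon_sel))
```

Keeping lat_sel and lon_sel around is also useful later, since they are the 
natural row or column labels when the subset is written to a file.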

When I execute this:

hs = clt_subset[istart:istop,latli:latui , lonli:lonui]
tim = dtime[istart:istop]
print(tim)
# Create Pandas time series object
ts = pd.Series(hs,index=tim,name=clt_subset)

Error: - 

ts = pd.Series(hs,index=tim,name=clt_subset)
Traceback (most recent call last):

  File "", line 1, in <module>
    ts = pd.Series(hs,index=tim,name=clt_subset)

  File "C:\python3\WinPython\python-3.6.5.amd64\lib\site-packages\pandas\core\series.py", line 264, in __init__
    raise_cast_failure=True)

  File "C:\python3\WinPython\python-3.6.5.amd64\lib\site-packages\pandas\core\series.py", line 3275, in _sanitize_array
    raise Exception('Data must be 1-dimensional')

Exception: Data must be 1-dimensional 

Suggestions would be appreciated. Thanks
Vishu
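
The exception itself says what pandas objects to: pd.Series holds one column 
of values, while hs is three-dimensional. Two further things in the snippet 
look suspicious: clt_subset has already been cut to the lat/lon box, so 
indexing it again with latli:latui and lonli:lonui applies the full-grid 
indices to the smaller array, and name=clt_subset passes an array where a 
label is expected. A minimal sketch of one way around the 1-D limit, with 
small made-up stand-ins (tim, lat_sel, lon_sel, hs) for the post's variables:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the post's variables.
tim = pd.date_range("1950-01-01", periods=3, freq="D")   # plays the role of dtime
lat_sel = np.array([-13.0, -11.0])
lon_sel = np.array([91.25, 93.75, 96.25])
hs = np.arange(18, dtype=float).reshape(3, 2, 3)          # (time, lat, lon)

# pd.Series accepts only 1-D data, so a single grid cell works ...
ts_point = pd.Series(hs[:, 0, 0], index=tim, name="clt")

# ... while the full cube needs reshaping to 2-D: one row per time step,
# one column per (lat, lon) pair.
columns = pd.MultiIndex.from_product([lat_sel, lon_sel], names=["lat", "lon"])
df = pd.DataFrame(hs.reshape(len(tim), -1), index=tim, columns=columns)

csv_text = df.to_csv()   # pass a filename instead to write to disk
print(df.shape)
```

The reshape keeps lat as the slower-varying axis and lon as the faster one, 
which is the same order MultiIndex.from_product generates, so each column 
label matches its data.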


How to save multi-dimensional array values into CSV/Text file

2018-04-17 Thread shalu . ashu50
Hi All,

I am using winpy 6.3
I have this array:

code:
clt_subset = nc.variables['clt'][:,latli:latui , lonli:lonui]

print(clt_subset):
[[[  96.07967377   32.581317930.86773872 ...,   99.6185
 99.7711   99.7711]
  [  93.75789642   86.78536987   46.51786423 ...,   99.99756622
 99.99769592   99.99931335]
  [  99.19438171   99.71717834   97.34263611 ...,   99.99707794
 99.99639893   99.93907928]
  ..., 
  [   7.657027241.1814307 4.02125835 ...,   39.58660126
 37.71473694   42.10451508]
  [   9.48283291   18.424989745.22411346 ...,   70.95629883
 72.82741547   72.89440155]
  [  33.297317546.50339508   88.39287567 ...,   98.50241089
 98.47457123   91.32685089]]

 [[  85.40306854   28.19069862   19.56433678 ...,   99.96898651
 99.99860382  100.]
  [  80.49911499   49.17562485   25.18140984 ...,   99.99198151
 99.99337006   99.99979401]
  [  99.982116791.44667816   78.83125305 ...,   99.99027252
 99.99280548   99.5422]
  ..., 

so on..

print (clt_subset.shape)
(20075, 22, 25)

I am not able to save this array to a csv file as a time series using the 
datetime function. The code is here:

# 2. Specify the exact time period you want:
start = datetime.datetime(1950,1,1,0,0,0)
stop = datetime.datetime(2004,12,1,0,0,0)

istart = netCDF4.date2index(start,time_var,select='nearest')
istop = netCDF4.date2index(stop,time_var,select='nearest')
print (istart,istop)

hs = clt_subset[istart:istop,latli:latui , lonli:lonui]
tim = dtime[istart:istop]

ts = pd.Series(hs,index=tim,name=clt_subset)
ts.to_csv('time_series_from_netcdf.csv')

When I execute this, I get:

Error-
  File "C:\python3\WinPython\python-3.6.5.amd64\lib\site-packages\pandas\core\series.py", line 3275, in _sanitize_array
    raise Exception('Data must be 1-dimensional')

Exception: Data must be 1-dimensional
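
If pandas is not essential, numpy can write the values directly once the 
array has been folded down to two dimensions, since np.savetxt handles only 
1-D and 2-D input. The sketch below is an illustration under that assumption. 
The small clt_subset array replaces the real (20075, 22, 25) one, the column 
labels are invented, and an in-memory buffer is used instead of a file.

```python
import io

import numpy as np

# Hypothetical small cube in place of the (20075, 22, 25) array from the post.
clt_subset = np.arange(24, dtype=float).reshape(2, 3, 4)   # (time, lat, lon)
n_time, n_lat, n_lon = clt_subset.shape

# np.savetxt handles 1-D and 2-D arrays only, so fold lat/lon into one axis.
flat = clt_subset.reshape(n_time, n_lat * n_lon)

# Column labels such as "lat0_lon3" record where each value came from.
header = ",".join(f"lat{i}_lon{j}" for i in range(n_lat) for j in range(n_lon))

buf = io.StringIO()   # a filename such as "clt_subset.csv" works here as well
np.savetxt(buf, flat, delimiter=",", fmt="%.4f", header=header, comments="")

# Reading the file back and reshaping restores the original cube.
restored = np.loadtxt(io.StringIO(buf.getvalue()), delimiter=",", skiprows=1)
restored = restored.reshape(n_time, n_lat, n_lon)
print(np.allclose(restored, clt_subset))
```

The time stamps are not written in this form; they could be stored in a 
second file or added as a first column with pandas.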