Problem in defining multidimensional array matrix and regression
Hi all,

I have six variables in a CSV file. One is rainfall (the dependent variable, y) and the other five are predictors (x; n1=5). I want to run a multiple regression and build a correlation matrix between rainfall (y) and the predictors (x). To do that I want to read rainfall as a separate variable and the predictors as separate columns, so that I can apply the algorithm. However, I am not able to build a proper matrix from them.

Here are my data and code. Any suggestions are welcome; I am new to Python.

    RF       P1     P2      P3     P4      P5
    120.235  0.234  -0.012  0.145  21.023  0.233
    200.14   0.512  -0.021  0.214  22.21   0.332
    185.362  0.147  -0.32   0.136  24.65   0.423
    201.895  0.002  -0.12   0.217  30.25   0.325
    165.235  0.256  0.001   0.22   31.245  0.552
    198.236  0.012  -0.362  0.215  32.25   0.333
    350.263  0.98   -0.85   0.321  38.412  0.411
    145.25   0.046  -0.36   0.147  39.256  0.872
    198.654  0.65   -0.45   0.224  40.235  0.652
    245.214  0.47   -0.325  0.311  26.356  0.632
    214.02   0.18   -0.012  0.242  22.01   0.745
    147.256  0.652  -0.785  0.311  18.256  0.924

    import numpy as np
    import statsmodels as sm
    import statsmodels.formula as smf
    import csv

    with open("pcp1.csv", "r") as csvfile:
        readCSV=csv.reader(csvfile)

        rainfall = []
        csvFileList = []

        for row in readCSV:
            Rain = row[0]
            rainfall.append(Rain)

            if len (row) !=0:
                csvFileList = csvFileList + [row]

    print(csvFileList)
    print(rainfall)

Please advise.

Thanks
--
https://mail.python.org/mailman/listinfo/python-list
Re: Problem in defining multidimensional array matrix and regression
Hello Peter,

Many thanks for your suggestion. I am now using pandas and have already done that, but I now need to build a multi-dimensional array that holds all the predictor variables (five in this case) on one axis (the x side), so that I can perform a multiple regression analysis. I do not understand how to bring all the variables onto one axis (e.g. the x-axis).

Thanks
Vishal

On Sunday, 19 November 2017 22:32:06 UTC+5:30, Peter Otten wrote:
> shalu.ash...@gmail.com wrote:
>
> > Hi, All,
> >
> > I have 6 variables in CSV file. One is rainfall (dependent, at y-axis) and
> > others are predictors (at x). I want to do multiple regression and create
> > a correlation matrix between rainfall (y) and predictors (x; n1=5). Thus I
> > want to read rainfall as a separate variable and others in separate
> > columns, so I can apply the algo. However, I am not able to make a proper
> > matrix for them.
> >
> > Here are my data and codes?
> > Please suggest me for the same.
> > I am new to Python.
> >
> > RF P1 P2 P3 P4 P5
> > 120.235 0.234 -0.012 0.145 21.023 0.233
> > 200.14 0.512 -0.021 0.214 22.21 0.332
> > 185.362 0.147 -0.32 0.136 24.65 0.423
> > 201.895 0.002 -0.12 0.217 30.25 0.325
> > 165.235 0.256 0.001 0.22 31.245 0.552
> > 198.236 0.012 -0.362 0.215 32.25 0.333
> > 350.263 0.98 -0.85 0.321 38.412 0.411
> > 145.25 0.046 -0.36 0.147 39.256 0.872
> > 198.654 0.65 -0.45 0.224 40.235 0.652
> > 245.214 0.47 -0.325 0.311 26.356 0.632
> > 214.02 0.18 -0.012 0.242 22.01 0.745
> > 147.256 0.652 -0.785 0.311 18.256 0.924
> >
> > import numpy as np
> > import statsmodels as sm
> > import statsmodels.formula as smf
> > import csv
> >
> > with open("pcp1.csv", "r") as csvfile:
> >     readCSV=csv.reader(csvfile)
> >
> >     rainfall = []
> >     csvFileList = []
> >
> >     for row in readCSV:
> >         Rain = row[0]
> >         rainfall.append(Rain)
> >
> >         if len (row) !=0:
> >             csvFileList = csvFileList + [row]
> >
> > print(csvFileList)
> > print(rainfall)
>
> You are not the first to read tabular data from a file; therefore numpy (and
> pandas) offer high-level functions to do just that. Once you have the complete
> table, extracting a specific column is easy. For instance:
>
> $ cat rainfall.txt
> RF P1 P2 P3 P4 P5
> 120.235 0.234 -0.012 0.145 21.023 0.233
> 200.14 0.512 -0.021 0.214 22.21 0.332
> 185.362 0.147 -0.32 0.136 24.65 0.423
> 201.895 0.002 -0.12 0.217 30.25 0.325
> 165.235 0.256 0.001 0.22 31.245 0.552
> 198.236 0.012 -0.362 0.215 32.25 0.333
> 350.263 0.98 -0.85 0.321 38.412 0.411
> 145.25 0.046 -0.36 0.147 39.256 0.872
> 198.654 0.65 -0.45 0.224 40.235 0.652
> 245.214 0.47 -0.325 0.311 26.356 0.632
> 214.02 0.18 -0.012 0.242 22.01 0.745
> 147.256 0.652 -0.785 0.311 18.256 0.924
> $ python3
> Python 3.4.3 (default, Nov 17 2016, 01:08:31)
> [GCC 4.8.4] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import numpy
> >>> rf = numpy.genfromtxt("rainfall.txt", names=True)
> >>> rf["RF"]
> array([ 120.235,  200.14 ,  185.362,  201.895,  165.235,  198.236,
>         350.263,  145.25 ,  198.654,  245.214,  214.02 ,  147.256])
> >>> rf["P3"]
> array([ 0.145,  0.214,  0.136,  0.217,  0.22 ,  0.215,  0.321,  0.147,
>         0.224,  0.311,  0.242,  0.311])
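
On the follow-up question of getting all five predictors onto one axis: a minimal sketch, assuming the whitespace-separated table shown in this thread. It stacks the predictor columns into one 2-D array, then computes a correlation matrix and an ordinary least-squares fit with plain NumPy. The names SAMPLE, X, A and coef are made up for this illustration, and NumPy's least-squares solver is swapped in for statsmodels only to keep the example self-contained.

```python
import io
import numpy as np

# Sample rows copied from the thread (whitespace separated, header first).
SAMPLE = """RF P1 P2 P3 P4 P5
120.235 0.234 -0.012 0.145 21.023 0.233
200.14 0.512 -0.021 0.214 22.21 0.332
185.362 0.147 -0.32 0.136 24.65 0.423
201.895 0.002 -0.12 0.217 30.25 0.325
165.235 0.256 0.001 0.22 31.245 0.552
198.236 0.012 -0.362 0.215 32.25 0.333
350.263 0.98 -0.85 0.321 38.412 0.411
145.25 0.046 -0.36 0.147 39.256 0.872
198.654 0.65 -0.45 0.224 40.235 0.652
245.214 0.47 -0.325 0.311 26.356 0.632
214.02 0.18 -0.012 0.242 22.01 0.745
147.256 0.652 -0.785 0.311 18.256 0.924
"""

# names=True turns the header row into field names, as in Peter's example.
table = np.genfromtxt(io.StringIO(SAMPLE), names=True)

# Dependent variable as a 1-D array, predictors as one (rows, 5) matrix.
y = table["RF"]
predictor_names = [name for name in table.dtype.names if name != "RF"]
X = np.column_stack([table[name] for name in predictor_names])

# Correlation matrix of rainfall and all predictors (columns are variables).
corr = np.corrcoef(np.column_stack([y, X]), rowvar=False)

# Ordinary least squares with an intercept column prepended.
A = np.column_stack([np.ones(len(y)), X])
coef, _, _, _ = np.linalg.lstsq(A, y, rcond=None)

print("predictors:", predictor_names)
print("correlation of RF with each predictor:", corr[0, 1:])
print("intercept and slopes:", coef)
```

With a real file, the io.StringIO(SAMPLE) argument would be replaced by the file name. The same X and y arrays can also be handed to a regression library if a full statistical summary is needed.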
How to make code space variant with gridded data (lat/long)
Hello all,

This code is written for multivariate (multiple independent variables x1, x2, x3, ..., xn and one dependent variable y) time series analysis using logistic regression (correlation and prediction).

    #Import Libraries
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd

    #Import Dataset
    dataset = pd.read_csv('precipitation.csv')
    x = dataset.iloc[:,[2,3]].values
    y =dataset.iloc[:,4].values

    #Split Training Set and Testing Set
    #(train_test_split lives in sklearn.model_selection; the older
    # sklearn.cross_validation module was removed in scikit-learn 0.20)
    from sklearn.model_selection import train_test_split
    x_train, x_test, y_train, y_test =train_test_split(x,y,test_size=0.25)

    #Feature Scaling
    from sklearn.preprocessing import StandardScaler
    sc_X=StandardScaler()
    x_train=sc_X.fit_transform(x_train)
    x_test=sc_X.transform(x_test)

    #Training the Logistic Model
    from sklearn.linear_model import LogisticRegression
    classifier = LogisticRegression()
    classifier.fit(x_train, y_train)

    #Predicting the Test Set Result
    y_pred = classifier.predict(x_test)

This code is based on a dataset for one point location (one lat/long). Suppose I have gridded datasets (many points/locations, lat/long, varying in space and time): how would I apply this code? I am not an expert in Python. Could somebody help me with this, or give me an example or an idea, so that I can adapt this code to my requirement?

Thank you in advance.
Vishu
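
One possible reading of the question is "fit the same model independently at every grid cell". A minimal sketch of that loop follows, under loud assumptions: the data are synthetic, the array layout (time, lat, lon, predictor) is invented for illustration, and a small gradient-descent logistic regression written in plain NumPy (the hypothetical helper fit_logistic) stands in for scikit-learn's LogisticRegression so the example runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)
n_time, n_lat, n_lon, n_pred = 200, 3, 4, 2

# Synthetic stand-in for gridded data:
#   X_grid: predictors, shape (time, lat, lon, predictor)
#   y_grid: binary target, shape (time, lat, lon)
X_grid = rng.normal(size=(n_time, n_lat, n_lon, n_pred))
noise = rng.normal(scale=0.5, size=(n_time, n_lat, n_lon))
y_grid = (X_grid[..., 0] + 0.5 * X_grid[..., 1] + noise > 0).astype(float)


def fit_logistic(x, y, learning_rate=0.1, n_iter=500):
    """Fit logistic regression by gradient descent; returns [intercept, slopes...]."""
    design = np.column_stack([np.ones(len(x)), x])
    weights = np.zeros(design.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-design @ weights))
        # Gradient of the mean negative log-likelihood.
        weights -= learning_rate * design.T @ (prob - y) / len(y)
    return weights


# One independent fit per grid cell, using that cell's own time series.
coefs = np.empty((n_lat, n_lon, n_pred + 1))
for i in range(n_lat):
    for j in range(n_lon):
        coefs[i, j] = fit_logistic(X_grid[:, i, j, :], y_grid[:, i, j])

print(coefs.shape)  # one coefficient vector per (lat, lon) cell
```

The per-cell body of the loop is where the original single-point code (split, scale, fit, predict) would go; the result is then a map of coefficients or predictions instead of a single value.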
Installing NETCDF4 in windows using python 3.4
Hi All,

I have downloaded the netCDF4 module from https://pypi.python.org/pypi/netCDF4, e.g. netCDF4-1.3.1-cp34-cp34m-win_amd64.whl.

I installed it with pip install netCDF4-1.3.1-cp34-cp34m-win_amd64.whl through the command prompt in Spyder. It installed successfully:

    C:\python3>pip install netCDF4-1.3.1-cp34-cp34m-win_amd64.whl
    Processing c:\python3\netcdf4-1.3.1-cp34-cp34m-win_amd64.whl
    Requirement already satisfied: numpy>=1.7 in c:\python3\winpython-64bit-3.4.4.5qt5\python-3.4.4.amd64\lib\site-packages (from netCDF4==1.3.1)
    Installing collected packages: netCDF4
      Found existing installation: netCDF4 1.3.2
        Uninstalling netCDF4-1.3.2:
          Successfully uninstalled netCDF4-1.3.2
    Successfully installed netCDF4-1.3.1

But when I try to import it, I get an error:

    import netCDF4 as nc4
    Traceback (most recent call last):
      File "", line 1, in
        import netCDF4 as nc4
      File "C:\python3\WinPython-64bit-3.4.4.5Qt5\python-3.4.4.amd64\lib\site-packages\netCDF4\__init__.py", line 3, in
        from ._netCDF4 import *
      File "netCDF4\_netCDF4.pyx", line 2988, in init netCDF4._netCDF4
    AttributeError: type object 'netCDF4._netCDF4.Dimension' has no attribute '__reduce_cython__'

How can I fix it? Suggestions would be appreciated.
How to save xarray data to csv
Hello All,

I have used xarray to merge several netCDF files into one file, and then I subset the data for my point of interest using lat/long. Now I want to save this array data (time, lat, long) into a CSV file, but I am getting an error with my code:

    dsmerged = xarray.open_mfdataset('F:/NTU_PDF__Work/1_Codes/1_Python/testing/netcdf/*.nc')
    #save to netcdf file
    dsmerged.to_netcdf('combine.nc')
    # Extraction of data as per given coordinates
    #values (lat/long with their upper and lower bound)
    fin = netCDF4.Dataset("combine.nc" ,"r")
    # print the all the variables in this file
    print (fin.variables)
    # print the dimensions of the variable
    print (fin.variables["clt"].shape)
    #Out: (20075, 90, 144)
    # retrieve time step from the variable
    clt0 = (fin.variables["clt"])
    # extract a subset of the full dataset contained in the file
    clt0sub = clt0[10:30,20:30]
    # xarray to numpy array
    clt1=numpy.array(clt0sub)
    # saving data into csv file
    with open('combine11.csv', 'wb') as f:
        writer = csv.writer(f, delimiter=',')
        writer.writerows(enumerate(clt1))

I get this error:

    TypeError: a bytes-like object is required, not 'str'

When I remove the "b" the error disappears, but the data is saved in the wrong format. Example:

    0,"[[ 99.93312836 99.99977112 100. ..., 98.53624725 99.98111725 99.9799881 ]
     [ 99.95301056 99.99489594 99.8474 ..., 99.999870399.99951172 99.97265625]
     [ 99.67852783 99.96372986 99.9237 ..., 99.96694946 99.9842453 99.96450806]
     ...,
     [ 78.29571533 45.00857544 24.39345932 ..., 90.86527252 84.48490143 62.53995895]
     [ 42.03381348 46.169670122.71044922 ..., 80.88492584 71.15007019 50.95384216]
     [ 34.75331879 49.99913025 17.66173935 ..., 57.12231827 62.56645584 40.6435585 ]]"
    1,"[[ 100. 100. 100. ..., 99.93876648 99.98928833 100.]
     [ 99.9773941 100. 99.4933548 ..., 97.93031311 97.36623383 97.07974243]
     [ 98.593490699.7242 99.44548035 ..., 79.59191132 85.77541351 94.40919495]
     ...,

Suggestions would be appreciated.
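
For reference, a minimal sketch of one way to get a 3-D array into CSV with the standard csv module: write one row per (time, lat, lon) cell. In Python 3 the csv module expects a file opened in text mode with newline='', which is why 'wb' raises the TypeError. The small array and the coordinate lists below are synthetic stand-ins, and write_long_csv is a hypothetical helper name, not part of any library.

```python
import csv
import io
import numpy as np

# Synthetic stand-in for a (time, lat, lon) subset such as clt1.
clt = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
times = ["1950-01-01", "1950-01-02"]
lats = [-13.0, -11.0, -9.0]
lons = [88.75, 91.25, 93.75, 96.25]


def write_long_csv(fileobj, values, times, lats, lons):
    """Write one CSV row per grid cell and time step."""
    writer = csv.writer(fileobj, delimiter=",")
    writer.writerow(["time", "lat", "lon", "clt"])
    # ndenumerate yields ((t, i, j), value) for every element.
    for (t, i, j), value in np.ndenumerate(values):
        writer.writerow([times[t], lats[i], lons[j], float(value)])


# In-memory buffer for demonstration; for a real file use
#   with open("combine11.csv", "w", newline="") as f:
#       write_long_csv(f, clt, times, lats, lons)
buffer = io.StringIO()
write_long_csv(buffer, clt, times, lats, lons)
print(buffer.getvalue().splitlines()[:3])
```

Passing enumerate(clt1) to writerows, as in the post, hands the writer one whole 2-D slice per row, so each slice is written as its printed (and abbreviated) string form; iterating over individual elements avoids that.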
Re: How to save xarray data to csv
On Tuesday, 17 April 2018 02:01:19 UTC+8, Chris Angelico wrote:
> On Tue, Apr 17, 2018 at 3:50 AM, Rhodri James wrote:
> > You don't say, but I assume you're using Python 2.x
> >
> > [snip]
> >
> >> getting this error - TypeError: a bytes-like object is required, not 'str'
>
> Actually, based on this error, I would suspect Python 3.x.

Yes Chris, I am using 3.x only.

> But you're
> right that (a) the Python version should be stated for clarity
> (there's a LOT of difference between Python 3.3 and Python 3.7), and
> (b) the full traceback is very helpful.
>
> ChrisA
Re: How to save xarray data to csv
On Tuesday, 17 April 2018 01:56:25 UTC+8, Rhodri James wrote:
> On 16/04/18 15:55, shalu.ash...@gmail.com wrote:
> > Hello All,
> >
> > I have used xarray to merge several netcdf files into one file and then I
> > subset the data of my point of interest using lat/long. Now I want to save
> > this array data (time,lat,long) into csv file but I am getting an error
> > with my code:
>
> You don't say, but I assume you're using Python 2.x

Hi James, I am using WinPython Spyder 3.6.

> [snip]
>
> > # xarray to numpy array
> > clt1=numpy.array(clt0sub)
> > # saving data into csv file
> > with open('combine11.csv', 'wb') as f:
> >     writer = csv.writer(f, delimiter=',')
> >     writer.writerows(enumerate(clt1))
> >
> > getting this error - TypeError: a bytes-like object is required, not 'str'
>
> Copy and paste the entire traceback please if you want help. We have
> very little chance of working out what produced that error without it.
>
> > when I am removing "b" the error disappears

Here I mean the line [with open('combine11.csv', 'wb') as f:], where "wb" means writing in binary mode. If I use "wb", I get "TypeError: a bytes-like object is required, not 'str'". If I remove the "b" and use only "w", the error disappears, but what gets written to the txt/csv file is just what I see in my console window. I have 20045 time steps, but I get values like 100.2... as previously mentioned, not the full time steps. It is like a print screen of my Python console.

My question is: how can I save multi-dimensional (3D: time series values, lat, long) data (xarray) into CSV?

Thanks

> Which "b"? Don't leave us guessing, we might guess wrong.
>
> > but the data saving in wrong format
>
> Really? It looks to me like you are getting exactly what you asked for.
> What format were you expecting? What are you getting that doesn't
> belong. I suspect that you don't want the "enumerate", but beyond that
> I have no idea what you're after.
>
> --
> Rhodri James *-* Kynesim Ltd
Problem in extracting and saving multi-dimensional time series data from netcdf file to csv file
Hi All,

I am using WinPython Spyder 3.6. I am trying to extract a variable with its time series values (daily from 1950 to 2004). The data structure is as follows:

    Dimensions:    (bnds: 2, lat: 90, lon: 144, time: 20075)
    Coordinates:
      * lat        (lat) float64 -89.0 -87.0 -85.0 -83.0 -81.0 -79.0 -77.0 ...
      * lon        (lon) float64 1.25 3.75 6.25 8.75 11.25 13.75 16.25 18.75 ...
      * time       (time) datetime64[ns] 1950-01-01T12:00:00 ...
    Dimensions without coordinates: bnds
    Data variables:
        time_bnds  (time, bnds) datetime64[ns] ...
        lat_bnds   (time, lat, bnds) float64 ...
        lon_bnds   (time, lon, bnds) float64 ...
        clt        (time, lat, lon) float32 ...

Now I am extracting the "clt" variable values for my area of interest using lat/long boxes:

    latbounds = [ -13.0 , 31.0 ]    # 22 grid numbers
    lonbounds = [ 89.75 , 151.25 ]  # 26 grid numbers

My code is here:

    import netCDF4
    import xarray as xr
    import numpy as np
    import csv
    import pandas as pd
    from pylab import *
    import datetime

    # NetCDF4-Python can read a remote OPeNDAP dataset or a local NetCDF file:
    nc = netCDF4.Dataset('clt_day_GFDL-CM3_historical_r1i1p1_19500101-20041231.nc.nc')
    nc.variables.keys()
    lat = nc.variables['lat'][:]
    lon = nc.variables['lon'][:]
    time_var = nc.variables['time']
    dtime = netCDF4.num2date(time_var[:],time_var.units)
    lat_bnds, lon_bnds = [-13.0 , 31.0], [89.75 , 151.25]
    # determine what longitude convention is being used [-180,180], [0,360]
    print (lon.min(),lon.max())
    print (lat.min(),lat.max())
    # latitude lower and upper index
    latli = np.argmin( np.abs( lat - lat_bnds[0] ) )
    latui = np.argmin( np.abs( lat - lat_bnds[1] ) )
    # longitude lower and upper index
    lonli = np.argmin( np.abs( lon - lon_bnds[0] ) )
    lonui = np.argmin( np.abs( lon - lon_bnds[1] ) )
    print(lat)
    clt_subset = nc.variables['clt'][:,latli:latui , lonli:lonui]

Up to here I am able to extract the data, but I am not able to save these values in a CSV file. I am also able to save values for one location, but with the multi-dimensional extracted values I get an error when I execute this:

    hs = clt_subset[istart:istop,latli:latui , lonli:lonui]
    tim = dtime[istart:istop]
    print(tim)
    # Create Pandas time series object
    ts = pd.Series(hs,index=tim,name=clt_subset)

Error:

    ts = pd.Series(hs,index=tim,name=clt_subset)
    Traceback (most recent call last):
      File "", line 1, in
        ts = pd.Series(hs,index=tim,name=clt_subset)
      File "C:\python3\WinPython\python-3.6.5.amd64\lib\site-packages\pandas\core\series.py", line 264, in __init__
        raise_cast_failure=True)
      File "C:\python3\WinPython\python-3.6.5.amd64\lib\site-packages\pandas\core\series.py", line 3275, in _sanitize_array
        raise Exception('Data must be 1-dimensional')
    Exception: Data must be 1-dimensional

Suggestions would be appreciated.

Thanks
Vishu
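
Two details of the indexing above can be checked with a small sketch. It assumes a regular grid reconstructed from the printed coordinates (90 latitudes starting at -89.0 in steps of 2.0, 144 longitudes starting at 1.25 in steps of 2.5); nearest_index and box_slice are hypothetical helper names. First, a slice lower:upper excludes the upper index, so the nearest-index approach yields one cell fewer than an inclusive box. Second, once the lat/lon box has been cut out, only the time axis should be sliced again; re-applying the full-grid indices to the already cut array appears to select an empty region.

```python
import numpy as np

# Grid assumed from the coordinates printed in the post.
lat = np.arange(-89.0, 90.0, 2.0)    # 90 cell centres
lon = np.arange(1.25, 360.0, 2.5)    # 144 cell centres


def nearest_index(axis_values, target):
    """Index of the grid value closest to target."""
    return int(np.argmin(np.abs(axis_values - target)))


def box_slice(axis_values, lower, upper):
    """Slice covering lower..upper, with the upper cell included."""
    lo = nearest_index(axis_values, lower)
    hi = nearest_index(axis_values, upper)
    return slice(lo, hi + 1)


lat_sl = box_slice(lat, -13.0, 31.0)
lon_sl = box_slice(lon, 89.75, 151.25)

# Synthetic stand-in for the clt variable with 5 time steps.
clt = np.zeros((5, lat.size, lon.size), dtype=np.float32)
clt_subset = clt[:, lat_sl, lon_sl]

# Later selections only need the time axis: lat/lon are already cut.
time_sl = slice(1, 4)
hs = clt_subset[time_sl]

print(lat_sl, lon_sl, clt_subset.shape, hs.shape)
```

Whether the upper cell should be included depends on how the box is meant to be read, so the + 1 in box_slice is a choice, not a correction.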
How to save multi-dimensional array values into CSV/Text file
Hi All,

I am using winpy 6.3. I have this array:

    clt_subset = nc.variables['clt'][:,latli:latui , lonli:lonui]

print(clt_subset):

    [[[ 96.07967377 32.581317930.86773872 ..., 99.6185 99.7711 99.7711]
      [ 93.75789642 86.78536987 46.51786423 ..., 99.99756622 99.99769592 99.99931335]
      [ 99.19438171 99.71717834 97.34263611 ..., 99.99707794 99.99639893 99.93907928]
      ...,
      [ 7.657027241.1814307 4.02125835 ..., 39.58660126 37.71473694 42.10451508]
      [ 9.48283291 18.424989745.22411346 ..., 70.95629883 72.82741547 72.89440155]
      [ 33.297317546.50339508 88.39287567 ..., 98.50241089 98.47457123 91.32685089]]
     [[ 85.40306854 28.19069862 19.56433678 ..., 99.96898651 99.99860382 100.]
      [ 80.49911499 49.17562485 25.18140984 ..., 99.99198151 99.99337006 99.99979401]
      [ 99.982116791.44667816 78.83125305 ..., 99.99027252 99.99280548 99.5422]
      ...,

and so on.

    print (clt_subset.shape)
    (20075, 22, 25)

I am not able to save this array into a CSV file as a time series using the datetime function. The code is here:

    # 2. Specify the exact time period you want:
    start = datetime.datetime(1950,1,1,0,0,0)
    stop = datetime.datetime(2004,12,1,0,0,0)
    istart = netCDF4.date2index(start,time_var,select='nearest')
    istop = netCDF4.date2index(stop,time_var,select='nearest')
    print (istart,istop)
    hs = clt_subset[istart:istop,latli:latui , lonli:lonui]
    tim = dtime[istart:istop]
    ts = pd.Series(hs,index=tim,name=clt_subset)
    ts.to_csv('time_series_from_netcdf.csv')

While executing this, I get:

    File "C:\python3\WinPython\python-3.6.5.amd64\lib\site-packages\pandas\core\series.py", line 3275, in _sanitize_array
        raise Exception('Data must be 1-dimensional')
    Exception: Data must be 1-dimensional
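
A pandas Series holds one value per index entry, so a (time, lat, lon) block does not fit into it. A minimal sketch of one alternative, assuming a small synthetic array in place of clt_subset: flatten the two spatial axes into columns and use a DataFrame with the time steps as the index. The column naming scheme ("lat..._lon...") and the coordinate values are invented for this illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a (time, lat, lon) subset.
n_time, n_lat, n_lon = 4, 2, 3
values = np.arange(n_time * n_lat * n_lon, dtype=np.float32)
values = values.reshape(n_time, n_lat, n_lon)

times = pd.date_range("1950-01-01 12:00", periods=n_time, freq="D")
lats = [-13.0, -11.0]
lons = [88.75, 91.25, 93.75]

# One column per grid cell; order matches a C-order reshape
# (latitude varies slowest, longitude fastest).
columns = [f"lat{la}_lon{lo}" for la in lats for lo in lons]

frame = pd.DataFrame(
    values.reshape(n_time, n_lat * n_lon),
    index=times,
    columns=columns,
)
frame.index.name = "time"

# to_csv() without a path returns the CSV text; pass a file name to
# write to disk instead, e.g. frame.to_csv("time_series_from_netcdf.csv").
csv_text = frame.to_csv()
print(csv_text.splitlines()[0])
```

With the real data this gives 22 x 25 = 550 columns and one row per day, which is wide but loads back directly with pd.read_csv.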