key and ..
Hi all,

Sorry for asking such a basic question, but I am trying to merge two files (file1 and file2) by the first column (key) and then do some further processing. Here is a description of the files and what I would like to do.

file1
key c1 c2
1 759 939
2 345 154571
3 251 350711
4 3749 22159
5 676 76953
6 46 756

file2
key p1 p2
1 759 939
2 345 154571
3 251 350711
4 3915 23254
5 7676 77953
7 256 4562

Create file3:
a) merge the two files by key, keeping the keys that exist in both file1 and file2
b) create two variables, dcp1 = c1 - p1 and dcp2 = c2 - p2
c) sort file3 by dcp2 (descending) and output it

Create file4: the records which exist in file1 but not in file2.
Create file5: the records which exist in file2 but not in file1.

Desired output files

file3
key c1 c2 p1 p2 dcp1 dcp2
4 3749 22159 3915 23254 -166 -1095
5 676 76953 7676 77953 -7000 -1000
1 759 939 759 939 0 0
2 345 154571 345 154571 0 0
3 251 350711 251 350711 0 0

file4
key c1 c2
6 46 756

file5
key p1 p2
7 256 4562

Thank you in advance
-- https://mail.python.org/mailman/listinfo/python-list
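One possible way to approach the three outputs described above is sketched here with pandas. This is only an illustration under assumptions: the post does not say which library is in use, the files are assumed to be whitespace-separated, and the sample data is embedded as strings so the sketch runs on its own. The names df1, df2, file3, file4 and file5 are chosen for the example.

```python
import io
import pandas as pd

# Sample data copied from the post; real code would pass file names instead.
file1_text = """key c1 c2
1 759 939
2 345 154571
3 251 350711
4 3749 22159
5 676 76953
6 46 756
"""
file2_text = """key p1 p2
1 759 939
2 345 154571
3 251 350711
4 3915 23254
5 7676 77953
7 256 4562
"""

df1 = pd.read_csv(io.StringIO(file1_text), sep=r"\s+")
df2 = pd.read_csv(io.StringIO(file2_text), sep=r"\s+")

# file3: keys present in both files (an inner merge is the default).
file3 = df1.merge(df2, on="key")
file3["dcp1"] = file3["c1"] - file3["p1"]
file3["dcp2"] = file3["c2"] - file3["p2"]
file3 = file3.sort_values("dcp2", ascending=False)

# file4: keys only in file1; file5: keys only in file2.
file4 = df1[~df1["key"].isin(df2["key"])]
file5 = df2[~df2["key"].isin(df1["key"])]

print(file3)
print(file4)
print(file5)

# To write the results, something like this could be used:
# file3.to_csv("file3", sep=" ", index=False)
```

Note that ascending=False follows step c) as worded (descending). The sample file3 in the post lists the most negative dcp2 first, so ascending=True may be what is actually wanted.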
data frame
Hi all,

#!/usr/bin/env python
import sys
import csv
import numpy as np
import pandas as pd

a= pd.read_csv("s1.csv")
print(a)

  size   w1  h1
0  512  214  26
1  123  250  34
2  234  124  25
3  334  213  43
4  a45  223  32
5  a12  214  26

I wanted to create a new column by adding the two column values as follows

a['test'] = a['w1'] + a['h1']

Traceback (most recent call last):
  File "/data/apps/Intel/intelpython35/lib/python3.5/site-packages/pandas/indexes/base.py", line 2104, in get_loc
    return self._engine.get_loc(key)
  File "pandas/index.pyx", line 139, in pandas.index.IndexEngine.get_loc (pandas/index.c:4152)
  File "pandas/index.pyx", line 161, in pandas.index.IndexEngine.get_loc (pandas/index.c:4016)
  File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13153)
  File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13107)
KeyError: 'w1'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tt.py", line 16, in <module>
    a['test']=a['w1'] + a['h1']
  File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13107)
KeyError: 'w1'

Can someone help me what the problem is?

Thank you in advance
Re: data frame
Here are the first few lines of the data

s1.csv
size,w1,h1
512,214,26
123,250,34
234,124,25
334,213,43

and the script

a=pd.read_csv("s1.csv", skipinitialspace=True).keys()
print(a)

I see the following

Index(['size', 'w1', 'h1'], dtype='object')

When I tried to add the two columns, I got the following message.

a=pd.read_csv("s1.csv", skipinitialspace=True).keys()
a['test']=a['w1'] + a['h1']
print(a)

/data/apps/Intel/intelpython35/lib/python3.5/site-packages/pandas/indexes/base.py:1393: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  return getitem(key)
Traceback (most recent call last):
  File "tt.py", line 12, in <module>
    a['test']=a['w1'] + a['h1']
  File "/data/apps/Intel/intelpython35/lib/python3.5/site-packages/pandas/indexes/base.py", line 1393, in __getitem__
    return getitem(key)
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

On Friday, December 23, 2016 3:09 PM, Peter Otten <__pete...@web.de> wrote:

Val Krem via Python-list wrote:

> Hi all,
>
> #!/usr/bin/env python
> import sys
> import csv
> import numpy as np
> import pandas as pd
>
> a= pd.read_csv("s1.csv")
> print(a)
>
>   size   w1  h1
> 0  512  214  26
> 1  123  250  34
> 2  234  124  25
> 3  334  213  43
> 4  a45  223  32
> 5  a12  214  26
>
> I wanted to create a new column by adding the two column values
> as follows
>
> a['test'] = a['w1'] + a['h1']
>
> Traceback (most recent call last):
>   File "/data/apps/Intel/intelpython35/lib/python3.5/site-packages/pandas/indexes/base.py", line 2104, in get_loc
>     return self._engine.get_loc(key)
>   File "pandas/index.pyx", line 139, in pandas.index.IndexEngine.get_loc (pandas/index.c:4152)
>   File "pandas/index.pyx", line 161, in pandas.index.IndexEngine.get_loc (pandas/index.c:4016)
>   File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13153)
>   File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13107)
> KeyError: 'w1'
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "tt.py", line 16, in <module>
>     a['test']=a['w1'] + a['h1']
>   File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13107)
> KeyError: 'w1'
>
> Can someone help me what the problem is?
>
> Thank you in advance

Have a look at a.keys(). I suspect that the column name has extra space:

>>> pd.read_csv("s1.csv").keys()
Index([u'size', u' w1', u' h1'], dtype='object')

If that's what you see you can fix it by reading the csv with skipinitialspace=True:

>>> pd.read_csv("s1.csv", skipinitialspace=True).keys()
Index([u'size', u'w1', u'h1'], dtype='object')
Re: data frame
Thank you Peter and Christ. It was a white space and the fix fixed it.

Many thanks.

On Friday, December 23, 2016 5:26 PM, Peter Otten <__pete...@web.de> wrote:

Val Krem via Python-list wrote:

> Here is the first few lines of the data
>
> s1.csv
> size,w1,h1
> 512,214,26
> 123,250,34
> 234,124,25
> 334,213,43

Did you put these lines here using copy and paste? The fix below depends on the assumption that your data is more like

size, w1, h1
512, 214, 26
123, 250, 34
...

> a=pd.read_csv("s1.csv", skipinitialspace=True).keys()

You should use the keys() method call for diagnosis only. The final script that might work if your problem is actually space after the commas is

import pandas as pd

a = pd.read_csv("s1.csv", skipinitialspace=True)
a["test"] = a["h1"] + a["w1"]
print(a)
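For anyone who wants to reproduce the two errors in this thread without the original file, here is a small self-contained sketch. It assumes, as Peter did, that the real s1.csv had a space after each comma; the in-memory string below is only a stand-in for that file.

```python
import io
import pandas as pd

# Assumed stand-in for s1.csv, with a space after every comma.
raw = "size, w1, h1\n512, 214, 26\n123, 250, 34\n"

# Without skipinitialspace the header names keep their leading space,
# so a['w1'] raises KeyError while a[' w1'] would work.
plain = pd.read_csv(io.StringIO(raw))
print(list(plain.columns))

# keys() returns an Index of column names, not the data frame itself,
# which is why indexing its result with 'w1' failed with IndexError.
print(type(plain.keys()))

# With skipinitialspace=True the names are clean and the sum works.
a = pd.read_csv(io.StringIO(raw), skipinitialspace=True)
a["test"] = a["w1"] + a["h1"]
print(a)
```

Printing list(a.columns) rather than the frame itself makes stray spaces visible, because each name is shown in quotes.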
data
Hi all,

I have a sample data set and would like to summarize it in the following way.

ID,class,y
1,12,10
1,12,10
1,12,20
1,13,20
1,13,10
1,13,10
1,14,20
2,21,20
2,21,20
2,21,10
2,23,10
2,23,20
2,34,20
2,34,10
2,35,10

I want to get the total count by ID and the number of classes by ID. The y variable is either 10 or 20, and I want to count each value by ID.

The result should look as follows.

ID,class,count,10's,20's
1,3,7,4,3
2,4,8,4,4

I can do this in two or more steps. Is there an efficient way of doing it?

I used

pd.crosstab(a['ID'],a['y'],margins=True)

and got

ID,10's,20's,all
1,4,3,7
2,4,4,8

but I want to get the class count as well, as follows

ID,class,10's,20's,all
1,3,4,3,7
2,4,4,4,8

How do I do it in Python?

Thank you in advance
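One way this summary might be built is sketched below: a groupby gives the number of distinct classes and the row count per ID, and the crosstab from the post supplies the 10 and 20 counts, which are then joined on. The variable names (summary, by_y) are made up for the example, and named aggregation needs a reasonably recent pandas (0.25 or later).

```python
import io
import pandas as pd

# The sample data from the post, embedded so the sketch is self-contained.
raw = """ID,class,y
1,12,10
1,12,10
1,12,20
1,13,20
1,13,10
1,13,10
1,14,20
2,21,20
2,21,20
2,21,10
2,23,10
2,23,20
2,34,20
2,34,10
2,35,10
"""
a = pd.read_csv(io.StringIO(raw))

# Distinct classes and total rows per ID. "class" is a Python keyword,
# so the named aggregation is passed through a dict.
summary = a.groupby("ID").agg(
    **{"class": ("class", "nunique"), "count": ("y", "size")}
)

# Counts of each y value per ID, with the columns renamed to 10's / 20's.
by_y = pd.crosstab(a["ID"], a["y"])
by_y.columns = [f"{value}'s" for value in by_y.columns]

summary = summary.join(by_y)
print(summary)
```

The column order can be rearranged afterwards with ordinary column selection, for example summary[["class", "10's", "20's", "count"]].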
crosstab output
Hi all,

How do I access the rows and columns of a data frame crosstab output?

Here is code using sample data, and its output.

a= pd.read_csv("cross.dat", skipinitialspace=True)
xc=pd.crosstab(a['nam'],a['x1'],margins=True)

print(xc)

x1   0  1
nam
A1   3  2
A2   1  4

I want to create a variable by computing 2/(3+2) for the first row (A1) and 4/(1+4) for the second row (A2).

The final data frame would be

A1 3 2 0.4
A2 1 4 0.8

Thank you in advance
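A crosstab result is an ordinary DataFrame, so its rows and columns can be reached with the usual indexing. The sketch below rebuilds the table shown in the post from made-up rows (cross.dat itself is not shown, so the input here is an assumption) and leaves out margins=True to keep the arithmetic simple.

```python
import pandas as pd

# Hypothetical rows chosen so the crosstab matches the one in the post.
a = pd.DataFrame({
    "nam": ["A1"] * 5 + ["A2"] * 5,
    "x1": [0, 0, 0, 1, 1, 0, 1, 1, 1, 1],
})

xc = pd.crosstab(a["nam"], a["x1"])

# The column labels are the x1 values themselves (the integers 0 and 1).
print(xc.loc["A1", 1])   # a single cell: row A1, column 1
print(xc.loc["A2"])      # a whole row
print(xc[1])             # a whole column

# New variable: share of x1 == 1 within each row.
xc["ratio"] = xc[1] / (xc[0] + xc[1])
print(xc)
```

With margins=True the table also carries an All row and an All column, so xc[1] / xc["All"] would give the same ratio, with one extra value for the All row.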
Read and count
Hi all,

I am new to Python (moving from R to Python) and am trying to read a data set and count the number of observations by year for each city.

The data set looks like

city year x

XC1 2001 10
XC1 2001 20
XC1 2002 20
XC1 2002 10
XC1 2002 10

Yv2 2001 10
Yv2 2002 20
Yv2 2002 20
Yv2 2002 10
Yv2 2002 10

The output will be

city
xc1 2001 2
xc1 2002 3
yv1 2001 1
yv2 2002 3

Below is my starting code

count=0
fo=open("dat", "r+")
str = fo.read();
print "Read String is : ", str

fo.close()

Many thanks
Re: Read and count
Thank you very much for the help.

First I want to count by city and year.

City year count
Xc1 2001 1
Xc1 2002 3
Yv1 2001 1
Yv2 2002 4

This worked fine!

Now I want to count by city only

City Count
Xc1 4
Yv2 5

Then combine these two objects with the original data and send it to a file called "detout" with these columns:

"City", "year", "x", "cycount", "citycount"

Many thanks again

This worked fine. I tried to count only by city and combine the three objects together

City
Xc1 4
Yv2 5

> On Mar 10, 2016, at 3:11 AM, Jussi Piitulainen wrote:
>
> Val Krem writes:
>
>> Hi all,
>>
>> I am a new learner about python (moving from R to python) and trying
>> read and count the number of observation by year for each city.
>>
>>
>> The data set look like
>> city year x
>>
>> XC1 2001 10
>> XC1 2001 20
>> XC1 2002 20
>> XC1 2002 10
>> XC1 2002 10
>>
>> Yv2 2001 10
>> Yv2 2002 20
>> Yv2 2002 20
>> Yv2 2002 10
>> Yv2 2002 10
>>
>> out put will be
>>
>> city
>> xc1 2001 2
>> xc1 2002 3
>> yv1 2001 1
>> yv2 2002 3
>>
>>
>> Below is my starting code
>> count=0
>> fo=open("dat", "r+")
>> str = fo.read();
>> print "Read String is : ", str
>>
>> fo.close()
>
> Below's some of the basics that you want to study. Also look up the csv
> module in Python's standard library. You will want to learn these things
> even if you end up using some sort of third-party data-frame library (I
> don't know those but they exist).
>
> from collections import Counter
>
> # collections.Counter is a special dictionary type for just this
> counts = Counter()
>
> # with statement ensures closing the file
> with open("dat") as fo:
>     # file object provides lines
>     next(fo) # skip header line
>     for line in fo:
>         # test requires non-empty string, but lines
>         # contain at least newline character so ok
>         if line.isspace(): continue
>         # .split() at whitespace, omits empty fields
>         city, year, x = line.split()
>         # collections.Counter has default 0,
>         # key is a tuple (city, year), parentheses omitted here
>         counts[city, year] += 1
>
> print("city")
> for city, year in sorted(counts): # iterate over keys
>     print(city.lower(), year, counts[city, year], sep = "\t")
>
> # Alternatively:
> # for cy, n in sorted(counts.items()):
> #     city, year = cy
> #     print(city.lower(), year, n, sep = "\t")
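The follow-up request (a count by city alone, then both counts written next to the original rows) could be handled in the same style as Jussi's code. The sketch below is one possibility, not the only one: it keeps the rows in a list, builds two Counter objects, and writes the five requested columns. The data is embedded as a string so the example runs on its own, and write_detout is a name invented for the illustration.

```python
import io
from collections import Counter

# The sample data from the original post, including the blank lines.
raw = """city year x

XC1 2001 10
XC1 2001 20
XC1 2002 20
XC1 2002 10
XC1 2002 10

Yv2 2001 10
Yv2 2002 20
Yv2 2002 20
Yv2 2002 10
Yv2 2002 10
"""

# Keep every row so it can be written out again with its counts.
records = []
with io.StringIO(raw) as fo:      # real code: open("dat")
    next(fo)                      # skip header line
    for line in fo:
        if line.isspace():
            continue
        city, year, x = line.split()
        records.append((city, year, x))

cy_counts = Counter((city, year) for city, year, _ in records)
city_counts = Counter(city for city, _, _ in records)


def write_detout(out):
    """Write the original rows plus both counts to a file-like object."""
    print("City", "year", "x", "cycount", "citycount", sep="\t", file=out)
    for city, year, x in records:
        print(city, year, x, cy_counts[city, year], city_counts[city],
              sep="\t", file=out)


buffer = io.StringIO()
write_detout(buffer)
print(buffer.getvalue())

# To produce the actual file:
# with open("detout", "w") as out:
#     write_detout(out)
```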
Different sources of file
Hi all,

I have made a little progress using Python. I have five files to read from different sources and concatenate into one file. From each file I want to pick only a few columns (x1, x2 and x3). However, the columns are not recorded consistently.

1. x3 is a date. In one file it was recorded as a character string (2015/12/26), in another as 20151226, and in another as 26122015. How do I standardize these into one form (yyyymmdd, e.g. 20151226)? If there is no date, then delete that record.

2. The other variable is x2. In one of the files it was recorded as "M" and "F". In another file x2 is 1 for male and 2 for female. I want to change all to 1 or 2. If this variable is out of range (not M/F or 1/2), then delete that record.

3. After doing all this I want to combine all files into one and send it to output.

Finally, do some statistics, such as the number of records read from each file, the distribution of sex, and the total number of records sent out to a file.

Below is my attempt, but it is not great

#!/usr/bin/python
import sys
import csv
from collections import Counter

N=10
count=0
with open("file1") as f1:
    for line in f1:
        count+=1
print("Total Number of records read", count)

# I want to see the first few lines of the data

file1
Name x2 x3
Alex1 F 2015/02/11
Alex2 M 2012/01/27
Alex3 F 2011/10/20
Alex4 M .
Alex5 N 2003/11/14

file2
Name x2 x3
Bob1 1 2010-02-10
Bob2 2 2001-01-07
Bob3 1 2002-10-21
Bob4 2 2004-11-17
bob5 0 2009-11-19

file3
Name x2 x3
Alexa1 0 12102013
Alexa2 2 20012007
Alexa3 1 11052002
Alexa4 2 26112004
Alexa5 2 15072009

Output to a file
Name x2 x3
Alex1 2 20150211
Alex2 1 20120127
Alex3 2 20111020
Bob1 1 20100210
Bob2 2 20010107
Bob3 1 20021021
Bob4 2 20041117
Alexa2 2 20070120
Alexa3 1 20020511
Alexa4 2 20041126
Alexa5 2 20090715
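The cleaning rules above could be sketched with only the standard library, as below. Some assumptions are made for the illustration: each source is paired with the date format seen in its sample rows (rather than guessing the format per value), the sample rows are embedded as tuples instead of being read from disk, the third sample is treated as file3, and the names SOURCES, SEX_CODES and clean are invented for the example.

```python
from collections import Counter
from datetime import datetime

# M/F and 1/2 both map to the codes 1 and 2; anything else is rejected.
SEX_CODES = {"M": "1", "F": "2", "1": "1", "2": "2"}

# Each source: (date format used in that file, sample rows from the post).
SOURCES = {
    "file1": ("%Y/%m/%d", [
        ("Alex1", "F", "2015/02/11"), ("Alex2", "M", "2012/01/27"),
        ("Alex3", "F", "2011/10/20"), ("Alex4", "M", "."),
        ("Alex5", "N", "2003/11/14"),
    ]),
    "file2": ("%Y-%m-%d", [
        ("Bob1", "1", "2010-02-10"), ("Bob2", "2", "2001-01-07"),
        ("Bob3", "1", "2002-10-21"), ("Bob4", "2", "2004-11-17"),
        ("bob5", "0", "2009-11-19"),
    ]),
    "file3": ("%d%m%Y", [
        ("Alexa1", "0", "12102013"), ("Alexa2", "2", "20012007"),
        ("Alexa3", "1", "11052002"), ("Alexa4", "2", "26112004"),
        ("Alexa5", "2", "15072009"),
    ]),
}


def clean(record, date_format):
    """Return (name, sex code, yyyymmdd) or None if the record is invalid."""
    name, sex, date = record
    code = SEX_CODES.get(sex)
    if code is None:
        return None
    try:
        parsed = datetime.strptime(date, date_format)
    except ValueError:          # missing or malformed date
        return None
    return name, code, parsed.strftime("%Y%m%d")


read_counts = Counter()
output = []
for source, (date_format, records) in SOURCES.items():
    for record in records:
        read_counts[source] += 1
        cleaned = clean(record, date_format)
        if cleaned is not None:
            output.append(cleaned)

sex_counts = Counter(sex for _, sex, _ in output)

for row in output:
    print(*row)
print("records read per file:", dict(read_counts))
print("distribution of sex:", dict(sex_counts))
print("records written:", len(output))
```

Pairing each file with its own format avoids ambiguity: a string such as 20111020 is a valid date both as yyyymmdd and as ddmmyyyy, so guessing per value could silently pick the wrong one.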
file -SAS
Hi all,

I am trying to read a sas7bdat file using the following

from sas7bdat import SAS7BDAT
with SAS7BDAT('test.sas7bdat') as f:
    for row in f:
        print row    ### I want print the first 10 row. how can I do that?

I got this error message

    from sas7bdat import SAS7BDAT
ImportError: No module named sas7bdat

What did I miss?

Val
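The ImportError usually means the sas7bdat package is not installed for the interpreter that runs the script; installing it for that same interpreter (for example with pip) is the likely fix, though that is an assumption about this particular setup. For the first-10-rows part, itertools.islice works with any iterable. The sketch below uses a made-up generator in place of the SAS reader so that it runs without the package; first_rows is a name invented for the example.

```python
from itertools import islice


def first_rows(rows, n=10):
    """Return at most the first n items from any iterable of rows."""
    return list(islice(rows, n))


# Stand-in for the reader object; with the real package the loop would be
#     with SAS7BDAT('test.sas7bdat') as f:
#         for row in first_rows(f):
#             print(row)
sample_rows = ([i, i * i] for i in range(25))

for row in first_rows(sample_rows):
    print(row)
```

islice stops reading after n items, so the rest of a large file is never loaded.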
course
Hi all,

Is there an online course in Python? I am looking for a level between beginner and intermediate. I would appreciate it if you could suggest one.

Thank you.