key and ..

2016-11-17 Thread Val Krem via Python-list


Hi all,
Sorry for asking such a basic question, but I am trying to merge two files (file1
and file2) by their first column (key) and then do some processing.
Here is a description of the files and what I would like to do.


file1

key    c1      c2
1     759     939
2     345  154571
3     251  350711
4    3749   22159
5     676   76953
6      46     756


file2
key    p1      p2
1     759     939
2     345  154571
3     251  350711
4    3915   23254
5    7676   77953
7     256    4562

create file3
a) merge the two files by key, keeping the keys that exist in both file1 and file2
b) create two variables, dcp1 = c1 - p1 and dcp2 = c2 - p2
c) sort file3 by dcp2 (descending) and output

create file4: keys which exist in file1 but not in file2
create file5: keys which exist in file2 but not in file1


Desired output files

file3
key    c1      c2    p1      p2   dcp1   dcp2
4    3749   22159  3915   23254   -166  -1095
5     676   76953  7676   77953  -7000  -1000
1     759     939   759     939      0      0
2     345  154571   345  154571      0      0
3     251  350711   251  350711      0      0

file4
key   c1   c2
6     46  756

file5
key p1   p2
7  256  4562



Thank you in advance
-- 
https://mail.python.org/mailman/listinfo/python-list
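
One way to approach this with only the standard library is sketched below. It assumes each file has already been read into a dictionary keyed by the first column (the dictionaries here simply hold the sample rows from the post), and it follows the stated instruction to sort by dcp2 in descending order. The desired file3 listing in the post is actually in ascending order; removing the minus sign in the sort key reproduces that listing instead.

```python
# Sample rows from the post, keyed by the first column (an assumed layout).
file1 = {1: (759, 939), 2: (345, 154571), 3: (251, 350711),
         4: (3749, 22159), 5: (676, 76953), 6: (46, 756)}
file2 = {1: (759, 939), 2: (345, 154571), 3: (251, 350711),
         4: (3915, 23254), 5: (7676, 77953), 7: (256, 4562)}

# file3: keys present in both files, with the two differences appended.
file3 = []
for key in file1.keys() & file2.keys():
    c1, c2 = file1[key]
    p1, p2 = file2[key]
    file3.append((key, c1, c2, p1, p2, c1 - p1, c2 - p2))

# Sort by dcp2 (last field), largest first; the key breaks ties.
file3.sort(key=lambda row: (-row[6], row[0]))

# file4 / file5: keys found in only one of the two files.
file4 = [(key, *file1[key]) for key in sorted(file1.keys() - file2.keys())]
file5 = [(key, *file2[key]) for key in sorted(file2.keys() - file1.keys())]

print("key c1 c2 p1 p2 dcp1 dcp2")
for row in file3:
    print(*row)
print("file4:", file4)
print("file5:", file5)
```

Reading the real files into such dictionaries could be done with the csv module or str.split(), depending on how the columns are separated.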


data frame

2016-12-23 Thread Val Krem via Python-list
Hi all,

#!/usr/bin/env python
import sys
import csv
import numpy as np
import pandas as pd

a = pd.read_csv("s1.csv")
print(a)

 size   w1   h1
0  512  214   26
1  123  250   34
2  234  124   25
3  334  213   43
4  a45  223   32
5  a12  214   26

I wanted to create a new column by adding the two column values 
as follows

a['test'] = a['w1'] + a['h1']

Traceback (most recent call last):
  File "/data/apps/Intel/intelpython35/lib/python3.5/site-packages/pandas/indexes/base.py", line 2104, in get_loc
    return self._engine.get_loc(key)
  File "pandas/index.pyx", line 139, in pandas.index.IndexEngine.get_loc (pandas/index.c:4152)
  File "pandas/index.pyx", line 161, in pandas.index.IndexEngine.get_loc (pandas/index.c:4016)
  File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13153)
  File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13107)
KeyError: 'w1'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tt.py", line 16, in <module>
    a['test']=a['w1'] + a['h1']

  File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13107)
KeyError: 'w1'

Can someone help me find out what the problem is?

Thank you in advance


Re: data frame

2016-12-23 Thread Val Krem via Python-list
Here are the first few lines of the data


s1.csv 
size,w1,h1
512,214,26
123,250,34
234,124,25
334,213,43

and the script

a=pd.read_csv("s1.csv", skipinitialspace=True).keys()
print(a)
I see the following

Index(['size', 'w1', 'h1'], dtype='object')



When I try to add the two columns, I get the following message.

a=pd.read_csv("s1.csv", skipinitialspace=True).keys()
a['test']=a['w1'] + a['h1']
print(a)




/data/apps/Intel/intelpython35/lib/python3.5/site-packages/pandas/indexes/base.py:1393: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  return getitem(key)
Traceback (most recent call last):
  File "tt.py", line 12, in <module>
    a['test']=a['w1'] + a['h1']
  File "/data/apps/Intel/intelpython35/lib/python3.5/site-packages/pandas/indexes/base.py", line 1393, in __getitem__
    return getitem(key)
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices



On Friday, December 23, 2016 3:09 PM, Peter Otten <__pete...@web.de> wrote:
Val Krem via Python-list wrote:

> Hi all,
>
> #!/usr/bin/env python
> import sys
> import csv
> import numpy as np
> import pandas as  pd
>
> a= pd.read_csv("s1.csv")
> print(a)
>
>  size  w1  h1
> 0  512  214  26
> 1  123  250  34
> 2  234  124  25
> 3  334  213  43
> 4  a45  223  32
> 5  a12  214  26
>
> I wanted to create a new column by adding the two column values
> as follows
>
> a['test'] = a['w1'] + a['h1']
>
> Traceback (most recent call last):
> File
> "/data/apps/Intel/intelpython35/lib/python3.5/site-packages/pandas/indexes/base.py",
> line 2104, in get_loc return self._engine.get_loc(key) File
> "pandas/index.pyx", line 139, in pandas.index.IndexEngine.get_loc
> (pandas/index.c:4152) File "pandas/index.pyx", line 161, in
> pandas.index.IndexEngine.get_loc (pandas/index.c:4016) File
> "pandas/src/hashtable_class_helper.pxi", line 732, in
> pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13153)
> File "pandas/src/hashtable_class_helper.pxi", line 740, in
> pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13107)
> KeyError: 'w1'
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
> File "tt.py", line 16, in 
> a['test']=a['w1'] + a['h1']
>
> File "pandas/src/hashtable_class_helper.pxi", line 740, in
> pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13107)
> KeyError: 'w1'
>
> Can someone help me what the problem is?
>
> Thank you in advance

Have a look at a.keys(). I suspect that the column names have an extra leading space:

>>> pd.read_csv("s1.csv").keys()
Index([u'size', u' w1', u' h1'], dtype='object')

If that's what you see, you can fix it by reading the csv with
skipinitialspace=True:

>>> pd.read_csv("s1.csv", skipinitialspace=True).keys()
Index([u'size', u'w1', u'h1'], dtype='object')
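
The effect of the stray space can be reproduced without the original file. The sketch below assumes the CSV has a space after each comma, as suspected in the reply, and reads the text from an in-memory buffer (csv_text is made-up sample content, not the poster's file). It also shows stripping the names after reading as an alternative.

```python
import io

import pandas as pd

# Assumed file contents: a space after every comma.
csv_text = "size, w1, h1\n512, 214, 26\n123, 250, 34\n"

# Default parsing keeps the leading space in the column names.
raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))

# skipinitialspace=True drops the space that follows each delimiter.
clean = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
print(list(clean.columns))

# Alternative: strip whitespace from the names after reading.
stripped = raw.rename(columns=str.strip)
print(list(stripped.columns))

# With clean names the original addition works.
clean["test"] = clean["w1"] + clean["h1"]
print(clean)
```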


Re: data frame

2016-12-24 Thread Val Krem via Python-list
Thank you Peter and Christ.
It was a whitespace problem and the fix resolved it.
Many thanks.




On Friday, December 23, 2016 5:26 PM, Peter Otten <__pete...@web.de> wrote:
Val Krem via Python-list wrote:

> Here is the first few lines of the data
> 
> 
> s1.csv
> size,w1,h1
> 512,214,26
> 123,250,34
> 234,124,25
> 334,213,43

Did you put these lines here using copy and paste? The fix below depends on 
the assumption that your data is more like

size, w1, h1
512, 214, 26
123, 250, 34
...

> a=pd.read_csv("s1.csv", skipinitialspace=True).keys()

You should use the keys() method call for diagnosis only. The final script 
that might work if your problem is actually space after the commas is

import pandas as  pd

a = pd.read_csv("s1.csv", skipinitialspace=True)
a["test"] = a["h1"] + a["w1"]
print(a)
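
The IndexError in the earlier message follows from assigning the result of keys() to a: that call returns the column labels (a pandas Index), not the table, so a['w1'] no longer looks up a column. A small illustration, using in-memory sample data rather than the real s1.csv:

```python
import io

import pandas as pd

# Made-up sample content standing in for s1.csv.
csv_text = "size,w1,h1\n512,214,26\n123,250,34\n"

frame = pd.read_csv(io.StringIO(csv_text))
names = frame.keys()  # the column labels only

print(type(frame).__name__)  # DataFrame
print(type(names).__name__)  # Index

# Column arithmetic belongs on the DataFrame, not on the labels.
frame["test"] = frame["w1"] + frame["h1"]
print(frame)
```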




data

2016-12-29 Thread Val Krem via Python-list
Hi all,
I have a sample data set and would like to summarize it in the following way.


ID,class,y
1,12,10
1,12,10
1,12,20
1,13,20
1,13,10
1,13,10
1,14,20
2,21,20
2,21,20
2,21,10
2,23,10
2,23,20
2,34,20
2,34,10
2,35,10

I want to get the total count by ID and the number of classes
by ID. The y variable is either 10 or 20, and I want to count each value by ID.

The result should look as follows.

ID,class,count,10's,20's
1,3,7,4,3
2,4,8,4,4

I can do this in two or more steps. Is there an efficient way of doing it?


I used

pd.crosstab(a['ID'],a['y'],margins=True)
and got

ID,10's,20's,all
1,4,3,7
2,4,4,8

but I would also like to get the class count, as follows

ID,class,10's,20's,all
1,3,4,3,7
2,4,4,4,8

How do I do it in Python?
thank you in advance
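
A possible approach, sketched under the assumption that the data is read into a DataFrame named a as in the post: compute each piece per ID and assemble the pieces into one frame. The column labels 10's, 20's and all are chosen here only to mirror the desired output, and the data is read from an in-memory copy of the sample.

```python
import io

import pandas as pd

csv_text = """ID,class,y
1,12,10
1,12,10
1,12,20
1,13,20
1,13,10
1,13,10
1,14,20
2,21,20
2,21,20
2,21,10
2,23,10
2,23,20
2,34,20
2,34,10
2,35,10
"""
a = pd.read_csv(io.StringIO(csv_text))

# Counts of each y value per ID (columns are the integers 10 and 20).
counts = pd.crosstab(a["ID"], a["y"])

by_id = a.groupby("ID")
summary = pd.DataFrame({
    "class": by_id["class"].nunique(),  # distinct classes per ID
    "10's": counts[10],
    "20's": counts[20],
    "all": by_id.size(),                # total rows per ID
})
print(summary)
```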


crosstab output

2017-01-06 Thread Val Krem via Python-list
Hi all,

How do I access the rows and columns of a data frame crosstab output?


Here is the code, using sample data, and its output.

a= pd.read_csv("cross.dat", skipinitialspace=True)
xc=pd.crosstab(a['nam'],a['x1'],margins=True)

print(xc)

x1   0  1
nam
A1   3  2
A2   1  4

I want to create a new variable: 2/(3+2) for the first row (A1)
and 4/(1+4) for the second row (A2).

Final data frame would be
A1 3 2  0.4
A2 1 4  0.8

Thank you in advance
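
A sketch of how the cells of a crosstab result can be addressed, since the result is an ordinary DataFrame. The input rows are hypothetical, chosen only to reproduce the counts shown above, and margins is left out so that the frame holds only the two real columns.

```python
import pandas as pd

# Hypothetical rows that reproduce the counts in the post.
a = pd.DataFrame({
    "nam": ["A1"] * 5 + ["A2"] * 5,
    "x1": [0, 0, 0, 1, 1, 0, 1, 1, 1, 1],
})

xc = pd.crosstab(a["nam"], a["x1"])

print(xc.loc["A1", 1])  # one cell: row label, then column label
print(xc[1])            # one column as a Series
print(xc.loc["A2"])     # one row as a Series

# New column: share of x1 == 1 within each row.
xc["ratio"] = xc[1] / (xc[0] + xc[1])
print(xc)
```

If margins=True is kept, the row total is available as the All column, so the ratio could be written against that column instead.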


Read and count

2016-03-10 Thread Val Krem via Python-list
Hi all,

I am new to Python (moving from R to Python) and am trying to read a data set and
count the number of observations by year for each city.


The data set looks like
city  year  x

XC1   2001  10
XC1   2001  20
XC1   2002  20
XC1   2002  10
XC1   2002  10

Yv2   2001  10
Yv2   2002  20
Yv2   2002  20
Yv2   2002  10
Yv2   2002  10

The output will be

city
xc1   2001  2
xc1   2002  3
yv2   2001  1
yv2   2002  4


Below is my starting code
count = 0
with open("dat") as fo:
    text = fo.read()
print("Read String is : ", text)


Many thanks


Re: Read and count

2016-03-10 Thread Val Krem via Python-list
Thank you very much for the help.

First I wanted to count by city and year.
City  year  count
Xc1   2001  1
Xc1   2002  3
Yv2   2001  1
Yv2   2002  4
This worked fine!

Now I want to count by city only
City  count
Xc1   4
Yv2   5

Then combine these two objects with the original data and send it to a file
called "detout" with these columns:

"City", "year", "x", "cycount", "citycount"

Many thanks again
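
Building on the Counter approach from the quoted reply, the per-city count and the combined output could be sketched as follows. The sample rows come from the first post, and the in-memory buffers stand in for the files dat and detout so that the example runs on its own.

```python
import csv
import io
from collections import Counter

# Sample rows from the first post; a real script would open "dat" instead.
data = """city year x
XC1 2001 10
XC1 2001 20
XC1 2002 20
XC1 2002 10
XC1 2002 10

Yv2 2001 10
Yv2 2002 20
Yv2 2002 20
Yv2 2002 10
Yv2 2002 10
"""

source = io.StringIO(data)
next(source)  # skip the header line
records = [line.split() for line in source if not line.isspace()]

# One counter per grouping: (city, year) pairs and city alone.
by_city_year = Counter((city, year) for city, year, x in records)
by_city = Counter(city for city, year, x in records)

# Stand-in for open("detout", "w", newline="").
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["city", "year", "x", "cycount", "citycount"])
for city, year, x in records:
    writer.writerow([city, year, x, by_city_year[city, year], by_city[city]])

print(out.getvalue())
```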






This worked fine. I tried to count by city only and to combine the three objects
together.

City  count
Xc1   4
Yv2   5



> On Mar 10, 2016, at 3:11 AM, Jussi Piitulainen wrote:
> 
> Val Krem writes:
> 
>> Hi all,
>> 
>> I am a new learner about python (moving from R to python) and trying
>> read and count the number of observation by year for each city.
>> 
>> 
>> The data set look like
>> city year  x 
>> 
>> XC1 2001  10
>> XC1   2001  20
>> XC1   2002   20
>> XC1   2002   10
>> XC1 2002   10
>> 
>> Yv2 2001   10
>> Yv2 2002   20
>> Yv2 2002   20
>> Yv2 2002   10
>> Yv2 2002   10
>> 
>> out put will be
>> 
>> city
>> xc1  2001  2
>> xc1   2002  3
>> yv1  2001  1
>> yv2  2002  3
>> 
>> 
>> Below is my starting code
>> count=0
>> fo=open("dat", "r+")
>> str = fo.read();
>> print "Read String is : ", str
>> 
>> fo.close()
> 
> Below's some of the basics that you want to study. Also look up the csv
> module in Python's standard library. You will want to learn these things
> even if you end up using some sort of third-party data-frame library (I
> don't know those but they exist).
> 
> from collections import Counter
> 
> # collections.Counter is a special dictionary type for just this
> counts = Counter()
> 
> # with statement ensures closing the file
> with open("dat") as fo:
>     # file object provides lines
>     next(fo) # skip header line
>     for line in fo:
>         # test requires non-empty string, but lines
>         # contain at least newline character so ok
>         if line.isspace(): continue
>         # .split() at whitespace, omits empty fields
>         city, year, x = line.split()
>         # collections.Counter has default 0,
>         # key is a tuple (city, year), parentheses omitted here
>         counts[city, year] += 1
> 
> print("city")
> for city, year in sorted(counts): # iterate over keys
>     print(city.lower(), year, counts[city, year], sep = "\t")
> 
> # Alternatively:
> # for cy, n in sorted(counts.items()):
> #     city, year = cy
> #     print(city.lower(), year, n, sep = "\t")


Different sources of file

2016-03-14 Thread Val Krem via Python-list


Hi all,



I have made a little progress in using Python.
I have five files to read from different sources and concatenate into one
file. From each file I want to pick only a few columns (x1, x2 and x3).

1. The date column, x3, was recorded differently in each file: as a
character string (2015/12/26) in one file, as 20151226 in another, and as
26122015 in a third. How do I standardize these into one form (yyyymmdd,
e.g. 20151226)? If there is no date, delete that record.

2. The other variable is x2. In one of the files it was recorded as "M" and
"F". In another file x2 is 1 for male and 2 for female. I want to
change all values to 1 or 2. If this variable is outside M/F or 1/2,
delete that record.

3. After doing all this I want to combine all the files into one and send it to
output.

Finally, do some statistics, such as the number of records read from each file,
the distribution of sex, and the total number of records sent out to a file.

Below is my attempt, but it is not great
#!/usr/bin/python
import sys
import csv
from collections import Counter

N = 10
count = 0
with open("file1") as f1:
    for line in f1:
        count += 1
print("Total Number of records read", count)
# I want to see the first few lines of the data


file1
Name   x2  x3
Alex1  F   2015/02/11
Alex2  M   2012/01/27
Alex3  F   2011/10/20
Alex4  M   .
Alex5  N   2003/11/14

file2
Name  x2  x3
Bob1  1   2010-02-10
Bob2  2   2001-01-07
Bob3  1   2002-10-21
Bob4  2   2004-11-17
bob5  0   2009-11-19

file3
Name    x2  x3
Alexa1  0   12102013
Alexa2  2   20012007
Alexa3  1   11052002
Alexa4  2   26112004
Alexa5  2   15072009

Output to a file 
Name    x2  x3
Alex1   2   20150211
Alex2   1   20120127
Alex3   2   20111020
Bob1    1   20100210
Bob2    2   20010107
Bob3    1   20021021
Bob4    2   20041117
Alexa2  2   20070120
Alexa3  1   20020511
Alexa4  2   20041126
Alexa5  2   20090715
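
The cleaning rules described above can be sketched with the standard library. The sketch assumes each source file has one known date layout, so a strptime format is supplied per file; the function name clean_record and the sources table are illustrative, and only a few of the sample rows are included.

```python
from collections import Counter
from datetime import datetime

# Accepted sex codes mapped to the target coding (1 = male, 2 = female).
SEX_CODES = {"M": "1", "F": "2", "1": "1", "2": "2"}


def clean_record(name, sex, date_text, date_format):
    """Return (name, sex_code, yyyymmdd), or None if the record is dropped."""
    code = SEX_CODES.get(sex.strip().upper())
    if code is None:
        return None
    try:
        parsed = datetime.strptime(date_text.strip(), date_format)
    except ValueError:  # missing or malformed date such as "."
        return None
    return name, code, parsed.strftime("%Y%m%d")


# One assumed date format per source; rows copied from the samples above.
sources = {
    "file1": ("%Y/%m/%d", [("Alex1", "F", "2015/02/11"),
                           ("Alex4", "M", "."),
                           ("Alex5", "N", "2003/11/14")]),
    "file2": ("%Y-%m-%d", [("Bob1", "1", "2010-02-10"),
                           ("bob5", "0", "2009-11-19")]),
    "file3": ("%d%m%Y", [("Alexa1", "0", "12102013"),
                         ("Alexa2", "2", "20012007")]),
}

records_read = Counter()
kept = []
for source, (date_format, rows) in sources.items():
    for row in rows:
        records_read[source] += 1
        cleaned = clean_record(*row, date_format)
        if cleaned is not None:
            kept.append(cleaned)

sex_distribution = Counter(code for _, code, _ in kept)

for record in kept:
    print(*record)
print("read per file:", dict(records_read))
print("sex distribution:", dict(sex_distribution))
print("records written:", len(kept))
```

Passing the format explicitly avoids guessing between yyyymmdd and ddmmyyyy, which cannot always be told apart from the digits alone.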


file -SAS

2016-03-19 Thread Val Krem via Python-list
Hi all,

I am trying to read a sas7bdat file using the following:



from sas7bdat import SAS7BDAT

with SAS7BDAT('test.sas7bdat') as f:
    for row in f:
        print(row)   ### I want to print the first 10 rows. How can I do that?


I got this error message:


from sas7bdat import SAS7BDAT
ImportError: No module named sas7bdat

What did I miss?
Val
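
The ImportError suggests that the sas7bdat package is not installed for the interpreter running the script; installing it (typically with pip install sas7bdat) is the usual remedy. For the first ten rows, itertools.islice works on any iterable, which the reader object appears to be, given the for loop in the post. The sketch uses stand-in rows so that it runs without the package; first_rows is a hypothetical helper name.

```python
from itertools import islice


def first_rows(rows, n=10):
    """Return at most the first n items of any iterable as a list."""
    return list(islice(rows, n))


# Stand-in rows so the sketch runs without the sas7bdat package.
sample_rows = ([i, i * i] for i in range(25))

for row in first_rows(sample_rows):
    print(row)
```

With the package installed, the same helper would be called as first_rows(f) inside the with block from the post.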


course

2017-06-19 Thread Val Krem via Python-list
Hi all,

Is there an online course in Python? I am looking for a level between beginner
and intermediate. I would appreciate it if you could suggest one.

Thank you.