Jussi Piitulainen wrote:

> Val Krem writes:
>
>> Hi all,
>>
>> I am a new learner about python (moving from R to python) and trying
>> read and count the number of observation by year for each city.
>>
>>
>> The data set look like
>> city year x
>>
>> XC1 2001 10
>> XC1 2001 20
>> XC1 2002 20
>> XC1 2002 10
>> XC1 2002 10
>>
>> Yv2 2001 10
>> Yv2 2002 20
>> Yv2 2002 20
>> Yv2 2002 10
>> Yv2 2002 10
>>
>> out put will be
>>
>> city
>> xc1 2001 2
>> xc1 2002 3
>> yv1 2001 1
>> yv2 2002 3
>>
>>
>> Below is my starting code
>> count=0
>> fo=open("dat", "r+")
>> str = fo.read();
>> print "Read String is : ", str
>>
>> fo.close()
>
> Below's some of the basics that you want to study. Also look up the csv
> module in Python's standard library. You will want to learn these things
> even if you end up using some sort of third-party data-frame library (I
> don't know those but they exist).
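For the csv module suggestion quoted above, a minimal sketch might look like the following. It assumes the fields are separated by single spaces with no trailing blanks; the function name count_by_city_year and the inline sample string are my own, used only so the snippet runs without a file.

```python
import csv
import io
from collections import Counter


def count_by_city_year(lines):
    """Count rows per (city, year); `lines` is any iterable of text lines."""
    # Assumption: fields are separated by single spaces, no trailing blanks.
    # DictReader takes the column names from the header line and
    # skips completely empty lines by itself.
    reader = csv.DictReader(lines, delimiter=" ", skipinitialspace=True)
    counts = Counter()
    for row in reader:
        # Both fields stay strings; the key is the tuple (city, year).
        counts[row["city"], row["year"]] += 1
    return counts


# Hypothetical in-memory copy of the data from the original question.
sample = """city year x
XC1 2001 10
XC1 2001 20
XC1 2002 20
XC1 2002 10
XC1 2002 10

Yv2 2001 10
Yv2 2002 20
Yv2 2002 20
Yv2 2002 10
Yv2 2002 10
"""

counts = count_by_city_year(io.StringIO(sample))
for (city, year), n in sorted(counts.items()):
    print(city.lower(), year, n, sep="\t")
```

With a real file you would pass the open file object instead of the io.StringIO wrapper. If the columns are padded with a varying number of blanks or tabs, plain line.split() is probably the safer choice.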
With pandas:

$ cat sample.txt
city year x
XC1 2001 10
XC1 2001 20
XC1 2002 20
XC1 2002 10
XC1 2002 10
Yv2 2001 10
Yv2 2002 20
Yv2 2002 20
Yv2 2002 10
Yv2 2002 10
$ python3
Python 3.4.3 (default, Oct 14 2015, 20:28:29)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
>>> table = pandas.read_csv("sample.txt", delimiter=r"\s+")
>>> table
  city  year   x
0  XC1  2001  10
1  XC1  2001  20
2  XC1  2002  20
3  XC1  2002  10
4  XC1  2002  10
5  Yv2  2001  10
6  Yv2  2002  20
7  Yv2  2002  20
8  Yv2  2002  10
9  Yv2  2002  10

[10 rows x 3 columns]
>>> table.groupby(["city", "year"])["x"].count()
city  year
XC1   2001    2
      2002    3
Yv2   2001    1
      2002    4
dtype: int64

> from collections import Counter
>
> # collections.Counter is a special dictionary type for just this
> counts = Counter()
>
> # with statement ensures closing the file
> with open("dat") as fo:
>     # file object provides lines
>     next(fo) # skip header line
>     for line in fo:
>         # test requires non-empty string, but lines
>         # contain at least newline character so ok
>         if line.isspace(): continue
>         # .split() at whitespace, omits empty fields
>         city, year, x = line.split()
>         # collections.Counter has default 0,
>         # key is a tuple (city, year), parentheses omitted here
>         counts[city, year] += 1
>
> print("city")
> for city, year in sorted(counts): # iterate over keys
>     print(city.lower(), year, counts[city, year], sep = "\t")
>
> # Alternatively:
> # for cy, n in sorted(counts.items()):
> #    city, year = cy
> #    print(city.lower(), year, n, sep = "\t")

-- 
https://mail.python.org/mailman/listinfo/python-list