py_genetic wrote:
> I have an H5 file with one group (off the root) and two large main
> tables, and I'm attempting to aggregate my data into 50+ new groups (off
> the root) with two tables per subgroup.
>
> sys info:
> PyTables version: 1.3.2
> HDF5 version: 1.6.5
> numarray version: 1.5.0
> Zlib version: 1.2.3
> BZIP2 version: 1.0.3 (15-Feb-2005)
> Python version: 2.4.2 (#1, Jul 13 2006, 20:16:08)
> [GCC 4.0.1 (Apple Computer, Inc. build 5250)]
> Platform: darwin-Power Macintosh (v10.4.7)
> Byte-ordering: big
>
> I ran all the PyTables tests included with the package and received an OK.
>
> Using the following code I get one of three errors:
>
> 1. Illegal Instruction
>
> 2. Malloc(): trying to call free() twice
>
> 3. Bus Error
>
> I believe all three stem from the same issue, a malloc()
> memory problem in the PyTables C libraries. I also believe this may be
> due to how I'm writing my sorting script.
>
> The script executes fine until I'm sorting somewhere around
> group 20 to 30, and then I hit one of the three errors above, depending on
> how and when I call flush() and close() on the file. When I open the file
> after the error with h5ls, all tables are in perfect order up to the crash,
> and if I continue from that point everything runs fine until Python throws
> the same error again after another 10 sorts or so. The somewhat random
> crashing is what leads me to believe I have a memory leak or my method
> of doing this is incorrect.
>
> Is there a better way to aggregate data using PyTables/Python? Is there
> a better way to be doing this? It seems straightforward enough.
>
> Thanks,
> Conor
>
> #function to agg state data from main neg/pos tables into neg/pos state tables
>
> import string
> import tables
>
>
> def aggstate(state, h5file):
>
>     print state
>
>     class PosRecords(tables.IsDescription):
>         sic = tables.IntCol(0, 1, 4, 0, None, 0)
>         numsic = tables.IntCol(0, 1, 4, 0, None, 0)
>         empsiz = tables.StringCol(1, '?', 1, None, 0)
>         salvol = tables.StringCol(1, '?', 1, None, 0)
>         popcod = tables.StringCol(1, '?', 1, None, 0)
>         state = tables.StringCol(2, '?', 1, None, 0)
>         zip = tables.IntCol(0, 1, 4, 0, None, 1)
>
>     class NegRecords(tables.IsDescription):
>         sic = tables.IntCol(0, 1, 4, 0, None, 0)
>         numsic = tables.IntCol(0, 1, 4, 0, None, 0)
>         empsiz = tables.StringCol(1, '?', 1, None, 0)
>         salvol = tables.StringCol(1, '?', 1, None, 0)
>         popcod = tables.StringCol(1, '?', 1, None, 0)
>         state = tables.StringCol(2, '?', 1, None, 0)
>         zip = tables.IntCol(0, 1, 4, 0, None, 1)
>
>     group1 = h5file.createGroup("/", state+"_raw_records", state+" raw records")
>
>     table1 = h5file.createTable(group1, "pos_records", PosRecords, state+" raw pos record table")
>     table2 = h5file.createTable(group1, "neg_records", NegRecords, state+" raw neg record table")
>
>     table = h5file.root.raw_records.pos_records
>     point = table1.row
>     for x in table.iterrows():
>         if x['state'] == state:
>             point['sic'] = x['sic']
>             point['numsic'] = x['numsic']
>             point['empsiz'] = x['empsiz']
>             point['salvol'] = x['salvol']
>             point['popcod'] = x['popcod']
>             point['state'] = x['state']
>             point['zip'] = x['zip']
>
>             point.append()
>
>     h5file.flush()
>
>     table = h5file.root.raw_records.neg_records
>     point = table2.row
>     for x in table.iterrows():
>         if x['state'] == state:
>             point['sic'] = x['sic']
>             point['numsic'] = x['numsic']
>             point['empsiz'] = x['empsiz']
>             point['salvol'] = x['salvol']
>             point['popcod'] = x['popcod']
>             point['state'] = x['state']
>             point['zip'] = x['zip']
>
>             point.append()
>
>     h5file.flush()
>
>
> states = ['AL','AK','AZ','AR','CA','CO','CT','DC','DE','FL','GA','HI','ID','IL','IN','IA','KS','KY','LA','ME','MD','MA','MI','MN','MS','MO','MT','NE','NV','NH','NJ','NM','NY','NC','ND','OH','OK','OR','PA','RI','SC','SD','TN','TX','UT','VT','VA','WA','WV','WI','WY']
>
> h5file = tables.openFile("200309_data.h5", mode = 'a')
>
> for i in xrange(len(states)):
>     aggstate(states[i], h5file)
>
> h5file.close()
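Independent of the crashes, the script above rescans both main tables in full once per state, which is 51 passes over each table. A minimal single-pass alternative is to read each source table once and bucket the matching rows by state. The sketch below uses plain dicts as stand-ins for PyTables rows (it assumes rows support `row['state']`-style access, as PyTables Row objects do), and `partition_by_state` is a hypothetical helper, not part of PyTables.

```python
# Sketch (assumption): rows behave like mappings, e.g. row['state'], as
# PyTables Row objects do; plain dicts are used here so it runs standalone.
# partition_by_state is a hypothetical helper, not a PyTables function.


def partition_by_state(rows, states, fields):
    """Group rows by their 'state' value in a single pass.

    Returns {state: [tuple_of_field_values, ...]} for states in `states`;
    states with no matching rows are simply absent from the result.
    """
    wanted = set(states)
    buckets = {}
    for row in rows:
        st = row['state']
        if st in wanted:
            # Copy the values out as a tuple: PyTables reuses one Row object
            # while iterating, so storing the row itself would not be safe.
            buckets.setdefault(st, []).append(tuple([row[f] for f in fields]))
    return buckets


# Usage with stand-in data; `fields` should match the destination table's
# column order (for a real PyTables table, e.g. its colnames attribute).
sample = [
    {'sic': 1, 'state': 'AL', 'zip': 35004},
    {'sic': 2, 'state': 'AK', 'zip': 99501},
    {'sic': 3, 'state': 'AL', 'zip': 35005},
]
by_state = partition_by_state(sample, ['AL', 'AK', 'AZ'], ('sic', 'state', 'zip'))
# by_state['AL'] == [(1, 'AL', 35004), (3, 'AL', 35005)]
```

Each bucket could then be written to its per-state table with a single `append()` call on that table followed by its `flush()`, provided the tuples follow the table's column order (PyTables orders IsDescription columns without an explicit `pos` alphabetically, so checking `colnames` is safer than assuming the declaration order). The trade-off is memory: all selected rows are held at once, so for very large tables you might process the states in a few batches instead. Under Python 3, PyTables returns StringCol values as bytes, so the state codes would need to be `b'AL'` and so on.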
The problem with my post above is that h5file.flush() should be table.flush() (flush the table, not the whole file object). Although h5file.flush() is a real method, I don't believe it correctly writes to the tables here: it caused all kinds of issues as time went on, and I think it overlaps with .close(), causing more issues. I also flushed table1 and table2 right after creating the new group and the two tables on each iteration. Things are stable now; PyTables is great.

--
http://mail.python.org/mailman/listinfo/python-list