On Wed, Jul 30, 2014 at 5:57 PM, Vincent Davis <vinc...@vincentdavis.net> wrote:
>
> On Wed, Jul 30, 2014 at 6:28 PM, Vincent Davis <vinc...@vincentdavis.net>
> wrote:
>
>> The real slow part seems to be
>> for n in drugs:
>>     df[n] = df[['MED1','MED2','MED3','MED4','MED5']].isin([drugs[n]]).any(1)
>>
>
> I was wrong, this is fast, it was selecting the columns that was slow.
> using
> keep_col = ['PATCODE', 'PATWT', 'VDAYR', 'VMONTH', 'MED1', 'MED2',
> 'MED3', 'MED4', 'MED5']
> df = df[keep_col]
>
> took the time down from 19sec to 2 sec.
>

And that shows why profiling is important before attempting to optimize :).

On Wed, Jul 30, 2014 at 5:57 PM, Steven D'Aprano <
steve+comp.lang.pyt...@pearwood.info> wrote:

> ['a', 'b', 'c', 'd', 'e', ..., 'zzz']
>
> that is, a total of 26 + 26**2 + 26**3 = 18278 items. Now suppose you
> delete item 0, 'a':
>
> => ['b', 'c', 'd', 'e', ..., 'zzz']
>
> Python has to move the remaining 18278 items across one space. Then you
> delete 'b':
>

Really minor issue: I believe this should read 18277 items :).

> => ['c', 'd', 'e', ..., 'zzz']
>
> I'm not familiar with pandas and am not sure about the exact syntax
> needed, but something like:
>
> new_df = []  # Assuming df is a list.
> for col in df:
>     if col.value in keep_col:
>         new_df.append(col)
>

Another way to write this, using a list comprehension (untested):

new_df = [col for col in df if col.value in keep_col]

Also note that, although keep_col is fairly short in the code shown, you may
still see performance gains if keep_col is a set (O(1) average lookup) rather
than a list (O(n) lookup). You would do this by using:

keep_col = set(('PATCODE', 'PATWT', 'VDAY', 'VMONTH', 'VYEAR', 'MED1',
'MED2', 'MED3', 'MED4', 'MED5'))

rather than your existing:

keep_col = ['PATCODE', 'PATWT', 'VDAY', 'VMONTH', 'VYEAR', 'MED1', 'MED2',
'MED3', 'MED4', 'MED5']

This applies anywhere you use the "in" operator. Note, however, that building
a set is somewhat slower than building the equivalent list, so you'd want to
make sure the set is created once, outside of any large loop.

Chris
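A minimal, stdlib-only sketch (not from the thread) of the two points above:
rebuilding a list in one pass with a comprehension instead of deleting items
one at a time, and testing membership against a set instead of a list. It
reuses Steven's 'a'..'zzz' list as sample data; the helper names
delete_unwanted and keep_wanted, and the choice of "all two-letter names" as
the wanted collection, are hypothetical. It works on plain Python lists, not
pandas DataFrames, and the timings it prints will vary by machine.

```python
import timeit
from itertools import product
from string import ascii_lowercase

# Hypothetical helpers illustrating the thread's advice on plain lists (not pandas).

# Steven's example list: 'a'..'z', then 'aa'..'zz', then 'aaa'..'zzz'.
names = [''.join(p)
         for n in (1, 2, 3)
         for p in product(ascii_lowercase, repeat=n)]


def delete_unwanted(items, wanted):
    """Delete unwanted entries from a copy of `items`, one `del` at a time.

    Each `del` shifts every later element left by one slot, so removing
    most of a long list this way costs roughly O(n**2) element moves.
    """
    items = list(items)  # work on a copy; the caller's list is untouched
    i = 0
    while i < len(items):
        if items[i] in wanted:
            i += 1
        else:
            del items[i]
    return items


def keep_wanted(items, wanted):
    """Build a new list in a single pass with a list comprehension."""
    return [x for x in items if x in wanted]


if __name__ == "__main__":
    print(len(names))      # 26 + 26**2 + 26**3 = 18278
    print(len(names[1:]))  # after deleting 'a': 18277

    # Hypothetical "wanted" collection: all 676 two-letter names.
    wanted_list = [x for x in names if len(x) == 2]
    wanted_set = set(wanted_list)  # built once, outside the timed calls

    for label, func, wanted in [
        ("del in place, set lookup  ", delete_unwanted, wanted_set),
        ("comprehension, list lookup", keep_wanted, wanted_list),
        ("comprehension, set lookup ", keep_wanted, wanted_set),
    ]:
        seconds = timeit.timeit(lambda: func(names, wanted), number=3)
        print("{0}: {1:.3f} s".format(label, seconds))
```

All three variants return the same items in the same order; only the cost
differs. For a real DataFrame, df[keep_col] as in Vincent's message is still
the way to select columns. If you adopt the set suggestion there, keep passing
a list to df[...] itself: a set has no defined order, and recent pandas
versions reject a set as a column indexer.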
-- https://mail.python.org/mailman/listinfo/python-list