On Wed, Jul 30, 2014 at 5:57 PM, Skip Montanaro wrote:
> > df = pd.read_csv('nhamcsopd2010.csv', index_col='PATCODE',
> >                  low_memory=False)
> > col_init = list(df.columns.values)
> > keep_col = ['PATCODE', 'PATWT', 'VDAY', 'VMONTH', 'VYEAR', 'MED1',
> >             'MED2', 'MED3', 'MED4', 'MED5']
> > for col in col_init:
> >     if col not in keep_col:
> >         del df[col]
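For comparison, the same trimming can be done without the Python loop,
either by selecting the wanted columns in one indexing step or by never
parsing the extras at all; a minimal sketch reusing the names above:

    import pandas as pd

    keep_col = ['PATCODE', 'PATWT', 'VDAY', 'VMONTH', 'VYEAR', 'MED1',
                'MED2', 'MED3', 'MED4', 'MED5']

    # Select the wanted columns in one shot (PATCODE becomes the index,
    # so it is excluded from the column selection).
    df = pd.read_csv('nhamcsopd2010.csv', index_col='PATCODE',
                     low_memory=False)
    df = df[[c for c in keep_col if c != 'PATCODE']]

    # Or skip the unwanted columns at parse time.
    df = pd.read_csv('nhamcsopd2010.csv', index_col='PATCODE',
                     usecols=keep_col, low_memory=False)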
On Thursday, July 31, 2014 7:58:59 AM UTC+5:30, Skip Montanaro wrote:
> As I am learning (often painfully) with pandas and JavaScript+(d3 or
> jQuery), loops are the enemy. You want to operate on large chunks of
> data simultaneously. In pandas, those chunks are thinly disguised
> numpy arrays.
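To make that concrete, a small sketch (the column name is taken from the
code quoted above) contrasting the two styles:

    import pandas as pd

    df = pd.DataFrame({'PATWT': [2.0, 3.5, 1.25]})

    # Row-at-a-time: one Python-level operation per element (slow).
    weights = [w * 2 for w in df['PATWT']]

    # Vectorized: a single numpy operation over the whole column (fast).
    weights = df['PATWT'] * 2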
On Wed, Jul 30, 2014 at 8:11 PM, Chris Kaynor wrote:
> Another way to write this, using a list expression (untested):
> new_df = [col for col in df if col.value in keep_col]
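As a side note, iterating over a DataFrame yields its column labels
(plain strings), so col.value would raise an AttributeError, and a bare
list comprehension produces a list of names rather than a frame. A
working variant of the same idea:

    # df and keep_col as defined earlier in the thread; iterating over
    # the frame yields column labels, so test the label directly.
    new_df = df[[col for col in df if col in keep_col]]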
(Now that I'm on a real keyboard, more complete responses are a bit easier.)
Regarding the issue of missing columns from keep_col, you could create
sets of what you have and what you want, and toss the rest:
toss_these = list(set(df.columns) - set(keep_col))
del df[toss_these]
Or something to that effect.
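One caveat: del df[...] takes a single column label, so passing the
whole list fails. drop() removes several columns in one call; a sketch
of the same set-based idea:

    # df and keep_col as defined earlier in the thread.
    toss_these = list(set(df.columns) - set(keep_col))
    df = df.drop(toss_these, axis=1)  # axis=1: drop columns, not rows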
On Wed, Jul 30, 2014 at 6:28 PM, Vincent Davis wrote:
> The real slow part seems to be
> for n in drugs:
>     df[n] = df[['MED1','MED2','MED3','MED4','MED5']].isin([drugs[n]]).any(1)
>
I was wrong, this is fast; it was selecting the columns that was slow,
using
keep_col = ['PATCODE', 'PATWT', 'VDAY', 'VMONTH', 'VYEAR', 'MED1',
            'MED2', 'MED3', 'MED4', 'MED5']
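A self-contained sketch of that flagging pattern, with a made-up drugs
mapping (flag name -> code) since the real one isn't shown in the
thread; .any(1) is shorthand for .any(axis=1), i.e. "any of the MED
columns in this row":

    import pandas as pd

    med_cols = ['MED1', 'MED2', 'MED3', 'MED4', 'MED5']
    df = pd.DataFrame([['D001', '', '', '', ''],
                       ['', '', 'D002', '', '']], columns=med_cols)
    drugs = {'ASPIRIN': 'D001', 'IBUPROFEN': 'D002'}  # hypothetical codes

    for n in drugs:
        # True wherever any of the five MED columns holds this drug's code.
        df[n] = df[med_cols].isin([drugs[n]]).any(axis=1)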
I know this is a general python list and I am asking about pandas, but
this question is probably not great for asking on stackoverflow.
I have a list of files (~80 files, ~30,000 rows) I need to process;
with my current code it takes minutes for each file. Any suggestions
of a faster way?