Hello Folks, I started using Pandas and am running into some issues while using it.
My main questions are:

- How do I set a primary key or index on a DataFrame?
- How do I construct a join of two DataFrames?

More precisely:

1. Running SQL queries against a DataFrame takes a lot more time than I would expect (the same query as an Excel filter is much faster).

I am setting the primary key as follows:

    import pandas as pd
    df_o = pd.read_csv('order_c.csv', encoding='latin1')
    df_oi = df_o.set_index(['c_br_code', 'n_srno', 'c_item_code'])

where the three names passed to set_index are the columns whose combination is the primary key.

Similarly, I have another CSV file from which I construct df_gi, and then I run a join query like this:

    sq = "select * from df_oi join df_gi on df_oi.c_br_code = df_gi.c_order_br_code and df_oi.n_srno = df_gi.n_order_no and df_oi.c_item_code = df_gi.c_item_code"

But this query never finishes. It takes 5 GB of memory, whereas the two CSV files are 250 MB and 350 MB respectively.

Also, any select query on df_oi still takes a lot of time (much more than Excel filters), so I am sure I am missing something there. Can you please help?

Thanks

_______________________________________________
BangPypers mailing list
BangPypers@python.org
https://mail.python.org/mailman/listinfo/bangpypers
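
P.S. To make the question concrete, here is a small self-contained sketch of the join I am after, written with the DataFrame merge method instead of an SQL string. The sample rows and the n_qty and n_grn_qty columns are made up for illustration; only the key column names come from my real files. I am assuming the key columns are left as ordinary columns here (no set_index), and I have not tried this on the full data, so I do not know whether it avoids the memory problem.

```python
import pandas as pd

# Tiny made-up stand-ins for the two CSV files (all values are hypothetical).
df_o = pd.DataFrame({
    "c_br_code": ["B1", "B1", "B2"],
    "n_srno": [1, 2, 1],
    "c_item_code": ["X", "Y", "X"],
    "n_qty": [10, 5, 7],            # hypothetical payload column
})
df_g = pd.DataFrame({
    "c_order_br_code": ["B1", "B2", "B3"],
    "n_order_no": [1, 1, 9],
    "c_item_code": ["X", "X", "Z"],
    "n_grn_qty": [10, 6, 1],        # hypothetical payload column
})

left_keys = ["c_br_code", "n_srno", "c_item_code"]
right_keys = ["c_order_br_code", "n_order_no", "c_item_code"]

# Check that the three columns really behave like a primary key:
# duplicated() flags rows whose key combination already appeared.
has_duplicate_keys = bool(df_o.duplicated(subset=left_keys).any())

# Inner join on the three key pairs, matching the "on ... and ..." clauses.
joined = df_o.merge(df_g, how="inner", left_on=left_keys, right_on=right_keys)

print(has_duplicate_keys)
print(joined)
```

If the key combination were not unique on both sides, every matching pair of rows would appear in the result, so the duplicate check is worth running on the real files before joining.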