Taking two and more data frames and extracting data on unique keys in python

Danish Hussain Sun, 05 Jun 2016 04:07:03 -0700

This is a somewhat long question, so please be patient. First, I have two data 
frames. In the first one, each row holds the name of a user followed by the 
pages that user has liked. The first column is the user's name and the liked 
pages are stored across the row, so the number of columns differs from person 
to person: the row for 'random guy' can be a different length from the row for 
'mank nion'. 'BlackBuck', '500 Startups', etc. are page names. Let's say this 
data frame is named User_page. Here is an example:


random guy      BlackBuck            GiveMeSport    Analytics Ninja 
mank nion       DJ CHETAS            Celina Jaitly  Gurkeerat Singh
pop rajuel      WOW Editions         500 Startups   Biswapati Sarkar
Roshan ghai     MensXP               No Abuse       the smartian 
Next, I have another data frame with the same layout as the one above, but in 
place of each page's name it holds the page's category. As you might know, 
pages on Facebook belong to different categories; for example, the category of 
'BlackBuck' is 'Transport/Freight'. There are pages with the same name but 
different categories, which is why I can't use the page name alone as a key. 
Let's say this data frame is named User_category. This is how it looks:

random guy      Transport/Freight    Sport      Insurance Company 
mank nion       Arts/Entertainment   Actress    Actor/Director
pop rajuel      Concert Tour         App Page   Actor/Director
Roshan ghai     News/Media Website   Community  Public Figure  
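To make the layout concrete, here is a minimal pandas sketch of how the two 
ragged frames could be reshaped to one row per (user, page) pair. It assumes, 
which I have not verified for the real data, that the k-th page column of 
User_page lines up with the k-th category column of User_category. The helper 
to_long and the 'slot' column are hypothetical names used only for 
illustration, and the sample rows are a small subset of the tables above.

```python
import pandas as pd

# Ragged rows are padded with None, as pandas would do when reading uneven rows.
user_page = pd.DataFrame(
    [
        ["random guy", "BlackBuck", "GiveMeSport", "Analytics Ninja"],
        ["mank nion", "DJ CHETAS", "Celina Jaitly", None],
    ]
)
user_category = pd.DataFrame(
    [
        ["random guy", "Transport/Freight", "Sport", "Insurance Company"],
        ["mank nion", "Arts/Entertainment", "Actress", None],
    ]
)


def to_long(wide, value_name):
    """Hypothetical helper: turn a wide, ragged frame into one row per cell."""
    wide = wide.copy()
    wide.columns = ["username"] + [f"slot{k}" for k in range(1, wide.shape[1])]
    long = wide.melt(id_vars="username", var_name="slot", value_name=value_name)
    # Drop the padding cells of users who liked fewer pages.
    return long.dropna(subset=[value_name])


# Pair each page with its category by (user, column position), not by page name.
likes = to_long(user_page, "page").merge(
    to_long(user_category, "category"), on=["username", "slot"], how="left"
)
```

Matching on position rather than on page name avoids the problem of pages that 
share a name but have different categories.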
Now I have two more data frames. The first has the name of a Facebook page as 
its first column, followed by 162 more columns, one per tag. Element (i, j) is 
1 if the i-th page belongs to the j-th tag and is left empty otherwise. Let's 
say this data frame is named Page_tag. It looks like this:

    name of page               tag1        tag2        tag3
    BlackBuck                     1          1             
    GiveMeSport                   1                      1
    Analytics Ninja               1                      1
    DJ CHETAS                                1           1
The other one has the category name as its first column, followed by the same 
162 tag columns. Let's say this data frame is named Category_tag. It looks like this:

   category_name               tag1        tag2        tag3
    Sport                                     1           1
    App Page                      1                       1
    Actor/Director                1                                        
    Public Figure                         1               1
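Since the empty cells mean "tag not present", one way to prepare such a table 
for counting is to treat blanks as 0 and index it by name. The sketch below 
does this for the Page_tag sample, with only three tag columns instead of 162. 
It assumes that the blanks arrive as NaN after loading and that page names are 
unique within Page_tag; I have not checked either against the real data.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the Page_tag table; NaN stands for an empty cell.
page_tag = pd.DataFrame(
    {
        "page": ["BlackBuck", "GiveMeSport", "Analytics Ninja", "DJ CHETAS"],
        "tag1": [1, 1, 1, np.nan],
        "tag2": [1, np.nan, np.nan, 1],
        "tag3": [np.nan, 1, 1, 1],
    }
)

# Index by page name and turn blanks into 0 so the columns can be summed.
page_tag = page_tag.set_index("page").fillna(0).astype(int)

# Looking up pages: names missing from the table come back as all-zero rows.
looked_up = page_tag.reindex(["BlackBuck", "Celina Jaitly"], fill_value=0)
```

The same preparation would apply to Category_tag, indexed by category_name.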
What I need is the tag counts for each user, based on the pages that user has 
liked. First, I check whether each liked page exists in Page_tag (the third 
data frame in this question). If it does, I count how many times each tag 
appears for that user. The number of pages in Page_tag is limited, so some 
liked pages will not be found there. For those left-out pages I fall back to 
the page's category (from User_category, the second data frame) and count the 
tags for that category from Category_tag (the fourth data frame). Finally, I 
sum the tag counts per user, so my output looks something like this. Output:

username             tag1                   tag2           tag3 
random guy              1                      2             2 
mank nion               2                      1             3
pop rajuel              4                      0             2 
Roshan ghai             0                      2             1
Element (i, j) of this data frame is the number of times the j-th tag appears 
for the i-th user. I have written code for this and more in R, but I am stuck 
on this particular step. The R code wasn't optimal because I used loops many 
times. I want to do this efficiently, and hopefully it can be done in pandas. 
Please let me know if any clarification is needed. Any help will be 
appreciated. Thank you.
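
For reference, the steps described above could be sketched in pandas roughly 
as follows, without explicit loops. This is only an illustration under several 
assumptions: the likes are already in long form with one row per (user, page, 
category), page names are unique in Page_tag, category names are unique in 
Category_tag, and blanks have been replaced by 0. The function count_tags and 
the tiny sample tables are made up for the example and are not the real data.

```python
import numpy as np
import pandas as pd

# One row per liked page, with the page's category alongside it.
likes = pd.DataFrame(
    {
        "username": ["random guy", "random guy", "random guy",
                     "mank nion", "mank nion"],
        "page": ["BlackBuck", "GiveMeSport", "Analytics Ninja",
                 "DJ CHETAS", "Gurkeerat Singh"],
        "category": ["Transport/Freight", "Sport", "Insurance Company",
                     "Arts/Entertainment", "Actor/Director"],
    }
)
# Toy tag tables (3 tags instead of 162), indexed by page / category name.
page_tag = pd.DataFrame(
    {"tag1": [1, 1, 0], "tag2": [1, 0, 1], "tag3": [0, 1, 1]},
    index=pd.Index(["BlackBuck", "GiveMeSport", "DJ CHETAS"], name="page"),
)
category_tag = pd.DataFrame(
    {"tag1": [0, 1], "tag2": [1, 0], "tag3": [1, 0]},
    index=pd.Index(["Sport", "Actor/Director"], name="category"),
)


def count_tags(likes, page_tag, category_tag):
    """Per-user tag counts: use Page_tag when the page is listed there,
    otherwise fall back to the page's category in Category_tag."""
    tags = page_tag.columns
    known = likes["page"].isin(page_tag.index).to_numpy()
    # Rows for names missing from a table are filled with zeros.
    from_page = page_tag.reindex(index=likes["page"], fill_value=0).to_numpy()
    from_cat = (
        category_tag[tags]
        .reindex(index=likes["category"], fill_value=0)
        .to_numpy()
    )
    # Pick the page row where the page is known, else the category row.
    per_like = pd.DataFrame(
        np.where(known[:, None], from_page, from_cat),
        columns=tags,
        index=likes.index,
    )
    return per_like.groupby(likes["username"], sort=False).sum()


result = count_tags(likes, page_tag, category_tag)
```

With these toy tables, 'Analytics Ninja' is in neither table and contributes 
nothing, while 'Gurkeerat Singh' is counted through its category 
'Actor/Director'.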
-- 
https://mail.python.org/mailman/listinfo/python-list
