Hi Hemant, the solution is really quite similar, and the logic is identical:
library(readr)
library(dplyr)
library(stringr)
library(tidyr)

data_help <- read_csv("data_help.csv")
cat_help <- read_csv("cat_help.csv")

# Helper function to split the Items and create a data_frame
# with one row per item, keeping the Purchase_ID of the receipt
split_items <- function(items){
  x <- items$Items_purchased_on_Receipts %>%
    str_split(pattern = ",") %>%
    unlist(use.names = FALSE)

  data_frame(Item = x, Purchase_ID = items$Purchase_ID)
}

# Number the receipts, then split each receipt into its items
data_help <- data_help %>%
  mutate(Purchase_ID = 1:n()) %>%
  group_by(Purchase_ID) %>%
  do(split_items(.))

# Reshape the categories to long format, join the purchased items,
# and spread back so that each category becomes a column
cat_help %>%
  gather("Foo", "Item") %>%
  filter(!is.na(Item)) %>%
  left_join(data_help, by = "Item") %>%
  group_by(Foo, Purchase_ID) %>%
  summarise(Item = paste(Item, collapse = ", ")) %>%
  spread(key = "Foo", value = "Item")

HTH
Ulrik

On Wed, 30 Aug 2017 at 13:22 Hemant Sain <hemantsai...@gmail.com> wrote:

> Using these two tables, we have to create a third table in this format,
> where the categories are across the top and the transactions are in the
> rows.
>
> On 30 August 2017 at 16:42, Hemant Sain <hemantsai...@gmail.com> wrote:
>
>> Hello Ulrik,
>> Can you please check this code once more on the following data set? It
>> doesn't give me the same output as with the previous demo data set,
>> because the splitting is done on the basis of the quantity, and in the
>> real data set the quantity is missing. So please use the following data
>> set and help me out. Please consider this my final email; I won't bother
>> you again, but this is about my job, so please help me.
>>
>> Note: the file I'm attaching is very confidential.
>>
>> On 30 August 2017 at 15:02, Ulrik Stervbo <ulrik.ster...@gmail.com>
>> wrote:
>>
>>> Hi Hemant,
>>>
>>> Does this help you along?
>>>
>>> table_1 <- textConnection("Item_1;Item_2;Item_3
>>> 1KG banana;300ML milk;1kg sugar
>>> 2Large Corona_Beer;2pack Fries;
>>> 2 Lux_Soap;1kg sugar;")
>>>
>>> table_1 <- read.csv(table_1, sep = ";", na.strings = "",
>>>                     stringsAsFactors = FALSE, check.names = FALSE)
>>>
>>> table_2 <- textConnection("Toiletries;Fruits;Beverages;Snacks;Vegetables;Clothings;Dairy Products
>>> Soap;banana;Corona_Beer;King Burger;Pumpkin;Adidas Sport Tshirt XL;milk
>>> Shampoo;Mango;Red Label Whisky;Fries;Potato;Nike Shorts Black L;Butter
>>> Showergel;Oranges;grey Cocktail;cheese pizza;Tomato;Puma Jersy red M;sugar
>>> Lux_Soap;;2 Large corona Beer;;Cheese;Toothpaste")
>>>
>>> table_2 <- read.csv(table_2, sep = ";", na.strings = "",
>>>                     stringsAsFactors = FALSE, check.names = FALSE)
>>>
>>> library(tidyr)
>>> library(dplyr)
>>>
>>> table_2 <- gather(table_2, "Category", "Item")
>>>
>>> table_1 <- gather(table_1, "Foo", "Item") %>%
>>>   filter(!is.na(Item))
>>>
>>> table_1 <- separate(table_1, col = "Item", into = c("Quantity", "Item"),
>>>                     sep = " ")
>>>
>>> table_3 <- left_join(table_1, table_2, by = "Item") %>%
>>>   mutate(Item = paste(Quantity, Item)) %>%
>>>   select(-Quantity)
>>>
>>> table_3 %>%
>>>   group_by(Foo, Category) %>%
>>>   summarise(Item = paste(Item, collapse = ", ")) %>%
>>>   spread(key = "Category", value = "Item")
>>>
>>> You need to figure out how to handle words written in different cases
>>> and how to extract the quantity in a universal way. For the code above,
>>> I corrected these things by hand in the example data.
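The two open points mentioned above (mixed upper and lower case, and a quantity written in varying ways) could perhaps be handled along the following lines. This is only a sketch in base R: the helper name `split_quantity` and the regular expression are assumptions made for illustration, not part of the code above. It assumes that the quantity is a leading number, optionally followed directly by a unit or size word (as in "1KG" or "2Large"), and that everything after the first run of spaces is the item name.

```r
# Hypothetical helper (not from the thread): split strings such as
# "1KG banana" or "2 Lux_Soap" into a quantity and a lower-cased item name.
split_quantity <- function(x) {
  x <- trimws(x)
  # Assumed pattern: leading digits, optional letters attached to them,
  # one or more spaces, then the item name
  pattern <- "^([0-9]+[[:alpha:]]*)[[:space:]]+(.*)$"
  has_qty <- grepl(pattern, x)
  data.frame(
    # Entries without a leading number get NA as their quantity
    Quantity = ifelse(has_qty, sub(pattern, "\\1", x), NA_character_),
    # Lower-case the item so that "Banana" and "banana" match in a join
    Item = tolower(ifelse(has_qty, sub(pattern, "\\2", x), x)),
    stringsAsFactors = FALSE
  )
}

split_quantity(c("1KG banana", "2Large Corona_Beer", "2 Lux_Soap", "Butter"))
```

For the join to work, the items in the category table would have to be lower-cased in the same way. Entries such as "2 Large corona Beer", where the size word is separated from the number by a space, are not covered by the assumed pattern and would still need attention.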
>>>
>>> HTH
>>> Ulrik
>>>
>>> On Wed, 30 Aug 2017 at 10:16 Hemant Sain <hemantsai...@gmail.com> wrote:
>>>
>>>> Hey PIKAL,
>>>> It's not homework, and that is not the real dataset either. I have
>>>> signed an NDA with my company, so I cannot share the original data
>>>> file. I'm actually working on a market basket analysis task, but I'm
>>>> not able to convert my existing data table to an appropriate format so
>>>> that I can apply the Apriori algorithm using R. It is very important
>>>> for me to get this done, because I'm an intern and if I don't get it
>>>> done they will not hire me as a full-time employee.
>>>> I tried everything by myself but was not able to get it done.
>>>> Your precious 10-15 minutes can save my upcoming years, so please, if
>>>> you can, help me through this.
>>>> I want another dataset based on the first two datasets I have
>>>> mentioned.
>>>>
>>>> Thanks
>>>>
>>>> On 30 August 2017 at 12:49, PIKAL Petr <petr.pi...@precheza.cz> wrote:
>>>>
>>>> > Hi
>>>> >
>>>> > This looks like homework to me, and this help list has a no-homework
>>>> > policy.
>>>> >
>>>> > What do you want to do with your table 3? It seems futile to me.
>>>> >
>>>> > Anyway, some combination of melt, merge, cast and regular expressions
>>>> > could be employed for such a task, but it could be rather tricky.
>>>> >
>>>> > But be aware that
>>>> >
>>>> > Suger does not match sugar (and I wonder why sugar is a dairy product)
>>>> >
>>>> > and you mix uppercase and lowercase letters, which could also be
>>>> > problematic when matching words.
>>>> >
>>>> > Cheers
>>>> > Petr
>>>> >
>>>> > > -----Original Message-----
>>>> > > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Hemant Sain
>>>> > > Sent: Wednesday, August 30, 2017 8:28 AM
>>>> > > To: r-help@r-project.org
>>>> > > Subject: [R] Dataframe Manipulation
>>>> > >
>>>> > > I want to do a market basket analysis, and I'm trying to create a
>>>> > > dataset for that. I have two tables. One table contains the daily
>>>> > > transactions of products, in which each row shows the items
>>>> > > purchased by a customer. The second table contains the parent
>>>> > > groups that those products fall under; for example, under the fruit
>>>> > > category there are several fruits such as mango, banana, apple, etc.
>>>> > > I want to create a third table in which the parent groups, which
>>>> > > can be extracted from Table 2, are the headers, and each row
>>>> > > represents a transaction with the names of the products purchased.
>>>> > > If there is no transaction for a parent category, the cell should
>>>> > > be filled with NA. Please help me with R or C/C++ code (R would be
>>>> > > preferred). I'm attaching all three tables for reference: I have
>>>> > > the first two tables and I want to get a table like Table 3.
>>>> > >
>>>> > > Tables are explained in the attached doc.
>>>> > >
>>>> > > --
>>>> > > hemantsain.com
>>>>
>>>> --
>>>> hemantsain.com
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.