Hello Ulrik,

Can you please explain what this code does and how it works? I'm not able to understand it. If you can explain it, I can reuse it in the future with a little modification.
Thanks

data_help <- data_help %>%
  mutate(Purchase_ID = 1:n()) %>%
  group_by(Purchase_ID) %>%
  do(split_items(.))

cat_help %>% gather("Foo", "Item") %>%
  filter(!is.na(Item)) %>%
  left_join(data_help, by = "Item") %>%
  group_by(Foo, Purchase_ID) %>%
  summarise(Item = paste(Item, collapse = ", ")) %>%
  spread(key = "Foo", value = "Item")

On 31 August 2017 at 13:17, Ulrik Stervbo <ulrik.ster...@gmail.com> wrote:

> Hi Hemant,
>
> the solution is really quite similar, and the logic is identical:
>
> library(readr)
> library(dplyr)
> library(stringr)
> library(tidyr)
>
> data_help <- read_csv("data_help.csv")
> cat_help <- read_csv("cat_help.csv")
>
> # Helper function to split the Items and create a data_frame
> split_items <- function(items){
>   x <- items$Items_purchased_on_Receipts %>%
>     str_split(pattern = ",") %>%
>     unlist(use.names = FALSE)
>
>   data_frame(Item = x, Purchase_ID = items$Purchase_ID)
> }
>
> data_help <-
>   data_help %>%
>   mutate(Purchase_ID = 1:n()) %>%
>   group_by(Purchase_ID) %>%
>   do(split_items(.))
>
> cat_help %>% gather("Foo", "Item") %>%
>   filter(!is.na(Item)) %>%
>   left_join(data_help, by = "Item") %>%
>   group_by(Foo, Purchase_ID) %>%
>   summarise(Item = paste(Item, collapse = ", ")) %>%
>   spread(key = "Foo", value = "Item")
>
> HTH
> Ulrik
>
> On Wed, 30 Aug 2017 at 13:22 Hemant Sain <hemantsai...@gmail.com> wrote:
>
>> Using these two tables, we have to create a third table in this format,
>> where the categories are on top and the transactions are in the rows.
>>
>> On 30 August 2017 at 16:42, Hemant Sain <hemantsai...@gmail.com> wrote:
>>
>>> Hello Ulrik,
>>> Can you please check this code once again on the following data set?
>>> It doesn't give me the same output because the quantity is absent. In
>>> the previous demo data set the splitting is done on the basis of
>>> quantity, and in the real data set the quantity is missing.
>>> So please use the following data set and help me out. Please consider
>>> this mail my final email; I won't bother you again, but it's about my
>>> job, so please help me.
>>>
>>> Note* the file I'm attaching is very confidential
>>>
>>> On 30 August 2017 at 15:02, Ulrik Stervbo <ulrik.ster...@gmail.com>
>>> wrote:
>>>
>>>> Hi Hemant,
>>>>
>>>> Does this help you along?
>>>>
>>>> table_1 <- textConnection("Item_1;Item_2;Item_3
>>>> 1KG banana;300ML milk;1kg sugar
>>>> 2Large Corona_Beer;2pack Fries;
>>>> 2 Lux_Soap;1kg sugar;")
>>>>
>>>> table_1 <- read.csv(table_1, sep = ";", na.strings = "",
>>>>                     stringsAsFactors = FALSE, check.names = FALSE)
>>>>
>>>> table_2 <- textConnection("Toiletries;Fruits;Beverages;Snacks;Vegetables;Clothings;Dairy Products
>>>> Soap;banana;Corona_Beer;King Burger;Pumpkin;Adidas Sport Tshirt XL;milk
>>>> Shampoo;Mango;Red Label Whisky;Fries;Potato;Nike Shorts Black L;Butter
>>>> Showergel;Oranges;grey Cocktail;cheese pizza;Tomato;Puma Jersy red M;sugar
>>>> Lux_Soap;;2 Large corona Beer;;Cheese;Toothpaste")
>>>>
>>>> table_2 <- read.csv(table_2, sep = ";", na.strings = "",
>>>>                     stringsAsFactors = FALSE, check.names = FALSE)
>>>>
>>>> library(tidyr)
>>>> library(dplyr)
>>>>
>>>> table_2 <- gather(table_2, "Category", "Item")
>>>>
>>>> table_1 <- gather(table_1, "Foo", "Item") %>%
>>>>   filter(!is.na(Item))
>>>>
>>>> table_1 <- separate(table_1, col = "Item", into = c("Quantity",
>>>>                     "Item"), sep = " ")
>>>>
>>>> table_3 <- left_join(table_1, table_2, by = "Item") %>%
>>>>   mutate(Item = paste(Quantity, Item)) %>%
>>>>   select(-Quantity)
>>>>
>>>> table_3 %>%
>>>>   group_by(Foo, Category) %>%
>>>>   summarise(Item = paste(Item, collapse = ", ")) %>%
>>>>   spread(key = "Category", value = "Item")
>>>>
>>>> You need to figure out how to handle words written with different cases
>>>> and how to get the quantity in a universal way. For the code above, I
>>>> corrected these things by hand in the example data.
>>>>
>>>> HTH
>>>> Ulrik
>>>>
>>>> On Wed, 30 Aug 2017 at 10:16 Hemant Sain <hemantsai...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hey PIKAL,
>>>>> It's not homework, and that is not the real dataset either. I have
>>>>> signed an NDA with my company, so I can't share the original data
>>>>> file. Actually, I'm working on a market basket analysis task, but I'm
>>>>> not able to convert my existing data table to an appropriate format
>>>>> so that I can apply the Apriori algorithm using R. It is very
>>>>> important for me to get this done, because I'm an intern and if I
>>>>> don't get it done they are not going to hire me as a full-time
>>>>> employee.
>>>>> I tried everything by myself but was not able to get it done.
>>>>> Your precious 10-15 can save my upcoming years, so please, if you
>>>>> can, help me through this.
>>>>> I want another dataset based on the first two datasets I mentioned.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On 30 August 2017 at 12:49, PIKAL Petr <petr.pi...@precheza.cz> wrote:
>>>>>
>>>>> > Hi
>>>>> >
>>>>> > It seems to me like homework; there is a no-homework policy on this
>>>>> > help list.
>>>>> >
>>>>> > What do you want to do with your table 3? It seems futile to me.
>>>>> >
>>>>> > Anyway, some combination of melt, merge, cast and regular expressions
>>>>> > could be employed in such a task, but it could be rather tricky.
>>>>> >
>>>>> > But be aware that
>>>>> >
>>>>> > Suger does not match sugar (I wonder that sugar is a dairy product)
>>>>> >
>>>>> > and you mix uppercase and lowercase letters, which could also be
>>>>> > problematic when matching words.
>>>>> >
>>>>> > Cheers
>>>>> > Petr
>>>>> >
>>>>> > > -----Original Message-----
>>>>> > > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Hemant Sain
>>>>> > > Sent: Wednesday, August 30, 2017 8:28 AM
>>>>> > > To: r-help@r-project.org
>>>>> > > Subject: [R] Dataframe Manipulation
>>>>> > >
>>>>> > > I want to do a market basket analysis and I'm trying to create a
>>>>> > > dataset for that. I have two tables. The first table contains the
>>>>> > > daily transactions of products, in which each row shows the items
>>>>> > > purchased by a customer. The second table contains the parent
>>>>> > > groups under which those products fall; for example, under the
>>>>> > > fruit category there are several fruits like mango, banana, apple
>>>>> > > etc.
>>>>> > > I want to create a third table in which the parent groups, which
>>>>> > > can be extracted from Table 2, are the headers, and all the rows
>>>>> > > represent transactions of products with their names. If there is
>>>>> > > no transaction for a parent category, the cell should be filled
>>>>> > > with NA. Please help me with R or C/C++ code (R would be
>>>>> > > preferred). I'm attaching all three tables for better reference:
>>>>> > > I have the first two tables and I want to get a table like
>>>>> > > table 3.
>>>>> > >
>>>>> > > Tables are explained in the attached doc.
>>>>> > >
>>>>> > > --
>>>>> > > hemantsain.com
>>>>> >
>>>>> > ________________________________
>>>>> > This e-mail and any documents attached to it may be confidential and
>>>>> > are intended only for its intended recipients.
>>>>> > If you received this e-mail by mistake, please immediately inform its
>>>>> > sender. Delete the contents of this e-mail with all attachments and
>>>>> > its copies from your system.
>>>>> > If you are not the intended recipient of this e-mail, you are not
>>>>> > authorized to use, disseminate, copy or disclose this e-mail in any
>>>>> > manner.
>>>>> > The sender of this e-mail shall not be liable for any possible damage
>>>>> > caused by modifications of the e-mail or by delay with transfer of
>>>>> > the email.
>>>>> >
>>>>> > In case that this e-mail forms part of business dealings:
>>>>> > - the sender reserves the right to end negotiations about entering
>>>>> > into a contract at any time, for any reason, and without stating any
>>>>> > reasoning.
>>>>> > - if the e-mail contains an offer, the recipient is entitled to
>>>>> > immediately accept such offer; the sender of this e-mail (offer)
>>>>> > excludes any acceptance of the offer on the part of the recipient
>>>>> > containing any amendment or variation.
>>>>> > - the sender insists that the respective contract is concluded only
>>>>> > upon an express mutual agreement on all its aspects.
>>>>> > - the sender of this e-mail informs that he/she is not authorized to
>>>>> > enter into any contracts on behalf of the company except for cases in
>>>>> > which he/she is expressly authorized to do so in writing, and such
>>>>> > authorization or power of attorney is submitted to the recipient or
>>>>> > the person represented by the recipient, or the existence of such
>>>>> > authorization is known to the recipient or the person represented by
>>>>> > the recipient.
>>>>> >
>>>>>
>>>>> --
>>>>> hemantsain.com
>>>>>
>>>>> [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>> --
>>> hemantsain.com
>>>
>>
>> --
>> hemantsain.com
>>
>
--
hemantsain.com
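For readers who land on this question, the pipeline asked about can be walked through on a small example. What follows is only a sketch under stated assumptions, not Ulrik's own explanation: the two toy tables are invented stand-ins for data_help.csv and cat_help.csv (whose real contents were never posted), tibble() is used in place of the older data_frame(), and the intermediate names items_long, cats_long and result, as well as the two ungroup() calls, are added here for readability.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy stand-ins for data_help.csv and cat_help.csv (invented for illustration).
data_help <- tibble(
  Items_purchased_on_Receipts = c("banana,milk,sugar", "Fries,Soap")
)
cat_help <- tibble(
  Fruits     = c("banana", "Mango"),
  Dairy      = c("milk", "sugar"),
  Snacks     = c("Fries", NA),
  Toiletries = c("Soap", NA)
)

# Helper: takes the data of ONE receipt and returns one row per item.
split_items <- function(items) {
  x <- items$Items_purchased_on_Receipts %>%
    str_split(pattern = ",") %>%   # split the string at every comma (gives a list)
    unlist(use.names = FALSE)      # flatten the list to a plain character vector
  tibble(Item = x, Purchase_ID = items$Purchase_ID)
}

# Step 1: give every receipt an ID, then call split_items() once per receipt.
items_long <- data_help %>%
  mutate(Purchase_ID = 1:n()) %>%  # number the receipts 1, 2, ...
  group_by(Purchase_ID) %>%        # one group per receipt
  do(split_items(.)) %>%           # "." is the data of the current group
  ungroup()

# Step 2: reshape the category table from wide to long, so there is
# one row per (category, item) pair, and drop the empty cells.
cats_long <- cat_help %>%
  gather("Foo", "Item") %>%        # Foo holds the old column name (the category)
  filter(!is.na(Item))

# Step 3: attach the receipt ID to every category item, then collapse the
# items of each (category, receipt) pair into one comma-separated string.
result <- cats_long %>%
  left_join(items_long, by = "Item") %>%
  group_by(Foo, Purchase_ID) %>%
  summarise(Item = paste(Item, collapse = ", ")) %>%
  ungroup() %>%
  # Step 4: turn the categories back into column headers.
  spread(key = "Foo", value = "Item")

print(result)
```

On this toy data the result has one row per receipt and one column per category, with NA where a receipt contains nothing from that category. Because left_join() keeps every row of the category table, items that nobody bought ("Mango" here) end up in an extra row whose Purchase_ID is NA; adding filter(!is.na(Purchase_ID)) after the join would drop it if that row is not wanted.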