Why 0's in the data frame? Shouldn't that be 1 (vendor with that account)?

Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Tue, Nov 17, 2020 at 3:29 PM Tom Woolman <twool...@ontargettek.com> wrote:

> Hi Bill. Sorry to be so obtuse with the example data, I was trying
> (too hard) not to share any actual values so I just created randomized
> values for my example; of course I should have specified that the
> random values would not provide the expected problem pattern. I should
> have just used simple dummy codes as Bill Dunlap did.
>
> So per Bill's example data for Data1, the expected (hoped for) output
> should be:
>
>   Vendor Account Num_Vendors_Sharing_Bank_Acct
> 1     V1      A1                             0
> 2     V2      A2                             3
> 3     V3      A2                             3
> 4     V4      A2                             3
>
>
> Where the new calculated variable is Num_Vendors_Sharing_Bank_Acct.
> The value is 3 for V2, V3 and V4 because they each share bank account
> A2.
>
>
> Likewise, in the Data2 frame, the same logic applies:
>
>   Vendor Account Num_Vendors_Sharing_Bank_Acct
> 1     V1      A1                             0
> 2     V2      A2                             3
> 3     V3      A2                             3
> 4     V1      A2                             3
> 5     V4      A3                             0
> 6     V2      A4                             0
>
>
> Thanks!
>
>
> Quoting Bill Dunlap <williamwdun...@gmail.com>:
>
> > What should the result be for
> >    Data1 <- data.frame(Vendor=c("V1","V2","V3","V4"),
> >                        Account=c("A1","A2","A2","A2"))
> > ?
> >
> > Must each vendor have only one account? If not, what should the result be
> > for
> >    Data2 <- data.frame(Vendor=c("V1","V2","V3","V1","V4","V2"),
> >                        Account=c("A1","A2","A2","A2","A3","A4"))
> > ?
> >
> > -Bill
> >
> > On Tue, Nov 17, 2020 at 1:20 PM Tom Woolman <twool...@ontargettek.com>
> > wrote:
> >
> >> Hi everyone. I have a dataframe that is a collection of Vendor IDs
> >> plus a bank account number for each vendor. I'm trying to find a way
> >> to count the number of duplicate bank accounts that occur in more than
> >> one unique Vendor_ID, and then assign the count value for each row in
> >> the dataframe in a new variable.
> >>
> >> I can do a count of bank accounts that occur within the same vendor
> >> using dplyr and group_by and count, but I can't figure out a way to
> >> count duplicates among multiple Vendor_IDs.
> >>
> >>
> >> Dataframe example code:
> >>
> >>
> >> #Create a sample data frame:
> >>
> >> set.seed(1)
> >>
> >> Data <- data.frame(Vendor_ID = sample(1:10000), Bank_Account_ID =
> >> sample(1:10000))
> >>
> >>
> >> Thanks in advance for any help.
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
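
For readers following the thread, here is a minimal base-R sketch of the calculation Tom describes, run on Bill Dunlap's Data1 and Data2. It is only one possible approach, not a solution posted in the thread. The helper name count_sharing and its zero_if_unshared argument are hypothetical, introduced here for illustration. The sketch assumes that "sharing" means the number of distinct vendors seen for an account, and that an account with a single vendor is reported as 0, as in Tom's tables.

```r
# Bill Dunlap's example data, copied from the thread.
Data1 <- data.frame(Vendor  = c("V1", "V2", "V3", "V4"),
                    Account = c("A1", "A2", "A2", "A2"))
Data2 <- data.frame(Vendor  = c("V1", "V2", "V3", "V1", "V4", "V2"),
                    Account = c("A1", "A2", "A2", "A2", "A3", "A4"))

# Hypothetical helper: its name and arguments are assumptions,
# not something that appears in the thread.
count_sharing <- function(d, zero_if_unshared = TRUE) {
  # Number of distinct vendors seen for each account.
  per_account <- tapply(as.character(d$Vendor), d$Account,
                        function(v) length(unique(v)))
  # Look up the per-account count for every row of the data frame.
  n <- as.integer(per_account[as.character(d$Account)])
  # Tom's tables show 0 where an account has only one vendor.
  if (zero_if_unshared) n[n == 1L] <- 0L
  n
}

Data1$Num_Vendors_Sharing_Bank_Acct <- count_sharing(Data1)
Data2$Num_Vendors_Sharing_Bank_Acct <- count_sharing(Data2)
print(Data1)
print(Data2)
```

Calling count_sharing(Data1, zero_if_unshared = FALSE) instead leaves the V1/A1 row at 1, which is the convention Bert's question points to. Which of the two is wanted is a choice for the original poster.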