> On Dec 15, 2016, at 8:46 AM, Steven Nagy <nst...@gmail.com> wrote:
>
> I tried to send this email, but it didn't go through. I guess pictures are
> not allowed to send through HTML formatted emails?
> I'm re-sending it again without the picture, just comment there instead as
> placeholder.
>
> Thanks,
> Steven
>
>
> From: Steven Nagy [mailto:nst...@gmail.com]
> Sent: Monday, December 12, 2016 10:50 PM
> To: 'Bert Gunter' <bgunter.4...@gmail.com>
> Cc: 'R-help' <r-help@r-project.org>
> Subject: RE: [R] Need some help with regular expression
>
> Hi Bert and all,
>
> Sorry I was too busy at work and didn't have much time to continue this
> until now.
> So I studied "?regexp" and I can understand your regular expression now:
> sub(".*: *([[:alnum:]]* *-> *STU|STU *-> *[[:alnum:]]*).*","\\1",x)
>
> But I also wanted to split up these results in 2 columns, so your previous
> command would give me this result:
> [1] "NMA -> STU" "STU -> REG" "-> STU"
>
> and I wanted to further split them up to show this:
> From   To
> NMA    STU
> STU    REG
>        STU

So one more step:

  > strsplit( sub(".*: *([[:alnum:]]* *-> *STU|STU *-> *[[:alnum:]]*).*","\\1",x), split="-> ")
  [[1]]
  [1] "NMA " "STU"

  [[2]]
  [1] "STU " "REG"

  [[3]]
  [1] ""    "STU"

Well, maybe 2:

  > sapply( strsplit( sub(".*: *([[:alnum:]]* *-> *STU|STU *-> *[[:alnum:]]*).*","\\1",x), split="-> "), "[",1 )
  [1] "NMA " "STU " ""
  > sapply( strsplit( sub(".*: *([[:alnum:]]* *-> *STU|STU *-> *[[:alnum:]]*).*","\\1",x), split="-> "), "[",2 )
  [1] "STU" "REG" "STU"

>
> I still don’t quite understand the backreferences, and how could I have 2
> backreferences, one for the left side of the “->” sign and one for the right
> side?
>
> So it seems like I need to apply the “sub” function twice, similar how I
> used the “strapply” function twice in my original post:
> strapply(strapply(a, "(\\w+ -> STU|STU -> \\w+)", c, backref = -1,
> perl = TRUE), "(\\w+) -> (\\w+)", c, backref = -2, perl = TRUE)
>
> or maybe there would be a more simple way of using only 1 “sub” function and
> 2 backreferences?
>
> Also I’m not sure what do I do after I get the data? How could I represent
> the member type changes graphically? We need to analyze the behavior of
> switching from STU to another type or from another type to STU.
> Google Analytics has a nice chart under Behavior Flow, or Users Flow, and it
> looks like this:
> <here was my picture from Google Analytics - it's from Behavior Flow or
> Users Flow showing flows from one category to another one and further to
> another one>
>
> Is there any graphical representation in R that is similar to this?
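
On the question of two backreferences: one way this could be done, offered only as a sketch, is to keep the extraction step from earlier in the thread and then apply a second pattern with two capture groups, reading "\\1" for the left side and "\\2" for the right. The names `change`, `pair`, `moves` and `flows` are made up for this illustration, and the counting step at the end is just one assumed way to prepare input for a flow-style chart.

```r
# Example vector from earlier in the thread
x <- c("Name.MEMBER_TYPE: NMA -> STU ; CATEGORY: -> 1 ; CITY: MISSISSAUGA -> Mississauga ; ZIP: L5N1H9 -> L5N 1H9 ; COUNTRY: CAN -> ; MEMBER_STATUS: -> N",
       "Name.MEMBER_TYPE: STU -> REG ; CATEGORY: 1 ->",
       "Name.MEMBER_TYPE: -> STU")

# Step 1 (from the thread): keep only the "<from> -> <to>" piece involving STU
change <- sub(".*: *([[:alnum:]]* *-> *STU|STU *-> *[[:alnum:]]*).*", "\\1", x)

# Step 2 (illustrative): one pattern, two capture groups.
# Group 1 is whatever precedes "->", group 2 is whatever follows it.
pair <- "^([[:alnum:]]*) *-> *([[:alnum:]]*)$"
moves <- data.frame(From = sub(pair, "\\1", change),
                    To   = sub(pair, "\\2", change),
                    stringsAsFactors = FALSE)
moves$From  # "NMA" "STU" ""
moves$To    # "STU" "REG" "STU"

# Count each From/To combination; a table like this is the usual
# starting point for a flow (Sankey-style) diagram.
flows <- as.data.frame(table(From = moves$From, To = moves$To),
                       stringsAsFactors = FALSE)
flows <- flows[flows$Freq > 0, ]
```

Because the groups sit outside the spaces around "->", no trimming is needed afterwards, and a blank source type comes back as an empty string rather than an error. Drawing the flows themselves needs an add-on package, which is not shown here.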
>
> Thanks a lot,
> Steven
>
> -----Original Message-----
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Bert Gunter
> Sent: Sunday, November 20, 2016 10:05 PM
> To: Aliz Csonka <mailto:lyzae...@gmail.com>
> Cc: R-help <mailto:r-help@r-project.org>
> Subject: Re: [R] Need some help with regular expression
>
> Although others may respond, I think you will do much better studying
> ?regexp, which will answer all your questions. I believe the effort you will
> make figuring it out will pay dividends for your future R/regular expression
> usage that you cannot gain from my direct explanation.
>
> Good luck.
>
> Best,
> Bert
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Sun, Nov 20, 2016 at 6:40 PM, Steven Nagy <mailto:nst...@gmail.com>
> wrote:
>> Thanks a lot Bert. That's amazing. I am very new to both R and regular
>> expressions. I don't really understand the regular expression that you
>> used below.
>> And looks like I don't even need any special library, like the
>> "gsubfn" for the strapply function.
>> I was trying to use the regexr.com website to analyze your regular
>> expression, but it doesn't seem to match any text there.
>> Can you explain me the regular expression that you used?
>> ".*: *([[:alnum:]]* *-> *STU|STU *-> *[[:alnum:]]*).*"
>> So the dot in the front means any character and the star after that
>> means that it can repeat 0 or more times, right?
>> Then followed by a colon character ":" and a space, and what is the
>> next star after that? It means that the sequence before that again can
>> repeat 0 or more times?
>> And what are the double square brackets?
>> Is ":alnum:" specific to R? I don't think "regexr.com" understands
>> that. Or maybe that site is for regular expressions in Javascript, and
>> the syntax is different in R?
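
The pieces asked about above can be checked one at a time at the R prompt. The following is a small sketch using made-up test strings and variable names (`is_alnum`, `star_demo`, `greedy_demo`, `piece`); only the final pattern comes from the thread. "[[:alnum:]]" is a POSIX character class (letters and digits) written inside a bracket expression, as described in ?regexp. It is not specific to R, but JavaScript regular expressions do not have it, which may be why a JavaScript-based tester does not match anything.

```r
# "[[:alnum:]]*" means zero or more letters/digits, so a space breaks the
# match while the empty string still matches
is_alnum <- grepl("^[[:alnum:]]*$", c("L5N1H9", "L5N 1H9", ""))
is_alnum  # TRUE FALSE TRUE

# A star applies only to the single item right before it: in ": *" that
# item is the space, so this matches a colon followed by any number of spaces
star_demo <- sub(": *", "|", "a:   b")
star_demo  # "a|b"

# ".*" is greedy: it runs as far right as it can while the rest of the
# pattern still matches, so it stops at the LAST colon here
greedy_demo <- sub("^.*: *", "", "A: 1 ; B: 2")
greedy_demo  # "2"

# The full pattern from the thread on the blank-source case; the
# parentheses mark the part that "\\1" puts back
piece <- sub(".*: *([[:alnum:]]* *-> *STU|STU *-> *[[:alnum:]]*).*", "\\1",
             "Name.MEMBER_TYPE: -> STU")
piece  # "-> STU"
```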
>>
>> Thank you,
>> Steven
>>
>> -----Original Message-----
>> From: Bert Gunter [mailto:bgunter.4...@gmail.com]
>> Sent: Sunday, November 20, 2016 2:15 PM
>> To: Steven Nagy <mailto:nst...@gmail.com>
>> Cc: R-help <mailto:r-help@r-project.org>
>> Subject: Re: [R] Need some help with regular expression
>>
>> If I understand you correctly, I think you are making it more complex
>> than necessary. Using your example (thanks!!), the following should
>> get you started:
>>
>>> x<- c("Name.MEMBER_TYPE: NMA -> STU ; CATEGORY: -> 1 ; CITY:
>>> MISSISSAUGA -> Mississauga ; ZIP: L5N1H9 -> L5N 1H9 ; COUNTRY: CAN ->
>>> ; MEMBER_STATUS: -> N", "Name.MEMBER_TYPE: STU -> REG ; CATEGORY: 1
>>> ->","Name.MEMBER_TYPE: -> STU")
>>>
>>> x
>> [1] "Name.MEMBER_TYPE: NMA -> STU ; CATEGORY: -> 1 ; CITY:
>> MISSISSAUGA -> Mississauga ; ZIP: L5N1H9 -> L5N 1H9 ; COUNTRY: CAN ->
>> ; MEMBER_STATUS: -> N"
>> [2] "Name.MEMBER_TYPE: STU -> REG ; CATEGORY: 1 ->"
>> [3] "Name.MEMBER_TYPE: -> STU"
>>>
>>> sub(".*: *([[:alnum:]]* *-> *STU|STU *-> *[[:alnum:]]*).*","\\1",x)
>> [1] "NMA -> STU" "STU -> REG" "-> STU"
>>
>>
>> I am sure that you can get things to the form you desire in one go
>> with some fiddling of the above, but it was easier for me to write the
>> regex to pick out the pieces you wanted and leave the rest to you.
>> Others may have slicker ways to do it, of course.
>>
>> HTH
>>
>> Cheers,
>> Bert
>>
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Sat, Nov 19, 2016 at 8:06 PM, Steven Nagy <mailto:nst...@gmail.com>
>> wrote:
>>> I tried out a regular expression on this website:
>>>
>>> http://regexr.com/3en1m
>>>
>>> So the input text is:
>>>
>>> "Name.MEMBER_TYPE: -> STU"
>>>
>>> The regular expression is: ((?:\w+|\s) -> STU|STU -> (?:\w+|\s))
>>>
>>> And it returns:
>>>
>>> " -> STU"
>>>
>>> but when I use in R, it doesn't return the same result:
>>>
>>> strapply(c, "((?:\\w+|\\s) -> STU|STU -> (?:\\w+|\\s))", c, backref =
>>> -1, perl = TRUE)
>>>
>>> returns:
>>> "Name.MEMBER_TYPE: -> STU"
>>>
>>>
>>> Here is what I was trying to do:
>>>
>>> I need to extract some values from a log table, and I created a
>>> regular expression that helps me with that.
>>>
>>> The log table has cells with values like:
>>>
>>> a = "Name.MEMBER_TYPE: NMA -> STU ; CATEGORY: -> 1 ; CITY:
>>> MISSISSAUGA -> Mississauga ; ZIP: L5N1H9 -> L5N 1H9 ; COUNTRY: CAN ->
>>> ; MEMBER_STATUS: -> N"
>>>
>>> or
>>> b = "Name.MEMBER_TYPE: STU -> REG ; CATEGORY: 1 ->"
>>>
>>> so I needed to extract the values that a STU member type is changing
>>> from and to, so I needed NMA, STU in the 1st case or STU, REG in the
>>> 2nd case.
>>>
>>> I came up with this expression which worked in both cases:
>>>
>>> strapply(strapply(a, "(\\w+ -> STU|STU -> \\w+)", c, backref = -1,
>>> perl = TRUE), "(\\w+) -> (\\w+)", c, backref = -2, perl = TRUE)
>>>
>>> But I had a 3rd case when the source member type was blank:
>>>
>>> c = "Name.MEMBER_TYPE: -> STU"
>>>
>>> and in that case it returned an error:
>>>
>>> strapply(strapply(c, "(\\w+ -> STU|STU -> \\w+)", c, backref = -1,
>>> perl = TRUE), "(\\w+) -> (\\w+)", c, backref = -2, perl = TRUE)
>>>
>>> Error: is.character(x) is not TRUE
>>>
>>> I found that the error is because this returns NULL:
>>>
>>> strapply(c, "(\\w+ -> STU|STU -> \\w+)", c, backref = -1, perl =
>>> TRUE)
>>>
>>>
>>> So I tried to modify the regular expression to match any word or
>>> blank space:
>>>
>>> strapply(c, "((?:\\w+|\\s) -> STU|STU -> (?:\\w+|\\s))", c, backref =
>>> -1, perl = TRUE)
>>>
>>> but this returned me the whole value of "c":
>>>
>>> "Name.MEMBER_TYPE: -> STU"
>>>
>>> and I only needed " -> STU" as it shows on the website regexr.com
>>>
>>> Is the result wrong on the regexr.com website or strapply returns the
>>> wrong result?
>>>
>>> Thanks,
>>>
>>> Steven
>>>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA