Hello, I need some help with regex. I have these two sentences. I need to extract both "49MU6300" and "LE32S5970" and put them in a new column "SKU".
A) SMART TV UHD 49'' CURVO 49MU6300
B) SMART TV HD 32'' LE32S5970

DataFrame for testing:

    ecommerce <- data.frame(a = c(1,2), producto = c("SMART TV UHD 49'' CURVO 49MU6300", "SMART TV HD 32'' LE32S5970"))

I'm using gsub like this:

1.- This captures A as intended, but only "32S5970" from B (missing "LE").

    ecommerce$sku <- gsub("(.*)([0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)", "\\2", ecommerce$producto)

2.- This captures "LE32S5970" but not "49MU6300".

    ecommerce$sku <- gsub("(.*)([a-zA-Z]{2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)", "\\2", ecommerce$producto)

3.- If I make the first two letters optional with:

    ecommerce$sku <- gsub("(.*)([a-zA-Z]?{2}[0-9]{2}[a-zA-Z]{1,2}[0-9]{2,4})(.*)", "\\2", ecommerce$producto)

"49MU6300" is captured, but again only "32S5970" from B (missing "LE").

What should I do? How would you approach it?
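One possible direction, offered as a sketch rather than a definitive answer: the leading "(.*)" is greedy, so it takes as much of the string as it can, and any optional letters at the start of the SKU are swallowed by it whenever the rest of the pattern still matches without them. Two ways around that are to extract the first match directly with regexpr() and regmatches(), or to keep the substitution but make the leading wildcard lazy. The code below assumes every SKU has the shape suggested by the attempts above (zero to two letters, two digits, one or two letters, two to four digits); the names pattern, m and sku_lazy are only illustrative.

```r
# Test data from the post; stringsAsFactors = FALSE keeps 'producto' as character
# on R versions before 4.0.
ecommerce <- data.frame(
  a = c(1, 2),
  producto = c("SMART TV UHD 49'' CURVO 49MU6300",
               "SMART TV HD 32'' LE32S5970"),
  stringsAsFactors = FALSE
)

# Assumed SKU shape: 0-2 letters, 2 digits, 1-2 letters, 2-4 digits.
pattern <- "[A-Za-z]{0,2}[0-9]{2}[A-Za-z]{1,2}[0-9]{2,4}"

# Option 1: locate the first match in each string and pull it out.
# Rows without a match stay NA instead of keeping the full description.
m <- regexpr(pattern, ecommerce$producto)
ecommerce$sku <- NA_character_
ecommerce$sku[m > 0] <- regmatches(ecommerce$producto, m)

# Option 2: keep the substitution, but make the leading wildcard lazy
# (".*?") so it stops as early as possible and leaves "LE" to the group.
# Note that sub() returns the input unchanged when nothing matches.
sku_lazy <- sub(paste0(".*?(", pattern, ").*"), "\\1",
                ecommerce$producto, perl = TRUE)

print(ecommerce$sku)
print(sku_lazy)
```

With the two test rows, both options should give "49MU6300" and "LE32S5970". The first option may be the safer of the two for real data, since a description without a SKU ends up as NA rather than being copied whole into the new column.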