Emeka writes:
> Hello All,
>
> import urllib.request
> import re
>
> url = 'https://www.everyday.com/'
>
> req = urllib.request.Request(url)
> resp = urllib.request.urlopen(req)
> respData = resp.read()
>
> paragraphs = re.findall(r'\[(.*?)\]', str(respData))
> for eachP in paragraphs:
>     print("".join(eachP.split(',')[1:-2]))
>     print("\n")
>
> I got the below:
> "Coke - Yala Market Branch""NO. 113 IKU BAKR WAY YALA"""
> But what I need is
>
> 'Coke - Yala Market Branch NO. 113 IKU BAKR WAY YALA'
>
> How do I achieve the above?
A couple of things you could do to understand your problem and work around it:

- Change your code to print(eachP).
- Change your "".join to "!".join to see where the commas were.
- Experiment with data of that form in the REPL. Sometimes it's good to print repr(datum) instead of datum, though not in this case.

But are you trying to extract and parse paragraphs from a JSON response? Do not use regex for that at all. Use json.load or json.loads to parse it properly, and access the relevant data by indexing:

    import json
    x = json.loads('{"foo":[["Weather Forecast","It\'s Rain"],[]]}')
    x ==> {'foo': [['Weather Forecast', "It's Rain"], []]}
    x['foo'] ==> [['Weather Forecast', "It's Rain"], []]
    x['foo'][0] ==> ['Weather Forecast', "It's Rain"]

--
https://mail.python.org/mailman/listinfo/python-list
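Applied to the output you quoted, the json approach might look something like this. It is only a sketch under assumptions: the top-level key "branches" and the row layout [id, name, address, extra, value, value] are made up to match what your [1:-2] slice produced, so check them against print(respData) before relying on it. Because the fields are parsed values rather than comma-split text, there are no quote characters left to strip.

```python
import json

# Hypothetical response body. The real page structure is unknown; this
# sample only mimics the shape implied by the output quoted above.
SAMPLE = b'{"branches": [[7, "Coke - Yala Market Branch", "NO. 113 IKU BAKR WAY YALA", "", 6.45, 3.39]]}'


def branch_labels(raw):
    """Yield one space-separated label per row of the (assumed) "branches" list."""
    # json.loads accepts UTF-8 bytes as well as str (Python 3.6+),
    # so resp.read() can be passed in directly.
    data = json.loads(raw)
    for row in data["branches"]:  # "branches" is a hypothetical key
        # Same slice as the original code, but on parsed values.
        fields = row[1:-2]
        # Drop empty strings so they don't leave stray spaces.
        yield " ".join(str(f) for f in fields if f)


for label in branch_labels(SAMPLE):
    print(label)
```

With the real page you would pass respData (the bytes from resp.read()) instead of SAMPLE, after adjusting the key and the slice to whatever the actual JSON contains.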