As you all saw in my intro post, I'm working on a project in Python (which I'm learning as I go) that uses the Wikimedia API to pull data from wiktionary.org. I want to parse the JSON and output, for now, just the definition of the word.
Wiktionary is Wikimedia's dictionary.

My requirements for v1:

- Query the API for the definition of "table" (hard-coded in the Python script).
- Pull the proper JSON.
- Parse the JSON.
- Output the definition only.

What's happening?

I wrote the script to do the above. I run it and, maybe I don't know shit from shinola, but it appears I composed it properly.

The page source in the JSON marks list items with # and sublist items with ##, and Wiktionary renders both levels as numbers. On Wiktionary, the definitions look like:

1. blablabla
   1. blablabla
   2. blablablablabla
2. balbalbla
3. blablabla
   1. blablabla

I wrote my script to change this so that the sublist items are lettered:

1. blablabla
   a. blablabla
   b. blablabla
2. blablabla

and so on.

/snip

At this point, the script stops after it assesses the first main_counter and sub_counter. The code is below; please tell me which stupid mistake I made (I'm sure it's simple).

Am I taking a bad approach? Is there an easier way to parse the JSON than the way I'm doing it? I'm all ears.

Be kind, I'm really new at Python. Environment is Emacs.
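Before the full script, here is a minimal sketch of just the renumbering step I'm after, run on a made-up snippet instead of the live API response. The sample text and the function name renumber are placeholders invented for illustration and are not part of my script; the sketch also assumes that lines such as #* or ##: are quotations or usage examples to be skipped.

```python
def renumber(wikitext):
    """Turn '#' items into '1.', '2.' and '##' items into 'a.', 'b.'."""
    out = []
    main_counter = 0
    sub_counter = 'a'
    for raw in wikitext.splitlines():
        line = raw.strip()
        # Skip blank lines and quotation/example lines such as '#*' or '##:'.
        if not line or line.lstrip('#')[:1] in ('*', ':'):
            continue
        depth = len(line) - len(line.lstrip('#'))  # number of leading '#'
        text = line.lstrip('#').strip()
        if depth == 1:
            main_counter += 1
            sub_counter = 'a'  # restart lettering under each new main item
            out.append(f"{main_counter}. {text}")
        elif depth == 2:
            out.append(f"   {sub_counter}. {text}")
            sub_counter = chr(ord(sub_counter) + 1)
    return out


# Hypothetical sample mirroring the list shown above.
sample = """# blablabla
## blablabla
## blablablablabla
# balbalbla
# blablabla
## blablabla"""

print("\n".join(renumber(sample)))
```

Running this on a fixed string should at least show whether the counters behave as intended, separately from the API call and the regular expressions.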
import requests
import re

search_url = 'https://api.wikimedia.org/core/v1/wiktionary/en/search/page'
search_query = 'table'
parameters = {'q': search_query}

response = requests.get(search_url, params=parameters)
data = response.json()

page_id = None

# Look for a search result whose title matches the query exactly.
if 'pages' in data:
    for page in data['pages']:
        title = page.get('title', '').lower()
        if title == search_query.lower():
            page_id = page.get('id')
            break

if page_id:
    content_url = f'https://api.wikimedia.org/core/v1/wiktionary/en/page/{search_query}'
    response = requests.get(content_url)
    page_data = response.json()
    if 'source' in page_data:
        content = page_data['source']
        # One pattern per part of speech, keyed on its headword template.
        cases = {
            'noun': r'\{en-noun\}(.*?)(?=\{|\Z)',
            'verb': r'\{en-verb\}(.*?)(?=\{|\Z)',
            'adjective': r'\{en-adj\}(.*?)(?=\{|\Z)',
            'adverb': r'\{en-adv\}(.*?)(?=\{|\Z)',
            'preposition': r'\{en-prep\}(.*?)(?=\{|\Z)',
            'conjunction': r'\{en-con\}(.*?)(?=\{|\Z)',
            'interjection': r'\{en-intj\}(.*?)(?=\{|\Z)',
            'determiner': r'\{en-det\}(.*?)(?=\{|\Z)',
            'pronoun': r'\{en-pron\}(.*?)(?=\{|\Z)'
            # make sure there aren't more word types
        }

        def clean_definition(text):
            # Unwrap [[links]] and drop the leading '#' markers.
            text = re.sub(r'\[\[(.*?)\]\]', r'\1', text)
            text = text.lstrip('#').strip()
            return text

        print(f"\n*** Definition for {search_query} ***")
        for word_type, pattern in cases.items():
            match = re.search(pattern, content, re.DOTALL)
            if match:
                lines = [line.strip() for line in match.group(1).split('\n') if line.strip()]
                definition = []
                main_counter = 0
                sub_counter = 'a'

                for line in lines:
                    # Skip the '##*' and '##:' lines under sublist items.
                    if line.startswith('##*') or line.startswith('##:'):
                        continue

                    if line.startswith('# ') or line.startswith('#\t'):
                        main_counter += 1
                        sub_counter = 'a'
                        cleaned_line = clean_definition(line)
                        definition.append(f"{main_counter}. {cleaned_line}")
                    elif line.startswith('##'):
                        cleaned_line = clean_definition(line)
                        definition.append(f"   {sub_counter}. {cleaned_line}")
                        sub_counter = chr(ord(sub_counter) + 1)

                if definition:
                    print(f"\n{word_type.capitalize()}\n")
                    print("\n".join(definition))
                    break
else:
    print("try again beotch")

Thanks,
Daniel
-- 
https://mail.python.org/mailman/listinfo/python-list