Re: Script stops running with no error

Thomas Passin via Python-list Wed, 28 Aug 2024 20:01:39 -0700

On 8/28/2024 8:07 PM, dn via Python-list wrote:

On 29/08/24 10:32, Thomas Passin via Python-list wrote:
On 8/28/2024 5:09 PM, Daniel via Python-list wrote:
As you all have seen on my intro post, I am in a project using Python
(which I'm learning as I go) using the wikimedia API to pull data from
wiktionary.org. I want to parse the json and output, for now, just the
definition of the word.
Wiktionary is wikimedia's dictionary.

My requirements for v1

Query the api for the definition for table (in the python script).
Pull the proper json
Parse the json
output the definition only
You need to check at each part of the code to see if you are getting or producing what you think you are. You also should create a text constant containing the JSON input you expect to get. Make sure you can process that. Start simple - one main item. Then two main items. Then two main items with one sub item. And so on.
I'm not sure what you want to produce in the end but this seems awfully complex to be starting with. Also you aren't taking advantage of the structure inherent in the JSON. If the data response isn't too big, you can probably take it as is and use the Python JSON reader to produce a Python data structure. It should be much easier (and faster) to process the data structure than to repeatedly scan all those lines of data with regexes.
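A minimal sketch of the "start simple, then grow" idea: two text constants, one with a single main item and one with two, both parsed with the standard json module. The page ids, titles, the nested "query"/"pages" layout and the name page_titles() are assumptions made up for illustration; the real response will be larger and may be shaped differently.

```python
import json

# Hypothetical sample inputs (made-up ids and titles), smallest first.
ONE_PAGE = '{"query": {"pages": {"1": {"pageid": 1, "title": "table"}}}}'
TWO_PAGES = """{"query": {"pages": {
    "1": {"pageid": 1, "title": "table"},
    "2": {"pageid": 2, "title": "chair"}
}}}"""


def page_titles(raw_json):
    """Parse a raw JSON string and return the title of every page in it."""
    data = json.loads(raw_json)
    return [page["title"] for page in data["query"]["pages"].values()]


# Check the simplest case first, then the next one up.
print(page_titles(ONE_PAGE))   # ['table']
print(page_titles(TWO_PAGES))  # ['table', 'chair']
```

Once both constants process correctly, a third constant with a sub-item can be added in the same way.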
Good effort so far!
Further to @Thomas: the code does seem to be taking the long way around! How can we illustrate that, and improve life?
The Wiktionary docs at https://developer.wikimedia.org/use-content/ discuss how to use their "Developer Portal". Worth reading!
As part of the above, we find the "API:Data formats" page (https://www.mediawiki.org/wiki/API:Data_formats) which offers a simple example (simpler than your objectives):
api.php?action=query&titles=Main%20page&format=json

which produces:

{
   "query": {
     "pages": {
       "217225": {
         "pageid": 217225,
         "ns": 0,
         "title": "Main page"
       }
     }
   }
}

Does this look like a Python dict[ionary's] output to you?

It is, (more discussion at the web.ref)
- but it is wrapped into a JSON payload.


To give more detail:

import json
from pprint import pprint

DATA = """{
  "query": {
    "pages": {
      "217225": {
        "pageid": 217225,
        "ns": 0,
        "title": "Main page"
      }
    }
  }
}"""

data_dict = json.loads(DATA)
pprint(data_dict)

Easy. If you have a really big file it can be fearfully slow, so it may or may not be a good approach for this problem.

Or you could parse out the data with JSONPath (which I have never used, but it's the right kind of approach):


https://pypi.org/project/jsonpath-ng/

Another possibility: JMESPath:

https://python.land/data-processing/working-with-json/jmespath

These kinds of approaches also handle the parsing for you and help in constructing queries.

There are various ways of dealing with JSON-formatted data. You're already using requests. Perhaps leave such research until later.
So, as soon as "page_data" is realised from "response", print() it (per above: make sure you're actually seeing what you're expecting to see). Computers have this literal habit of doing what we ask, not what we want!
PS the pprint/pretty printer library offers a neater way of outputting a "nested" data-structure (https://docs.python.org/3/library/pprint.html).
Thereafter, make as much use of the returned dict/list structure as you can. At each stage of the 'drilling-down' process, again, print() it (to make sure ...)
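The drilling-down could look something like the sketch below. It uses the "Main page" sample from the API docs as a stand-in for the real response, so the key names ("query", "pages", "title") are only known to fit that sample; the variable names are illustrative.

```python
import json
from pprint import pprint

# Stand-in for response.text: the sample payload from the API docs.
RAW = """{"query": {"pages": {"217225":
    {"pageid": 217225, "ns": 0, "title": "Main page"}}}}"""

page_data = json.loads(RAW)
pprint(page_data)          # the whole structure: is it what we expected?

query = page_data["query"]
pprint(query)              # one layer down

pages = query["pages"]
pprint(pages)              # a dict keyed by page id

titles = []
for page_id, page in pages.items():
    print(page_id, page["title"])   # the innermost layer
    titles.append(page["title"])
```

If any pprint() shows something unexpected, the problem lies at or before that layer.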
In this way the code will step-through the various 'layers' of data-organisation. That observation and stepping-through of 'layers' is a hint that the code should (probably) also be organised by 'layer'! For example, the first for-loop finds a page which matches the search-key. This could be abstracted into a (well-named) function.
Thus, you can write a test-harness which provides the function with some sample input (which you know from earlier print-outs!) and can ensure (with yet another print()) that the returned-result is as-expected!
NB the test-data and check-print() should be outside the function. Please take these steps as-read or as 'rules'. Once your skills expand, you will likely become ready to learn about unit-testing, pytest, etc. At which time, such ideas will 'fall into place'.
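As a sketch of that arrangement: the function below does one job, and the sample data plus the check-print() sit outside it. The name find_page(), the exact-match rule and the sample dict are assumptions for illustration, since the original loop is not reproduced here.

```python
def find_page(pages, search_key):
    """Return the first page whose title equals search_key, or None."""
    for page in pages.values():
        if page["title"] == search_key:
            return page
    return None


# Test-harness: sample input copied from an earlier print-out,
# kept outside the function under test.
SAMPLE_PAGES = {
    "217225": {"pageid": 217225, "ns": 0, "title": "Main page"},
}

result = find_page(SAMPLE_PAGES, "Main page")
print(result)                                   # expect the "Main page" dict
print(find_page(SAMPLE_PAGES, "no such page"))  # expect None
```

When pytest comes along later, the two print() checks translate almost directly into assert statements.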
BTW/whilst that 'unit' is in-focus: how many times will the current code compute search_query.lower()? How many times (per function call) will "search_query" be any different from previous calls? So, should that computation be elsewhere? (won't make much difference to execution time, but a coding-skill: consider whether to leave computation until the result is actually needed (lazy-evaluation), or if early-computation will save unnecessary repeated-computation)
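To make the point concrete, here is a sketch contrasting the two placements. Both functions are hypothetical stand-ins, not the original code; only search_query.lower() comes from the script under discussion.

```python
def find_matches_repeated(titles, search_query):
    """search_query.lower() is re-evaluated for every title examined."""
    return [t for t in titles if t.lower() == search_query.lower()]


def find_matches(titles, search_query):
    """Same result, but the lower-casing happens once per call."""
    wanted = search_query.lower()   # cannot change during the loop
    return [t for t in titles if t.lower() == wanted]


print(find_matches(["Table", "table", "Chair"], "TABLE"))  # ['Table', 'table']
```

Both versions return the same list; the second simply avoids repeating work whose answer cannot change within a call.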
Similarly, 'lift' constants such as "cases" out of (what will become) functions and put them towards the top of the script. This means that all such 'definition' and 'configuration' settings will be found together in one easy-to-find location AND makes the functional code easier to read.
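A sketch of the resulting layout follows. The contents of "cases" are not shown in this message, so the values below are placeholders, and is_known_case() is a hypothetical function that merely shows a constant being used from further down the script.

```python
# --- definitions and configuration, all together at the top ---
# Placeholder values: the real contents of "cases" are an assumption here.
CASES = ("placeholder-a", "placeholder-b")


# --- functional code below; it reads the constants, never redefines them ---
def is_known_case(label):
    """Report whether label (compared case-insensitively) is one of CASES."""
    return label.lower() in CASES


print(is_known_case("Placeholder-A"))  # True
print(is_known_case("something else"))  # False
```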
Now, back to the question: where is the problem arising? Do you know or do you only know that what comes-out at the end is unattractive/unacceptable?
The idea of splitting the code into functions (or "units") is not only that you could test each and thereby narrow-down the location of the problem (and so that we don't have to read so much code in a bid to help) but that when you do ask for assistance you will be able to provide only the pertinent code AND some sample input-data with expected-results!
(although, if all our dreams come true, you will answer your own question!)
OK, is that enough by way of coding-tactics (not to mention the web-research) to keep you on-track for a while?


--
https://mail.python.org/mailman/listinfo/python-list
