On 9/09/19 4:02 AM, A S wrote:
My problem is seemingly profound, but I hope to make it sound as simple as 
possible... Let me unpack the details:

...

These are the folders used for a better reference ( 
https://drive.google.com/open?id=1_LcceqcDhHnWW3Nrnwf5RkXPcnDfesq ). The files 
are found in the folder.


The link resulted in a 404 page (for me - but then I don't use Google). So, without any sample data...

> 1. I have one folder of Excel (.xlsx) files that serve as a data dictionary.
>
> -In Cell A1, the data source name is written in between brackets
>
> -In Cols C:D, it contains the data field names (it could be in either col C or D in my actual Excel sheet, so I had to search both columns)
>
> -*Important: I need to know which data source the field names come from
>
> 2. I have another folder of Text (.txt) files that I need to parse through to find these keywords.


Recommend you start with a set of test data/directories. For the first run, have one of each type of file, where the keywords correlate. Thus prove that the system works when you know it should.

Next, try the opposite, to ensure that it just as happily ignores files when it should.

Then expand to having multiple records, so that you can see what happens when some files correlate, and some don't.

i.e. take a large problem and break it down into smaller units. This is a "top-down" method.


An alternative design approach (which works very well in Python - see also "PyTest") is to embrace the principles of TDD (Test-Driven Development). This is a process that builds 'from the ground, up'. We design a small part of the process - let's call it a function/method. First we code some test data *and* the expected answer, e.g. if one input is 1 and another is 2, is their sum 3? (Running such a test at this stage will fail - badly!) Then we write some code - and keep perfecting it until it passes the test.
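A minimal sketch of that first TDD step (the function name `add` is illustrative, not from the original problem):

```python
# test_addition.py -- run with: pytest test_addition.py

def add(a, b):
    # The implementation: written *after* the test, and refined
    # until the test passes.
    return a + b

def test_add():
    # Test data *and* the expected answer, coded first.
    # Before add() exists, this test fails - which is the point!
    assert add(1, 2) == 3
```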

Repeat, stage-by-stage, to build the complete program. Meanwhile, every change you make to the code should be tested not just against 'its own' test, but against all of the tests relating to the other, smaller units of the whole. In this way, 'new code' can be shown to break (or, hopefully, not break) previously implemented, tested, and 'proven' code!

Notice how you have broken-down the larger problem in the description (points 1 to 5, above)! Design the tests similarly, to *only* test one small piece of the puzzle. Often you will have to 'fake' or "mock" data-inputs to the process, particularly if the code to produce that unit's input has yet to be written; regardless, 'mock data' is thoroughly controlled and thus produces (more) predictable results. Plus, it's much easier to spot errors and omissions when you don't have to wade through a mass of print-outs that (attempt to) cover *everything*! (IMHO)

Plus, when a problem is well-confined, there's less example code and data to insert into list questions, and the responses will be equally focused!


Referring back to the question: it seems that the issue is either that the keywords are not being (correctly) picked out of the sets of files (easy tests - for *only* those small sections of the code!), or that the logic linking the keywords is faulty (another *small* test, easily coded, and at first fed with 'fake' keywords which cover the various test cases and thus, when run, (attempt to) prove your logic and code!)
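The second of those small tests might be sketched like this (again, every name is illustrative - the point is that the linking logic is exercised with 'fake' keywords, entirely separate from any file-reading code):

```python
# Sketch: which data-source's field names appear in a given text blob?

def find_matches(fields_by_source, text):
    """Return {source: [field names found in text]}, omitting sources
    with no matches."""
    matches = {}
    for source, fields in fields_by_source.items():
        found = [f for f in fields if f in text]
        if found:
            matches[source] = found
    return matches

def test_find_matches():
    # Fake keywords covering the interesting cases: one partial match,
    # one full miss within a source, one single-field source.
    fields = {"SourceA": ["alpha", "beta"], "SourceB": ["gamma"]}
    sample_text = "this file mentions alpha and gamma only"
    assert find_matches(fields, sample_text) == {
        "SourceA": ["alpha"],
        "SourceB": ["gamma"],
    }
```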


--
Regards =dn