My problem may look complicated, but I will try to describe it as simply as possible. Here are the details:
1. I have one folder of Excel (.xlsx) files that serve as a data dictionary.
   - In cell A1, the data source name is written between brackets.
   - Columns C:D contain the data field names. (In my actual Excel sheets they can be in either column C or D, so I have to search both columns.)
   - *Important: I need to know which data source each field name comes from.
2. I have another folder of text (.txt) files that I need to parse to find these keywords.

The folders are available here for reference ( https://drive.google.com/open?id=1_LcceqcDhHnWW3Nrnwf5RkXPcnDfesq ). The files are found in the folder.

This is the code I have so far:

import os, sys
from os.path import join
import re
import xlrd
from xlrd import open_workbook
import openpyxl
from openpyxl.reader.excel import load_workbook
import xlsxwriter

# All the paths
dict_folder = 'C:/Users/xxxx/Documents/xxxx/Test Excel'
text_folder = 'C:/Users/xxxx/Documents/xxxx/Text'

words = set()
fieldset = set()

for file in os.listdir(dict_folder):
    if file.endswith(".xlsx"):
        wb1 = load_workbook(join(dict_folder, file), data_only = True)
        ws = wb1.active

        # Here I am reading and printing all the data source names
        # (the words set) in the Excel dictionaries:
        cellvalues = ws["A1"].value
        wordsextract = re.findall(r"\((.+?)\)", str(cellvalues))
        results = wordsextract[0]
        words.add(results)
        print(results)

        for rowofcellobj in ws["C" : "D"]:
            for cellobj in rowofcellobj:
                # Here I am printing all the field names in col C & D
                # in the Excel dictionaries:
                data = re.findall(r"\w+_.*?\w+", str(cellobj.value))
                if data != []:
                    fields = data[0]
                    fieldset.add(fields)
                    print(fieldset)
                    #listing = str.remove("")
                    #print(listing)

# Here I am reading the name of each .txt file to the separate .xlsx file:
for r, name in enumerate(os.listdir(text_folder)):
    if name.endswith(".txt"):
        print(name)

# Reading the .txt files and trying to split the sentences into words
# instead of lines, so that I can compare the individual .txt file words
# with the .xlsx file
txtfilespath = os.chdir("C:/Users/xxxx/Documents/xxxx/Text")

# Here I am reading and printing all the words in the .txt files to
# compare with Excel cell A1:
for name in os.listdir(txtfilespath):
    if name.endswith(".txt"):
        with open (name, "r") as texts:
            # Read the whole file:
            s = texts.read()
            print(s)

            # If the .txt file contains (...) or select or from or words
            # from the sets, search that sentence and extract the common
            # fields
            result1 = []
            parens = 0
            buff = ""
            for line in s:
                if line == "(":
                    parens += 1
                if parens > 0:
                    buff += line
                if line == ")":
                    parens -= 1
                if not parens and buff:
                    result1.append(buff)
                    buff = ""
            set(result1)

# Here, I include other keywords besides those found in the Excel workbooks
checkhere = set()
checkhere.add("Select")
checkhere.add("From")
checkhere.add("select")
checkhere.add("from")
checkhere.add("SELECT")
checkhere.add("FROM")
# k = list(checkhere)
# print(k)

# I only want to read/extract the lines containing brackets () as well as
# the keywords in the checkhere set, so that I can capture the source and
# field in each line.
# I tried this but nothing was printed:
for element in checkhere:
    if element in result1:
        print(result1)

My desired output from the part of the code that printed nothing is:

(/* 1.select_no., biiiiiyyyy FROM apple_x_Ex_x */ proc sql; "TRUuuuth")
(/* 1.xxxxx FROM xxxxx*/ proc sql; "TRUuuuth")
(SELECT abc AS abc1, ab33_2_ AS mon, a_rr, iirir_vf, jk_ff, sfa_jfkj FROM &orange..xxx_xxx_xxE where (asre(kkk_ix as format 'xxxx-xx') gff &bcbcb_hhaha.) and (axx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.) )
(/* 1.select_no. FROM apple_x_Ex_x */ proc sql; "TRUuuuth")
(SELECT abc AS kfcccc, mcfg_2_ AS dokn, b_rr, jjhj_vf, jjjk_hj, fjjh_jhjkj FROM &bfbd..pear_xxx_xxE where (afdfe(kkffk_ix as format 'xxxxd-xx') gdaff &bcdadabcb_hdahaha.) and (axx(xx_ix as format 'xxxx-xx') lec &jgjsdfdf_vnv.) )

Once I can get the desired output above, I will compare these lines against the words set and the fieldset set.

Any help would be really appreciated. Thank you.

-- 
https://mail.python.org/mailman/listinfo/python-list
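One likely reason the last loop prints nothing: `element in result1` tests whether a keyword is equal to a whole item of the list, not whether it occurs inside one of the bracketed chunks. Below is a minimal sketch of the filtering I am after, assuming each chunk should be kept when it contains "select" or "from" as a whole word in any letter case. The function names (extract_paren_chunks, chunks_with_keywords, names_found) and the sample string are hypothetical, made up for illustration, and the whole-word matching is an assumption that may not suit every file.

```python
import re

def extract_paren_chunks(text):
    """Return every top-level balanced (...) chunk found in text."""
    chunks = []
    depth = 0
    buff = []
    for ch in text:
        if ch == "(":
            depth += 1
        if depth > 0:
            buff.append(ch)
        if ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                # The outermost bracket just closed: store the chunk.
                chunks.append("".join(buff))
                buff = []
    return chunks

# Assumption: keywords are matched as whole words, ignoring case.
KEYWORD_RE = re.compile(r"\b(select|from)\b", re.IGNORECASE)

def chunks_with_keywords(text):
    """Keep only the bracketed chunks that mention select or from."""
    return [c for c in extract_paren_chunks(text) if KEYWORD_RE.search(c)]

def names_found(chunk, names):
    """Return the members of names that appear as whole tokens in chunk."""
    tokens = set(re.findall(r"\w+", chunk))
    return {n for n in names if n in tokens}

# Hypothetical sample text, loosely modelled on the desired output.
sample = (
    '(/* 1.xxxxx FROM xxxxx*/ proc sql; "TRUuuuth") '
    "(no keywords here) "
    "(SELECT abc AS abc1, a_rr FROM &orange..xxx_xxx_xxE where (kkk_ix gff 1))"
)

hits = chunks_with_keywords(sample)
for hit in hits:
    print(hit)
```

In the real script, `sample` would be the contents of each .txt file, and `names_found(hit, fieldset)` or `names_found(hit, words)` could then be used for the later comparison. This only works for names made of letters, digits and underscores; names containing spaces or other characters would need a different comparison.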