For a few days now I've been experimenting with a setup that lets me send the source code of the web page I'm reading through a Python script and then into a new tab in Mozilla. The new tab opens automatically, so the process feels very natural, although there's a lot of reading, filtering and writing going on behind the scenes.
I want to do three things with this post:

A) Explain the process so that people can try it for themselves and say "Hey stupid, I've been doing the same thing with greasemonkey for ages", or maybe "You're great, this is easy to see, since the crux of the biscuit is the apostrophe." Both kinds of comments are very welcome.

B) Explain why I want such a thing.

C) If this approach is still valid after all of the above, ask for help writing a better Python htmlfilter.py

So here we go:

A) Explain the process

We need:

- mozilla firefox
  http://en-us.www.mozilla.com/en-US/

- add-on viewsourcewith
  https://addons.mozilla.org/firefox/394/

- batch file (on windows): (htmfilter.bat)

    d:\python25\python.exe D:\Python25\Scripts\htmlfilter.py "%1" > out.html
    start out.html

- a python script:

    # htmlfilter.py
    import sys

    def htmlfilter(fname, skip=()):
        # Read the whole page into memory.
        f = open(fname)
        try:
            data = f.read()
        finally:
            f.close()
        # Collect the (start, end) positions of every <...> tag.
        tags = []
        start = None
        for i, x in enumerate(data):
            if x == '<':
                start = i
            elif x == '>' and start is not None:
                tags.append((start, i))
                start = None
        # Work backwards so earlier positions stay valid while
        # each matching tag is replaced by a single space.
        R = list(data)
        for i, j in reversed(tags):
            s = data[i:j+1]
            for x in skip:
                if x in s:
                    R[i:j+1] = [' ']
                    break
        return ''.join(R)

    def test():
        if len(sys.argv) == 2:
            skip = ['div', 'table']
            fname = sys.argv[1].strip()
            print(htmlfilter(fname, skip))

    if __name__ == '__main__':
        test()

Now install the htmlfilter.py file in your Python scripts dir and adapt the batch file to point to it. To use the viewsourcewith add-on to open the batch file: go to some web page, left click and view the source with the batch file.

B) Explain why I want such a thing.

OK, maybe this should have been the thing to start with, but hey, it's such an interesting technique that it's almost a waste not to give it a chance before my idea is dissed :-) Most web pages I visit lately are taking up so much room for ads (even with adblocker installed) that the mere 20 columns of text that are left for reading are slowing me down unacceptably.
I have tried clicking 'print this' or 'printer friendly', or using 'no style' from the mozilla menu and switching back again for other pages, but it was tedious to say the least. Every web page has different conventions. In the end I just started editing web pages' source code by hand, cutting out the beef and saving it as an html file with only text, no scripts or formatting. But that was not very satisfying either, because raw web pages are *big*.

Then I found out that I could often just replace all 'table' or 'div' tags with a space and the page, although not very html compliant any more, still loads, and often the text looks a lot better. This worked for at least 50 percent of the pages and restored my autonomy and independence in reading web pages! (Which I do a lot, by the way. Maybe for most people the problem is not very irritating because they don't read as much? Tell me that too, I want to know :-)

C) Ask help writing a better Python htmlfilter.py

Please. You can see the code for yourself, this must be done better :-)

A.

-- 
http://mail.python.org/mailman/listinfo/python-list
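
P.S. To make the question in C) a bit more concrete, here is a rough sketch of the direction I imagine, not a finished answer. It assumes Python 3 and the standard library's html.parser module, neither of which the script above uses; the names TagDropper and strip_tags are made up for illustration. Instead of looking for 'div' or 'table' anywhere inside a tag (which would also hit something like class="divider"), it compares the tag name itself and replaces only those tags with a space. Everything else is copied through as written, except that processing instructions are dropped.

```python
from html.parser import HTMLParser


class TagDropper(HTMLParser):
    """Copy HTML through, replacing the chosen tags with a single space."""

    def __init__(self, skip):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.skip = set(skip)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the start tag exactly as it appeared.
        self.out.append(' ' if tag in self.skip else self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/> are emitted once, not as <br/></br>.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.out.append(' ' if tag in self.skip else '</%s>' % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append('&%s;' % name)

    def handle_charref(self, name):
        self.out.append('&#%s;' % name)

    def handle_comment(self, data):
        self.out.append('<!--%s-->' % data)

    def handle_decl(self, decl):
        self.out.append('<!%s>' % decl)


def strip_tags(html, skip=('div', 'table')):
    """Return html with every start and end tag named in skip replaced by ' '."""
    parser = TagDropper(skip)
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)


if __name__ == '__main__':
    print(strip_tags('<div class="x"><p>Hi &amp; bye</p></div>'))
```

The text inside the dropped tags is kept on purpose, just as in the script above; only the tags themselves go. Whether that is enough for badly broken pages is something I have not tried.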