Steve Hayes wrote:
> On Sat, 23 May 2015 19:01:55 +1000, Chris Angelico <ros...@gmail.com>
> wrote:
>
>>On Sat, May 23, 2015 at 4:46 PM, savitha devi <savith...@gmail.com> wrote:
>>> I am developing a web scraper code using HTMLParser. I need to extract
>>> text/email address from java script with in the HTMLCode.I am beginner level
>>> in python coding and totally lost here. Need some help on this. The java
>>> script code is as below:
>>>
>>> <script type='text/javascript'>
>>> //<!--
>>> document.getElementById('cloak48218').innerHTML = '';
>>> var prefix = 'ma' + 'il' + 'to';
>>> var path = 'hr' + 'ef' + '=';
>>> var addy48218 = 'info' + '@';
>>> addy48218 = addy48218 + 'tsv-neuried' + '.' +
>>> 'de';
>>> document.getElementById('cloak48218').innerHTML += '<a ' + path + '\'' +
>>> prefix + ':' + addy48218 + '\'>' + addy48218+'<\/a>';
>>> //-->
>>
>>This is deliberately being done to prevent scripted usage. What
>>exactly are you needing to do this for?
>
> To sell addresses to spammers, of course.
The boob who uses this JavaScript obfuscation (slicing the URL up across variables and concatenating pieces within a variable) hasn't a clue that, whether the JavaScript runs or a user clicks the link, the browser still has to go to the destination eventually, so it will still get blocked. Duh! Nothing is actually cloaked by the JavaScript (it's just another way of building up the <A> tag), and the URL string (even if it used a decimal value instead of a dotted IP address) still has to connect to somewhere, and that gets detected and blocked. Slicing up a URL across variables and concatenating within a variable is a child's ploy to obfuscate.

Apparently savitha can't even distinguish between an e-mail address and a URL string.

-- 
https://mail.python.org/mailman/listinfo/python-list