On May 27, 2002 at 14:20, John Belmonte wrote:

> > The AddressModifyCode works on the raw data. As for using
> > "." is address obfuscation, it is a very weak form since any
> > decent address harvester would expand entity references before
> > doing detection. Why not use something like:
> >
> > <AddressModifyCode>
> > s/\./ dot /g;
> > s/\@/ AT /g;
> > </AddressModifyCode>
>
> I think both entities and dot/at are equally weak against harvesters.
> Entities have the advantage of maintaining address appearance.

Well, all obfuscations are really weak. Either the people who write
the harvesters are not too bright, or they are, and are harvesting the
addresses by de-obfuscating the data. Or they do not care, since they
get a lot of regular hits anyway.

Since entity reference resolution is a standard thing to do when
parsing HTML/XML, it seems to be the weakest of all obfuscations.
Munging the address is better since it requires some analysis by the
harvester developer to determine what heuristics should be added to
de-obfuscate data. Resolving entity references is a no-brainer.

If you really want to use entity references, modify mhonarc's
htmlize() routine to convert '.'s, '@'s, et al. to entity references.

--ewh
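
To illustrate the difference between the two obfuscations, here is a rough sketch in Python of what a harvester might do. None of this is taken from a real harvester: the regular expression, the harvest() and demunge() names, and the example address are all made up for illustration. The point is only that entity references fall out of the standard decoding step for free, while the " AT "/" dot " form needs a heuristic that somebody has to write by hand.

```python
import html
import re

# Hypothetical address pattern; a real harvester may use something different.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def harvest(text):
    """Decode entity references (a standard HTML parsing step), then match."""
    return EMAIL_RE.findall(html.unescape(text))

def demunge(text):
    """An extra heuristic the harvester author would have to add by hand."""
    text = re.sub(r"\s+AT\s+", "@", text)
    return re.sub(r"\s+dot\s+", ".", text)

# A made-up address, obfuscated two ways.
entity_form = "user&#64;example&#46;com"
munged_form = "user AT example dot com"

print(harvest(entity_form))           # found with no extra work
print(harvest(munged_form))           # missed by the plain harvester
print(harvest(demunge(munged_form)))  # found only after the added heuristic
```

The munged form is still recoverable, as the last line shows, but only once the heuristic for that particular munging has been added.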