Those of us who have websites usually want to be able to use our site to communicate our email address to those reading it. However, we've known for a long time now that spammers use bots to crawl the web, harvesting email addresses from pages. Having your email address visible in plain text on the web is therefore a pretty certain way to get a whole load of spam in your inbox.
Most of us who are aware of this therefore try to find ways to communicate our email address to our readers without having it visible in plain text. The usual approach is to present the information so that the last step needed to recover the email address is difficult for a computer to perform (at least automatically) but easy for a human.
Examples of this include adding extra punctuation or text that the reader is obviously meant to remove, such as the word "NOSPAM" directly after the @, or surrounding the @ with curly brackets. Other people spell out the punctuation in English, so an address might contain the words "at bham dot ac dot uk", for example. These are still fairly easy for a computer to parse, though, provided the bot's designer has anticipated them.
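To illustrate how little effort it takes to undo these tricks once a bot's author has anticipated them, here is a minimal sketch of a deobfuscating harvester. The rules it applies (dropping "NOSPAM", collapsing "{@}", turning " at " and " dot " into "@" and ".") are assumptions chosen to mirror the examples above. They are not any real bot's logic, and "jbloggs" is a made-up user name.

```javascript
// Undo the common "human-readable" obfuscations described above.
// These rules are illustrative assumptions, not taken from a real harvester.
function deobfuscate(text) {
  return text
    .replace(/NOSPAM/gi, "")              // drop the filler word
    .replace(/\s*\{\s*@\s*\}\s*/g, "@")   // "{@}" -> "@"
    .replace(/\s+at\s+/gi, "@")           // " at " -> "@"
    .replace(/\s+dot\s+/gi, ".");         // " dot " -> "."
}

// After cleaning, pull out anything shaped like user@domain.tld.
function harvest(text) {
  return deobfuscate(text).match(/[\w.+-]+@[\w-]+(\.[\w-]+)+/g) || [];
}

console.log(harvest("Mail me: jbloggs at bham dot ac dot uk"));
console.log(harvest("jbloggs@NOSPAMbham.ac.uk"));
console.log(harvest("jbloggs{@}bham.ac.uk"));
```

Each of the three inputs reduces to the same plain address, which is why these schemes only slow down bots whose authors haven't thought of them.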
However, thanks to Zeyn Saigol and Dan Winterstein, I've now been introduced to another, slightly cleverer way of obfuscating an email address (or, to be honest, any text). The address renders completely normally for the human, but the source of the web page contains nothing useful to a bot. The basic idea is to construct the email address on the fly from variables, using character codes for the parts that might give it away as an email address, and then write the result out as HTML.
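To make the character-code idea concrete, here is a minimal sketch using a hypothetical `emailFromCodes()` function (an illustration, not the code used on this site). The giveaway characters and the top-level domain are stored only as numeric codes and turned back into text at runtime with JavaScript's built-in `String.fromCharCode`. The address is the same one used in the function later in this post.

```javascript
// Hypothetical variant: giveaway characters live only as numeric codes.
function emailFromCodes() {
  var at = String.fromCharCode(64);             // 64 is the code for "@"
  var dot = String.fromCharCode(46);            // 46 is the code for "."
  var tld = String.fromCharCode(111, 114, 103); // "o", "r", "g"
  var addr = "prlewis" + at + "letterboxes" + dot + tld;
  // Return a mailto link, ready to be written into the page
  return "<a href='mailto:" + addr + "'>" + addr + "</a>";
}

console.log(emailFromCodes());
```

Neither "@" nor ".org" appears anywhere in this source, so a bot that only pattern-matches the raw text has nothing to latch on to.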
This could be implemented however you like, but in this example I've used a JavaScript function. First, create a small function and place it either in a <script> element in the <head> section of your page or somewhere else you keep JavaScript functions, such as an included .js file. My function looks like this:
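function email() {
  // Build the address from pieces so it never appears whole in the page source
  var domain1 = "letter";
  var domain2 = "boxes.";
  var domain3 = "org";
  var addr = "p" + "r" + "lewis" + "@" + domain1 + domain2 + domain3;
  // Return a mailto link ready to be written into the page
  return "<a href='mailto:" + addr + "'>" + addr + "</a>";
}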
You can choose the level of obfuscation in this function to suit yourself; the key thing is that it returns a mailto link.
Then, wherever you want your email address to appear in a page, you can just do this, as long as the function has been included somewhere:
<script type="text/javascript">
document.write(email());
</script>
If you want to roll your own one of these, you can see a full list of the character codes here.
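If you'd rather not look the codes up by hand, a small helper can print them for you. This is a minimal sketch with hypothetical function names. It relies on the built-in `charCodeAt`, which returns a UTF-16 code unit; for a plain ASCII address, that is the same as the ASCII code.

```javascript
// List the character code of every character in a string, so the codes
// can be pasted into your own version of email().
function toCharCodes(text) {
  var codes = [];
  for (var i = 0; i < text.length; i++) {
    codes.push(text.charCodeAt(i));
  }
  return codes;
}

// Round trip: feeding the codes back into String.fromCharCode
// reconstructs the original text.
function fromCharCodes(codes) {
  return String.fromCharCode.apply(null, codes);
}

console.log(toCharCodes("@."));
console.log("String.fromCharCode(" + toCharCodes("org").join(", ") + ")");
```

The second line prints a ready-to-paste `String.fromCharCode(...)` call for the given text.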
Of course, any crawler that actually interprets the JavaScript and renders the page will still be able to harvest the address, but I don't think many bots will bother to do this. Please let me know if you think this is a misguided opinion!
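As a rough check of this assumption, the sketch below compares two hypothetical harvesters. The first only runs a regular expression over the raw page source. The second actually executes the script, which is simulated here with JavaScript's built-in `new Function`. The address pattern is an illustrative assumption, not taken from any real bot.

```javascript
// Hypothetical harvester pattern: anything shaped like user@domain.tld.
var ADDRESS_PATTERN = /[\w.+-]+@[\w-]+(\.[\w-]+)+/g;

function harvestSource(text) {
  return text.match(ADDRESS_PATTERN) || [];
}

// The email() function from this post, as it appears in the page source.
var scriptBody = [
  "function email() {",
  '  var domain1 = "letter";',
  '  var domain2 = "boxes.";',
  '  var domain3 = "org";',
  '  var addr = "p" + "r" + "lewis" + "@" + domain1 + domain2 + domain3;',
  "  return \"<a href='mailto:\" + addr + \"'>\" + addr + \"</a>\";",
  "}"
].join("\n");

var pageSource = "<script>\n" + scriptBody + "\ndocument.write(email());\n</script>";

// A bot that only pattern-matches the raw source finds nothing.
var fromSource = harvestSource(pageSource);

// A bot that runs the script sees the generated markup and
// recovers the address (once from the href, once from the link text).
var generated = new Function(scriptBody + "\nreturn email();")();
var fromExecution = harvestSource(generated);

console.log(fromSource, fromExecution);
```

The protection therefore depends entirely on bots declining to pay the cost of executing scripts.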
Comments
Well, every method that got published soon became worthless. I implemented several anti-blog-spam measures I'd heard about, only to see them defeated a few months later.
Hope I haven't upset anyone by writing this down then!
At least it will force those operating bots to spend more computational resources harvesting addresses.
If I remember correctly, I've seen this scheme before, around 1999-2000 to be precise. Instead of using JavaScript to write the address out, people used HTML character entities, which use the same character codes. For some reason, it didn't become very popular. I'm not sure whether that's because the bots 'grokked' it immediately or because people just can't be bothered to use character codes.
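For comparison, the entity-based approach described in this comment can be sketched in a few lines. Every character of the address is emitted as an HTML numeric character reference (`&#NN;`). Browsers decode these when parsing the page, including inside the `href` attribute, so the link still works. The function name is hypothetical.

```javascript
// Encode every character as an HTML numeric character reference.
function toHtmlEntities(text) {
  var out = "";
  for (var i = 0; i < text.length; i++) {
    out += "&#" + text.charCodeAt(i) + ";";
  }
  return out;
}

// Static markup that could be pasted straight into a page, no JavaScript needed.
var encoded = toHtmlEntities("prlewis@letterboxes.org");
var link = "<a href='mailto:" + encoded + "'>" + encoded + "</a>";
console.log(link);
```

Because the output is plain HTML, it sidesteps the accessibility concern of relying on JavaScript. However, a bot only has to decode the entities to recover the address.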
I've used http://hivelogic.com/enkoder for a while now. Of course, the more it gets used, the bigger a target it becomes, but it is massively obfuscated.
There's an obvious accessibility issue with using JavaScript to convey information. Where it's not available, I'd have an image or 'si@guess the rest'.
Routinely using 'mail AT domain dot com' seems dumb; it's no harder to harvest than 'mail@domain.com'. I keep meaning to add spam poison to my site to screw up harvesters, like http://www.spampoison.com, but since the effect is very indirect, I don't bother.
Like your CAPTCHA.