While doing some scraping with Perl, I ran into a page served by CloudFlare, which obfuscates email addresses to keep bots from collecting them. And of course, what did I need? Yes, the emails…
Measures like this only raise the difficulty of getting the information; they don't make it impossible.
After a little googling I found a two-year-old post by Jesse Li that explains the obfuscation process and solves it with JavaScript:
function hex_at(str, index) {
    var r = str.substr(index, 2);
    return parseInt(r, 16);
}
function decrypt(ciphertext) {
    var output = "";
    var key = hex_at(ciphertext, 0);
    for(var i = 2; i < ciphertext.length; i += 2) {
        var plaintext = hex_at(ciphertext, i) ^ key;
        output += String.fromCharCode(plaintext);
    }
    output = decodeURIComponent(escape(output));
    return output;
}
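To make the scheme concrete, here is a small round-trip sketch. It only assumes what the decoder above implies: the first hex pair is the key, and every following pair is one byte of the address XORed with that key. The names `encodeEmail`, `decodeEmail`, and `toHex`, the sample address, and the key `0x42` are mine, purely for illustration; none of them come from CloudFlare's own code.

```javascript
// Format one byte as two lowercase hex digits.
function toHex(byte) {
    return byte.toString(16).padStart(2, "0");
}

// Hypothetical inverse of the decoder: key first, then each UTF-8 byte XOR key.
function encodeEmail(email, key) {
    const bytes = new TextEncoder().encode(email);
    let out = toHex(key);
    for (const b of bytes) {
        out += toHex(b ^ key);
    }
    return out;
}

// Same logic as decrypt() above, written with TextDecoder for the UTF-8 step.
function decodeEmail(hex) {
    const key = parseInt(hex.slice(0, 2), 16);
    const bytes = [];
    for (let i = 2; i < hex.length; i += 2) {
        bytes.push(parseInt(hex.slice(i, i + 2), 16) ^ key);
    }
    return new TextDecoder().decode(new Uint8Array(bytes));
}

const encoded = encodeEmail("a@b.c", 0x42);
console.log(encoded);              // 422302206c21
console.log(decodeEmail(encoded)); // a@b.c
```

Since the key travels with the ciphertext, nothing secret is needed to undo it, which is why this is obfuscation rather than encryption.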
With a little translation, I did the same thing in Perl:
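#!/usr/bin/perl
use strict;
use warnings;

my $encodedEmail = $ARGV[0];
die "Usage: $0 <encoded-email>\n" unless defined $encodedEmail;

# Return the byte value of the two hex digits at the given offset.
sub unDecode {
    my ($str, $index) = @_;
    return hex(substr($str, $index, 2));
}

# The first byte is the key; every following byte is XORed with it.
sub deCrypt {
    my ($ciphertext) = @_;
    my $email = "";
    my $key = unDecode($ciphertext, 0);
    for (my $i = 2; $i < length($ciphertext); $i += 2) {
        $email .= chr(unDecode($ciphertext, $i) ^ $key);
    }
    return $email;
}

print deCrypt($encodedEmail) . "\n";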
At this point I just integrated it into my main scraper program and voilà!
The code is on my GitHub.
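For the integration step, the encoded strings first have to be pulled out of the HTML. The sketch below assumes the page carries them in `data-cfemail="…"` attributes, which is the markup I would expect on such pages but should be checked against the actual source being scraped. `extractEmails`, `decodeEmail`, and the sample markup are hypothetical, and the same regex idea carries over to a Perl scraper.

```javascript
// Decode one hex string: the first byte is the key, the rest is XORed with it.
function decodeEmail(hex) {
    const key = parseInt(hex.slice(0, 2), 16);
    const bytes = [];
    for (let i = 2; i < hex.length; i += 2) {
        bytes.push(parseInt(hex.slice(i, i + 2), 16) ^ key);
    }
    return new TextDecoder().decode(new Uint8Array(bytes));
}

// Assumption: each obfuscated address sits in a data-cfemail attribute.
function extractEmails(html) {
    const pattern = /data-cfemail="([0-9a-fA-F]+)"/g;
    const emails = [];
    for (const match of html.matchAll(pattern)) {
        emails.push(decodeEmail(match[1]));
    }
    return emails;
}

// Made-up sample markup, not taken from a real page.
const sample = '<a class="__cf_email__" data-cfemail="422302206c21">[email protected]</a>';
console.log(extractEmails(sample)); // [ 'a@b.c' ]
```

A regex is enough for a quick scraper; for messier markup a real HTML parser would be the safer choice.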