The Unforgiving Minute
Morals are a personal affair; in the war of righteousness every man fights for his own hand.
Robert Louis Stevenson

Tuesday, June 5, 2007

The much-delayed blogging privacy post

Since no fewer than three of my friends have recently had run-ins with online stalkers/psychos/mutants, I decided it might be a good idea to post my own humble little guide to blogging privacy. This post is targeted at folks who use (or are thinking about using) a web-hosting package to maintain a blog on their own domain, using a blogging package like WordPress or Movable Type. If you’re on Google Blogger, WordPress.com, Blog-City, or a similar site, this guide won’t be as helpful.

This is BY NO MEANS a comprehensive resource and I am NOT a professional web-jockey. I consider myself a reasonably well-informed amateur. Everything here is accurate to the best of my knowledge, but that knowledge is limited. If you see something stupid/wrong/dangerous in this article, or if I leave out something important, please comment or email me and I’ll make corrections.

Basically, this is a collection of tips for people who blog anonymously/pseudonymously and want to keep it that way. This stuff should be especially useful if you have, or anticipate having, problems with some nutjob who wants to stalk or harass you via the internet.

Alright, so you’ve got a domain name, bought a web hosting package, and set up your blogging software. I’ll assume that you were smart enough to not use your full legal name as your domain name or to post all your personal information on your “About” page. There are a few more non-obvious things you should do to help preserve your anonymity and to make life easier if you attract your very own net-nutjob.

Before I dig into the meat ‘n’ potatoes, here’s the most important thing I can tell you: 100% privacy on the Internet DOES NOT EXIST. Common sense and a little technical savvy can help, but there’s always someone out there who knows more than you. If you’re in the Witness Protection Program or have a crazy ex-spouse who will stop at nothing to track you down, you might wanna rethink the whole public blogging thing. Likewise, if you’re going to write inflammatory things about hot-button issues that tend to attract nutcases (abortion, religious fundamentalism, etc.), you should probably do some further homework on protecting your identity before the death threats start rolling in.

Alright, on with the show…

WHOIS

When you register a domain name (e.g., yournamehere.com), the registrar (the company that sold you your domain name; it may or may not be the same company that hosts your site) enters some information about the domain into a WHOIS database. If you want to know all about the wonderful world of WHOIS, Wikipedia has a good entry on the topic. For our purposes, it’s enough to know that your WHOIS entry generally contains whatever personal information you used to buy the domain, and it can be viewed by anyone with an Internet connection. If you’re planning to blog under a pseudonym, you may notice a slight problem here: your name, address, phone number and email address will be listed in a publicly-accessible database!

What to do? Most domain registrars (companies like GoDaddy.com, NameCheap.com, etc.) offer a WHOIS-protection service whereby they enter their own information into the WHOIS database, instead of yours. This service may cost you a few bucks per year or it may be bundled in with your hosting plan. When you register your domain or set up your site, you’ll want to make sure you activate this protection right away. If your site is already up and running, you should contact your domain registrar and see about adding this protection. Better late than never.
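
Once the protection is switched on, it's worth checking that it actually took. Here's a rough Python sketch of that check, assuming you've already run a WHOIS lookup on your domain and pasted the result into a string. The function name, the sample record, and the personal details are all made up for illustration; real WHOIS records vary a lot from registrar to registrar.

```python
def find_exposed_details(whois_text, personal_details):
    """Return the labels of personal details that appear in a WHOIS record.

    whois_text: the raw text of a WHOIS lookup for your domain.
    personal_details: dict mapping a label (e.g. "phone") to the value to look for.
    """
    # Compare case-insensitively and ignore spacing differences, so that
    # "JANE   DOE" in the record still matches "Jane Doe".
    normalized = " ".join(whois_text.lower().split())
    exposed = []
    for label, value in personal_details.items():
        needle = " ".join(value.lower().split())
        if needle and needle in normalized:
            exposed.append(label)
    return exposed


# Hypothetical record and details, invented for this example.
sample_record = """
Domain Name: NOT-A-REAL.SITE
Registrant Name: Jane Doe
Registrant Email: privacy@registrar-proxy.example
"""
my_details = {"name": "Jane Doe", "email": "jane@example.com"}

print(find_exposed_details(sample_record, my_details))  # prints ['name']
```

An empty list means none of the details you listed showed up in the record. It does not prove the record is clean, only that those exact strings aren't in it.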

.htaccess

The magical .htaccess file is a simple text document that lets you do all kinds of weird and wonderful things with your website. The web is full of guides like this one where you can learn the detailed ins and outs. For this article, I’ll just cover two really useful things your .htaccess file can do for you. Note that this section ONLY applies if your hosting company uses Apache as their web server software. When in doubt, ask someone who knows.

First, you can use your .htaccess file to block visitors from a certain IP address or a block of IP addresses (how do you know which IP addresses to block? We’ll cover that later). Why would you want to do this? Well, if your site attracts the attention of some nutbag who decides to give you a hard time, you can block his IP address to keep him from even seeing your site, let alone polluting your blog with unwanted comments.

Note that this is NOT a foolproof technique. A tech-savvy individual could use a variety of tricks to get around a simple IP ban, but probably 95% of the general population isn’t that knowledgeable. And hey, a “403 Forbidden” message is a pretty good way of telling someone to get lost.

Second, you can use your .htaccess file to prevent people from randomly browsing through the directories of your site. This could be important if you use your website to host some personal files that you want to keep hidden from regular visitors. For example, let’s say you take pictures from Christmas dinner and post them on your site for your relatives to enjoy. You even build a nice little gallery for them at http://www.not-a-real.site/pictures/xmasphotos.html. Even if you don’t EVER post a public link to that page, someone poking around on your site could easily try to access http://www.not-a-real.site/pictures/ and see every file in that directory.

Yes, people really do this. In my logfiles I see the occasional entry from someone trying to access directories like “photos”, “pictures”, “videos”, “private”, etc. You get the idea. As an aside, there’s another lesson to take from this: you might want to set up a second domain name for personal stuff and keep it COMPLETELY separate from your blog. Domain names are cheap and most hosting packages let you run multiple domains at no extra charge.

The solution to this problem is really simple; by adding a single line of text to your .htaccess file you can disable directory listing. Then, if some snooper tries to access http://www.not-a-real.site/pictures/, he’ll get a nice “403 Forbidden” message instead of links to your family photos.

So how do you actually put this into practice? Here’s a little example. We’ll say that you want to (a) block some nutjob coming from the IP address 203.0.113.45, and (b) prevent directory browsing across your entire site. To do this, you’re going to use a text editor (on Windows, Notepad will work fine; on Mac OS X, TextEdit is perfect as long as it’s in plain-text mode) to create a new text file that looks like this:

order allow,deny
deny from 203.0.113.45
allow from all

Options -Indexes

The first three lines block that specific IP address; the last line turns off directory browsing.

Save the file with the name “htaccess”, WITHOUT the period at the beginning, because some computers treat filenames starting with a period as invisible files. When you upload the file to your website, change the name to “.htaccess”, WITH the period at the beginning. Your .htaccess file should be uploaded to the root (highest) directory of your website.
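
If your blacklist grows past a couple of entries, typing the file by hand gets error-prone. Here's a minimal Python sketch that builds the same kind of file from a list of addresses and refuses anything that isn't a real IP address or block (an octet above 255, say). The function name and the sample addresses are my own inventions; the addresses come from ranges reserved for documentation. It assumes the same older Apache "order/deny/allow" syntax used above.

```python
import ipaddress


def build_htaccess(blocked, disable_listing=True):
    """Build .htaccess text that blocks the given addresses.

    blocked: iterable of strings, each a single IP ("203.0.113.45")
             or a CIDR block ("198.51.100.0/24").
    Raises ValueError if any entry is not a valid address or network.
    """
    lines = ["order allow,deny"]
    for entry in blocked:
        # ip_network() accepts single addresses and CIDR blocks alike,
        # and raises ValueError for impossible values.
        network = ipaddress.ip_network(entry, strict=False)
        if network.num_addresses == 1:
            # A single host: write the bare address.
            lines.append(f"deny from {network.network_address}")
        else:
            # A block of addresses: write it in CIDR notation.
            lines.append(f"deny from {network}")
    lines.append("allow from all")
    if disable_listing:
        lines += ["", "Options -Indexes"]
    return "\n".join(lines) + "\n"


print(build_htaccess(["203.0.113.45", "198.51.100.0/24"]))
```

Save the printed text the same way as described above. Keeping the list in one place also makes it easy to see who you've banned and when to prune old entries.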

There’s much more you can do with .htaccess, including cool things like password protection, but that’s beyond the scope of this post.

robots.txt, Caching, and the Wayback Machine

When you make a website available on the internet, not all of your visitors will be humans; some will be software programs called bots that automatically browse through your site so it can be indexed by search engines like Yahoo and Google. Some of these programs actually save complete copies of your site and then make the copies available from their own servers (check out the Internet Archive’s Wayback Machine for an example of this; Google and Yahoo also offer cached pages, but without the timeline functionality). Fortunately, you have a few ways of controlling bots’ behavior on your site.

By creating a text file named robots.txt and uploading it to your website, you get to boss around the bots, telling them what they can and can’t do.

And here you have to make a judgement call. If you want search engines to send traffic to your blog, first you have to let their bots visit your site. On the other hand, allowing bots in means that copies of your site could end up getting stored all over the ‘net, which could potentially lead to headaches down the road.

To complicate matters further, this isn’t an all-or-nothing game; you can choose which bots to allow in and what they’re allowed to see. For all the gritty details, check out robotstxt.org and Wikipedia. If you just want to keep ALL robots off your site (and remember, this will keep people from finding your site via search engines!), just create a text file that looks like this:

User-agent: *
Disallow: /

Save it as “robots.txt” and upload it to the root directory of your website, and you’re done. Oh, and you should know that compliance with robots.txt is voluntary; reputable companies generally obey it, but they don’t HAVE to.
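
Before uploading, you can sanity-check your rules with the robots.txt parser that ships with Python. This is just a sketch: "ExampleBot" is a made-up bot name, and the second rule set (blocking only a pictures directory) is my own illustration of the "not all-or-nothing" point. It shows how a well-behaved bot should read the file, not what any particular bot will actually do.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Report whether the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The keep-everyone-out file from this post.
block_everything = "User-agent: *\nDisallow: /\n"
# A hypothetical partial rule: index the blog, stay out of /pictures/.
block_pictures_only = "User-agent: *\nDisallow: /pictures/\n"

home = "http://www.not-a-real.site/"
gallery = "http://www.not-a-real.site/pictures/xmasphotos.html"

print(is_allowed(block_everything, "ExampleBot", home))        # prints False
print(is_allowed(block_pictures_only, "ExampleBot", home))     # prints True
print(is_allowed(block_pictures_only, "ExampleBot", gallery))  # prints False
```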

BUT WAIT, THERE’S MORE! It’s also possible to have your cake and eat it, too. If you want your site to get indexed but NOT cached, you can add a little snippet of HTML to the <head> section of your pages. The necessary code is:

<meta name="robots" content="noarchive">

For XHTML, change that to:

<meta name="robots" content="noarchive" />

If you’re using WordPress for your blogging, that should go in the ‘header.php’ file of your theme.
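
After editing your theme, it's easy to confirm the tag really made it onto the page. Here's a small Python sketch, assuming you've saved a page's HTML source to a string (via "View Source" in your browser, for instance). The class and function names are mine, not part of WordPress or any library.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; self-closing
        # XHTML tags such as <meta ... /> are routed here as well.
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def has_noarchive(html_text):
    """Return True if the page asks robots not to cache it."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    return "noarchive" in finder.directives


page = '<html><head><meta name="robots" content="noarchive" /></head><body></body></html>'
print(has_noarchive(page))  # prints True
```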

And why should you care about this stuff? Well, one of my friends recently had to scour the ‘net, deleting blog entries and comments to stop a crazy ex-boyfriend from keeping tabs on her. That’s a difficult enough job; having to deal with cached copies scattered all over the internet makes it just about impossible.

Server Logs and Page Counters

It’s useful to know who’s visiting your site and what they’re doing there. Pretty much everyone seems to use Sitemeter. I don’t, due to (a) its flakiness and lack of tech support (mine just stopped working completely, twice; support requests went unanswered), (b) potential privacy concerns raised by Jeff of Alphecca, and (c) their habit of blocking out the last octet of visitors’ IP addresses unless you pony up the bucks for their premium service. I switched to StatCounter and don’t have any complaints so far.

Even better, your hosting provider should offer some means of monitoring your server logs; it may be enabled by default, or you might have to turn it on yourself. I suggest you look into it; logs can come in handy.

Aside from the fun of seeing how many visitors you get and where they’re coming from, keeping an eye on your logs can help you out if you catch the fancy of one of the web’s many miscreants. I check mine routinely to spot people trying to poke around in the dark corners of my site (as covered in the .htaccess section above). Such visitors usually get blacklisted via .htaccess.
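
Reading raw logs by eye gets old fast, so here's a rough Python sketch of that kind of snoop-spotting. It assumes your host writes logs in Apache's common log format (many do, but check yours), and the function name, sample lines, and timestamps are invented for the example.

```python
import re
from collections import defaultdict

# Matches the start of a common-log-format line:
#   IP ident user [timestamp] "METHOD path protocol" status ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)


def find_probes(log_lines, watched_dirs):
    """Map each visitor IP to the watched directories it requested."""
    probes = defaultdict(list)
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines in a format we don't recognise
        # Drop any query string, then take the first path segment:
        # "/private/x.jpg?y=1" -> "private"
        path = match.group("path").split("?", 1)[0]
        first_segment = path.lstrip("/").split("/", 1)[0].lower()
        if first_segment in watched_dirs:
            probes[match.group("ip")].append(path)
    return dict(probes)


# Made-up log lines using documentation-only IP addresses.
sample_log = [
    '203.0.113.45 - - [05/Jun/2007:21:40:12 -0500] "GET /private/ HTTP/1.1" 403 210',
    '203.0.113.45 - - [05/Jun/2007:21:40:19 -0500] "GET /photos/ HTTP/1.1" 404 208',
    '198.51.100.7 - - [05/Jun/2007:21:41:02 -0500] "GET /index.php HTTP/1.1" 200 5120',
]
watched = {"photos", "pictures", "videos", "private"}

print(find_probes(sample_log, watched))
# prints {'203.0.113.45': ['/private/', '/photos/']}
```

Any address that shows up in the result is a candidate for the blacklist, though it's worth a second look before banning, since a single stray request could be an innocent typo.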

Logs are also invaluable for dealing with trolls. During the recent festivities involving a pretentious, verbose white supremacist who trolled several of my blog-buddies, I noticed the guy cropping up in my logs. In fact, he was spending HOURS reading through everything here. Not exactly normal, but still harmless. Then he started trying to bait me and I didn’t particularly care to play along. I had his IP address from my logs, so it took me all of 30 seconds to blacklist him. Don’t feed the trolls, kids!

And perhaps most importantly, should you find yourself in the scary position of having a real, live psycho on your hands, someone who crosses the line from annoying to outright threatening, a nice stack of server logs could be useful evidence should you be forced to get the police or courts involved. Fortunately, it’s never happened to me and I don’t expect it will. But hey, I’m prepared.

So that’s it, kids. If you’re a professional IT geek, you probably already knew all of this. But if you’re a casual blogger who cares about online privacy, hopefully you found something helpful amidst my gibberish. Questions/comments/corrections are most welcome, as always.

posted by TD at 9:51 pm  

4 Comments »

  1. I would also recommend that html and images be turned off in any email clients and web-based services that a person uses. The reason is that it can be used to fetch an image (or something else) from a server. This will result in your IP address being in that server’s logs.

    On its own, it’s not a major threat, but it can be used to find the general location of a person. Coupled with other information, it can be a threat.

    Comment by Alcibiades — June 6, 2007 @ 1:24 am

  2. This is very, very useful information and I’m sure tons of people don’t know about blogging privacy. Consider please providing your expertise in article form to our site. It’s just the kind of information there’s not enough of.
    Thanks for sharing

    Comment by Sus — June 22, 2007 @ 6:16 am

  3. Tips to add to TD’s privacy post…

    TD, over at Unforgiving Minute, posted a few tips for those people who “…blog anonymously/pseudonymously and want to keep it that way. ” I wanted to add a few more tips, but decided to make it into a blog entry instead. I suggest that…

    Trackback by Standard Mischief — July 29, 2007 @ 12:21 pm

  4. [...] you’re considering the switch (and you should be!) I’d also suggest reading my piece on protecting your privacy as a blogger. posted by TD at 10:47 pm [...]

    Pingback by The Unforgiving Minute » A most generous offer. — August 5, 2008 @ 10:47 pm
