Encryption in your software without key pairs

Any discussion of encryption from a programmer’s perspective almost inevitably leads to public key encryption. This elaborate handshaking process lets two distinct parties set up a private connection, and is the basis for SSL/TLS encryption. The most common encryption programs are GnuPG and PGP, both of which implement the OpenPGP standard.

However, what about when the only party involved is yourself? Ever run into a situation where your own software is the only thing encrypting and decrypting a set of files? After all, at some point your code has to read and use both public and private keys. In cases where the sender and recipient are the same party in the same location, the entire benefit of public/private key pairs goes out the window, and all the complex handshaking described above becomes pretty meaningless.

Note that I said “same location;” if the data is encrypted on one server, then delivered and decrypted on another, key pairs still offer extra security. A breach on the server using the public key won’t gain the attackers access to the data, as they would still require the private key to decrypt the data.

So assuming we have a piece of software that performs encryption and decryption in a single location, it makes sense to use a single key. This single-passphrase encryption is called symmetrical (or symmetric) encryption. The problem: the vast majority of encryption discussion out there covers key pairs, or symmetrical encryption on an individual, “one-off” basis. However, you can use GPG to perform symmetrical encryption that…

  • Uses strong encryption, like AES 256-bit
  • Does not make any prompts
  • Tucks the passphrase safely away from web access (in the case of a web app)
  • Is just as strong as key pairs (assuming you maintain the security of the key)

Enough chatter, here’s the command line call:

gpg --quiet --no-tty --cipher-algo AES256 --passphrase-file /secure/path/.passphrase -c important_file

--quiet and --no-tty
Keep GPG as quiet as possible and make sure it never uses the terminal for any output. Because they hide useful feedback, these should be added after you’ve thoroughly tested your setup.

--cipher-algo
Allows you to choose which encryption algorithm to use. Older GPG releases use CAST5 by default, which is good, but not as strong as AES 256-bit, which we select here by including AES256 after the parameter.

Using AES256 also allows us to avoid getting the WARNING: message was not integrity protected warning message when we decrypt our files. With symmetrical encryption, this warning appears when the cipher has a block size of 64 bits or smaller, as CAST5 does, because GPG then leaves out the integrity check (MDC) unless told otherwise. A cursory web search reveals a lot of people run into this issue. Switching algorithms to AES256 is enough to avoid the problem entirely, as AES works on 128-bit blocks and GPG always adds the integrity check for ciphers like that. Alternatively, you can pass the --force-mdc parameter.

--passphrase-file
Tells GPG that your passphrase is stored in the specified file. You can name this file whatever you want, and locate it wherever you’d like. However, the user under which you perform the encryption must have access to the file. So in the case of a web-based program, you probably need to grant read access to your web server user (e.g. apache or www-data). Read access is all that user needs, so the file should be chmod’d to something like 640 or lower. 400 (read only by user) is ideal.

You can further improve the situation with two small extra steps. First, make sure the file sits outside the web root of your site (i.e. not under public_html, www, or whatever). Second, prefix the entire file name with a period. Looking at the example again we see that the file is actually called .passphrase. This only makes Linux consider the file hidden, and thus invisible to typical navigation. But while it isn’t true security, a little bit of “security through obscurity” on top of proper permissions and location doesn’t hurt.
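As a rough sketch of the permission advice above, the helper below writes whatever arrives on standard input to a passphrase file that only its owner can read. The function name store_passphrase is made up for illustration, the path in the usage comment is the same placeholder used in the command above, and it assumes you run it as the user that will later call gpg.

```shell
#!/bin/sh
# Hypothetical helper (not part of GPG): save stdin as a passphrase file
# that only the owning user can read.
store_passphrase() {
  target=$1
  # Create the file inside a subshell so the tight umask does not leak
  # into the rest of the script; 077 means no group/other access at all.
  ( umask 077 && cat > "$target" ) || return 1
  # Drop write access too, leaving mode 400 (read only by owner).
  chmod 400 "$target"
}

# Usage (placeholder path, run as the user that will call gpg):
#   printf '%s\n' 'my long passphrase' | store_passphrase /secure/path/.passphrase
```

Because the file ends up read-only, replacing the passphrase later means deleting the file first and running the helper again.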

Finally, remember that since you are storing the passphrase in a file, you have almost no limits on the length and complexity of the password. Maximize that benefit by picking a really complex passphrase. No words, upper and lowercase, symbols. Better yet, let GPG do the work for you:

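Here’s a quick hack for generating a very secure passphrase using GnuPG itself. The passphrase will not be easy to remember or type, but it will be very secure. The hack generates 16 random binary bytes using GnuPG then converts them to base64, again using GnuPG. The final sed command strips out the headers leaving a single line that can be used as a passphrase. Note that the line number depends on your GnuPG version: if your build does not print a Version: header, the data moves up a line and line 5 is the short checksum instead, so look at the full --enarmor output once before relying on it. Running gpg --armor --gen-random 1 16 should sidestep the issue, since it prints the base64 text without any headers.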

gpg --gen-random 1 16 | gpg --enarmor | sed -n 5p

You can easily pipe this text directly into your new passphrase file:

gpg --gen-random 1 16 | gpg --enarmor | sed -n 5p > /secure/path/.passphrase
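The article only shows the encryption call, so here is a hedged sketch of both directions wrapped in two small functions. The function names and the PASSFILE variable are illustrative, not anything GPG defines. It assumes GnuPG 2.1 or newer, where --passphrase-file is only honoured together with --batch and --pinentry-mode loopback; on GnuPG 1.x, drop the --pinentry-mode option, which that series does not know.

```shell
#!/bin/sh
# Placeholder location; point this at your own passphrase file.
PASSFILE=${PASSFILE:-/secure/path/.passphrase}

# Symmetrically encrypt the file named in $1, writing the result to $1.gpg.
encrypt_file() {
  gpg --quiet --no-tty --batch --pinentry-mode loopback \
      --cipher-algo AES256 --passphrase-file "$PASSFILE" \
      --output "$1.gpg" --symmetric "$1"
}

# Decrypt $1 (the .gpg file) into $2 using the same passphrase file.
decrypt_file() {
  gpg --quiet --no-tty --batch --pinentry-mode loopback \
      --passphrase-file "$PASSFILE" \
      --output "$2" --decrypt "$1"
}

# Usage (file names are placeholders):
#   encrypt_file important_file
#   decrypt_file important_file.gpg important_file.restored
```

Both functions return gpg's exit status, so a calling script can check the result even though --quiet keeps the output to a minimum.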

Extract email addresses from tags

Ran into another cool hurdle today for my Fwd:Vault development. When I grab the message content to archive it in the system, first thing I do is scrub it out to ensure that (a) it displays properly, and (b) there are no misbehaving characters. I grab both plain text and HTML email formats (if present), so the scrubbing process is a little different in each case. For the plain text, I take some extra steps to ensure there is no HTML whatsoever. Naturally, at one point this involves a call to PHP’s ultra-useful strip_tags() function.

However, in the course of testing today, I realized that when a message is forwarded, sometimes the forward header wraps the email address in angle brackets, and the address gets stripped when I process the message. Allow me to demonstrate. Here’s the body of an example message that someone might send to Fwd:Vault for safe keeping…

---------- Forwarded message ----------
From: "Office Flirt" <flirt@example.com>
Date: Wed, Jan 14, 2009 at 10:14 AM
Subject: Delete those images
To: you@example.com

My boss is sniffing around. I want you to delete those pictures I sent you right away.

Signed,
Office Flirt

Obviously you’re tucking this one away in Fwd:Vault to provide a little CYA-insurance when the boss calls you into his office. Good call. Now, before today, this message would come out of the scrubbing process looking like this:

---------- Forwarded message ----------
From: Office Flirt
Date: Wed, Jan 14, 2009 at 10:14 AM
Subject:
To: you@example.com
...

Look at the From line. The email address is gone. You don’t have any other copies of it, so your boss doesn’t believe your story, and you get the blame. You’re forced to attend one of those god-awful sexual harassment classes. Fail.

So, what happened? Remember, you are looking at the body of a message in plain text. That “Forwarded message” block at the beginning is just part of the body text. So when the text was scrubbed by strip_tags(), the function picked up <flirt@example.com> as just another tag, which it dutifully removed.

To handle this situation, I came up with a piece of code that will look for email addresses in “tagged format” — i.e. surrounded by < and > — and remove the surrounding symbols, leaving us with harmless text.

// sample input containing an address in "tagged format"
$test = 'some <flirt@example.com> surrounding text';
$test = preg_replace( "/(\<)(.+@[^\(\);:,<>]+\.[a-zA-Z]{2,4})(\>)/",
                      ' $2 ',
                      $test);
$test = preg_replace('/[\s]+/', ' ', $test);
echo $test;

Let’s break this down. First, we have a regular expression that identifies email addresses: .+@[^\(\);:,<>]+\.[a-zA-Z]{2,4}. This is the same expression set in the example on the Quanetic Software Regular Expression Tester (an excellent tool). We surround that in parentheses to isolate it as a subpattern. Then on either end of the expression, we tack on more regex voodoo to look for tag syntax: (\<) and (\>). These also get parentheses to identify them as subpatterns. Once it’s finished, we have an expression that will only match addresses wrapped in tagging structure.

The second argument in preg_replace() is the replacement, or what we should replace any matches with. In this case, we’ve isolated the address from the tags using subpatterns. So all we need to do is make a single call to the proper reference, which is $2, because it’s the second set of parentheses in the expression. Confused? You can learn about subpatterns on the PHP manual page for preg_replace().

Note the spaces around the $2 in the second argument. Sometimes the address will not have any spaces between the person’s name and the actual address. This could lead to the address being combined with the name which, in the case of Fwd:Vault, would screw up our search indexing. So we add spaces during the replace, then make a second call to preg_replace() to collapse the extra spaces into one: $test = preg_replace('/[\s]+/', ' ', $test);.

Legal Disclaimer: In case you do end up using Fwd:Vault when it launches, I’m fairly certain the service wouldn’t be liable in this silly hypothetical. Just make sure you read the terms before you sign up if you play the field at your office. Sorry to everyone going “duh” right now; it’s a sue-happy world.

Update: When I went to implement this change today, I discovered that the code was catching newlines (\n or \r) in the crossfire. It was actually due to the second call to preg_replace(): the “\s” character class includes not only spaces but line terminators as well. Oops. The revised version looks like this:

$body_text = preg_replace('/[ ]{2,}/', ' ', $body_text);

Versatile random string generator

A cursory glance around the web will reveal a ton of PHP-based random string generators. With enough looking you’ll find generators that do any of the following:

  • Strings with letters
  • Strings with numbers
  • Strings with letters and numbers
  • Uppercase, lowercase
  • Fixed, variable length strings
  • Option to include symbols

Problem is, none of them ever incorporated all this functionality. Every generator was a hodgepodge, e.g. some forced inclusion of numbers, or allowed either upper or lowercase, not both. All of these are great options, and it would be great to have all of them at your disposal in one tight function.

No more! I got so sick of finding shortcomings that I finally just put it all together myself. The following function allows you to choose a string length, as well as the character sets to use when building your random string. You can even include a set more than once, giving greater usage weight to certain characters. Finally, complete flexibility!

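function koehl_generator($length = 10, $charsets = 'lower') {
  // Available character sets; each one is looked up by its variable name.
  $upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
  $lower = 'abcdefghijklmnopqrstuvwxyz';
  $numbers = '1234567890';
  $symbols = '!@#$%^&*()-_=+<>,.?/:;[]{}|~';

  // Accept either a single set name or an array of set names.
  if (!is_array($charsets)) $charsets = array($charsets);
  $charset_pool = array();
  foreach ($charsets as $set) {
    // Variable variable: 'upper' resolves to $upper, and so on.
    $charset_pool[] = $$set;
  }
  $max = (sizeof($charset_pool) - 1);
  $v = '';
  for ($i = 0; $i < $length; $i++) {
    // Pick a set at random (a set listed twice is twice as likely),
    // then pick one character from that set.
    // rand() is fine for throwaway strings; for passwords or tokens,
    // random_int() (PHP 7+) is the safer choice.
    $this_pool = $charset_pool[rand(0, $max)];
    $v .= $this_pool[rand(0, strlen($this_pool) - 1)];
  }
  return $v;
}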
 
// usage examples
echo koehl_generator(10, 'upper') . '<br />';
echo koehl_generator(10, 'lower') . '<br />';
echo koehl_generator(10, 'numbers') . '<br />';
echo koehl_generator(10, 'symbols') . '<br />';
echo koehl_generator(10, array('upper', 'lower')) . '<br />';
// order of the array in the second argument does not matter
echo koehl_generator(15, array('lower', 'upper', 'numbers')) . '<br />'; 
echo koehl_generator(15, array('lower', 'numbers', 'upper')) . '<br />'; 
echo koehl_generator(15, array('upper', 'lower', 'numbers', 'symbols')) . '<br />';
echo koehl_generator(15, array('lower', 'symbols')) . '<br />';
// note how often letters appear in the next one
echo koehl_generator(20, array('lower', 'lower', 'numbers')) . '<br />';

Adding your own custom set is easy too. Include a new set following the syntax of the existing ones, then call your new set by variable name in the second argument…

// add this to the top of the function...
$the_basics = 'abc123';
 
// then use it like so...
echo koehl_generator(5, 'the_basics') . '<br />';
echo koehl_generator(10, array('the_basics', 'symbols')) . '<br />';

If you find this useful, please be sure to give me some link love; just a reference URL to this page in your code would be fine.


Smash bugs, don’t treat symptoms

I previously discussed why certain “automagical” features can sometimes facilitate the creation of crappy code. However, they only create a possibility of crappy code. Today I want to warn you against a practice that will create crappy code 100% of the time.

First a scenario – you have written a program in your language of choice. It’s fairly complex, partially because of the basic needs of your client or employer, and partly because every project is a moving target to a certain extent. At some point in the logic flow, your code behaves aberrantly; let’s keep it really simple and say that it’s outputting dashes instead of spaces in a block of text. “Well these shouldn’t be here at this point,” you think. “That text was scrubbed out when it came out of the database.” You confirm the scrubbing occurs, and check some things along the way to the output. Everything checks out.

However there’s a huge nebulous area that you conveniently sidestep, a big ol’ chunk of code written by Larry. The same Larry who got fired last month for half-assing that reporting module for the marketing team. A piece of his code still sits between your perfect database setup and your equally perfect outputting logic. You don’t want to touch Larry’s code with a 10-foot pole. “The problem MUST be in there,” you decide. “I’ll just undo the text change on the other side and be done with it.”

In other words, you treated the symptom, and didn’t solve the problem. This time-saving decision, while fairly innocent on its own, has far-reaching consequences for both your software and your own career as a developer. None of them are good.

Because you did not identify for certain where the problem lies, you have absolutely zero guarantee that Larry’s code is the problem at all. It could very well be in Larry’s code, but you didn’t look everywhere so you can’t say for sure.

The symptom you treated may well lead to a much larger problem. Perhaps it’s not only replacing spaces with dashes, but also truncating the text beyond a certain length. You won’t see that until a long-enough string passes by, and it may not pass by a person’s eyes for even longer. That’s the nasty thing about bugs: a human being must find and remove them. No matter what your philosopher-slash-uber coder friend says, the Matrix and its self-making code do not exist, so get in there and clean up the mess.

In short, with that one move, you’ve started down the path of writing crappy code. Keep taking that shortcut, and it won’t be long before you’re fired too, because your code will be a bug-ridden mess. Kind of like Larry, right? That’s because it’s the same path taken by Larry and every other lazy coder you’ve ever known.

The good news is that this path is easily avoidable: don’t be lazy. Do the work right the first time, stick with your syntax rules, and get to the root of every problem, every time.

Also, if the hypothetical scenario sounds eerily familiar, you might want to finish reading this post and go double-check that page slug creation code you wrote.


Why DRM will always be a bad idea

As if you really needed to hear this, but in case you…

  1. Don’t know what DRM is
  2. Are a music/movie industry person suffering a cranial-rectum issue
  3. Are a moron (most likely a #2 in denial)

…here’s a fancy visual:
[Image: steal this image]


Automating SSH or SFTP in scripts

Recently I needed to automate copying a MySQL database to a backup server. We keep a copy of our site and DB on this box in the event that our main systems go down, or there’s a problem with our internet connection. It’s kind of like a poor man’s colocation setup. I actually prefer the setup over true colocation for the vast majority of small and medium-sized businesses, because it’s far simpler and requires far less overhead and continuous support.

When I started searching around for resources on how to automate the SFTP connection, I was hit in the face with tons of dead ends. Several Google searches were spitting back mailing list and forum archives of plenty of questions regarding how to create backup scripts that connect to a remote server via SFTP. If you are in this boat, read on.

Here’s the problem. At some point in the development of SFTP, the writers decided that storing access credentials in files as part of an automated process was a very bad idea. So they coded SFTP to skip the password prompt entirely when invoked in batch mode (the -b flag, which runs commands from a file), which means a password login simply fails from a script.

Instead, they recommend that you create a public/private key pair between the two systems. This up-front handshake eliminates the need for passwords entirely, making your code a bit simpler. It’s fairly easy to do but, of course, most developers groan at the thought of having to learn yet another technique, and look for ways around the restriction. I tried both, and recommend the key pair approach. I’ll describe both here, and let you decide for yourself.

SSH/SFTP connection without passwords

The following example is borrowed from an article on The Linux Problem Base, but there are several out there explaining the same approach.

First log in on A as user a and generate a pair of authentication keys. Do not enter a passphrase:

a@A:~> ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/a/.ssh/id_rsa):
Created directory '/home/a/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/a/.ssh/id_rsa.
Your public key has been saved in /home/a/.ssh/id_rsa.pub.
The key fingerprint is:
3e:4f:05:79:3a:9f:96:7c:3b:ad:e9:58:37:bc:37:e4 a@A

Now use ssh to create a directory ~/.ssh as user b on system B. The directory may already exist, which is fine:

a@A:~> ssh b@B mkdir -p .ssh
b@B's password:

Finally append a’s new public key to b@B:.ssh/authorized_keys and enter b’s password one last time. Note that you must be looking at the local directory in which you saved the key in step one.

cd /home/a/
a@A:~> cat .ssh/id_rsa.pub | ssh b@B 'cat >> .ssh/authorized_keys'
b@B's password:

From now on you can log into B as b from A as a without a password. Try this to confirm:

a@A:~> ssh b@B hostname

It should return the hostname of system B without prompting for a password.

It’s painless and easy, and every SSH connection you make going forward requires less typing. However, if you still really want to use a password, you have two options.

Utilize the -o Flag

So the SFTP team made their stance clear, and backed it up with action. However, it’s not impossible to bypass the restriction. The -o flag lets you set any of the options you would normally put in the ssh_config file (the client configuration, not sshd_config), so you can change any of them on the fly. Here we need to disable the batchmode directive, so your SFTP call would look something like this:

sftp -o "batchmode no" -b /tmp/bat user@host

I found this one on a random forum post, and it comes with an important warning:

Note that it must come *before* -b, which may be surprising – this is
due to ssh processing -o options as if they were read from the config
file – ssh_config(5) again:

The only problem here is that the password challenge gets sent back out to the command line, requiring normal keyboard interaction.

Use SSHPass

So if that’s still not good enough for you, check out a SourceForge project called SSHPass. From the link:

Sshpass is a tool for non-interactivly performing password authentication with SSH’s so called “interactive keyboard password authentication”. Most user should use SSH’s more secure public key authentiaction instead.

SSHPass is available from default Debian apt servers; I couldn’t find anything reliable on its availability through yum.

Proceed at your own risk. If your server allows any sort of public access, even to a large handful of outside users, I strongly recommend going the key route.
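For completeness, this is roughly what a password-based call could look like. Treat it as a sketch, not a recommendation: the function name and the password file path are made up, and it assumes sshpass’s -f option (read the password from the first line of a file) combined with the batchmode override from the previous section.

```shell
#!/bin/sh
# Hypothetical wrapper: run an sftp batch file with a password read from a file.
#   $1 = password file (keep it chmod 400 and outside any web root)
#   $2 = sftp batch file
#   $3 = user@host
sftp_with_password() {
  # -o comes before -b, as the forum warning quoted earlier says it must.
  sshpass -f "$1" sftp -o "BatchMode no" -b "$2" "$3"
}

# Usage (all three arguments are placeholders):
#   sftp_with_password /secure/path/.sftp_password /tmp/bat user@host
```

Reading the password from a file keeps it out of the process list, which passing it directly on the command line would not.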

Update Feb 27, 2009: A reader pointed out that OpenSSH has a shortcut function, ssh-copy-id, to install your public key on a remote machine. Nice.
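To tie the key pair approach back to the original goal, here is a minimal sketch of an unattended upload. It assumes the key for b@B is already installed as described above; the function names, the dump file path, and the remote directory are all hypothetical, and paths containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper: write an sftp batch file that uploads one local file.
#   $1 = local file, $2 = remote directory, $3 = batch file to create
write_batch() {
  {
    printf 'cd %s\n' "$2"
    printf 'put %s\n' "$1"
    printf 'bye\n'
  } > "$3"
}

# Upload $1 into remote directory $2 on b@B with no prompts at all.
upload_backup() {
  batch=$(mktemp) || return 1
  write_batch "$1" "$2" "$batch"
  sftp -b "$batch" b@B
  rc=$?
  rm -f "$batch"
  return $rc
}

# Usage from cron or another script (paths are placeholders):
#   upload_backup /var/backups/site_db.sql backups
```

In batch mode sftp stops at the first cd or put that fails, so the exit status returned by upload_backup is a reasonable thing to check or log.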


Turn off AVG e-mail signature

I am a huge fan of AVG Anti-Virus Free Edition. It provides the same level of virus protection as the pay-for packages (don’t be fooled, the differences are bells and whistles), and because it’s trimmed down it eats up fewer resources. I always install it as part of a comprehensive approach using several free Windows security tools.

However, how many times have you seen this at the bottom of an e-mail?

No virus found in this incoming message.
Checked by AVG – http://www.avg.com
Version: 8.0.169 / Virus Database: 270.6.14/1647 – Release Date: 9/2/2008 6:02 AM

This is especially great in e-mail conversations, where I’ll see this same text stacked up five, ten, fifteen times at the bottom. It’s a tremendous waste of space and makes scanning an e-mail conversation difficult to say the least. I don’t need to know that you have anti-virus software installed. If I did, I wouldn’t have installed it on my own computer. Duh.

AVG does this by default, and they don’t make it obvious at install time how to disable it. It may have something to do with the shameless plug they put in there, but I could be wrong.

If you want to get rid of the stupid thing, here’s how to do it in version 8…

  1. Double-click the AVG icon in your taskbar to bring up the control panel window.
  2. Go to the menu bar, choose Tools, then Advanced Settings…
  3. Choose E-mail Scanner from the left-hand menu
  4. In the right-hand pane, clear the checkbox labeled Certify e-mail.
  5. Click the OK button at the bottom