Building a complex system? Take easy steps.

Since launching Fwd:Vault last month, I’ve been racing to add the features and functions needed to take the service broader. First on the list was more subscription tiers. I launched with just two: free and “unlimited everything.” I did this because, well, it was easy.

Your instinct may be to dismiss my decision as laziness, but hear me out. I built most of the base site with just 1 state: free (remember that unlimited free beta period last year?). That allowed me to — rightly — focus purely on features, functions, bugs, etc. Dealing with subscription tiers at the same time would have clouded everything, slowing everything down and likely leading to more rewriting. Staying focused allowed me to get the cornerstone stuff right before building on top of it.

I applied the same thought process when it came time to offer paid options. The game plan has always been to have three paid options, plus the free account. However, instead of initially coding four possible user states, I started with just two: free or paid.

This makes my job as a developer much more focused. There’s a LOT of logic in a service like Fwd:Vault focused explicitly on subscriptions: access permissions, showing/hiding upgrade options, setting quota restrictions, security checks to prevent hackarounds from unscrupulous users. The functionality of almost every page is affected by the user’s free/paying status, and don’t even get me started on the work it takes to process credit cards. You have to be doubly triply careful when dealing with people’s personal data like that. On and on. Getting the basics in place takes a lot of forethought and coding.

So instead of thinking about all this stuff in four dimensions — free, option 1, option 2, option 3 — I can cover most everything in just two — free or paid — and then come back later to fill in the holes for the other tiers.
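Here’s roughly what that two-state approach can look like in code. This is a minimal sketch, not the actual Fwd:Vault code: the plan names, the quota numbers, and both function names are invented for illustration.

```php
<?php
// Hypothetical plan names. The post only says "free" plus paid options.
const PLAN_FREE = 'free';
const PAID_PLANS = ['paid']; // later tiers get appended here

// Every page-level check asks this one yes/no question.
function is_paying(string $plan): bool {
    return in_array($plan, PAID_PLANS, true);
}

// Hypothetical quota numbers, keyed by plan name.
function storage_quota_mb(string $plan): int {
    $quotas = [
        PLAN_FREE => 100,
        'paid'    => 10000,
    ];
    // Unknown plans fall back to the most restrictive quota.
    return $quotas[$plan] ?? $quotas[PLAN_FREE];
}
```

The point of the design is that adding a tier later means adding a row to each list, while the permission checks scattered across the site keep calling is_paying() unchanged.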

Complexity is your enemy as a developer. Each task must be as tightly focused as possible. The tighter your focus, the less chance you’ll have to introduce bugs. Adding more later may require rewrites, but they are far, far easier than rewriting the big sloppy mess you get when biting off more than you can chew.

With the basic subscription and tier logic in place, it’s a far simpler matter to expand the options out to infinity (though we’ll start with four). Expect to see the new pricing options in a few weeks.

Looking for more to read? There’s a new post on the Fwd:Vault Blog that details the most unobtrusive disk defragmenting process I’ve seen (that I also use for my own systems).


jQuery 1.4 released

The latest and greatest version of jQuery, version 1.4, was released on January 14, the birthday of jQuery’s original launch. Bugfixes and improvements abound!

The jQuery team has put together a site devoted to the new version, called the 14 days of jQuery, covering the major version changes as well as infrastructure updates coinciding with the new release. For example, the documentation site has been completely redesigned and moved to its own subdomain, api.jquery.com. Links from the primary jquery.com site should be updated within the next week. With video demos of new features and Q&As with the core team (including founder John Resig), it’s well worth checking out for every jQuery developer.


A vet’s perspective on Michael Vick

Everyone at this point knows the story of Michael Vick (quick summary if you don’t: he beat dogs, then got his high-paying job back after getting out of prison). Now the Philadelphia Eagles have given this guy a damn award.

I could sit here and articulate this guy’s rise-fall-rise-again story as a sign of the apocalypse, but I think most people would find it more interesting to hear this one from a veterinarian’s perspective. My wife Jennifer has a writeup on the matter, so if you ever wanted to know what someone in the industry thinks of Vick, I encourage you to check it out.


MySQL founder Michael Widenius concerned about sale to Oracle

In case you haven’t heard, Sun is being bought by Oracle. After dancing around the issue in blog posts over the past 8 months, MySQL developer-founder Michael “Monty” Widenius finally comes out and adamantly opposes MySQL’s role in the sale.

In a Dec. 12 blog post, Widenius tries to rally open source MySQL supporters in an effort to seek assurances from Oracle that the project will, in fact, stay open source. He makes a good case for a future Oracle decision to limit or close off the open source elements:

Oracle [has] to lower prices all the time to compete with MySQL when companies start new projects. Some companies even migrate existing projects from Oracle to MySQL to save money. Of course Oracle has a lot more features, but MySQL can already do a lot of things for which Oracle is often used…So I just don’t buy it that Oracle will be a good home for MySQL. A weak MySQL is worth about one billion dollars per year to Oracle, maybe more. A strong MySQL could never generate enough income for Oracle that they would want to cannibalize their real cash cow.

Anyone who’s loosely familiar with open source software knows that the community can execute the almighty fork, just pick up the code and go. But Widenius believes the code is only a portion of the equation, and that the economy around MySQL is vastly more important. Richard Stallman penned a letter in conjunction with Knowledge Ecology International (KEI) and the Open Rights Group (ORG) that succinctly describes the issue:

MySQL is made available to the public in two parallel ways. Most users obtain it as free/libre software under the GNU General Public License (GPL) version 2; the code is released in this way gratis. MySQL is also available under a different, proprietary license for a fee.

This approach was able to provide (1) an attractive platform for developers looking to use FLOSS, and secured MySQL enormous mind share, particularly in supporting content rich web pages and other Internet applications, and (2) the ability for paying clientèle to combine and distribute MySQL in customizations that they do not want to make available to the public as free/libre software under the GPL. With excellent management and considerable trust within the user community, MySQL became the gold standard for web based FLOSS database applications.

Bolding my emphasis, which is the key here. Most MySQL users don’t need licenses, for two reasons. First, other OSS projects naturally play very nicely with MySQL’s matching open source license. Second, websites that use proprietary code in conjunction with MySQL are in the clear because nothing is actually distributed; users simply visit a site. My company Fwd:Vault is a perfect example.

The remaining clients, who write software that gets distributed (think boxed software in a store), must utilize MySQL’s second, fee-based proprietary license. This is where the money is, and is the true engine that has powered MySQL’s rise over the last 20 years.

As any business owner can tell you, replicating a strong consumer base and community climate is nearly impossible. “If it would be easy to take over MySQL by just forking it,” says Widenius, “Sun would never have bought MySQL and Oracle would have forked MySQL a long time ago instead of now trying to buy it as part of the SUN deal.”

Now this whole system gets handed to Oracle, who has a directly competing product and feels major price pressure due to MySQL’s free offering. I agree with Widenius on the eventual outcome, but he doesn’t have a legal leg to stand on here. He sold MySQL AB to Sun, and they can do whatever they want with it. If Sun gets swallowed by Oracle, MySQL goes along with it. That’s how businesses work. He can argue all day that the Sun deal was predicated on their track record for positively supporting FLOSS projects, but his control over MySQL’s future was out the door the moment the Sun deal was closed.

I’m a huge OSS proponent, but I’m a capitalist first. If the EC doesn’t find the sale to be monopolistic — keep in mind the USDOJ already approved the deal — then I wish Oracle the best of luck with their new purchase.

That being said, capitalism favors the huge MySQL install base in the longer term. If Oracle removes MySQL “the open source database” from the OSS environment, they’re going to leave a massive hole in the market, a hole that cannot be filled with Oracle’s overpriced high-end database software. A new product will rise to fill the void. Maybe it will be a MySQL fork, maybe it will be something new, but it will happen. MySQL did it once, why can’t someone else do it again?

And when you acknowledge the likelihood of that potential outcome, it makes Widenius’ entire protest seem self-interested. He’s not necessarily concerned with the open source database community, but his position within it. I have no doubt that his intentions are at least in part altruistic — replacing MySQL would be a torturous process — but I’m sure he’d rather see his baby leading the pack than some neophyte.

In short, if he’s just trying to protect his turf, is his mindset really any different from Oracle’s?

For me, the entire issue is summarized in the introduction of his protest post, “I have spent the last 27 years creating and working on MySQL and I hope, together with my team of MySQL core developers, to work on it for many more years.”

If that was the case, you shouldn’t have sold it off in the first place.


Run your servers without timezone offsets

I recently made the decision to store times on Fwd:Vault systems in Greenwich Mean Time, or GMT. I decided to do this because I have time-sensitive events happening along several dimensions. Email coming into the system has several timestamps associated with it: the user’s initial delivery, relay from their mail server, and receipt by the Fwd:Vault mail server. Payment receipts come into Fwd:Vault from our billing provider, get stored in my system, and are made available to the user.

Up until now, my server time was set to US Eastern, where both I and the server physically reside. Then I started building the code to display local time based on a user’s selected timezone.

Ugh.

Here’s the problem: displaying local time requires at least one time conversion, from server time to the user’s timezone. If the time is initially set to anything other than no-offset GMT, you have two calculations to do, from the server timezone to GMT, then GMT to user timezone. You can do it, of course, but who really wants to write even more code?

Now add to this equation the fact that most data-delivery systems have settled on sending time data in GMT. A very good practice, to be sure, but it presents the need to do another timezone conversion when the data comes into your systems. Going back to my example, I had to convert payment times from GMT to US Eastern before dropping them into my database.

Finally, add to the mix the potential for time data coming in from more than one source with more than one offset. Again back to my case, payment data is GMT, as is the Twitter feed I store and display on the site. Meanwhile, email was set to US Eastern. This matched the server and MySQL database where all the data ends up residing, so I was still looking at just one time conversion. But what happens down the road, when my server configuration changes, or I move to another timezone?
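To make the “more than one source with more than one offset” problem concrete, here is a small sketch of normalizing incoming timestamps to a single zero-offset value before they hit the database. It assumes PHP’s built-in DateTime classes and uses the 'UTC' zone as the zero-offset stand-in for GMT; the function name and the storage format are illustrative choices, not something taken from Fwd:Vault.

```php
<?php
// Convert a timestamp from a known source timezone into a UTC string
// suitable for a DATETIME column. Name and format are illustrative.
function to_utc_string(string $time, string $source_tz): string {
    // If $time carries its own offset (e.g. "+02:00"), PHP uses that
    // offset and ignores $source_tz.
    $dt = new DateTime($time, new DateTimeZone($source_tz));
    $dt->setTimezone(new DateTimeZone('UTC'));
    return $dt->format('Y-m-d H:i:s');
}

// Email stamped in US Eastern and a payment stamped in UTC
// end up in the same frame of reference.
$email_time   = to_utc_string('2010-01-15 12:00:00', 'America/New_York');
$payment_time = to_utc_string('2010-01-15 17:00:00', 'UTC');
```

With everything normalized on the way in, the only conversion left is the one on the way out, for display.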

Tying this information to me makes as much sense as tying it to any one of my users. It’s the same rationale that data service providers use when delivering GMT time data. It applies to me, and it applies to you too.

I’m just too lazy to try and keep all that timezone switching straight in my head.

If you find yourself in the same scenario, save your sanity and your future support efforts. If you run a website that (a) displays time-sensitive data, and (b) allows users to create an account, you really owe it to everyone involved to store time in a neutral fashion and adjust time displays according to the user’s selected timezone.
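The display side then comes down to a single conversion. A minimal sketch, assuming times are stored as zero-offset strings and the user’s choice is saved as an Olson timezone name such as 'America/New_York'; the function name and default format are hypothetical.

```php
<?php
// Turn a stored UTC time into the user's local time for display.
function to_user_time(string $utc_time, string $user_tz, string $format = 'Y-m-d H:i'): string {
    $dt = new DateTime($utc_time, new DateTimeZone('UTC'));
    $dt->setTimezone(new DateTimeZone($user_tz));
    return $dt->format($format);
}

// Daylight saving shifts are handled by the timezone database,
// so the same stored hour displays differently in January and July.
$winter = to_user_time('2010-01-15 17:00:00', 'America/New_York');
$summer = to_user_time('2010-07-15 17:00:00', 'America/New_York');
```

Storing the timezone name rather than a fixed numeric offset is what lets the daylight saving adjustment happen without any extra code.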


Get domain out of any URL string (yes, really)

It’s a common problem with no single right answer: extract the top domain (e.g. example.com) from a given string, which may or may not be a valid URL. I had need of such functionality recently and found answers around the web lacking. So if you ever “just wanted the domain name” out of a string, give this a shot…

<?php
/**
 * Extract the domain from a URL or URL-like string.
 *
 * $remove_subdomains = 'www' strips only a leading "www.";
 * 'all' (the default) strips every subdomain.
 * Returns the domain string, or false if no valid domain is found.
 */
function get_top_domain($url, $remove_subdomains = 'all') {
  // parse_url() finds no host in strings without a scheme,
  // so fall back to the raw string in that case
  $host = strtolower((string) parse_url($url, PHP_URL_HOST));
  if ($host == '') $host = strtolower($url);
  switch ($remove_subdomains) {
    case 'www':
      if (strpos($host, 'www.') === 0) {
        $host = substr($host, 4);
      }
      return $host;
    case 'all':
    default:
      if (substr_count($host, '.') > 1) {
        // capture the last label plus a 2-4 letter TLD
        if (preg_match("/^.+\.([a-z0-9\.\-]+\.[a-z]{2,4})$/", $host, $matches)) {
          return $matches[1];
        }
        // not a valid domain
        return false;
      }
      return $host;
  }
}

// some examples
var_dump(get_top_domain('http://www.validurl.example.com/directory', 'all')); // "example.com"
var_dump(get_top_domain('http://www.validurl.example.com/directory', 'www')); // "validurl.example.com"
var_dump(get_top_domain('domain-string.example.com', 'all'));                 // "example.com"
var_dump(get_top_domain('domain-string.example.com/nowfails', 'all'));        // false
var_dump(get_top_domain('finds the domain url.example.com', 'all'));          // "example.com"
var_dump(get_top_domain('12.34.56.78', 'all'));                               // false
?>

Most of the examples are simply proofs, but I want to draw attention to the string in example #4, 'domain-string.example.com/nowfails'. This is not a valid URL, so the call to parse_url() fails, forcing the script to use the entire original string. In turn, the path part of the string causes the regex to break, causing a complete failout (return false;).

Is there a way to account for this? Surely. However, I’m not about to tap that massive keg of exceptions (e.g. just a slash, slash plus path, slash plus another domain in a human-readable string, etc).
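For what it’s worth, the single most common case (a bare domain followed by a path) can be handled by lending the string a scheme before parsing. This is only a sketch of that one exception, using a hypothetical helper called guess_host() that is not part of the function above, and it makes no attempt at the rest of the keg.

```php
<?php
// Hypothetical helper: find the host in a string that may lack a scheme.
function guess_host(string $input): string {
    $input = trim($input);
    // parse_url() only reports a host when a scheme is present,
    // so borrow "http://" for bare strings like "example.com/path".
    if (strpos($input, '://') === false) {
        $input = 'http://' . $input;
    }
    $host = parse_url($input, PHP_URL_HOST);
    // parse_url() returns null or false when it cannot find a host.
    return is_string($host) ? strtolower($host) : '';
}

$host = guess_host('domain-string.example.com/nowfails');
```

The result could then be handed to the subdomain-stripping logic. Human-readable strings with spaces or multiple domains are still on their own.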

No regex for validating URLs or email addresses is ever perfect; the “strict” RFC requirements are too damn broad. So I did what I always do: chose “what works” over “what’s technically right.” This one requires 2-4 letters for the top level domain (TLD), so it doesn’t allow for the .museum TLD, and doesn’t check to see if the provided TLD is actually valid. If you need to do further verification, that’s on you. Here’s the current full list of valid TLDs provided by the IANA.

If you need to modify the regex at all, I highly recommend you read this article about email address regex first for two reasons:

  1. There’s a ton of overlap between email and URL regex matching
  2. It will point out all the gotchas in your “better” regex theory that you didn’t think about

Future of Web Apps London 2009 video index

The Future of Web Apps conference is so right up my alley it’s almost stupid that I couldn’t attend. Web development with a focus on business: customer service, driving traffic, marketing, sales… It’s essentially the event for geeks who want to go from the basement to the corner office. Fortunately, Ryan Carson and the team at Carsonified are kind enough to freely distribute some of the presentations made at this year’s London event.

I couldn’t find an index of all of them, and I wanted to watch them all in chronological order, so here you go. If there are videos for the presentations I’m missing (here’s the full presentation schedule), please let me know so I can link them.

Taking your Site from One to One Million Users by Kevin Rose

Introducing Atlas: A Visual Development Tool for creating Web Applications by Francisco Tolmasky

Start-up Metrics that Matter by Dave McClure

Branding and Marketing Essentials for Your Web App by Alex Hunter

Now is the Time to Cash in on Your Passion by Gary Vaynerchuk

The Future of HTML5 by Bruce Lawson

You-Centric: The Future of Browsing by Aza Raskin

The Future of the Cloud by Simon Wardley


Get HTTP status code of cURL call in PHP

With all the fancy cURL-based APIs out there these days (Facebook and Twitter immediately come to mind), using cURL to directly access and manipulate data is becoming quite common. However, as with all programming, there’s always the chance for an error to occur, and thus these calls must be immediately followed by error checks to ensure everything went as planned.

Most decent APIs will return their own custom errors when an internal problem occurs, but that does not account for issues dealing directly with the connection. So before your application goes looking for API-based errors, it should first check the returned HTTP status code to ensure the connection itself went well.

For example, Twitter-specific error messages are always paired with a “400 Bad Request” status. The message is of course helpful, but it’s far easier (as you’ll see) to find the status code from the response headers and then code for the exceptions as necessary, using the error text for logging and future debugging.

Anyway, the HTTP status code, also called the “response code,” is a number that corresponds with the result of an HTTP request. Your browser gets these codes every time you access a webpage, and cURL calls are no different. The following codes are the most common (excerpted from the Wikipedia entry on the subject)…

  • 200 OK
    Standard response for successful HTTP requests. The actual response will depend on the request method used. In a GET request, the response will contain an entity corresponding to the requested resource. In a POST request the response will contain an entity describing or containing the result of the action.
  • 301 Moved Permanently
    This and all future requests should be directed to the given URI.
  • 400 Bad Request
    The request contains bad syntax or cannot be fulfilled.
  • 401 Unauthorized
    Similar to 403 Forbidden, but specifically for use when authentication is possible but has failed or not yet been provided. The response must include a WWW-Authenticate header field containing a challenge applicable to the requested resource.
  • 403 Forbidden
    The request was a legal request, but the server is refusing to respond to it. Unlike a 401 Unauthorized response, authenticating will make no difference.
  • 404 Not Found
    The requested resource could not be found but may be available again in the future. Subsequent requests by the client are permissible.
  • 500 Internal Server Error
    A generic error message, given when no more specific message is suitable.

So now that we know what we’re looking for, how do we go about actually getting it? Fortunately, PHP’s cURL support makes performing these checks pretty easy; it just doesn’t make the process plain. We need a function called curl_getinfo(). By default it returns an array full of useful information, but we only need to know the status number. Fortunately, we can set the arguments so that we only get this number back, like so…

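// must set $url first. Duh...
$http = curl_init($url);
// return the response body from curl_exec() instead of printing it
curl_setopt($http, CURLOPT_RETURNTRANSFER, true);
// do your curl thing here (set any other options, then execute)
$result = curl_exec($http);
// ask for just the status code of the request we just made
$http_status = curl_getinfo($http, CURLINFO_HTTP_CODE);
echo $http_status;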

curl_getinfo() returns data for the last curl request, so you must execute the cURL call first, then call curl_getinfo(). The key is the second argument; the predefined constant CURLINFO_HTTP_CODE tells the function to forgo all the extra data, and just return the HTTP code as an integer.

Echoing out the variable $http_status gets us the status code number, typically one of those outlined above.
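Once you have the number, “coding for the exceptions” can be as simple as bucketing it by range. The sketch below is one way to do that; status_class() and fetch_with_status() are hypothetical names, and the bucket labels are arbitrary. It relies on the fact that cURL reports a code of 0 when no HTTP response arrived at all (DNS failure, timeout, and so on).

```php
<?php
// Hypothetical helper: bucket a status code by its range.
function status_class(int $code): string {
    if ($code >= 100 && $code < 200) return 'informational';
    if ($code >= 200 && $code < 300) return 'success';
    if ($code >= 300 && $code < 400) return 'redirect';
    if ($code >= 400 && $code < 500) return 'client_error';
    if ($code >= 500 && $code < 600) return 'server_error';
    return 'unknown'; // includes 0, meaning no response was received
}

// Hypothetical wrapper: fetch a URL and return [status code, body].
// Defined for illustration only; nothing here calls it.
function fetch_with_status(string $url): array {
    $http = curl_init($url);
    curl_setopt($http, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($http);
    $code = (int) curl_getinfo($http, CURLINFO_HTTP_CODE);
    return [$code, $body];
}
```

A caller would check status_class() first and only go digging through the response body for API-specific error text when the bucket is something other than 'success'.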


Mentioned in recent IT World article

I was recently quoted in an article over at IT World, discussing underused developer tools (e.g. security testers). My quote is on page 2:

http://www.itworld.com/development/74088/developer-tools-you-dont-use-and-why-you-dont-use-them

Also FYI I am on vacation the rest of this week; we’ll return to our regular schedule next Monday.


What 255 characters looks like

“Should I use TEXT or VARCHAR field here?”

I’ve lost count of the number of times that I asked myself this question when putting together database structures. Since the traditional maximum for a VARCHAR is 255 characters (MySQL 5.0.3 and later allow longer ones, but 255 remains a very common cap), it becomes a question of whether or not the data you’re saving will be any longer than that. Sometimes that’s an easy call (phone number = VARCHAR; email body = TEXT), other times it’s blurry (verbose error logs, foreign-language data sets, user-submitted comments, etc).

“So what? Why not just use TEXT and be done with it?”

It’s true that in most cases it won’t make a difference. However, if you need to index and search the field, you should think carefully before blindly using TEXT. Depending on the storage engine, the data in TEXT type fields may be stored outside the row itself, leaving only a few bytes of pointer information behind. MySQL also can’t index a TEXT field in full; you have to specify a prefix length, while a VARCHAR of this size can typically be indexed whole. This can have a noticeable effect on your SQL query speeds, and generally the larger the TEXT fields, the slower the queries. Even if we take indexing out of the picture, the external storage of TEXT fields means that you’ll generally see faster searches with VARCHAR.

Which brings us back to the original problem: when is a 255 character cap good enough? See for yourself. Below you’ll find a block of lorem ipsum text that’s exactly 255 characters long (spaces count):

Lorem ipsum dolor sit amet, nonummy ligula volutpat hac integer nonummy. Suspendisse ultricies, congue etiam tellus, erat libero, nulla eleifend, mauris pellentesque. Suspendisse integer praesent vel, integer gravida mauris, fringilla vehicula lacinia non

If you’re like me, you’ll look at that and say, “That’s a lot more than I thought.”

Another way to look at it: RFC 2822 says that each line of an email, subject line included, must be no more than 998 characters and should be no more than 78. Long subjects can technically be folded across several lines, but 78 characters is the practical limit you’ll find in most cases.

So if you ever find yourself doing that fuzzy-string-length-guesstimation math in your head, bookmark this page to add a visual to the guesswork as well.
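If you’d rather measure than eyeball, a few lines of PHP can check real data against the cap. A minimal sketch, assuming UTF-8 input and a column whose length is counted in characters rather than bytes; both function names are hypothetical.

```php
<?php
// Count characters (not bytes) in a UTF-8 string.
function char_length(string $text): int {
    if (function_exists('mb_strlen')) {
        return mb_strlen($text, 'UTF-8');
    }
    // Fallback without mbstring: with the /u flag, "." matches
    // one whole UTF-8 character, so the match count is the length.
    return (int) preg_match_all('/./us', $text);
}

// Would this value fit in a VARCHAR of the given size?
function fits_in_varchar(string $text, int $limit = 255): bool {
    return char_length($text) <= $limit;
}
```

Running something like this over a sample of existing rows (error logs, comments, whatever the blurry case is) shows how often the data would actually blow past 255.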

