Keywords meta tags, the definitive answer from Google

Vindication! I gave up on the keywords meta tag long ago, but plenty of “SEO gurus” say you should still fill it with a moderate amount of garbage loosely relating to your site. Like I said the other day, anyone can do a decent job at SEO by utilizing some common sense and elbow grease.

This video was posted yesterday to the Google Webmaster Help Channel on YouTube.

Matt Cutts has spoken, that is all.


You need to know SEO

I’ll admit it, I was real lazy getting on the SEO train. It took starting my own company for me to finally start paying attention. SEO for my previous major site work was handled for me. ClassicWines had other staff dedicated to the issue, and Destination ImagiNation had such a huge network of affiliate sites that the SEO literally handled itself.

Even this blog went unattended in the SEO category. I had posts tagged, I submitted the site to the search engines and Technorati, and I installed Platinum SEO Pack; I figured that was good enough.

It came to me shortly after the early soft launch of Fwd:Vault. I had dutifully installed Google Analytics to monitor traffic. I logged into the service for the first time a few weeks after things were rolling, and my search results sucked. I showed up for one term: “fwd”. I used their keyword tool to see what they were primarily pulling off the site, and most of it was the legalese from the policy pages. That’s when I knew this would require some serious attention.

If you find yourself in the same position, you owe it to yourself to get educated. The benefits for a startup are obvious, and I don’t know an existing company that wouldn’t like more traffic. Plus SEO knowledge/ability is a great resume booster.

On board? Great! Here’s how to get started.

First some reading. These are all SEO-related blogs that currently reside in my Google Reader setup.

Now that you’ve got the info, let’s get our hands on some utility sites. I’m not going to explain how you use these sites; that should be obvious from the self-education outlined above.

So there you go. Take all that stuff, add a few weeks of study, and you’ll know all you need to do a decent SEO job.

Sidebar: hiring outside help
I really do mean “decent.” SEO specialists can claim they know the voodoo better than you, but most of that is smoke these days. SEO is not quite the wild west it was in the late ’90s and early zeros; effective practices have become more standardized and the tools to maximize that effectiveness more available. Speaking practically, most will provide access to network relationships you can leverage for link sharing, subscriptions to the more expensive SEO tools (the Enterprise version of SEM Rush costs $500/month), and their own cocktail of page optimizations.

Nonetheless, they definitely bring a wealth of experience to the table, just as any other expert would. So you should look at hiring an SEO specialist just as you would for any other position. Just as small businesses keep their own books until they’re big enough to warrant an accountant, you owe it to yourself (and your wallet) to give it a shot on your own first. Look for outside help if your own efforts prove fruitless. If nothing else, you’ll be more educated and ready to negotiate with your SEO specialist.


Archive your entire Twitter timeline

My code for displaying Twitter posts on your site is pretty handy, but it does have drawbacks. Each page load involves calling a remote URL, downloading a resulting XML file, and parsing the results, increasing your load times and using bandwidth. To minimize the impact, you can really only display a handful of the most recent posts.

Plus, the downloaded stream is never saved. Google does index Twitter, but the thoroughness and benefit to you are subject to much speculation.

We can solve both problems by locally storing and serving Twitter posts ourselves. Once you have them in your own system, you can display as many of them as you want without expensive external URL lookups. Plus, with the content centrally located on your site, getting Google to index and apply it to your rankings is straightforward.

Note for SEO geeks:
Yes, I am aware that displaying and indexing Twitter posts on your own site does technically fall under the category of duplicate content, so save your typing.

Given the disparate nature of Twitter content and the utter disconnect from my sites, I’m not too concerned about incurring a penalty for it. Your opinion and experience may vary. You should at least familiarize yourself with Google’s rules for duplicate content. If you’re paranoid, consider applying canonicalization to pages that display large portions of a Twitter timeline.

Let’s get started
The end of the post includes a link to download all the code, as well as a link to a live demo.

I am assuming that you’ve got a standard PHP/MySQL stack for your site, ideally running on Linux, super-ideally Debian (Digg uses it for a reason, you know).

I am also assuming that you know how to use it; bring a decent understanding of SQL, PHP, and basic web programming. Here’s your first test: the demo assumes your PHP installation is version 5.2 or later (the class relies on DateTime) and includes the SimpleXML and cURL extensions.

First, here’s the SQL CREATE TABLE statement for the table that our example will use. Apply this to your database:

CREATE TABLE IF NOT EXISTS twitter (
  `id` bigint(10) unsigned NOT NULL,
  `created_at` datetime NOT NULL,
  `source` varchar(255) NOT NULL,
  `in_reply_to_screen_name` varchar(255) NOT NULL,
  `text` varchar(255) NOT NULL,
  UNIQUE KEY `id` (id)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Now let’s have a look at the class, which is the meat of the entire thing:

class Twitter {
  public function __construct($twitter_id) {
    $this->id = (int)$twitter_id;
  }
 
  public function user_timeline($page, $count = '200', $since_id = '') {
    $url = 'http://twitter.com/statuses/user_timeline/' . $this->id . '.xml?count=' . $count . '&page=' . $page;
    if ($since_id && $since_id != '') {
      $url .= '&since_id=' . $since_id;
    }
    $c = curl_init();
    curl_setopt($c, CURLOPT_URL, $url);
    curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($c, CURLOPT_CONNECTTIMEOUT, 3);
    curl_setopt($c, CURLOPT_TIMEOUT, 5);
    $response = curl_exec($c);
    $responseInfo = curl_getinfo($c);
    curl_close($c);
    if ($response != '' && intval($responseInfo['http_code']) == 200) {
      if (class_exists('SimpleXMLElement')) {
        return new SimpleXMLElement($response);
      } else {
        return $response;
      }
    } else {
      return false;
    }
  }
 
  public function rebuild_archive($your_timezone) {
    $orig_tz = date_default_timezone_get();
    date_default_timezone_set('GMT');
    $tz = new DateTimeZone($your_timezone);
    $sql = "SELECT id FROM twitter ORDER BY id DESC LIMIT 1";
    /**
     * INSTALLATION
     * execute $sql on your DB to get the latest twitter post
     * set the value of `id` to a variable named $since_id
     * leave $since_id as false if the table is empty (i.e. a new install)
    **/
    $since_id = false; // default for a new install; overwrite with the `id` from the query above
    $tweet_count = 0;
    for ($page = 1; $page <= 200; ++$page) {
      if ($twitter_xml = $this->user_timeline($page, '200', $since_id)) {
        foreach ($twitter_xml->status as $key => $status) {
          $datetime = new DateTime((string)$status->created_at); // cast the XML node to a plain string
          $datetime->setTimezone($tz);
          $created_at = $datetime->format('Y-m-d H:i:s');
          $sql = "INSERT IGNORE INTO twitter
                    (id, created_at, source, in_reply_to_screen_name, text)
                  VALUES (
                    '" . $status->id . "',
                    '" . $created_at . "',
                    '" . addslashes((string)$status->source) . "',
                    '" . addslashes((string)$status->in_reply_to_screen_name) . "',
                    '" . addslashes((string)$status->text) . "'
                  )";
          /**
           * INSTALLATION
           * Execute $sql over your DB here
          **/
          ++$tweet_count;
        }
      } else {
        break;
      }
    }
    $sql = "ALTER TABLE twitter ORDER BY `id`";
    /**
     * INSTALLATION
     * Execute $sql over your DB here
    **/
    date_default_timezone_set($orig_tz);
    return $tweet_count;
  }
}

Twitter::user_timeline()
This method is a modified version of my previous twitter_status() function.

The big difference is that we’re passing additional arguments to Twitter’s user_timeline API call: count (specifies the number of statuses to retrieve) and page (specifies the page of results to retrieve).
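To see how those arguments end up in the request, here is a minimal sketch that builds the same URL as user_timeline() without making the call. The function name timeline_url() is hypothetical and not part of the class; the endpoint is the one used in the class above.

```php
<?php
// Hypothetical helper: builds the request URL used by Twitter::user_timeline().
function timeline_url($twitter_id, $page, $count = 200, $since_id = '') {
    $params = array('count' => $count, 'page' => $page);
    if ($since_id) {
        // Only sent on later runs, so the API returns posts newer than this ID.
        $params['since_id'] = $since_id;
    }
    return 'http://twitter.com/statuses/user_timeline/' . (int)$twitter_id . '.xml?'
         . http_build_query($params, '', '&');
}

// First run: no since_id, so page 2 of the full timeline is requested.
echo timeline_url('12345678', 2), "\n";
// Later run: only posts newer than the (made-up) ID below.
echo timeline_url('12345678', 1, 200, '1234567890'), "\n";
```

Using http_build_query() instead of string concatenation is my own choice here; it URL-encodes each value, which the original concatenation does not.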

Twitter::rebuild_archive()
This method takes the results from user_timeline() and places them in your DB. Its lone argument is the string representation for the timezone of your server. To find out what the string is and why you need it, just read the second post of my twitter series. For me on the US east coast, I use 'America/New_York'.
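To make the timezone step concrete, here is a minimal sketch of the conversion that rebuild_archive() applies to each created_at value. The function name to_local_time() is hypothetical, the sample timestamp is made up, and its layout (day, month, time, offset, year) is my assumption about what the API returns.

```php
<?php
// Hypothetical helper mirroring the DateTime calls inside rebuild_archive().
function to_local_time($created_at, $your_timezone) {
    $datetime = new DateTime($created_at);                     // the string carries its own +0000 offset
    $datetime->setTimezone(new DateTimeZone($your_timezone));  // shift to the requested zone
    return $datetime->format('Y-m-d H:i:s');                   // MySQL DATETIME layout
}

// Made-up sample value, not real API output.
echo to_local_time('Wed Aug 19 14:00:00 +0000 2009', 'America/New_York'), "\n";
```

New York is four hours behind GMT in August, so the sample comes out as 2009-08-19 10:00:00.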

Quick Warning
Hopefully you noticed several large comment blocks with INSTALLATION in all caps: I didn’t include any code to run SQL over your DB. Every system includes its own wrapper for database calls, including mine, so I’m not wasting time writing out SQL inserts using raw PHP functions that you’ll just remove. Find the three blocks labeled “INSTALLATION” and follow the instructions to execute the listed SQL.
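If you don’t have a wrapper of your own, here is one way the INSTALLATION blocks could be filled in with PDO. This is a sketch under assumptions, not the code from my framework: the helper names latest_tweet_id() and store_tweet() are hypothetical, and you would need to create the PDO handle yourself.

```php
<?php
// Hypothetical helpers for the INSTALLATION blocks; $db is a PDO handle you create.

// Block 1: returns the newest stored tweet ID, or false when the table is empty.
function latest_tweet_id(PDO $db) {
    $stmt = $db->query("SELECT id FROM twitter ORDER BY id DESC LIMIT 1");
    $id = $stmt->fetchColumn();
    return $id === false ? false : (string)$id;
}

// Block 2: stores one tweet. Bound parameters stand in for the addslashes() calls,
// and INSERT IGNORE (MySQL syntax) skips IDs that are already in the table.
function store_tweet(PDO $db, $id, $created_at, $source, $reply_to, $text) {
    $stmt = $db->prepare(
        "INSERT IGNORE INTO twitter
           (id, created_at, source, in_reply_to_screen_name, text)
         VALUES (?, ?, ?, ?, ?)"
    );
    return $stmt->execute(array($id, $created_at, $source, $reply_to, $text));
}

// Block 3 is a single statement with no input, so exec() is enough:
// $db->exec("ALTER TABLE twitter ORDER BY `id`");
```

Inside rebuild_archive() you would then set $since_id = latest_tweet_id($db) and call store_tweet() once per status, casting each XML node to a string first.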

Now we just need to run it.

require('/path/to/twitter.class.php');
$Twitter = new Twitter('12345678');
$Twitter->rebuild_archive('America/New_York');

We instantiate the class and pass the ID number of our Twitter account. You’ll find instructions on getting this number about halfway down my first post on displaying Twitter updates. After that, a single call to Twitter::rebuild_archive() will grab all available updates and store them.

If the `twitter` table is empty, it will grab your entire Twitter timeline, up to 3200 posts. If you have more than 3200 posts, you’re out of luck for the time being, although I’d recommend you take a break from the computer, take a shower, and say “Hi” to the wife and kids.

After the first run, subsequent runs will only grab new posts by way of the API’s since_id argument.

If you have the access, you can easily make this into a cron job:

#!/usr/bin/php5
<?php
require('/path/to/twitter.class.php');
$Twitter = new Twitter('12345678');
$Twitter->rebuild_archive('America/New_York');
?>

Save that last block of code to a file, set it to be executable (chmod 755 usually), and set the job to run hourly. That top line identifies the interpreter that the system should use to read the file. You may need to change it to reflect the location of the PHP executable on your system.
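Once the table is filling up on its own, displaying the archive is just a local query. Here is a minimal sketch, assuming a PDO handle and the `twitter` table from above; the function names recent_tweets() and render_tweets() and the markup are hypothetical, so adapt them to your own templates.

```php
<?php
// Hypothetical helper: fetches the newest $limit tweets from the local table.
function recent_tweets(PDO $db, $limit = 50) {
    $sql = "SELECT created_at, text FROM twitter ORDER BY id DESC LIMIT " . (int)$limit;
    return $db->query($sql)->fetchAll(PDO::FETCH_ASSOC);
}

// Hypothetical helper: turns the rows into an HTML list, escaping everything it prints.
function render_tweets(array $rows) {
    $html = "<ul class=\"tweets\">\n";
    foreach ($rows as $row) {
        $html .= '  <li>' . htmlspecialchars($row['text'], ENT_QUOTES, 'UTF-8')
               . ' <small>' . htmlspecialchars($row['created_at'], ENT_QUOTES, 'UTF-8')
               . "</small></li>\n";
    }
    return $html . "</ul>\n";
}

// Usage, with $db being your own PDO handle:
// echo render_tweets(recent_tweets($db, 100));
```

No remote call happens at page load, so the limit can be as large as your page design tolerates.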

Want to see everything described above in action? Check out the Developer’s Diary on Fwd:Vault.

Don’t worry about cut ‘n paste, just download the zip file with the class and all the examples:
Twitter Archiver (.zip)

Update 08-19-2009: Removed references to function calls specific to my framework.

Update 12-16-2009: The `id` field has been bumped up to a BIGINT. Twitter ID numbers are bigger than what an unsigned INT field can hold.


Do Search Stats Dictate Your Agenda?

The latest trend in perfecting “the message” in the internet age has been to optimize content for search engine indexing, aggregation, and delivery through the voodoo of search engine optimization – SEO. Let me be plain: I think most sites and web services are spending too much time worrying about what the Google crawler sees.

I can hear every one of you website statisticians howling at me right now. “We get so much traffic from Google results, it just doesn’t compare to direct hits.” I know, I know, and I’m not suggesting that we bury our heads in the sand and ignore those numbers. Instead, I think most site administrators are aiming their efforts at reaching the 10-yard line, instead of the end zone. Walk with me…

Off the top of your head, what do you think are the most important things to include on a site that will improve your SEO? If content wasn’t the first or most emphatic idea you had, you’re doing it wrong. Under Google’s own Webmaster Guidelines, the following appears as the third bullet in the “Design and content guidelines” section, after emphasizing internal linking and site map submission.

Create a useful, information-rich site, and write pages that clearly and accurately describe your content.

Keep in mind that the focus of the page is on how to build crawler friendly pages.

The “Quality guidelines” are even better. Here Google is pushing the need to avoid the blacker arts of SEO voodoo, such as link schemes, but here’s how they start…

Make pages for users, not for search engines. Don’t deceive your users or present different content to search engines than you display to users, which is commonly referred to as “cloaking.”

Avoid tricks intended to improve search engine rankings. A good rule of thumb is whether you’d feel comfortable explaining what you’ve done to a website that competes with you. Another useful test is to ask, “Does this help my users? Would I do this if search engines didn’t exist?”

Seeing the pattern? Even when the king of search engines is focused on talking SEO, they talk early and often about the importance of building a positive, informative user experience. Sure, there are other tasks you need to do in order to get properly listed (add meta tags, submit your site for searching, build a sitemap, etc.), but you’ll see the best results by building a library of good content.

Good content equals users, equals links, equals page rank, and the numbers should ideally help you identify hot spots and weak points in your digital library. To place your primary focus on anything else is rolling the dice on what Google and other search engines think of your stuff. That may get users to the site, but it won’t keep them there.

However, note that I said “primary focus”; you are seeing a significant influx of users from searches, after all, so I’m not suggesting that you leave them out in the cold. Just keep in mind that at the end of the search is a real person, who wants to see real content.

I say all this because the negative effects of misplaced emphasis run deep on a lot of sites. Pouring time and energy into a search crawler’s keyword hits detracts from your efforts to enrich your visitors’ experience. It’s a 1:1 inverse relationship, and applies to every site regardless of staff size…

  • Did you put a designer or developer in charge of your SEO effort? Fairly obvious time-share problem there.
  • Oh, you went all out and hired someone explicitly for SEO purposes? How about hiring someone to write more content instead, or even *gasp!* another developer? Lord knows there’s always plenty to do, and never enough developers on staff to do it (at least that’s been my experience; if a developer is reading this because his/her plate is clear right now, please drop me a line informing me where I can submit a resume).
  • If you have gobs of money and can hire all the staff you need, I suggest you ask your developers and designers how much time a week that SEO person (or people, ick…) saps away from tangible site development.

Regardless of your SEO approach, I can promise you that any excessive attention to it is detracting in some real way from what should be your true goal: delivering an ever-better experience, leaving your users more satisfied with each return visit.