Archive your entire Twitter timeline

My code for displaying Twitter posts on your site is pretty handy, but it does have drawbacks. Each page load involves calling a remote URL, downloading a resulting XML file, and parsing the results, increasing your load times and using bandwidth. To minimize the impact, you can really only display a handful of the most recent posts.

Plus, the downloaded stream is never saved. Google does index Twitter, but the thoroughness and benefit to you are subject to much speculation.

We can solve both problems by locally storing and serving Twitter posts ourselves. Once you have them in your own system, you can display as many of them as you want without expensive external URL lookups. Plus, with the content centrally located on your site, getting Google to index and apply it to your rankings is straightforward.

Note for SEO geeks:
Yes, I am aware that displaying and indexing Twitter posts on your own site does technically fall under the category of duplicate content, so save your typing.

Given the disparate nature of Twitter content and the utter disconnect from my sites, I’m not too concerned about incurring a penalty for it. Your opinion and experience may vary. You should at least familiarize yourself with Google’s rules for duplicate content. If you’re paranoid, consider applying canonicalization to pages that display large portions of a Twitter timeline.

Let’s get started
The end of the post includes a link to download all the code, as well as a link to a live demo.

I am assuming that you’ve got a standard PHP/MySQL stack for your site, ideally running on Linux, super-ideally Debian (Digg uses it for a reason, you know).

I am also assuming that you know how to use it; bring a decent understanding of SQL, PHP, and basic web programming. Here’s your first test: the demo assumes your PHP installation is version 5 and includes the SimpleXML and cURL extensions.

First, here’s the SQL CREATE TABLE statement for the table that our example will use. Apply this to your database:

CREATE TABLE IF NOT EXISTS twitter (
  `id` bigint(10) unsigned NOT NULL,
  `created_at` datetime NOT NULL,
  `source` varchar(255) NOT NULL,
  `in_reply_to_screen_name` varchar(255) NOT NULL,
  `text` varchar(255) NOT NULL,
  UNIQUE KEY `id` (id)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Now let’s have a look at the class, which is the meat of the entire thing:

class Twitter {
  private $id; // numeric ID of the Twitter account to archive

  public function __construct($twitter_id) {
    $this->id = (int)$twitter_id;
  }
 
  public function user_timeline($page, $count = '200', $since_id = '') {
    $url = 'http://twitter.com/statuses/user_timeline/' . $this->id . '.xml?count=' . $count . '&page=' . $page;
    if ($since_id && $since_id != '') {
      $url .= '&since_id=' . $since_id;
    }
    $c = curl_init();
    curl_setopt($c, CURLOPT_URL, $url);
    curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($c, CURLOPT_CONNECTTIMEOUT, 3);
    curl_setopt($c, CURLOPT_TIMEOUT, 5);
    $response = curl_exec($c);
    $responseInfo = curl_getinfo($c);
    curl_close($c);
    if ($response != '' && intval($responseInfo['http_code']) == 200) {
      if (class_exists('SimpleXMLElement')) {
        return new SimpleXMLElement($response);
      } else {
        return $response;
      }
    } else {
      return false;
    }
  }
 
  public function rebuild_archive($your_timezone) {
    $orig_tz = date_default_timezone_get();
    date_default_timezone_set('GMT');
    $tz = new DateTimeZone($your_timezone);
    $sql = "SELECT id FROM twitter ORDER BY id DESC LIMIT 1";
    /**
     * INSTALLATION
     * execute $sql on your DB to get the latest twitter post
     * set the value of `id` to a variable named $since_id
     * leave $since_id as false if the table is empty (i.e. a new install)
    **/
    $since_id = false; // default so the loop below never sees an undefined variable
    $tweet_count = 0;
    for ($page = 1; $page <= 200; ++$page) {
      if ($twitter_xml = $this->user_timeline($page, '200', $since_id)) {
        foreach ($twitter_xml->status as $key => $status) {
          $datetime = new DateTime($status->created_at);
          $datetime->setTimezone($tz);
          $created_at = $datetime->format('Y-m-d H:i:s');
          $sql = "INSERT IGNORE INTO twitter
                    (id, created_at, source, in_reply_to_screen_name, text)
                  VALUES (
                    '" . $status->id . "',
                    '" . $created_at . "',
                    '" . addslashes((string)$status->source) . "',
                    '" . addslashes((string)$status->in_reply_to_screen_name) . "',
                    '" . addslashes((string)$status->text) . "'
                  )";
          /**
           * INSTALLATION
           * Execute $sql over your DB here
          **/
          ++$tweet_count;
        }
      } else {
        break;
      }
    }
    $sql = "ALTER TABLE twitter ORDER BY `id`";
    /**
     * INSTALLATION
     * Execute $sql over your DB here
    **/
    date_default_timezone_set($orig_tz);
    return $tweet_count;
  }
}

Twitter::user_timeline()
This method is a modified version of my previous twitter_status() function.

The big difference is that we’re passing additional arguments to Twitter’s user_timeline API call: count (specifies the number of statuses to retrieve) and page (specifies the page of results to retrieve).
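To make those arguments concrete, here’s a rough sketch of how the request URL comes together. The build_timeline_url() helper is hypothetical (it isn’t part of the class above); it just mirrors the string concatenation in user_timeline() using http_build_query().

```php
<?php
// Hypothetical helper, not part of the Twitter class: assembles the same
// URL that Twitter::user_timeline() builds by hand.
function build_timeline_url($twitter_id, $page, $count = 200, $since_id = null) {
    $params = array(
        'count' => (int)$count, // number of statuses per page
        'page'  => (int)$page,  // which page of results to fetch
    );
    if ($since_id) {
        // Only ask for statuses newer than this ID
        $params['since_id'] = (string)$since_id;
    }
    return 'http://twitter.com/statuses/user_timeline/' . (int)$twitter_id
         . '.xml?' . http_build_query($params, '', '&');
}

echo build_timeline_url(12345678, 2), "\n";
```

Passing a since_id as the fourth argument simply appends it to the query string, which is what the class does on every run after the first.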

Twitter::rebuild_archive()
This method takes the results from user_timeline() and places them in your DB. Its lone argument is the string representation of your server’s timezone. To find out what the string is and why you need it, just read the second post of my twitter series. I’m on the US east coast, so I use 'America/New_York'.
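Here’s the timestamp conversion from rebuild_archive() pulled out on its own, as a minimal sketch. The twitter_time_to_local() function name and the sample timestamp are made up for illustration; the DateTime calls are the same ones the class uses.

```php
<?php
// Hypothetical stand-alone version of the conversion inside rebuild_archive().
// $created_at is a Twitter-style timestamp that carries its own +0000 offset.
function twitter_time_to_local($created_at, $timezone) {
    $datetime = new DateTime($created_at);               // parsed as GMT via the offset
    $datetime->setTimezone(new DateTimeZone($timezone)); // shift to the local zone
    return $datetime->format('Y-m-d H:i:s');             // MySQL DATETIME format
}

echo twitter_time_to_local('Wed Aug 19 14:30:00 +0000 2009', 'America/New_York'), "\n";
```

The result is a string ready to drop into the `created_at` column.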

Quick Warning
Hopefully you noticed several large comment blocks with INSTALLATION in all caps: I didn’t include any code to run SQL over your DB. Every system includes its own wrapper for database calls, including mine, so I’m not wasting time writing out SQL inserts using raw PHP functions that you’ll just remove. Find the three blocks labeled “INSTALLATION” and follow the instructions to execute the listed SQL.
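If you don’t have a wrapper of your own, here’s one way the first two INSTALLATION blocks could be filled in with PDO. Treat it as a sketch under assumptions: latest_tweet_id() and store_tweet() are names I made up, and $db is assumed to be a PDO connection to the MySQL database holding the `twitter` table, with exceptions enabled for errors.

```php
<?php
// Assumption: $db is an open PDO connection to your MySQL database.

// First INSTALLATION block: newest stored ID, or false on a new install.
function latest_tweet_id(PDO $db) {
    $id = $db->query('SELECT id FROM twitter ORDER BY id DESC LIMIT 1')->fetchColumn();
    return ($id === false) ? false : (string)$id;
}

// Second INSTALLATION block: a prepared statement stands in for the
// addslashes() calls, so the driver handles the escaping.
// $tweet is an array keyed by column name.
function store_tweet(PDO $db, array $tweet) {
    $stmt = $db->prepare(
        'INSERT IGNORE INTO twitter
           (id, created_at, source, in_reply_to_screen_name, text)
         VALUES (:id, :created_at, :source, :in_reply_to_screen_name, :text)'
    );
    return $stmt->execute($tweet);
}
```

Inside rebuild_archive() you would set $since_id = latest_tweet_id($db) and call store_tweet() once per status. The third block is a one-liner along the lines of $db->exec($sql).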

Now we just need to run it.

require('/path/to/twitter.class.php');
$Twitter = new Twitter('12345678');
$Twitter->rebuild_archive('America/New_York');

We instantiate the class and pass the ID number of our Twitter account. You’ll find instructions on getting this number about halfway down my first post on displaying Twitter updates. After that, a single call to Twitter::rebuild_archive() will grab all available updates and store them.

If the `twitter` table is empty, it will grab your entire Twitter timeline, up to 3200 posts. If you have more than 3200 posts, you’re out of luck for the time being, although I’d recommend you take a break from the computer, take a shower, and say “Hi” to the wife and kids.

After the first run, subsequent runs will only grab new posts by way of the API’s since_id argument.

If you have the access, you can easily make this into a cron job:

#!/usr/bin/php5
<?php
require('/path/to/twitter.class.php');
$Twitter = new Twitter('12345678');
$Twitter->rebuild_archive('America/New_York');
?>

Save that last block of code to a file, set it to be executable (chmod 755 usually), and set the job to run hourly. That top line identifies the interpreter that the system should use to read the file. You may need to change it to reflect the location of the PHP executable on your system.
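Once the table is filling up on its own, displaying posts is just a local query. Here’s a minimal sketch of the rendering side; render_tweets() is a hypothetical helper, and the rows are assumed to come from something like SELECT created_at, text FROM twitter ORDER BY id DESC LIMIT 50, run through whatever DB wrapper you use.

```php
<?php
// Hypothetical helper: turns rows from the `twitter` table into an HTML list.
// Each row is an array with 'created_at' and 'text' keys.
function render_tweets(array $rows) {
    $html = "<ul class=\"tweets\">\n";
    foreach ($rows as $row) {
        // Escape everything before it goes into the page
        $html .= '  <li>' . htmlspecialchars($row['text'], ENT_QUOTES, 'UTF-8')
               . ' <small>' . htmlspecialchars($row['created_at'], ENT_QUOTES, 'UTF-8')
               . "</small></li>\n";
    }
    return $html . "</ul>\n";
}

echo render_tweets(array(
    array('created_at' => '2009-08-19 10:30:00', 'text' => 'Archiving my timeline'),
));
```

No remote call happens at page load, so the LIMIT can be as large as you like.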

Want to see everything described above in action? Check out the Developer’s Diary on Fwd:Vault.

Don’t worry about cut ‘n paste, just download the zip file with the class and all the examples:
Twitter Archiver (.zip)

Update 08-19-2009: Removed references to function calls specific to my framework.

Update 12-16-2009: The `id` field has been bumped up to a BIGINT. Twitter ID numbers are bigger than what an unsigned INT field can hold.


Build a slick Twitter feed on your site

A few months ago I published an article describing how to output a Twitter stream on a page using PHP, and later followed up with two more to polish the display. The article content and code examples have since been tweaked based on feedback and my own debugging.

If you haven’t already had a look, or missed a portion, here’s the full series:

  1. Display Twitter updates on your website
  2. Calculate dates and times in different timezones (translate Twitter timestamps)
  3. Parse URL’s in text, create links (automatically link URL’s in stream)
  4. Download and store your Twitter posts in a database

If you have any comments or questions, be sure to post them under the proper article.


Free and open source alternative to ShareThis, AddThis, AddToAny

Update: Make sure you check out the comments! My post is just a launching point for some great commentary from staff at iBegin Share and Add to Any.

Every site with timely or useful content should utilize some on-site bookmark sharing tool. I’m talking about the bar of links to social networking sites like Facebook, Digg, Reddit, Twitter, etc. that you find at the end of a post. These buttons are preset to recognize the URL of the page they appear on, allowing visitors to quickly propagate your content to their digital lifestream. WordPress specifically offers a ton of plugins that offer such functionality.
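Under the hood, a share button is little more than a link to the target site with your page’s URL encoded into the query string. Here’s a rough sketch of the idea. The share_links() function is hypothetical and is not how any of the tools named in this post are implemented, and the three endpoint URLs are assumptions you should check against each site’s current documentation.

```php
<?php
// Hypothetical illustration: builds share links for the current page.
// The endpoint URLs below are assumptions, not taken from any plugin.
function share_links($url, $title) {
    $u = rawurlencode($url);   // page URL, safely encoded for a query string
    $t = rawurlencode($title); // page title, same treatment
    return array(
        'twitter'  => 'https://twitter.com/intent/tweet?url=' . $u . '&text=' . $t,
        'facebook' => 'https://www.facebook.com/sharer/sharer.php?u=' . $u,
        'reddit'   => 'https://www.reddit.com/submit?url=' . $u . '&title=' . $t,
    );
}

foreach (share_links('http://example.com/post?a=1', 'Hello World') as $site => $link) {
    echo $site, ': ', $link, "\n";
}
```

Remember to run each link through htmlspecialchars() before printing it into an href attribute.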

The most popular tools use JavaScript to display all the sites in a popup: Add to Any, AddThis, and ShareThis. Speaking in terms of pure function, these tools are great: they make sharing functionality readily available without cluttering up the display.

However, these JS-based bookmarkers possess some significant downsides. First and foremost are the performance concerns. These tools are all stored remotely, and get loaded on your page as a JavaScript include. Here’s an example of the code from ShareThis:

<script src="http://w.sharethis.com/button/sharethis.js#tabs=web%2Cpost%2Cemail&amp;charset=utf-8&amp;style=default&amp;publisher=abc123" type="text/javascript"></script>

Pay attention specifically to src="http://w.sharethis.com/button/sharethis.js[...]". It’s just a normal URL, like any page you visit. This means that each time the page is loaded, the user’s browser goes off to retrieve a copy of the JavaScript required to display the button. Aside from the bump in bandwidth usage, this can cause a noticeable delay in page loading. Worse, if the service is experiencing any kind of slowdown or outage, including its script can cause your site to hang and time out. And these services do hang on a regular basis. I’ve seen it last so long on my own blog that I’ve had to disable the plugin until service returned. That the delay is not your fault does not matter; it slows your page down, making you the laggard in the eyes of users. Not good.

But while these services are not focused on reliability and uptime, they do spend an awful lot of time on data collection/aggregation, legal, and advertising. None of these are good for you, the site owner. All activity surrounding the button on your site is tracked. They can partner with ad networks, packaging in extra ad cookies when the button is served up. Aside from the privacy issues, this again increases bandwidth. Imagery (specifically the branded icons of each service) is copyrighted, making it subject to usage restrictions and leaving you open to dealing with pain-in-the-ass take-down requests. Update: Per conversation with Add to Any Founder Pat Driven in the comments, Add to Any actually avoids this type of language entirely, limiting all their legal jargon to a plain-speak Privacy Policy.

To be clear, there’s nothing inherently wrong with any of this. These are businesses; they provide a service and have to make money to stay alive. However, I think the vast majority of users just want the fancy JavaScript popup; everything else is excess baggage.

Enter iBegin Share, a free, open source alternative for JavaScript-powered bookmark sharing. Instead of going offsite to retrieve code at each page load, iBegin Share runs locally on your site, saving you bandwidth and decreasing load time. iBegin Share tracks usage like its corporate counterparts, but that data is stored in your database and used for your own data tracking purposes only, saving more bandwidth (since it doesn’t have to communicate back) and your privacy. Finally, since it’s open source, you can modify the code any way you want: change the look, layout, or color scheme (the tool includes 4 preset color schemes, plus an option for text vs. button link), or even add totally new share options. A WordPress plugin version is available.

On the downside, external documentation is pretty thin at the moment, but the code is well-commented. There is also a forum, but activity there is rather limited right now — a discussion on a seemingly common issue started earlier this month has yet to receive any official word. So you’re on your own with any heavy customizing or problems, but I suppose that’s the tradeoff for eliminating any third party eyes poking around your traffic. Assuming it works as advertised, I’d argue that it’s a far better deal than the other tools, even without any customization ability.

If you decide to give iBegin Share a shot, or if you’re using it already, I’d love to hear how it’s working for you. Please share your experiences in the comments.