Archive for August, 2009

Fix emails dropped or blocked by Comcast

As an email-based backup service, Fwd:Vault ran into spam filters pretty quickly. Most of this can be mitigated with proper server configuration and getting records in the right places (e.g. abuse.net). From there it’s simply a matter of reminding users to check the spam folder when things are missing.

However, through the tribulations of one of my testers, I found out that Comcast goes the extra mile for users of their comcast.net webmail. Unlike most setups, where spam is simply redirected to a spam-specific folder, Comcast will delete the message outright, without issuing any kind of notice to the sender or recipient.

Truly, above and beyond (belief).

Of all the lousy IT practices I’ve seen over the years, this one takes the cake. No spam filter is perfect, so it’s guaranteed that they are dropping legitimate emails (case in point: I’m losing Fwd:Vault account emails). Plus it appears they default to a “highly suspicious” mode with newer systems, as fwdvault.com, my IP address, and my DNS records are completely fresh and unblemished.

Finally, the sheer size of their operation means that your chance of getting hold of anyone to actually fix the problem when it happens to you is virtually zero. I’d go so far as to say that they can get away with this nonsense precisely because they are a large ISP. As a former “your company IT guy,” I can imagine getting at best an earful, and at worst a pink slip, if I were caught doing this.

Despite my astonishment, I couldn’t deny reality. Through my logs I watched Fwd:Vault’s mail server find their systems, connect, and deliver the message and get a 250 response code (i.e. all good). Then over in my comcast.net inbox I’d get exactly nada, ditto for the spam folder. Since the actual delivery had no technical issue, I had zero clue as to the cause of the problem. I wasn’t on any blacklists, the IP was static, and my DNS records were in good order, including a reverse DNS record with my hosting service.
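For anyone repeating those checks by hand, here is a minimal sketch of the blacklist and reverse DNS lookups in PHP. The function names and the zone `dnsbl.example.org` are placeholders of mine, not anything from Fwd:Vault's setup; substitute the zone of whichever blacklist you want to query. The lookup functions need a working resolver, so they are only defined here, not called.

```php
<?php
// Build the hostname used to look up an IPv4 address in a DNS blacklist:
// the octets are reversed and the list's zone is appended.
// Returns false if $ipv4 is not a valid IPv4 address.
function dnsbl_query_name($ipv4, $zone) {
    if (filter_var($ipv4, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        return false;
    }
    return implode('.', array_reverse(explode('.', $ipv4))) . '.' . $zone;
}

// True if the blacklist publishes an A record for the address (i.e. it is listed).
function dnsbl_is_listed($ipv4, $zone) {
    $name = dnsbl_query_name($ipv4, $zone);
    return $name !== false && checkdnsrr($name . '.', 'A');
}

// True if the PTR record for $ipv4 names a host that resolves back to $ipv4.
function rdns_matches($ipv4) {
    $host = gethostbyaddr($ipv4);      // returns the IP unchanged when no PTR exists
    if ($host === false || $host === $ipv4) {
        return false;
    }
    $addresses = gethostbynamel($host); // array of IPv4 addresses, or false
    return is_array($addresses) && in_array($ipv4, $addresses, true);
}
```

Calling `dnsbl_is_listed()` with your mail server's IP and a real blacklist zone, then `rdns_matches()` with the same IP, covers the blacklist and reverse DNS parts of the checklist above.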

Fortunately, it seems that someone in the trenches at Comcast is fighting the good fight, as I took two long-shot attempts today and it seems one of them paid off. Here’s what I did; hopefully it works for you.

1. Use the feedback form at comcastsupport.com
I tried to retrace my steps on how I found this one, but their sites are so damn convoluted I kept going in circles. However, I know I started from inside the webmail interface, aka their “SmartZone”.

(See kids? That’s what we call irony. Can you say, “irony?”)

Whatever, here’s the link. You don’t need to log in to use the form:

http://www.comcastsupport.com/forms/net/sccfeedback.asp

I selected “Spam or Junk Mail” in the checkboxes and wrote something to the effect of:

I am not receiving mail from example.com in my Comcast email. I own and operate the mail server for this domain and have confirmed through my logs that the message is delivered properly (response code 250) to Comcast MX servers.

My tests delivered via the server mx.comcast.net (IP 00.00.00.00). It’s been over 24 hours and I have not received a bounce, nor is anything showing up in my inbox or spam folder.

As I have nothing else to go on, I am looking for help from your end.

I did not receive any reply. However, I also took another step…

2. Use their RBL Removal Form
This should only apply if your mail server has actually been blocked by Comcast, in which case you would likely see an error code of 550 in your logs. If your server picks up the full response from Comcast, you may also get additional helpful information as outlined in their list of custom mail delivery error codes.
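If you want to sort your own mail log into "accepted" and "blocked", a rough sketch follows. It assumes you have already pulled the remote server's reply line out of the log; the function name is mine. The first digit of an SMTP reply code carries the meaning: 2 is success, 4 is a temporary failure, 5 is a permanent one.

```php
<?php
// Classify an SMTP reply line by the first digit of its three-digit code.
function smtp_reply_class($reply_line) {
    // A reply starts with three digits followed by a space, a hyphen, or end of line.
    if (!preg_match('/^([2-5])\d\d(?:[ -]|$)/', $reply_line, $m)) {
        return 'unknown';
    }
    switch ($m[1]) {
        case '2': return 'accepted';
        case '3': return 'intermediate';
        case '4': return 'temporary failure';
        default:  return 'permanent failure';
    }
}
```

Keep in mind that 'accepted' only means the receiving server took the message. As described above, a 250 says nothing about whether it ever reaches the inbox.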

None of this applied to me, as the connection and delivery went off without a hitch. Still, I figured it was worth a shot; a bureaucracy this big is bound to have systems running into one another.

I sent in a request to be removed from their RBL by way of this form:

http://www.comcastsupport.com/Forms/NET/blockedprovider.asp

Most of the information will depend on your setup. However, I did check the boxes for “Implemented technology to filter or prevent transmission of spam” and “Changed the rDNS records to reflect a consistent and non-dynamic setting” just in case. I included text similar to what I outlined earlier in the Issue Description box.

I saw emails coming through less than 30 minutes after sending this message. However, I sent the feedback first, followed by a brief online chat with their support, who directed me to the RBL form. All told it was at least an hour between my first step and the delivered message.

Update: I received this message back in response to my RBL request…

Thank you for contacting Comcast Customer Security Assurance. We have received and reviewed your RBL removal request.

Below each IP address you submitted in your request, we have included the result of our research. Please do not reply to this message.

[IP address(es)]

We have received your request for removal from our inbound blocklist. After investigating the issue, we have found that the IP you provided for removal is currently not on our blocklist.

We need the IP address currently blocked to further investigate this issue. The IP address is a number separated by decimals and is located in an error code starting with “550” in the returned email from Comcast. You can learn more about how to identify a blocked IP by visiting our Frequently Asked Question page at:
http://www.comcast.net/help/faq/index.jsp?faq=SecurityMail_Policy18667

Please verify the IP(s) and resubmit your request to http://www.comcastsupport.com/rbl

So it looks like the RBL request didn’t do anything. Unless it did, and some numb-nut at Comcast was covering for their idiotic policies.

My gut tells me that I caught a particularly helpful support person manning the feedback desk who was able to punch the few keys it took to rectify the problem. If that’s the case, thanks for the help, and I hope the rest of you get to run into him/her as well. I sent the message around 2:00 pm on a Monday.

You can find more helpful information, including a link to the Blacklist Removal Request Form, on the Comcast Postmaster Site.

Best advice I can give: encourage your users to switch to Gmail. :)


Mentioned in recent IT World article

I was recently quoted in an article over at IT World, discussing underused developer tools (e.g. security testers). My quote is on page 2:

http://www.itworld.com/development/74088/developer-tools-you-dont-use-and-why-you-dont-use-them

Also, FYI: I am on vacation the rest of this week. We return to our regular schedule next Monday.


Archive your entire Twitter timeline

My code for displaying Twitter posts on your site is pretty handy, but it does have drawbacks. Each page load involves calling a remote URL, downloading a resulting XML file, and parsing the results, increasing your load times and using bandwidth. To minimize the impact, you can really only display a handful of the most recent posts.

Plus, the downloaded stream is never saved. Google does index Twitter, but the thoroughness and benefit to you are subject to much speculation.

We can solve both problems by locally storing and serving Twitter posts ourselves. Once you have them in your own system, you can display as many of them as you want without expensive external URL lookups. Plus, with the content centrally located on your site, getting Google to index and apply it to your rankings is straightforward.

Note for SEO geeks:
Yes, I am aware that displaying and indexing Twitter posts on your own site does technically fall under the category of duplicate content, so save your typing.

Given the disparate nature of Twitter content and the utter disconnect from my sites, I’m not too concerned about incurring a penalty for it. Your opinion and experience may vary. You should at least familiarize yourself with Google’s rules for duplicate content. If you’re paranoid, consider applying canonicalization to pages that display large portions of a Twitter timeline.
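For the canonicalization route, a small sketch: a helper (the name is mine) that prints a rel="canonical" link element for the head of an archive page. Which URL you treat as the primary copy is your call; the one in the usage note is a placeholder.

```php
<?php
// Return a <link rel="canonical"> element for the <head> of a page.
// The URL is HTML-escaped so query strings with & stay valid markup.
function canonical_link($url) {
    return '<link rel="canonical" href="'
         . htmlspecialchars($url, ENT_QUOTES, 'UTF-8')
         . '" />';
}
```

In a template you would call something like `echo canonical_link('http://example.com/twitter/');` inside the head section.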

Let’s get started
The end of the post includes a link to download all the code, as well as a link to a live demo.

I am assuming that you’ve got a standard PHP/MySQL stack for your site, ideally running on Linux, super-ideally Debian (Digg uses it for a reason, you know).

I am also assuming that you know how to use it; bring a decent understanding of SQL, PHP, and basic web programming. Here’s your first test: the demo assumes your PHP installation is version 5 and includes the SimpleXML extension.

First, here’s the SQL CREATE TABLE statement for the table that our example will use. Apply this to your database:

CREATE TABLE IF NOT EXISTS twitter (
  `id` bigint(10) unsigned NOT NULL,
  `created_at` datetime NOT NULL,
  `source` varchar(255) NOT NULL,
  `in_reply_to_screen_name` varchar(255) NOT NULL,
  `text` varchar(255) NOT NULL,
  UNIQUE KEY `id` (id)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Now let’s have a look at the class, which is the meat of the entire thing:

class Twitter {
  // Numeric ID of the Twitter account to archive
  private $id;

  public function __construct($twitter_id) {
    $this->id = (int)$twitter_id;
  }
 
  public function user_timeline($page, $count = '200', $since_id = '') {
    $url = 'http://twitter.com/statuses/user_timeline/' . $this->id . '.xml?count=' . $count . '&page=' . $page;
    if ($since_id && $since_id != '') {
      $url .= '&since_id=' . $since_id;
    }
    $c = curl_init();
    curl_setopt($c, CURLOPT_URL, $url);
    curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($c, CURLOPT_CONNECTTIMEOUT, 3);
    curl_setopt($c, CURLOPT_TIMEOUT, 5);
    $response = curl_exec($c);
    $responseInfo = curl_getinfo($c);
    curl_close($c);
    if ($response != '' && intval($responseInfo['http_code']) == 200) {
      if (class_exists('SimpleXMLElement')) {
        return new SimpleXMLElement($response);
      } else {
        return $response;
      }
    } else {
      return false;
    }
  }
 
  public function rebuild_archive($your_timezone) {
    $orig_tz = date_default_timezone_get();
    date_default_timezone_set('GMT');
    $tz = new DateTimeZone($your_timezone);
    $sql = "SELECT id FROM twitter ORDER BY id DESC LIMIT 1";
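    /**
     * INSTALLATION
     * execute $sql on your DB to get the latest twitter post
     * set the value of `id` to a variable named $since_id
     * set $since_id to false if the table is empty (i.e. a new install)
    **/
    $since_id = false; // placeholder so the code runs as-is; replace with the `id` from the query above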
    $tweet_count = 0;
    for ($page = 1; $page <= 200; ++$page) {
      if ($twitter_xml = $this->user_timeline($page, '200', $since_id)) {
        foreach ($twitter_xml->status as $key => $status) {
          $datetime = new DateTime($status->created_at);
          $datetime->setTimezone($tz);
          $created_at = $datetime->format('Y-m-d H:i:s');
          $sql = "INSERT IGNORE INTO twitter
                    (id, created_at, source, in_reply_to_screen_name, text)
                  VALUES (
                    '" . $status->id . "',
                    '" . $created_at . "',
                    '" . addslashes((string)$status->source) . "',
                    '" . addslashes((string)$status->in_reply_to_screen_name) . "',
                    '" . addslashes((string)$status->text) . "'
                  )";
          /**
           * INSTALLATION
           * Execute $sql over your DB here
          **/
          ++$tweet_count;
        }
      } else {
        break;
      }
    }
    $sql = "ALTER TABLE twitter ORDER BY `id`";
    /**
     * INSTALLATION
     * Execute $sql over your DB here
    **/
    date_default_timezone_set($orig_tz);
    return $tweet_count;
  }
}

Twitter::user_timeline()
This method is a modified version of my previous twitter_status() function.

The big difference is that we’re passing additional arguments to Twitter’s user_timeline API call: count (specifies the number of statuses to retrieve) and page (specifies the page of results to retrieve).
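To see how those arguments end up in the request, here is a standalone sketch that builds the same URL with http_build_query() instead of string concatenation. The function name is mine; the endpoint and parameter names are the ones the class above uses.

```php
<?php
// Build the user_timeline URL with the count, page, and optional since_id arguments.
function user_timeline_url($twitter_id, $page, $count = 200, $since_id = null) {
    $params = array('count' => (int)$count, 'page' => (int)$page);
    if ($since_id) {
        // Kept as a string so large IDs are not squeezed through a 32-bit integer.
        $params['since_id'] = (string)$since_id;
    }
    return 'http://twitter.com/statuses/user_timeline/' . (int)$twitter_id
         . '.xml?' . http_build_query($params, '', '&');
}
```

With 200 statuses per page, a full run simply requests page 1, page 2, and so on until a request comes back empty or fails.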

Twitter::rebuild_archive()
This method takes the results from user_timeline() and places them in your DB. Its lone argument is the string representation for the timezone of your server. To find out what the string is and why you need it, just read the second post of my twitter series. For me on the US east coast, I use 'America/New_York'.
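The timezone step on its own looks roughly like this. It is a sketch under one assumption of mine: that created_at arrives in the form "Wed Aug 19 14:30:00 +0000 2009". The class above lets DateTime guess the format; this version spells it out and returns false if the string does not match.

```php
<?php
// Convert a created_at string into a MySQL DATETIME string in the given timezone.
function to_local_datetime($created_at, $timezone) {
    // Assumed layout: day name, month name, day, time, UTC offset, year.
    $dt = DateTime::createFromFormat('D M d H:i:s O Y', $created_at);
    if ($dt === false) {
        return false;
    }
    $dt->setTimezone(new DateTimeZone($timezone));
    return $dt->format('Y-m-d H:i:s');
}

echo to_local_datetime('Wed Aug 19 14:30:00 +0000 2009', 'America/New_York'), "\n";
// prints 2009-08-19 10:30:00 (New York is four hours behind UTC in August)
```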

Quick Warning
Hopefully you noticed several large comment blocks with INSTALLATION in all caps: I didn’t include any code to run SQL over your DB. Every system includes its own wrapper for database calls, including mine, so I’m not wasting time writing out SQL inserts using raw PHP functions that you’ll just remove. Find the three blocks labeled “INSTALLATION” and follow the instructions to execute the listed SQL.
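If you have no wrapper of your own, one possible way to fill in the three blocks is PDO. Everything below is a hypothetical sketch, not part of the downloadable class: the function names are mine, and it assumes a PDO connection to MySQL with PDO::ATTR_ERRMODE set to PDO::ERRMODE_EXCEPTION. Bound parameters take the place of the addslashes() calls.

```php
<?php
// Block 1: newest stored id as a string, or false when the table is empty.
function latest_tweet_id(PDO $db) {
    $id = $db->query('SELECT id FROM twitter ORDER BY id DESC LIMIT 1')->fetchColumn();
    return $id === false ? false : (string)$id;
}

// Map one <status> element (or any object with the same fields) to query parameters.
function tweet_row($status, $created_at) {
    return array(
        'id'                      => (string)$status->id,
        'created_at'              => $created_at,
        'source'                  => (string)$status->source,
        'in_reply_to_screen_name' => (string)$status->in_reply_to_screen_name,
        'text'                    => (string)$status->text,
    );
}

// Block 2: insert one status; INSERT IGNORE skips ids that are already stored.
function store_tweet(PDO $db, array $row) {
    $stmt = $db->prepare(
        'INSERT IGNORE INTO twitter
           (id, created_at, source, in_reply_to_screen_name, text)
         VALUES
           (:id, :created_at, :source, :in_reply_to_screen_name, :text)'
    );
    return $stmt->execute($row);
}

// Block 3: re-sort the table by id after the inserts.
function reorder_archive(PDO $db) {
    return $db->exec('ALTER TABLE twitter ORDER BY `id`') !== false;
}
```

Inside rebuild_archive() that would mean setting $since_id from latest_tweet_id($db), calling store_tweet($db, tweet_row($status, $created_at)) in the loop, and reorder_archive($db) at the end, with $db being a PDO object you create and pass in yourself.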

Now we just need to run it.

require('/path/to/twitter.class.php');
$Twitter = new Twitter('12345678');
$Twitter->rebuild_archive('America/New_York');

We instantiate the class and pass the ID number of our Twitter account. You’ll find instructions on getting this number about halfway down my first post on displaying Twitter updates. After that, a single call to Twitter::rebuild_archive() will grab all available updates and store them.

If the `twitter` table is empty, it will grab your entire Twitter timeline, up to 3200 posts. If you have more than 3200 posts, you’re out of luck for the time being, although I’d recommend you take a break from the computer, take a shower, and say “Hi” to the wife and kids.

After the first run, subsequent runs will only grab new posts by way of the API’s since_id argument.

If you have the access, you can easily make this into a cron job:

#!/usr/bin/php5
<?php
require('/path/to/twitter.class.php');
$Twitter = new Twitter('12345678');
$Twitter->rebuild_archive('America/New_York');
?>

Save that last block of code to a file, set it to be executable (chmod 755 usually), and set the job to run hourly. That top line identifies the interpreter that the system should use to read the file. You may need to change it to reflect the location of the PHP executable on your system.

Want to see everything described above in action? Check out the Developer’s Diary on Fwd:Vault.

Don’t worry about cut ‘n paste, just download the zip file with the class and all the examples:
Twitter Archiver (.zip)

Update 08-19-2009: Removed references to function calls specific to my framework.

Update 12-16-2009: The `id` field has been bumped up to a BIGINT. Twitter ID numbers are bigger than what an unsigned INT field can hold.