Preventing Firefox memory, processor bloat

If you’ve read this blog before, you know I’m a big Firefox fan. But the one problem that has dogged me is the inevitable bloat that Firefox suffers when open for long periods of time. I work in my browser all day, and after 8 hours it has usually gobbled up all the available RAM, and sucks processor cycles like a Dyson. Fortunately I’ve finally pinpointed the cause and several solutions.

Let’s start at the source. The Firefox team made a crucial usability decision, which is at the heart of the problem: they wanted to let users recover any page they may have recently opened. So by default, Firefox keeps navigation history for all your open tabs, plus the last 10 tabs that you closed. The navigation history for each of those tabs, both open and closed, can hold up to 50 pages (i.e. the number of URLs you can traverse purely through the Back/Forward buttons).

With no limit on the number of open tabs, plus the high limit on Back/Forward navigation, it’s easy to see why Firefox slows to a crawl. If you do a lot of browsing in a lot of tabs, your memory disappears in a hurry. Managing all that extra memory makes the processor work overtime to keep the Firefox hippo moving.

There are two ways to fix this issue in a pinch. First you can simply restart the browser, making sure that it doesn’t save your tabs (you are prompted to save tabs at close by default). Second, you can clear the Recently Closed Tabs to eliminate a portion of the tab history bloat (History > Recently Closed Tabs > Clear Closed Tabs List).

For a more long-term solution, we need to mess with the system settings. Type about:config in the address bar to bring up Firefox’s complete configurations list. The latest Firefox versions present you with a warning before opening the page.

A warning: this page handles everything in your browser. Everything. Don’t mess with stuff if you don’t know what you’re doing.

In the “Filter” textbox at the top, enter

browser.sessionstore.max_tabs_undo

This setting controls how many closed tabs to track. Fewer old tabs = less memory usage. Double-click the lone entry in the list and change the value from “10” to “5”.

Back to the filter box, enter

browser.sessionhistory.max_entries

This setting controls the navigation history limit. Double-click the entry and drop the value from “50” down to “25”.

Close the about:config tab and restart your browser.
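You may not want to click through about:config on every machine or profile. Firefox also reads a user.js file from your profile folder at startup and applies any user_pref() lines it finds there. Here’s a minimal PHP sketch that writes those lines for the two settings above. The firefox_user_js() helper is hypothetical, and where your profile folder lives depends on your system.

```php
<?php
// Hypothetical helper: turn a pref-name => value map into user.js lines.
// Integers and booleans are written bare; anything else becomes a quoted string.
function firefox_user_js(array $prefs) {
    $lines = array();
    foreach ($prefs as $name => $value) {
        if (is_bool($value)) {
            $literal = $value ? 'true' : 'false';
        } elseif (is_int($value)) {
            $literal = (string)$value;
        } else {
            $literal = json_encode((string)$value);
        }
        $lines[] = 'user_pref(' . json_encode($name) . ', ' . $literal . ');';
    }
    return implode("\n", $lines) . "\n";
}

// The two settings from this post
$user_js = firefox_user_js(array(
    'browser.sessionstore.max_tabs_undo' => 5,
    'browser.sessionhistory.max_entries' => 25,
));
echo $user_js;
```

Save the output as user.js in your profile folder. Keep in mind that Firefox reapplies user.js at every startup, so values set there will override later changes you make in about:config.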

Your mileage on these tweaks will vary depending on your system specs. If you can go a day of heavy browsing without hitting the creep, slowly increment the settings back up until you hit the sweet spot.


jQuery 1.4 released

The latest and greatest version of jQuery, version 1.4, was released on January 14, the anniversary of jQuery’s original launch. Bugfixes and improvements abound!

The jQuery team has put together a site devoted to the new version, called the 14 Days of jQuery, covering the major version changes as well as infrastructure updates coinciding with the new release. For example, the documentation site has been completely redesigned and moved to its own subdomain, api.jquery.com. Links from the primary jquery.com site should be updated within the next week. With video demos of new features and Q&As with the core team (including founder John Resig), it’s well worth checking out for every jQuery developer.


MySQL’s Monty Widenius responds

My post summarizing and weighing in on MySQL founder Michael Widenius’ protest of Oracle’s purchase of Sun prompted a response from none other than Monty himself. Hit the comments to see what he has to say about my take, which was, in the final analysis, definitely net-negative. I have responded in the comments of that post as well.

I must be moving up in the world, or Monty was just really bored over his Christmas vacation. :)


MySQL founder Michael Widenius concerned about sale to Oracle

In case you haven’t heard, Sun is being bought by Oracle. After dancing around the issue in blog posts over the past 8 months, MySQL developer-founder Michael “Monty” Widenius finally comes out and adamantly opposes MySQL’s role in the sale.

In a Dec. 12 blog post, Widenius tries to rally open source MySQL supporters in an effort to seek assurances from Oracle that the project will, in fact, stay open source. He makes a good case for a future Oracle decision to limit or close off the open source elements:

Oracle [has] to lower prices all the time to compete with MySQL when companies start new projects. Some companies even migrate existing projects from Oracle to MySQL to save money. Of course Oracle has a lot more features, but MySQL can already do a lot of things for which Oracle is often used…So I just don’t buy it that Oracle will be a good home for MySQL. A weak MySQL is worth about one billion dollars per year to Oracle, maybe more. A strong MySQL could never generate enough income for Oracle that they would want to cannibalize their real cash cow.

Anyone who’s loosely familiar with open source software knows that the community can execute the almighty fork: just pick up the code and go. But Widenius believes the code is only a portion of the equation, and that the economy around MySQL is vastly more important. Richard Stallman penned a letter in conjunction with Knowledge Ecology International (KEI) and the Open Rights Group (ORG) that succinctly describes the issue:

MySQL is made available to the public in two parallel ways. Most users obtain it as free/libre software under the GNU General Public License (GPL) version 2; the code is released in this way gratis. MySQL is also available under a different, proprietary license for a fee.

This approach was able to provide (1) an attractive platform for developers looking to use FLOSS, and secured MySQL enormous mind share, particularly in supporting content rich web pages and other Internet applications, and (2) the ability for paying clientèle to combine and distribute MySQL in customizations that they do not want to make available to the public as free/libre software under the GPL. With excellent management and considerable trust within the user community, MySQL became the gold standard for web based FLOSS database applications.

The dual-license model is the key here. Most MySQL users don’t need the paid license, for two reasons. First, other OSS projects naturally play very nicely with MySQL’s matching open source license. Second, websites that use proprietary code in conjunction with MySQL are in the clear because nothing is actually distributed; users simply visit a site. My company Fwd:Vault is a perfect example.

The remaining clients, who write software that gets distributed (think boxed software in a store), must use MySQL’s second, fee-based proprietary license. This is where the money is, and it is the true engine that has powered MySQL’s rise since its mid-1990s debut.

As any business owner can tell you, replicating a strong consumer base and community climate is nearly impossible. “If it would be easy to take over MySQL by just forking it,” says Widenius, “Sun would never have bought MySQL and Oracle would have forked MySQL a long time ago instead of now trying to buy it as part of the SUN deal.”

Now this whole system gets handed to Oracle, which has a directly competing product and feels major price pressure due to MySQL’s free offering. I agree with Widenius on the eventual outcome, but he doesn’t have a legal leg to stand on here. He sold MySQL AB to Sun, and Sun can do whatever it wants with it. If Sun gets swallowed by Oracle, MySQL goes along with it. That’s how businesses work. He can argue all day that the Sun deal was predicated on Sun’s track record of positively supporting FLOSS projects, but his control over MySQL’s future went out the door the moment the Sun deal closed.

I’m a huge OSS proponent, but I’m a capitalist first. If the European Commission doesn’t find the sale to be monopolistic (keep in mind the U.S. Department of Justice already approved the deal), then I wish Oracle the best of luck with their new purchase.

That being said, capitalism favors the huge MySQL install base in the longer term. If Oracle removes MySQL “the open source database” from the OSS environment, they’re going to leave a massive hole in the market, a hole that cannot be filled with Oracle’s overpriced high-end database software. A new product will rise to fill the void. Maybe it will be a MySQL fork, maybe it will be something new, but it will happen. MySQL did it once, why can’t someone else do it again?

And when you acknowledge the likelihood of that potential outcome, it makes Widenius’ entire protest seem self-interested. He’s not necessarily concerned with the open source database community, but his position within it. I have no doubt that his intentions are at least in part altruistic — replacing MySQL would be a torturous process — but I’m sure he’d rather see his baby leading the pack than some neophyte.

In short, if he’s just trying to protect his turf, is his mindset really any different from Oracle’s?

For me, the entire issue is summarized in the introduction of his protest post, “I have spent the last 27 years creating and working on MySQL and I hope, together with my team of MySQL core developers, to work on it for many more years.”

If that was the case, you shouldn’t have sold it off in the first place.


Open source mentoring

I was quoted in another IT World article last week discussing how mentoring occurs in open source communities. Below is the full text I sent to the author, in case anyone wanted more background on my comments.

In my experience, mentoring in open source projects occurs between the project leaders (i.e. artisans) and their most active community members (i.e. journeymen). In other words, rarely does the complete newbie directly benefit from the knowledge of the team. They must first absorb what the team has to offer in the form of the project code (prove themselves worthy, in a sense) and then join the high-level discussions.

I have a perfect example from my own experience. The Zen Cart project fosters a vibrant support community; however, direct, unscheduled contact with team members is strictly prohibited.

Unfettered developer access to the team is limited to a select few community members who are proactively contacted by the team, instead of the other way around. I was fortunate to fall into this category, but it required serious upfront work.

I taught myself the Zen Cart platform in order to launch my then-employer’s first ecommerce site. During my time launching and maintaining the site, I developed several utilities for the program to fill some holes in its functionality. For example, the available accounting reports were lackluster, so I built a custom reporting tool to output all the sales figures our accountant would need. I released this and several other modules back into the community as a “thank you” for all the support they had provided. Attention from the team followed, culminating in an offer to join them as a support team member.

Here (finally!) begins the mentoring. The team shared their private development plans so I could coordinate my modules with the release schedule, and offered direct advice on how to improve my offerings. In turn, I provided feedback on my experience with the program, offering recommendations on future areas of improvement.

But the relationships quickly extended beyond the Zen Cart project itself, and today I consider the team members professional colleagues. I contact them any time I need advice or support on my projects, Zen Cart-related or not. I just had a phone call with one of them last week to discuss online billing setups for my current project. Who better to seek help with online payments than the programmer whose checkout code has reached “featured cart” status with PayPal?

In this economy, these relationships are invaluable. They have made themselves available as professional references if I need them, which, according to one interviewer, makes my resume levitate off the pile.

The article is chock full of additional insights and perspectives, so be sure to check it out.


Archive your entire Twitter timeline

My code for displaying Twitter posts on your site is pretty handy, but it does have drawbacks. Each page load involves calling a remote URL, downloading a resulting XML file, and parsing the results, increasing your load times and using bandwidth. To minimize the impact, you can really only display a handful of the most recent posts.

Plus, the downloaded stream is never saved. Google does index Twitter, but the thoroughness and benefit to you are subject to much speculation.

We can solve both problems by locally storing and serving Twitter posts ourselves. Once you have them in your own system, you can display as many of them as you want without expensive external URL lookups. Plus, with the content centrally located on your site, getting Google to index and apply it to your rankings is straightforward.

Note for SEO geeks:
Yes, I am aware that displaying and indexing Twitter posts on your own site does technically fall under the category of duplicate content, so save your typing.

Given the disparate nature of Twitter content and its utter disconnect from my sites, I’m not too concerned about incurring a penalty for it. Your opinion and experience may vary. You should at least familiarize yourself with Google’s rules for duplicate content. If you’re paranoid, consider applying canonicalization to pages that display large portions of a Twitter timeline.
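Canonicalization comes down to a single tag in the page’s head that points search engines at the URL you consider the primary copy. Here’s a minimal PHP sketch, assuming you build your pages in PHP. canonical_link_tag() is a hypothetical helper, and the example URL is a placeholder.

```php
<?php
// Hypothetical helper: emit a rel="canonical" tag for a page that repeats
// Twitter content, pointing search engines at the URL you treat as primary.
function canonical_link_tag($url) {
    // Escape the URL so characters like & are valid inside the attribute
    return '<link rel="canonical" href="'
        . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '" />';
}

echo canonical_link_tag('http://example.com/twitter?page=2&sort=desc'), "\n";
```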

Let’s get started
The end of the post includes a link to download all the code, as well as a link to a live demo.

I am assuming that you’ve got a standard PHP/MySQL stack for your site, ideally running on Linux, super-ideally Debian (Digg uses it for a reason, you know).

I am also assuming that you know how to use it; bring a decent understanding of SQL, PHP, and basic web programming. Here’s your first test: the demo assumes your PHP installation is version 5.2 or later (the class relies on DateTime and DateTimeZone) and includes the SimpleXML and cURL extensions.
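A quick way to take that test is a tiny script that checks for everything the class leans on: PHP 5.2+ (for DateTime and DateTimeZone), SimpleXML, and cURL. This is a minimal sketch. archiver_missing_requirements() is a hypothetical helper and not part of the downloadable class.

```php
<?php
// Hypothetical helper: list requirements the archiver needs but this PHP
// install lacks. An empty array means you're good to go.
function archiver_missing_requirements() {
    $missing = array();
    // DateTime and DateTimeZone arrived in PHP 5.2.0
    if (version_compare(PHP_VERSION, '5.2.0', '<')) {
        $missing[] = 'PHP 5.2.0 or newer';
    }
    if (!class_exists('SimpleXMLElement')) {
        $missing[] = 'SimpleXML extension';
    }
    if (!function_exists('curl_init')) {
        $missing[] = 'cURL extension';
    }
    return $missing;
}

$missing = archiver_missing_requirements();
echo $missing ? 'Missing: ' . implode(', ', $missing) . "\n" : "All set\n";
```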

First, here’s the SQL CREATE TABLE statement for the table that our example will use. Apply this to your database:

CREATE TABLE IF NOT EXISTS twitter (
  `id` bigint(10) unsigned NOT NULL,
  `created_at` datetime NOT NULL,
  `source` varchar(255) NOT NULL,
  `in_reply_to_screen_name` varchar(255) NOT NULL,
  `text` varchar(255) NOT NULL,
  UNIQUE KEY `id` (id)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Now let’s have a look at the class, which is the meat of the entire thing:

<?php
class Twitter {
  // Numeric ID of the Twitter account to archive
  private $id;

  public function __construct($twitter_id) {
    $this->id = (int)$twitter_id;
  }

  // Fetch one page of the account's timeline. Returns a SimpleXMLElement
  // (or the raw XML string if SimpleXML is missing), or false on failure.
  public function user_timeline($page, $count = '200', $since_id = '') {
    $url = 'http://twitter.com/statuses/user_timeline/' . $this->id . '.xml?count=' . $count . '&page=' . $page;
    if ($since_id) {
      // Only request statuses newer than the one we already have
      $url .= '&since_id=' . $since_id;
    }
    $c = curl_init();
    curl_setopt($c, CURLOPT_URL, $url);
    curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($c, CURLOPT_CONNECTTIMEOUT, 3);
    curl_setopt($c, CURLOPT_TIMEOUT, 5);
    $response = curl_exec($c);
    $responseInfo = curl_getinfo($c);
    curl_close($c);
    if ($response != '' && intval($responseInfo['http_code']) == 200) {
      if (class_exists('SimpleXMLElement')) {
        return new SimpleXMLElement($response);
      } else {
        return $response;
      }
    } else {
      return false;
    }
  }

  // Store all new statuses in the `twitter` table; returns how many were processed
  public function rebuild_archive($your_timezone) {
    $orig_tz = date_default_timezone_get();
    date_default_timezone_set('GMT');
    $tz = new DateTimeZone($your_timezone);
    $since_id = false; // default for an empty table (i.e. a new install)
    $sql = "SELECT id FROM twitter ORDER BY id DESC LIMIT 1";
    /**
     * INSTALLATION
     * execute $sql on your DB to get the latest twitter post
     * set the value of `id` to a variable named $since_id
     * set $since_id to false if the table is empty (i.e. a new install)
    **/
    $tweet_count = 0;
    for ($page = 1; $page <= 200; ++$page) {
      $twitter_xml = $this->user_timeline($page, '200', $since_id);
      // Stop on a failed request or a response we can't parse as XML
      if (!($twitter_xml instanceof SimpleXMLElement)) {
        break;
      }
      $page_count = 0;
      foreach ($twitter_xml->status as $status) {
        // Twitter timestamps are GMT; convert to your timezone for storage
        $datetime = new DateTime((string)$status->created_at);
        $datetime->setTimezone($tz);
        $created_at = $datetime->format('Y-m-d H:i:s');
        $sql = "INSERT IGNORE INTO twitter
                  (id, created_at, source, in_reply_to_screen_name, text)
                VALUES (
                  '" . $status->id . "',
                  '" . $created_at . "',
                  '" . addslashes((string)$status->source) . "',
                  '" . addslashes((string)$status->in_reply_to_screen_name) . "',
                  '" . addslashes((string)$status->text) . "'
                )";
        /**
         * INSTALLATION
         * Execute $sql over your DB here
        **/
        ++$tweet_count;
        ++$page_count;
      }
      // An empty page means we've reached the end of the available timeline
      if ($page_count == 0) {
        break;
      }
    }
    $sql = "ALTER TABLE twitter ORDER BY `id`";
    /**
     * INSTALLATION
     * Execute $sql over your DB here
    **/
    date_default_timezone_set($orig_tz);
    return $tweet_count;
  }
}

Twitter::user_timeline()
This method is a modified version of my previous twitter_status() function.

The big difference is that we’re passing additional arguments to Twitter’s user_timeline API call: count (specifies the number of statuses to retrieve) and page (specifies the page of results to retrieve).
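To see how those arguments end up in the request, here’s a sketch that builds the same URL with PHP’s http_build_query(), which also handles escaping. build_timeline_url() is a hypothetical helper. It mirrors the concatenation inside user_timeline() rather than replacing it, and it targets the same old XML endpoint the class uses.

```php
<?php
// Hypothetical helper: assemble the user_timeline URL from its arguments.
function build_timeline_url($user_id, $page, $count = 200, $since_id = false) {
    $params = array('count' => $count, 'page' => $page);
    if ($since_id) {
        // Only ask for statuses newer than the one we already have
        $params['since_id'] = $since_id;
    }
    return 'http://twitter.com/statuses/user_timeline/' . (int)$user_id
        . '.xml?' . http_build_query($params, '', '&');
}

echo build_timeline_url(12345678, 2, 200, '4567'), "\n";
```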

Twitter::rebuild_archive()
This method takes the results from user_timeline() and places them in your DB. Its lone argument is the string representation for the timezone of your server. To find out what the string is and why you need it, just read the second post of my twitter series. For me on the US east coast, I use 'America/New_York'.
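The timezone conversion is easy to check in isolation. This sketch assumes Twitter’s created_at values look like “Wed Aug 27 13:08:45 +0000 2008”. It parses them with an explicit DateTime::createFromFormat() pattern (PHP 5.3+) instead of relying on DateTime’s free-form parser as the class does. twitter_time_to_local() is a hypothetical helper.

```php
<?php
// Hypothetical helper: convert a Twitter created_at string (always GMT)
// into a MySQL DATETIME string in the given timezone.
function twitter_time_to_local($created_at, $timezone) {
    $dt = DateTime::createFromFormat('D M d H:i:s O Y', $created_at);
    if ($dt === false) {
        return false; // unexpected format
    }
    $dt->setTimezone(new DateTimeZone($timezone));
    return $dt->format('Y-m-d H:i:s');
}

echo twitter_time_to_local('Wed Aug 27 13:08:45 +0000 2008', 'America/New_York'), "\n";
```

In late August, America/New_York is on daylight time (GMT-4), so 13:08 GMT is stored as 09:08.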

Quick Warning
Hopefully you noticed several large comment blocks with INSTALLATION in all caps: I didn’t include any code to run SQL over your DB. Every system has its own wrapper for database calls, including mine, so I’m not wasting time writing out SQL inserts using raw PHP functions that you’ll just remove. Find the three blocks labeled “INSTALLATION” and follow the instructions to execute the listed SQL.
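If you don’t have a database wrapper of your own, PDO with a prepared statement is one option for the INSERT block, and it avoids the addslashes() escaping entirely. This is a sketch, not the downloadable code. TWEET_INSERT_SQL and tweet_insert_params() are hypothetical names, and $pdo is assumed to be a PDO connection to your MySQL database.

```php
<?php
// Hypothetical prepared-statement version of the INSERT in rebuild_archive().
define('TWEET_INSERT_SQL',
    'INSERT IGNORE INTO twitter
       (id, created_at, source, in_reply_to_screen_name, text)
     VALUES (:id, :created_at, :source, :reply_to, :text)');

// Map one <status> element to the named parameters above.
// PDO binds the values, so no addslashes() is needed.
function tweet_insert_params(SimpleXMLElement $status, DateTimeZone $tz) {
    // Twitter's created_at, e.g. "Wed Aug 27 13:08:45 +0000 2008" (GMT)
    $dt = DateTime::createFromFormat('D M d H:i:s O Y', (string)$status->created_at);
    if ($dt === false) {
        throw new UnexpectedValueException('Unrecognized created_at: ' . $status->created_at);
    }
    $dt->setTimezone($tz);
    return array(
        ':id'         => (string)$status->id,
        ':created_at' => $dt->format('Y-m-d H:i:s'),
        ':source'     => (string)$status->source,
        ':reply_to'   => (string)$status->in_reply_to_screen_name,
        ':text'       => (string)$status->text,
    );
}

// Inside the foreach loop of rebuild_archive(), the INSTALLATION block
// could then become (assuming $pdo is your PDO connection):
//   $stmt = $pdo->prepare(TWEET_INSERT_SQL);
//   $stmt->execute(tweet_insert_params($status, $tz));
```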

Now we just need to run it.

require('/path/to/twitter.class.php');
$Twitter = new Twitter('12345678');
$Twitter->rebuild_archive('America/New_York');

We instantiate the class and pass the ID number of our Twitter account. You’ll find instructions on getting this number about halfway down my first post on displaying Twitter updates. After that, a single call to Twitter::rebuild_archive() will grab all available updates and store them.

If the `twitter` table is empty, it will grab your entire Twitter timeline, up to 3200 posts. If you have more than 3200 posts, you’re out of luck for the time being, although I’d recommend you take a break from the computer, take a shower, and say “Hi” to the wife and kids.

After the first run, subsequent runs will only grab new posts by way of the API’s since_id argument.

If you have the access, you can easily make this into a cron job:

#!/usr/bin/php5
<?php
require('/path/to/twitter.class.php');
$Twitter = new Twitter('12345678');
$Twitter->rebuild_archive('America/New_York');
?>

Save that last block of code to a file, set it to be executable (chmod 755 usually), and set the job to run hourly. That top line identifies the interpreter that the system should use to read the file. You may need to change it to reflect the location of the PHP executable on your system.

Want to see everything described above in action? Check out the Developer’s Diary on Fwd:Vault.

Don’t worry about cut ‘n paste, just download the zip file with the class and all the examples:
Twitter Archiver (.zip)

Update 08-19-2009: Removed references to function calls specific to my framework.

Update 12-16-2009: The `id` field has been bumped up to a BIGINT. Twitter ID numbers are bigger than what an unsigned INT field can hold.


Free and open source alternative to ShareThis, AddThis, AddToAny

Update: Make sure you check out the comments! My post is just a launching point for some great commentary from staff at iBegin Share and Add to Any.

Every site with timely or useful content should utilize some on-site bookmark sharing tool. I’m talking about the bar of links to social networking sites like Facebook, Digg, Reddit, Twitter, etc. that you find at the end of a post. These buttons are preset to recognize the URL of the page they appear on, allowing visitors to quickly propagate your content to their digital lifestream. WordPress specifically offers a ton of plugins that offer such functionality.

The most popular tools use Javascript to display all the sites in a popup: Add to Any, AddThis, and ShareThis. Speaking in terms of pure function, these tools are great: they make sharing functionality readily available without cluttering up the display.

However, these JS-based bookmarkers have some significant downsides. First and foremost are the performance concerns. These tools are all stored remotely and get loaded on your page as a JavaScript include. Here’s an example of the code from ShareThis:

<script src="http://w.sharethis.com/button/sharethis.js#tabs=web%2Cpost%2Cemail&amp;charset=utf-8&amp;style=default&amp;publisher=abc123" type="text/javascript"></script>

Pay attention specifically to src="http://w.sharethis.com/button/sharethis.js[...]". It’s just a normal URL, like any page you visit. This means that each time the page is loaded, the user’s browser goes off to retrieve a copy of the JavaScript required to display the button. Aside from the obvious bump in bandwidth usage, this can cause a noticeable delay in page loading. Worse, if the service is experiencing any kind of slowdown or outage, including it can cause your site to hang and time out. And these services do hang on a regular basis. I’ve seen it last so long on my own blog that I’ve had to disable the plugin until service returned. That the delay is not your fault does not matter; it slows your page down, making you the laggard in the eyes of users. Not good.

But while these services are not focused on reliability and uptime, they do spend an awful lot of time on data collection/aggregation, legal, and advertising. None of these are good for you, the site owner. All activity surrounding the button on your site is tracked. They can partner with ad networks, packaging in extra ad cookies when the button is served up. Aside from the privacy issues, this again increases bandwidth. Imagery (specifically the branded icons of each service) is copyrighted, making it subject to usage restrictions and leaving you open to dealing with pain-in-the-ass take-down requests. Update: Per conversation with Add to Any founder Pat Diven in the comments, Add to Any actually avoids this type of language entirely, limiting all their legal jargon to a plain-speak Privacy Policy.

To be clear, there’s nothing inherently wrong with any of this. These are businesses; they provide a service and have to make money to stay alive. However, I think the vast majority of users just want the fancy JavaScript popup; everything else is excess baggage.

Enter iBegin Share, a free, open source alternative for JavaScript-powered bookmark sharing. Instead of going offsite to retrieve code at each page load, iBegin Share runs locally on your site, saving you bandwidth and decreasing load time. iBegin Share tracks usage like its corporate counterparts, but that data is stored in your database and used for your own tracking purposes only, saving more bandwidth (since it doesn’t have to communicate back) and protecting your privacy. Finally, since it’s open source, you can modify the code any way you want: change the look, layout, or color scheme (the tool includes 4 preset color schemes, plus an option for text vs. button links), or even add totally new share options. A WordPress plugin version is available.
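iBegin Share’s code does a lot more than this. At its simplest, though, “running locally” means building the share links on your own server with no remote script at all. Here’s a minimal PHP sketch of that idea. It is not iBegin Share’s API: share_links() is a hypothetical helper, and the Reddit and Facebook submit URLs are assumptions you should verify against each service, since they can change.

```php
<?php
// Hypothetical helper: build share links server-side, no remote script.
// The submit URLs are assumptions; verify them before relying on them.
function share_links($url, $title) {
    $services = array(
        'Reddit'   => array('http://www.reddit.com/submit',
                            array('url' => $url, 'title' => $title)),
        'Facebook' => array('http://www.facebook.com/sharer.php',
                            array('u' => $url, 't' => $title)),
    );
    $links = array();
    foreach ($services as $name => $service) {
        list($endpoint, $params) = $service;
        // http_build_query() URL-encodes the page address and title
        $href = $endpoint . '?' . http_build_query($params, '', '&');
        // htmlspecialchars() turns & into &amp; for a valid HTML attribute
        $links[$name] = '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">'
            . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '</a>';
    }
    return $links;
}

echo implode(' | ', share_links('http://example.com/post?id=1', 'Hello World')), "\n";
```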

On the downside, external documentation is pretty thin at the moment, but the code is well-commented. There is also a forum, but activity there is rather limited right now — a discussion on a seemingly common issue started earlier this month has yet to receive any official word. So you’re on your own with any heavy customizing or problems, but I suppose that’s the tradeoff for eliminating any third party eyes poking around your traffic. Assuming it works as advertised, I’d argue that it’s a far better deal than the other tools, even without any customization ability.

If you decide to give iBegin Share a shot, or if you’re using it already, I’d love to hear how it’s working for you. Please share your experiences in the comments.


No “private” setting in open source

I love the PHPMailer system. Straightforward, effective, very well documented and supported. It’s everything that a piece of software should be, and best of all it’s free. The parent company, codeworx Technologies, supports and maintains a piece of software used by millions of sites for free, and gets a ridiculous amount of exposure in return. Win-win.

I came across the only beef I have with their code just the other day, and it illustrates a larger issue with open source projects as a whole. Fwd:Vault uses PHPMailer to send all its outgoing messages. I want to keep a log of all outgoing messages, so I extended PHPMailer to store a copy of the message in my database. When I ran the code, I found that the “To:” address was not getting stored along with the rest of the data. I had probed the core PHPMailer class and found the variable containing this information, but had failed to check its scope, which was set to private. I changed this to public and I was good to go.

But should I really have been forced to take this extra step? I had taken the time to go through their code to find what I needed and use it to meet my needs. What business is it of the developer’s if I use his or her code in a way they do not “condone”? Even if I break functionality, it’s my problem to solve.

If you write something with the intention of releasing it to the world, you must assume that no part of it will be hands-off. Developers declare methods and properties as private or protected, and classes as final, when they don’t want other people playing in their sandbox. Practically speaking, this makes it more difficult for others to use the code you’ve written, which seems to conflict with the intention of releasing it in the first place. The practice also runs counter to the very nature of open source, which is that all information is free and clear.
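To make the difference concrete, here’s a minimal PHP sketch using hypothetical stand-in classes (not PHPMailer’s actual code). A subclass can read a protected property directly. A private property stays out of reach unless you edit the class or, as shown here, fall back on PHP’s Reflection API.

```php
<?php
// Hypothetical stand-ins for a mailer class; not PHPMailer's real code.
class LockedMailer {
    private $to = array();   // invisible to subclasses
    public function AddAddress($address) {
        $this->to[] = $address;
    }
}

class OpenMailer {
    protected $to = array(); // readable by subclasses
    public function AddAddress($address) {
        $this->to[] = $address;
    }
}

// With protected, a logging subclass can read the recipients directly.
class LoggingMailer extends OpenMailer {
    public function recipients() {
        return $this->to;
    }
}

// With private, the only way in short of editing the class is reflection.
function read_private_recipients(LockedMailer $mailer) {
    $prop = new ReflectionProperty('LockedMailer', 'to');
    if (PHP_VERSION_ID < 80100) {
        $prop->setAccessible(true); // required before PHP 8.1
    }
    return $prop->getValue($mailer);
}

$m = new LoggingMailer();
$m->AddAddress('someone@example.com');
print_r($m->recipients());
```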

Unless the structure of your code demands specific scope settings — those situations are extremely rare — show some faith in your fellow devs and save them the extra step of unlocking your code. There’s enough work to do without making it difficult for ourselves.