Run your servers without timezone offsets

I recently made the decision to store times on Fwd:Vault systems in Greenwich Mean Time, or GMT. I decided to do this because I have time-sensitive events happening along several dimensions. Email coming into the system has several timestamps associated with it: the user’s initial delivery, relay from their mail server, and receipt by the Fwd:Vault mail server. Payment receipts come into Fwd:Vault from our billing provider, get stored in my system, and are made available to the user.

Up until now, my server time was set to US Eastern, where both I and the server physically reside. Then I started building the code to display local time based on a user’s selected timezone.

Ugh.

Here’s the problem: displaying local time requires at least one time conversion, from server time to the user’s timezone. If the time is initially set to anything other than no-offset GMT, you have two calculations to do, from the server timezone to GMT, then GMT to user timezone. You can do it, of course, but who really wants to write even more code?

Now add to this equation the fact that most data-delivery systems have settled on sending time data in GMT. That is a very good practice, to be sure, but it means another timezone conversion when the data comes into your systems. Going back to my example, I had to convert payment times from GMT to US Eastern before dropping them into my database.

Finally, add to the mix the potential for time data coming in from more than one source with more than one offset. Again back to my case, payment data is GMT, as is the Twitter feed I store and display on the site. Meanwhile, email was set to US Eastern. This matched the server and MySQL database where all the data ends up residing, so I was still looking at just one time conversion. But what happens down the road, when my server configuration changes, or I move to another timezone?

Tying this information to me makes as much sense as tying it to any one of my users. It’s the same rationale that data service providers use when delivering GMT time data. It applies to me, and it applies to you too.

I’m just too lazy to try and keep all that timezone switching straight in my head.

If you find yourself in the same scenario, save your sanity and your future support efforts. If you run a website that (a) displays time-sensitive data, and (b) allows users to create an account, you really owe it to everyone involved to store time in a neutral fashion and adjust time displays according to the user’s selected timezone.
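As a rough sketch of that store-neutral, display-local approach (assuming PHP 8; the helper name format_for_user() and the sample timestamp are made up for illustration and are not part of Fwd:Vault), the single conversion at display time might look something like this:

```php
<?php
// Hypothetical helper: take a timestamp stored in GMT/UTC and format it
// for display in the timezone the user selected in their account.
function format_for_user(string $utcTimestamp, string $userTimezone, string $format = 'Y-m-d H:i:s T'): string {
  // Interpret the stored value as UTC, no matter what the server is set to
  $time = new DateTimeImmutable($utcTimestamp, new DateTimeZone('UTC'));
  // One conversion, from UTC straight to the user's zone
  return $time->setTimezone(new DateTimeZone($userTimezone))->format($format);
}

// When storing, gmdate() gives the current time with no offset applied
$stored_now = gmdate('Y-m-d H:i:s');

// When displaying, convert once
echo format_for_user('2009-01-15 17:30:00', 'America/New_York'), "\n"; // 2009-01-15 12:30:00 EST
```

Because the stored value never depends on the server’s own timezone setting, moving the server or changing its configuration should not change what the user sees.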


Get domain out of any URL string (yes, really)

It’s a common problem with no single right answer: extract the top domain (e.g. example.com) from a given string, which may or may not be a valid URL. I had need of such functionality recently and found answers around the web lacking. So if you ever “just wanted the domain name” out of a string, give this a shot…

<?php
function get_top_domain($url, $remove_subdomains = 'all') {
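  // parse_url() returns null when it finds no host, so cast before lowercasing
  $host = strtolower((string) parse_url($url, PHP_URL_HOST));
  // no host found (e.g. the string has no scheme): fall back to the raw input
  if ($host == '') $host = strtolower($url);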
  switch ($remove_subdomains) {
    case 'www':
      if (strpos($host, 'www.') === 0) {
        $host = substr($host, 4);
      }
      return $host;
    case 'all':
    default:
      if (substr_count($host, '.') > 1) {
        // keep the regex matches in their own variable rather than overwriting $host
        preg_match("/^.+\.([a-z0-9\.\-]+\.[a-z]{2,4})$/", $host, $matches);
        if (isset($matches[1])) {
          return $matches[1];
        } else {
          // not a valid domain
          return false;
        }
      } else {
        return $host;
      }
    break;
  }
}
 
// some examples
var_dump(get_top_domain('http://www.validurl.example.com/directory', 'all'));
var_dump(get_top_domain('http://www.validurl.example.com/directory', 'www'));
var_dump(get_top_domain('domain-string.example.com', 'all'));
var_dump(get_top_domain('domain-string.example.com/nowfails', 'all'));
var_dump(get_top_domain('finds the domain url.example.com', 'all'));
var_dump(get_top_domain('12.34.56.78', 'all'));
?>

Most of the examples are simply proofs, but I want to draw attention to the string in example #4, 'domain-string.example.com/nowfails'. This is not a valid URL (it has no scheme), so parse_url() finds no host in it, forcing the script to use the entire original string. In turn, the path part of the string causes the regex to break, causing a complete failout (return false;).

Is there a way to account for this? Surely. However, I’m not about to tap that massive keg of exceptions (e.g. just a slash, slash plus path, slash plus another domain in a human-readable string, etc).
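For what it’s worth, here is a minimal sketch of handling just one of those exceptions, the missing scheme. The helper name guess_host() is hypothetical and not part of get_top_domain(); it simply retries parse_url() with a scheme prepended, and makes no attempt at the human-readable cases:

```php
<?php
// Hypothetical helper: find a host even when the string has no scheme.
function guess_host($string) {
  $host = parse_url($string, PHP_URL_HOST);
  if (!is_string($host) || $host === '') {
    // Without a scheme, parse_url() treats the whole string as a path,
    // so retry with a scheme prepended
    $host = parse_url('http://' . ltrim($string, '/'), PHP_URL_HOST);
  }
  return (is_string($host) && $host !== '') ? strtolower($host) : false;
}

var_dump(guess_host('domain-string.example.com/nowfails'));
var_dump(guess_host('http://www.validurl.example.com/directory'));
```

The result could then be handed to get_top_domain() in place of the raw string. Whether that trade-off is worth it depends on how messy your input is.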

No regex for validating URLs or email addresses is ever perfect; the “strict” RFC requirements are too damn broad. So I did what I always do: chose “what works” over “what’s technically right.” This one requires 2-4 letters for the top-level domain (TLD), so it doesn’t allow for the .museum TLD, and doesn’t check to see if the provided TLD is actually valid. If you need to do further verification, that’s on you. Here’s the current full list of valid TLDs provided by the IANA.

If you need to modify the regex at all, I highly recommend you read this article about email address regex first for two reasons:

  1. There’s a ton of overlap between email and URL regex matching
  2. It will point out all the gotchas in your “better” regex theory that you didn’t think about


Select a single dataset across multiple tables

Say you’ve got two tables with the same structural layout but contain logically different information. A common example would be storing “deleted” records in a separate table to reduce table sizes, simplify queries, and improve performance. A record only exists in one of the two tables, either it’s deleted or it’s not.

But sometimes you just need to find the information. “I don’t care where record #123 is, I just need to see it.” This situation presents a unique problem. A standard SELECT statement can combine tables to find a unified set of data, but can’t look for the same thing in two different places simultaneously. So you can do the obvious thing, which is to run the same query twice, once on each table. But that’s a way bigger performance penalty than necessary. Get to know the UNION statement:

(SELECT * FROM tbl_stuff WHERE id = '123')
UNION
(SELECT * FROM tbl_stuff_deleted WHERE id = '123')

One query, a single set of results, and performance optimization all wrapped into one. Sweet.
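If you also want to know which table the record came from, one option is to add a literal column to each half of the query. The sketch below is only an illustration: it assumes PHP 8 with the pdo_sqlite extension and uses a throwaway in-memory SQLite database so it can run on its own (SQLite does not accept the parentheses around each SELECT, so they are dropped). The name column, the sample rows, and find_record() are all made up.

```php
<?php
// In-memory SQLite database standing in for the real one (assumption)
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE tbl_stuff (id INTEGER PRIMARY KEY, name TEXT)');
$db->exec('CREATE TABLE tbl_stuff_deleted (id INTEGER PRIMARY KEY, name TEXT)');
$db->exec("INSERT INTO tbl_stuff VALUES (1, 'live record')");
$db->exec("INSERT INTO tbl_stuff_deleted VALUES (123, 'deleted record')");

// Hypothetical helper: look in both tables with a single query
function find_record(PDO $db, $id) {
  // The literal 'source' column says which table the row came from.
  // UNION ALL skips duplicate removal, which is safe here because a
  // record only ever lives in one of the two tables.
  $sql = "SELECT id, name, 'live' AS source FROM tbl_stuff WHERE id = :live_id
          UNION ALL
          SELECT id, name, 'deleted' AS source FROM tbl_stuff_deleted WHERE id = :deleted_id";
  $stmt = $db->prepare($sql);
  $stmt->execute([':live_id' => $id, ':deleted_id' => $id]);
  $row = $stmt->fetch(PDO::FETCH_ASSOC);
  return $row === false ? null : $row;
}

print_r(find_record($db, 123));
```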


Circumvent PHP errors with define_once()

Core PHP does not include a define_once() function to complement functions like require_once() and include_once(), which is pretty silly in my opinion. While I am generally not a fan of using *_once statements due to the performance penalty (and incurred laziness), define_once is the exception. There are ways to look for a loaded/missing file, but a define is not a define until you define it, so you really have no choice.

So in situations where you have to blindly load defines — I do it to build language defines in a cascading templating system — use this function to achieve the proper results:

function define_once($define, $value) {
  if (!defined((string)$define)) {
    define($define, $value);
    return true;
  }
  return false;
}
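
A quick usage sketch, with made-up constant names and values: in a cascading setup, the more specific file loads first and the defaults fill in whatever is left. The function is repeated here, behind a function_exists() guard, only so the snippet runs on its own.

```php
<?php
if (!function_exists('define_once')) {
  function define_once($define, $value) {
    if (!defined((string)$define)) {
      define($define, $value);
      return true;
    }
    return false;
  }
}

// Template-specific language file loads first (hypothetical values)
$first = define_once('TEXT_GREETING', 'Howdy!');

// Default language file loads afterwards; the existing define wins
$second = define_once('TEXT_GREETING', 'Welcome');
$third = define_once('TEXT_FAREWELL', 'Goodbye');

var_dump($first, $second, $third); // bool(true) bool(false) bool(true)
echo TEXT_GREETING, ' / ', TEXT_FAREWELL, "\n"; // Howdy! / Goodbye
```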


Jeff Atwood still wrong about PHP

Jeff Atwood’s latest post on Coding Horror provides great insight into the history and mindset of one of the Computer Science greats, Alan Kay. It’s a good read for any computer professional looking to delve further into the advances that Kay worked on.

Unfortunately, Jeff has a pretty strong distaste for PHP, and blindly jumps on a perceived opportunity to back up his case. He cites an ACM Queue article where Kay discusses why software development does not happen faster. Here’s the important section that Atwood quoted. Bolding is Atwood’s emphasis:

Let’s say the adoption of programming languages has very often been somewhat accidental, and the emphasis has very often been on how easy it is to implement the programming language rather than on its actual merits and features. For instance, Basic would never have surfaced because there was always a language better than Basic for that purpose. That language was Joss, which predated Basic and was beautiful. But Basic happened to be on a GE timesharing system that was done by Dartmouth, and when GE decided to franchise that, it started spreading Basic around just because it was there, not because it had any intrinsic merits whatsoever.

He follows his citation with this comment:

Any similarity between the above and PHP is, I’m sure, completely coincidental. That sound you’re hearing is just a little bit of history repeating.

His link here goes to an earlier post where he lambastes PHP for, well, existing really. I took issue with the post at the time, finding several glaring holes in his logic and generally disagreeing with the premise (obviously). I replied in the comments (look for my name), and expounded further in a rebuttal post on my own site. I didn’t let him get away with unfounded arguments then, and I’m not going to now.

In the ACM Queue article, Alan Kay discusses the evolution of programming languages. He and his colleagues expected the next big leap in programming language structure to occur somewhere around 1984, with the introduction of a new generation of programmers. It never happened — to Kay’s satisfaction, anyway — and he believes that commercial software development doomed this advancement, and has stagnated the evolution of programming theory. This leads to Jeff’s quote and PHP quip.

Two problems here. First, Kay was discussing evolutionary leaps, not the quality of the current crop. Kay doesn’t know how language theory will evolve; he’s waiting for someone to come along with the next bright idea (otherwise he would have done it, duh). Labeling any current language as subpar simply because someone theorizes “we can do better” is a complete non sequitur. I believe Kay knows this and his words in the article back it up.

To put it another way, using Atwood’s logic, wouldn’t VB.NET fall into the same category of failure? Kay may not be worshiping at the altar of PHP, but I don’t see him anywhere near Microsoft’s temple either.

Second, PHP is successful because of its merits, not in spite of lacking them. Jeff needs to read the Wikipedia article on PHP. The very first paragraph on the page reads:

PHP originally stood for Personal Home Page. It began in 1994 as a set of Common Gateway Interface binaries written in the C programming language by the Danish/Greenlandic programmer Rasmus Lerdorf. Lerdorf initially created these Personal Home Page Tools to replace a small set of Perl scripts he had been using to maintain his personal homepage. The tools were used to perform tasks such as displaying his résumé and recording how much traffic his page was receiving. He combined these binaries with his Form Interpreter to create PHP/FI, which had more functionality. PHP/FI included a larger implementation for the C programming language and could communicate with databases, enabling the building of simple, dynamic web applications. Lerdorf released PHP publicly on June 8, 1995 to accelerate bug location and improve the code. This release was named PHP version 2 and already had the basic functionality that PHP has today. This included Perl-like variables, form handling, and the ability to embed HTML. The syntax was similar to Perl but was more limited, simpler, and less consistent.

PHP is a 100% grass-roots success, starting out as one guy’s collection of useful tools for building websites. I refuse to believe that, in this age of abundant choice, all these programmers picked up PHP because it was “easy.” The numbers are just too big; features and power must enter into the equation at some point.

Nonetheless, Atwood’s application of Kay’s quote would lead you to believe that PHP exists because Zend Technologies pushed the language financially, just as GE backed BASIC. The truth is exactly the opposite: Zend Technologies exists because of PHP’s overwhelming success. Coming from a hard core VB.NET programmer, I find the insinuation here a little insulting; Zend Technologies is hardly MS Borg.

When it comes to having a discussion about programming philosophy, I wouldn’t even put myself in the room with Kay. I don’t know enough on programming language evolution to even join the discussion. Atwood, on the other hand, gets too close to the fire and gets burned. He erroneously applies Kay’s argument to his own beliefs which, when scrutinized, deflate much like the previous one.

This is the second time now Jeff has taken an unfounded shot at PHP. The complete lack of substance behind his claims makes him look like nothing more than your average “my language is better than yours” fan boy. I could ignore it, but his audience is large enough to exert real influence. So like anyone else in a position of power, he must be called on the outlandish claims.

I read your blog all the time, Jeff, I know you can do better than this.


The Mobius Strip of computer support

I’ve learned that one of the biggest office-cluttering offenders when you run a company is receipts. The government wants to see them for tax purposes, so you hold onto every last one to save every last dollar. The problem, of course, is that you’ll collect a crap ton of these stupid little bits of paper over the course of a year, they don’t file well, and certain types of paper fade over time.

Enter the NeatReceipts Mobile Scanner, which I recently purchased to help eliminate my paper bloat. Scans full-size pages, business cards, and all manner of receipts. Then it will catalog them for you. Nice.

I of course was one of the lucky ones to run into a weird install issue. The system uses a trimmed down version of Microsoft SQL Server to store its data, and the server simply would not install. A quick online chat with a customer support rep from The Neat Company, and a tech was remotely diagnosing the problem in a few minutes. No hoops, no “check your cables” nonsense, just a few up front questions and then she was on it. I was honestly impressed with her speed and professionalism.

Until we diagnosed the problem, anyway. Allow me to summarize the end of our exchange. (We used LogMeIn Rescue—great software—and I did not realize until afterwards that chat logs are not saved locally. Lesson learned.) While attempting to run a fixup batch file, she consulted with her manager on an error that popped up. The meat of the message was that wmiprvse.exe hit an error at a “procedure entry point” in fastprox.dll. I don’t speak Microsoftish fluently, but she was helpful enough to point out that the core problem lay with my installation of Windows Management Instrumentation, or WMI for short. This was a specialized issue, would require direct intervention from Microsoft staff to correct, and I would have to contact Microsoft to fix the problem.

The end result: I was on my own to fix the problem.

At that point, I had entered the feared “other guy zone.” This is what you enter when you have an issue that offers an “out” to the support line you called. You know, “It’s not us, it’s the other guy. Our [printer / router / hardware] merely uncovered a preexisting issue with your existing [operating system / internet connection / software], and you’ll have to contact [insert vendor name here] to get the problem rectified.” In a former life I did phone support, and I continue to provide direct support to end users in homes and businesses, so I’ve been on both ends of this conversation countless times.

The problem here is obvious: with the “right” problem (or the “right” explanation from the poor end user), you can end up in an endless loop of support calls. These problems seem to exist just outside the boundaries of whichever vendor you are talking to. [A] refers you to [B], who refers you back to [A]. God help you if [A] refers to [B] who refers to [C]; I advise you to throw up the white flag right then and just return the product. Your remaining mental faculties may be at risk if [D] enters the picture.

The reality, of course, is in the middle. Some problems really are outside of the control of [vendor], but there are also plenty of times where [vendor] is simply being lazy. The customer is stuck in a Möbius strip of support teams, each blaming the next. If anyone is going to break the customer out, one of them is going to have to step up to the plate, take charge, and see to it that the problem is fixed. Otherwise the customer will eventually give up, return the product, and send their business elsewhere.

As end users, we are all screaming the next question: “Why don’t companies recognize this? Now, why doesn’t every company take this to heart in their support structure? Why do we hear endless horror stories about ‘customer service’ that’s anything but?”

Meanwhile, a lot of support guys respond to this line of discussion with something to the tune of, “The customer bought it, they ought to take responsibility for making it work.”

For the support reps out there, your end user question is a non-starter: all they want to do is use your widget, they couldn’t care less about how it’s done. The fact that they spent money to purchase your widget is evidence of this fact. They bought a product to simplify something in their life. Did you ever buy a product to make your life more complex?

For you end users, banging your head on the wall while on hold, let’s go back to my WMI issue. Technology is a series of dependent systems, one part builds on another. The Neat Company uses SQL Server, which uses WMI, which sits on top of Windows. If one part doesn’t work, the whole thing comes down. Neat couldn’t get SQL Server running because of WMI, and punted to Microsoft. Technically, they have pretty solid ground for doing so. The issue was “below” SQL Server, a part of the chain that they, on some level, have to assume is there and functioning properly.

However, has anyone with a standard desktop copy of Windows ever actually gotten through to Microsoft Support? I’ve been doing technical support for almost 14 years, and I have yet to get a real person on the phone, India or otherwise. If this WMI problem is “removed” for Neat Company, it’s on the moon as far as Microsoft is concerned.

In this instance, Neat Company suffers from the reality that it only has one point of view: its own. Every human being naturally looks out for themselves first (moral arguments about caring for your fellow man notwithstanding). Businesses are made of people, and thus are subject to the same follies.

On top of that, you have the financial realities of running a business. One of those realities is that support structures take up a cost column in the company ledger. Support is an expense, not a source of income. A business is naturally going to do everything to minimize costs, which can (and often does) affect service quality. I’m not making any claims about Neat in this regard, only pointing out what most people don’t stop to consider.

Now, all that explains why customer service sometimes fails, but it doesn’t do anything to explain why things go so awesomely well a lot of the time. My tech support agent, AJ, was great! She was on the issue immediately, and had an obvious grasp of what she was doing and how to fix the problem. When she remote’d into my desktop, I’m fairly certain the mouse was moving too fast for her to simultaneously read a stepwise guide. She also seemed genuinely disappointed that she was unable to solve my problem, maybe almost as bummed as I at the prospect of having to contact MicroMassivesoft. She didn’t finish the call until she gave me as much information about the problem as was at her disposal.

Two reasons why this happens. First, people like AJ genuinely enjoy what they’re doing, which makes them better, which translates to a better customer experience. Second, there are companies out there who do in fact appreciate the value of good customer service. Think about it: the company whose customer support solves the problem, and pulls the user off the Mobius strip, will have their undying gratitude. The longer the problem persists, the more valuable a solution becomes.

By going the extra mile and solving the problem, the “winning” company walks away from the situation with a raving fan, instead of just a customer. A fan tells great stories to friends (who loves viral marketing, show of hands?), comes back for more goods and services in the future, and is harder pressed to seek alternatives from competitors. If that’s not financial incentive enough, your company might as well call it quits now, because your head is a little due south.

My rule is that a company should be an expert at diagnosing and fixing the systems with 1° of separation from the product. Running a website? You better know browsers inside and out, and have some operating system knowledge. Advertising Design? You better have a rock solid understanding of printing, and know your way around a computer. Chimney sweep? You better understand roofing and a little masonry.

In our example, Neat Company’s 1° here would definitely cover WMI issues (SQL Server -> WMI). WMI certainly overlaps enough ground that support staff should be at least versed in how to correct the most common issues.

As it turned out, AJ’s info on the issue was enough to get me going down the right path, and eventually I landed on a solution: rebuild WMI. After getting myself sorted out, I sent the steps to fix the issue back to Neat Company to help anyone else with this problem. I’m a geek, I take pity on any layperson faced with this kind of issue.

For those of you here because your own instance of Windows Management Instrumentation is busted, here’s what I did to rebuild the WMI:

  1. Start > Run: net stop winmgmt
  2. Rename %windir%\System32\Wbem\Repository folder to something else (e.g. Repository_bad)
  3. Start > Run: net start winmgmt
  4. Start > Run: rundll32 wbemupgd, UpgradeRepository
  5. Start > Run: cmd, then in the command prompt that opens: cd /d %windir%\system32\wbem
  6. In the same command prompt: for %i in (*.dll) do RegSvr32 -s %i
  7. In the same command prompt: for %i in (*.exe) do %i /RegServer
  8. Run (or rerun) Neat Database Setup.exe from your Neat Company setup CD or download
  9. Make sure that the SQL Server (NR2007) service is started
    (Start > Run: services.msc)

And here’s where I got it. People with more complex issues may also want to check out the WMI Diagnosis Utility. I also highly recommend that people with issues relating specifically to Neat Company products contact their customer service first. They definitely fall in the exceptional category (I’m giving AJ’s supervisor the benefit of the doubt).

5 hours later, I’m off to track down that receipt shoebox…


There are only two types of coders

Across every language, platform, and experience level, you can summarize all programmers into just two groups:

Those who are constantly learning, and those who think they know everything.

The ones who learn are aware that they don’t know everything, and never hesitate to seek out new information when a new problem (or language, or platform) presents itself. They aren’t afraid to ask questions because asking is precisely what got them to their current level, and asking will propel them forward to greater levels of achievement.

The ones who think they know everything never do, because (a) knowledge is infinite, and (b) programming knowledge is infinity-squared. This field just grows and changes at way too fast a pace; anyone passing themselves off as a go-to resource has obviously stopped learning, and thus is automatically behind.

Thankfully, I think the vast majority of us land in the eager learner group. So when you do run across the rare know-it-all, do not hesitate to run the other way; you won’t have to go very far to find more like-minded colleagues.


Why include_once and require_once may make you a crappy coder

Over the last few years, I’ve noticed that the PHP community has, in general, started to favor include_once() and require_once() over the more standard include() and require(). For the uninitiated, the “_once” version of each function will check to see if a file has already been loaded. If it has, it will safely bypass loading the file again without throwing an error, and continue parsing your code. For the really uninitiated, here’s the difference between require() and include() straight from the manual:

require() and include() are identical in every way except how they handle failure. They both produce a Warning, but require() results in a Fatal Error. In other words, don’t hesitate to use require() if you want a missing file to halt processing of the page. include() does not behave this way, the script will continue regardless. Be sure to have an appropriate include_path setting as well.

At first glance, this looks great! Since most files are only ever parsed once—did you know that it is legal to load the same file repeatedly?—these functions will save you from screwing up your code. No more worrying about reloaded files, repeating actions that cause aberrant behavior or flat out fatal errors.
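To see the difference for yourself, here is a small self-contained sketch. The temp files and counter names are invented for the demonstration; each file just bumps a counter every time it is actually run.

```php
<?php
// Two throwaway files that each bump a counter when loaded
$plain_file = tempnam(sys_get_temp_dir(), 'inc');
$once_file  = tempnam(sys_get_temp_dir(), 'inc');
file_put_contents($plain_file, '<?php $GLOBALS["plain_count"]++;');
file_put_contents($once_file,  '<?php $GLOBALS["once_count"]++;');

$GLOBALS['plain_count'] = 0;
$GLOBALS['once_count']  = 0;

include $plain_file;
include $plain_file;      // legal: the file simply runs a second time

include_once $once_file;
include_once $once_file;  // skipped: this file has already been loaded

echo "include ran its file {$GLOBALS['plain_count']} times\n";       // 2
echo "include_once ran its file {$GLOBALS['once_count']} time(s)\n"; // 1

// Clean up the temp files
unlink($plain_file);
unlink($once_file);
```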

Obviously the dynamic coding newbie uses these functions as training wheels. “I may have loaded it, but I’m gonna load it again, just in case.” That’s understandable, and all well-and-good. If this is you, don’t make it a habit; learn to structure your code properly so that files that should be loaded once only ever have one opportunity to do so.

At the opposite end of the spectrum, there are big-time PHP projects that prefer the _once versions exclusively. My own beloved Zen Cart has slowly been making the switch (new major revisions on the horizon do it more sweepingly). The CakePHP Coding Standards actually demand the use of require_once():

When including files with classes or libraries, use only and always the require_once function.

Here the rationale is completely different, and completely informed. Because these projects are designed to allow extensive 3rd-party manipulation, the chances for a file collision are fairly high. Think about situations where two different mods require the presence of a third “standard” mod library. They trade off a fair amount of performance to offer this flexibility, but at least they know what they’re doing.

If you’ve read this far, chances are you are not a newbie, but you sure as hell aren’t Zen Cart or CakePHP either. So, what business do you have using include_once() or require_once()? None, truth be told. And if you use them extensively, then the title of this article was written for you. Congrats!

The problem that these functions introduce for most developers is two-fold. First, there’s a performance penalty for using these functions, because the function must first check to see if a file has been loaded. Standard include() and require() statements don’t perform such checks, they simply look for the file and load it. Any type of dynamic system setup is going to use these functions quite a bit, and the penalty is incurred each time the function is used. It only takes about a dozen or so calls to see a difference.

Second and more importantly, it feeds laziness. If loading order and structure don’t cause code to fail, then you’re naturally tempted not to worry about them. This leads to unnecessary calls (“Did I load that file yet? Ah screw it, I’ll do it again to make sure…”), which feels eerily similar to the rationale of the newbie coder I described above.

I’ve learned that code naturally snowballs in one direction or the other. If you write good clean code, optimize where possible (without going overboard), and completely smash bugs, the code you write and your ability to code will only improve. If instead you opt to ignore formatting, write just until “it works” and move on, and/or create logic that fixes a bug symptom rather than the bug itself, you will not improve and your code will suck.

That’s really the great tragedy; you’ve bypassed an opportunity to potentially improve your code, to potentially improve your own ability. Any shlub can half-ass it; take the high road and do the work. That will put you ahead of said shlubs when it comes time to look for a new job or get a promotion.

I’m not saying any of that is true for you (is it?), but include_once and require_once definitely fall squarely in the lazy coder category as far as I’m concerned.


It’s a feature, not a bug, stupid!

Over at ClassicWines, we recently experienced a login issue, where data was not being saved to the session after submitting valid credentials. Enter your username, password, and you would end up back at the home page as if nothing happened. I was banging my head against my desk (literally) looking for the cause.

Cut to the happy ending: PHP introduced a new “feature” that destroys objects BEFORE writing and closing the session. This means that if you use classes to manage your sessions, those objects are gone before the script executes your methods to save the session. There’s a warning at the bottom of the PHP manual page for session_set_save_handler() that identifies the issue succinctly.

For anyone who ended up here looking for help, the workaround is to call session_write_close() before the objects are destroyed, usually with a combination of the __destruct() magic method and register_shutdown_function(). PHP documentation claims the change was introduced in v5.0.5, but I did not have this issue using v5.1.2. If you like examples, Zen Cart built in a fix in the latest versions of their package.
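Here is a minimal sketch of the register_shutdown_function() half of that workaround, assuming PHP 8. MemorySessionHandler is a hypothetical stand-in that keeps session data in an array purely for illustration; a real handler would write to a database or similar. Shutdown functions run before objects are torn down at the end of the script, so the handler is still alive when the session gets written.

```php
<?php
// Hypothetical in-memory handler, for illustration only
class MemorySessionHandler implements SessionHandlerInterface {
  public array $store = [];
  public function open(string $path, string $name): bool { return true; }
  public function close(): bool { return true; }
  public function read(string $id): string|false { return $this->store[$id] ?? ''; }
  public function write(string $id, string $data): bool { $this->store[$id] = $data; return true; }
  public function destroy(string $id): bool { unset($this->store[$id]); return true; }
  public function gc(int $max_lifetime): int|false { return 0; }
}

$handler = new MemorySessionHandler();

// Passing false turns off PHP's automatic shutdown registration so the
// workaround can be shown explicitly on the next line
session_set_save_handler($handler, false);

// Write and close the session at shutdown, while $handler still exists
register_shutdown_function('session_write_close');

session_start();
$_SESSION['customer_id'] = 42; // made-up session data
```

On current PHP versions, leaving the second argument of session_set_save_handler() at its default of true registers session_write_close() as a shutdown function for you, so the explicit call is mostly useful for older callback-style handlers.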

This is a small example of a much larger problem in our industry: blind devotion to “the rules.” Technically speaking, the PHP team’s decision was correct: items loaded into memory should be destroyed in the reverse order in which they were created. It’s typically a good assumption to make, because often the younger processes were spawned by and rely on the ones that came before them.

But in this situation it makes absolutely no practical sense. I need my classes in order to close the session properly (as do many developers), and so ideally they should exist when I need them. This is, in fact, the way things worked prior to the bug fix.

The documentation warning I mentioned above even describes the situation as a “chicken and egg” problem, indicating some tacit acceptance of the fact that you cannot have one without the other. The case can be made for either one taking priority.

All that being the case, why make the change at all? It did not make anyone’s life easier. Quite the opposite, in fact. The new setup requires more logic, and hence more lines of code, to write and maintain. The fact that the session came first in the load order is totally irrelevant. Breaking the order of destruction is merely a break in convention, nothing more.

It’s a classic case of architecture astronauts.

When you go too far up, abstraction-wise, you run out of oxygen. Sometimes smart thinkers just don’t know when to stop, and they create these absurd, all-encompassing, high-level pictures of the universe that are all good and fine, but don’t actually mean anything at all.

The PHP team is, by nature, a pretty high-floating group. To a certain extent, it’s to be expected; they’re writing a programming language, after all. But this decision definitely shows a lack of oxygen.

My number one concern is always making my code do what it needs to do in the most efficient way possible. If that breaks a socially accepted norm or two, I have absolutely no problem with it, and neither should you.