Before we begin...
These posts focus on what I did in the deployment, so they tend to sound a bit like I'm the only one who did anything at all. That's just because those are the things I know about. Everyone else on the online team worked their tails off as well, with a lot of support from others in the company. They all did tons of testing, fixing, verifying and other things I never heard about. Any software project this large is a team effort, including this one.
Always be prepared
This was a large, scary upgrade from the beginning. Our web servers host all the Newsgator online applications, including Newsgator.com, all the Whitelabel sites, Whitelabel Admin, Buzz (or whatever we're calling it today), Denver Post Newshound, all the VNU Newsgator sites, an internal admin application and one app that's in quasi-production. For better or worse, we decided that everything but VNU should be upgraded in one fell swoop. Thankfully the Newsgator API has been moved to its own cluster, so there was almost no way we could break the USA Today reader. Still, we were upgrading seven different applications at once.
We had a very aggressive schedule for the migration, roughly four weeks. But that includes all the time to branch source control, upgrade all the developer machines and servers, migrate the code and write the deployment plan. Oh yeah, and test stuff too.
We typically do about a week of testing for an incremental upgrade, but in this case we were changing the entire underpinnings of the application, the very foundation the websites stand on, in addition to a nearly full schedule of new features. A full regression test is really what we needed, but we just didn't have the 2+ weeks that would have taken. Our testers, Ria and DeDe, did a really great job with the time they had. Unfortunately, halfway through testing Murphy's Law kicked in. DeDe, our tester for Newsgator.com, got an unscheduled root canal and was out for almost three days. The rest of the team worked hard to pick up the slack, and we found a few bugs, but nothing too terrible. At this point I was cautiously optimistic about the deployment. I should've known better. Developers generally make terrible testers, and this was no exception.
Deployment Day #1
Our first shot at deployment was on a Thursday. DeDe was well enough to come back to work, and she began testing with a vengeance. That day was absolutely crazy. DeDe found a bunch of bugs, including something like 8 showstoppers. These ranged from minor display problems to major crashes, ASP.NET 2.0 bugs to configuration problems. It turned out that many of the issues stemmed from a source control crash we had back in May. We thought that had been cleaned up, but many of the files in source control were the wrong versions. So on top of upgrading we had to recheck everything we got from source control. The whole team fixed bugs frantically, but we were barely keeping up with the bug reports. The last fix was completed barely an hour before the deployment was scheduled to begin, with no time to verify it. Nonetheless we zipped up the code and started copying to production.
At this point any experienced developer should be hearing warning bells, along with sirens, flashing lights, and the red alert klaxon. When you're still finding critical bugs the day of deployment, you're not ready to deploy. Despite that, we almost deployed anyway. We even had a meeting with the entire development staff to talk it over. Frankly, it was kind of comical. We all immediately agreed that deploying was a stupid idea. Then we discussed whether to do it anyway. Thankfully, reason prevailed and we scrubbed it. The new deployment date was set for just two working days later, on Monday night.
Barely an hour later that decision was vindicated. We discovered that if we had loaded, we would have broken the Newsgator support forums. By itself that wouldn't have been a critical issue, but as it turned out that was just the tip of the iceberg.
Off-The-Shelf doesn't mean easy
Originally we had thought the forums could just be left alone, running on the 1.1 framework. Due to some oddities in our setup, it turned out to be impossible to upgrade Newsgator.com without also upgrading the forums. We tried running them under 2.0, but ran into some problems I described in my previous post. Instead of fighting it, we decided to just upgrade to the latest version. Remember when I said we were upgrading seven applications? Make that eight.
We obtained the latest version of the software from the vendor. They had upgrade instructions, database scripts, new binaries, even the source code. What could be easier? I ran the upgrade scripts on the staging database, copied the new binaries and pages in, and was ready to rock. "I love off-the-shelf software!" I thought.
There were a few problems. In previous versions of the forums there were RSS feeds available for each of the forums. We made heavy use of these, since we are an RSS company. But for some inexplicable reason the vendor changed the way this was done and broke backwards compatibility. Unfortunately I didn't have the option to break those feeds, because it would just be ridiculous for an RSS company to break its own feeds.
The vendor was of little help in the admittedly short timeframe available, so I ended up applying a little hackaround. I coded up a quickie HttpModule to look for the old URLs and return an HTTP 301 (Moved Permanently) pointing to the new ones. Any decent feed reader will update its links based on that. In particular, all of Newsgator's readers handle 301s properly - and who would use any other reader? :)
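For the curious, the core of that redirect is just a URL mapping. Here's a minimal sketch of the idea in Python rather than the actual ASP.NET module. The old and new URL shapes and the names (`OLD_FEED`, `NEW_FEED_TEMPLATE`, `redirect_for`) are made up for illustration, since the real forum URLs aren't shown here.

```python
import re
from typing import Optional, Tuple

# Hypothetical URL shapes: the real old and new feed URLs are not given in this post.
OLD_FEED = re.compile(r"^/forums/rss\.aspx\?forumid=(\d+)$", re.IGNORECASE)
NEW_FEED_TEMPLATE = "/forums/{forum_id}/rss.aspx"


def redirect_for(path_and_query: str) -> Optional[Tuple[int, str]]:
    """Return (status, location) for an old-style feed URL, or None to pass the request through."""
    match = OLD_FEED.match(path_and_query)
    if match is None:
        return None  # not an old feed URL, so the request is handled normally
    # 301 tells well-behaved feed readers to update their stored subscription URL.
    return 301, NEW_FEED_TEMPLATE.format(forum_id=match.group(1))
```

In a real module the same check would run at the start of each request, setting the status code and the Location header before ending the response whenever a match is found.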
Another problem: the upgrade had dropped the branding on the forum site. Not only that, but the vendor had completely changed the way customization was done. Thankfully some creative copy and pasting got things looking more or less the same as they had.
Those solved, the forums seemed to be humming right along. Once again, I was feeling optimistic about the load. We had fixed most of the problems in our software, and the forums were off-the-shelf, so there couldn't be anything wrong there. Apparently I just don't learn...
Next time, deployment #2...