A Meaningful Backup Strategy for Photographers

For the second time in the past few weeks, I’ve heard of a photographer who lost many years’ worth of work due to their computer and drives being stolen. This has caused me to start re-evaluating my own backup strategy, and I thought I would share a few notes about what I’ve already learned and how I’m planning to improve my own data security.

IMHO, a good backup strategy involves five prongs: good drives, local live backup, local online backup, remote backup, and portable backup.

Good Drives

The first line of defense is that you should entrust your RAW and PSD files only to drives with a strong record of low failure rates. Based on recent numbers from a study done by Backblaze, Hitachi drives outperform other brands.

Of course, the best drives are solid state drives — their Annual Failure Rate (AFR) runs closer to 1%, while spinning drives run 3-8% depending on their age. But for now, solid state drives are too expensive for serious photographers — we churn through way too much storage.

Live Backup

Assuming you aren’t reading this in the future where 4TB SSD drives have a 0.5% AFR and cost $100, you’ll still be using good old spinning-platter drives. The catastrophic death of these drives is not a matter of “if,” but of “when.”

So, the second prong in a good backup strategy is that photography drive volumes should always be created in pairs — a RAID 1 set (mirrored). With this approach, every bit written to one drive is written to both simultaneously, and either drive can be used independently if the other fails.

In years past, RAID 5 was the gold standard, since it offered a better balance of usable space than RAID 1 (2/3 of the space is available, versus 1/2 for RAID 1). However, as the storage size of drives has increased exponentially, so has the chance that when a drive in a RAID 5 set fails, an unrecoverable error will occur during the rebuild, which then puts the entire volume in peril.

RAID 5 also relies on proprietary logic that determines how the data and parity stripes are laid out on the physical drives. Thus, if the RAID controller hardware fails and you can’t find replacement hardware that uses the same firmware, there’s a good chance your array will become an instant doorstop.

At the time of this writing, Hitachi 4TB drives run around $180, and cheaper drives are around $140. Remember that you’re buying drives in mirrored pairs, so if your average shoot runs around 16GB, you’ll be paying around $1.50 per photo shoot for storage.

Local Online Backup

A mirrored volume is not, by itself, a sufficient backup. It mitigates the issue of drive failure, but does nothing to protect you from yourself. If you accidentally trash the wrong folder, run a bad script, or overwrite the wrong file, you can lose hours of work, entire shoots, or in the worst case, everything.

The reason that a local online (i.e., connected and turned on at all times) backup is important is that, being human, you will forget to connect and use your backup drives.

I recommend software such as Apple’s Time Machine, which silently, reliably, and quickly backs up all changes you make to your files each hour, and makes it a cinch to restore the files, should the need arise.

It is preferable to locate this drive in your house, but in a separate physical location from your main computer. This reduces the chances that a localized fire or theft will result in the loss of both your primary and backup drives.

If you’re an Apple fan, a good solution is to use an external hard drive connected to your Airport Extreme router. Apple sells a version of the router with a drive built in (“Time Capsule”), but as usual with Apple, the price is much higher than just connecting your own drive via its USB port.

These backups should contain not just your photographs, but also your boot drive, applications, Lightroom catalogs, and anything else you need to back up on your computer.

Again, because drives do fail and backups are difficult to restart from scratch, having a RAID 1 volume for your backup drive is a good idea. Unfortunately, the Airport Extreme does not support software RAID, so to use it, you need a drive enclosure with a built-in hardware RAID controller (so the Airport Extreme only sees one virtual drive).

I use an Akitio Hydra enclosure, which supports RAID 1 or 5 for up to 4 drives. I recommend against the Drobo — while I’m sure they’ve put a lot of work into their “BeyondRAID” algorithm, the fact remains that it is proprietary, and I’ve heard a number of horror stories of people losing Drobo volumes and having no means of recovering them.

Remote Backup

Having all of your primary and backup drives in one physical location (and online) is a bad idea, because it exposes you to a number of potential threats — a thorough thief, fire, flood, lightning strike, power surge, etc.

So, it is essential to have a backup in another physical location, and to keep it up to date. This is the piece I’m lacking in my own strategy, and it’s something I’m working to address.

The simplest solution is to ask a friend or family member to hold your backup drive for you, and swap them out once every few months (a safety deposit box would also work). If you don’t work from home, you could also just keep the drive in your office.

To make these remote backups, you may need to buy some additional software that can do incremental copies of your main drive to your backup drives — synching only the files that have changed. If you’re handy with the command line, rsync on OS X and xcopy on Windows can do this for free.

In this situation, encrypting the drive is probably a good idea, especially if your photographs are sensitive in nature (boudoir, art nudes, etc.). Even if you trust the person holding the drives, you can’t trust a thief who might take off with your drive while robbing their house. Fortunately, this is very simple to do in Disk Utility on a Mac, and using BitLocker on Windows.

There are a number of “cloud” backup services (Carbonite, Backblaze, CrashPlan, Amazon Cloud Drive, Microsoft OneDrive, and Mozy, to name a few). On the plus side, these offer continuous (daily) backups and can allow you to access the files online from another computer. However, there are some disadvantages:

  • They can be expensive over time compared to just buying a hard drive or two.
  • You have to understand which of their plans you need to use. For example, if you use Carbonite, you would need to use the $100/year plan, not the $50/year one, because only the more expensive plan will back up something other than your main user directory (which will almost certainly not be the drive you’re using for your RAW and PSD files).
  • Upload speeds can be terrible. Most Internet providers give you only a modest upload speed — mine is 1.5Mbps, which is 1/10th of the download speed. At this speed, sending 16GB of RAW files to the cloud would take a full 24 hours and would saturate my uplink, which might cause issues with other Internet usage. So before you consider a cloud solution, test your upload speed and do the math (see the quick calculation sketch after this list)!
  • Consider the risks of systems that don’t offer end-to-end encryption. Some services encrypt your files on their server and in transit, but they hold the keys, so anyone who compromises their system can read your files. The only safe encryption is where the encryption key never leaves your computer. If you don’t want your boudoir clients or models being involved in the next “The Fappening”-style breach, be sure you understand the basics of encryption and how they treat your files (good rule of thumb: if you can log into their site using a normal web browser and see your files, any “encryption” they say they do on your files is not sufficient).
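As promised above, here’s the quick back-of-the-envelope upload-time calculation as a tiny C# sketch. The 16GB shoot size and 1.5Mbps upload speed are my own numbers from above; plug in yours:

// Rough upload-time estimate -- plug in your own shoot size and measured upload speed.
double shootSizeGB = 16;     // size of a typical shoot, in gigabytes
double uploadMbps  = 1.5;    // measured upload speed, in megabits per second
double seconds = shootSizeGB * 8 * 1000 / uploadMbps;   // gigabytes -> megabits, then divide by speed
double hours   = seconds / 3600;                        // ~23.7 hours in my case
Console.WriteLine("Upload time: {0:F1} hours", hours);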

I will say that of the cloud services I’ve seen, the one I like best is CrashPlan’s “Offsite Drive” option. This is a free service: it basically allows you to designate an external backup drive, trade drives with a friend, and have your changed files sent to your drive on their computer, directly and automatically, over the Internet (and vice versa). The drives are heavily encrypted (the right way), so you can’t see each other’s files. And if you want them to additionally store the files on their servers, they are happy to do that as well (for a fee, of course).

While this concept from Crashplan doesn’t completely overcome the issue of upload speed, at least you are only uploading new/changed files to one another, not trying to upload your entire library of files. If you were trying to, say, upload 2TB of photos via a 1.5Mbps uplink, it would take over 4 months to complete the backup, so trading drives with a friend is far better than using a traditional cloud backup service, which is optimized for “normal” people who may only have 20-50GB of total data.


Portable Backup

The final piece to the puzzle is to have an emergency backup of your most important documents on your person at all times.

If the nightmare scenario happened and someone was able to compromise and destroy your files both on your local copies and your cloud backup, the goal of this backup would be to save (a) personal files of great importance, such as family photos and tax records, and (b) your legacy of work as a photographer.

Carrying around multi-terabyte hard drives is obviously not an option (yet), but you don’t really need to. Right now, a 256GB flash drive runs around $70. This won’t be nearly enough for your RAW and PSD files, but you can at least use it to store a very large number of full-resolution, final JPEGs of your work.

If you ever found yourself with only that flash drive remaining, the loss of the PSDs and original files would be regrettable, but you would still have a digital master that is suitable for making new prints.

Again, encryption is absolutely essential for this — if your USB drive is ever lost or stolen, you don’t want your personal information to be available to whoever ends up with the drive.

A good strategy is to create two partitions — one very small, unencrypted FAT partition with just a “readme.txt” file containing your contact information and the promise of a few bucks for the return of the drive, and a second one for your main encrypted storage. Giving the smaller partition some extra breathing room (say, 4-8GB) might also be useful for keeping some basic data rescue programs, or just so you can use the drive in untrusted computers for short-term file transfers.

This final layer of protection may seem as if it borders on paranoia, but keep in mind that if every other backup you have happens through automatic processes, those same processes can automatically propagate a deletion or compromise, so you need at least one backup that requires a manual copy process.

Keep in mind, however, that current flash drive technology requires that drives be used — if you let a flash drive sit dormant long enough (a year or two), you could end up with corrupted data. As such, flash drives aren’t a perfect replacement for other backup media. (SSD drives have the same bit-rot issue.)

Final Thoughts

Data is fragile, and thinking through the potential points of failure requires good planning and a solid understanding of basic technology. No one ever thinks something bad will happen to their data, until it does. Years or even decades of work can disappear in the blink of an eye. So, stop reading this and GO BACK UP RIGHT NOW. :)

Why House Batteries are an Unsustainable Idea

I greatly admire Elon Musk. He’s like one part Steve Jobs, one part Tony Stark, and he has a knack for making the impossible both possible and profitable. And usually just plain damned cool.

But his latest idea, batteries for houses to store solar energy, makes zero economic sense for the vast majority of US homes.

The sales pitch is as follows:

  1. Make energy while the sun shines but you’re at work.
  2. Use it while you’re at home at night.
  3. Profit

The problem is, the numbers just don’t work out that way.

The average US home uses around 11 MWh of electricity per year. However, that number has some serious biases that underestimate the usage among potential Tesla battery customers:

  • The average in states where solar power makes the most sense is much higher, primarily due to a much heavier A/C load. The average in Louisiana, for example, is around 15 MWh a year.
  • The average is heavily skewed by apartment and condo dwellers, manufactured homes, vacation homes, and smaller, cheaper houses — all of which use far less electricity and are far less likely to buy a solar system, much less an expensive battery system to go with it. Larger homes (those whose owners can afford to drop $20-40k on a solar and battery system) generally have higher electrical usage, both for climate control and to feed more computers, TVs, refrigerators, pool pumps, and other power-suckers.

Because of the above, most rooftop solar installations barely provide enough electricity for daily use. Even if all residents are away (itself a real stretch), many of the most power-sucking devices still use a considerable amount of electricity. Air conditioning in the South, for example, doesn’t just run when residents are home, it takes hours to cool a house down from, say, 79 to 73.

Most parts of the country don’t have variable electrical rates; the rates are the same during the day and at night. So, shifting electrical usage from one to the other has zero impact on the bill.

Even if nighttime rates were double the average ($0.20/kWh), which they are not, and you used your solar power only at night, it would take decades to make your money back on the battery.
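To put rough numbers behind that claim, here is a back-of-the-envelope sketch. The installed cost and usable capacity below are my assumptions for illustration, not Tesla’s published figures:

// Back-of-the-envelope payback math. The system cost and usable capacity below are
// assumptions for illustration, not Tesla's published figures.
double systemCost      = 7000;  // assumed installed cost of battery + inverter, in dollars
double usableKWhPerDay = 7;     // assumed energy shifted from day to night, per day
double dayRate   = 0.10;        // average daytime rate, $/kWh
double nightRate = 0.20;        // the hypothetical doubled nighttime rate from above
double dailySavings  = usableKWhPerDay * (nightRate - dayRate);  // $0.70/day
double annualSavings = dailySavings * 365;                       // ~$256/year
double paybackYears  = systemCost / annualSavings;               // ~27 years
Console.WriteLine("Payback: {0:F0} years", paybackYears);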

Also, the battery system Tesla is selling doesn’t actually have the capability of running your entire house (again, see above, the “average” house of a would-be Tesla customer is not the “average” house), so you don’t even get the benefit of having uninterruptible power.

IMHO, Musk has solved a problem that simply doesn’t exist. He’ll sell some batteries, but would-be customers who actually sit down and do the math will find it doesn’t make financial sense, and unlike a Tesla car, there’s not much in terms of cool factor and bragging rights to having a glorified-but-undersized UPS hanging in the garage.

Zero Tolerance for DWI

A friend of a number of my friends was killed yesterday by a drunk driver. Not just any run-of-the-mill drunk, but one who had multiple DWIs on his record.

As a result, my Facebook feed is full of calls for Texas to get stricter on sentencing for intoxication manslaughter, and comments range from “no parole” to “death penalty.”

IMHO, this may serve our collective need for revenge, but it doesn’t address the core problem — drunk drivers are generally habitual, and tough sentencing for murdering someone with their car will absolutely not deter them. These are selfish, brutish people who simply don’t give a flying rat’s ass who they hurt. They also believe themselves to have the superpower of not getting caught. So, tough sentencing only post-manslaughter probably won’t save many lives.

Instead, here’s what I think Texas should do after the first offense of drunk driving:

  • Suspend their license for a year. If the driver is under 18, suspend it until they turn 19. If they are 18-20, suspend it until they are 21 or for a year, whichever is longer.
  • Within 60 days of conviction, require them to complete a 21-day, in-patient rehab program.
  • Impound all vehicles registered in their name. When their suspension ends, to get them back, they must pay to have an engine interlock device (breathalyzer) professionally installed.
  • For the next 3 years:
    • Give them a “vertical” state ID or driver’s license, which will significantly lower the chances of them getting alcohol in clubs or buying it in the store.
    • Put a code “N” on their license (interlock device required). Require car dealerships and rental companies to comply with this before providing them a vehicle.
    • If they have a TABC certification, take it away.

Last but not least, states need to come together and have reciprocity around these restrictions, so the driver can’t just get around them by moving.

With current technology, we can’t stop people from making the decision to drive drunk, but I think we can do a lot better to protect the rest of us from someone who has already made that decision once and been caught, because if we don’t, it almost certainly will happen again.

Hopefully, at some point in the future, cars will either drive themselves or at least detect impaired drivers, disable themselves, and call the police. But until then, it’s time to stop giving slaps on the wrist.

Making Word 2010 Show ONE PAGE AT A TIME

I’ve been fighting with Word 2010 at work for months, trying to figure out how to simply view a document the way I want — ONE page at a time (not side by side), but also zoomed in to a comfortable reading level.

Unfortunately, some asshole code weenie at Microsoft decided that Word 2010 would ALWAYS show pages side by side UNLESS you either (1) zoom in too far for two pages to fit (which is too wide), (2) resize your window to do the same (too distracting), or (3) choose a “one-page” view that zooms out so the entire page is on-screen (too small).

After digging around, I finally found this solution on a forum. I wish it was on StackOverflow so I could upvote it, create 10 fake accounts, and upvote it more times:


Here’s what works (Word 2010 with Win 7):

1. Set view in Page Layout mode

2. In the View tab of the ribbon, open the zoom dialog box using the Zoom button and select 1 page wide by 2 pages high (DO NOT change the zoom percentage in this dialog). Or in the View tab of the ribbon, click on the “one page” button (not sure of the exact button name in the English version of Word).

3. Adjust the zoom using ONLY the scroll wheel (ctrl+scroll button). NEVER use the zoom slider at the bottom-right of the window to adjust zoom. NEVER use the zoom button in the View tab of the ribbon to adjust zoom. And remember to ALWAYS use ctrl+scroll wheel to adjust zoom.

It seems Microsoft forgot to screw up the “ctrl+scroll button” function when they messed around with the way the zoom slider works.

I don’t know who user “mousetrap” is over there, but whoever you may be, BLESS YOU for actually posting a solution that works!

Contracts are a poor substitute for strong typing

I was reading about some planned C# improvements on the Roslyn forum and decided to cross-post a comment I made there, on a thread that included a discussion about code contracts. The comment follows:

Code contracts would be an improvement, but IMHO they treat the symptom, not the disease, as it were.

Say your method only accepts whole numbers, but uses an `int` parameter, which can theoretically be negative. Sure, putting a contract on the parameter to ensure it is >= 0 provides some protection, but it just moves the problem: now the caller either has to do the same checks (or apply a default) before calling the method, or it has to handle the exceptions.

In the end, code contracts don’t address the underlying issue — your parameter has the wrong logical type. You’re using `int` when you really want a far more limited type of either zero or a positive integer. Using an unsigned integer would be better, but has its own foibles, so we hold our nose and pretend that a signed type for a never-negative parameter is our best option.

So what’s really going on is that your code contract is creating, logically, an anonymous type with restrictions not present in the base type. But by making it anonymous, it’s not reusable, either between methods or between the method and the caller.

A better approach, IMHO, is to use the type system. E.g., if I want only whole numbers, I create a type that aliases int but does not allow negative numbers to be assigned to it:

public contract WholeInt : int where >= 0;
public contract PositiveInt : int where >= 1 ?? 1;
public contract NNString : string where !null ?? String.Empty;

public class Customer {
  public WholeInt Age { get; set; }
  public NNString Name { get; set; }
  public PositiveInt NumVisits { get; set; }
}

This moves the responsibility back where it belongs: where the invalid value assignment occurs, not when some hapless method gets the bad value as an argument. It also encourages reuse, and provides the option of an alternative default rather than an exception when a bad value is assigned or the default value of the underlying type would be invalid.

To allow better backward compatibility, it should always be possible to implicitly cast one of these contract types to their underlying type. This would allow, for example, `ICollection<T>.Count` to return a `WholeInt` but callers can still assign the result directly to Int32.

Using the keyword `contract` above is just an example — perhaps extending the concept of an Interface would be better.
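In the meantime, you can approximate the idea with a small wrapper struct. Here’s a minimal sketch (the name, the exception behavior, and the conversion choices are my own, not an official pattern):

using System;

public struct WholeInt {
    private readonly int _value;

    public WholeInt(int value) {
        if(value < 0) throw new ArgumentOutOfRangeException("value");
        _value = value;
    }

    // Implicit widening back to the underlying type keeps existing callers working.
    public static implicit operator int(WholeInt w) { return w._value; }

    // Narrowing from int is explicit, since it can throw.
    public static explicit operator WholeInt(int i) { return new WholeInt(i); }

    public override string ToString() { return _value.ToString(); }
}

It’s more ceremony than a built-in `contract` keyword would be, but the constraint is reusable, the default value (zero) is valid, and thanks to the implicit conversion, callers can still treat the value as a plain `int`.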

Optimizing Character Replacement in C#

For some time, I’ve had a utility function that cleans strings of Unicode characters that can cause issues for the users of the applications I manage.

It runs through a series of regular expression statements that replace problematic Unicode characters with replacement strings. In the examples below, I’m showing only the replacements related to single-quote characters. In the full code (which has 10 separate RegEx calls), some characters are simply stripped out, others are replaced with two-character strings. Here’s how the old code worked:

public static string FixBadUnicode1(string s) {
  // Only the single-quote fix is shown here; the full version chains 10 Regex.Replace calls.
  return Regex.Replace(s, "[\u2018\u2019\u201A\u201B\u0091\u0092\u2032\u2035\u05FE]", "'");
}

This performs well, but due to the number of times it is called, the total execution time is non-trivial, so I decided to test it against a character loop.

To run my tests, I performed replacements on a corpus of 38,000 representative strings. Most *don’t* have problematic characters, so the overhead of *looking* for the characters outweighs the cost of the actual replacements. For timing, I used the native QueryPerformanceCounter method (via P/Invoke), which provides a higher-resolution counter than the .NET library.
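For reference, the P/Invoke plumbing for the counter is minimal. Here is a sketch of roughly what I use (the kernel32 signatures are the standard ones; the `Measure` wrapper and its name are just for illustration):

using System;
using System.Runtime.InteropServices;

internal static class HighResTimer {
    [DllImport("kernel32.dll")]
    private static extern bool QueryPerformanceCounter(out long count);

    [DllImport("kernel32.dll")]
    private static extern bool QueryPerformanceFrequency(out long frequency);

    // Runs the supplied action and reports elapsed counter cycles and seconds.
    internal static void Measure(Action work, out long cycles, out double seconds) {
        long freq, start, end;
        QueryPerformanceFrequency(out freq);
        QueryPerformanceCounter(out start);
        work();
        QueryPerformanceCounter(out end);
        cycles = end - start;
        seconds = cycles / (double)freq;
    }
}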

The baseline (RegEx) code ran consistently in around 20 million counter cycles, or about 9 seconds.

My recent experience has been that Dictionary usually beats switch{} when there are more than a handful of cases, so I decided to create a static Dictionary<char, string> to look up the problem characters. Here’s the first iteration:

public static string FixUnicodeChars(string s) {
    var mappers = UnicodeMapper;
    var sb = new StringBuilder(s.Length);
    string fix = null;
    foreach(var c in s) {
        if(mappers.TryGetValue(c, out fix)) {
            // A null "fix" means the character is simply stripped.
            if(fix!=null) sb.Append(fix);
        } else {
            sb.Append(c);
        }
    }
    return sb.ToString();
}

If the “fix” string is null (i.e., I just want to strip the character), it skips the StringBuilder.Append() call.

The dictionary UnicodeMapper looks like this (again, only the single-quote portion):

    private static readonly object _mapperLock = new object();
    private static Dictionary<char, string> _unicodeMapper;
    private static Dictionary<char, string> UnicodeMapper {
        get {
            if(_unicodeMapper==null) {
                lock(_mapperLock) {
                    if(_unicodeMapper==null) {
                        // Build into a local so other threads never see a half-populated dictionary.
                        var map = new Dictionary<char,string>();
                        // Fix all single quotes    [\u2018\u2019\u201A\u201B\u0091\u0092\u2032\u2035\u05FE]
                        map.Add('\u2018', "'");
                        map.Add('\u2019', "'");
                        map.Add('\u201A', "'");
                        map.Add('\u201B', "'");
                        map.Add('\u0091', "'");
                        map.Add('\u0092', "'");
                        map.Add('\u2032', "'");
                        map.Add('\u2035', "'");
                        map.Add('\u05FE', "'");
                        _unicodeMapper = map;
                    }
                }
            }
            return _unicodeMapper;
        }
    }

This runs in just around 600,000 cycles, a more than 30x improvement over RegEx. This surprised me quite a bit — I expected RegEx to use unsafe pointers and other tricks to loop through the strings more quickly than I could enumerate the characters. Part of this performance is due to the fact that Char.GetHashCode() is far more efficient than String.GetHashCode(), so the Dictionary lookups are very fast.

On reviewing the code, I determined that since most strings don’t have any replacements, it made no sense to call sb.ToString() unless a replacement actually occurred. I added a flag, with this result:

public static string FixUnicodeChars2(string s) {
    var mappers = UnicodeMapper;
    var sb = new StringBuilder(s.Length);
    string fix = null;
    var hadChanges = false;
    foreach(var c in s) {
        if(mappers.TryGetValue(c, out fix)) {
            if(fix!=null) sb.Append(fix);
            hadChanges = true;
        } else {
            sb.Append(c);
        }
    }
    return hadChanges ? sb.ToString() : s;
}

This provided an additional 12% improvement… not as much as the first change, but worth the effort.

Next, I noted that until the first replacement occurs, I don’t need to instantiate the StringBuilder at all — I can just create it when the first replacement happens and populate it with the substring before the replacement:

public static string FixUnicodeChars3(string s) {
    var mappers = UnicodeMapper;
    StringBuilder sb = null;
    string fix = null;
    var hadChanges = false;
    int pos = 0;
    foreach(var c in s) {
        if(mappers.TryGetValue(c, out fix)) {
            if(sb == null) {
                // First replacement found: create the builder and copy everything before this character.
                sb = new StringBuilder(s.Length);
                if(pos > 0) sb.Append(s.Substring(0, pos));
                hadChanges = true;
            }
            if(fix!=null) sb.Append(fix);
        } else if(hadChanges) {
            sb.Append(c);
        } else {
            // No replacement yet -- just track how far into the string we are.
            pos++;
        }
    }
    return hadChanges ? sb.ToString() : s;
}

This final version requires us to keep track of the current position in the string until the first replacement (if any) is found, at which point we create and start using the StringBuilder. We’re also able to avoid assigning hadChanges more than once. This provides an additional 7% improvement over version 2.

As an aside, I tested using a for{} loop (since it would give us the position for “free”), but it was actually about 10% slower than using foreach{}. I also tried removing the null check before the StringBuilder.Append() call, since Append() already performs a null check. However, the overhead of calling Append() unnecessarily resulted in a 2% increase in execution time.

Another tweak would be to unroll the foreach{} loop into two loops — one until the first match is found, the other for subsequent tests and appends. That would remove the need for the “sb==null” and “hadChanges” checks during each character iteration. But it would also require either using for{} or creating a substring for foreach{} to work on, and I’m reasonably confident that these would eat up any savings.

The final potential improvement I can think of would be to use char[] instead of StringBuilder. Unfortunately, since some replacements result in a longer string and I can’t predict the replacements that will occur, there would be some complexity in bounds-checking and reallocating the array that could eat up all of the savings. It’s tempting to try, but maybe some other day. For now, I’m quite satisfied with a 42x performance improvement. :)

Oops… Lost my backup…

I temporarily misplaced the database driving this blog during an upgrade, but I’m back in business! *sigh*

I’m in the final stages of replacing my main web site (www.tallent.us), so that should be up soon. I’ve been hosting it with Zenfolio for a number of years, but since I don’t do commercial work anymore, it didn’t make sense to pay someone else for a fancier web site than I actually need. Plus, it gave me a chance to play with some Javascript and CSS features that I can’t use at work.

I’d love to know if anyone out there still even uses RSS readers, and if so, if you’re subscribing to this site. I could dig through my Apache log files, but I’d love to know actually *who* reads this, not just a number of visitors, and if *you* have a blog that I should be following.

If you have a second, please leave a comment with your name and, if applicable, blog URL…

SQL Server needs dynamic enums

I manage a number of databases at work that have varchar/nvarchar columns with very restricted values — usually fewer than a dozen valid choices.

The problem is that storing and indexing these values in Microsoft SQL Server is highly inefficient.

Enterprise Edition users can work around this by partitioning their tables on such a column, but (a) that only works for one column and (b) partitioning a table isn’t necessarily good for performance overall unless the partition function aligns well with the queries you tend to run on the table.

Anyone who has studied database theory knows the pat answer — create another relation (table) that has two columns: one with a meaningless unique int value, the other with the string value (also unique). Then in the main table, refer to each allowed value by its number, not its actual value, and join the tables for output. However, there are a few problems with this approach:

  1. It’s a pain in the ass — extra joins and aliasing for SELECT, translation on INSERT/UPDATE, etc.
  2. Adding new “valid” values over time requires inserting into the lookup table. That may work for, say, a list of countries, but is less useful for, say, a list of cities that is discrete but grows over time.
  3. If you have a number of columns like this, you’ll end up adding a ton of these list-lookup tables, which is messy. And if you decide to be clever and use an EAV approach of one lookup table with the ID number, list name, and list value, you’ll run into problems setting up standard referential integrity constraints when a table has multiple “pick-list” columns.

MySQL has an alternative — ENUM columns. These store a numeric value, but act like varchar() columns, and the list of valid choices is part of the column definition. While it looks tempting at first, the evils of this approach are many: adding a new valid value requires an ALTER TABLE, invalid values can be silently coerced, and sorting follows the position in the list rather than the string value.

Instead, I would recommend that MSSQL add a new keyword for varchar/nvarchar column definitions called LOW_CARDINALITY*. This would tell MSSQL to do the following in the background, transparently (see the sketch after this list for the general idea):

  • Maintain a dictionary of “used” values in the column;
  • Assign each unique value a surrogate integer value;
  • In the table or indexes covering the column, store the integer value, not the string.
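Here’s the value-dictionary idea sketched in C#, purely to illustrate the mechanism; in reality this bookkeeping would live inside the storage engine, and the class and method names are just for illustration:

using System.Collections.Generic;

public class ValueDictionary {
    private readonly Dictionary<string, int> _idsByValue = new Dictionary<string, int>();
    private readonly List<string> _valuesById = new List<string>();

    // Returns the surrogate integer for a value, adding the value on first use.
    public int GetOrAddId(string value) {
        int id;
        if(!_idsByValue.TryGetValue(value, out id)) {
            id = _valuesById.Count;
            _valuesById.Add(value);
            _idsByValue.Add(value, id);
        }
        return id;
    }

    // Translates a stored surrogate back to the original string for output.
    public string GetValue(int id) {
        return _valuesById[id];
    }
}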

This would result in highly efficient storage of columns that contain a limited number of choices, without needing to rely on relational joins, “magic” numbers (usually managed by the business logic), or other clumsy workarounds. It could make a huge difference for table and index storage, memory usage, and query performance. And since it is transparent to all queries, it’s the best kind of new feature — one that can be added to an existing database without having to make additional changes to views, stored procedures, or code that connects to the database.

I’m sure the devil is in the details, particularly when it comes to things like replication, but the benefits would far outweigh the effort.

* I’m sure someone at Microsoft could come up with a better keyword than LOW_CARDINALITY. Something like LIST might work but seems a bit colloquial. DISCRETE or SET are other options, though SET is already a reserved word.

A replacement for evil outs

I was reading a post by Jon Skeet and he mentioned the evil of using “out” parameters.

Anyone with a functional programming background understands what he’s referring to here — pure functions should have no side effects, they should only return a value.

The problem is, the world isn’t pure, and it’s quite common to need to return multiple results.

Probably the most common example in C# is the “TryGetValue()” pattern:

public bool TryGetValue(TKey key, out TValue value);

C# 6 is slated to allow us to call this in a way that declares the “out” variable at the same time as the TryGetValue call (where before, we would need to declare the output variable first):

if(dict.TryGetValue("mykey", out var myoutput)) {

Jon makes the valid point that this will only encourage “out” parameters, which should really only be the exception to the rule. A more functional approach would be:

public struct TryGetValueResult<T>(bool success, T value) {
    public bool Success { get; } = success;
    public T Value { get; } = value;
}
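To make a call like the one below actually return that struct, you would need something along these lines. This extension method is hypothetical, not part of the framework:

using System.Collections.Generic;

public static class DictionaryExtensions {
    // Hypothetical helper: a one-argument TryGetValue that returns the result struct
    // instead of using an "out" parameter.
    public static TryGetValueResult<TValue> TryGetValue<TKey, TValue>(
            this IDictionary<TKey, TValue> dict, TKey key) {
        TValue value;
        var found = dict.TryGetValue(key, out value);
        return new TryGetValueResult<TValue>(found, value);
    }
}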

var result = dict.TryGetValue("mykey");
if(result.Success) {
    // use result.Value here
}

Much cleaner from a functional standpoint, but it still has that annoying declaration of a variable I’ll probably only use within the “if” block. (I could use “using,” but the line count is the same and I’m nesting one more brace level.)

But what if we could do the following:

if((var result = dict.TryGetValue("mykey")).Success) {

Essentially, allow variable declarations to create a variable scoped to the “if/else” block, assign a value, *and* return the value.

I don’t know if C# 6 would support this type of use, but if it would, it might be the best of both worlds.

Two Code Smells to Learn from Apple’s SSL Security Bug

I was reading an excellent ACM article on the recent Apple security bug, and it struck me that the author skipped completely over one of the *true* root causes of this bug.

Here’s the buggy code:

if ((err = SSLFreeBuffer(&hashCtx)) != 0)
goto fail;
if ((err = ReadyHash(&SSLHashSHA1, &hashCtx)) != 0)
goto fail;
if ((err = SSLHashSHA1.update(&hashCtx, &clientRandom)) != 0)
goto fail;
if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
goto fail;
if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
goto fail;
goto fail;
if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
goto fail;

Of course, the issue is in the duplicated “goto fail,” which, while indented, is not controlled by the preceding ‘if’ statement.

The ACM author blames the lack of unit tests. He has a point, but I think there are two other root causes: action repetition and poor formatting.

DRY – Don’t Repeat Yourself

The goto in itself isn’t “bad,” but the duplication of the goto is what leads to the problem.

When two conditions lead to the same result, you should favor code that “measures twice and cuts once.” You may have to engage a few more brain cells to make this code style perform as well as spaghetti branches, but it’s worth the investment.

Consider the following replacement code:

err = SSLFreeBuffer(&hashCtx)
|| ReadyHash(&SSLHashSHA1, &hashCtx)
|| SSLHashSHA1.update(&hashCtx, &clientRandom)
|| SSLHashSHA1.update(&hashCtx, &serverRandom)
|| SSLHashSHA1.update(&hashCtx, &signedParams)
|| SSLHashSHA1.final(&hashCtx, &hashOut);
if (err != 0) goto fail;

Logical OR (||) short-circuits on the first non-zero value, so in terms of performance, this is equivalent to the goto-fail mess above, and also equivalent to what most programmers would suggest as an alternative (adding a clutter of braces to the “if” statements so the extra “goto” sticks out better).

But unlike the original code, the example above calculates the condition of the existence of an error *once* and then performs *one* action (the goto) if an error is found. IMHO, the code is more clear as well, since it doesn’t rely on goofy right-hand evaluation of an assignment (which in itself is a shortcut I was never fond of in C).

Sentences should not
have arbitrary line breaks and neither
should code

Sure, braces would have helped here, but IMHO, simple if statements should simply include their action on the same line, just like they did back in the ol’ line-numbered BASIC days. And, ideally, similar logic should align their actions. Consider this code:

if ((err = SSLFreeBuffer(&hashCtx)) != 0)                     goto fail;
if ((err = ReadyHash(&SSLHashSHA1, &hashCtx)) != 0)           goto fail;
if ((err = SSLHashSHA1.update(&hashCtx, &clientRandom)) != 0) goto fail;
if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0) goto fail;
if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0) goto fail;
if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)       goto fail;

(Yes, I noticed this actually does soft-wrap in my default WordPress theme. Paste it into Notepad and behold its awesomeness. :) )

While this still exhibits the same overuse of evaluating assignments and duplicating actions, at least it is abundantly clear what is going on, and an extra “goto” will stick out like a sore thumb.

I realize this is an unorthodox way to format C code, but it’s a favorite of mine. Maybe my mind is just crippled by the semester of COBOL I took in college, but I like aligned code. Like a good grid vs. a CSV text file, the alignment makes it far easier to spot inconsistencies. I use the same approach in CSS and T-SQL, where it makes sense to do so.