Political post ahead…

I was trying to explain yesterday why my beliefs lean to left libertarianism (i.e., somewhat close to democratic socialism), and how that has absolutely nothing to do with Marxist Socialism or Communism.

In short, it’s because I believe the government is not the enemy, nor is business. Instead, the enemy is *unfettered* corporate power and *unchecked* government.

We check corporate power through a combination of free-market capitalism (voting with our feet) and unions, and where those are ineffective (and they are for a number of broad classes of corporate abuse), we use the law and the courts to regulate corporations. Now, I work in the business of dealing with regulations, so I’m keenly aware that there are good regulations and bad ones, but that doesn’t mean we should drop all regulations just because a big business complains about them.

*All* of these are incredibly important to keeping corporate power in check. And if you’ve *read* The Wealth of Nations, you know that this is fully in agreement with Adam Smith’s own beliefs on capitalism — he is not the father of the economic anarchy that the far-right libertarian wing makes him out to be.

On the other hand, we check government powers by ensuring that legislators are working for educated, engaged voters, not for special interests who plaster the airwaves with lies to scare the masses who can’t be bothered to do some research before taking a position.

And we check both with a free, competitive, open press.

Well, here’s one small example of what happens when regulators fight for their *constituents*, not for big media conglomerates who bankroll their campaigns and then abuse their natural monopoly. And it also happens to be one more reason that I’m confident that I’m backing the right guy.

http://www.huffingtonpost.com/entry/fcc-proposal-cable-tv-boxes_us_56aa781ae4b05e4e3703b26e?

And… that’s my last political post for a while…

“Take Our Country Back”

When someone says that, this is what I hear:

TAKE — providing nothing in return, using force, manipulation, mob rule, bribery, obstruction, or any other means to achieve the goal.
OUR — WASPs, aka “real” Americans.
COUNTRY — militant nationalists, willfully ignorant of anything else going on around this tiny blue dot.
BACK — back to the days when nonconformists and minorities of every kind were enslaved, ridiculed, railroaded, interned, denied the right to vote, and ignored in the courthouses and statehouses.

I don’t want to “take” this country back. I want to share it with people who didn’t win the genetic lottery by being born here. I want to change it to make a more perfect union. I want to build it to be a light shining on a hill, an example of informed democracy.

Ten Reasons I Hate Local News

Let me preface this by saying (a) I used to work in local television, and (b) I have friends who are or have been part of the industry, so I don’t really blame the on-screen talent or even some of the people behind the scenes.

I hate local newscasts.

I still record one newscast a day, and skim it about 75% of the time, usually during dinner. I’ve chosen the least objectionable local newscast, which for me is KFDM, but it’s still pretty terrible.

The reason I hate local news is that it could be so much better. As in, it could be a true force for good and change in our communities, rather than being the mostly-useless filler between advertisements that it is today.

In the interest of time, I’ve whittled my many grievances down to ten of the top reasons I hate local news, in no particular order.

1. Local Sports

Sports takes up almost half of the average local newscast, and then half of that is wasted regurgitating scores for national and state games that anyone with an Internet connection who gives a shit already knows.

Then, what passes for local coverage is a mindless droning of scores, with the occasional inane interview with a local coach or athlete that is completely interchangeable with any other interview (“we’re just gonna go out there and have teamwork and try our hardest blah blah blah…”).

I’ll admit I’m not a fan of the taxpayer-sponsored religion that is school sports in the first place, but if you’re going to cover it, cover it!

Take just one or two of the 20 hours of live local news programming each week and create a dedicated show for local sports. Show recaps of local games. Go over the schedules. Talk about the athletes. Cover the little-league and soccer teams. And extend coverage to other intramural competitions — debate, theater, band, chess, robotics, spelling, math, etc. — showing local children that throwing a ball around is not the only way to be recognized on local TV.

2. Weather

Everyone has a smart phone now. We’re all two clicks away from a 10-day forecast that generally beats the local guy’s predictions, and it doesn’t come with the 10-minute lecture about high- and low-pressure systems or the gleeful watching of every storm cloud in the Gulf of Mexico that could, with enough butterfly-flaps, transform into a hurricane in a few weeks.

So unless there’s a tornado coming, just show some pretty infographics with sunrise, sunset, forecast, and the boring stuff only people with boats care about, then go away.

If you want to jazz it up, show us something interesting happening in astronomy, or help shed some light on climate change for the 30% of Americans who still believe it’s a liberal conspiracy to take away their incandescent lights and gas-guzzling duallies.

3. Pre-Packaged News

You aren’t fooling us with that “Tech Time” and “Healthcare Watch” and “Market Minute” and other bullshit content (including lead-in copy) that you bought to fill the time. These pieces are the true bottom of the barrel of journalism, with their third-rate analysis, copy ripped straight from press releases, and the intellectual depth of the average Kardashian. I watch the local news for local news, not so I can hear Sally the Generic Reporter tell me that evil hackers want to steal my credit card and I’d better protect myself by using a good password and buying an antivirus program.

4. Biased National Politics

Poll questions obviously written by a drunk Tea Party activist. News copy ripped straight from the GOP daily talking points. Gushing coverage of Republican candidates who come to town. Lack of even the most basic fact-checking when reporting what a politician says. Those of us who don’t get our national news only from local sources are on to you, and that includes the majority of Millennials, including conservative ones who can still smell one-sided BS. I suppose pandering to the old white audience is what sells more truck commercials, but the bias is obvious, and it stinks.

5. Social Media Comments

If I wanted to attend a virtual KKK rally, I already have an Internet connection and I can go look at the ignorance and knee-jerk hatred spewing from the comment section of every article on your Facebook page. Repeating that shit on the air, especially without any sort of critical analysis, just adds fuel to the flame.

https://www.facebook.com/KFDMNews/posts/10153767642803756

6. We Are Experiencing Technical Difficulties

It’s disappointingly rare to watch an entire live newscast without seeing some stupid technical snafu — dead mics, missing audio, mistimed B-roll, swapped graphics, poor lighting, misspelled crawls, drifting camera shots, reporters staring at their notes unaware that they’re live — the list goes on. Seriously, people, get your shit together! I’ve seen better production values at an elementary school musical.

(Ok, the video isn’t exactly on-point, but it’s still funny… I’m actually far more forgiving of people flubbing their lines…)

7. Advernewsment

Yes, we the audience do notice that the people you interview or book as “experts” just happen to be associated with the companies who advertise heavily on your station. We also notice that when news that happens to be bad for local industry comes along, it gets glossed over, or only told from the industry’s perspective.

8. Quantity over Quality

Many network stations are churning out three or more hours of live local programming per day in newscasts and morning/afternoon shows. Worse, since so many stations are owned by the same media conglomerates, the same news program gets thrown out on multiple channels, or they share the same news desk.

The reason is obvious — stations believe they can make more in ad revenue with three hours of shite than with 30 minutes of hard news from journalists who have the time to research their stories and produce compelling pieces.

Maybe they’re right, and it’s more important simply to capture bored eyeballs immediately before and after the workday than it is to create a show people would actually make plans to watch. After all, the 24-hour news channels have the same approach — continuous, uncritical repetition of opinion, propaganda, and speculation rather than focused, critical journalism. Still, it’s sad.

http://www.journalism.org/media-indicators/amount-of-local-tv-news-per-weekday/

9. Horrendous Web Sites

 

Seriously, folks, they are SO BAD. Horrific. Slow, ad-laden, broken, lacking in aesthetics, mobile-unfriendly, Flash-driven, content-sparse, disorganized, … I could go on.

I suspect the design templates and back-end programming are mandated by the media conglomerate bosses (who probably outsourced the work to some Elbonian programmers hacking around on a “content management system” sold to them by a guy in a slick suit). So it’s not all the local station’s fault. But that doesn’t make the user experience any better for the local viewer, and it devalues the station’s brand on the very platform that will eventually replace the time-slot broadcast news they depend on for so much ad revenue.

National newspaper sites aren’t much better. It’s like all the people who knew a damned thing about typography, photography, white space, etc. were fired when pixels replaced paper, and they haven’t realized yet that their web sites look worse than a mimeographed church newsletter from the 1980s. Hell, you’re reading this post right now on the free standard WordPress template, and it looks cleaner and more professional than 90% of the major news sites.

Local television stations need to recognize that the Internet isn’t going away, and that their only long-term hope is to capture a younger audience who live online, don’t subscribe to cable, don’t have a UHF antenna, and won’t put up with slow pages, broken links, pop-up ads, and designs that make their eyes bleed.

http://www.kfdm.com/ (Edit: They redesigned their web site in early 2016, it looks MUCH better now!)

http://www.12newsnow.com/

http://www.beaumontenterprise.com/

The last link is the local newspaper… they’re just as bad.

10. Little Proactive Reporting

All too often, I hear people in Beaumont say about some local event with a poor turn-out, “I wish I had known about it!” Same goes for interesting items that were on the agenda at city council or school board meetings, debates between local politicians, etc.

One of the things the Internet doesn’t do well these days is connect nonprofits, schools, governments, churches, etc. with their local communities so they can promote their events to the public. Facebook actively works against such promotion, unless the organization in question wants to pay the extortion fees to “advertise” to their own fans.

Local news generally fails to actively engage with local NPOs to promote public events and opportunities before they happen. Sure, a few chosen favorites like YMBL and Gift of Life get pre-coverage of their events, but it’s nearly impossible for, say, a nonprofit art gallery to get a little story about a local artist’s show opening, or a children’s program or fundraiser. Likewise, coverage of basic election information, such as poll locations and interviews with people on the ballot, is dismally thin.

I have no doubt that if the local newscast included a stronger focus on letting people know what’s going to happen in their communities, people would tune in more often. I don’t really need to know about every car crash, house fire, and storm-felled tree, but I would like to know when things are happening that I might want to get involved in, not just see reports about them after the fact.

Wrapping It Up

I hate on local news not simply because it is so bad, but because I see what it could be if only station leadership (and their corporate overlords) had the vision to do more than crank out the same thing, over and over. I hope some of them recognize and address these issues before it’s too late and local newsrooms go the way of the dodo.

A Meaningful Backup Strategy for Photographers

For the second time in the past few weeks, I’ve heard of a photographer who lost many years’ worth of work due to their computer and drives being stolen. This has caused me to start re-evaluating my own backup strategy, and I thought I would share a few notes about what I’ve already learned and how I’m planning to improve my own data security.

IMHO, a good backup strategy involves five prongs: good drives, local live backup, local online backup, remote backup, and portable backup.

Good Drives

The first line of defense is that you should entrust your RAW and PSD files only to drives with a strong record of low failure rates. Based on recent numbers from a study done by Backblaze, Hitachi drives outperform other brands.

Of course, the best drives are solid state drives — their Annual Failure Rate (AFR) runs closer to 1%, while mechanical drives run 3-8% depending on their age. But for now, solid state drives are too expensive for serious photographers — we churn through way too much storage.

Live Backup

Assuming you aren’t reading this in the future where 4TB SSD drives have a 0.5% AFR and cost $100, you’ll still be using good old spinning-platter drives. The catastrophic death of these drives is not a matter of “if,” but of “when.”

So, the second prong in a good backup strategy is that photography drive volumes should always be created in pairs — a RAID 1 (mirrored) set. With this approach, every byte written to the volume is written to both drives simultaneously, and either drive can be used on its own if the other fails.

In years past, RAID 5 was the gold standard, since it offers a better balance of usable space than RAID 1 (2/3 of the space is available in a three-drive set, versus 1/2 for RAID 1). However, as drive capacities have grown exponentially, so has the chance that when a drive in a RAID 5 set fails, an unrecoverable read error will occur during the rebuild, which puts the entire volume in peril. (Consumer drives are typically rated at around one unrecoverable read error per 10^14 bits, and rebuilding an array of 4TB drives means reading a meaningful fraction of that many bits, so the odds are no longer negligible.)

RAID 5 also relies on proprietary logic that determines how the data and parity stripes are laid out on the physical drives. Thus, if the RAID controller hardware fails and you can’t find replacement hardware that uses the same firmware, there’s a good chance your array will become an instant doorstop.

At the time of this writing, Hitachi 4TB drives run about $180, and cheaper brands run around $140. If your average shoot runs around 16GB, you’ll be paying around $1.50 per photo shoot for storage.
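
To show the math behind that: a mirrored pair of those 4TB drives is 2 × $180 = $360 for 4TB of usable space, 4TB holds roughly 250 shoots at 16GB apiece, and $360 ÷ 250 comes to about $1.44 per shoot, call it $1.50 once you allow for formatting overhead and the inevitable stray files.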

Local Online Backup

A mirrored volume is not, by itself, a sufficient backup. It mitigates the issue of drive failure, but does nothing to protect you from yourself. If you accidentally trash the wrong folder, run a bad script, or overwrite the wrong file, you can lose hours of work, entire shoots, or in the worst case, everything.

The reason that a local online (i.e., connected and turned on at all times) backup is important is that, being human, you will forget to connect and use your backup drives.

I recommend software such as Apple’s Time Machine, which silently, reliably, and quickly backs up all changes you make to your files each hour, and makes it a cinch to restore the files, should the need arise.

It is preferable to locate this drive in your house, but in a separate physical location from your main computer, such as a different room or floor. This reduces the chances that a localized fire or theft will result in the loss of both your primary and backup drives.

If you’re an Apple fan, a good solution is to use an external hard drive connected to your Airport Extreme router. Apple sells a version of the router with a drive built in (“Time Capsule”), but as usual with Apple, the price is much higher than just connecting your own drive via its USB port.

These backups should contain not just your photographs, but also your boot drive, applications, Lightroom catalogs, and anything else you need to back up on your computer.

Again, because drives do fail and backups are difficult to restart from scratch, having a RAID 1 volume for your backup drive is a good idea. Unfortunately, the Airport Extreme does not support software RAID, so to use it, you need a drive enclosure with a built-in hardware RAID controller (so the Airport Extreme only sees one virtual drive).

I use an Akitio Hydra enclosure, which supports RAID 1 or 5 for up to 4 drives. I recommend against the Drobo — while I’m sure they’ve put a lot of work into their “BeyondRAID” algorithm, the fact remains that it is proprietary, and I’ve heard a number of horror stories of people losing Drobo volumes and having no means of recovering them.

Remote Backup

Having all of your primary and backup drives in one physical location (and online) is a bad idea, because it exposes you to a number of potential threats — a thorough thief, fire, flood, lightning strike, power surge, etc.

So, it is essential to have a backup in another physical location, and to keep it up to date. This is the piece I’m lacking in my own strategy, and it’s something I’m working to address.

The simplest solution is to ask a friend or family member to hold your backup drive for you, and swap them out once every few months (a safety deposit box would also work). If you don’t work from home, you could also just keep the drive in your office.

To make these remote backups, you may need to buy some additional software that can do incremental copies of your main drive to your backup drives — synching only the files that have changed. If you’re handy with the command line, rsync on OS X and xcopy on Windows can do this for free.

In this situation, encrypting the drive is probably a good idea, especially if your photographs are sensitive in nature (boudoir, art nudes, etc.). Even if you trust the person holding the drives, you can’t trust a thief who might take off with your drive while robbing their house. Fortunately, this is very simple to do with Disk Utility on a Mac and with BitLocker on Windows.

There are a number of “cloud” backup services (Carbonite, Backblaze, CrashPlan, Amazon Cloud Drive, Microsoft OneDrive, and Mozy, to name a few). On the plus side, these offer continuous or daily backups and can allow you to access the files online from another computer. However, there are some disadvantages:

  • They can be expensive over time compared to just buying a hard drive or two.
  • You have to understand which of their plans you need to use. For example, if you use Carbonite, you would need to use the $100/year plan, not the $50/year one, because only the more expensive plan will back up something other than your main user directory (which will almost certainly not be the drive you’re using for your RAW and PSD files).
  • Upload speeds can be terrible. Most Internet providers give you only a modest upload speed — mine is 1.5Mbps, which is 1/10th of the download speed. At this speed, sending 16GB of RAW files to the cloud would take a full 24 hours and would saturate my uplink, which might cause issues with other Internet usage. So before you consider a cloud solution, test your upload speed and do the math!
  • Consider the risks of systems that don’t offer end-to-end encryption. Some services encrypt your files on their server and in transit, but they hold the keys, so anyone who compromises their system can read your files. The only safe encryption is where the encryption key never leaves your computer. If you don’t want your boudoir clients or models being involved in the next “The Fappening”-style breach, be sure you understand the basics of encryption and how they treat your files (good rule of thumb: if you can log into their site using a normal web browser and see your files, any “encryption” they say they do on your files is not sufficient).

I will say that of the cloud services I’ve seen, the one I like best is CrashPlan’s “Offsite Drive” option. It’s free: you seed a backup onto an external drive, trade drives with a friend, and from then on your changed files are sent to your drive on their computer, directly and automatically, over the Internet (and vice versa). The drives are heavily encrypted (the right way), so you can’t see each other’s files. And if you want CrashPlan to additionally store the files on their servers, they are happy to do that as well (for a fee, of course).

While this concept from CrashPlan doesn’t completely overcome the issue of upload speed, at least you are only uploading new and changed files to one another, not trying to upload your entire library of files. If you were trying to, say, upload 2TB of photos via a 1.5Mbps uplink, it would take over 4 months to complete the backup, so trading drives with a friend is far better than using a traditional cloud backup service, which is optimized for “normal” people who may only have 20-50GB of total data.
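
For the curious, the arithmetic behind that: 2TB is roughly 16 million megabits, and 16,000,000 Mb ÷ 1.5 Mbps works out to about 10.7 million seconds, or roughly 124 days of continuous, saturated uploading.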

 

Portable Backup

The final piece to the puzzle is to have an emergency backup of your most important documents on your person at all times.

If the nightmare scenario happened and someone was able to compromise and destroy both your local copies and your cloud backup, the goal of this backup would be to save (a) personal files of great importance, such as family photos and tax records, and (b) your legacy of work as a photographer.

Carrying around multi-terabyte hard drives is obviously not an option (yet), but you don’t really need to. Right now, a 256GB flash drive runs around $70. This won’t be nearly enough for your RAW and PSD files, but you can at least use it to store a very large number of full-resolution, final JPEGs of your work.

If you ever found yourself with only that flash drive remaining, the loss of the PSDs and original files would be regrettable, but you would still have a digital master that is suitable for making new prints.

Again, encryption is absolutely essential for this — if your USB drive is ever lost or stolen, you don’t want your personal information to be available to whoever ends up with the drive.

A good strategy is to create two partitions — one very small, unencrypted FAT partition with just a “readme.txt” file containing your contact information and a promise of a few bucks for the return of the drive, and a second partition for your main encrypted storage. Giving the smaller partition some extra breathing room (say, 4-8GB) might also be useful for keeping some basic data rescue programs, or just so you can use the drive in untrusted computers for short-term file transfers.

This final layer of protection may seem as if it borders on paranoia, but keep in mind that if every other backup you have happens through automatic processes, you need at least one backup that requires a manual copy. Automation will happily propagate an accidental deletion or a malicious wipe just as faithfully as it propagates good data.

Keep in mind, however, that current flash drive technology requires that drives be used — if you let a flash drive sit dormant long enough (a year or two), you could end up with corrupted data. As such, flash drives aren’t a perfect replacement for other backup media. (SSD drives have the same bit-rot issue.)

Final Thoughts

Data is fragile, and thinking through the potential points of failure requires good planning and a solid understanding of the underlying technology. No one ever thinks something bad will happen to their data, until it does. Years or even decades of work can disappear in the blink of an eye. So, stop reading this and GO BACK UP RIGHT NOW. :)

Why House Batteries are an Unsustainable Idea

I greatly admire Elon Musk. He’s like one part Steve Jobs, one part Tony Stark, and he has a knack for making the impossible both possible and profitable. And usually just plain damned cool.

But his latest idea, batteries for houses to store solar energy, makes zero economic sense for the vast majority of US homes.

The sales pitch is as follows:

  1. Make energy while the sun shines but you’re at work.
  2. Use it while you’re at home at night.
  3. Profit

The problem is, the numbers just don’t work out that way.

The average US home uses around 11 MWh of electricity per year. However, that number has some serious biases that underestimate the usage among potential Tesla battery customers:

  • States where solar power makes the most sense have much higher average electricity use, primarily due to a much heavier A/C load. The average in Louisiana, for example, is around 15 MWh a year.
  • The average is heavily skewed by apartment and condo dwellers, manufactured homes, vacation homes, and smaller, cheaper houses — all of which use far less electricity and are far less likely to buy a solar system, much less an expensive battery system to go with it. Larger homes (those whose owners can afford to drop $20-40k on a solar and battery system) generally have higher electrical usage, both for climate control and to feed more computers, TVs, refrigerators, pool pumps, and other power-suckers.

Because of the above, most rooftop solar installations barely provide enough electricity for daily use. Even if all residents are away during the day (itself a real stretch), many of the most power-sucking devices still draw a considerable amount of electricity. Air conditioning in the South, for example, doesn’t just run when residents are home; it takes hours to cool a house back down from, say, 79 to 73.

Most parts of the country don’t have variable electrical rates; the rates are the same during the day and at night. So, shifting electrical usage from one to the other has zero impact on the bill.

Even if rates were double the average at night ($0.20/kWh), which they are not, and you used your stored solar power only at night, it would take decades to make your money back on the battery.
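
To put some rough numbers on that (these are illustrative assumptions, not Tesla’s actual specs, so adjust for whatever the battery really costs): suppose the battery stores about 10 kWh and costs about $3,500 before the inverter and installation. If you fully cycled it every single night, and every one of those kilowatt-hours saved you the full $0.10/kWh difference between that hypothetical night rate and the day rate, you’d save about $1 a day, or roughly $365 a year. At that rate it takes nearly ten years just to recover the cost of the battery alone, and well past a decade once you add the inverter, the installation, and the fact that no battery survives ten years of full daily cycles at its original capacity.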

Also, the battery system Tesla is selling doesn’t actually have the capability of running your entire house (again, see above, the “average” house of a would-be Tesla customer is not the “average” house), so you don’t even get the benefit of having uninterruptible power.

IMHO, Musk has solved a problem that simply doesn’t exist. He’ll sell some batteries, but would-be customers who actually sit down and do the math will find it doesn’t make financial sense, and unlike a Tesla car, there’s not much in terms of cool factor and bragging rights to having a glorified-but-undersized UPS hanging in the garage.

Zero Tolerance for DWI

A friend of a number of my friends was killed yesterday by a drunk driver. Not just any run-of-the-mill drunk, but one who had multiple DWIs on his record.

As a result, my Facebook feed is full of calls for Texas to get stricter on sentencing for intoxication manslaughter, and comments range from “no parole” to “death penalty.”

IMHO, this may serve our collective need for revenge, but it doesn’t address the core problem — drunk drivers are generally habitual, and tough sentencing for murdering someone with their car will absolutely not deter them. When they are drunk, they don’t give a flying rat’s ass who they might hurt, and they believe themselves to have the superpower of not getting caught. So, tough sentencing only post-manslaughter probably won’t save many lives.

Instead, here’s what I think Texas should do after the first offense of drunk driving:

  • Suspend their license for a year. If the driver is under 18, suspend it until they turn 19. If they are 18-20, suspend it until they are 21 or for a year, whichever is longer.
  • Within 60 days of conviction, require them to complete a 21-day, in-patient rehab program.
  • Impound all vehicles registered in their name. When their suspension ends, to get the vehicles back, they must pay to have an ignition interlock device (breathalyzer) professionally installed.
  • For the next 3 years:
    • Give them a “vertical” state ID or driver’s license, which will significantly lower the chances of them getting alcohol in clubs or buying it in the store.
    • Put a code “N” on their license (interlock device required). Require car dealerships and rental companies to comply with this before providing them a vehicle.
    • If they have a TABC certification, take it away.

Last but not least, states need to come together and have reciprocity around these restrictions, so the driver can’t just get around them by moving.

With current technology, we can’t stop people from making the decision to drive drunk, but I think we can do a lot better to protect the rest of us from someone who has already made that decision once and been caught, because if we don’t, it almost certainly will happen again.

Hopefully, at some point in the future, cars will either drive themselves or at least detect impaired drivers, disable themselves, and call the police. But until then, it’s time to stop giving slaps on the wrist.

Making Word 2010 Show ONE PAGE AT A TIME

I’ve been fighting with Word 2010 at work for months, trying to figure out how to simply view a document the way I want — ONE page at a time (not side by side), but also zoomed in to a comfortable reading level.

Unfortunately, some asshole code weenie at Microsoft decided that Word 2010 would ALWAYS show pages side by side UNLESS you either (1) zoom in far enough that two pages no longer fit (which is too wide for comfortable reading), (2) resize your window to do the same (too distracting), or (3) choose a “one-page” view that zooms out so the entire page is on-screen (too small).

After digging around, I finally found this solution on a forum. I wish it were on StackOverflow so I could upvote it, create 10 fake accounts, and upvote it 10 more times:

http://forums.anandtech.com/showpost.php?s=50716a67e804036d8b72464339a6fdf8&p=34500427&postcount=15

Here’s what works (Word 2010 with Win 7):

1. Set the view to Print Layout mode

2. In the View tab of the ribbon, open the zoom dialog box using the Zoom button and select 1 page wide by 2 pages high (DO NOT change the zoom percentage in this dialog). Or in the View tab of the ribbon, click on the “one page” button (not sure of the exact button name in the English version of Word).

3. Adjust the zoom ONLY with Ctrl+scroll wheel. NEVER use the zoom slider at the bottom-right of the window, and NEVER use the Zoom button in the View tab of the ribbon to change the zoom.

It seems Microsoft forgot to screw up the Ctrl+scroll wheel zoom when they messed around with the way the zoom slider works.

I don’t know who user “mousetrap” is over there, but whoever you may be, BLESS YOU for actually posting a solution that works!

Contracts are a poor substitute for strong typing

I was reading about some planned C# improvements on the Roslyn forum and decided to cross-post here some thoughts I left in a comment on a thread that included a discussion of code contracts. The comment follows:

Code contracts would be an improvement, but IMHO they treat the symptom, not the disease, as it were.

Say your method only accepts whole numbers, but uses an `int` parameter, which can theoretically be negative. Sure, putting a contract on the parameter to ensure it is >= 0 provides some protection, but it just moves the problem: now the caller either has to do the same checks or defaulting before calling the method, or it has to handle the exceptions.

In the end, code contracts don’t address the underlying issue — your parameter has the wrong logical type. You’re using `int` when you really want a far more limited type of either zero or a positive integer. Using an unsigned integer would be better, but has its own foibles, so we hold our nose and pretend that a signed type for a never-negative parameter is our best option.

So what’s really going on is that your code contract is creating, logically, an anonymous type with restrictions not present in the base type. But by making it anonymous, it’s not reusable, either between methods or between the method and the caller.

A better approach, IMHO, is to use the type system. E.g., if I want only whole numbers, I create a type that aliases int but does not allow negative numbers to be assigned to it:

public contract WholeInt : int where >= 0;
public contract PositiveInt : int where >= 1 ?? 1;
public contract NNString : string where !null ?? String.Empty;

public class Customer {
  public WholeInt Age { get; set; }
  public NNString Name { get; set; }
  public PositiveInt NumVisits { get; set; }
}

This moves the responsibility back where it belongs: where the invalid value assignment occurs, not when some hapless method gets the bad value as an argument. It also encourages reuse, and provides the option of an alternative default rather than an exception when a bad value is assigned or the default value of the underlying type would be invalid.

To allow better backward compatibility, it should always be possible to implicitly cast one of these contract types to their underlying type. This would allow, for example, `ICollection<T>.Count` to return a `WholeInt` but callers can still assign the result directly to Int32.

Using the keyword `contract` above is just an example — perhaps extending the concept of an Interface would be better.
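
For what it’s worth, you can approximate part of this in today’s C# with a small struct wrapper and implicit conversions. This is just a rough sketch to illustrate the idea (the type name and the clamp-to-zero behavior are my own invention, not an existing library feature), and it’s far noisier than a real language feature would be:

public struct WholeInt {
    private readonly int _value;

    public WholeInt(int value) {
        // Clamp to the valid range; swap this for a throw if you'd rather fail fast.
        _value = value < 0 ? 0 : value;
    }

    // Converting back to the underlying type is always safe, so it can be implicit.
    public static implicit operator int(WholeInt value) {
        return value._value;
    }

    // Converting from int is where the "contract" gets enforced.
    public static implicit operator WholeInt(int value) {
        return new WholeInt(value);
    }
}

Note that default(WholeInt) happens to be valid here, but a struct can’t give you a non-zero default, which is exactly the “?? 1” problem the contract syntax above is meant to solve, and you’d have to hand-write one of these for every constraint.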

Optimizing Character Replacement in C#

For some time, I’ve had a utility function that cleans strings of Unicode characters that can cause issues for the users of the applications I manage.

It runs through a series of regular expression statements that replace problematic Unicode characters with replacement strings. In the examples below, I’m showing only the replacements related to single-quote characters. In the full code (which has 10 separate RegEx calls), some characters are simply stripped out, others are replaced with two-character strings. Here’s how the old code worked:

public static string FixBadUnicode1(string s) {
  // Replace the various Unicode single-quote look-alikes with a plain apostrophe.
  return Regex.Replace(s,
    "[\u2018\u2019\u201A\u201B\u0091\u0092\u2032\u2035\u05FE]",
    "'");
}

This performs well, but due to the number of times it is called, the total execution time is non-trivial, so I decided to test it against a character loop.

For my tests, I performed replacements on a corpus of 38,000 representative strings. Most *don’t* have problematic characters, so the overhead of *looking* for the characters outweighs the cost of the actual replacements. For timing, I used the native QueryPerformanceCounter function (via P/Invoke), which provides a higher-resolution counter than the standard .NET timers.
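
If you want to reproduce the timing, the P/Invoke declarations look roughly like this (the class name is just the usual convention, not part of any library); divide a counter delta by the frequency to get elapsed seconds:

using System.Runtime.InteropServices;

internal static class SafeNativeMethods {
    // Returns the current value of the high-resolution performance counter.
    [DllImport("kernel32.dll")]
    internal static extern bool QueryPerformanceCounter(out long value);

    // Returns the counter frequency in counts per second.
    [DllImport("kernel32.dll")]
    internal static extern bool QueryPerformanceFrequency(out long frequency);
}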

The baseline (RegEx) code ran consistently in around 20 million counter cycles, or about 9 seconds.

My recent experience has been that Dictionary usually beats switch{} when there are more than a handful of cases, so I decided to create a static Dictionary<char, string> to look up the problem characters. Here’s the first iteration:

public static string FixUnicodeChars(string s) {
    var mappers = UnicodeMapper;
    var sb = new StringBuilder(s.Length);
    string fix = null;
    foreach(var c in s) {
        if(mappers.TryGetValue(c, out fix)) {
            if(fix!=null) sb.Append(fix);
        } else {
            sb.Append(c);
        }
    }
    return sb.ToString();
}

If the “fix” string is null (i.e., I just want to strip the character), it skips the StringBuilder.Append() call.

The dictionary UnicodeMapper looks like this (again, only the single-quote portion):

    private static Dictionary<char, string> _unicodeMapper;
    private static Dictionary<char, string> UnicodeMapper {
        get {
            if(_unicodeMapper==null) {
                // Build into a local first so no other thread ever sees a half-populated
                // dictionary, then publish it with a single assignment.
                var map = new Dictionary<char,string>();
                // Fix all single quotes    [\u2018\u2019\u201A\u201B\u0091\u0092\u2032\u2035\u05FE]
                map.Add('\u2018', "'");
                map.Add('\u2019', "'");
                map.Add('\u201A', "'");
                map.Add('\u201B', "'");
                map.Add('\u0091', "'");
                map.Add('\u0092', "'");
                map.Add('\u2032', "'");
                map.Add('\u2035', "'");
                map.Add('\u05FE', "'");
                _unicodeMapper = map;
            }
            return _unicodeMapper;
        }
    }

This performs in just around 600,000 cycles, a more than 30x improvement over RegEx. This surprised me quite a bit — I expected RegEx to use unsafe pointers and other tricks to loop through the strings more quickly than I could enumerate the characters. Part of this performance is due to the fact that Char.GetHashCode() is far more efficient than String.GetHashCode(), so the Dictionary lookups are very fast.

On reviewing the code, I determined that since most strings don’t have any replacements, it made no sense to call sb.ToString() unless a replacement actually occurred. I added a flag, with this result:

public static string FixUnicodeChars2(string s) {
    var mappers = UnicodeMapper;
    var sb = new StringBuilder(s.Length);
    string fix = null;
    var hadChanges = false;
    foreach(var c in s) {
        if(mappers.TryGetValue(c, out fix)) {
            if(fix!=null) sb.Append(fix);
            hadChanges = true;
        } else {
            sb.Append(c);
        }
    }
    return hadChanges ? sb.ToString() : s;
}

This provided an additional 12% improvement… not as much as the first change, but worth the effort.

Next, I noted that until the first replacement occurs, I don’t need to instantiate the StringBuilder at all — I can just create it when the first replacement happens and populate it with the substring before the replacement:

public static string FixUnicodeChars3(string s) {
    var mappers = UnicodeMapper;
    StringBuilder sb = null;
    string fix = null;
    var hadChanges = false;
    int pos = 0;
    foreach(var c in s) {
        if(mappers.TryGetValue(c, out fix)) {
            if(sb == null) {
                sb = new StringBuilder(s.Length);
                if(pos > 0) sb.Append(s.Substring(0, pos));
                hadChanges = true;
            }
            if(fix!=null) sb.Append(fix);
        } else if(hadChanges) {
            sb.Append(c);
        } else {
            pos++;
        }
    }
    return hadChanges ? sb.ToString() : s;
}

This final version requires us to keep track of the current position in the string until the first replacement (if any) is found, at which point we create and start using the StringBuilder. We’re also able to avoid assigning hadChanges more than once. This provides an additional 7% improvement over version 2.

As an aside, I tested using a for{} loop (since it would give us the position for “free”), but it was actually about 10% slower than using foreach{}. I also tried removing the null check before the StringBuilder.Append() call, since Append() already performs a null check. However, the overhead of calling Append() unnecessarily resulted in a 2% increase in execution time.

Another tweak would be to unroll the foreach{} loop into two loops — one until the first match is found, the other for subsequent tests and appends. That would remove the need for the “sb==null” and “hadChanges” checks during each character iteration. But it would also require either using for{} or creating a substring for foreach{} to work on, and I’m reasonably confident that these would eat up any savings.
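
Just to illustrate (this is an untested sketch, not something I benchmarked, and the method name is only for illustration), the unrolled version might look something like this:

public static string FixUnicodeChars4(string s) {
    var mappers = UnicodeMapper;
    int pos = 0;
    // First loop: scan until we find a character that needs fixing (in the common case, we never do).
    for(; pos < s.Length; pos++) {
        if(mappers.ContainsKey(s[pos])) break;
    }
    if(pos == s.Length) return s;   // nothing to replace
    // Second loop: copy the clean prefix, then test-and-append the rest.
    var sb = new StringBuilder(s.Length);
    if(pos > 0) sb.Append(s, 0, pos);
    string fix;
    for(; pos < s.Length; pos++) {
        if(mappers.TryGetValue(s[pos], out fix)) {
            if(fix != null) sb.Append(fix);
        } else {
            sb.Append(s[pos]);
        }
    }
    return sb.ToString();
}

Note that it still pays for one extra dictionary lookup on the character that breaks out of the first loop, and it indexes the string with for{} rather than foreach{}, which was the slower option in my earlier test, so I suspect the savings would indeed evaporate.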

The final potential improvement I can think of would be to use char[] instead of StringBuilder. Unfortunately, since some replacements result in a longer string and I can’t predict the replacements that will occur, there would be some complexity in bounds-checking and reallocating the array that could eat up all of the savings. It’s tempting to try, but maybe some other day. For now, I’m quite satisfied with a 42x performance improvement. :)

Oops… Lost my backup…

I temporarily misplaced the database driving this blog during an upgrade, but I’m back in business! *sigh*

I’m in the final stages of replacing my main web site (www.tallent.us), so that should be up soon. I’ve been hosting it with Zenfolio for a number of years, but since I don’t do commercial work anymore, it didn’t make sense to pay someone else for a fancier web site than I actually need. Plus, it gave me a chance to play with some Javascript and CSS features that I can’t use at work.

I’d love to know if anyone out there still even uses RSS readers, and if so, whether you’re subscribing to this site. I could dig through my Apache log files, but I’d rather know *who* reads this, not just the number of visitors, and whether *you* have a blog that I should be following.

If you have a second, please leave a comment with your name and, if applicable, blog URL…