Enforced Strings

This is an old post and doesn't necessarily reflect my current thinking on a topic, and some links or images may not work. The text is preserved here for posterity.

Enterprise applications typically deal with many categories of strings. Human names, reference codes, SKU identifiers, email addresses - the list is huge. There are subtle rules that apply to many of them:

  • Whitespace at the start and end of many strings should probably be ignored
  • Human names probably shouldn't contain newlines, tab characters, the percentage symbol, or 27 dashes in a row
  • For some strings, casing makes no difference when deciding equality, and sometimes it does

It's common to litter our code with these assumptions, which leads to inconsistency. Sometimes we assume that the UI will handle all of these issues, and the domain layer will simply use what it's given.

Recently I have started experimenting with creating custom strings to encapsulate a lot of these subtle things. On my blog, when you browse to a URL like /enforced-strings, instead of the page name being passed around as a string, it's passed as a PageName object. PageName supports implicit conversion operators, so it can be dealt with as a regular string too. Here is part of a unit test:

PageName expected = "hello-world";

// When cast to a PageName, each of these should be converted into the above
var logicallySameAsExpected = new string[] {
    "hello-world",
    "hello-worLD",
    " hello -world ",
    " hello$-world ",
    " hello $-world ",
    " hello-$-world ",
    " hello world ",
    " -   hello   world   - ",
    " -   HeLLo  WoRLD  - ",
    " -   HeLLo  %^@#@#*()[]WoRLD  - ",
    " -   HeLLo  %^@#@#*()[]WoRLD  - $%",
    "@# -   HeLLo  %^@#@#*()[]WoRLD  - $%"
};

foreach (var match in logicallySameAsExpected)
{
    var castMatch = (PageName) match;
    Assert.AreEqual(expected, castMatch);
}

The assumption I make with PageName is that while it may be instantiated with dirty input (a malformed URL, for example), I can probably infer what was meant. PageName is used throughout my domain model and even at the data access layer - in my case, I use a custom IUserType with NHibernate to treat strings from the database as page names.

To build your own enforced strings, here are the key things to consider doing:

  • Create a class to wrap a real string
  • Make it immutable, and ideally sealed
  • In the constructor, massage the input string
  • Override all of the equality operators, GetHashCode, etc., and implement IEquatable, and IComparable
  • Override ToString (obviously)
  • Add an implicit cast operator to automatically convert from your string to real strings and back

You can see an example of this in PaulPad - first I setup a base class with most of the overloads, then I inherit from that to setup the specific string type.