Enforced Strings

Enterprise applications typically deal with many categories of strings. Human names, reference codes, SKU identifiers, email addresses - the list is huge. There are subtle rules that apply to many of them:

  • Whitespace at the start and end of many strings should probably be ignored
  • Human names probably shouldn't contain newlines, tab characters, the percentage symbol, or 27 dashes in a row
  • For some strings, casing makes no difference when deciding equality; for others, it does

It's common to litter our code with these assumptions, which leads to inconsistency. Sometimes we assume that the UI will handle all of these issues, and the domain layer will simply use what it's given.

Recently I have started experimenting with custom string types that encapsulate a lot of these subtle rules. On my blog, when you browse to a URL like /enforced-strings, instead of the page name being passed around as a string, it's passed as a PageName object. PageName supports implicit conversion operators, so it can be dealt with as a regular string too. Here is part of a unit test:

PageName expected = "hello-world";

// When cast to a PageName, each of these should be converted into the above
var logicallySameAsExpected = new string[] {
    " hello -world ",
    " hello$-world ",
    " hello $-world ",
    " hello-$-world ",
    " hello world ",
    " -   hello   world   - ",
    " -   HeLLo  WoRLD  - ",
    " -   HeLLo  %^@#@#*()[]WoRLD  - ",
    " -   HeLLo  %^@#@#*()[]WoRLD  - $%",
    "@# -   HeLLo  %^@#@#*()[]WoRLD  - $%"
};

foreach (var match in logicallySameAsExpected)
{
    // The cast runs the implicit string-to-PageName conversion
    var castMatch = (PageName) match;
    Assert.AreEqual(expected, castMatch);
}

The assumption I make with PageName is that while it may be instantiated with dirty input (a malformed URL, for example), I can probably infer what was meant. PageName is used throughout my domain model and even at the data access layer - in my case, I use a custom IUserType with NHibernate to treat strings from the database as page names.

To build your own enforced strings, here are the key things to consider doing:

  • Create a class to wrap a real string
  • Make it immutable, and ideally sealed
  • In the constructor, massage the input string
  • Override Equals and GetHashCode, overload the equality operators, and implement IEquatable<T> and IComparable<T>
  • Override ToString (obviously)
  • Add an implicit cast operator to automatically convert from your string to real strings and back
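The checklist above could be sketched roughly as follows. This is a minimal illustration, not the PaulPad code: the EnforcedString<T> base class name and the normalization rule (lowercase, collapse every run of characters other than a-z and 0-9 into one dash, trim dashes from the ends) are assumptions chosen to be consistent with the unit test shown earlier.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical base class: stores the already-normalized value and supplies
// equality, comparison, and ToString for every derived string type.
public abstract class EnforcedString<T> : IEquatable<T>, IComparable<T>
    where T : EnforcedString<T>
{
    private readonly string value;

    protected EnforcedString(string normalized)
    {
        value = normalized ?? string.Empty;
    }

    public bool Equals(T other)
    {
        return !ReferenceEquals(other, null)
            && string.Equals(value, other.value, StringComparison.Ordinal);
    }

    public override bool Equals(object obj) { return Equals(obj as T); }

    public override int GetHashCode() { return StringComparer.Ordinal.GetHashCode(value); }

    public int CompareTo(T other)
    {
        return ReferenceEquals(other, null) ? 1 : string.CompareOrdinal(value, other.value);
    }

    public override string ToString() { return value; }

    public static bool operator ==(EnforcedString<T> lhs, EnforcedString<T> rhs)
    {
        if (ReferenceEquals(lhs, rhs)) return true;
        if (ReferenceEquals(lhs, null) || ReferenceEquals(rhs, null)) return false;
        return string.Equals(lhs.value, rhs.value, StringComparison.Ordinal);
    }

    public static bool operator !=(EnforcedString<T> lhs, EnforcedString<T> rhs)
    {
        return !(lhs == rhs);
    }
}

// Sealed and immutable: the constructor massages the input once, up front.
public sealed class PageName : EnforcedString<PageName>
{
    // Assumed rule: any run of characters other than a-z and 0-9 becomes one dash.
    private static readonly Regex Separators = new Regex("[^a-z0-9]+");

    public PageName(string raw) : base(Normalize(raw)) { }

    private static string Normalize(string raw)
    {
        var lowered = (raw ?? string.Empty).ToLowerInvariant();
        return Separators.Replace(lowered, "-").Trim('-');
    }

    public static implicit operator PageName(string value) { return new PageName(value); }

    public static implicit operator string(PageName name)
    {
        return ReferenceEquals(name, null) ? null : name.ToString();
    }
}
```

Because the base class is generic over the derived type, two different enforced string types never compare as equal by accident, even when their underlying text matches.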

You can see an example of this in PaulPad - first I set up a base class with most of the overloads, then I inherit from that to set up the specific string type.

Welcome, my name is Paul Stovell. I live in Brisbane and work on Octopus Deploy, an automated deployment tool for .NET applications.

Prior to founding Octopus Deploy, I worked for an investment bank in London building WPF applications, and before that I worked for Readify, an Australian .NET consulting firm. I also worked on a number of open source projects and was an active user group presenter. I was a Microsoft MVP for WPF from 2006 to 2013.

18 Jan 2010


Nice post. Just a couple of comments: I think it would be nice to have an implicit conversion from a string to a PageName so this could work:

var pageNames = new PageName[] { "asas", "some-other-page" };
PageName pageName = "my-page-name";
NavigateTo("my-page");

I'd still leave the conversion back to string explicit.

Also, in this case I imagine you're not using it heavily enough to need any optimization, but string usage is heavily optimized across the .NET Framework, from the compiler to the runtime and GC, and that's something we might feel the impact of if we used this technique for most of our strings.

Not that I'd discard this just for these two minor things; I'm just being too picky.

18 Jan 2010

Hi Miguel,

The derived class contains the implicit converter - check the attached sample:

public static implicit operator PageName(string value)
{
    // Treat null as empty so the conversion never throws
    return new PageName((value ?? string.Empty).Trim());
}

public static implicit operator string(PageName name)
{
    return name.ToString();
}

Performance could be a concern, but I'd see these objects as being used at 'entry points' to a library rather than something called in a tight loop.


19 Jan 2010

I'm not sure how I missed this post when it was first posted but I'm glad to see someone writing about it and comparing designs is always good.

Being in the telecom industry, I deal with these kinds of strings a lot! Not just strings but sometimes other base types and of course simple composite types that I usually create as structures. But I gotta agree with Miguel that the conversion from string -> custom type really ought to be explicit. Reason being that implicit conversions should never throw exceptions.

I've been playing around with T4 templates to create these wrappers. I ought to clean it up and put it up on my blog. Without the T4 template your base class helps out a lot but inheritance feels "wrong" in this scenario since we can't also derive from System.String.

A couple of ideas you might be interested in:

  • Adding a Length property
  • Less important but sometimes useful - adding commonly used String methods like Substring
  • Implementing IFormattable with support for at least G/g to format as all uppercase/all lowercase
  • Implementing IComparable in your base class
  • Implementing IXmlSerializable to maximize the "serializability" of the class without junking up the XML
  • Implementing Parse/TryParse methods on the derived class
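A couple of these ideas, sketched for a stricter type than PageName: one that rejects bad input instead of repairing it. Everything here is hypothetical - the ReferenceCode name and its rules (ASCII letters and digits only, at most 12 characters, stored uppercase) are assumptions made up for illustration. Because Parse can throw, the conversion from string is declared explicit, in line with the point above that implicit conversions should never throw.

```csharp
using System;

// Hypothetical strict type: input that breaks the assumed rules is rejected.
public sealed class ReferenceCode
{
    private readonly string value;

    private ReferenceCode(string value) { this.value = value; }

    // Mirrors String.Length so callers rarely need the raw string.
    public int Length { get { return value.Length; } }

    // Never throws; failure is reported through the return value.
    public static bool TryParse(string input, out ReferenceCode result)
    {
        result = null;
        if (input == null) return false;

        var trimmed = input.Trim();
        if (trimmed.Length == 0 || trimmed.Length > 12) return false;

        foreach (var c in trimmed)
        {
            bool allowed = (c >= 'a' && c <= 'z')
                        || (c >= 'A' && c <= 'Z')
                        || (c >= '0' && c <= '9');
            if (!allowed) return false;
        }

        result = new ReferenceCode(trimmed.ToUpperInvariant());
        return true;
    }

    // Throws FormatException on bad input, like the built-in Parse methods.
    public static ReferenceCode Parse(string input)
    {
        ReferenceCode result;
        if (!TryParse(input, out result))
            throw new FormatException("Not a valid reference code.");
        return result;
    }

    // Explicit, because the conversion can fail.
    public static explicit operator ReferenceCode(string input) { return Parse(input); }

    public override string ToString() { return value; }
}
```

Callers handling untrusted input would go through TryParse, while the explicit cast stays available for values already known to be valid.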

Also watch for null in your == and != operators. Here's how I normally stub them out:

public static bool operator ==(PageName lhs, PageName rhs) {
    if (ReferenceEquals(lhs, rhs)) return true;
    if (ReferenceEquals(lhs, null)) return false;
    if (ReferenceEquals(rhs, null)) return false;
    return lhs.Equals(rhs);