Welcome, my name is Paul Stovell. I live in Brisbane and work on Octopus Deploy, an automated deployment tool.
Prior to founding Octopus Deploy, I worked for an investment bank in London building WPF applications, and before that I worked for Readify, an Australian .NET consulting firm. I also worked on a number of open source projects and was an active user group presenter. I was a Microsoft MVP for WPF from 2006 to 2013.
Performance is important. You can meet all of the requirements, and be completely bug free, but if a page takes 20 seconds to render, the customer won't be happy. As Jeff Atwood wrote, speed still matters. Performance is the functional requirement that every customer forgets to mention, and every developer forgets to ask about. Customers generally just assume the performance will be adequate.
Performance is so important that I suggest we change the standard user story template to:
As a <user> I want to <action> so that <goal> within <performance expectation>
When discussing performance, this quote is often tossed around:
premature optimization is the root of all evil
The full quote, however, is (emphasis mine):
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." - Donald Knuth
Small efficiencies in an algorithm are one thing, but often we go to the other extreme: we make architectural choices that make decent performance impossible.
- We introduce unnecessary layers, for the sake of architectural purity, which often have to be torn down to get halfway decent performance at the end of the project
- We don't look out for stupid bugs that lead to common problems like SELECT N+1 and memory leaks
- We don't make provision for the simplest performance improvements, like caching, in our frameworks, so they have to be scattered throughout the code
Most projects end up having a sprint that is devoted wholly to fixing performance in the application. That's not "tweaking 10% extra". It's "making the home page render without 72 SQL queries". It's a sad fact that most performance problems aren't fixed by rocket science micro-optimization, but by undoing dumb architecture decisions.
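To see how cheap this check is, here's the SELECT N+1 problem in miniature. It's sketched in Python with sqlite3 purely for brevity (the schema and row counts are made up) - the same pattern shows up with any ORM on any stack:

```python
import sqlite3

# Illustrative schema: customers and their orders.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE [Order] (Id INTEGER PRIMARY KEY, CustomerId INTEGER, Total REAL);
""")
db.executemany("INSERT INTO Customer VALUES (?, ?)",
               [(i, f"Customer {i}") for i in range(1, 51)])
db.executemany("INSERT INTO [Order] VALUES (?, ?, ?)",
               [(i, (i % 50) + 1, 10.0) for i in range(1, 201)])

queries = 0
def query(sql, args=()):
    global queries
    queries += 1
    return db.execute(sql, args).fetchall()

# SELECT N+1: one query for the customers, then one more per customer.
queries = 0
for (cid, name) in query("SELECT Id, Name FROM Customer"):
    orders = query("SELECT Id, Total FROM [Order] WHERE CustomerId = ?", (cid,))
print(queries)  # 51 queries to render one page of 50 customers

# The same page rendered with a single join.
queries = 0
rows = query("""
    SELECT c.Id, c.Name, o.Id, o.Total
    FROM Customer c LEFT JOIN [Order] o ON o.CustomerId = c.Id
""")
print(queries)  # 1 query
```

This is exactly the kind of bug a 10-minute look at a SQL trace catches early, and a "performance sprint" pays dearly for later.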
A simple performance check-list
During the "sprint 0" backlog building stage, there are a few simple questions we should ask the customer:
- How many concurrent users should they expect to serve?
- Will there be periods of major increase in demand (e.g., Christmas sales)?
- What is the maximum acceptable response time (usually this should be no more than a few seconds)?
- What costs are they prepared to wear for servers, bandwidth and so on, so we can keep an eye on them?
At the end of each sprint, as a bare minimum, we should:
- Measure the number of SQL queries that are issued as we browse the most common pages
- Measure the number of network requests (browser->web->app server) necessary to serve a single request
- Keep an eye on our memory and CPU usage, and watch how they change as more users are added
These steps won't uncover every potential performance problem, but they take about 10 minutes to do at the end of a sprint, and will uncover the most basic performance problems caused by the architecture, at the best time to fix them. Windows Performance and Reliability Monitor, SQL Profiler, NHibernate Profiler and the "network" tab of your favorite browser's debugging tools are all you really need.
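If you don't have a profiler handy, even a crude counting wrapper around the database connection is enough to catch a page that quietly issues dozens of queries. A minimal sketch, with Python's sqlite3 standing in for the .NET data access stack:

```python
import sqlite3

class CountingConnection:
    """Proxy that counts every SQL statement issued through a connection."""
    def __init__(self, conn):
        self._conn = conn
        self.query_count = 0

    def execute(self, sql, params=()):
        self.query_count += 1
        return self._conn.execute(sql, params)

db = CountingConnection(sqlite3.connect(":memory:"))
db.execute("CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT)")
db.execute("INSERT INTO Customer VALUES (1, 'Alice')")
db.execute("SELECT * FROM Customer")
print(db.query_count)  # 3 - fail the build if a single page needs dozens
```

Dropping something like this into an integration test turns the end-of-sprint check into an automated one.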
"Coupling", in code, is usually described as loose or tight, and is a way of measuring how dependent one piece of code is on another piece. In integration, we talk about coupling between systems. But it's not as simple as loose and tight; there are many different kinds of coupling between systems, and that's the topic of this post.
When choosing an integration approach, coupling can be measured along many dimensions:
- Semantics: Application 1 understands a "Customer" as someone with an active, billable subscription. Application 2 understands them as anyone who has filled in the online Register form. Will one application be forced to conform with the other? Or can the two understandings co-exist?
- Contracts: Is Application 1 taking a dependency on the implementation details of Application 2? Or do they rely on well-defined contracts?
- Availability: Does Application 1 require Application 2 to be online in order to function? Will performance problems in Application 2 affect Application 1's SLA?
- Evolution: If Application 1 grows old and is replaced with another system, will Application 2 need to change (either in code, or in configuration)?
- Source cardinality: Application 1 uses information from Application 2. Will Application 1 need to be changed if Application 3 is introduced as a supplementary source of information?
By splitting coupling into some different categories, we can apply them to some of the integration solutions we've explored.
Coupling in integration solutions
For each solution, I gave the dimensions of coupling a rating from 1 to 5 (1 being loosely coupled, 5 being very tightly coupled). I came up with the scores based on the highly scientific approach of using my gut:
A few thoughts came to mind while creating this visualization:
- Web services technologies like WCF talk a lot about decoupling, but they still lead to tight coupling in many ways
- I'm probably leaning towards choosing ETL solutions over web services unless my integration requires a lot of application logic
- Shared databases are the best way to ensure your applications become hard to maintain
We have been exploring ways to share customer information between these two applications:
The solutions we've covered so far are the shared database, ETL scripts, and web services.
Web services are a nice way to decouple the applications, because they allow the applications to define and share a contract rather than taking a dependency on implementation details. But they do introduce other forms of coupling, especially around reliability.
Messaging allows the applications to exchange information, using well defined contracts, asynchronously. The Marketing application would keep its own list of customers, and would accept messages from the Web Store application. Architecturally, it will look like this:
The Web Store team would define the structure of a message - such as CustomerRegistered. They'd probably document not just the structure of the message, but some of the semantics around it (what does "registered" mean?).
The Marketing team would also define a message, such as CreateCustomer, which it would accept, along with the semantics of what "create" means. Note that our tone has changed from describing an event, to describing a command.
Integration using messages
The Web Store would behave like this:
- A user clicks the "register" button on a web page
- The customer is saved locally to the SQL Server database
- A CustomerRegistered event is written to the queue
The queue (generally something like MSMQ, ActiveMQ or RabbitMQ) would ideally be local to the machine Web Store is being served from. Web Store can then continue to process other requests. It doesn't care what happened to the event. It doesn't know how other applications intend to use the events. It just writes to the queue, and moves on.
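Fire-and-forget looks something like this. The sketch uses an in-memory Python queue as a stand-in for a durable queue like MSMQ, and the function and field names are illustrative - the point is simply that registration returns as soon as the event is written:

```python
from queue import Queue

# Stand-in for a local durable queue such as MSMQ or RabbitMQ.
local_queue = Queue()

def register_customer(name, email):
    # 1. Save the customer to Web Store's own database (omitted here).
    # 2. Write the event to the local queue and move on - Web Store
    #    neither knows nor cares who consumes it.
    local_queue.put({"type": "CustomerRegistered", "name": name, "email": email})

register_customer("Alice", "alice@example.com")
print(local_queue.qsize())  # 1 - the event is queued; the request thread is free
```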
Somehow, the message will make its way from a local queue on the Web Store machine to a queue on another server running some kind of transformer application. The transformer will handle the CustomerRegistered message, apply integration logic, transform it to a CreateCustomer command message, and write it to a queue destined for the Marketing application.
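The transformer's core can be sketched in a few lines: consume an event, map it onto the command contract the Marketing team published, and forward it. Again Python and in-memory queues stand in for the real infrastructure, and the message shapes are illustrative:

```python
from queue import Queue

incoming = Queue()   # CustomerRegistered events from Web Store
outgoing = Queue()   # CreateCustomer commands destined for Marketing

def transform(event):
    # Integration logic lives here: map the Web Store's event onto
    # the command shape the Marketing team has published.
    return {"type": "CreateCustomer",
            "fullName": event["name"],
            "email": event["email"]}

incoming.put({"type": "CustomerRegistered",
              "name": "Alice", "email": "alice@example.com"})

while not incoming.empty():
    outgoing.put(transform(incoming.get()))

command = outgoing.get()
print(command["type"])  # CreateCustomer
```

Neither application's code appears here - the transformer is the only place where the two contracts meet.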
From the Marketing team's point of view:
- A CreateCustomer message lands in a local queue. They have no idea how or why, just that it did.
- Code in the Marketing application picks up the message and writes the customer details to the MySQL database
- The message is removed from the queue
Note that steps 2 and 3 are typically done within a transaction; we only delete the CreateCustomer message from the queue when the new customer's details are safely committed to the MySQL database.
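That peek-process-commit-remove dance might look like this, with sqlite and an in-memory queue standing in for MySQL and a durable queue (real queueing products give you proper transactional receive; this just shows the ordering):

```python
import sqlite3
from queue import Queue

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Customer (Email TEXT PRIMARY KEY, Name TEXT)")

queue = Queue()
queue.put({"email": "alice@example.com", "name": "Alice"})

def handle_next():
    message = queue.queue[0]          # peek without removing
    try:
        with db:                      # commits on success, rolls back on error
            db.execute("INSERT INTO Customer VALUES (?, ?)",
                       (message["email"], message["name"]))
    except sqlite3.Error:
        return                        # leave the message queued for a retry
    queue.get()                       # only now is the message removed

handle_next()
print(queue.qsize())                  # 0 - committed, so the message is gone
```

If the insert fails, the message stays on the queue and the handler simply tries again later - that's where much of messaging's reliability comes from.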
The box I've labelled "transformer" above is a bit of an iceberg. It could be:
- A $200,000 BizTalk installation
- An NServiceBus DLL with a simple handler using pub/sub
- An open source package like Apache Camel
- A hand-rolled C# console app
- An intern who pastes the message into a Word document, prints it, faxes it to a data entry clerk, who then re-types it into an InfoPath form which emits XML compatible with the Marketing queue
Integration of all of your applications may be centralized or decentralized. Once you start adding many transformations, and make it easy to plug applications into them, the result is generally called a service bus.
The code that lives inside the "transformer" box tends to be pretty predictable, if complicated. Enterprise Integration Patterns is a good book (which I have read) about the kinds of things that happen in this layer.
Messaging combines the best of our previous solutions:
- Like the ETL solution, neither application is (from a code point of view) aware of each other, nor do they require the other application to be online in order to function
- Like the Web Services solution, the application can control how requests are processed, and apply domain logic before data makes its way into the inner sanctum that is the database
Messaging can help to make our applications very reliable, since applications are designed to be completely decoupled from each other. They are decoupled not just from a "contract over implementation" point of view, but from an "uptime" point of view. I'm going to explore these coupling concepts more in another post.
Developers generally have less experience with messaging for integration, so there will be a learning curve. This is also an area swimming with vendor sharks selling pricey products, so if you spend too much time on the golf course you could get stuck with an integration solution you really don't want.
Hopefully this brief tour of integration solutions gives you an idea of how they could apply in the real world.
If you've used messaging for integration, how did it go? If you thought about it but opted for another solution, why?
We've been looking at ways to share customer information between two applications:
One approach was having both applications use the same OLTP database. This presented some challenges; namely, it coupled the two applications very closely together, creating a huge ripple effect if either application needed to change. A second solution was to use ETL scripts to shift data between application databases. This decouples the applications a little, but integrating at the data layer means we lose a lot of context.
For our third solution, let's explore the use of web services. In this solution, the Web Store would "own" the customer information. The Marketing application would store its own application-specific data, but it would make a web service call to fetch customer information. The solution might look like this:
The web service could be implemented many ways:
- An RPC endpoint, using SOAP for operations like fetching and updating customers, perhaps implemented using WCF
- A RESTful endpoint, with Customers as a true resource, exposed as XML or Json
- A URL that just returns a CSV of Customers
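The last two options are largely a serialization choice over the same resource. A quick illustration (Python, with made-up customer fields) of the same list rendered both ways:

```python
import csv
import io
import json

customers = [
    {"Id": 1, "Name": "Alice", "Email": "alice@example.com"},
    {"Id": 2, "Name": "Bob", "Email": "bob@example.com"},
]

def as_json(rows):
    # What a RESTful Customers resource might return.
    return json.dumps(rows)

def as_csv(rows):
    # What the "URL that just returns a CSV" option might return.
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["Id", "Name", "Email"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

print(as_json(customers))
print(as_csv(customers))
```

Whichever format is chosen, the consumer depends only on this published shape, never on the tables behind it.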
In the previous approaches, the Web Store application gave up control over its data. ETL scripts might have meddled with customer data, bypassing the application domain logic.
With this approach, Web Store retains complete control over how other applications access and modify the data it owns. It can validate updates to customers, reuse some of the domain model code, block updates to archived customers, and so on. Best of all, it can change the database schema completely without upsetting grumpy DBAs ;-)
While this approach has many advantages over the previous solutions, it has a major downside: the reliability of the Marketing solution is coupled to the Web Store solution.
Although a critical system like Web Store is unlikely to be completely offline for a period of time, this architectural mistake could manifest itself in other ways:
- If the Web Store is exceptionally busy, the Marketing solution may run very slowly
- A bug in the Marketing solution (like calling a service in a tight loop) could have negative impacts on Web Store's reliability
- If multiple applications begin to depend on Web Store's web services, the Web Store team may have to deal with a myriad of versioning issues.
These issues are especially important if either application has any kind of SLA.
It's important to note that while WSDL/MEX and technologies like WCF do a good job of decoupling applications by using contracts, they alone don't fully decouple the uptime, reliability and performance issues that come about when integrating applications.
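One common mitigation is to put a timeout and a local fallback cache between the two applications, so a Web Store outage degrades Marketing rather than taking it down. A hypothetical sketch - the client function and cache here are illustrative, not a real API:

```python
# Hypothetical client for Web Store's customer service. The cache lets
# Marketing keep serving (slightly stale) data when Web Store is down.
_cache = {}

def fetch_customer_from_web_store(customer_id):
    # Stand-in for the real web service call; simulate an outage.
    raise TimeoutError("Web Store did not respond")

def get_customer(customer_id):
    try:
        customer = fetch_customer_from_web_store(customer_id)
        _cache[customer_id] = customer    # refresh the cache on success
        return customer
    except TimeoutError:
        # Degrade gracefully instead of letting Web Store's outage
        # drag the Marketing SLA down with it.
        if customer_id in _cache:
            return _cache[customer_id]
        raise

_cache[42] = {"Id": 42, "Name": "Alice"}  # populated on an earlier, successful call
print(get_customer(42)["Name"])  # Alice, served from the cache during the outage
```

This softens the uptime coupling, but doesn't eliminate it - a cold cache still fails.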
What experiences, good or bad, have you had creating web services that are consumed by other applications (not just between client/server apps) in your enterprise?
As I outlined previously, we need to enable these two applications to share customer data:
A few people suggested having both applications share the same database, but that has some pitfalls. In coupling both applications to the same database, the ripple effect of change will become hard to manage over time.
A second solution, as suggested by Robert and Peter in the comments, is for each application to "own" an independent database, and to use an extract, transform and load process to push customer information from the Web Store into the Marketing application. Architecturally, it would look like this:
This approach means that:
- Each application can design and control its own database schema
- The storage and indexing of each database can be optimized for the access patterns of its application
- Changes can be made to either schema without having to co-ordinate the changes with the other team
Of course, point 3 is always murky - the Web Store team can't decide to drop the EmailAddress column without it having an effect on the Marketing team. But the Web Store team could de-normalize the tables to gain a performance increase without affecting the Marketing team.
Someone will need to own the ETL process, and changes in either schema will need to be co-ordinated with that person as part of the release plan. But overall changing an SSIS package or batch file is probably easier than changing database schemas. We can never eliminate the need for communication, but by decoupling the two applications, we gain many benefits.
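The ETL step itself can be very small. Here's a sketch with sqlite standing in for both databases (the post's scenario uses SQL Server and MySQL), and made-up table names - note that the two schemas only ever meet inside this one script:

```python
import sqlite3

# Two independent databases, one per application.
web_store = sqlite3.connect(":memory:")
marketing = sqlite3.connect(":memory:")

web_store.execute("CREATE TABLE Customer (Id INTEGER, Name TEXT, Email TEXT)")
web_store.execute("INSERT INTO Customer VALUES (1, 'Alice', 'alice@example.com')")

marketing.execute(
    "CREATE TABLE MarketingContact (Email TEXT PRIMARY KEY, DisplayName TEXT)")

def run_etl():
    # Extract from the source schema...
    rows = web_store.execute("SELECT Name, Email FROM Customer").fetchall()
    # ...transform to the target schema, and load. Re-running is safe
    # because the load is an upsert keyed on email.
    for name, email in rows:
        marketing.execute(
            "INSERT OR REPLACE INTO MarketingContact VALUES (?, ?)", (email, name))
    marketing.commit()

run_etl()
count = marketing.execute("SELECT COUNT(*) FROM MarketingContact").fetchone()[0]
print(count)  # 1
```

Either team can rename columns on their side and only this mapping needs to change.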
An ETL approach works well when we're just talking about two applications, with one application needing a read only view of the other. But it gets messy when:
- We introduce more than one application
- Data changes need to flow bi-directionally
If you consider an environment with multiple applications needing data from each other, you could end up with something like this:
The cost of change in an environment like this can still be expensive, although it's probably easier than if they all used a single database. If a new application is added, it might need to source data from three, four or five other databases, which requires that many new ETL packages. The difficulty of scaling a system like this mirrors the difficulty of scaling teams described in The Mythical Man-Month.
Staging databases and other ETL patterns can help here, but I'd personally try and avoid creating an environment like this. There's also one other problem with using ETL:
You lose context
As with shared databases, ETL scripts work on data. Unless the application uses some form of event sourcing, working at the data level means we lose a lot of context. We can see the data, and we can even tell that it changed, but it's hard to tell why it changed. Figuring it out is like piecing together the sequence of events that took place at a crime scene, based on the current state.
When the address changed, was it because the customer mis-typed their address the first time, or because they moved home? The application may have known that at the time the data was updated, but that probably wasn't persisted anywhere.
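The difference is easy to see side by side. In this sketch (the event names are illustrative), both models end up with the same current address, but only the event log remembers why it changed:

```python
# State-based storage: the update overwrites history and discards intent.
customer = {"email": "alice@example.com", "address": "1 Old St"}
customer["address"] = "7 New Ave"   # typo fix, or a house move? We'll never know.

# Event-based storage keeps the "why" alongside the "what".
events = [
    {"type": "CustomerRegistered", "address": "1 Olde St"},
    {"type": "AddressCorrected", "address": "1 Old St",
     "reason": "customer mis-typed it at registration"},
    {"type": "CustomerMoved", "address": "7 New Ave"},
]
current_address = events[-1]["address"]
print(current_address)  # 7 New Ave - same end state, but the history survives
```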
As with a crime scene, we can only glean so much from the data by examining its current state. Next up we'll look at ways to provide more context around why data changed.
In my last post, I introduced a scenario where we need to allow two applications to make use of the same customer information:
The Web Store already has a SQL Server database. Since we're still designing the Marketing application, we could just make it use the same database. The result would look like this:
This is probably the simplest solution that could possibly work, but it has a few downsides:
- Since changes to the schema could affect the other team, changes need to be co-ordinated.
- Storage and indexing need to be optimized for the access patterns of both applications, which might be hard to accommodate.
In essence, the key problem is coupling. We create a ripple effect any time we try to change a shared database.
Unless the applications are talking through a façade, such as a stored procedure layer, it's difficult for one application to isolate itself from another.
As the enterprise grows, the effects of this become much worse. If more applications are built on top of this database, adding a column to a table could involve meetings between four or five teams, all with different priorities.
Mommy, where do DBAs come from?
Over time, the database morphs from being "just a place where the application stores its data", to the most critical piece of infrastructure in the organization. To deal with this, organizations often hire dedicated Database Administrators. Their job isn't just to keep the server running, but to act as strict guardians of any changes to the schema.
With DBAs comes a strictly defined change control process. Instead of just adding a column to a table, an application developer might find themselves having to justify their case to a DBA, even if the column isn't used by any other application. The DBA might be busy responding to change requests from other teams, leaving the application developer blocked.
Chances are you've seen the shared database solution many times. It's probably the most common way of sharing data between applications (or in monolithic applications that should have been many small applications). What are some of the positive and negative experiences you've had with it?
This is the first in a short series on application integration.
Imagine a small business, with an online e-commerce Web Store. The store lets customers browse and purchase products. It is built using ASP.NET MVC, and the data lives in a SQL Server database. Over the years, the store has gathered information on thousands of customers and their purchasing history.
The marketing team decides they need a small application to help them run marketing campaigns. Since marketing groups are very creative, they call the application Marketing. The ASP.NET developers who built the original site are busy, so they hire a small Ruby on Rails shop to build the application for them.
While the marketing application will have its own domain model, it will need to source customer information from the e-commerce application. In other words, integration is needed.
What are the approaches at your disposal for solving this problem?
At Readify we focus on the Microsoft stack - WPF, Silverlight, and ASP.NET. The bulk of our projects are ASP.NET, some are Silverlight, and even fewer are WPF (though the WPF projects tend to be bigger in scope, so they're probably about even).
Usually the customer has decided on a technology stack before we arrive, and they've engaged us to help plan/design/build/ship the application. Sometimes it's too late to suggest technology changes, but other times we can influence the technology they decide to use.
When I'm faced with deciding on a technology, here is my workflow.
I'll call out a few points:
- My default choice is always HTML/web
- If there's a very good reason to use WPF, then I will
- Silverlight isn't on here
I'm not just talking about public web applications - I'm also talking about internal, line of business applications.
Here are some examples of where I would choose WPF:
- A point-of-sale terminal
- A data capture application for geologists in remote Western Australian deserts
- An airline flight check-in kiosk
For everything else, the answer is HTML. That might be surprising, since I'm a "WPF guy". The reality is, I recognize that even with technologies like ClickOnce, web applications are dramatically easier to debug and maintain, and have a lower cost of ownership, than desktop applications.
So why isn't Silverlight on the list?
- Most UI's are much easier to build and maintain in HTML
- An ASP.NET->Database app is much easier to build than a Silverlight->WCF->Database app
- ASP.NET MVC creates much more maintainable apps than Silverlight encourages, no matter how many times you shout "MVVM"
- HTML will work on all mobile devices
Now, there are a few scenarios where you might be able to convince me that Silverlight makes sense. But it is never going to be like this:
There is a possibility - high before HTML 5, but shrinking - that you might be able to get me to agree to this:
Nick is known for building a popular IOC container. He's also put a lot of thought into the role of IOC containers in projects, and I consider him a major thought leader in the space.
As an example, many IOC containers have supported the ability to resolve an array of dependencies, or to inject a Func<T>. Nick put a lot more thought into this, defining what he calls the "relationship zoo". The resulting implementation in Autofac meant that we got a lot of extra features, like Meta<T>, which is very useful. Lesser containers are still catching up.
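To make the "relationship zoo" idea concrete, here's a toy container (Python, deliberately simplified - nothing like Autofac's real implementation) in which a consumer can ask for one implementation, all implementations, or a deferred factory, without the registered components knowing anything about it:

```python
class Container:
    """Toy IOC container supporting a few 'relationship' styles."""
    def __init__(self):
        self._registrations = {}

    def register(self, service, factory):
        self._registrations.setdefault(service, []).append(factory)

    def resolve(self, service):
        # Plain dependency: T (last registration wins).
        return self._registrations[service][-1]()

    def resolve_all(self, service):
        # Enumeration relationship: like IEnumerable<T>.
        return [f() for f in self._registrations[service]]

    def resolve_factory(self, service):
        # Dynamic instantiation relationship: like Func<T>.
        return lambda: self.resolve(service)

class SmtpNotifier: pass
class SmsNotifier: pass

c = Container()
c.register("notifier", SmtpNotifier)
c.register("notifier", SmsNotifier)

print(type(c.resolve("notifier")).__name__)   # SmsNotifier
print(len(c.resolve_all("notifier")))         # 2
make = c.resolve_factory("notifier")          # creation deferred until called
print(type(make()).__name__)                  # SmsNotifier
```

The zoo's point is that each of these relationships expresses a different intent in the consumer, and the container, not the consumer, does the plumbing.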
These kinds of insights into what a container should be are very educational. I knew how to use an IOC container before meeting Nick, but I learnt a lot about why and when by working with him.
If you'd be interested in attending Nick's course, and would like to suggest content, leave a comment on his blog. Even if you're not so interested in IOC, I'd recommend attending his course just to absorb the ideas that come from spending time in a room with the guy. Highly recommended.