Integration: Coupling

"Coupling", in code, is usually described as loose or tight, and is a way of measuring how dependent one piece of code is on another piece. In integration, we talk about coupling between systems. But it's not as simple as loose and tight; there are many different kinds of coupling between systems, and that's the topic of this post.

When choosing an integration approach, coupling can be measured along many dimensions:

  • Semantic
    Application 1 understands a "Customer" as someone with an active, billable subscription. Application 2 understands them as anyone who has filled in the online Register form. Will one application be forced to conform to the other? Or can the two understandings co-exist?
  • Implementation
    Is Application 1 taking a dependency on the implementation details of Application 2? Or do they rely on well-defined contracts?
  • Reliability
    Does Application 1 require Application 2 to be online in order to function? Will performance problems in Application 2 affect Application 1's SLA?
  • Lifecycle
    If Application 1 grows old and is replaced with another system, will Application 2 need to change (either in code, or in configuration)?
  • Source Cardinality
    Application 1 uses information from Application 2. Will Application 1 need to be changed if Application 3 is introduced as a supplementary source of information?
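To make the semantic dimension concrete, here is a minimal Python sketch in which each application keeps its own definition of "Customer" over the same records. The `Person` type, its fields and the sample data are hypothetical; the point is only that the two definitions can co-exist without either application conforming to the other.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Person:
    """Hypothetical record that both applications can see."""
    email: str
    registered_on: Optional[date]  # set when the online Register form was submitted
    subscription_active: bool      # True while a billable subscription is running


def is_customer_app1(p: Person) -> bool:
    """Application 1: a customer has an active, billable subscription."""
    return p.subscription_active


def is_customer_app2(p: Person) -> bool:
    """Application 2: a customer is anyone who has filled in the Register form."""
    return p.registered_on is not None


people = [
    Person("a@example.com", date(2011, 4, 1), True),
    Person("b@example.com", date(2011, 4, 2), False),  # registered, never billed
    Person("c@example.com", None, True),                # billed, never used the form
]

print([p.email for p in people if is_customer_app1(p)])  # ['a@example.com', 'c@example.com']
print([p.email for p in people if is_customer_app2(p)])  # ['a@example.com', 'b@example.com']
```

Neither definition is "right". Low semantic coupling means each application keeps its own, and any integration between them translates explicitly rather than forcing one definition on both.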

By splitting coupling into these categories, we can use them to assess the integration solutions we've explored so far.

Coupling in integration solutions

For each solution, I gave the dimensions of coupling a rating from 1 to 5 (1 being loosely coupled, 5 being very tightly coupled). I came up with the scores based on the highly scientific approach of using my gut:

Figure: A radial plot of the different solutions and the coupling each one creates

A few thoughts came to mind while creating this visualization:

  1. Web service technologies like WCF talk a lot about decoupling, but they still lead to tight coupling in many ways
  2. I'm probably leaning towards choosing ETL solutions over web services unless my integration requires a lot of application logic
  3. Shared databases are the best way to ensure your applications become hard to maintain
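If you want to record gut-feel ratings like these for your own candidate solutions, one option is a small value type with one field per dimension, validated against the same 1 to 5 scale. This is a hypothetical Python sketch: the class name, the `total` summary and the sample numbers are assumptions for illustration, not the ratings in the plot (which shows each dimension separately rather than summing them).

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CouplingScore:
    """Gut-feel ratings: 1 = loosely coupled, 5 = very tightly coupled."""
    semantic: int
    implementation: int
    reliability: int
    lifecycle: int
    source_cardinality: int

    def __post_init__(self) -> None:
        # Reject anything outside the 1-5 scale.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 1 <= value <= 5:
                raise ValueError(f"{f.name} must be between 1 and 5, got {value}")

    def total(self) -> int:
        # One summary number: 5 is the loosest possible, 25 the tightest.
        return sum(getattr(self, f.name) for f in fields(self))


# Illustrative numbers only, NOT the ratings plotted in this post.
example = CouplingScore(semantic=2, implementation=3, reliability=1,
                        lifecycle=2, source_cardinality=2)
print(example.total())  # 10
```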

Welcome, my name is Paul Stovell. I live in Brisbane and work on Octopus Deploy, an automated deployment tool for .NET applications.

Prior to founding Octopus Deploy, I worked for an investment bank in London building WPF applications, and before that I worked for Readify, an Australian .NET consulting firm. I also worked on a number of open source projects and was an active user group presenter. I was a Microsoft MVP for WPF from 2006 to 2013.

28 Apr 2011

I disagree with your categorisation of ETL. The ETL should be treated as a separate application that bridges two other applications that do not know about each other at all, so it is almost the same as messaging.

A common scenario is using ETL to move data between a legacy system and the new system while the transition is being made (which could take years). At the end of the process, the ETL can simply be turned off without any changes to Application 2.

With messaging, as soon as a new requirement comes along, you may need to change both applications, since one has to send the new message and the other has to receive it.

28 Apr 2011

Robert, what exactly do you disagree with? I ranked ETL as being very close to messaging (ETL is blue, and messaging is green, in the graphic above).

Interesting point about legacy migration. I think messaging (if the legacy application supported it) could be just as powerful in this area. Ideally an application would only publish and receive messages it defines. A transformation layer (like the ETL) would map messages from one application to another (one application would never take a dependency on messages created by another application).
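A transformation layer of that kind can be sketched in a few lines of Python: each application owns its own message type, and only the translator knows about both. The message names, fields and the `APP1-` prefix are hypothetical.

```python
from dataclasses import dataclass


# Message defined and published by Application 1 (hypothetical contract).
@dataclass(frozen=True)
class CustomerSubscribed:
    customer_id: int
    full_name: str
    plan: str


# Message defined and consumed by Application 2 (hypothetical contract).
@dataclass(frozen=True)
class AccountOpened:
    account_ref: str
    display_name: str


def translate(msg: CustomerSubscribed) -> AccountOpened:
    """Transformation layer: the only code that depends on both contracts."""
    return AccountOpened(account_ref=f"APP1-{msg.customer_id}",
                         display_name=msg.full_name)


outgoing = translate(CustomerSubscribed(customer_id=42, full_name="Ada Lovelace", plan="gold"))
print(outgoing)
```

Because neither application imports the other's message type, replacing Application 1 means writing a new translator, not changing Application 2.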

28 Apr 2011

Mostly that ETL is the same as Messages in some of the categories, particularly Implementation. A well-designed ETL should map between the two systems without either system being designed around the other. I am advocating that the ETL solution should not be considered part of either application (you do consider it a part, and from that angle I think your analysis is pretty spot on).

I'm currently stuck neck-deep in ETL, and although it is badly designed and written, it works well. The goal is to keep the two DBs in sync at least daily.

One of the legacy systems we are integrating with supports message passing, and we use that to feed data into it. It works, but it took a lot of hacks due to the design differences between the systems. ETL is used to go the other way, as there is no scope to change the older system to provide us with the data we need.

Compared to ETL, it felt like a bigger and more daunting task that was harder to test. If the messages had been designed with other systems in mind, it might not have been as bad.

Based on this (limited) experience, I think ETL solutions tend to be quicker to develop. Messages, on the other hand, shine at near-real-time sync. In our case changes are infrequent, because the old schema is frozen.

28 Apr 2011

Mostly that ETL is the same as Messages in some of the categories, particularly Implementation

I see.

I guess the difference I saw there is that with ETL, my application can't suddenly change its database schema (which is definitely an implementation detail), because it would break the ETL process. So even though Application 1 isn't coupled to Application 2, its implementation is coupled to the ETL process. I need to consider how an implementation detail change (modifying my schema) affects another process.
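The schema-change problem can be reproduced with Python's standard-library `sqlite3` module. In this hypothetical sketch the ETL extract queries Application 1's table directly; the table and column names are made up. A refactoring that is purely internal to Application 1 still breaks the ETL process.

```python
import sqlite3

# Hypothetical ETL extract: it queries Application 1's table directly, so it
# depends on the table and column names, which are implementation details.
ETL_EXTRACT = "SELECT id, name FROM customers"


def extract(conn: sqlite3.Connection) -> list:
    return conn.execute(ETL_EXTRACT).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
print(extract(conn))  # [(1, 'Ada')]

# Application 1's team renames a column as an internal refactoring.
conn.executescript("""
    CREATE TABLE customers_v2 (id INTEGER PRIMARY KEY, full_name TEXT);
    INSERT INTO customers_v2 (id, full_name) SELECT id, name FROM customers;
    DROP TABLE customers;
    ALTER TABLE customers_v2 RENAME TO customers;
""")

try:
    extract(conn)
except sqlite3.OperationalError as exc:
    # The ETL process fails even though Application 2 never changed.
    print("ETL extract failed:", exc)
```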

In a messaging solution, the only coupling is to the message contract. There's really no assumption about the implementation. I can change my domain logic, database schema, handler structure, and so on, but as long as I don't change my message contract, the rest of the system shouldn't need to change. In that sense, I think messaging is less coupled than ETL as far as implementation details go. Does that make sense?
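The same refactoring looks different behind a message contract. In this hypothetical Python sketch, the internal row type changes shape between versions, but the published `CustomerRegistered` message does not, so subscribers are unaffected. All names are assumptions for illustration.

```python
from dataclasses import dataclass


# The published contract: the only thing other systems may depend on.
@dataclass(frozen=True)
class CustomerRegistered:
    customer_id: int
    name: str


# Internal model, version 1 (hypothetical).
@dataclass
class CustomerRowV1:
    id: int
    name: str


# Internal model, version 2: the team split the name into two columns.
@dataclass
class CustomerRowV2:
    id: int
    first_name: str
    last_name: str


def publish_v1(row: CustomerRowV1) -> CustomerRegistered:
    return CustomerRegistered(customer_id=row.id, name=row.name)


def publish_v2(row: CustomerRowV2) -> CustomerRegistered:
    # Only this mapping changes; subscribers still receive the same contract.
    return CustomerRegistered(customer_id=row.id, name=f"{row.first_name} {row.last_name}")


print(publish_v1(CustomerRowV1(1, "Ada Lovelace")) == publish_v2(CustomerRowV2(1, "Ada", "Lovelace")))  # True
```

The mapping code still has to change when the schema does, but that change stays inside the application that owns it.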

Peter
28 Apr 2011

Great set of posts, Paul!

A lot comes down to the types of systems you need to integrate with, the developers in charge of those systems and the organization in which you're working.

While I like the messaging solution, I'm pretty sure there's no way I could do something like that in my current organization. For one thing, some of the systems I have to integrate with are COTS packages that are not messaging-friendly. For the others, I don't think the developers have the knowledge or inclination to implement their part of a messaging solution ... and, organizationally, there's no one in an architect-type position to mandate it.

For me, ETL is the best I can do and good enough for my integration needs. :)

What do you think of a hybrid WCF + ETL approach - where the ETL uses a set of contracts (maybe DataServices?) exposed by each application instead of reaching directly into each application's database?
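The hybrid idea can be sketched in Python with `typing.Protocol` standing in for the contracts each application would expose (WCF or Data Services endpoints in the scenario above; here they are plain interfaces, and every name is hypothetical). The ETL job depends only on those contracts, never on either database.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Protocol


@dataclass(frozen=True)
class CustomerDto:
    customer_id: int
    name: str


class CustomerSource(Protocol):
    """Read contract exposed by the source application (hypothetical)."""
    def changed_customers(self) -> Iterable[CustomerDto]: ...


class CustomerSink(Protocol):
    """Write contract exposed by the target application (hypothetical)."""
    def upsert_customer(self, customer: CustomerDto) -> None: ...


def run_etl(source: CustomerSource, sink: CustomerSink) -> int:
    # The ETL job sees only the two contracts, never either database.
    count = 0
    for customer in source.changed_customers():
        sink.upsert_customer(customer)
        count += 1
    return count


# In-memory stand-ins for the two applications' service endpoints.
class FakeSource:
    def changed_customers(self) -> Iterable[CustomerDto]:
        return [CustomerDto(1, "Ada"), CustomerDto(2, "Grace")]


class FakeSink:
    def __init__(self) -> None:
        self.store: Dict[int, CustomerDto] = {}

    def upsert_customer(self, customer: CustomerDto) -> None:
        self.store[customer.customer_id] = customer


sink = FakeSink()
print(run_etl(FakeSource(), sink))  # 2
```

Either application can now change its schema freely, as long as its service contract stays the same.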

Robert Wagner
29 Apr 2011

If you are changing your schema, you will most likely have to change the code that creates/handles your messages so that you can still expose the same message interface. I'm probably splitting hairs here :).

You've convinced me :).

Rob

Robert Wagner
29 Apr 2011

@Peter, I think it depends on the volume of data and how you determine what has changed.

In our situation we had to transfer all records in A that did not exist or were different in B. That meant comparing every record (in the hundreds-of-thousands).

A code-based approach would have been too expensive to run. It ended up being done as SQL scripts, and the performance (even between two different servers) is great.
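The actual scripts aren't shown here, so this is not that code; it is a minimal sketch of the same set-based idea using SQLite through Python's `sqlite3` module, with hypothetical tables `a` and `b`. `EXCEPT` returns every row of A that has no identical row in B, which covers both "missing" and "different" in a single query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    INSERT INTO a VALUES (1, 'Ada', 'ada@example.com'),
                         (2, 'Grace', 'grace@example.com'),
                         (3, 'Alan', 'alan@example.com');
    INSERT INTO b VALUES (1, 'Ada', 'ada@example.com'),
                         (2, 'Grace', 'grace@old.example.com');
""")

# Rows in A that are missing from B or differ in any column.
# EXCEPT compares whole rows, so the database does the comparison
# instead of application code looping over every record.
delta = conn.execute("""
    SELECT id, name, email FROM a
    EXCEPT
    SELECT id, name, email FROM b
    ORDER BY id
""").fetchall()
print(delta)  # [(2, 'Grace', 'grace@example.com'), (3, 'Alan', 'alan@example.com')]

# Apply the delta: insert new rows and overwrite changed ones by primary key.
conn.executemany("INSERT OR REPLACE INTO b (id, name, email) VALUES (?, ?, ?)", delta)
```

One design note: SQLite treats NULLs as equal when comparing rows in a compound `SELECT`, so a column that is NULL on both sides does not show up as a spurious difference.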

29 Apr 2011

@Robert, I guess the difference is how the change affects teams. With ETL, chances are the team maintaining the ETL package isn't part of the application team (since the package straddles both applications). So in an ETL solution, the app team can't change their DB without checking with the ETL team first. In a messaging scenario, as long as the message contract doesn't change, the app team can change any of their implementation details without synchronizing their schedules with the integration team.