A picture of me

Welcome, my name is Paul Stovell. I live in Brisbane and work on Octopus Deploy, an automated deployment tool for .NET applications.

Prior to founding Octopus Deploy, I worked for an investment bank in London building WPF applications, and before that I worked for Readify, an Australian .NET consulting firm. I also worked on a number of open source projects and was an active user group presenter. I was a Microsoft MVP for WPF from 2006 to 2013.

In moving the Octopus portal to Nancy, I wanted to have some consistency in the way errors are handled. In ASP.NET MVC/WebAPI I had a number of different filters that would handle different error conditions and return them to the user. I also then had a catch-all in Global.asax that tried to render a friendly error page if something really bad happened.

There are two dimensions to the way errors are handled. First, there are different kinds of errors. Some are the user's fault (not authenticated, no permissions to perform an action, 404, validation failure). Others are our fault (internal server error). I've documented the main error status codes Octopus uses in the API documentation.

Secondly, there are different user experiences that will depend on the error type and request.

  • If the client prefers a JSON response, we'll send the status code and a JSON result containing the error details. For 500 exceptions we'll include the exception details; for other errors we'll include a description of what caused the problem and potential solutions.
  • If the client prefers an HTML response, we might redirect to a login screen for 401 errors. Other errors will show a friendly error page describing the problem.

For example, if the user is using a web browser (Accept: text/html) and they navigate to a page that doesn't exist, they'll get:

A 404 page

While a user of the JSON API (Accept: application/json) will receive:

A 404 JSON response

And the same will happen if the server encounters an exception while processing the request:

Server exception

While API clients see:

Server exception from JSON

400: Bad request and 403: Forbidden errors will be handled in a similar way: JSON responses for API clients, and HTML responses for real users.

401 errors will be handled slightly differently. From the API, we'll return a JSON error indicating that the API key is probably invalid, while HTML clients will receive a 302 redirect to the login page.

Implementation

I've only been playing with Nancy for a couple of days, so the implementation took some experimentation and could probably be much simpler. Happily, Nancy has very good documentation (I would say more usable than ASP.NET WebAPI's), and the page on error handling in the Nancy docs was good reading on the subject.

My strategy starts with a custom JsonResponse called an ErrorResponse (view the full gist). This can be created from an exception or a single message. For example:

Get["/bad"] = c => { return ErrorResponse.FromMessage("Something bad happened"); };
Get["/buggy"] = c => { return ErrorResponse.FromException(new DivideByZeroException()); };

I then customized my Nancy bootstrapper to log and translate any unhandled exceptions into an ErrorResponse:

protected override void RequestStartup(ILifetimeScope requestContainer, IPipelines pipelines, NancyContext context)
{
    pipelines.OnError.AddItemToEndOfPipeline((ctx, exception) =>
    {
        // Log the failure, then turn it into a meaningful JSON error response
        log.Error("Unhandled error on request: " + context.Request.Url + " : " + exception.Message, exception);
        return ErrorResponse.FromException(exception);
    });

    base.RequestStartup(requestContainer, pipelines, context);
}

Now that I'm capturing error details and turning them into meaningful responses, I need a way to render them appropriately. For this, I implemented a custom IStatusCodeHandler. My status code handler (full gist) decides whether an HTML response would be more appropriate based on the Accept header, and if so, turns the ErrorResponse into a new response type that renders an HTML error page.
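Condensed, the handler looks something like this (the real version is in the gist; ErrorPageResponse is the custom HTML response described below, and the content negotiation here is deliberately simplified):

using System.Linq;
using Nancy;
using Nancy.ErrorHandling;

public class ErrorStatusCodeHandler : IStatusCodeHandler
{
    public bool HandlesStatusCode(HttpStatusCode statusCode, NancyContext context)
    {
        return statusCode == HttpStatusCode.NotFound
            || statusCode == HttpStatusCode.InternalServerError;
    }

    public void Handle(HttpStatusCode statusCode, NancyContext context)
    {
        if (!ClientPrefersHtml(context))
        {
            return; // API clients keep the JSON ErrorResponse as-is
        }

        // Swap the JSON error for the friendly HTML error page
        context.Response = new ErrorPageResponse(statusCode, "Status: " + statusCode);
    }

    private static bool ClientPrefersHtml(NancyContext context)
    {
        // Crude check: a real implementation should compare the q-values
        // of text/html and application/json in the Accept header
        return context.Request.Headers.Accept.Any(a => a.Item1.Contains("text/html"));
    }
}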

The other piece is my custom Response, which renders the HTML error page from an embedded resource. I put this together based on some existing code in Octopus, so I'm not currently using a real Nancy view engine; I'll experiment with turning this view into a Razor view soon.
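Roughly, it works like this (a sketch only: it assumes the page template ships as an embedded resource containing a {message} token, which isn't exactly what the Octopus code does):

using System.IO;
using Nancy;

public class ErrorPageResponse : Response
{
    public ErrorPageResponse(HttpStatusCode statusCode, string message)
    {
        StatusCode = statusCode;
        ContentType = "text/html";
        Contents = body =>
        {
            // Load the HTML template from an embedded resource and fill
            // in the error message before writing it to the response
            var assembly = typeof(ErrorPageResponse).Assembly;
            using (var template = assembly.GetManifestResourceStream("MyApp.ErrorPage.html"))
            using (var reader = new StreamReader(template))
            using (var writer = new StreamWriter(body))
            {
                writer.Write(reader.ReadToEnd().Replace("{message}", message));
            }
        };
    }
}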

The final piece of the puzzle is to ensure IIS doesn't attempt to render its own custom error pages instead of mine. To do this, it's as easy as a web.config entry:

<system.webServer>
  <httpErrors errorMode="Custom" existingResponse="PassThrough" />
</system.webServer>

To recap, my error handling strategy includes:

  • A custom response type that indicates a meaningful error with details
  • A status code handler that translates any other responses and determines whether to render them in HTML
  • Another custom response type that renders the friendly error page (may not be required)

Can it be improved? I'm sure it can, since as I said, I've only been playing with Nancy for a few days. Leave a comment in the box below!

For the next release of Octopus Deploy we're going to be moving to an API-first model. To support this, much of the Octopus web portal is going to be rebuilt.

While I originally planned to do this using the two-headed monster that is ASP.NET MVC and WebAPI, this week I've been experimenting with using Nancy to serve both the web API and the web pages, and I have to say, I'm impressed.

One of the most attractive features of Nancy is portability: it can run self-hosted inside our Octopus Windows service (a sketch of this appears below), or on its own in IIS under ASP.NET for customers who for some reason really want to stick with IIS. The self-hosting option is great because it means that, out of the box:

  • We don't need to worry so much about whether ASP.NET is registered with IIS, or whether the right IIS features are installed and enabled
  • We don't need to worry about the weird changes customers may have made to their IIS configurations, like deleting all the default MIME type mappings

And since Nancy works well for building both APIs and websites, we don't have to duplicate our routing, linking and authentication code the way I did with ASP.NET MVC and WebAPI.
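For the curious, the self-hosting sketch mentioned above amounts to little more than this, using the Nancy.Hosting.Self package (the port is just an example):

using System;
using Nancy.Hosting.Self;

class Program
{
    static void Main()
    {
        // Nancy discovers modules in the AppDomain automatically
        using (var host = new NancyHost(new Uri("http://localhost:8080")))
        {
            host.Start();
            Console.WriteLine("Listening on http://localhost:8080 - press Enter to stop");
            Console.ReadLine();
        }
    }
}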

I'm also working hard on the API and documenting it from the start. You can see the beginnings of this in the API documentation repository on GitHub. The goals of the API are:

  • To be friendly and easy to figure out
  • To be hypermedia driven, using links and the occasional URI template (see an example)
  • To support all of the features that the UI requires
  • To have a nice client library for .NET available via NuGet

As I go, I'm also raising a few questions about the API design on the GitHub issues page; if you're interested in HTTP API designs I'd love for you to take a look!

Yesterday I played Game Dev Tycoon, which is a brilliant new game from Greenheart Games, another Queensland startup.

Game Dev Tycoon™ is a business simulation game available for Windows, Mac and Linux as well as on the Windows 8 Store. In Game Dev Tycoon you replay the history of the gaming industry by starting your own video game development company in the 80s. Create best selling games. Research new technologies and invent new game types. Become the leader of the market and gain worldwide fans.

Like my real-life small business, the game starts out being run from the garage:

Running the startup from my garage

In real life, Octopus Deploy is growing every month, and it's an exciting time because I'm hoping to be able to have somebody to work full-time on the product with me in the next few months. That's going to be an important milestone for me.

Likewise, in Game Dev Tycoon as soon as you reach a certain level of cash, you reach a milestone and get to move to a new office and start hiring:

Now we've grown up and got an office

I enjoyed playing this game because it gave me some insights into how I run my own small business. In the game, time goes by quickly, so you get a very macro view of running a business. Here are some observations I made while playing the game that I think can apply to my own business:

  1. In the game, marketing can be important, but it's easy to blow time and money on. Getting the product right is the most important thing.
  2. Having a good idea for a game (topic/genre) is useful, but the difference between a hit and a flop is in the way you balance feature selection and resources. Execution is what really matters.
  3. It's easy to get distracted with contracting and other tasks in the game, but it doesn't really pay off. Focus on the product.
  4. At some point you'll need to invest in research and development or risk being obsolete.
  5. You focus all of your hopes and dreams on achieving a goal. When you finally reach it, you feel happy and relieved. Then, you focus everything on achieving the next goal. Satisfaction is only ever temporary.

While I've been able to progress in the game out of the garage and into an office, I haven't been able to avoid bankruptcy in the office yet. Here's hoping I can figure that out in real life :)

The Windows and .NET platforms are a great stack for building and deploying applications when you have control of them. But as an ISV building software that needs to run on other people's systems, right now I feel like they are causing me a lot of pain. To give some recent examples:

  1. A customer was unable to use the IIS-hosted Octopus web portal because of problems serving the JavaScript/CSS files. It turned out they had removed all the default MIME types, and expected us to document every MIME type they should enable.
  2. Two customers are experiencing problems with the X509 certificates that we use, due to the way Windows deals with certificates and the permissions around them. We've even built a custom diagnostics tool to try to discover the problem, but it seems to be something specific to those machines.
  3. A customer reported a bug because our software wouldn't run on Windows Server Core. This was because a Windows hotfix was needed to allow portable class libraries to work.
  4. Because .NET 4.5 overwrites parts of .NET 4.0, our software has to work on .NET 4.0 (for customers who can't install 4.5) and 4.5 (for customers who have installed it). But we often get bug reports because a customer has just installed .NET 4.5 and, not having restarted yet, Octopus crashes when some code path tries to call .NET 4.5 code but reads .NET 4.0 binaries.

Recently I've been toying with Erlang on Xen, which I think is a fascinating idea: being able to compile an application into a virtual machine image that can be run directly on a hypervisor, with a fit-for-purpose operating system and tiny footprint. If we took this approach, we could ensure that our application was running entirely on a stack that we control and can support properly.

Unfortunately I can see a few stumbling blocks if we were to try and ship a Hyper-V image containing a Windows server with our software installed and configured as the only way to use the software:

  • Customers would probably want to join it to a domain, and a lot of the problems would come back once group policies are applied
  • Customers would probably want to enable automatic updates, so it wouldn't remain "pure" for long
  • Customers in small shops might be unable to dedicate an entire running VM just for Octopus, and would also want to be able to run a build server alongside

Still, it's a wonderful dream. Perhaps this is a sign that Octopus needs to be ported to Java, where I can bundle the runtime and not depend on IIS, HTTP.sys or the abomination that is the Windows certificate store.

This week I needed to investigate and fix some bugs that customers behind proxy servers were experiencing in Octopus Deploy. I didn't have easy access to a proxy server, so I decided to set one up using Squid, an open source web proxy server. I've seen Squid used in many shops before but this is the first time I'd ever configured it.

Getting started

I use Windows 8 day to day, and although Squid appears to work on Windows, I wanted to set this up in a clean environment. So I created a new virtual machine in Hyper-V, and installed Ubuntu Server 12.10. I worked through the installation guide, selecting the keyboard layout, timezone, and so on. When prompted for packages, I only chose to install OpenSSH server.

Installing Squid

I started by installing Squid:

sudo apt-get install squid

This actually installed Squid 3.1.20, so my Squid configuration file was located at /etc/squid3/squid.conf.

Next, I tested whether Squid worked out of the box. I used ifconfig to find out my VM's IP address, then opened that in a browser on port 3128. I was given a page that said Squid at the bottom, so that's a good sign.

Squid

Setting up a password file

Squid has a ton of options for authentication. Since I'm just testing proxy server authentication, I went with a simple NCSA-style username and password configuration. First I installed apache2-utils to get access to htpasswd:

sudo apt-get install apache2-utils

Next I created a file called users in my Squid configuration folder, with a user named paul.

sudo htpasswd -c /etc/squid3/users paul

Using htpasswd to set a password

And I made sure Squid could read that file:

sudo chmod o+r /etc/squid3/users

Configuring Squid to use NCSA authentication module

The different authentication modules are distributed as binaries that come with Squid, and to configure them you have to know where they are located. This command listed their locations:

dpkg -L squid3 | grep ncsa_auth

For me the output was /usr/lib/squid3/ncsa_auth.

To enable the module, I opened the Squid configuration file in vi:

sudo vi /etc/squid3/squid.conf

I searched for the text TAG: auth_param to find where the authentication module is configured. Next I added the following configuration:

auth_param basic program /usr/lib/squid3/ncsa_auth /etc/squid3/users
auth_param basic children 5
auth_param basic realm Paul's Squid!
auth_param basic credentialsttl 2 hours
auth_param basic casesensitive off

Next, I needed to add the ACL to give the users access. I searched for TAG: acl in the Squid configuration file and added this ACL to the list:

acl ncsa_users proxy_auth REQUIRED

Then I searched for TAG: http_access to find where HTTP access rules are configured. Scrolling down, there's a section where you can insert your own rules. I added:

http_access allow ncsa_users

Restart Squid

Finally, I restarted Squid:

sudo service squid3 restart

And bam! After configuring the proxy settings, I was prompted for proxy credentials:

Prompted for proxy credentials
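Since the point of all this was to test Octopus behind an authenticated proxy, a quick sanity check from .NET looked something like this (the IP address and password are placeholders for my VM's details):

using System;
using System.Net;

class ProxyCheck
{
    static void Main()
    {
        // Route a request through the Squid VM using the NCSA credentials
        var proxy = new WebProxy("http://192.168.1.50:3128")
        {
            Credentials = new NetworkCredential("paul", "password")
        };

        var request = (HttpWebRequest)WebRequest.Create("http://octopusdeploy.com/");
        request.Proxy = proxy;

        using (var response = (HttpWebResponse)request.GetResponse())
        {
            Console.WriteLine("Status via proxy: " + response.StatusCode);
        }
    }
}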

I could have just used Fiddler

Not long after this, I discovered that Fiddler (which acts as a proxy) can require authentication. It's as simple as checking Rules -> Require Proxy Authentication. D'oh!

Sources

The following guides were very useful in getting this working. The main difference I found was that when I installed Squid, I got Squid 3.1.20, while the guides appear to use an older version.

Although Octopus Deploy is an Australian business, most of our customers aren't. In fact, Australian customers currently make up less than 10% of our sales. The United States is by far the biggest market when it comes to software sales, an experience which I'm sure is common throughout the software industry. In this post, I thought I'd share my experiences operating a small Australian business and dealing with foreign sales and the US dollar.

Octopus Deploy sales by country

Background

As I mentioned in a previous post on selling software, our software is sold online through FastSpring, a US-based reseller. Customers follow a purchase link on our website, which takes them to a page that is hosted by FastSpring but styled to look like our site. There, they fill in their order details and make a payment, usually by credit card (though also through purchase orders and other payment methods). FastSpring figure out what taxes apply and also handle the fraud side. All up, FastSpring take about 6% of the sale for their trouble, and we take the rest.

Pricing

We fix our prices in US dollars on our website. I think this is important for four reasons:

  1. Most customers are from the US, so it makes it easy for them
  2. Everyone else usually knows the value of their own currency in USD
  3. Our competitors list their prices in USD, so it makes comparing easier (we don't compete on price, but it enables people to put us into the right "affordable/expensive" bucket in their heads)
  4. My US-layout keyboard doesn't have £ or € keys, so it's more convenient to use $ (OK, maybe that's not an important reason)

When people visit the order page, FastSpring displays the prices in the user's local currency based on our USD price. We also set a fixed price in 5 other currencies (GBP, EUR, AUD, NZD, and CAD), not because we want to, but because it's required for FastSpring to accept bank transfers in those currencies as a payment method. I update those prices about once a month, and I try to ensure customers paying in other currencies aren't punished for it.

Getting paid in USD

FastSpring (and I'm sure other resellers) can pay us in multiple ways. We could choose to be paid in Australian dollars, but since the totals are calculated in US dollars, we'd be receiving an exchange rate at potentially a pretty bad time. Instead, I'd rather receive the funds in USD, and convert them when the rate is more favorable. To do this, we set up a USD denominated bank account with the Commonwealth Bank. Once a month, FastSpring sends a wire transfer to that account. They also charge us $15 to send that transfer (since their bank charges them $35).

Paying bills in USD

Although our bank allows us to hold funds in USD, we can't spend them easily - wire transfers are the only way to get funds out of that account. This is unfortunate because many of our bills are in US dollars, including Amazon EC2 and Windows Azure bills, along with much of the software we buy including JetBrains products, RavenDB licenses, and so on.

Previously I've been converting the US dollars to Australian dollars, and then paying the bills online through an Australian Mastercard, which applies another exchange rate to pay in USD. Of course, the banks sting you both ways with punishing exchange rates, taking about 8% all up in the process.

However, I think I've found a better alternative. FastSpring can make deposits to a Payoneer debit card, which is USD denominated. I'll be trying this over the next few weeks and I'm hoping it will allow me to avoid the USD->AUD->USD round trip when paying our expenses.

Paying out in AUD

While the business is pretty virtual and our expenses are mostly online, I'm a real person and I need to pay for groceries in Australian dollars. So, at some point I need to convert the USD to Australian dollars as dividend payments.

This is unfortunate because the Australian dollar has been trading at record highs against the USD for some time (chart courtesy of Yahoo):

AUD vs USD, 5 year chart

(Speak to anyone in Australia this week, of course, and they'll blame the high Australian dollar for their woes. The local General Motors boss seemed to blame it as the cause for slashing 500 jobs, and even Australia's central bank can't make a profit because of the dollar. Our treasurer and shadow treasurer both seem to agree that the high AUD is a big problem. But, as is typical of Australian politics, aside from Bob Katter no one wants to actually do anything about it.)

Initially, I was using the Commonwealth Bank to convert the USD to AUD when I needed them. But they seem to take about three cents on top of the exchange rate in both directions, so they're taking a pretty steep cut. I've found much better rates through OzForex; for example, at the time of writing the CBA will give AUD $0.9234 for every USD $1, compared to OzForex giving AUD $0.9603.

GST

GST is Australia's version of sales tax or VAT. Normally an Australian business would charge an additional 10% when selling to another Australian business or consumer, which is remitted to the Australian Taxation Office every quarter. Like most sales taxes, it's designed so that the taxes you pay to other Australian businesses can be claimed as an offset, so that all you ever pay is 10%.

My business is interesting though, because we sell through a US-based reseller who is the merchant of record for the transaction. I can't charge them GST, and they can't charge Australian customers GST. But some of my expenses are local, which I do pay GST on. So every quarter, I file a business activity statement with a negative return to claim that back.

Summary and things I'd love to improve

I feel that running a small Australian-based business with customers overseas is hard. Our product is priced in USD, which means customers don't stop buying when the Australian dollar goes up (unlike other exporters who price in AUD). However we do have some costs in AUD, so the exchange rate can hurt. Finding ways to pay our overseas expenses without any currency conversion will hopefully help.

I think financial services in Australia could definitely do more to help small businesses and consumers deal with foreign currency. Foreign currency accounts shouldn't have so many fees (between account fees and transaction processing fees I'm paying about $70/month to the bank) and there should be ways to link them to debit cards so they can be used to pay for the kinds of expenses a small business has.

When I first started selling Octopus Deploy licenses, the initial orders came directly from small-medium companies, and most were paid using a credit card. Now, approximately a fifth of our orders are coming as purchase orders, and many of those now come from resellers buying software on behalf of corporate clients.

I didn't have any experience dealing with this before going into business for myself, so I've been making it up as I go along. It was all new to me, so perhaps some of it will be new to you, too. In this post, I'll share my current strategy for dealing with purchase orders and resellers. I'd also love your thoughts on how it could be improved.

Background

Octopus Deploy Pty. Ltd. is an Australian company. Licenses for our software are sold online through FastSpring, a Californian company. Approximately half of our sales are to US customers, then the UK, then Australia, then other countries.

Quotes

Usually the first step in the process is that an email comes to our sales email address, asking for a quote. Someone has heard of the product, tried it, decided to buy it, but before they can begin the process of purchasing a license, they need the price written on a quote that they can record in their system.

Our pricing model is simple and on the website, so there's usually no surprise in the price. We do provide discounts to some resellers (which I'll go into shortly) but overall there's nothing complex here.

What does the quote look like? Well, it's just an invoice, except it says 'quote' at the top instead. Here's an example.

Quote PDF

I use Xero for accounting (a service I can't recommend highly enough; it truly makes bookkeeping fun), and initially I also used Xero to create the quotes. But each time, I needed to set the customer up as a contact, then create the quote, and so on. Now I just use a Word document template and save it as a PDF.

Each quote has a unique quote number at the top. This number is generated by a very sophisticated algorithm:

  1. I type "ODQ", which is short for "Octopus Deploy Quote"
  2. Then I mash the numbers on my keyboard
  3. There's no step 3

I don't currently keep track of the quote numbers, since they don't really matter from my perspective.

W9 forms

The IRS require (some? all?) American companies to record information about companies they purchase software from, which is usually supplied as a W9 form. Customers will occasionally email me to fill in such a form before they can place an order.

The Form W-9, Request for Taxpayer Identification Number and Certification, serves two purposes. First, it is used by third parties to collect identifying information to help file information returns with the IRS. It requests the name, address, and taxpayer identification information of a taxpayer (in the form of a Social Security Number or Employer Identification Number). The form is never actually sent to the IRS, but is maintained by the person who files the information return for verification purposes.

Since the company is actually buying the software from our reseller (FastSpring), it's FastSpring's W9 form that they need. Sometimes I've been asked to provide a W8 form (which is used when purchasing from a non-US company), but this is because the customer assumes they are buying from us directly.

Purchase orders

Once the customer has a quote, they'll create a purchase order in their system. This usually gets emailed to us as a PDF. I manually enter those order details into FastSpring, and the customer gets an email to let them know the order is ready to be paid. I usually also save the invoice as a PDF and email it to the customer.

The invoice is created by FastSpring and usually looks like this:

Invoice example

Finally, while FastSpring do provide an option to enter purchase orders online (so there's no need for me to be involved) I find it's usually easier to just ask customers to email the purchase order to me, especially where resellers are involved.

Deliver before or after payment

Normally, when an order is placed, we don't send the license key until payment has actually been received. Most people pay by credit card, so their license key is generated and delivered within a few minutes of ordering. But customers using a purchase order normally expect to be able to wait 30 days or so before making payment, and they like to pay using check/money order.

When a purchase order arrived, I used to generate a 45-day trial license key and send it to the customer manually. That way they could use the product in full, and by the time the trial expired, they would have their real license key because the order should have been paid.

This actually caused a lot of problems for large customers, because they need to record in their configuration management system that they're using a trial license for production deployments. And if resellers are involved (more on that later), there might be confusion as to the terms of payment, so the payment process can drag on.

Then I found out that FastSpring provide an option to deliver the license key before payment is received:

Purchase order delivery options in FastSpring

So now, my process is to accept the purchase order and deliver the license key right away, trusting that customers big enough to use a purchase order will eventually pay. I haven't had any problems with this so far, and the customer is usually happier as a result.

Resellers

A reseller is usually engaged to purchase software on behalf of a customer.

In an ideal world, resellers know about your product and are out there advertising and promoting it to their customers. When they make a sale, they keep the difference between what you sell it to them for and what they sell it to their customer for; usually you give them a discount to make that worthwhile.

In reality, all of the resellers I've had have gotten in touch this way:

  • Bob is a developer lead, he learns how wonderful Octopus is for ASP.NET deployment, uses it, and wants his company to buy a license
  • The organization where Bob works has a policy that they only buy software through approved resellers
  • Bob asks his reseller to buy the license for him
  • Bob's reseller contacts me to ask for reseller pricing

Why does Bob's company have a reseller? It seems to be down to accounting: Bob's company would prefer to pay a handful of invoices for software purchases a month rather than hundreds (and I can't fault that). Resellers probably provide value to Bob's company beyond that when it comes to bulk purchases of more mainstream software, but for a small business like ours, that seems to be the extent of it.

When asked to provide reseller pricing, I point them to this page on partners and bulk licenses, which has a table of discounts:

Licenses         Discount
1 license        No discount
2-9 licenses     10%
10-14 licenses   15%
15+ licenses     20%

We do offer discounts to anyone who buys multiple licenses. But for one-off licenses (even from resellers), we don't provide any discount. As explained in this Business of Software thread:

When a reseller is contacting you it is because a customer has requested your product. They are buying from you no matter what and there is no reason to offer a discount. This is particularly true for the larger resellers: they are not promoting your product, they just handle the order for the client.

So far we haven't had anyone decide not to go through with an order through a reseller because they didn't get discounted pricing. I don't know whether resellers charge anything on top (making it more expensive to use the reseller than to buy direct), but I suspect that for small purchases like ours they probably don't.

TL;DR

I've learned that quotes, W9 forms, purchase orders and resellers are all part of the business of selling software. They're all pretty simple at the end of the day, though they can be somewhat time consuming to deal with. As my business grows I'll no doubt need to find more streamlined ways to handle them, but for now my current process seems to be working OK.

Do you have a suggestion on how the above could be improved, or a tip to share? Leave a comment in the box below.

A question on StackExchange is "What's the best platform for blogging about coding?", to which the accepted answer is, of course: make your own.

This blog was originally written by me, then rewritten, and that rewrite went on to become FunnelWeb. This weekend I rewrote it again, for the third time.

One of the concepts I was going for in this rewrite was to make the home page more of a profile and less of a plain old list of posts. I also wanted to make it more likely that I would blog something, as opposed to just Google+'ing it or Tweeting.

So for example, while the home page looks like this for you:

Home page

I see a box to compose posts quickly:

Composing

Is it better than WordPress/Blogger/all of the other services out there? Probably not. But hopefully you'll see me blogging more frequently.

A few weeks ago I received the following bug report for Octopus from a customer:

We have recently rolled out Octopus deploy [..] and we are having severe memory leak problems on the tentacles. After only 1 deploy one of the tentacles had used 1.2GB ram (Yes gigabytes). The package was quite large - circa - 180MB - but there must be a serious leak somewhere. Other projects are having similar issues - up-to about 600 MB memory usage.

I put together a test harness and created a NuGet package containing a pair of 90MB files. The harness simply used the NuGet.Core library's PackageManager class to install the package to a local folder. 55 seconds later, having used 1.17GB of memory (as measured by GC.GetTotalMemory(false)), NuGet had finished extracting the package.
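The harness was nothing fancier than this (a sketch from memory; the paths and package ID are placeholders):

using System;
using NuGet;

class Harness
{
    static void Main()
    {
        // Install a local .nupkg the same way Tentacle did at the time
        var repository = PackageRepositoryFactory.Default.CreateRepository(@"C:\Temp\Packages");
        var packageManager = new PackageManager(repository, @"C:\Temp\Install");
        packageManager.InstallPackage("HugePackage", new SemanticVersion("1.0.0"), ignoreDependencies: true, allowPrereleaseVersions: false);

        Console.WriteLine("Approximate managed memory: {0:n0} bytes", GC.GetTotalMemory(false));
    }
}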

The good news is that given time, the memory usage reduced to normal, so the GC was able to free the memory (though much of it stayed allocated to the process just in case). The memory wasn't being leaked, it was just being wasted.

Weirdly, the NuGet API is designed around streams, and the System.IO.Packaging classes that NuGet depends on are also designed around streams. Looking into the implementation, the problem seemed to come down to NuGet.Core's ZipPackage class.

When you ask a NuGet.Core ZipPackage to list the files in the package (GetFiles()), the implementation looks like this:

private List<IPackageFile> GetFilesNoCache()
{
    using (Stream stream = _streamFactory())
    {
        Package package = Package.Open(stream);

        return (from part in package.GetParts()
                where IsPackageFile(part)
                select (IPackageFile)new ZipPackageFile(part)).ToList();
    }
}

The ZipPackageFile constructor is being passed a PackagePart, which exposes a stream. But what happens with that stream?

public ZipPackageFile(IPackageFile file)
    : this(file.Path, file.GetStream().ToStreamFactory())
{
}

The ToStreamFactory call looks innocuous, but here's the implementation:

public static Func<Stream> ToStreamFactory(this Stream stream)
{
    byte[] buffer;

    using (var ms = new MemoryStream())
    {
        try
        {
            stream.CopyTo(ms);
            buffer = ms.ToArray();
        }
        finally 
        {
            stream.Close();
        }
    }

    return () => new MemoryStream(buffer);
}

That's right - it's reading the entire stream into an array, and then returning a new MemoryStream populated by the array anytime someone requests the contents of the file.

The reason for this appears to be that while System.IO.Packaging is designed to use streams which need to be disposed, the NuGet API and classes like ZipPackage are intended to be passed around without needing to be disposed. So instead of opening/closing the .nupkg file to read file contents when required, it copies it to memory.

This isn't a problem when your packages are less than a few MB, but it's pretty harmful when you're distributing applications.

After spending half a day trying to patch NuGet.Core to avoid reading the files into memory, in hopes of sending a pull request, I found that other people had also tried and been rejected - it seems like this is a problem the NuGet team plan to solve in an upcoming release.

Instead, I gave up and decided to write a package extraction function to suit my needs. This gist extracts the same package in 10 seconds using only 6MB of memory:

public class LightweightPackageInstaller
{
    private static readonly string[] ExcludePaths = new[] { "_rels", "package\\services\\metadata" };

    public void Install(string packageFile, string directory)
    {
        using (var package = Package.Open(packageFile, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            var files = (from part in package.GetParts()
                         where IsPackageFile(part)
                         select part).ToList();

            foreach (var part in files)
            {
                Console.WriteLine(" " + part.Uri);

                var path = UriUtility.GetPath(part.Uri);
                path = Path.Combine(directory, path);

                var parent = Path.GetDirectoryName(path);
                if (parent != null && !Directory.Exists(parent))
                {
                    Directory.CreateDirectory(parent);
                }

                using (var fileStream = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.Read))
                using (var stream = part.GetStream())
                {
                    // Stream each part straight to disk; nothing is buffered in memory
                    stream.CopyTo(fileStream);
                    fileStream.Flush();
                }
            }
        }
    }

    internal static bool IsPackageFile(PackagePart part)
    {
        var path = UriUtility.GetPath(part.Uri);
        return !ExcludePaths.Any(p => path.StartsWith(p, StringComparison.OrdinalIgnoreCase)) &&
               !PackageUtility.IsManifest(path);
    }
}

There are other parts of NuGet.Core that break when large files are used. For example, some NuGet repositories use SHA hashes so that consumers can verify a package download. This is implemented in NuGet.Core's CryptoHashProvider. The method signature?

public interface IHashProvider
{
    byte[] CalculateHash(Stream stream);

    byte[] CalculateHash(byte[] data);

    bool VerifyHash(byte[] data, byte[] hash);
}

Again, instead of passing the stream (even though the underlying crypto classes accept streams), it reads the entire file (180MB in this case) into an array just to hash it.
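For contrast, the framework's hash classes consume streams natively, so hashing a large package without buffering it takes only a few lines (a sketch; NuGet repositories typically use SHA512 package hashes):

using System;
using System.IO;
using System.Security.Cryptography;

class StreamHashExample
{
    static void Main()
    {
        // ComputeHash(Stream) reads the file in small chunks internally,
        // so memory usage stays flat regardless of the package size
        using (var sha = SHA512.Create())
        using (var stream = File.OpenRead(@"C:\Temp\Packages\HugePackage.1.0.0.nupkg"))
        {
            byte[] hash = sha.ComputeHash(stream);
            Console.WriteLine(Convert.ToBase64String(hash));
        }
    }
}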

Here's hoping these problems are fixed soon. For now, at least, Octopus has a workaround that's only a few lines of code and performs much faster. As of Octopus 1.3.5, you'll be able to install large packages without dedicating most of your memory to it.

I'm working from home now, so staying productive is something I'm thinking about a lot. Because I am generous, I shall share my top five productivity improvement tips.

  1. Subscribe to every productivity blog you can find, and spend at least three hours a day reading top-X blog posts on improving productivity. Since the best tips come first, a lower value for X is better. Don't waste your time reading anything that can't be summarized in at most ten bullet points.
  2. Commit yourself to reading at least three books a week on productivity. Large books that really only have one central idea, usually made clear in the title, work best. Only buy books from well-known authors who make a living from running conferences on productivity improvement.
  3. Start every day by committing two hours to making a list of goals for the day. Start with the biggest and hardest items, to increase the chances of never having a 'win' in a single day.
  4. End every day by taking an hour to berate yourself for not achieving anything on your list. Remember: even if you achieved 90% of your goals for the day, you are a failure. The guilt will make you work harder tomorrow (you loser).
  5. Leave comments on other people's posts about productivity, to explain to them why they are wrong. Tell them how some simplistic system you learnt from a book is working wonders for you and they should give it a try. Ensure each comment contains at least 500 words. After all, you're so productive you have time to spare.

Gratuitous Dilbert comic

I hope this post has helped :)