Automated Coder

Exploring the Code of CruiseControl.Net

Archive for September, 2009

Cutting Down the Memory

Posted by Craig Sutherland on 30 September, 2009

The Problem

This is a continuation of my earlier post on memory issues (read it here). In that post I made some minor changes to reduce memory usage (mainly around string handling). However, those changes didn’t fix the underlying problem – CruiseControl.NET is working with large strings in memory.

Now normally the OS and .NET will allow us to play around with large objects in memory without too many issues. Normally…

Sometimes it is possible to use up all the available memory (including swap files, etc.). Why? Because we put too many big strings into memory – and that’s exactly what CruiseControl.NET does. And to make things worse, the way CruiseControl.NET works compounds the problem.

So, how does CruiseControl.NET work?

First off, we have the server. This is basically a polling application – every five seconds it checks if a build can start and, if so, triggers the build (actually it’s a little bit worse than that: each project has its own thread which polls every five seconds, but that’s another matter altogether). As the build runs it builds up a set of task results, which then get merged together into one big log file:

image

So, it is possible to have multiple projects, each having multiple large task results in memory. Then, when these get merged into the build log, that’s even more memory used. But wait, there’s more!

As well as the project threads, the server is also responsible for serving results to anyone who requests one of these build logs. This is done by loading the log into memory and then returning the entire log (via .NET Remoting) to the client:

image

So, as well as having multiple task results and multiple build logs in memory, it is also possible to have multiple instances of the build logs – one per request handler (although in theory these would be quick requests and shouldn’t hold onto memory.) And yes, there’s still more!

The CruiseControl.NET server only allows .NET Remoting clients – which isn’t very helpful if you need to go through a firewall. Plus, people need some way of seeing the results from the tasks – CCTray doesn’t display them. So to handle these situations there is the Web Dashboard. This functions as a client application and does all sorts of things – including manipulating build logs in memory (normally an XSL-T transform):

image

And where does the Web Dashboard normally sit? On the same machine as the server (unless you’ve changed the default install and installed the dashboard on a separate machine).

So, this gives us a picture of having large strings in memory in several places. The project threads might be okay by themselves, but when you add the communications side, it is very easy to start consuming large amounts of memory with big build logs. Especially when there are a sizeable number of people trying to get results from a build (and what’s the first thing people do when they see a broken build – they go to the server to see what’s broken!)

Pruning Some Memory

Now that we’ve quickly reviewed the possible places where build logs are stored, what can we do to reduce the amount of memory?

Two initial approaches come to mind:

  1. Reduce the size of the log file
  2. Reduce the number of times each log file is loaded into memory

Option #1 can be done either in CruiseControl.NET or externally. There are three ways to go about it: split the mega-log into smaller logs, write less data, or compress the data.

I’ve been looking at what I can do to split the mega-log file into separate log files per build result, so a client could just grab the data they need (e.g. just the NAnt/NUnit/MSBuild/etc. results) instead of everything at once. Unfortunately this is a MASSIVE change that will need to modify both the server and the clients in a large number of places. So for the moment I have spun off a “play” branch in the Subversion repository where I will attempt to do this (oh, and resolve all the associated problems with having multiple result files). So this is a no-go for the current release.

Next, writing less data – well CruiseControl.NET doesn’t put that much CruiseControl.NET-related data into the log files. Most of the data comes from the external tools. So if this needs to be done, then people need to reduce the amount of data the tools generate – hence this becomes an external task.

So, that leaves compressing the data. Personally I like this approach as it would also reduce the amount of network traffic. Unfortunately there are a couple of gotchas – there is no guarantee that compression would actually save space, and it requires changes to the clients to decompress the data. But I still think the idea has merit.
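To make the compression idea a little more concrete, here is a minimal sketch using the GZipStream class from System.IO.Compression. The LogCompression class and its method names are made up for illustration and are not part of CruiseControl.NET. Note that this version still holds the whole log in memory while compressing, so it only helps with storage and network traffic.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, not a CruiseControl.NET class.
public static class LogCompression
{
    // Compresses a build log into a gzip byte array.
    public static byte[] Compress(string log)
    {
        byte[] raw = Encoding.UTF8.GetBytes(log);
        using (var buffer = new MemoryStream())
        {
            // The gzip stream must be closed before the buffer is read,
            // otherwise the final compressed block is never flushed.
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return buffer.ToArray();
        }
    }

    // Reverses Compress(); this is the part every client would need.
    public static string Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

A repetitive log should shrink considerably, but a very short string comes out larger than it went in because of the gzip header and trailer, which is the first gotcha in miniature.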

Returning to the two initial approaches, the other option is to reduce the number of times each log is loaded. This is the approach I want to investigate some more.

Log Viewing Usage Patterns

From my experience with CruiseControl.NET, there are two general usage patterns:

  1. THE BUILD HAS BROKEN – what’s wrong?
  2. What happened to an older build?

Typically the second pattern happens as part of pattern #1, when people try to find out what has changed to cause the build to break. Very rarely do people go and look at a historical build for its own sake.

The following picture sums up this pattern, with the size of the arrow indicating how much usage a build log would get:

image

So, most people only view the logs when something is broken and they primarily focus on the log of the build that actually failed. A smaller set of people would also check the previous successful build to see if there is anything pertinent to the failure. Finally a very small set of people might go through older logs to see what has happened (your manager is checking up on the amount of work you’ve been doing, etc.)

This sounds like a very good scenario for caching. Since there is a (potentially) small amount of data, this could be cached on the server, the dashboard or even both.

Bring on the Cache(s)

On the server side, we could just cache the log files – although we wouldn’t want to cache too many big ones.
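As a rough illustration of what a bounded cache might look like, here is a small least-recently-used sketch. LogCache, its capacity parameter and the load delegate are all hypothetical names for this example. It counts entries rather than bytes and it is not thread-safe, so a real server-side cache would need locking and probably a size limit as well.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical least-recently-used cache for build logs.
public sealed class LogCache
{
    private readonly int capacity;
    private readonly Dictionary<string, LinkedListNode<KeyValuePair<string, string>>> index
        = new Dictionary<string, LinkedListNode<KeyValuePair<string, string>>>();
    // Most recently used entry first, least recently used last.
    private readonly LinkedList<KeyValuePair<string, string>> order
        = new LinkedList<KeyValuePair<string, string>>();

    public LogCache(int capacity)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException("capacity");
        this.capacity = capacity;
    }

    public int Count { get { return index.Count; } }

    // Returns the cached log, or calls load() once and caches the result.
    public string Get(string buildName, Func<string> load)
    {
        LinkedListNode<KeyValuePair<string, string>> node;
        if (index.TryGetValue(buildName, out node))
        {
            // Cache hit: move the entry to the front of the list.
            order.Remove(node);
            order.AddFirst(node);
            return node.Value.Value;
        }

        string log = load();
        node = new LinkedListNode<KeyValuePair<string, string>>(
            new KeyValuePair<string, string>(buildName, log));
        order.AddFirst(node);
        index[buildName] = node;

        // Over capacity: drop the entry that was used longest ago.
        if (index.Count > capacity)
        {
            LinkedListNode<KeyValuePair<string, string>> oldest = order.Last;
            order.RemoveLast();
            index.Remove(oldest.Value.Key);
        }
        return log;
    }
}
```

With a small capacity this would keep the broken build and the last good build in memory, which matches the usage pattern above, while older logs get reloaded from disk on the rare occasions someone asks for them.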

The dashboard offers us a few different possibilities for caching. First, we could take the same approach as for the server and cache the log files. Second, we could cache the parsed XML documents (since the entire document must be loaded and parsed before it can be transformed). Finally, we could cache the transformed output.

However, like anything in the dashboard, the way the build logs are transformed is not a simple process. Instead we have a number of interfaces and their implementations to go through. The build report generation is a plug-in, which generates multiple actions. Each action can have one or more style sheets, which actually get loaded and processed by a class in the Core library.

To Be Continued…

I’ve gone back and reviewed the memory usage scenarios for CruiseControl.NET with a view to reducing memory usage on the server. This meant taking a step back and looking at the wider picture: everything that CruiseControl.NET is doing on a (server) machine. Unlike my previous post, this takes into account multiple users doing similar things at the same time.

I’ve come up with a couple of ideas for reducing memory usage – caching and compression – and both of them have some gotchas that need to be investigated.

So rather than making this an even longer post, I’m going to stop at this point and look into caching and compression in some future posts.

So stay tuned, more fun to come :-)

Posted in CruiseControl.Net | 5 Comments

An Experimental Branch for CC.NET

Posted by Craig Sutherland on 28 September, 2009

Recently I’ve been exploring a few options for refactoring CruiseControl.NET to handle the out of memory exceptions that we have been getting. Unfortunately a lot of the paths I’ve investigated have turned out to be red herrings – they either have wide-reaching repercussions or they involve significant changes to the server. I’ve even tried profiling the application using dotTrace, without finding any obvious areas for improvement.

So, it’s time to get a little more drastic. Since we are aiming to release the 1.5 version sometime this year (hopefully), I have started a new “experimental” branch. In this branch I’m planning on looking at some more drastic refactoring to resolve the memory issues, plus a couple of other ideas I want to try out for a “CruiseControl.NET 2.0”.

And in case you are wondering, here are some of the ideas I want to try out:

  • Distributed builds: being able to take a single project and distribute it over multiple machines
  • Build agents/build distributions: extend CruiseControl.NET so there can be master/slave instances, with the master being responsible for distributing build requests across multiple machines
  • Data storage layer: move all of the file I/O into a common layer – this is in preparation for adding database persistence

Now, these are just ideas at the moment – with my current amount of free time it’ll be a while before any of these come to fruition. But I’ll be documenting things as I go, as usual, so if you are wondering why I am writing about these things, this is why.

Posted in CruiseControl.Net

FastForward.NET: Beta 4 Release

Posted by Craig Sutherland on 19 September, 2009

It’s been a while; I’ve been sidetracked trying to track down some performance issues with CC.NET. I have just posted the binaries for the fourth beta – although most of the changes have been around for a few weeks. The following items have been added/fixed:

  • Swapped Ok and Cancel buttons
  • Cleaned up the settings dialog so the name of the tab is not on each button within the tab
  • Added a double-click action to the all projects grid – the user can choose which action to perform
  • Added servers and projects to the system tray – clicking on a project triggers the double click action

I am running this as my CC.NET monitor and it seems to be working. I know there are a couple of issues with configuration that need to be resolved, otherwise it is ready for release :)

The binaries can be downloaded from https://www.ohloh.net/p/FastForwardNET/download.

Posted in FastForward.NET | 3 Comments

FastForward.NET Relocated

Posted by Craig Sutherland on 18 September, 2009

I’ve been having a number of issues with SourceForge – mainly with its performance. And I’m not the only one – one of the other developers for CruiseControl.NET has been so frustrated that he has set up his own Subversion server, together with a Trac instance.  Additionally, he has been kind enough to let me host FastForward.NET.

So, everything has been moved from SourceForge. I will keep an eye on SourceForge for any issues that people may raise, but I won’t be updating it at all.

So, here are the links for FastForward.NET:

Many thanks to Daniel Nauck for letting me use his servers.

Posted in FastForward.NET | 2 Comments

Memory Issues with CruiseControl.NET

Posted by Craig Sutherland on 16 September, 2009

The Problem Defined

I’ve been spending a bit of time trying to resolve a number of outstanding issues in JIRA about running out of memory with CruiseControl.NET:

These are all different examples of getting an OutOfMemoryException when performing a build. Basically a task generates a large output, which CC.NET then attempts to merge into the standard build result. Unfortunately some of these issues have been around a long time (CCNET-819 was first raised in Jan 2007!) which implies that this is a fairly deep rooted issue.

This post contains what I’ve found out so far, and some of my changes to try and reduce the memory usage. Unfortunately, I say “try” as this is both a hard problem to replicate and a hard problem to resolve!

Some Investigation

Looking at the stack traces, the basic issue is with strings. CC.NET is loading the entire build log into memory and manipulating it. Even worse, it can be getting various parts of the results and manipulating them, before writing them into the build log.

When a task executes, it can generate multiple ITaskResult instances. These instances have a Data property which is a string. As you can imagine, this leads to the data being loaded into the various implementations and sticking around until the build has finished. So, if a task generates a 20MB output, this is added to memory and held for the entire remaining duration of the build. Actually, it’s probably held even longer, as the memory will not be released until a garbage collection is performed.

But, don’t worry, things are even worse! As ITaskResult is an interface, there are a number of different implementations. The implementations I have found are:

  • DataTaskResult
  • FileTaskResult
  • ProcessTaskResult

DataTaskResult is the simplest of the three – it just provides a backing field for the property that contains the string. When this class is initialised the string is loaded and held there until the result is cleaned up. While this is the default result generated, from what I can see in the code it is not actually used (except in the null task.)

FileTaskResult is a view onto a file. Typically a task will generate a file (e.g. when an external application is called) and then this result type is generated to reference the file. Now for the bad news – when this result is generated it opens the file and loads the entire file into memory! So if the task generates a huge file (e.g. NCover results for a large code base, etc.) the file is loaded into memory and hangs around like a bad smell :-(

The final result, ProcessTaskResult, is the most complex of the three. It is also the cause of one of the issues. When an external task is executed it normally uses the ProcessExecutor class. The output of this class is a ProcessResult, which contains all the output written to StandardOutput and StandardError. And yes, these are both stored as strings. ProcessTaskResult manipulates these strings to generate the final output, and that’s where the problems start coming in.

Some Background

If you are wondering why I am picking on strings at this point, here is some background. In .NET strings are immutable. Once a string has been allocated it cannot be changed. But what about the string manipulation functionality? Unlike C++, these functions do not modify the string; instead they generate a new instance of the string (with the modifications of course), which is then another immutable string in memory.

So for example, if we had the string “This is a test   ” and we wanted to remove the extra whitespace we would call string.Trim(). At this point we now have two strings in memory: “This is a test   ” and “This is a test”. Even if we assigned the new string to the old string variable, this is still the case.
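A tiny program shows the effect. The TrimDemo class is just a throwaway name for this example.

```csharp
using System;

public static class TrimDemo
{
    public static void Main()
    {
        string original = "This is a test   ";
        string trimmed = original.Trim();

        // Trim() left the original untouched and returned a second string.
        Console.WriteLine("[" + original + "]");   // [This is a test   ]
        Console.WriteLine("[" + trimmed + "]");    // [This is a test]
        Console.WriteLine(object.ReferenceEquals(original, trimmed)); // False

        // Re-assigning only points the variable at the new string; the old
        // one stays in memory until the garbage collector reclaims it.
        original = original.Trim();
        Console.WriteLine("[" + original + "]");   // [This is a test]
    }
}
```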

So, when is the old string removed from memory? When garbage collection occurs. So on a heavily loaded machine where garbage collection is running slowly, these strings can hang around for a while.

Of course, for my short example, this isn’t really a problem – most machines could hold millions of strings like these without any problems. But imagine if the string is 20MB in size (this isn’t too out of the ordinary for some processes). All of a sudden the memory will be chewed up very quickly (especially as the OS likes to take a fair chunk).

At this point, I should mention that most OSs nowadays can use disk swapping to extend the amount of available memory. However, problems occur if a large enough contiguous block of free space cannot be found for the allocation – this is typically what is causing the out-of-memory errors. RAM has been filled and the OS is unable to swap out some memory.

So, short of playing around with garbage collection (which I have no intention of even attempting), our best approach is to reduce the number of strings we are generating.

Starting Small

As I mentioned earlier, FileTaskResult loads the entire file into memory when the class is initialised. The first change is to load the file only when it is needed, and not store a reference to the string. Garbage collection works by detecting whether an object has been orphaned – that is, whether there are any references to the object. If there are no references, then garbage collection will remove it (at least that is my understanding).

So now file results will only be loaded when they are needed, and then disposed of as soon as possible (again depending on garbage collection). This gives us a little more room to manoeuvre. But it’s only the tip of the iceberg.
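In outline, the load-on-demand idea looks something like the sketch below. LazyFileResult is a simplified, hypothetical stand-in and not the actual FileTaskResult code.

```csharp
using System.IO;

// Hypothetical stand-in for a file-backed task result.
public sealed class LazyFileResult
{
    private readonly string path;

    public LazyFileResult(string path)
    {
        // Only the path is remembered; nothing is read at this point.
        this.path = path;
    }

    // Reads the file each time it is asked for and keeps no reference,
    // so the string can be collected once the caller lets go of it.
    public string Data
    {
        get { return File.ReadAllText(path); }
    }
}
```

The trade-off is that the file is read again on every access to Data, so this only pays off if the data is asked for rarely.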

Strings upon Strings upon Strings

Looking at the way ProcessTaskResult works, and how the instances are generated, there are a lot of strings being generated.

As an example, in ExecutableTask it needs to check if there is any output from the executable. This output can be from either standard out or standard error. The literal line is:

if (!StringUtil.IsWhitespace(processResult.StandardOutput + processResult.StandardError))

This combines the two outputs together to generate a new string – hence twice the memory allocation. So, this can be changed to:

if (!StringUtil.IsWhitespace(processResult.StandardOutput) || !StringUtil.IsWhitespace(processResult.StandardError))

which means the original strings are used instead of generating a new string.

Next, digging a little deeper, this is how StringUtil.IsWhitespace() works:

return value.Trim().Length == 0;

As you can see, this is generating another string! If the string is all whitespace, then there is no overhead – it will just generate an empty string, no matter how large the original was. But if the original had 20MB of non-whitespace, there is now a second 20MB string generated!

I’ve replaced this approach with a slightly more complex approach. First I check if the string is null or empty – if it is then the string is considered whitespace. If it is not null or empty, then the new routine iterates through every character in the string and checks if the character is a whitespace character (using char.IsWhiteSpace()). If the character is not whitespace, the loop is exited and false is returned. Otherwise, it will continue through the entire string and return true if no non-whitespace characters are encountered.

I’m not sure of the performance cost of this change, but I imagine the string data type will be doing something similar with its Trim() method, so it shouldn’t be too bad. Plus this has the advantage of not generating a new string – hence lower memory usage.
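In code, the character-by-character check comes out roughly like this. It is a sketch of the logic described above rather than the exact code that went into StringUtil, and the StringChecks class name is made up.

```csharp
// Hypothetical container class for the example.
public static class StringChecks
{
    // True if the value is null, empty, or contains only whitespace.
    // No new string is allocated, however large the input is.
    public static bool IsWhitespace(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            return true;
        }

        foreach (char c in value)
        {
            // Stop at the first non-whitespace character.
            if (!char.IsWhiteSpace(c))
            {
                return false;
            }
        }

        return true;
    }
}
```

For a large output that starts with real text, the loop exits on the first character, so in the common case this does far less work than trimming the whole string.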

There is a third change that I am thinking about, but I haven’t made it yet. When the ProcessTaskResult is generated, the caller often calls StringUtil.MakeBuildResult(). This converts the original newline-delimited string into an XML structure with an element per line (and so presumably yet another large string). However, I’m not sure whether changing this would provide any real improvement, so I’ve left it for the moment.
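For what it’s worth, one direction this could take is to do the conversion line by line with a reader and a writer instead of building a second string. The sketch below is purely an assumption about how that might look: the BuildResultSketch class and the “buildresults” and “message” element names are illustrative, and I haven’t checked them against what MakeBuildResult() actually produces.

```csharp
using System.IO;
using System.Xml;

// Hypothetical streaming version of a line-to-XML conversion.
public static class BuildResultSketch
{
    // Reads the input one line at a time and writes one element per line,
    // so only a single line is held as a string at any moment.
    public static void WriteLines(TextReader input, TextWriter output)
    {
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter xml = XmlWriter.Create(output, settings))
        {
            xml.WriteStartElement("buildresults"); // assumed element name
            string line;
            while ((line = input.ReadLine()) != null)
            {
                // WriteElementString escapes <, > and & in the line text.
                xml.WriteElementString("message", line); // assumed element name
            }
            xml.WriteEndElement();
        }
    }
}
```

If the output went straight to a file stream rather than a string, the full XML would never need to exist in memory at all, but that depends on everything downstream accepting a stream as well.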

Baby Steps

These are my first few baby steps to reducing memory usage in CC.NET. At the moment I’ve just been looking at the server. My initial steps have been to reduce the memory usage by reducing the number of strings generated and held in memory. While I’m hoping this will reduce memory usage, I don’t think it will have too much of an impact (big sigh).

The real problem, and the massive challenge, is to remove the strings from memory as much as possible.  I have tried looking into using streams, but this will be a massive change :-(

Second, this is only looking at the server side. Two of the issues are with the dashboard – which is a whole different area to look into.

Anyway, baby steps, I’ll continue looking into what can be done to improve the memory usage.

Posted in CruiseControl.Net | 6 Comments