Yes, It’s That Time Again!
Once again we are coming up to our annual trip to China to see the in-laws. This means I will be disconnected for the next three weeks, but I will have plenty of time on my hands. Last year I worked on a replacement for CCTray (Project Leo) which later became FastForward.NET. This year my plans are just as grand
Currently I am working on converting CruiseControl.NET to use streams instead of strings. Given that this is a big task, with lots of complexity, it’s been slow going. I’m aiming to finish this in my time away – plus look into some other possibilities for the concepts (like distributed builds).
Second, I’m looking at building a Silverlight front-end for CruiseControl.NET. I’ve been playing around with Silverlight at work, so this opens a few possibilities. This will be a standard plug-in for the dashboard, so people can choose to use it or not if they desire. The only downside is it will require .NET 3.5 and Silverlight 3 – so I’m not sure how well this will work on *nix environments.
So, you won’t hear much from me for a wee while, but I hope to have some exciting stuff soon!! Plus my Cantonese should be better by the time I return
Constant Improvement
As well as the core engine for CruiseControl.NET, there are a number of tasks that call external applications. These include tasks that perform:
- Code analysis
- Unit tests
- Run build scripts
And so on. It is very easy to integrate an external application with CruiseControl.NET – all it needs is a command-line interface.
As part of my job I am constantly looking for new tools and utilities that can improve the CI process at my work. When I see new tools or utilities I’ll give them a try. And slowly I build up a list of tools that I think would work nicely with CruiseControl.NET.
One of the acronyms that gets mentioned a lot now-a-days is DRY – Don’t Repeat Yourself. Of course, in theory this is a simple principle, but when the code base starts expanding, it becomes harder and harder to see duplication. To get around this, there are some tools that can be run to detect duplication.
For a long time I haven’t bothered with this analysis as most tools are commercial (and therefore cost money that I don’t have), but not that long ago somebody mentioned DuplicateFinder. This is a little command-line tool that will scan a codebase and report any duplicates.
So I decided to take a break from reducing memory and add in this tool as a new task.
Some Initial Decisions
DuplicateFinder consists of three files:
- The library that performs the analysis
- A command-line application
- A MSBuild task
So the first decision is which should I use? For pure speed, the library is the best approach, but this would require adding a new library the project, one that would only be used for a single task! And the MSBuild task is only useful within a MSBuild script.
So in the end I decided to call the application – even though this will be slightly slower and require an external dependency. But it means if people don’t want this analysis, they don’t have to have the binaries.
DuplicateFinder has three output options:
- Console – text
- Console – XML
- XML file
So, the next decision is which output to add, and in the end I choose console – XML for two reasons. First, CC.NET uses XML for its reporting, so this kept it nice and easy. And writing to the console means there is not an extra file floating around that needs to be cleaned up later.
After these decisions, it was easy to add the new task.
The New Task
The new task is called dupfinder. It has the following parameters:
| Name | Description | Type | Required | Default |
| executable | The path to the dupfinder executable. | String | No | dupfinder in the working folder |
| inputDir | The folder containing the files to scan. | String | Yes | n/a |
| fileMask | The file mask to scan for. There can be multiple values here separated by a space, e.g. *.cs *.vb. | String | Yes | n/a |
| focus | The name of a file to focus on – all other files will be compared against this file. | String | No | none |
| timeout | The period, in seconds, before the task will timeout. | Integer | No | 600 |
| threshold | The number of consecutive lines that have to be the same before an item is reported as a duplicate | Integer | No | 5 |
| width | The minimum number of non-space white characters in a line to be matched. | Integer | No | 2 |
| recurse | Whether to check all sub-directories (true) or just the selected folder (false). | Boolean | No | False |
| shortenNames | Whether to remove the input folder name from the file names (true) or not (false). | Boolean | No | False |
| includeCode | Whether to include the lines of code that were duplicated in the results (true) or not (false). | Boolean | No | False |
| excludeLines | Any lines to exclude from the analysis. | String array | No | none |
| excludeFiles | Any files to exclude from the analysis. | String array | No | none |
And here is an example of how to configure it:
<dupfinder recurse="true" width="5" shortenNames="true" includeCode="true" timeout="1200"><inputDir>C:\...\Trunk\project\remote</inputDir><fileMask>*.cs</fileMask><executable>C:\...\Trunk\Tools\DuplicateFinder\DupFinder.exe</executable><excludeLines><line>using System.*</line></excludeLines><excludeFiles><file>AssemblyInfo.cs</file></excludeFiles></dupfinder>
Some Extras
DuplicateFinder provides a nice simple XML output, but there were a couple of changes I wanted to it. First, the filename contained the entire path of the files. In CC.NET, the full path will point to somewhere on the server, with individual developers potentially having a completely different location.
So I have added the shortenNames parameter to the task. This will load the output XML and trim the filenames. This just iterates through all of the filenames and removes the input path from the name.
Secondly, the output didn’t tell me the code that was duplicated – it just has the starting position and the number of lines. To see the duplicated lines I need to open one of the lines, navigate to the lines and then look.
To get around this I added the includeCode parameter. This will add the lines that were duplicated into the results, so in the dashboard I can easily display them.
The Dashboard
Since this is an analysis type task, I have also added a couple of reports to the dashboard. First, in the project summary there is a summary of the analysis:
Clicking on the “Duplicate Finder Report” will bring up the detailed report:
Yes, nothing fancy, but it does allow someone to quickly see which files have duplicates and which lines are duplicated.
And to make it even easier to install, I have put together a dashboard package that contains the necessary files and settings. All an administrator needs to do is install the package and the report will appear in the dashboard
And Finally, the Why?
Now, any application that can be run from the command-line can be run from a NAnt or MSBuild script, or even via a exec task, so why do I add these new tasks to CruiseControl.NET?
One reason – integration! Adding a task directly to CruiseControl.NET means we can do some nice things with it. For one we can include the progress of the task in the project status – it’s nice to see that we’ve finished the build and are onto unit tests or code analysis, etc. Second, it means only one task – instead of a build task and a merge results task.
And finally, as I have done with the duplicate finder task, we can extend the basic functionality to include new functionality that will add value.
So, if you know of other tools that you think would be nice in CruiseControl.NET, let me know and I can investigate adding them in
Anyone who has installed CruiseControl.NET under Windows Vista or Server 2008 knows that CruiseControl.NET does not like IIS 7.0 (or later). The problem is very simple – IIS 7.0 uses an integrated pipeline for processing ASP.NET requests, while IIS 6.0 uses an ISAPI filter. This simple change introduced a new section in the web.config for integrated applications – system.webServer.
Just to complicate things the dashboard overrides the .aspx extension, basically so it can do it’s own version of MVC. When IIS 7.0 starts up it attempts to validate the configuration. It comes across the new HTTP handlers and basically says “I don’t know what to do!”.
The solution is actually very simple, just add the following section to web.config:
<system.webServer><validation validateIntegratedModeConfiguration="false" /><handlers><add verb="*" path="*.aspx" type="ThoughtWorks.CruiseControl.WebDashboard.MVC.ASPNET.HttpHandler,ThoughtWorks.CruiseControl.WebDashboard" name="CCNetHttpHandler"/><add verb="*" path="*.xml" type="ThoughtWorks.CruiseControl.WebDashboard.MVC.ASPNET.HttpHandler,ThoughtWorks.CruiseControl.WebDashboard" name="CCNetXmlHandler"/></handlers></system.webServer>
The validation element tells IIS 7.0 not to validate the old IIS 6.0 settings, while the handlers element adds the required handlers for the dashboard. Simple!
Now, in theory the dashboard should work from an install without anyone needing to change any settings
Introduction
This is the third post in converting CruiseControl.NET from using string to streams. The other posts are:
In this post I’m going to take another slight detour from the planned changed. The reason – to see whether the changes are actually working!
The reason why is pretty simple – one of the other devs on the project asked for some proof that my changes would reduce memory usage. Now my rough tests and knowledge of the system said they would, but I thought I should do some proper tests to prove it. And here they are!
Methods
The aim of this set of tests to see whether memory usage is reduced with the new stream-based processing. Since I’ve only converted one task at the memory, all my tests will use that task.
First I put together a small application that would write out “The Preventer of Information Services” to standard output 100,000 times. Using the same test, but writing to a file, generates a ~5 Mb file – not too small and not too big (previous tests on my machine showed I could handle files of up to ~50 Mb within CC.NET).
Next, I put together a simple ccnet.config that would run the application. I then ran this config using the –p switch to execute the project and stop (ccnet -p=Test). This allows me to quickly and easily run just that task without any confounding tasks.
Finally, I profiled the test using ANTS Memory Profiler 5.1 (see credits below).
I performed the same test with both the 1.5 codebase and the new 2.0 “experimental” codebase.
Results
Here are the results from running in 1.5:
This is looking at the private bytes – this is the amount of memory that has been allocated to the process.
And this is looking at the # bytes in all heaps (Gen 0, gen 1, gen 2 and large object).
There is a slow but steady increase in memory for around two thirds of the time, and then there is a huge massive jump in memory usage around the 2:20s mark. There’s also a jump in memory usage right at the end, but as the application is terminating it only appears briefly.
Overall execution time was around 3:12s. Total memory allocated was in excess of 128Mb, while the actual memory usage was fast catching up!
And now, the results from the 2.0 codebase:
There is a huge difference – in both graphs!
The private bytes only reached 32Mb – a quarter the amount of memory allocated for the 1.5 codebase. And the change is even more dramatic for the bytes in all heaps – I had to use the tooltip to find out how much memory was used – a mere 1.31 Mb at the most. Additionally, this high usage point was right at the beginning of the task executable, as opposed to a spike at the end. Finally, the memory usage is actually dropping at the end of the execution in the 2.0 codebase.
Finally, and this surprised me, the execution time was better for the 2.0 codebase (approximately 2:13s). However this might be because my machine is running at near physical memory capacity anyway, so the extra memory needs to come from the paging file.
Analysis
In the 1.5 codebase there are a number of memory jumps. I think these jumps are being caused by different steps in the process of persisting the results to the log file. The slow increase is due to the test task writing to standard output. The first big jump is when the task is completed and the data is being converted into an XML format. The final jump is when the data is actually being merged into the log file for writing to disk.
In contrast, the 2.0 codebase has a much lower memory usage level. The memory level is staying low because the output is being written directly to disk and then freed. The memory manager in .NET can then reuse the memory instead of needing to ask for more.
The second difference lies in the way the results are converted – this is done on a line by line basis in the new code. In the 1.5 codebase everything is done in memory, thus requiring sufficient memory for both the original string and the new string, plus any memory for working space. In the 2.0 codebase the conversion is done on a line by line basis. The line is loaded from the original stream, converted and written to the new stream. Again, a much smaller memory footprint being needed.
Finally, the spike at the end is eliminated as the data is already in the files, so there is no need to merge everything together (although in theory this could still be done.)
Summary
So, in conclusion, the stream changes have significantly reduced the memory footprint – at least for my simple test. Comparing the two codebases, the 2.0 codebase only requires a quarter of the total memory allocated, and internally less than 1% of the old heap allocations! For those machines like mine with only limited physical memory, the changes also result in faster performance!
Credits
I was able to perform these tests thanks to Red Gate. They provided a free software license of their .NET Developer’s Bundle to allow me to do some performance tuning, and I am very thankful to them for this (it turns out they use CruiseControl.NET, so hopefully this will help them too). Their website (http://www.red-gate.com/index.htm) contains the details on their products, plus a lot of useful information on how to use their tools.
For these tests I used the ANTS Memory Profiler, I haven’t tried the other tools in the bundle yet.
I really liked the way you could see what was happening in real-time, it was very easy to see the memory jumps and tie it into what was happening in the application. They also have a whole heap of additional information that is available – like snapshots of how the memory is being used and what has changed. But for this post the graphs had all the information – in future posts I’ll delve into some of the other information available.
So if you are looking for a memory profiling tool, I definitely recommend taking a look at the ANTS Memory Profiler, it takes the guesswork out of what is happening in memory
Continuation
In my last post on this subject (read it here) I added the concept of a task context. This is a context that the task runs within and stores all the output from a task. The next step is to start writing to this context.
One of the core concepts in the context is it generates streams that can be used by the task. The task is then responsible for managing the stream, but the context is responsible for managing the referencing to the stream. So, the trick to generate the streams from the context, pass it through to where they are needed in the tasks and then clean up when the task has finished with the streams.
And of course, that’s where it gets tricky!
The Current Situation
90% of the time, a task does not directly execute an external application. Instead it calls through a number of layers. This allows a number of cross-cutting functions to be built in, but at the same time it makes changes from strings to streams harder.
Here is the current way it works:
The calling task calls the TryToRun() method on BaseExecutableTask, which passes it onto ProcessExecutor and finally RunnableProcess. RunnableProcess internally creates two StringBuilder instances – one for standard output (StdOut) and one for standard error (StdErr). RunnableProcess also provides all the functionality necessary for getting the data from StdOut and StdErr and putting it into these two instances.
When the Run() method on RunnableProcess has finished, it generates a ProcessResult and stores the strings from the two StringBuilder instances into there. From here on, the StdOut and StdErr are stored as strings in memory.
So, the issue now becomes one of where should the streams be initialised? How many streams should be generated? And what do we want to store for future usage?
The New Situation
The main change I am making is removing the instantiation of StdOut and StdErr from within RunnableProcess to BaseExecutableTask. These will be instantiated as streams and passed through to RunnableProcess, where it is an easy enough change to write to the streams instead of a StringBuilder.
Additionally, I’m going to get BaseExecutableTask to generate two streams – one for StdOut and one for StdErr. These get passed down the chain. When the Execute() method on ProcessExecutor has completed, these two streams will be merged into one.
So, this shows how I am changing things:
On a side note, a ProcessResult will still be generated as this contains additional information needed for the tasks (e.g. exit codes, time-out details, etc.)
Based on this plan, most of the work is in the BaseExecutableTask, with minor changes to the other two classes. Additionally, I’m going to do some work around the merging of StdErr and StdOut in TaskContext, as this class is responsible for managing the references.
The Actual Changes
Most of the changes are straight-forward, and reasonably boring (except when I made a mistake and have to debug it!)
RunnableProcess and ProcessExecutor were both modified to accept streams and use them (instead of the internal StringBuilders.) Reasonably simple change – just needed to remember to close the StreamWriters I was using.
BaseExecutableTask got a new override for TryToRun() that uses the new streams functionality. This override includes the task name and type (required for creating a result). Additionally I added a couple of new protected virtual methods methods to allow people to override some of the functionality. This is the creating the result stream and merging results functionality (more on this below).
Finally TaskContext got a new method – MergeResultStreams(). This will literally merge two or more streams into a single stream. It also manages the references – the old references are removed and a new reference is added for the merged file. The merging is handled by a delegate, so individual tasks can define how the results are merged. The default merge is a binary merge – copy all the bytes from each of the streams into a single stream.
BaseExecutableTask defines a custom merge delegate. This merge delegate will merge all the results into an XML format – similar to how ExecutableTask does it currently. However this uses the streams to handle the merging and formatting, instead of directly manipulating strings in memory – so we shouldn’t have the out of memory exceptions
And that’s all there is to it – at least for this phase. At this point the new code breaks a number of other tasks – those that rely on the data being in a single massive string. Plus there are a number of additional overrides that I have added temporarily to reduce the amount of breaking changes.
So, What’s Next?
Looking over my previous what’s next list, I realise I’ve skipped over a couple of points – all of this is included in the task changing. I have modified Project to both generate the task context and associate it with tasks, but I think a bit more work is needed there. Additionally I need to look at the container tasks (e.g. parallel task, sequential task, etc.) to see what needs to change to pass on child contexts.
So, in my next post I plan on covering the “plumbing” for the contexts, and then I’ll return to modifying tasks.
Stay tuned…
Introduction
Earlier this year I added parallel and sequential tasks to CruiseControl.NET (read about them here). These tasks provided the ability to split a project into multi “streams” that ran in parallel. Additionally, projects within CruiseControl.NET already run in parallel – so this functionality has been around for a long time.
However, there are the occasional issue around having tasks running in parallel – and most of the time it revolves around access to a limited resource (e.g. a file on disk, etc.) This opens the possibility for a synchronisation task – which is what I have now added
Within a Project
Normally, within a project tasks run in sequence:
With the parallel and sequential tasks introduced in CruiseControl.NET 1.5, we can now have two (or more) sets of tasks running in parallel:
The problem comes when two (or more) tasks want to access the same resource at the same time, for example writing to a file:
Now if there is a slight timing difference, then this will work (either task 3 finishes before task 6 starts or vice versa). But when task 3 starts before task 6 finishes (or vice versa) then there is a potential resource conflict! The new synchronisation task handles this situation by wrapping the two tasks in a synchronisation context:
Tasks 3 and 6 are both wrapped in an instance of the new synchronisation task. When either of the wrapper tasks start, they check to see if the synchronisation context is free – if it is then the task is allowed to execute. Otherwise it waits until the context is free, and then starts the task.
This means only task 3 or task 6 will be running at any time – the two tasks will not be able to execute at the same time!
Between Projects
The same process also applies for common resources that are shared between projects. In this scenario we have two “streams” of tasks running in parallel:
Again, they might need to access the same resource:
Synchronisation in this scenario works in exactly the same way:
Again, the tasks will need to be wrapped in the new synchronisation task. This task generates the synchronisation context and ensures that only one task can enter the context at any one time.
And the good news is this is the same task for both within project and between projects!
Some Inner Workings
This section covers how the synchronisation task works, so if you don’t care about the technical details, just move onto the next section
Internally the task has a static table that contains a set of named synchronisation objects. When the synchronisation task starts, it first ensures that there is a synchronisation object in the table – if not it will add it. It then uses Monitor.TryEnter() to lock the synchronisation object. When the lock is entered, it then runs all the child tasks in sequence.
There are a few other small details:
- When the lock is entered, the next step is to enter a try/finally clause – this ensures that the lock will be exited, no matter what happens to the child tasks
- The TryEnter() uses a time-out – this can be configured, but defaults to 300 seconds (five minutes)
- The default context is called DefaultSynchronisationContext – the context can have different names
The last point is very important. If the name of the context is not changed for any synchronisation task, then there will only be one synchronisation context for the entire instance of CruiseControl.NET. Changing the context name allows multiple synchronisation contexts (e.g. one per project, one per shared resource, etc.)
Configuring the New Task
This is a wrapper task, similar to the parallel and sequential tasks. This means it is configured in the same way:
1: <tasks>2: <synchonised timeout="60">3: <tasks>4: <exec>5: <!-- Omitted for brevity -->6: </exec>7: </tasks>8: </synchonised>9: </tasks>
This example demonstrates running an exec task within the default context. If the context is not entered within 60 seconds, the task will fail.
1: <tasks>2: <synchonised context="sharedResource">3: <continueOnFailure>true</continueOnFailure>4: <tasks>5: <exec>6: <!-- Omitted for brevity -->7: </exec>8: <exec>9: <!-- Omitted for brevity -->10: </exec>11: <exec>12: <!-- Omitted for brevity -->13: </exec>14: </tasks>15: </synchonised>16: </tasks>
This example shows a named context, with multiple wrapped tasks. All of the tasks will run, whether or not they are successful.
Note: In CruiseControl.NET all simple properties can be set as either attributes or elements. continueOnFailure can be added as an attribute as desired and likewise context could be an element.
The following configuration options are available:
| Name | Description | Type | Required | Default |
| continueOnFailure | Whether to continue running tasks after a task has failed. If this is false, then the context will be exited after the first failure. Otherwise it will only exit after all the tasks have run. | Boolean | No | false |
| context | The name of the synchronisation context to use. | String | No | DefaultSynchronisationContext |
| timeout | The time-out period to wait (in seconds) when trying to enter the synchronisation context. If the context cannot be entered within this period the task will fail and no tasks will be run. | Integer | No | 300 |
Some Final Words
At the moment this task is only available in the CruiseControl.NET 2.0 branch – which means it won’t be available in the main release for a while. If there is interest in making this task available earlier, let me know and I can move it to the trunk.
Additionally I haven’t spent much time testing this task. There is a full set of unit tests (yes I used TDD), but I haven’t tried it on any “real” instances of CruiseControl.NET yet.
Introduction
This will be the first in a series of posts I’m planning on writing about modifying CruiseControl.NET to be stream-based for the results.
In my previous posts I’ve been looking into reducing memory usage for CruiseControl.NET. Most of the changes have been small tweaks (reducing the number of copies of a string, reducing memory peaking, etc.) However these changes have not addressed the underlying issue – CruiseControl.NET is manipulating some very large strings in memory!
This series will look at changing from using strings for encapsulating task results to using streams. Additionally, I’ll look at splitting our huge massive log file into a series of smaller log files.
Adding Some Context
At the moment, each task is an isolated unit. They exist in their own little worlds, with no interactions with the rest of the project. Information is passed into the task using an IIntegrationResult and information is returned via an IIntegrationResult. Additionally, there are a number of global instances that a task can use (like loggers, file systems, etc.)
For these changes I’m planning on a change in perspective. Instead of a task working in isolation, only communicating via an IIntegrationResult, I’m going to add a task context. When a task runs, it will have this context to call upon.
This means each task can effectively act on its own, while at the same time being part of a bigger picture. Before I explain what I mean by this, let’s take a look at a diagram of how I see this working:
This diagram breaks the task information into two – the configuration and the context. The task configuration is loaded when CruiseControl.NET loads (or is reloaded) and is in theory immutable (should never change unless reloaded). This defines the task, how it works and what it works on. The task configurations sit within the project configuration, and is likewise immutable (although not everything in a project is immutable, more on this in a future post).
So far, this describes how CruiseControl.NET currently works, so no changes needed at all. The new stuff is on the right-hand side of the diagram – context.
Context provides mutable data for a task. This is the data the task uses when it runs, and is reset for every run. Because of this, a task can start with a clean slate, and slowly build up its state within the context.
In the diagram above, I’ve shown two types of context – task context and child context. In the code, they are both the same type (more on this later), but contextually they work in different ways. The task context is the root level context – this matches the project configuration. As each task starts, a child context is started for the task to run in. Then, when the task finishes the child context is loaded into its parent context:
The reason for this design change will (hopefully) become more obvious over time. Basically this provides a “safe” environment for a task to run. It can modify any of its context settings, without needing to worry about screwing up any other tasks. This is especially relevant as we start moving into more complex task types – like parallel execution tasks where each task runs on its own thread!
So we now have the picture where the configuration is immutable and remains unchanged between builds. Instead each build has its own set of context to work with, and each task contributes to this context as it runs.
By this time you’re wondering where are the streams? So, that brings us to our next area:
From Context To Stream
In .NET (and many other languages) a stream is just a sequence of bytes. The Stream class provides a wrapper for whether the bytes are actually stored, which means a stream can be stored anywhere, provide all sorts of extra functionality and can really simplify testing, all without needing to change the code that uses the stream. On the down side, it is up to the developer to provide everything beyond the basic byte manipulation.
This is why behind the context changes I described above. As far as each task is aware, it just interacts with the stream of bytes – writing out the results as they are generated. The context provides the infrastructure around this – where the streams are stored, how to start a stream, merging streams and referencing them for future usage.
When a task needs a stream for writing its results, it calls the context that is is associated with. The context then opens the new file stream (including making sure each file name is unique) and returns the stream instance to the task. The task then uses the stream as desired.
At the same time as opening the stream, the context also stored a reference to the file stream. This is the path to the file that was opened. When the task is finished, the context for the task is merged up into the parent context – including the reference to any files that were opened. When the root level context is finished, it writes all of the file references to an XML index file. This provides the link between the individual task results and the overall build results.
Where To From Here?
This post outlines the basic context changes I’m making to CruiseControl.NET. It allows the generation of streams for each task in a way that is thread safe, plus has the ability to treat results as a whole. I’ve very quickly outlined how the writing process will work, but this still leaves many areas uncovered.
Here is a rough outline of where I’m planning to go from here:
- Add initialising the root context as part of the build process
- Work out how to associate the context with each task
- Change the tasks included with CruiseControl.NET to use the context
- Modify the server interfaces to work with individual task results (instead of the big log file)
- Change the dashboard to use the individual task results
- Fix the build report templates (the XSL-T files) to work with individual results
- Fix the tasks within CruiseControl.NET that use the entire build log (e.g. e-mail task, etc.)
- Add a report generator task that generates and stored the reports on the server (this will use the new HTML report plug-in in the dashboard)
- Add backwards compatibility to the server interface
Somewhere in this process I plan on removing ITaskResult and its associated/dependent classes and methods from the code base. This means that this series of changes will break existing plug-ins! The why behind this decision is two-fold:
- It simplifies my job – I don’t have to deal with backwards compatibility on the server (I still plan on adding it to the server interfaces so older clients can still connect to CruiseControl.NET)
- It prevents people in future from going down the strings path when writing new tasks
So that’s all for now, I’ll continue slowly working on this as I have time.
Previously
In the first post in this series (here) I looked at some options for reducing memory usage, in my last post (here) I applied the first option and added caching to the dashboard. This time it is the turn of the server to get some caching.
Again, the main point from the previous posts on caching is the primary interest for most people is seeing why a build has failed. The most recent build logs are expected to be the most used, so caching can help reduce the amount of “copies” of a build log in memory. This is especially relevant when there are a large number of clients (e.g. CCTray) accessing the same build information at roughly the same time.
Quick and Easy
When I added caching to the dashboard, I also added an extra class to handle synchronised access to the data. So basically, all the work has been done already
First, the place where the log is being fetched is the GetLog() method in CruiseServer. I’ve added the method to retrieve the cached log in here – which replaces the previous call (this just retrieved the log from the specified project).
Next, I copied over the RetrieveLogData() method from the dashboard project and changed it to do the old retrieve log call.
Finally, I made one other small modification – I added a lock { } statement to the code. This is to ensure that only one SynchronisedData is ever added for each log file. There is a small possibility of a race condition when the cached data is fetched – because it needs to do a few extra steps if the data is not cached. Adding this lock { } statement completely removes this possibility.
Caching – Tick
So, that’s caching added to both the server and the dashboard. Like I said in my last post, I expect this to average out the memory usage, rather than reducing the average memory level. Hopefully these will work as planned.
If not, I have one final trick up my sleeve – compressing the data. Stay tuned…
Previously
In my last post (read it here) I took a holistic approach to reducing memory usage on the (server) machine. In it I suggested three items to reduce memory usage:
- Caching on the dashboard
- Caching on the server
- Compressing the build logs
In this post I’ll look at implementing the dashboard caching.
To help with the recap, here is a diagram from my last post:
Basically this is stating most of the views to a log will be to the most recent (failed) build, less views to the last successful build and even fewer views for historical builds. This means most of the effort can be focused around caching a very small set of builds.
Introducing Caching
For caching, the first question is what should be cached.
When a request is processed for a build report it goes through this general process:
The request comes in, the process retrieves the log from the server, generates the build request (the actual report) and returns it to the client. I’ve put the generation into a black box, because that’s how I’m going to treat it (it is a complex process within the box). With this simplified view, there are two obvious cache options:
When the process needs a log, it can check the log cache first. If there is a log there, then this log will be used – otherwise it will fetch the log from the server, store it in the cache and then continue.
The second cache is to bypass the entire process – to cache the result and use it. When I investigated the dashboard further I discovered that build reports are already cached (this is via an action proxy – CachingActionProxy.) This proxy action caches each build response into the cache using the raw URL of the incoming request. Since this is working at the moment, I’m not going to change it. I’m not sure how this will work with security in place – so this will need to be investigated.
Before I delve into the implementation, there is an issue that needs to be considered. Retrieving a log can potentially be long-running processes (i.e. it takes more than a few milliseconds to perform.) The last thing we want is multiple users attempting to access a log almost simultaneously and trigger the same process multiple times! Because of this, the design needs to take into account potential race conditions.
Fetching and Caching the Log
The build logs are loaded in BuildRequestTransformer. This is the class within the dashboard process that is responsible for fetching the logs and generating the results – although it passes on these tasks to two very different places.
Within this class there is one method that needs to be modified – Transform(). This method currently looks like:
string log = buildRetriever.GetBuild(buildSpecifier, sessionToken).Log;return transformer.Transform(log, transformerFileNames, xsltArgs);
Since I am changing the way the logs are retrieved, I’m going to change it to:
var log = this.RetrieveLogData(buildSpecifier, sessionToken);return transformer.Transform(log, transformerFileNames, xsltArgs);
Of course, all the fun happens in RetrieveLogData() (the rest is just tidying things up.)
So, what happens in RetrieveLogData()? First, a unique cache key is generated. This is the combination of the server, project and build names, plus the user’s session token (if there is one). The session token means we’re going to a per-user cache, but unfortunately it is required since each user could potentially see different information.
Once the key is generated, the method checks if there is already a log under that key – if so it will use that log. If not, it generates a new instance of a SynchronisedData, adds this to the cache and then loads the log from the server.
And that’s all there really is to it – nice and simple.
What about SynchronisedData?
You might have noticed I have added a new class – SynchronisedData (or you might have just assumed that it already existed – I’m good with either.) This class is responsible for ensuring that only one instance of the data exists (or at least tries to.)
Basically, this class allows data to be added to the cache before it is retrieved. Internally it uses a ManualResetEvent to let any other threads know when it is safe to use the data. There is of course a catch – if a thread does not initialise the instance (e.g. retrieves it from the cache), it must call the WaitForLoad() method before retrieving the data – otherwise there might not be any data to use.
Otherwise this class is very simple, it has a single property – Data, plus a couple of methods – WaitForLoad() and LoadData(). WaitForLoad() just makes sure it is safe to use the Data property, while LoadData() allows the caller to load some data into the property.
The caller that initialises the instance is responsible for loading the data via the LoadData() method, all other callers need to use WaitForLoad() to ensure it is safe to use the data.
A Final Word on Caching
Someone might look at just this post and ask why am I suggesting caching as a solution for memory usage? Caching generally increases memory usage, since a process is generating and storing items, instead of just generating and disposing them.
And I have to agree. I expect memory usage to go up on average with these changes. The key phrase in that sentence is “on average”. The basic problem I think is happening is we are getting memory spikes. A number of people are trying to access the same log at the same time – therefore the server and the dashboard are generating multiple copies of the same log and then generating multiple copies of the same report.
Caching solves this problem by exposing the same log and the same report to all users, rather than a copy of the log and report. It does open some possibilities on race conditions (hence the locking code) and also increases average memory usage. But it should (hopefully) stop out of memory exceptions – which can really ruin someone’s day
And a Final Comment
There was a comment on my last post about clearing the cache when the configuration changes. For the log files, this is not an issue as it will be the same log (always, no matter what, logs NEVER change!) But, the build reports might change. Since this is handled by an already existing handler, I’m not sure whether it will handle config changes (my guess is it won’t.)
In my next post I’ll look at making some changes to the server, so stay tuned…
The Problem
This is a continuation on my earlier post on memory issues (read it here). In that post I made some minor changes to reduce memory usage (mainly around string handling). However those changes didn’t fix the underlying problem – CruiseControl.NET is working with large strings in memory.
Now normally the OS and .NET will allow us to play around with large objects in memory without too many issues. Normally…
Sometimes it is possible to use up all the available memory (including swap files, etc.) Why? Because we put too many big strings into memory – and that’s exactly what CruiseControl.NET does. And to make things worse, CruiseControl.NET likes to compound things by the way it works.
So, how does CruiseControl.NET work?
First off, we have the server. This is basically a polling application – every five seconds it checks if a build can start and if so, triggers the build (actually it’s a little bit worse than that, each project has it’s own thread which polls every five seconds, but that’s another matter altogether.) As the build runs it builds up a set of task results, which then get merged together into one big log file:
So, it is possible to have multiple projects, each having multiple large task results in memory. Then, when these get merged into the build log, that’s even more memory used. But wait, there’s more!
As well as the project threads, the server is also responsible for serving results to anyone who requests one of these build logs. This is done by loading the log into memory and then returning the entire log (via .NET Remoting) to the client:
So, as well as having multiple task results and multiple build logs in memory, it is also possible to have multiple instances of the build logs – one per request handler (although in theory these would be quick requests and shouldn’t hold onto memory.) And yes, there’s still more!
The CruiseControl.NET server only allows .NET Remoting clients – which isn’t very helpful if you need to go through a firewall. Plus, people need some way of seeing the results from the tasks – CCTray doesn’t display them. So to handle these situations there is the Web Dashboard. This functions as a client application and does all sorts of things – including manipulating build logs in memory (normally an XSL-T transform):
And where does the Web Dashboard normally sit? On the same machine as the server (unless you’ve changed the default install and installed the dashboard on a separate machine).
So, this gives us a picture of having large strings in memory in several places. The project threads might be okay by themselves, but when you add the communications side, it is very easy to start consuming large amounts of memory with big build logs. Especially when there are a sizeable number of people trying to get results from a build (and what’s the first thing people do when they see a broken build – they go to the server to see what’s broken!)
Pruning Some Memory
Now that we’ve quickly reviewed the possible places where build logs are stored, what can we do to reduce the amount of memory?
Two initial approaches come to mind:
- Reduce the size of the log file
- Reduce the number of times each log file is loaded into memory
Option #1 can be done either in CruiseControl.NET or externally. First, we can split the mega-log into smaller logs, second we could write less data, third we could compress the data.
I’ve been looking at what I can do to split the mega-log file into build result specific log files, so a client could just grab the data they need (e.g. just the NAnt/NUnit/MSBuild/etc. results) instead of everything at once. Unfortunately this is a MASSIVE change, that will need to modify both the server and the clients in a large number of places. So for the moment I have spun off a “play” branch in the Subversion repository where I will attempt to do this (oh, and resolve all the associated problems with having multiple result files.) So this is a no-go for the current release.
Next, writing less data – well CruiseControl.NET doesn’t put that much CruiseControl.NET-related data into the log files. Most of the data comes from the external tools. So if this needs to be done, then people need to reduce the amount of data the tools generate – hence this becomes an external task.
So, that leaves compressing the data. Personally I like this approach as it would also reduce the amount of network traffic. Unfortunately there are a couple of gottas – there is no guarantee that compression would actually save space and it requires changes to the clients to decompress the data. But I still think the idea has merit.
Returning back to the two initial approaches, the other approach is to reduce the number of times each log is loaded. This is the approach I want to investigate some more.
Log Viewing Usage Patterns
From my experience with CruiseControl.NET, there are two general usage patterns:
- THE BUILD HAS BROKEN – what’s wrong?
- What happened to an older build?
And typically the second pattern also happens a lot as part of pattern #1 when people try to find out what has changed to cause the build to break. Very rarely do people go and look at a historical build.
The following picture sums up this pattern, with the size of the arrow indicating how much usage a build log would get:
So, most people only view the logs when something is broken and they primarily focus on the log of the build that actually failed. A smaller set of people would also check the previous successful build to see if there is anything pertinent to the failure. Finally a very small set of people might go through older logs to see what has happened (your manager is checking up on the amount of work you’ve been doing, etc.)
This sounds like a very good scenario for caching. Since there is a (potentially) small amount of data, this could be cached on the server, the dashboard or even both.
Bring on the Cache(s)
On the server side, we could just cache the log files – although we wouldn’t want to cache too many big ones.
The dashboard offers us a few different possibilities for caching. First, we could take the same approach as for the server and cache the the log files. Second, we could cache the parsed XML documents (since the entire document must be loaded and parsed before it can be transformed). Finally, we could cache the transformed output.
However, like anything in the dashboard, the way the build logs are transformed is not a simple process. Instead we have a number of interfaces and their implementations to go through. The build report generation is a plug-in, which generates multiple actions. Each action can have one or more style sheets, which actually get loaded and processed by a class in the Core library.
To Be Continued…
I’ve gone back and reviewed the memory usage scenarios for CruiseControl.NET with a view of reducing the usage on the server. This took a step back and viewed the wider picture, with a view of all the items that CruiseControl.NET is doing on a (server) machine. Unlike my previous post, this takes into account multiple users doing similar things at the same time.
I’ve come up with a couple of ideas for reducing memory usage – caching and compression – and both of them have some gottas that need to be investigated.
So rather than making this an even longer post, I’m going to stop at this point and look into caching and compression in some future posts.
So stay tuned, more fun to come