Archive

Posts Tagged ‘Memory Reduction’

Beware Excessive Caching

Recently I added a few tweaks to CC.NET to try and reduce the memory usage (read the posts here and here). Unfortunately, this had the opposite affect to what I had hoped and instead increased memory usage!

The reason why is simple – I was too aggressive in my caching. I had added caches to both the dashboard and the server, and set the sliding expiry time to one hour. While this worked fine with smaller log files, it absolutely killed the dashboard for larger log files.

I tested on my machine with a 45Mb log file (big yes, but not outside the realm of possible). With the above settings I could view six log files – no more! Additionally, this locked the memory for an hour, both in the server and the dashboard (double memory usage – *ouch*)

Coming back to the original goal, it was to smooth out the peaks and dips in memory usage so it is a more even approach. So I’ve cut back on the amount of caching.

First, I reduce the amount of time log files were cached on the server. In my original scenario I was predicting that most people were interested in the most recent build, beyond that they were less interested. So I cut the caching time right back to five minutes (this is the default polling time within CC.NET for the interval trigger.)

And since I realise that I don’t know what other people have, I have made this a configurable setting. If your server has more memory, you can increase the cache time and hold more logs in memory. You can also reduce the time, but this can potentially lead to cache churn, so be careful with no going too low!

Second, I removed caching from the dashboard altogether. This is because the dashboard already caches the build pages, so this was leading to double caching again. Since people are most interested in the build pages rather than the log, I decided it was acceptable to remove the dashboard caching.

The result, I was now unable get out of memory exceptions with the 45Mb build logs on my machine (at least when viewing them in the dashboard). There will still be an absolute limit somewhere, but until we move to streams this is an ok solution.

Of course, to really fix the problem we will change to using streams, but there is no timeframe for these changes yet…

Reducing Strings: Transferring Data

29 November, 2009 Leave a comment

A Multi-OS Challenge

CruiseControl.NET has gone through a few iterations of file transfer. Prior to version 1.4, the only files that could be transferred was the build log and other special files like the statistics or RSS feed. To handle these files there were special methods in the Remote API, and all they did was transfer the entire file across the network as a single string (serialised via .NET Remoting).

With CruiseControl.NET 1.4 we added the ability to transfer any file that was located in the artefacts folder of a project. This was a new method in the Remote API, but it now allowed the remote client to ask for a file. If the file was there, it was transferred across the web. And this is where things got interesting!

Originally, this method generated a MarshalByRef-based object that would connect back to the server and transfer the data a block at a time. This ensured that only small blocks of data were transferred, thus reducing the memory usage on the server.

Unfortunately Windows 2008 did not like this approach! IPSec blocked the object from connecting back to the server, so no data was transferred (yes, it worked on every other OS we tried, just one that stopped it.) After several frustrating attempts to resolve the issue, we gave up and went to transferring the entire file as a single byte[] buffer over the network. Once again memory usage went up :-(

So, for CruiseControl 2.0 we are back to the issue of how can we transfer large files across the network?

Let’s Chat

The problem with transferring a single large block of data across the network is it needs to be loaded into memory first, and then transferred across the network. Even .NET Remoting does not like transferring large blocks across the network, but it does it. Under the hood .NET Remoting takes the large block of data and breaks it into smaller blocks, which then get transferred across the network. This is what our original approach did, but IPSec did not like it!

So, how can we go back to the block-based approach and get past IPSec? The answer is to move from our nice chunky interface, were .NET Remoting handles the blocks, to a more chatty interface (yes, I know, normally not a good practise, but sometimes we don’t live in an ideal world!)

The current situation looks something like this:

image

With .NET Remoting handling the blocking within the red arrow. The new approach will look like this:

image

Yes, it will be a multi-step approach. The steps are:

  1. Open the file on the server
  2. Transfer each block of data
  3. Close the file on the server

This approach requires tighter coupling between the client and the server (sigh). When the client opens the file, a file identifier if returned. This file identifier is what the client then passes to the server for all subsequent operations.

So, why a file identifier? The identifier is used to identify which instance of the opened file is being used (the file is opened in read-mode so it can be opened multiple times). The server then maps the identifier to the stream instance (which is only held on the server). When each block is transferred, the stream is repositioned so the next block is ready. This means the client does not care where it is in the process, it only cares that there is more data to fetch.

The other downside for this approach is the file now needs to be closed on the server – other the number of open streams will slowly increase (i.e. a memory leak!) So closing the file will clean up after the transfer has been completed.

The other approach I looked at was for the client to pass in the starting position. But in the end I decided against this approach for two reasons:

  1. It would require the client know what data it wanted (i.e. the position within the file). If it requested a position beyond the file length, then the server would need some error handling to not fail.
  2. It would require opening the file, positioning to the correct location and closing the file every data fetch! I’m not sure on the performance of this, but it sounds slow.

Of course, if we run into issues with my current approach we can switch to the alternate and test it out, but for now, it’s the approach we are using :-)

Being Helpful

Now, rather than forcing the clients to implement their own versions of file transfer (and potentially duplicate their code), I’ve added a new method to CruiseServerClient in the Remote API. This method is called TransferFile() and it literally does that – it handles all the logic of transferring a file from the server into a local stream.

To use this method is as simple as the following code:

client.TransferFile(projectName, fileName, outputStream);

Where client is an instance of CruiseServerClient (this can be generated from a CruiseServerClientFactory).

I have gone through and removed the old file transfer mechanism (in both the web dashboard and CCTray) and changed to using this method. This means all file transfers will now use the new approach and use less memory. However in a future post I’ll do some tests to see how much (if any) memory is being saved.

Coming Up

Now that we have a file transfer mechanism, the next step is to look at handling the new build log format. In my next post (or posts) I’ll look at the modifications to retrieve the build data and the changes to the dashboard.

Stay tuned…

Reducing Strings: A Quick Look Back

24 November, 2009 Leave a comment

Where Have We Come From?

Before I start modifying the remote interfaces to use the new stream-based data storage, I thought I’d take a quick look back at the memory performance of these interfaces. Again I’ll be using ANTS Memory Profiler 5 to look at how memory is being used.

Just to show the impact of strings in the system I’ll take a look at the original system before I added caching, and then after to see what impact this change had. These will then provide the baselines for looking at how the stream changes can reduce the memory footprint of the CruiseControl.NET server.

But before I get into the analysis, let’s revise how data is sent from the server to a client (e.g. CCTray, the dashboard, etc.) The entire build log is stored in a single file on disk. The client makes a request to the server via .NET Remoting. This then opens the file, reads it all into memory and returns the entire file back to the client as a string. The caching changes modify this procedure only by ensuring there is a single instance of each build log in memory – subsequent calls to retrieve the same log will use the cached log (note, this is cached in memory).

The Method

This is not as nice and tidy as testing the memory usage for a project. The server needs to be up and running and the client then makes the call to the server. However since we are not interested in the client memory usage (yet), we need to be monitoring the server and we cannot directly tell when a memory usage scenario is beginning. To get around this limitation I wrote a small application (stress client) that makes multiple calls to the server to fetch the latest build log for a project. The build log I am fetching is 6Mb in size, the application will attempt to download this log concurrently multiple times to give the profiler plenty of attempts to analyse. On my memory I was able to get four concurrent instances connecting at once, each instance fetched the same build log ten times.

With this in place, I started the profiler and started profiling the server (ccnet.exe). This was run without any start-up parameters and left running for the duration at the test. I waited 30 seconds after starting the application and after the test had finished to make sure the relevant data was collected. After the 30 seconds for start-up had passed I ran the stress client and waited for it to finished. 30 seconds after the client had finished I shut down the server. I performed two sets of tests. The first set of tests I just ran the profiler and collected the memory stats. The second set of tests I captured the memory usage every 5-10 seconds during the test (this was done manually).

These tests were run three times, with the following versions:

  1. This is the original CruiseControl.NET server before the caching was added
  2. This is CruiseControl.NET server with caching added
  3. This is CruiseControl.NET server with caching added, plus an extra modification to garbage collect after each build log fetch

The 1.5.0 codebase was used for all three versions. Only the get build log method was modified for each version.

The Data

These are the results from the original version (#1):

Run1

For the original version, just accessing the build log allocated 234Mb of memory to the server – of which most is used by the large object heap.

Changing to the modified version of the server we get the following results:

Run2

The first thing to notice is the private bytes is also half the original version – and the large object heap is very flat. In the original version a new string was added to the heap every time a client accessed the build log, in the new version there is only one version – hence the reason for the heap being much flatter. However the application is still using a lot of memory – there is a total of 80Mb assigned to all the heaps.

So, the final change is to force garbage collection:

Run3

Again, there is a drastic reduction in memory – this time there is only a maximum of 40MB in all heaps – and the total memory allocated has also been reduced. However, looking at the large object heap, there has been no change – so the question is what has changed and why?

Digging Deeper

The ANTS memory profiler has the ability to take a snapshot of what is currently being used by an application. This shows what objects have been allocated and what is associated with what. The only down side is the snapshots have to be manually taken, but the graphs help to show where snapshots need to be taken. These snapshots can then be viewed in the application to see where the memory is being used and what is holding onto the memory.

For this analysis I only took snapshots of versions 2 and 3 to see exactly what was using memory. Version 2 showed the following information:

Memory-1

While version 3 showed the following:

Memory-2

This shows that string and byte[] has the most memory usage in both versions, but it is the amount of byte[] memory that has changed. The following shows why this is the case:

byte-data

The remoting infrastructure is generating blocks of byte[] data (byte[4096] to be exact) which are then being transmitted to the client. When the data is returned to the client the remoting infrastructure is breaking it down into blocks of byte[] data, these are what then get sent over the wire. These blocks are only 4096b in size, so they are not stored on the large object heap, but the large number of them (4,800 in the example for version 2) means they are taking up a large amount of space. So, in order to reduce memory usage we also need to look into this area also. This is important to note, because in CruiseControl.NET 1.5.0 (and earlier versions) all data transfer is handled by sending a single block of data. We did try during the development of 1.5.0 to do a block transfer, but IPSec in Win2K8 blocked it, so we had to reverse the changes. So we need to find a permanent solution to this issue that will work on all OS versions!

In Summary

We had a quick look at memory consumption under 1.5.0 seeing what could be done to reduce the amount of memory usage. However we’ve pretty much run into a stumbling block with the current log size and that it is a single (huge) block of data. For CruiseControl.NET 2.0 we have already split the build log into smaller files, but we also need to look at what can be done to reduce memory usage while transferring over the wire.

So, in my next post I’ll take a look at an alternate transfer mechanism. Stay tuned…

Reducing Strings: Converting Tasks

23 November, 2009 Leave a comment

A Quick Review: Task Classifications

A few posts ago (here), I classified the tasks into how they generate output. Broadly, the output can be split between the build log or directly generated to the file system. There is also a third type of output – external output. This is output that is generated by an external application and stored directly to the file system.

These three types all need to be modified to generate index entries, so the results can be accessed later on. Tasks that write to the build log (e.g. via strings in the old version) or those that write to streams (e.g. direct output to the file system) are easy enough – we already have methods on TaskContext to  generate indexed streams. However external applications that generate files directly need consideration.

So, for this post I am going to convert two tasks, one that generates build log data and the other that manipulates external data. The tasks I will convert are:

  • ExecutableTask
  • NDependTask

This is also the order of complexity of the conversion :) I omitted a task from the internal generation, as those tasks often have additional (and complex) logic. I will hopefully look at them in a future post.

General Execution: ExecutableTask

One of the most basic tasks in the system is the executable task. This task is used for executing any external application, which also makes it one of the most versatile tasks. This task attempts to execute the application, nothing more, nothing less. Any output to standard out or standard error is captured by CruiseControl.NET and written to the build log.

In the current version of CruiseControl.NET (1.5 or earlier), generating the build output from the executable task is a two stage process. First, everything is written to a StringBuilder and then dumped into a string. In the second stage, this string is converted into XML.

For my first version of the conversion to streams, I did a similar process – except using streams. The process executor would write the standard out and standard error to a stream. Once this had finished, it would then go through and convert the results into XML. This is what I used to check the proof of concept was working.

However, this was still a two stage process – generate the results and format them. There is no reason why this is required – it should be possible to generate the results directly as XML – which is what I have done.

Most of the changes to handle this were done in ProcessExecutor. Previously I had modified the signature for Execute() to take in two streams – one for standard out and the other for standard error. These streams were then wrapped in StreamWriter instances and passed onto the runnable process. When the external application wrote to standard out or standard error, the output was sent to the corresponding StreamWriter, which sent the data to the stream.

My new change was to introduce a new class – XmlStreamWriter. This class wrapped a StreamWriter and overrode some of its methods. The most important method it overrides is WriteLine() – when this method is called it wraps the line in an XML block and ensures the data is valid (e.g. has no invalid XML characters). It also appends a line number and the date/time, just to make the output more meaningful. The other methods that I overrode were to start/end the XML document (since an XML document can have only one root) and to clean up when necessary.

Now, all ExecutableTask does is start two result streams and pass these through the levels to ProcessExecutor. It also tells the logic to wrap these streams in the new XmlStreamWriter class, which does the automatic conversion. And that’s it – nice and easy. the infrastructure does the rest of the work – generating the index entries, writing them to the log file, ensuring the output is XML, etc.

Analysing Code: NDependTask

The NDependTask calls an external application called NDepend. This application performs a static analysis on a codebase and produces a number of statistics about the code, including complexity, cyclic redundancy and interrelationships. The reason I choose this task is it generates a number of files which need to be included in the build results. These files include both XML data and non-XML data (e.g. images). Additionally, since it is an external task it generates standard out and standard error to be included.

The first part of the conversion is easy – this is almost identical to ExecutableTask – initialise the streams for standard out and standard error, pass them through the layers to where the application is actually executed and store the result for later.

It’s the second part that requires some additional changes. After the external application has finished, the task checks for any new files that were generated. These files are then imported into the results.

To handle this I added a new method to TaskContext – ImportResultFile(). This method has a similar method signature to CreateResultStream() but with two important differences – it has a filename argument and it does  not return anything. This method simply generates a new index entry and moves the file to the results folder. Now external files can be included in the results index.

A Simple Path

Now we have a nice simple migration path for tasks to use the new index. For tasks that generate data they can use the CreateResultStream() method on TaskContext. This will generate a new stream for the data, plus the associated index entries. These streams can then be passed into ProcessExecutor for handling the stdout and stderr output.

For tasks that generate files via an external application, the ImportResultFile() method can be used. This moves the external file into the data folder, plus generates the associated index entries. This also offers the choice to move or copy the files so the calling task can tidy up if necessary.

These two methods, either separate or together, handle most of the data generation for the current tasks/publishers. There are a few tasks/publishers that won’t like this approach, mainly because they need to manipulate files (i.e. read/write access), but as these tasks/publisher tend to be outside the norm I’m going to ignore them for the moment.

The next challenge is to look into accessing these new index entries and the associated files. So stay tuned for a review of how we currently use the build logs and how it will change…

Reducing Strings: Additional Task Details

23 November, 2009 Leave a comment

An Extra Feature

In looking at reducing strings (and thus memory usage) I realise we also have another opportunity to enhance CruiseControl.NET. A while back CruiseControl.NET was enhanced to display current task status. This would display a list of all the tasks for a build, and the current progress through the tasks. However, this information is only ever stored in memory – as soon as the server restarts or another build is triggered (even if it is just a source control check) the information is lost!

With the new changes it is easily possible to persist this information – plus we can go one better and associate the task output with the task information :-) So in this post, I’ll look into how these changes work.

Currently

When the task outputs are added to the results they are added to an array (this is all done in the task context). These are accumulated per context – and when a context is completed the outputs are merged into the parent context. When the build is completed, all the outputs are written directly to the build log, one element per output.

Each output has a number of attributes – task type, result name, task identifier and data type. Some of these are used to identify the output, while other attributes identify the task that it is associated with.

The Changes

The changes fall into two basic categories: first the task context now has the task level attributes (task type and identifier) and the outputs are now associated with a task element in the build log. To handle this, I have split TaskResultDetails into two classes: TaskResult and TaskOutput. TaskResult holds the task level attributes, plus the outcome of the task, while TaskOutput holds the output details.

The task context has been expanded to allow the task type and identifier (currently the task name/description) to be set. At the same time, a new TaskResult is generated for the context (remembering that a context is being generated per task). To enable linking between the task status and the outcome to be stored, the context now requires an IIntegrationResult instance. The status will be set here (it is currently) and the context will pull the status when the context is finalised.

The ImportResultFile() and CreateResultStream() methods have been modified to generate TaskOutput instances now. These instances get stored in the associated TaskResult for the context.

Finally, TaskResult now contains a list of child TaskResult instances. When a context is merged into a parent context the associated TaskResult is appended to the child instances of the parent. When the project-level context is finalised, the associated TaskResult is what is serialised.

Serialisation

Both TaskResult and TaskOutput have a WriteTo() method that takes in an XmlWriter. When these methods are called, they will output the properties of each instance to the writer, and then call the WriteTo() method on any child items (either TaskResult or TaskOutput). To ensure this is written to the build log, XmlIntegrationResultWriter is responsible for calling this method on the project-level context.

The following XML is generated:

<build date="2009-11-12 03:37:37" buildtime="00:00:00" buildcondition="ForceBuild">
  <result type="project" identifier="Test" outcome="Unknown">
    <result type="exec" identifier="exec #1" outcome="CompletedSuccess">
      <output file="D:\Open Source\CruiseControl.NET 2.0\project\console\bin\Debug\Test\Artifacts\233e1e69d-f0ab-441c-bc6f-030e52b61f58.data" name="stdout" data="data/xml" />
      <output file="D:\Open Source\CruiseControl.NET 2.0\project\console\bin\Debug\Test\Artifacts\23\2d1a5ede-b2c8-432e-bf87-af01a75ebd2d.data" name="stderr" data="data/xml" />
    </result>
    <result type="exec" identifier="exec #2" outcome="CompletedSuccess">
      <output file="D:\Open Source\CruiseControl.NET 2.0\project\console\bin\Debug\Test\Artifacts\23\7f9f9bfe-2d5d-43b1-be1f-a61303701b72.data" name="stdout" data="data/xml" />
      <output file="D:\Open Source\CruiseControl.NET 2.0\project\console\bin\Debug\Test\Artifacts\23\1b65cf49-4902-441f-848b-c2f22234d6b7.data" name="stderr" data="data/xml" />
    </result>
  </result>
</build>

Now it is easy to see which tasks ran, what the outcome of the tasks were and what output they generated.

 

In Conclusion

This (hopefully) concludes the changes to the build engine on the server for the stream-based output. In the next set of posts I will delve into using the new output.

Stay tuned…

Reducing Strings: XmlLogPublisher is Dead, Long Live XmlLogPublisher

21 November, 2009 Leave a comment

Good Bye XmlLogPublisher

As part of the changes for implementing the task context I am removing XmlLogPublisher. This is to provide a standardised way of generating the log – while still exposing the required functionality to the other tasks that need it.

In the current version of CruiseControl.NET users need to add an xmllogger element to their project configuration (preferably in the publishers section, but there is no reason why they can’t put it in the tasks section). When this task is run it will generate the log file, which can then be used by sequent tasks. However, to complicate things, some tasks generate their own version of the log file (normally in memory) – e.g. StatisticsPublisher and EmailPublisher. Plus if the user adds a publishers section and forgets to add the xmllogger element, then no log will be written!

Note: if the user does not define a publishers section, a publishers section will be generated by default that contains the xmllogger element.

So, to standardise and simplify things, I am moving the functionality from XmlLogPublisher into TaskContext. This functionality will be called automatically by Finialise() in TaskContext (this method is called when an integration finishes).

Currently XmlLogPublisher generates the folder for saving the log in, generates the name of the log file and then writes out the log (via XmlIntegrationResultWriter). I have moved these into TaskContext as separate methods (as GenerateLogFolder(), GenerateLogFilename() and WriteCurrentLog()). The Finalise() method calls all of these, but I have exposed them as public methods so they can be called from other locations.

Additionally, I made a small change to XmlIntegrationResultWriter. Because I want to write the current result entries out via the writer, I changed the constructor to take in an enumeration of TaskResultDetails. I also changed the WriteTaskResults() to output these results (previously it was outputting the data from the tasks). This required changing the statistics and e-mail publishers to use TaskContext instead of XmlIntegrationResultWriter. Unfortunately it also means these are both breaking at the moment, as they are expecting a valid log file to be generated, and it has now been changed to only have index entries! In a future post I will look at modifying these so they work again :)

Finally to round out this change, I added another new method to TaskContext. This method (GenerateResultsSnapshot()) will generate the current snapshot of all the results. This breaks the normal working of TaskContext, in that it allows a child context to access a property on the parent context. This means that property (the list of index entries) now needs to be locked before it can be accessed. I did consider using the reader/writer lock instead of the general monitor lock, but it sounds like the .NET 2.0 version is flawed, so I decided to avoid it.

So, in saying farewell to XmlLogPublisher, I have moved most of its functionality to TaskContext. I’ve also expanded both TaskContext and XmlIntegrationResultWriter to handle writing the index entries to the log file.

Hello XmlLogPublisher

Having removed most of the functionality from XmlLogPublisher, I rebuilt it to use the new functionality from TaskContext. Why? So it is still possible to generate a build log part way through the build process. This is required for those tasks/publishers that want to do something with a physical build log file. For example, the FTP publisher might want to send the build log to a remote machine, the package publisher might want to include it in a package, etc. The new version allows this functionality.

However, I also expanded some other functionality to make this even easier. As part of the changes in removing the functionality, I removed the log directory configuration setting to Project. While the new XmlLogPublisher still has this setting, I wanted to expose the project-level setting to the publisher. Currently tasks/publishers work in isolation – they don’t know anything about the project that owns them (beyond what is passed about in the IIntegrationResult.) So, I wanted to expose project configuration settings, but in an immutable way.

To do this, I have added a new class called ProjectConfiguration. This class exposes a number of the projects from Project in an immutable form. These are loaded from the project when the class is initialised and are read-only. I then added this new class as a property on TaskContext. Now, any task with an associated TaskContext can access these project configuration settings. At the moment there are only a few settings, but over time I will expand ProjectConfiguration to include most of the configuration settings (yes, it is a work in progress!)

So, now the new XmlLogPublisher can use its own log folder, the project’s log folder (if set) or the project’s artefact folder. At the same time, other tasks can now access project settings in a safe, immutable way.

So, that’s the first set of changes – XmlLogPublisher has been changed to fit the new structure, and the index entries are now being written to the disk. In the next set of changes I’ll look at modifying some tasks/publishers to use the new indexing methods.

Stay tuned…

Reducing Streams: Stream Metadata and Indexing

19 November, 2009 Leave a comment

Data about Data (and so it goes…)

In my last post (here) I reviewed all the tasks and publishers currently implemented. As part of the review there were a number of tasks/publishers identified that generated output that is not currently included in the build log. As part of the changes, we want to be able to handle these files with a standard process as well.

In general, the strategy is to let each task write directly to disk, instead of writing to memory (in a string). It is the task’s job to decide what to write and how. In an even earlier post (here) I introduced the concept of a TaskContext. This context is responsible for starting new streams and storing references to them. While this concept is still valid, it needs a couple of extensions.

My Initial Plan

In my initial plan I was going to let the task choose the file name. The task would call an open stream method on the context and tell the context the file name and a result type name. The context would then open a new file stream using the file name in a folder for the build. If the file name was already in use, the context would generate a unique name (normally by inserting a literal and number into the name). To handle multi-threading, each context required its own folder, and when a context was finished, the owner would then move all the files from that folder into its own folder, again ensuring uniqueness of the names.

This added some extra complexity, just to produce easily readable file names. It did not guarantee that a file name would match a the expected task (i.e. there could be a nunit.xml that did not come from the NUnit task). The task type name was only stored in the index file, and the index file name was not unique (it would overwrite previous versions!)

Additionally, the context only handled generating new streams, it didn’t handle importing files generated externally (e.g. output from external applications). These would still exist outside the generation process, and they were not included in the index file.

Finally, to handle the duo-stream output from external tasks, I had put a stream merging process into place. This would literally take two (or more) streams and merge them together, plus it also converted the streams into an XML format (e.g. one element per line, with a line type per element). Nice maybe, but of questionable value since it adds additionally processing overhead!

Changing the Process

In retrospect I made things harder by trying to be nice and also by having the entire indexing outside of the current way of doing things (since this would also have a minimal impact on the current way of doing things.) Now, I think that was a bad decision – these changes are going to be breaking anyway, so lets try and do things properly!

First, unique file names: the name of the file needs to be unique, no matter which task generates it, whether there are duplicates of a task type, or multiple files of the same name being merged. The uniqueness needs to be enforced even when different threads generate files, and whether the files are generated internally or externally. Rather than inventing a new process for unique names I’ll use a tried and true method – GUIDs. When a new result stream is opened the physical name of the stream will be a GUID – no more, no less.

The index entries for each stream will contain the necessary data to identify what each file is for and the type of data contained. These metadata items are:

  • Task type: the type of task that generated the result. This will be the name of the task in the configuration (e.g. nunit, msbuild, nant, etc.) and it will be up to the calling task to pass in.
  • Result name: this is the internal name of the file. Examples would be stdout, stderr, test results, etc. Again, it is up to the task to pass in this name and it should be unique per task type (although not sure how to enforce this!)
  • Task identifier: this is an identifier for the actual instance of the task. Again, this will come from the calling task and could be something like the user-configured task name, an internally generated identifier or the date/time. Basically, it is used to group streams together within an actual instance of a task.
  • Task order: this is the order in which the result was generated. The context will generate this number and associate it with a result. When a child context is merged, the owning context is responsible for ensuring the order numbers are unique. As such this identified the order in which tasks are completed.
  • Data type: this is the type of data the file contains. Ideally, this will be the mime type of the data, so it can be used in serving the files via the dashboard. Again, this will need to come from the calling task.

Creating Index Entries

Now that I know what I want to store, and who is responsible for populating streams, the question is how are these index entries and streams generated?

Based on the sources of data, there needs to be two ways of generating the indexes. First, there are the internally generated entries. In this scenario the task will request a stream and pass in the required metadata. The context will generate the stream and at the same time generate an index entry. After this, it is the responsibility of the task to manage the stream (including closing it!)

The second scenario is an externally generated file. In this case the task will tell the context where the file is, plus the required metadata. The context will then import (move/copy?) the file into the build folder and add the index entry. The task does not need to do anything else.

And Finally, Some Context

I’ve written a bit about the context and some of how it works, but I still haven’t covered how it is generated or how it writes the indexes.

An initial context will be generated when a project starts building. At the moment this will be associated with the Project, although later on I may move it (especially when we come to splitting the context from the configuration). As each task is run, a child context is started for the task. The task then runs within this context, generating its own indexes. If the task starts child tasks (e.g. parallel task, sequential task, etc.) it must generate a child context for each child task.

After a task is completed, the associated context is finalised (either by the project or the parent task). Now an important point – each task or project only works with one context, and no other task or project will change that context! This ensures that the context will not be messed up by two tasks trying to modify the context at the same time. There is only one exception to this rule – when a context is finalised. At this time the context is locked so the owning task cannot use it anymore and all the index entries are transferred to the parent context.

If the context does not have a parent context (i.e. it is the project-level context), it will generate the final index. This index will then be included in the standard build log. This will require a slight shift in the way the build log is generated. Currently the build log is generated by the xmllogger task, which means it can be generated at different points in a build. This task will be removed from the available tasks, and the build log generated automatically at the end of the process.

At this point you are wondering how can publishers access the build log? For example, the e-mail publisher may want to generate an e-mail containing the results of the build, the FTP publisher might want to send the log to a remote machine, etc. This will be handled in two ways:

  1. Each task context will have access to the current list of indexes. This will need to be accessed in a thread safe way, just in case the user has configured a multi-thread project. Either way, the task context will have access to the entire set of index entries from its parent and all ancestors, all the way to the project context. This means the task will only be able to access index entries from finalised task contexts.
  2. At will be possible to generate a build log at any point. These build logs will only be in-memory, but will contain all the current build details, plus the current list of index entries (see point 1. above). At the end of a build this will be the same file that is generated for the overall build log. The new build log structure will replace the build results that currently exist with the index entries.

Time To Work

This describes my plan on how I will implement things. Coming up next, some actual implementation details.

Reducing Strings: The Current Situation

18 November, 2009 2 comments

What’s Currently Happening

Before I move onto converting the current tasks from using strings to streams, I thought it would be a good time to review the existing tasks and publishers. Looking at the tasks they can be divided into four categories:

  1. Those that don’t generate any output
  2. Those that generate output in the build log
  3. Those that generate external output
  4. Those that generate both

The following tasks do not generate any output:

  • ArtifactCleanUpTask
  • ConditionalPublisher
  • CruiseServerControlTask
  • EmailPublisher
  • ForceBuildPublisher
  • ModificationReaderTask
  • NullTask
  • ParallelTask
  • SequentialTask
  • SynchronisationTask

Since these tasks do not generate any output, we don’t have to worry about these tasks (yeah!!)

The second category are those tasks that generate output for the build log. The following tasks fall into this category:

  • DevenvTask
  • DupFinderTask
  • ExecutableTask
  • FinalBuilderTask
  • GendarmeTask
  • MsBuildTask
  • NAntTask
  • NCoverProfileTask
  • NUnitTask
  • PowerShellTask
  • RakeTask

These are the tasks that I have been primarily concerned with, as they currently generating strings. These strings are then concatenated into one massive string before being written into a file. Since the build log is an XML file, all the strings are XML’enised before they are concatenated. This means if the string is valid XML, it is concatenated directly, otherwise it is wrapped into a CDATA section.

These tasks can work in two general ways. First, they can generate output that is stored in memory as strings (e.g. output from stdout or stderr) – these strings are not persisted to disk at all. Second, they can generate output that is persisted disk, which later gets read into memory and then appended into the log file.

The third category is those tasks that generate individual files directly. These tasks are:

  • BuildPublisher
  • FtpTask
  • ModificationHistoryPublisher
  • ModificationWriterTask
  • PackagePublisher
  • RssPublisher
  • StatisticsPublisher
  • XmlLogPublisher

These tasks all generate output, but they are responsible for storing the files. They also require custom methods to retrieve the data (normally implemented in CruiseServer and/or Project). Plus, these files can be XML or non-XML (e.g. images, HTML, etc). And as a final complication, these files can read/write – i.e. the task may re-open an existing version of the file, add some more output and save it again.

These tasks open a second area for improvement – providing a standardised way for all output to be stored. But I’ll write more about this soon.

The final category generates both data for the build log and individual data files. These tasks are:

  • HttpStatusTask
  • MergeFilesTask
  • NCoverReportTask
  • NDependTask

These tasks basically have both data for the build log and individual data files. The log data is all standard XML, while the non-log data can be anything!

The Strategy for Standardisation

Currently I am changing to use a TaskContext for generating output streams and indexing these streams. This approach can also be expanded to handling the external output files.  There are two main differences between the two: external files can include non-XML data and they can also modify existing files. To simplify things, I decided on to exclude the tasks that modify existing files. These tasks are:

  • ModificationHistoryPublisher
  • ModificationWriterTask
  • RssPublisher
  • StatisticsPublisher

The reason why is very simple – the tasks that edit existing files have a set location for the files. These files are managed outside of a build, i.e. they are not build-related. In contrast, the output form the other tasks are specific to the build they were generated in. Once the build has finished, they are never modified (at least in theory).

To handle the XML/non-XML data, I’ll add an extra attribute to the index – the data type. This will be set by the task that generates the index. This will be the standard mime types, although it will involve some work in setting these properly.

Executable Tasks

A second way that tasks can be categorised is whether they execute external application or not. Most (but not all) of these tasks inherit from BaseExecutableTask. To actually execute the application they normally call TryToExecute(), which returns a ProcessResult. The current implementation of this class contains a string containing the standard output and a second string containing standard error. It is then up to the calling task what it does with the results. Currently the results are handling in three ways:

  1. The standard output and standard error strings are XML’enised (as above) and concatenated to the build log.
  2. The standard error only is XML’enised and concatenated to the build log.
  3. Both standard output and standard error are converted into XML lines (all XML reserved characters are converted) and then concatenated to the build log.

The following tasks fall into this category that inherit from BaseExecutableTask:

  • DupFinderTask
  • ExecutableTask
  • GendarmeTask
  • MsBuildTask
  • NAntTask
  • NCoverProfileTask
  • NCoverReportTask
  • NDependTask
  • PowerShellTask
  • RakeTask

Additionally NUnitTask calls an external application, but it does not inherit from BaseExecutableTask. This task should also be converted to inherit from BaseExecutableTask, so I will include this task in the list of changes.

The subtle difference between how the output is concatenated means my original approach needs to be revised. Now I’m planning on modifying TryToExecute() so it takes into two streams as arguments, and it is up to the calling task on how these streams will be generated and how they will be processed after the task has executed.

This leaves me with a general outline of how to proceed. Next I will take a look at the approach I’ve implemented and how it affects the existing tasks.

Dashboard Caching

1 October, 2009 1 comment

Previously

In my last post (read it here) I took a holistic approach to reducing memory usage on the (server) machine. In it I suggested three items to reduce memory usage:

  1. Caching on the dashboard
  2. Caching on the server
  3. Compressing the build logs

In this post I’ll look at implementing the dashboard caching.

To help with the recap, here is a diagram from my last post:

image

Basically this is stating most of the views to a log will be to the most recent (failed) build, less views to the last successful build and even fewer views for historical builds. This means most of the effort can be focused around caching a very small set of builds.

Introducing Caching

For caching, the first question is what should be cached.

When a request is processed for a build report it goes through this general process:

image

The request comes in, the process retrieves the log from the server, generates the build request (the actual report) and returns it to the client. I’ve put the generation into a black box, because that’s how I’m going to treat it (it is a complex process within the box). With this simplified view, there are two obvious cache options:

image

When the process needs a log, it can check the log cache first. If there is a log there, then this log will be used – otherwise it will fetch the log from the server, store it in the cache and then continue.

The second cache is to bypass the entire process – to cache the result and use it. When I investigated the dashboard further I discovered that build reports are already cached (this is via an action proxy – CachingActionProxy.) This proxy action caches each build response into the cache using the raw URL of the incoming request. Since this is working at the moment, I’m not going to change it. I’m not sure how this will work with security in place – so this will need to be investigated.

Before I delve into the implementation, there is an issue that needs to be considered. Retrieving a log can potentially be long-running processes (i.e. it takes more than a few milliseconds to perform.) The last thing we want is multiple users attempting to access a log almost simultaneously and trigger the same process multiple times! Because of this, the design needs to take into account potential race conditions.

Fetching and Caching the Log

The build logs are loaded in BuildRequestTransformer. This is the class within the dashboard process that is responsible for fetching the logs and generating the results – although it passes on these tasks to two very different places.

Within this class there is one method that needs to be modified – Transform(). This method currently looks like:

string log = buildRetriever.GetBuild(buildSpecifier, sessionToken).Log;
return transformer.Transform(log, transformerFileNames, xsltArgs);

Since I am changing the way the logs are retrieved, I’m going to change it to:

var log = this.RetrieveLogData(buildSpecifier, sessionToken);
return transformer.Transform(log, transformerFileNames, xsltArgs);

Of course, all the fun happens in RetrieveLogData() (the rest is just tidying things up.)

So, what happens in RetrieveLogData()? First, a unique cache key is generated. This is the combination of the server, project and build names, plus the user’s session token (if there is one). The session token means we’re going to a per-user cache, but unfortunately it is required since each user could potentially see different information.

Once the key is generated, the method checks if there is already a log under that key – if so it will use that log. If not, it generates a new instance of a SynchronisedData, adds this to the cache and then loads the log from the server.

And that’s all there really is to it – nice and simple.

What about SynchronisedData?

You might have noticed I have added a new class – SynchronisedData (or you might have just assumed that it already existed – I’m good with either.) This class is responsible for ensuring that only one instance of the data exists (or at least tries to.)

Basically, this class allows data to be added to the cache before it is retrieved. Internally it uses a ManualResetEvent to let any other threads know when it is safe to use the data. There is of course a catch – if a thread does not initialise the instance (e.g. retrieves it from the cache), it must call the WaitForLoad() method before retrieving the data – otherwise there might not be any data to use.

Otherwise this class is very simple, it has a single property – Data, plus a couple of methods – WaitForLoad() and LoadData(). WaitForLoad() just makes sure it is safe to use the Data property, while LoadData() allows the caller to load some data into the property.

The caller that initialises the instance is responsible for loading the data via the LoadData() method, all other callers need to use WaitForLoad() to ensure it is safe to use the data.

A Final Word on Caching

Someone might look at just this post and ask why am I suggesting caching as a solution for memory usage? Caching generally increases memory usage, since a process is generating and storing items, instead of just generating and disposing them.

And I have to agree. I expect memory usage to go up on average with these changes. The key phrase in that sentence is “on average”. The basic problem I think is happening is we are getting memory spikes. A number of people are trying to access the same log at the same time – therefore the server and the dashboard are generating multiple copies of the same log and then generating multiple copies of the same report.

Caching solves this problem by exposing the same log and the same report to all users, rather than a copy of the log and report. It does open some possibilities on race conditions (hence the locking code) and also increases average memory usage. But it should (hopefully) stop out of memory exceptions – which can really ruin someone’s day :-)

And a Final Comment

There was a comment on my last post about clearing the cache when the configuration changes. For the log files, this is not an issue as it will be the same log (always, no matter what, logs NEVER change!) But, the build reports might change. Since this is handled by an already existing handler, I’m not sure whether it will handle config changes (my guess is it won’t.)

In my next post I’ll look at making some changes to the server, so stay tuned…

Cutting Down the Memory

30 September, 2009 5 comments

The Problem

This is a continuation on my earlier post on memory issues (read it here). In that post I made some minor changes to reduce memory usage (mainly around string handling). However those changes didn’t fix the underlying problem – CruiseControl.NET is working with large strings in memory.

Now normally the OS and .NET will allow us to play around with large objects in memory without too many issues. Normally…

Sometimes it is possible to use up all the available memory (including swap files, etc.) Why? Because we put too many big strings into memory – and that’s exactly what CruiseControl.NET does. And to make things worse, CruiseControl.NET likes to compound things by the way it works.

So, how does CruiseControl.NET work?

First off, we have the server. This is basically a polling application – every five seconds it checks if a build can start and if so, triggers the build (actually it’s a little bit worse than that, each project has it’s own thread which polls every five seconds, but that’s another matter altogether.) As the build runs it builds up a set of task results, which then get merged together into one big log file:

image

So, it is possible to have multiple projects, each having multiple large task results in memory. Then, when these get merged into the build log, that’s even more memory used. But wait, there’s more!

As well as the project threads, the server is also responsible for serving results to anyone who requests one of these build logs. This is done by loading the log into memory and then returning the entire log (via .NET Remoting) to the client:

image

So, as well as having multiple task results and multiple build logs in memory, it is also possible to have multiple instances of the build logs – one per request handler (although in theory these would be quick requests and shouldn’t hold onto memory.) And yes, there’s still more!

The CruiseControl.NET server only allows .NET Remoting clients – which isn’t very helpful if you need to go through a firewall. Plus, people need some way of seeing the results from the tasks – CCTray doesn’t display them. So to handle these situations there is the Web Dashboard. This functions as a client application and does all sorts of things – including manipulating build logs in memory (normally an XSL-T transform):

image

And where does the Web Dashboard normally sit? On the same machine as the server (unless you’ve changed the default install and installed the dashboard on a separate machine).

So, this gives us a picture of having large strings in memory in several places. The project threads might be okay by themselves, but when you add the communications side, it is very easy to start consuming large amounts of memory with big build logs. Especially when there are a sizeable number of people trying to get results from a build (and what’s the first thing people do when they see a broken build – they go to the server to see what’s broken!)

Pruning Some Memory

Now that we’ve quickly reviewed the possible places where build logs are stored, what can we do to reduce the amount of memory?

Two initial approaches come to mind:

  1. Reduce the size of the log file
  2. Reduce the number of times each log file is loaded into memory

Option #1 can be done either in CruiseControl.NET or externally. First, we can split the mega-log into smaller logs, second we could write less data, third we could compress the data.

I’ve been looking at what I can do to split the mega-log file into build result specific log files, so a client could just grab the data they need (e.g. just the NAnt/NUnit/MSBuild/etc. results) instead of everything at once. Unfortunately this is a MASSIVE change, that will need to modify both the server and the clients in a large number of places. So for the moment I have spun off a “play” branch in the Subversion repository where I will attempt to do this (oh, and resolve all the associated problems with having multiple result files.) So this is a no-go for the current release.

Next, writing less data – well CruiseControl.NET doesn’t put that much CruiseControl.NET-related data into the log files. Most of the data comes from the external tools. So if this needs to be done, then people need to reduce the amount of data the tools generate – hence this becomes an external task.

So, that leaves compressing the data. Personally I like this approach as it would also reduce the amount of network traffic. Unfortunately there are a couple of gottas – there is no guarantee that compression would actually save space and it requires changes to the clients to decompress the data. But I still think the idea has merit.

Returning back to the two initial approaches, the other approach is to reduce the number of times each log is loaded. This is the approach I want to investigate some more.

Log Viewing Usage Patterns

From my experience with CruiseControl.NET, there are two general usage patterns:

  1. THE BUILD HAS BROKEN – what’s wrong?
  2. What happened to an older build?

And typically the second pattern also happens a lot as part of pattern #1 when people try to find out what has changed to cause the build to break. Very rarely do people go and look at a historical build.

The following picture sums up this pattern, with the size of the arrow indicating how much usage a build log would get:

image

So, most people only view the logs when something is broken and they primarily focus on the log of the build that actually failed. A smaller set of people would also check the previous successful build to see if there is anything pertinent to the failure. Finally a very small set of people might go through older logs to see what has happened (your manager is checking up on the amount of work you’ve been doing, etc.)

This sounds like a very good scenario for caching. Since there is a (potentially) small amount of data, this could be cached on the server, the dashboard or even both.

Bring on the Cache(s)

On the server side, we could just cache the log files – although we wouldn’t want to cache too many big ones.

The dashboard offers us a few different possibilities for caching. First, we could take the same approach as for the server and cache the the log files. Second, we could cache the parsed XML documents (since the entire document must be loaded and parsed before it can be transformed). Finally, we could cache the transformed output.

However, like anything in the dashboard, the way the build logs are transformed is not a simple process. Instead we have a number of interfaces and their implementations to go through. The build report generation is a plug-in, which generates multiple actions. Each action can have one or more style sheets, which actually get loaded and processed by a class in the Core library.

To Be Continued…

I’ve gone back and reviewed the memory usage scenarios for CruiseControl.NET with a view of reducing the usage on the server. This took a step back and viewed the wider picture, with a view of all the items that CruiseControl.NET is doing on a (server) machine. Unlike my previous post, this takes into account multiple users doing similar things at the same time.

I’ve come up with a couple of ideas for reducing memory usage – caching and compression – and both of them have some gottas that need to be investigated.

So rather than making this an even longer post, I’m going to stop at this point and look into caching and compression in some future posts.

So stay tuned, more fun to come :-)

Follow

Get every new post delivered to your Inbox.