Reducing Strings 3: Some Preliminary Results
Posted by Craig Sutherland on 16 October, 2009
Introduction
This is the third post in converting CruiseControl.NET from using string to streams. The other posts are:
In this post I’m going to take another slight detour from the planned changed. The reason – to see whether the changes are actually working!
The reason why is pretty simple – one of the other devs on the project asked for some proof that my changes would reduce memory usage. Now my rough tests and knowledge of the system said they would, but I thought I should do some proper tests to prove it. And here they are!
Methods
The aim of this set of tests to see whether memory usage is reduced with the new stream-based processing. Since I’ve only converted one task at the memory, all my tests will use that task.
First I put together a small application that would write out “The Preventer of Information Services” to standard output 100,000 times. Using the same test, but writing to a file, generates a ~5 Mb file – not too small and not too big (previous tests on my machine showed I could handle files of up to ~50 Mb within CC.NET).
Next, I put together a simple ccnet.config that would run the application. I then ran this config using the –p switch to execute the project and stop (ccnet -p=Test). This allows me to quickly and easily run just that task without any confounding tasks.
Finally, I profiled the test using ANTS Memory Profiler 5.1 (see credits below).
I performed the same test with both the 1.5 codebase and the new 2.0 “experimental” codebase.
Results
Here are the results from running in 1.5:
This is looking at the private bytes – this is the amount of memory that has been allocated to the process.
And this is looking at the # bytes in all heaps (Gen 0, gen 1, gen 2 and large object).
There is a slow but steady increase in memory for around two thirds of the time, and then there is a huge massive jump in memory usage around the 2:20s mark. There’s also a jump in memory usage right at the end, but as the application is terminating it only appears briefly.
Overall execution time was around 3:12s. Total memory allocated was in excess of 128Mb, while the actual memory usage was fast catching up!
And now, the results from the 2.0 codebase:
There is a huge difference – in both graphs!
The private bytes only reached 32Mb – a quarter the amount of memory allocated for the 1.5 codebase. And the change is even more dramatic for the bytes in all heaps – I had to use the tooltip to find out how much memory was used – a mere 1.31 Mb at the most. Additionally, this high usage point was right at the beginning of the task executable, as opposed to a spike at the end. Finally, the memory usage is actually dropping at the end of the execution in the 2.0 codebase.
Finally, and this surprised me, the execution time was better for the 2.0 codebase (approximately 2:13s). However this might be because my machine is running at near physical memory capacity anyway, so the extra memory needs to come from the paging file.
Analysis
In the 1.5 codebase there are a number of memory jumps. I think these jumps are being caused by different steps in the process of persisting the results to the log file. The slow increase is due to the test task writing to standard output. The first big jump is when the task is completed and the data is being converted into an XML format. The final jump is when the data is actually being merged into the log file for writing to disk.
In contrast, the 2.0 codebase has a much lower memory usage level. The memory level is staying low because the output is being written directly to disk and then freed. The memory manager in .NET can then reuse the memory instead of needing to ask for more.
The second difference lies in the way the results are converted – this is done on a line by line basis in the new code. In the 1.5 codebase everything is done in memory, thus requiring sufficient memory for both the original string and the new string, plus any memory for working space. In the 2.0 codebase the conversion is done on a line by line basis. The line is loaded from the original stream, converted and written to the new stream. Again, a much smaller memory footprint being needed.
Finally, the spike at the end is eliminated as the data is already in the files, so there is no need to merge everything together (although in theory this could still be done.)
Summary
So, in conclusion, the stream changes have significantly reduced the memory footprint – at least for my simple test. Comparing the two codebases, the 2.0 codebase only requires a quarter of the total memory allocated, and internally less than 1% of the old heap allocations! For those machines like mine with only limited physical memory, the changes also result in faster performance!
Credits
I was able to perform these tests thanks to Red Gate. They provided a free software license of their .NET Developer’s Bundle to allow me to do some performance tuning, and I am very thankful to them for this (it turns out they use CruiseControl.NET, so hopefully this will help them too). Their website (http://www.red-gate.com/index.htm) contains the details on their products, plus a lot of useful information on how to use their tools.
For these tests I used the ANTS Memory Profiler, I haven’t tried the other tools in the bundle yet.
I really liked the way you could see what was happening in real-time, it was very easy to see the memory jumps and tie it into what was happening in the application. They also have a whole heap of additional information that is available – like snapshots of how the memory is being used and what has changed. But for this post the graphs had all the information – in future posts I’ll delve into some of the other information available.
So if you are looking for a memory profiling tool, I definitely recommend taking a look at the ANTS Memory Profiler, it takes the guesswork out of what is happening in memory
RSS - Posts
Dave Cameron said
Wow! Great summary. Really good results too. Thanks for doing this work Craig. It is a big difference in the code, but it will make a big difference for users too.
I ended up doing some profiling with dotTrace. The result files ended up being enormous. And the memory usage was horrendous. The worst bit was actually on the exception path. While one of the exceptions (about being out of memory) was bubbling, the code tried to append “Error: ” to the standard output and, exactly as you described, it resulted in an enormous new string being allocated. Anyway, I guess streams really are the way to solve this.
Bring on 2.0!
ruben willems said
Craig, job well done !
This should be the big fish in reducing the memory footprint.
Now the following questions arise :
° did I understand correctly that all output is written to file now, and not kept in memory with a string.
° what with more than 1 task : how do you prevent that the output of task A does not overwrite the file of task B (both could be nant tasks)
° what when one queries results : before it was load buildlog into memory and to xsl. So maybe on that part we should also look if there is an xsl transform which could work on streams.
I did not have the time yet to look into the code of the branch
too many other things on my head.