A New Task: Finding Duplicates
Constant Improvement
As well as the core engine for CruiseControl.NET, there are a number of tasks that call external applications. These include tasks that:
- Perform code analysis
- Run unit tests
- Run build scripts
And so on. It is very easy to integrate an external application with CruiseControl.NET – all it needs is a command-line interface.
As part of my job I am constantly looking for new tools and utilities that can improve the CI process at my work. When I see new tools or utilities I’ll give them a try. And slowly I build up a list of tools that I think would work nicely with CruiseControl.NET.
One of the acronyms that gets mentioned a lot nowadays is DRY – Don’t Repeat Yourself. In theory this is a simple principle, but as the code base expands, it becomes harder and harder to see duplication. To get around this, there are tools that can be run to detect duplication.
For a long time I haven’t bothered with this analysis as most tools are commercial (and therefore cost money that I don’t have), but not that long ago somebody mentioned DuplicateFinder. This is a little command-line tool that will scan a codebase and report any duplicates.
So I decided to take a break from reducing memory and add in this tool as a new task.
Some Initial Decisions
DuplicateFinder consists of three files:
- The library that performs the analysis
- A command-line application
- An MSBuild task
So the first decision is which should I use? For pure speed, the library is the best approach, but this would require adding a new library to the project, one that would only be used for a single task! And the MSBuild task is only useful within an MSBuild script.
So in the end I decided to call the application – even though this will be slightly slower and require an external dependency. But it means if people don’t want this analysis, they don’t have to have the binaries.
DuplicateFinder has three output options:
- Console – text
- Console – XML
- XML file
So, the next decision is which output to use, and in the end I chose console – XML for two reasons. First, CC.NET uses XML for its reporting, so this kept it nice and easy. Second, writing to the console means there is no extra file floating around that needs to be cleaned up later.
After these decisions, it was easy to add the new task.
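To illustrate the approach (calling the application and reading XML from the console), here is a minimal Python sketch. It is not the actual CC.NET task code, which is written in C#. The function name `run_and_capture_xml` is hypothetical, and a child Python process stands in for the real tool so that no DuplicateFinder command-line arguments have to be guessed.

```python
import subprocess
import sys
import xml.etree.ElementTree as ET


def run_and_capture_xml(command, timeout_seconds=600):
    """Run an external command and parse its standard output as XML.

    Hypothetical helper for illustration only. Exit-code handling is
    deliberately left out of this sketch.
    """
    completed = subprocess.run(
        command,
        capture_output=True,      # collect stdout/stderr instead of printing them
        text=True,                # decode the output to str
        timeout=timeout_seconds,  # raises subprocess.TimeoutExpired if exceeded
    )
    return ET.fromstring(completed.stdout)


# Stand-in for the real tool: a child process that prints a small XML document.
fake_tool = [sys.executable, "-c", "print('<Duplicates count=\"0\" />')"]
root = run_and_capture_xml(fake_tool, timeout_seconds=30)
print(root.tag, root.attrib["count"])
```

Because the XML never touches the disk, there is nothing to clean up afterwards, which is the second reason given above for choosing console output.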
The New Task
The new task is called dupfinder. It has the following parameters:
| Name | Description | Type | Required | Default |
| --- | --- | --- | --- | --- |
| executable | The path to the dupfinder executable. | String | No | dupfinder in the working folder |
| inputDir | The folder containing the files to scan. | String | Yes | n/a |
| fileMask | The file mask to scan for. There can be multiple values here separated by a space, e.g. *.cs *.vb. | String | Yes | n/a |
| focus | The name of a file to focus on – all other files will be compared against this file. | String | No | none |
| timeout | The period, in seconds, before the task times out. | Integer | No | 600 |
| threshold | The number of consecutive lines that have to be the same before an item is reported as a duplicate. | Integer | No | 5 |
| width | The minimum number of non-whitespace characters in a line for it to be matched. | Integer | No | 2 |
| recurse | Whether to check all sub-directories (true) or just the selected folder (false). | Boolean | No | False |
| shortenNames | Whether to remove the input folder name from the file names (true) or not (false). | Boolean | No | False |
| includeCode | Whether to include the lines of code that were duplicated in the results (true) or not (false). | Boolean | No | False |
| excludeLines | Any lines to exclude from the analysis. | String array | No | none |
| excludeFiles | Any files to exclude from the analysis. | String array | No | none |
And here is an example of how to configure it:
```xml
<dupfinder recurse="true" width="5" shortenNames="true" includeCode="true" timeout="1200">
  <inputDir>C:\...\Trunk\project\remote</inputDir>
  <fileMask>*.cs</fileMask>
  <executable>C:\...\Trunk\Tools\DuplicateFinder\DupFinder.exe</executable>
  <excludeLines>
    <line>using System.*</line>
  </excludeLines>
  <excludeFiles>
    <file>AssemblyInfo.cs</file>
  </excludeFiles>
</dupfinder>
```
Some Extras
DuplicateFinder provides a nice simple XML output, but there were a couple of changes I wanted to make to it. First, each filename contained the entire path of the file. In CC.NET, the full path will point to somewhere on the server, while individual developers may have the code in a completely different location.
So I have added the shortenNames parameter to the task. This will load the output XML and trim the filenames. This just iterates through all of the filenames and removes the input path from the name.
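The trimming step can be sketched in a few lines of Python. This is only an illustration, not the C# code in the task. The XML layout used here (`Report`, `Duplicate`, `File`, and the `name`, `startLine` and `lineCount` attributes) is a made-up stand-in for the real DuplicateFinder output, and the paths are hypothetical.

```python
import xml.etree.ElementTree as ET


def shorten_names(report_xml, input_dir):
    """Strip the input folder prefix from every file name in the report.

    Assumes a hypothetical layout in which each file is a <File> element
    with a "name" attribute holding the full path.
    """
    root = ET.fromstring(report_xml)
    # Normalise so "C:\src" and "C:\src\" give the same prefix.
    prefix = input_dir.rstrip("\\/")
    for file_node in root.iter("File"):
        name = file_node.get("name", "")
        rest = name[len(prefix):]
        # Windows paths are case-insensitive, so compare in lower case, and
        # require a separator so "C:\src" does not match "C:\srcOther".
        if name.lower().startswith(prefix.lower()) and rest[:1] in ("\\", "/"):
            file_node.set("name", rest.lstrip("\\/"))
    return root


sample = (
    '<Report>'
    '<Duplicate lineCount="6">'
    '<File name="C:\\build\\remote\\Foo.cs" startLine="10" />'
    '<File name="C:\\build\\remote\\sub\\Bar.cs" startLine="42" />'
    '</Duplicate>'
    '</Report>'
)
shortened = shorten_names(sample, "C:\\build\\remote\\")
names = [node.get("name") for node in shortened.iter("File")]
print(names)
```

Names that do not start with the input folder are left untouched, so a file reported from elsewhere still shows its full path.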
Secondly, the output didn’t tell me the code that was duplicated – it just has the starting position and the number of lines. To see the duplicated lines I need to open one of the files, navigate to the starting position and then look.
To get around this I added the includeCode parameter. This will add the lines that were duplicated into the results, so in the dashboard I can easily display them.
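As a rough idea of what including the code involves, the Python sketch below reads the duplicated lines from the first file of each duplicate and attaches them to the results. It is not the task's C# implementation. The element and attribute names (`Duplicate`, `File`, `Code`, `Line`, `name`, `startLine`, `lineCount`) are hypothetical, and it assumes the starting position is a 1-based line number.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def read_lines(path, start_line, line_count):
    """Return line_count lines from path, starting at the 1-based start_line."""
    with open(path, encoding="utf-8") as source:
        all_lines = source.read().splitlines()
    first = start_line - 1
    return all_lines[first:first + line_count]


def include_code(root):
    """Add a <Code> child holding the duplicated lines to each <Duplicate>."""
    # Copy to a list first so the tree is not modified while it is being walked.
    for duplicate in list(root.iter("Duplicate")):
        first_file = duplicate.find("File")
        if first_file is None:
            continue
        lines = read_lines(
            first_file.get("name"),
            int(first_file.get("startLine")),
            int(duplicate.get("lineCount")),
        )
        code_node = ET.SubElement(duplicate, "Code")
        for text in lines:
            ET.SubElement(code_node, "Line").text = text
    return root


# Usage: build a tiny source file and a matching one-entry report.
with tempfile.TemporaryDirectory() as folder:
    source_path = os.path.join(folder, "Foo.cs")
    with open(source_path, "w", encoding="utf-8") as handle:
        handle.write("line one\nline two\nline three\nline four\n")
    report = ET.Element("Report")
    entry = ET.SubElement(report, "Duplicate", lineCount="2")
    ET.SubElement(entry, "File", name=source_path, startLine="2")
    include_code(report)
    added = [line.text for line in report.iter("Line")]
    print(added)
```

Only the first file is read because, by definition, the other files in the same duplicate contain the same lines.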
The Dashboard
Since this is an analysis type task, I have also added a couple of reports to the dashboard. First, in the project summary there is a summary of the analysis:
Clicking on the “Duplicate Finder Report” will bring up the detailed report:
Yes, nothing fancy, but it does allow someone to quickly see which files have duplicates and which lines are duplicated.
And to make it even easier to install, I have put together a dashboard package that contains the necessary files and settings. All an administrator needs to do is install the package and the report will appear in the dashboard.
And Finally, the Why?
Now, any application that can be run from the command-line can be run from a NAnt or MSBuild script, or even via an exec task, so why do I add these new tasks to CruiseControl.NET?
One reason – integration! Adding a task directly to CruiseControl.NET means we can do some nice things with it. For one we can include the progress of the task in the project status – it’s nice to see that we’ve finished the build and are onto unit tests or code analysis, etc. Second, it means only one task – instead of a build task and a merge results task.
And finally, as I have done with the duplicate finder task, we can extend the basic functionality to include new functionality that will add value.
So, if you know of other tools that you think would be nice in CruiseControl.NET, let me know and I can investigate adding them in.
Nice job!
But we must make sure that there are not too many of these tasks in the CCNet core. All these tasks also need maintenance.
As long as it is calling exe A with arguments X Y Z, that should not be a big problem.