A New Task: Finding Duplicates

Constant Improvement

As well as the core engine for CruiseControl.NET, there are a number of tasks that call external applications. These include tasks for:

  • Code analysis
  • Unit tests
  • Running build scripts

And so on. It is very easy to integrate an external application with CruiseControl.NET – all it needs is a command-line interface.

As part of my job I am constantly looking for new tools and utilities that can improve the CI process at my work. When I see new tools or utilities I’ll give them a try. And slowly I build up a list of tools that I think would work nicely with CruiseControl.NET.

One of the acronyms that gets mentioned a lot nowadays is DRY – Don’t Repeat Yourself. In theory this is a simple principle, but as the code base expands, it becomes harder and harder to see duplication. To get around this, there are tools that can be run to detect duplication.

For a long time I haven’t bothered with this analysis as most tools are commercial (and therefore cost money that I don’t have), but not that long ago somebody mentioned DuplicateFinder. This is a little command-line tool that will scan a codebase and report any duplicates.

So I decided to take a break from reducing memory usage and add this tool as a new task.

Some Initial Decisions

DuplicateFinder consists of three files:

  • The library that performs the analysis
  • A command-line application
  • An MSBuild task

So the first decision is: which should I use? For pure speed, the library is the best approach, but this would require adding a new library to the project, one that would only be used for a single task! And the MSBuild task is only useful within an MSBuild script.

So in the end I decided to call the application – even though this will be slightly slower and require an external dependency. But it means if people don’t want this analysis, they don’t have to have the binaries.

DuplicateFinder has three output options:

  • Console – text
  • Console – XML
  • XML file

So, the next decision is which output to use, and in the end I chose console – XML for two reasons. First, CC.NET uses XML for its reporting, so this kept it nice and easy. Second, writing to the console means there is no extra file floating around that needs to be cleaned up later.
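The "run the application and read XML from the console" approach can be sketched roughly as below. This is a Python illustration only, not the C# code of the actual task; the function name `run_and_parse_xml` is hypothetical, and the command line is left entirely to the caller because DuplicateFinder's real switches are not assumed here.

```python
import subprocess
import xml.etree.ElementTree as ET


def run_and_parse_xml(command, timeout_seconds=600):
    """Run an external command and parse its standard output as XML.

    `command` is a list such as [path_to_exe, arg1, ...]. Which arguments
    the real tool expects is deliberately not assumed here.
    """
    completed = subprocess.run(
        command,
        capture_output=True,      # collect stdout/stderr instead of printing them
        text=True,                # decode the output to str
        timeout=timeout_seconds,  # raises subprocess.TimeoutExpired when exceeded
        check=True,               # raises CalledProcessError on a non-zero exit code
    )
    # Nothing is written to disk, so there is no report file to clean up later.
    return ET.fromstring(completed.stdout)
```

The trade-off is the one described above: starting a process costs a little more than calling a library directly, but the host application never needs to reference the analysis binaries.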

After these decisions, it was easy to add the new task.

The New Task

The new task is called dupfinder. It has the following parameters:

Name | Description | Type | Required | Default
--- | --- | --- | --- | ---
executable | The path to the dupfinder executable. | String | No | dupfinder in the working folder
inputDir | The folder containing the files to scan. | String | Yes | n/a
fileMask | The file mask to scan for. There can be multiple values here separated by a space, e.g. *.cs *.vb. | String | Yes | n/a
focus | The name of a file to focus on – all other files will be compared against this file. | String | No | none
timeout | The period, in seconds, before the task will time out. | Integer | No | 600
threshold | The number of consecutive lines that have to be the same before an item is reported as a duplicate. | Integer | No | 5
width | The minimum number of non-white-space characters in a line to be matched. | Integer | No | 2
recurse | Whether to check all sub-directories (true) or just the selected folder (false). | Boolean | No | False
shortenNames | Whether to remove the input folder name from the file names (true) or not (false). | Boolean | No | False
includeCode | Whether to include the lines of code that were duplicated in the results (true) or not (false). | Boolean | No | False
excludeLines | Any lines to exclude from the analysis. | String array | No | none
excludeFiles | Any files to exclude from the analysis. | String array | No | none

And here is an example of how to configure it:

<dupfinder recurse="true" width="5" shortenNames="true" includeCode="true" timeout="1200">
  <inputDir>C:\...\Trunk\project\remote</inputDir>
  <fileMask>*.cs</fileMask>
  <executable>C:\...\Trunk\Tools\DuplicateFinder\DupFinder.exe</executable>
  <excludeLines>
    <line>using System.*</line>
  </excludeLines>
  <excludeFiles>
    <file>AssemblyInfo.cs</file>
  </excludeFiles>
</dupfinder>

Some Extras

DuplicateFinder provides a nice simple XML output, but there were a couple of changes I wanted to make to it. First, the filenames contained the entire path of the files. In CC.NET, the full path will point to somewhere on the server, with individual developers potentially having a completely different location.

So I have added the shortenNames parameter to the task. This loads the output XML, iterates through all of the filenames and removes the input path from each name.
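As a rough sketch of that trimming step (in Python rather than the task's own C#): the layout of the DuplicateFinder report is not shown in this post, so the `name` attribute used below is an assumption, as is the case-insensitive comparison, which simply reflects Windows path behaviour.

```python
import xml.etree.ElementTree as ET


def shorten_names(xml_text, input_dir, attribute="name"):
    """Remove the input folder prefix from file names in an XML report.

    `attribute` is a guess at where the report stores file names; adjust it
    to match the real output.
    """
    root = ET.fromstring(xml_text)
    # Drop any trailing separator so "C:\src" and "C:\src\" behave the same.
    prefix = input_dir.rstrip("\\/")
    for element in root.iter():
        value = element.get(attribute)
        if value is None:
            continue
        rest = value[len(prefix):]
        # Only trim when the prefix is a whole folder, not just a partial name.
        if value.lower().startswith(prefix.lower()) and rest[:1] in ("\\", "/"):
            element.set(attribute, rest.lstrip("\\/"))
    return ET.tostring(root, encoding="unicode")
```

Names that do not start with the input folder are left untouched, so a report that mixes locations is not damaged.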

Secondly, the output didn’t tell me the code that was duplicated – it just has the starting position and the number of lines. To see the duplicated lines I need to open one of the files, navigate to the lines and then look.

To get around this I added the includeCode parameter. This will add the lines that were duplicated into the results, so in the dashboard I can easily display them.
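The idea behind includeCode can be illustrated with a small Python sketch: given a file, a starting line and a line count, read those lines back so they can be stored alongside the result. The function name is hypothetical, and treating the starting position as 1-based is an assumption about the report.

```python
def read_duplicate_lines(path, start_line, line_count):
    """Return `line_count` lines of `path`, beginning at 1-based `start_line`."""
    # errors="replace" keeps the sketch from failing on files in other encodings.
    with open(path, encoding="utf-8", errors="replace") as source:
        lines = source.read().splitlines()
    first = start_line - 1  # convert the assumed 1-based position to a list index
    # Slicing silently stops at the end of the file if the range runs past it.
    return lines[first:first + line_count]
```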

The Dashboard

Since this is an analysis type task, I have also added a couple of reports to the dashboard. First, in the project summary there is a summary of the analysis:

image

Clicking on the “Duplicate Finder Report” will bring up the detailed report:

image

Yes, nothing fancy, but it does allow someone to quickly see which files have duplicates and which lines are duplicated.

And to make it even easier to install, I have put together a dashboard package that contains the necessary files and settings. All an administrator needs to do is install the package and the report will appear in the dashboard :-)

And Finally, the Why?

Now, any application that can be run from the command-line can be run from a NAnt or MSBuild script, or even via an exec task, so why do I add these new tasks to CruiseControl.NET?

One reason – integration! Adding a task directly to CruiseControl.NET means we can do some nice things with it. First, we can include the progress of the task in the project status – it’s nice to see that we’ve finished the build and are onto unit tests or code analysis, etc. Second, it means only one task – instead of a build task and a merge results task.

And finally, as I have done with the duplicate finder task, we can extend the basic functionality with extras that add value, such as shortened file names and the duplicated lines themselves.

So, if you know of other tools that you think would be nice in CruiseControl.NET, let me know and I can investigate adding them in :-)

  1. 23 October, 2009 at 9:37 pm | #1

    nice job !

    But we must make sure that there are not too many of these tasks in CCNet core. All these tasks are also maintenance :-(
    As long as it is calling exe A with arguments X Y Z that should not be a big problem.
