Better Code

Swimming in Someone Else's Pool

Duplication Considered... OK, Actually


When writing a test, sometimes the only way to initially write the test is by duplicating the implementation code in the test to assert that the expected value is returned. This can happen for a number of reasons, but often arises because you are trying to test what is being performed, rather than how it’s being done, especially where the details of what’s being done are complicated and out of your control.

For a slightly contrived example, let’s say that you’re hashing a document to provide a checksum, and you want to check that a large file hashes correctly:

test/test_document.rb
require 'test/unit'
require 'digest/sha1'
require 'document'

class TestDocument < Test::Unit::TestCase
  def setup
    @file_content = File.read("test/fixtures/file-to-checksum.txt")
  end

  def test_large_file
    correct_checksum = Digest::SHA1.new.update(@file_content).digest
    doc = Document.new(@file_content)
    assert_not_nil doc.checksum
    assert_equal correct_checksum, doc.checksum
  end
end

Now, the implementation is almost trivial:

lib/document.rb
require 'digest/sha1'

class Document
  def initialize(content)
    @content = content
  end

  def checksum
    Digest::SHA1.new.update(@content).digest
  end
end

This looks clumsy, and feels wrong when you write it, but it’s actually fine, and you shouldn’t worry about doing it.

It feels wrong because it seems that if you are simply duplicating the same code in two places, then you’re missing out on the benefit of a test: if you have the same code in two places, how can that possibly be a useful check on the correctness of that same code? It’s true, directly duplicated code like this doesn’t check that what it is doing is being done correctly. That’s not the only function of test code, though.

One of the functions of a good test suite is to give you confidence that a refactoring hasn’t broken anything. Tests which duplicate the implementation give you that confidence: you know that if you find a better implementation later, you can swap it in on the implementation side, and the test will tell you that you’ve done it correctly.

In our case, let’s say that for whatever reason, at some point in the future it makes sense to switch from Ruby’s Digest library to using OpenSSL directly. We can swap out the implementation code like so:

lib/document.rb
require 'openssl'

class Document
  def initialize(content)
    @content = content
  end

  def checksum
    OpenSSL::Digest.digest("sha1", @content)
  end
end

The test still passes, so we know that our checksum is still valid.
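The reason the swap is safe can be checked in isolation. Here is a minimal sketch, assuming two hypothetical helper methods (`sha1_via_digest` and `sha1_via_openssl`, neither of which is part of the Document class) that wrap the two library calls used above:

```ruby
require 'digest/sha1'
require 'openssl'

# Hypothetical helper: the original implementation's approach.
def sha1_via_digest(content)
  Digest::SHA1.new.update(content).digest
end

# Hypothetical helper: the replacement implementation's approach.
def sha1_via_openssl(content)
  OpenSSL::Digest.digest("sha1", content)
end

# Sample input standing in for the fixture file.
sample = "The quick brown fox jumps over the lazy dog"

# Both calls return the raw SHA-1 bytes, so the test's
# assert_equal sees identical strings before and after the swap.
puts sha1_via_digest(sample) == sha1_via_openssl(sample)
```

Because both return the raw digest bytes rather than a hex string, the test's expected value needs no change when the implementation moves to OpenSSL.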

In this specific case, rather than duplicate the code, I would probably choose to store the expected digest as a literal value in the test file. In general, though, the expected value you need to assert on won’t be easily serialisable, and this technique is particularly useful in those cases.
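As a rough sketch of the stored-value alternative, assume the fixture content were simply the string "abc" (a stand-in, since the real fixture file isn't shown here). The expected value is the published SHA-1 test vector for that input, kept as hex so it stays readable in source. The constant names are made up for illustration:

```ruby
require 'digest/sha1'

# Stand-in for the Document class from lib/document.rb.
class Document
  def initialize(content)
    @content = content
  end

  def checksum
    Digest::SHA1.new.update(@content).digest
  end
end

# Hypothetical fixture content and its known SHA-1 digest in hex.
FIXTURE_CONTENT   = "abc"
EXPECTED_SHA1_HEX = "a9993e364706816aba3e25717850c26c9cd0d89d"

# Document#checksum returns raw bytes, so convert to hex before comparing.
actual_hex = Document.new(FIXTURE_CONTENT).checksum.unpack1("H*")

# Inside the test case this would read:
#   assert_equal EXPECTED_SHA1_HEX, actual_hex
raise "checksum mismatch" unless actual_hex == EXPECTED_SHA1_HEX
```

Unlike the duplicated-code version, this test checks the result against an independent value, so it would also catch a mistake in how the hashing library is called.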

Tests aren’t just there to help you write correct code. They also help you keep code correct over time. As long as your duplicated-code test isn’t the only coverage of the relevant functionality, it’s not something to worry about in the first instance, and it certainly isn’t a reason to leave the test out.
