Monday, August 12, 2013

Unit tests: When to Mock, When to Stub

Most of my recent development has been in Java, where we use JUnit4 and Mockito for unit tests. Having transitioned from EasyMock myself, I see a lot of the same mistakes from developers switching to Mockito.

Verifying implementation details instead of outcomes

If you've done a decent amount of mocking, you've probably seen something like the following:


public class ColorPreferenceChanger {

  private final UserService userService;

  @Inject
  ColorPreferenceChanger(UserService userService) {
    this.userService = userService;
  }
  
  /**
   * Updates the logged in user's color preference to the
   * given color.
   */
  public void setColorPref(Color color) {
    User user = userService.getUser();
    userService.setColorPref(user, color);
  }
}

@RunWith(JUnit4.class)
public class ColorPreferenceChangerTest {

  @Mock private UserService userService;

  private final User fakeUser = new User();
  private ColorPreferenceChanger colorPreferenceChanger;

  @Before
  public void setUp() {
    MockitoAnnotations.initMocks(this);
    colorPreferenceChanger = new ColorPreferenceChanger(userService);
  }

  @Test
  public void setColorPrefShouldSetColorPreference() {
    when(userService.getUser()).thenReturn(fakeUser);

    colorPreferenceChanger.setColorPref(Color.BLUE);

    verify(userService).getUser();
    verify(userService).setColorPref(fakeUser, Color.BLUE);
    verifyNoMoreInteractions(userService);
  }
}

The verify(userService).getUser() and verifyNoMoreInteractions(userService) calls verify unnecessary implementation details and increase the brittleness of the test. Think of your unit tests as a readable demonstration that your code behaves as expected under certain conditions. Verifying that userService.getUser() was invoked proves exactly nothing about whether the method behaved correctly. What if the user was fetched and then ignored? What if setColorPref was never invoked after the user was fetched? This is a really common pattern among developers accustomed to EasyMock or other older mocking frameworks where stubbing and verification are done in (nearly) the same way.

Generally speaking, stubbing and verifying the same call is a code smell. Stubbing (the when/then calls) sets up the environment for your test. In Behavior Driven Development (BDD), tests are split into three sections:
  1. Given
  2. When
  3. Then
From a mocking perspective, this corresponds with
  1. Given (this stubbed service)
  2. When (I invoke my method)
  3. Then (these state-changing methods should have been invoked)
Taking our example above, here are the given/when/then sections:

@Test
public void setColorPrefShouldUpdateColorPreference() {

  // Given
  when(userService.getUser()).thenReturn(fakeUser);

  // When
  colorPreferenceChanger.setColorPref(Color.BLUE);

  // Then
  verify(userService).setColorPref(fakeUser, Color.BLUE);

}

Because we verify that the correct color preference was set with fakeUser, verifying the get interaction is completely redundant! The test would fail if setColorPref were invoked with any other user.


But I won't know if the stubbed method was invoked with the correct arguments!


If you stub for the values you expect, you will know, because your test won't pass without the missing data (otherwise, why are you stubbing it?). In other words, stub for specific values. Instead of:

when(userService.getDocument(anyLong()))
    .thenReturn(fakeDocument);

Do:

when(userService.getDocument(1234))
    .thenReturn(fakeDocument);

Frameworks like Mockito will return null or empty for unexpected invocations and your test will fail with a NullPointerException or similar error.
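The same idea can be sketched with Python's standard unittest.mock (an analogue of the Java example above, not Mockito itself; get_document and the 1234 id simply mirror the snippet):

```python
from unittest import mock

# A stub that only answers for the expected argument; anything else gets
# None, mimicking Mockito's default behavior for unexpected invocations.
fake_document = {"id": 1234, "title": "fake"}
user_service = mock.Mock()
user_service.get_document.side_effect = (
    lambda doc_id: fake_document if doc_id == 1234 else None
)

assert user_service.get_document(1234) == fake_document
# A caller that passes the wrong id gets None back and fails fast:
assert user_service.get_document(9999) is None
```

The stub itself enforces the expected argument, so no separate verify() of the getter is needed.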


That will be really hard to troubleshoot when the test fails!


When the test fails, you can add in the verify() call as a troubleshooting measure, similar to how you might add a println statement in the middle of your code to quickly troubleshoot something. This will give you a more helpful diagnostic message about incorrect arguments. But it doesn't belong in the checked-in test because it doesn't verify any meaningful output of your method. It confuses the reader of your test by mixing verification of state-changing methods that matter with interaction/getter methods that don't.
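To illustrate with Python's unittest.mock (a stand-in for the Java example; the names here are hypothetical), a temporary assert_called_with produces a diagnostic that spells out expected versus actual arguments:

```python
from unittest import mock

user_service = mock.Mock()
# Pretend the code under test passed the wrong user:
user_service.set_color_pref("wrong-user", "BLUE")

try:
    # Temporary diagnostic only -- don't check this in.
    user_service.set_color_pref.assert_called_with("fake-user", "BLUE")
except AssertionError as e:
    # The failure message shows both the expected and the actual call.
    print(e)
```

Once the mismatch is found and fixed, delete the diagnostic assertion, just as you would delete a println.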


What's wrong with verifyNoMoreInteractions()?


As mentioned in the Mockito docs, methods like verifyNoMoreInteractions() or verifyZeroInteractions() can be overused, leading to brittle and hard-to-maintain tests. If you modified the class under test to call another method, you would need to update all existing tests to stub and verify the new method - even tests that have nothing to do with it! For instance, if we modified ColorPreferenceChanger to send an email notification to a user when their preference changes, we would need to update every existing test to take this into account. Ideally, we should only need to add a single new test, or modify one existing test, to verify that the email is sent. Mockito intentionally uses nice mocks by default to make it easy to test specific slices of behavior without over-specifying your tests. Think twice before overriding this philosophy if you want to avoid brittle tests.


Something bad could happen if the mock class is touched!


There are some methods that you want to make sure are never invoked. For this, Mockito provides verify with the never() verification mode:

verify(userService, never()).deleteUserAccount(fakeUser);

For the developer reading your unit test, this calls out the problematic method far more clearly, and better documents your intent.

However, consider that you may not need the negative verification at all! Negative verifications come from a place of fear. I find them useful when the outcome is an actual possibility for the given class. For instance, I would only verify that deleteUserAccount was never invoked if there was an actual code path in the class that would result in that outcome. Just because deleteUserAccount is an available method on the userService does not mean that I should verify that it was never invoked, if my class has no reason to ever invoke it. Unit tests are not meant to handle adversarial situations where you assume someone is writing harmful code for your class. Throwing in verifyNoMoreInteractions for your mock as a trip-wire increases the brittleness of the test.
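In Python's unittest.mock the equivalent negative verification is assert_not_called (again an analogue; set_color_pref below is a stand-in for the class above):

```python
from unittest import mock

def set_color_pref(user_service, color):
    # Stand-in for ColorPreferenceChanger.setColorPref
    user = user_service.get_user()
    user_service.set_color_pref(user, color)

user_service = mock.Mock()
set_color_pref(user_service, "BLUE")

# Only worth asserting if some code path could actually delete the account.
user_service.delete_user_account.assert_not_called()
```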

That isn't to say that it's not a useful tool - just don't reach for it for every test.

It would be remiss of me not to mention Martin Fowler's excellent article, Mocks Aren't Stubs.

Thursday, May 5, 2011

PHP and HTML hacked

Last week I ran into the exact hack described here:

http://frazierit.com/blog/?p=103

Nearly all PHP and HTML files had a script injected that would cause a client browser to perform a request to airschk.com. Apparently this is used to enlist browsers into making distributed PageRank requests to Google and reporting the results to a third party. The Dr3w had a Perl one-liner for sanitizing HTML files affected by the attack, but not for the PHP files. Here's a quick Python script I threw together that addresses the PHP infection:

removeWebBug.py



#!/usr/bin/python

import argparse, re

parser = argparse.ArgumentParser(description='Remove defaced code from infected php files')
parser.add_argument('--filelist', metavar='F', nargs=1, type=str,
    help='List of possibly infected files. Files will be overwritten, so make a backup.')
args = parser.parse_args()

filelist = args.filelist
if not filelist:
    parser.print_usage()
    exit()
filelist_file = open(filelist[0], "r")
filenames = filelist_file.readlines()

# The injected block is delimited by paired //{{xxxxxxxx ... //}}xxxxxxxx markers.
php_infection_reg = [re.compile("//{{[0-9a-f]{8}"), re.compile("//}}[0-9a-f]{8}")]
for filename in filenames:
    filename = filename.rstrip()
    print "Processing %s" % filename

    infected_file = open(filename, "r")
    lines = infected_file.readlines()
    infected_file.close()
    clean_result = []

    snip = False
    remove_count = 0
    for line in lines:
        if not snip and php_infection_reg[0].match(line):
            snip = True
        if snip:
            remove_count += 1
        else:
            clean_result.append(line)
        if snip and php_infection_reg[1].match(line):
            snip = False
            print 'Removing %d lines' % remove_count

    if not snip:
        # Save in place
        clean_file = open(filename, "w")
        clean_file.writelines(clean_result)
        clean_file.close()
    else:
        print 'End sequence not found. Aborting!'



Conveniently, the attackers left some tags around their handiwork that make it easy to excise just the portions that don't belong.

Definitely make a backup before running the above script. It could conceivably remove more than just the exploit, though I haven't seen that happen.
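One way to take that backup (a sketch assuming GNU tar; adjust the paths to your layout):

```shell
# Archive every .php file before letting the cleaner overwrite anything.
find . -name "*.php" -print0 | tar czf php-backup.tar.gz --null -T -
```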

find . -name "*.php" > filelist.txt
python ./removeWebBug.py --filelist filelist.txt | less

The output of the script should look something like:

Processing ./php/lib/modifier.sprintf.php
Removing 105 lines
Processing ./php/lib/modifier.strip_linefeeds.php
Removing 105 lines

Tuesday, March 8, 2011

Managing Configuration in Git

A common pattern I run into when using Git (my favorite version control software, by far) is wanting to manage local project configuration that does not get committed to a central repository. This might be something like a local email address for testing, or local database settings. The cleanest option is to set up these configuration options outside of your project or within a directory or file that is not tracked by version control. If you do want to track local configuration in your local version control, you have two options:

  1. Use 'git stash' to store away your local changes whenever you need a clean working set.
  2. Track local changes as a commit or set of commits.

The drawback for solution 1 is that it is easy to accidentally commit your local configuration changes. It's also easy to lose your local configuration if you are frequently stashing other changes too.

The drawback for solution 2 is that you can also accidentally commit local configuration changes. You do, however, have a reference to your configuration changes that you are unlikely to lose. What I'll be describing below is my shortcut for handling solution 2.

For this pattern, I use three branches.
The 'head' branch points to the latest commit from the central repository.
The 'config' branch points to your configuration commits that you don't intend to commit to the central repository.
The 'dev' branch points to your new commits that you will likely commit to a central repository.


head -> .. -> config -> .. -> dev

To start out, create the head branch to track the latest commit from the central repository.


git checkout -b head

Then create the config branch to track configuration changes.


git checkout -b config
git add foo.config bar.config
git commit -m "Local configuration"

Finally, create the dev branch for your current development efforts.


git checkout -b dev
git commit -a -m "Added feature X"

After a git fetch or git pull, you'll want to update the head branch to include the upstream changes. Periodically 'rebase' the config branch on top of the head branch. This puts the configuration commits after the head branch and lets you work with the configuration changes applied.
The instructions to follow make judicious use of the 'git rebase' command, which is a command that rewrites history. Please read the man page for this command and never use it on published commits.


git checkout head
git pull (or fetch, or pull --rebase)
git checkout config
git rebase head
git checkout dev
git rebase config

When you are ready to commit your changes back to the repository, you'll need to pull out your local configuration changes.


(published commits) -> head -> (configuration commits) -> config -> (unpublished commits) -> dev

The following command applies only the changes after the config branch to the head branch.


git rebase --onto head config dev

The result is a version history that looks like the following. As you can see, the local configuration changes have been removed.


(published commits) -> head -> (unpublished commits) -> dev

At this point, you can perform a 'git push', or if you are using git-svn, perform 'git svn dcommit'.


git push origin
(or)
git svn dcommit

Now to include configuration changes in your development branch again:


# Stick our config changes back on top of the latest revision
git checkout config
git rebase head
# Move development branch on top of configuration changes
git checkout dev
git rebase config

Complicated, right? I found myself doing this sequence again and again and finally came up with some aliases that cut down on the number of keystrokes. These aliases do everything above in two simple commands.
Add the following to your ~/.gitconfig:


[alias]
        # Useful for omitting and restoring configuration commits
        configdrop = "!sh -c 'git rebase --onto $1 $2 $3 && git checkout $1 && git rebase $3' -"
        configrestore = "!sh -c 'git checkout $2 && git rebase $1 && git checkout $3 && git rebase $2' -"

So instead of doing everything above you can simply do:


# Drop config changes
git configdrop head config dev
# Push your changes to the central repository (make sure everything looks good in gitk first!)
git push   # or: git svn dcommit
# Put your config changes back in place
git configrestore head config dev

You could merge those three commands into one big alias but then you wouldn't have the opportunity to review your change before pushing to the upstream repository.

A final reminder: these aliases rely on 'git rebase', which rewrites history. Please read its man page and never use these aliases on published commits.