Data driven tests

I’m not sure whether anybody actually uses the term “data driven test”, but if you describe what one is, experienced people will tell you they are bad. Data driven tests are tests that run the same code over many different pieces of data.

Let’s show an example. For my startup project Keep on Posting, I have a method that turns a blog url into a feed url. That method is critical for my application and there are many things that can go wrong, so I test it by querying a sample of real blogs. The test would be something like this (this is in Ruby):

class BlogToolsTest < Test::Unit::TestCase
  BLOGS_AND_FEEDS = {
      "http://blog.sandrafernandez.eu" => "http://blog.sandrafernandez.eu/feed/",
      "http://www.lejanooriente.com" => "http://www.lejanooriente.com/feed/",
      "http://pupeno.com" => "https://pupeno.com/feed/",
      "http://www.avc.com/a_vc" => "http://feeds.feedburner.com/avc",
  }

  def test_blog_to_feed_url
    BLOGS_AND_FEEDS.each_pair do |blog_url, feed_url|
      assert_true feed_url == BlogTools.blog_to_feed(blog_url)
    end
  end
end

Note: I’m using assert_true instead of assert_equal to make a point; these kinds of tests tend to use assert_true.

The problem with that is that eventually it’ll fail and it’ll say something like:

false is not true

Oh, so useful! Let’s at least see where the error is happening… and obviously it’ll point to this line:

      assert_true feed_url == BlogTools.blog_to_feed(blog_url)

which is almost as useless as the failure message. That’s the problem with data driven tests. You might be tempted to do this and re-run the tests:

  def test_blog_to_feed_url
    BLOGS_AND_FEEDS.each_pair do |blog_url, feed_url|
      puts blog_url
      puts feed_url
      assert_true feed_url == BlogTools.blog_to_feed(blog_url)
    end
  end

but if your tests take hours to run, like the ones I often found while working at Google, then you are wasting time. Writing good error messages ahead of time helps:

  def test_blog_to_feed_url
    BLOGS_AND_FEEDS.each_pair do |blog_url, feed_url|
      assert_true feed_url == BlogTools.blog_to_feed(blog_url), "#{blog_url} should have returned the feed #{feed_url}"
    end
  end

and if half your cases fail, the whole suite takes an hour to run, and you have 1000 data sets, you’ll spend hours staring at your monitor fixing one case every now and then, because as soon as one case fails, the execution of the test is halted. If you are coding in a language like Java, that’s about as far as you can take it.

With Ruby you can push the boundaries and write it this way (thanks to executable class bodies):

class BlogToolsTest < Test::Unit::TestCase
  BLOGS_AND_FEEDS = {
      "http://blog.sandrafernandez.eu" => "http://blog.sandrafernandez.eu/feed/",
      "http://www.lejanooriente.com" => "http://www.lejanooriente.com/feed/",
      "http://pupeno.com" => "https://pupeno.com/feed/",
      "http://www.avc.com/a_vc" => "http://feeds.feedburner.com/avc",
  }

  BLOGS_AND_FEEDS.each_pair do |blog_url, feed_url|
    define_method "test_#{blog_url}_#{feed_url}" do
      assert_true feed_url == BlogTools.blog_to_feed(blog_url), "#{blog_url} should have returned the feed #{feed_url}"
    end
  end
end

That will generate one method per item of data; even if one fails, the rest will still be executed because they are separate, isolated tests. They will also be executed in a potentially random order, so you don’t end up with tests depending on other tests, and even if you don’t get a nice error message, you’ll know which piece of data is problematic from the method name.

Note: that doesn’t actually work as written, because blog_url and feed_url contain characters that are not valid in method names; they should be replaced, but I wanted to keep the example concise.
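For completeness, here’s a sketch of the kind of replacing I mean (the regular expression is just one possible choice): strip everything that isn’t a word character before building the method name.

  BLOGS_AND_FEEDS.each_pair do |blog_url, feed_url|
    # Replace anything that isn't a letter, digit or underscore so the
    # generated method name is valid.
    safe_name = "#{blog_url}_#{feed_url}".gsub(/[^\w]+/, "_")
    define_method "test_#{safe_name}" do
      assert_true feed_url == BlogTools.blog_to_feed(blog_url), "#{blog_url} should have returned the feed #{feed_url}"
    end
  end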

Since I’m using shoulda, my resulting code looks like this:

class BlogToolsTest < Test::Unit::TestCase
  BLOGS_AND_FEEDS = {
      "http://blog.sandrafernandez.eu" => "http://blog.sandrafernandez.eu/feed/",
      "http://www.lejanooriente.com" => "http://www.lejanooriente.com/feed/",
      "http://pupeno.com" => "https://pupeno.com/feed/",
      "http://www.avc.com/a_vc" => "http://feeds.feedburner.com/avc",
  }

  BLOGS_AND_FEEDS.each_pair do |blog_url, feed_url|
    should "turn blog #{blog_url} into feed #{feed_url}" do
      assert_equal feed_url, BlogTools.blog_to_feed(blog_url), "#{blog_url} did not resolve to the feed #{feed_url}"
    end
  end
end

and running them in RubyMine shows one nicely named test per blog.


Better assert difference?

Rails comes with some awesome assertion methods for writing tests:

assert_difference("User.count", +1) do
  create_a_user
end

That asserts that the count of users was incremented by one. The plus sign is not needed, it’s just an integer; I add it to make things clear. You can mix several of these expressions into one assert_difference:

assert_difference(["User.count", "Profile.count"], +1) do
  create_a_user
end

That works as expected: it asserts that the counts of both users and profiles were incremented by one. The problem I have is that I often find myself doing this:

assert_difference "User.count", +1 do
  assert_difference "Admin.count", 0 do
    assert_difference "Message.count", +3 do  # We send three welcome messages to each user, like Gmail.
      create_a_user
    end
  end
end

That looks ugly. Let’s try something different:

assert_difference("User.count" => +1, "Admin.count" => 0, "Message.count" => +3) do
  create_a_user
end

Well, that looks nicer and more straightforward, so I implemented it (starting from Rails 3’s assert_difference):

def assert_difference(expressions, difference = 1, message = nil, &block)
  b = block.send(:binding)

  # Normalize the non-hash forms (a single expression or an array of
  # expressions) into a hash of expression => expected difference.
  unless expressions.is_a?(Hash)
    exps = Array.wrap(expressions)
    expressions = {}
    exps.each { |e| expressions[e] = difference }
  end

  # Evaluate every expression in the block's binding before running it.
  before = {}
  expressions.each { |exp, _| before[exp] = eval(exp, b) }

  yield

  # Check that each expression changed by exactly the expected amount.
  expressions.each do |exp, diff|
    error = "#{exp.inspect} didn't change by #{diff}"
    error = "#{message}.\n#{error}" if message
    assert_equal(before[exp] + diff, eval(exp, b), error)
  end
end

Do you like it? If you do, let me know and I might turn this into a patch for Rails 3 (and then let them know, otherwise they’ll ignore it).

Update: this is now a gem.

The sad truth about testing web applications

There are many ways to test a web application. At the lowest level we have unit tests; at the highest level we have HTTP tests, those that use the HTTP protocol to talk to a running instance of your application (maybe running it on demand, maybe expecting it to already be running on a testing server).

There are several ways to write HTTP tests, falling into two big families: with and without a web browser. Selenium is a popular way to write tests with a browser. A competing product is Web Driver, which I understand can use a browser or other methods. If you’ve never seen Selenium before, it’s pretty impressive. You write a test that says something like:

  1. go to http://…
  2. click here
  3. click there
  4. fill field
  5. fill field
  6. submit form
  7. assert response

and when you run it you actually see a Firefox window pop up and perform that sequence amazingly fast. Well, it’s amazingly fast the first three runs, while you still have two tests or less. After that it’s amazingly slow, tedious, flaky and intrusive.
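To make that sequence concrete, a browser-driven test written with the selenium-webdriver gem would look roughly like this (the URL, element ids and expected text are made up):

require "selenium-webdriver"

driver = Selenium::WebDriver.for :firefox
begin
  # The sequence from the list above: navigate, fill fields, submit, assert.
  driver.navigate.to "http://localhost:3000/sign_up"
  driver.find_element(:id, "username").send_keys("john")
  driver.find_element(:id, "password").send_keys("secret")
  driver.find_element(:id, "sign_up_form").submit
  raise "sign up failed" unless driver.page_source.include?("Welcome, john")
ensure
  driver.quit
end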

For the other family of tests, without a web browser, aside from Web Driver we have HttpUnit, HtmlUnit and most of the Ruby on Rails testing frameworks. The headless solutions tend to be faster and more solid, but the scenarios are not as realistic (only one JavaScript engine, if you are lucky; no rendering issues, like slowdowns; etc.).
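For contrast, a headless Rails integration test exercises the same kind of flow in-process, with no browser and no real socket; a rough sketch, assuming a hypothetical app with a POST /users route and a User model:

class SignUpFlowTest < ActionDispatch::IntegrationTest
  def test_signing_up_creates_a_user
    # The request is dispatched in-process, straight into the Rails stack.
    assert_difference "User.count", +1 do
      post "/users", :user => { :username => "john" }
    end
    assert_response :redirect
  end
end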

When you are testing, as soon as you touch the HTTP protocol everything becomes much harder and less useful. If you want to be totally confident a web application is working you need to test at the HTTP level, but the return on investment for those tests is very low: they are hard to write and not very useful.

Hard to write

They are hard to write because you are not calling methods with well-defined interfaces (a list of arguments) but essentially calling one method, HTTP-request, passing different parameters to get different results. You don’t have any code completion, and you don’t have any formal way to know which arguments to pass. Anything can be valid.

In a unit test you may have something like:

add_user("john");

while in an HTTP test you’ll have something like

http.send_request("/user/create", "username=john");

When you are writing a unit test, figuring out the name of the add_user function and its arguments is easy. Some IDEs will autocomplete the name and show you the argument list. And if the name of add_user changes, some refactoring tools will even fix your tests for you.

But “/user/create” and “username=john” are strings. To figure them out you’ll have to know how your application handles routing, and how the parameters are passed and parsed. If your application changes from “/user/create” to “/user/add” the test will just break, most likely with a not-very-useful error message. Which takes us to the next issue…

They are not very useful

They are not very useful because their failures are cryptic. When you write a test that calls method blah, which calls method bleh, which calls method blih, and then bloh and bluh and bluh divides by zero, you get an exception and a stack trace. Something like:

bluh:123: Division by zero! I can't divide by zero (I'm not Haskell)
bloh:234: bluh(...)
blih:452: bloh(...)
bleh:34: blih(...)
blah:94: bleh(...)
blah_test:754: blah(...)

You know that the test blah_test failed on line 754 when calling blah, which called bleh on line 94, which called blih on line 34, which called bloh on line 452, which called bluh on line 234, which divided by zero on line 123. You jump to bluh, line 123, and you may find something like:

a = i / 0;

where you replace the zero with something else; or most likely:

a = i / j;

where you have to track where j came from. Either it was calculated there or generated from another method and passed as an argument. The stack-trace gives you all the information you need to find where j was generated or where it came from. That’s a very useful test.

When you have HTTP in the middle, tests become much less useful. The stack trace of a failure would look something like:

http_request:123: Time out, server didn't respond.
blah_test:45: http_request(...)

That means that blah_test failed on line 45 making an HTTP request that timed out. Did your application divide by zero and crash? Did it try to calculate pi and is still doing it? Did it fail to connect to the database? Where did it actually fail? You don’t know. The only thing you know is that something went wrong. Time to open the log files and figure it out.

You open the log file and you find there’s not enough information there. You make the application log much, much more. So much that you’ll fill a terabyte in an hour. You run the test again and this time it just passes, no errors.

When you are at the HTTP level there are many, many things that are flaky and can go wrong. Let’s invent one example: the web server you were using for the tests wants to DNS-resolve everything it can. Every host name is resolved to an IP, and every IP is reverse-resolved to a name. When you ran the test there was a glitch and your name servers were down. Now they are working correctly and they won’t fail again for another year. Good luck figuring that out from a time-out message.

The other way in which HTTP tests fail is something like this:

blah_test:74: Index out of bound for this array

You go to line 74 and it’s something like:

assert_equal("username", data[0]);

If data[0] caused an out-of-bounds error, then the array data is empty. How can it be empty? It contains the response from the server, and you know the server is responding with something usable because you are using the app right now.

What happened was that the log-in box used to have the HTML id "login" and it is now "log-in". That means the HTML-parsing methods in blah_test don’t find the log-in box and fail to properly fill the array data. Yet another case of tests exposing bugs, in the tests themselves. And the real-life failures are much, much more complex than this.
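As a sketch of how that kind of breakage happens (the parsing code and ids here are invented; blah_test could be extracting the data any number of ways):

require "nokogiri"

# response_body is assumed to hold the HTML returned by the server.
# Scrape the names of the fields inside the log-in box.
doc = Nokogiri::HTML(response_body)
data = doc.css("#login input").map { |input| input["name"] }
# If the id in the HTML changes from "login" to "log-in", the selector
# silently matches nothing, data ends up empty, and the assertion fails
# with a message that says nothing about the renamed id.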

My recommendation

All this makes the return on investment of writing HTTP tests quite low. They are very hard to write and they provide very little information when they fail. They do provide good information when they pass: if it works at the HTTP level, probably everything else works too.

I’d recommend any project not to write any HTTP test unless every other possible test, unit and integration, is already written.

Quick and dirty continuous testing with Ruby on Rails

I like the idea of continuous testing: seeing that you broke something the moment you broke it, and the same for fixing it. There are some tools to do it, but since they are not yet packaged for my operating system of choice, I went the quick and dirty route:

watch -n 5 rake test 2> /dev/null
