Tag: HTTP

The sad truth about testing web applications

There are many ways to test a web application. At the lowest level we have unit tests; at the highest level we have HTTP tests, those that use the HTTP protocol to talk to a running instance of your application (maybe starting it on demand, maybe expecting it to be running on a testing server).

There are several ways to write HTTP tests, in two big families: with and without a web browser. Selenium is a popular way to write tests with a browser. A competing product is WebDriver, which I understand can use a browser or other methods. If you’ve never seen Selenium before, it is pretty impressive. You write a test that says something like:

  1. go to http://…
  2. click here
  3. click there
  4. fill field
  5. fill field
  6. submit form
  7. assert response

and when you run it you actually see a Firefox window pop up and perform that sequence amazingly fast. Well, it’s amazingly fast for the first three runs, while you still have two tests or fewer. After that it’s amazingly slow, tedious, flaky and intrusive.

For the other family of tests, without a web browser, aside from WebDriver we have HttpUnit, HtmlUnit and most of the Ruby on Rails testing frameworks. The headless solutions tend to be faster and more solid, but the scenarios are not as realistic (only one JavaScript engine, if you are lucky; no rendering issues such as slowdowns; etc.).

When you are testing, as soon as you touch the HTTP protocol everything becomes much harder and less useful. If you want to be totally confident a web application is working you need to test at the HTTP level, but the return on investment for those tests is very low: they are hard to write and not very useful.

Hard to write

They are hard to write because you are not calling methods with well-defined interfaces (lists of arguments) but essentially calling one method, HTTP-request, passing different parameters to get different results. You don’t have any code completion, and you don’t have any formal way to know which arguments to pass. Anything can be valid.

In a unit test you may have something like:

add_user("john");

whereas in an HTTP test you’ll have something like:

http.send_request("/user/create", "username=john");

When you are writing a unit test, figuring out the name of the add_user function and its arguments is easy. Some IDEs will autocomplete the name and show you the argument list. And if the name of add_user changes, some refactoring tools will even fix your tests for you.

But “/user/create” and “username=john” are strings. To figure them out you’ll have to know how your application handles routing, and how the parameters are passed and parsed. If your application changes from “/user/create” to “/user/add” the test will just break, most likely with a not-very-useful error message. Which takes us to the next issue…
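To make the renamed-route case concrete, here is a minimal sketch using only Python's standard library. Everything in it is an assumption for illustration: the tiny App server, the send_request helper (a stand-in for the http.send_request call above) and the 201 status are made up, not taken from any real application.

```python
import threading
import urllib.error
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

USERS = []


def add_user(username):
    """The function a unit test would call directly."""
    USERS.append(username)
    return username


class App(BaseHTTPRequestHandler):
    """Hypothetical application whose route was renamed to /user/add."""

    def do_POST(self):
        # Read the form body first, then decide what to do with it.
        length = int(self.headers.get("Content-Length", 0))
        form = urllib.parse.parse_qs(self.rfile.read(length).decode())
        if self.path != "/user/add":
            self.send_error(404)
            return
        add_user(form["username"][0])
        self.send_response(201)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def send_request(base_url, path, body):
    """Hypothetical stand-in for http.send_request; returns the status code."""
    request = urllib.request.Request(
        base_url + path, data=body.encode(), method="POST"
    )
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), App)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base_url = "http://127.0.0.1:%d" % server.server_port
    try:
        old = send_request(base_url, "/user/create", "username=john")
        new = send_request(base_url, "/user/add", "username=john")
    finally:
        server.shutdown()
        server.server_close()
    return old, new
```

Calling run_demo() returns one status code for the old path and one for the new path. Nothing in the test code itself hints that the route was renamed; a refactoring tool has no way to follow a string.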

They are not very useful

They are not very useful because their failures are cryptic. When you write a test that calls method blah, which calls method bleh, which calls method blih, then bloh and then bluh, and bluh divides by zero, you get an exception and a stack trace. Something like:

bluh:123: Division by zero! I can't divide by zero (I'm not Haskell)
bloh:234: bluh(...)
blih:452: bloh(...)
bleh:34: blih(...)
blah:94: bleh(...)
blah_test:754: blah(...)

You know that the test blah_test failed on line 754 when calling blah, which called bleh on line 94, which called blih on line 34, which called bloh on line 452, which called bluh on line 234, which divided by zero on line 123. You jump to bluh, line 123, and you may find something like:

a = i / 0;

where you replace the zero with something else; or, more likely:

a = i / j;

where you have to track where j came from. Either it was calculated there or generated from another method and passed as an argument. The stack-trace gives you all the information you need to find where j was generated or where it came from. That’s a very useful test.

When you have HTTP in the middle, tests become much less useful. The stack trace of a failure would look something like:

http_request:123: Time out, server didn't respond.
blah_test:45: http_request(...)

That means that blah_test failed on line 45 making an HTTP request which failed with a timeout. Did your application divide by 0 and crash? Did it try to calculate pi and is it still at it? Did it fail to connect to the database? Where did it actually fail? You don’t know. The only thing you know is that something went wrong. Time to open the log files and figure it out.
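A small sketch of how the useful part of the failure gets lost, again with Python's standard library only. The App handler and the /blah path are hypothetical; bluh is borrowed from the stack trace above. The server catches the division by zero and answers with a bare 500, which is roughly what many frameworks do in production mode.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def bluh(i, j):
    return i / j  # raises ZeroDivisionError when j == 0


class App(BaseHTTPRequestHandler):
    """Hypothetical application that fails deep inside a request."""

    def do_GET(self):
        try:
            body = str(bluh(1, 0)).encode()
        except Exception:
            # The traceback stays on the server side (typically in a log
            # file); the client only learns that something failed.
            self.send_error(500)
            return
        self.send_response(200)
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def failure_seen_by_test():
    """Return the only error text the HTTP test ever gets to see."""
    server = HTTPServer(("127.0.0.1", 0), App)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/blah" % server.server_port
    try:
        urllib.request.urlopen(url, timeout=5)
    except urllib.error.HTTPError as error:
        return str(error)
    finally:
        server.shutdown()
        server.server_close()
    return "no failure"
```

The string returned by failure_seen_by_test() mentions the 500 status and nothing else: no bluh, no line number, no division by zero.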

You open the log file and you find there’s not enough information there. You make the application log much, much more. So much that you’ll fill a terabyte in an hour. You run the test again and this time it just passes, no errors.

When you are at the HTTP level there are many, many things that are flaky and can go wrong. Let’s invent one example here: the web server you were using for the tests wants to DNS resolve everything it can. Every host name is resolved to an IP, and every IP is reverse-resolved to a name. When you ran the test there was a glitch and your name servers were down. Now they are working correctly and they won’t fail again for another year. Good luck figuring that out from a time-out message.

The other way in which HTTP tests fail is something like this:

blah_test:74: Index out of bound for this array

You go to line 74 and it’s something like:

assert_equal("username", data[0]);

If data[0] caused an out-of-bound error, then the array data is empty. How can it be empty? It contains the response from the server and you know the server is responding with something usable because you are using the app right now.

What happened was that the log in box used to have the HTML id "login" and it is now "log-in". That means the HTML parsing methods in blah_test don’t find the log in box and fail to properly fill the array data. Yet another case of tests exposing bugs, in the tests. And the real-life failures are much, much more complex than this.
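The renamed id can be reproduced with a few lines of Python and its built-in html.parser module. The LoginBoxFields class, the field_names helper and the two page snippets are all invented for this illustration; a real test suite would have its own parsing helpers.

```python
from html.parser import HTMLParser


class LoginBoxFields(HTMLParser):
    """Collects the names of <input> fields inside the form with a given id."""

    def __init__(self, box_id):
        super().__init__()
        self.box_id = box_id
        self.inside = False  # True while we are inside the matching form
        self.data = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == self.box_id:
            self.inside = True
        elif tag == "input" and self.inside:
            self.data.append(attrs.get("name"))

    def handle_endtag(self, tag):
        if tag == "form":
            self.inside = False


def field_names(html, box_id="login"):
    """What the test helper does: look for the box by its old id."""
    parser = LoginBoxFields(box_id)
    parser.feed(html)
    return parser.data


# Hypothetical pages: the only difference is the id of the form.
OLD_PAGE = '<form id="login"><input name="username"><input name="password"></form>'
NEW_PAGE = '<form id="log-in"><input name="username"><input name="password"></form>'
```

With OLD_PAGE, field_names(OLD_PAGE)[0] is "username". With NEW_PAGE the list comes back empty, so indexing it raises an IndexError in the test, even though the page works fine for a human.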

My recommendation

All this makes the return on investment of writing HTTP tests quite low. They are very hard to write and they provide very little information when they fail. They do provide good information when they pass: if it works at the HTTP level, probably everything else works too.

I’d recommend that no project write any HTTP tests until every other possible test, unit and integration, is already written.

Redirecting on load

Of all the bad practices I see on the web this ranks as very bad and I believe it’s not mentioned enough. It’ll easily make it to my personal top 5.

I go to a web site, like example.com, and I immediately get redirected to an ugly URL beast, like example.com/news/today?date=2009-06-30&GUID=5584839592719193765662. Wha? Why? First, the site broke any chance I had of making a bookmark of it with just one click. I don’t want to bookmark yesterday’s news (look at the URL, it has a date), and what’s that GUID? Oh well, I go and make the bookmark, pointing to example.com, by hand, because I have no other way.

Even if it only redirected me to example.com/news/today it’d be pretty bad. That URL may not work tomorrow due to changing software. Or what can be even worse: the software and the content get revamped, the URLs change and everything is cool again, and since the developers are smart people they leave the old URLs working. So my bookmark works, but shows obsolete information.

With my crazy browsing habits (open a trillion tabs, fast, fast, faster) I go to a page, leave it loading, and when I go back and see a weird URL I end up wondering whether I accidentally clicked on something or something weird happened. I have to go back and check.

It gets even worse when the URL is rather obscure. My e-banking site has this issue. I go to the bank home page where I can find the e-banking link. I click it and it opens the e-banking page, which sells you the service and in a small corner has a link to the real e-banking application where you can log in and see the big red numbers. I’d say they have a deeper problem than redirecting. They see the bank as a company with its useless propaganda home page and e-banking as a product with its useless propaganda home page and then, the actual e-banking site, somewhere else. They should just have the log in on their home page, like any other on-line service. But I digress.

Back to redirecting. I click log in and it opens, in another window, a web site with a URL that is measured in meters. Long, ugly and scary. I never even thought of bookmarking that because I’m sure it won’t work the second time. So my bookmark is to the previous page. Just today, after a year of using it, I discovered that there’s a nice, short, well-formed URL for the log in page, something like bank.com/ebanking/login, which immediately redirects to the ugly one. Thanks to the amazing speeds of Swiss internet connections and today’s browsers I never noticed.

If the bank had just been serving the content through that URL, they would have saved more time over a year than it took me to write this post. Literally. I can’t understand why they don’t do it properly. If they are passing session information, they should use session state on the server side and a cookie. If they have a modular structure where the app is located elsewhere, instead of redirecting you they should use a reverse proxy. It takes a day to configure Apache for such a thing if you don’t know what you are doing.

I’ve been using it for ages to serve Plone sites that live in a subdirectory of a Zope web server which runs on an alternate port, yet the front end is Apache and you are never redirected anywhere. You go to example.com, which hits my Apache server, which in turn makes a request to zope.example.com:8080/example.com and serves you the result; you never leave example.com. Even if you go to the secure version, the SSL part is handled by Apache since Zope is not that good (or wasn’t) at it.
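In Apache this kind of setup is typically done with mod_proxy's ProxyPass directive. The sketch below is not that configuration; it only imitates the idea in plain Python so the difference from a redirect is visible. Backend stands in for Zope and FrontEnd for Apache, and both classes, the helper names and the /news path are assumptions made up for the example.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class Backend(BaseHTTPRequestHandler):
    """Stands in for the application server (Zope in the text)."""

    def do_GET(self):
        body = ("backend content for " + self.path).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


class FrontEnd(BaseHTTPRequestHandler):
    """Stands in for the public web server (Apache in the text)."""

    backend_url = None  # set once the backend is listening

    def do_GET(self):
        # Fetch from the backend on the visitor's behalf instead of
        # answering with a 3xx redirect to it.
        target = self.backend_url + self.path
        with urllib.request.urlopen(target, timeout=5) as upstream:
            body = upstream.read()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass


def start(handler):
    server = HTTPServer(("127.0.0.1", 0), handler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_through_front_end(path):
    """Request a path from the front end; return (final URL, body)."""
    backend = start(Backend)
    FrontEnd.backend_url = "http://127.0.0.1:%d/example.com" % backend.server_port
    front = start(FrontEnd)
    try:
        url = "http://127.0.0.1:%d%s" % (front.server_port, path)
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.geturl(), response.read().decode()
    finally:
        front.shutdown()
        front.server_close()
        backend.shutdown()
        backend.server_close()
```

fetch_through_front_end("/news") returns the URL the visitor ends up on together with the page body. The URL is still the front end's /news, while the body was produced by the backend under its own /example.com/news path.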

There are legitimate cases for redirecting someone on a web site: when the content is no longer available or is temporarily unavailable, or when the user has just submitted a form and it was processed successfully, in which case you redirect to another page that shows the result (the record created or whatever). There are many reasons to do that, but that’s for another post.

There’s no reason to redirect on load. Please, don’t do it.

Reviewed by Daniel Magliola. Thank you! Use Other Door picture by cobalt123.