
How I found one of the earliest browsers in history

Yesterday, the web celebrated its 25th birthday and, to join in, I want to tell a little story. A couple of years ago I found a NeXTcube. I’m not going to say where it is to avoid vandalism (the computer is publicly accessible under some circumstances without much oversight), but this is the story. Sir Tim Berners-Lee coded the earliest version of the web on his NeXTcube workstation while he was working at CERN, so I was always interested in these machines, from a historical/playful point of view.

The cube in front of me was more or less abandoned and I asked the owner if I could play with it. He was very reticent, but I was more relentless, and I got to play with it. He told me that the NeXT computer had belonged, at one point, to CERN and that it had not been used since then. I decided to explore it.

The first interesting thing I found was a file containing a lot of email addresses from people who seemed to work at CERN or be related to CERN in some form or fashion. The owner of the computer decided to be overly professional and deleted the file.

The second interesting thing I found completely blew my mind. There was a folder called WorldWideWeb and inside it several files called WorldWideWeb_0.1.0.tar, 0.1.1.tar, 0.2.0.tar and so on. Could this be? I opened them one by one and indeed they were apps. I started with the oldest and executed each in turn.

The first one raised an error as it tried to contact cernvax.cern.ch (this NeXT cube was disconnected) and then it crashed:

WorldWideWeb_0.1.0

I kept on going and eventually one started. It was very plain but I knew what it was. I quickly went back to my terminal, opened vi, and wrote a small HTML file, which I then passed as a parameter to the little WorldWideWeb_0.2. It worked… it displayed an h1 as a title!

I was jumping out of my skin. I don’t want to publish the whole picture to avoid releasing private information, but in it I’m standing next to the cube, pointing at what could possibly be the earliest version of the web browser that still works today, displaying a web site I had just coded (it says Hello World):

WorldWideWeb_0.2

Then I discovered the browser allowed me to edit the page, directly there, without having to do anything special, and I remembered that Sir Tim Berners-Lee originally designed the web to be read-write, not read-only.

That was one of the most exciting moments of my life. When I got home I wrote an email to Sir Tim Berners-Lee, telling him of my finding and where he could find that computer, just in case he wanted to get ahold of those binaries (I couldn’t find any source code anywhere on that machine). He never replied; I don’t know if he ever got my email. I bet he gets a lot of email and is a very busy man.

Update: explained a bit why I don’t want to reveal where this happened.


Finally happy with the creation of a web site

In the past, I never managed to build a web site and feel happy with the process. Every time I finished building a web site I would have a list of things to never do again. Until now! So, I thought I’d share.

First, by web site I mean a content site, like watuapp.com or screensaver.ninja; I don’t mean a web app. I’m happy with how I build web apps, although I’m constantly improving, learning and trying new things. When it comes to content, you have to balance some opposing forces:

  • It should look amazing.
  • It should be flexible.

It should look amazing because it’s going to be compared to sites produced by expert teams of designers at tech companies, and if your web site is not indistinguishable from those, it will be perceived as unprofessional and, as a result, so will you and your product.

I’ve had web sites that looked very good. A graphic designer was hired to produce a pretty illustration of the web site and then a coder turned that picture into HTML and CSS. New pages were created by hand-coding them in HTML. The process of setting up the web site initially was OK, but after that, the workflow was horrendous.

Changes to the web site would come from non-coders, like the CEO, people in marketing or sales, or copywriters, and they would be given to a developer to execute. Then we would have to prioritize between improving our web site and improving our product. Almost always, product wins… only when the web site got to the point of being embarrassingly out of date or broken would we consider doing something about it. This situation is annoying and frustrating for both developers and content creators.

The way to solve this is with a Content Management System, which is where things get flexible. With a CMS, suddenly anyone with a browser and the right access can edit the site, add new pages, correct typos, add a FAQ, change a title, write a new blog post, etc. It’s as easy as Microsoft Word, and the output is as generic, boring and bland as that of your average Word document. That might be OK for text documents, but on the web it screams unprofessional.

The problem is a tricky one. You might think there’s a clean separation between design and content, but that isn’t true. A content writer might decide to have only one column of text instead of two because there isn’t enough copy, and the difference between one and two columns is a big one when it comes to design. The content might call for a picture or, even worse, a drawing. The design establishes the palette and style of that drawing.


A screenshot of Screensaver Ninja’s web site at the time of this writing.

I just finished rebuilding the web site for Screensaver Ninja and for the first time I’m happy with the result: not only how it looks, but also the amount of work and money required, as well as the flexibility and workflow going forward.

The CMS we are using is WordPress and we host it at wpengine, which I really recommend. It’s not the cheapest, but if you care about your web site and you can afford it, you should go there.

One potential approach to having a beautiful site would be to go to 99designs and run a contest for a WordPress theme. My hesitation is around the flexibility of the result. Will the new design be completely hard-coded, or will I be able to change the copy? What about changing more involved aspects, like the number of columns or images? I’m not sure, and asking around did not yield any useful answers. If you have taken this approach, would you mind sharing with me how it worked?

The approach we took was to use a very flexible and advanced WordPress theme called X. We chose one of their many templates for a page that we felt would match our current branding and the message we wanted to communicate. We proceeded to fill it up with our own copy, following these tenets:

  • Change as little as possible.
  • Ignore all images, just leave them blank.

Once the copy was done, we hired a designer through a freelancing marketplace and asked her to produce images to fill in the blanks. We showed her our web site with the blank images as well as the original template with sample images and asked her to keep the style and palette. We provided some ideas for the images and she came up with some as well. After a couple of iterations we had all the images we needed.

And that’s it. That’s how we produced that site. Yes, it’s possible that there are other sites out there that look exactly the same, but it’s not a big issue. It’s like worrying that somewhere out there someone has the same t-shirt as you. The chances of you two being seen by the same person in a short period of time are small, and even if that happens, whether the t-shirt fits you or not is more important. Nobody will care how original your clothes are if they look horrible, and the same goes for your web site.

Next time I have to build a web site for a product, I’ll do this exercise again, and I recommend it to all entrepreneurs who are working in a small company and need to be efficient.

The iPad's lack of Flash is a win-win situation

It seems the iPad’s lack of Flash support is the debate of the moment. In one camp: “You can’t use the web without Flash”; in the other: “We don’t need no stinking Flash”. Although I do realize that a technology like Flash is sometimes needed, I’m more in the second camp. The less Flash, the better.

I think the iPad’s lack of Flash causes two things to happen:

  • It slows down the adoption of the iPad: surely someone will say “No Flash, no iPad”.
  • It speeds up the adoption of HTML5: surely someone will consider using HTML5 to support the tablet.

Given that the iPad is a closed device, probably the most closed non-phone computer consumers have ever had access to, and that HTML5 is good progress for the web, I consider both results of the iPad not having Flash positive. If I had to say anything about it, it’d be: please, stop trying to wake Steve Jobs up about this; you’ll ruin it.

The sad truth about testing web applications

There are many ways to test a web application. At the lowest level we have unit tests; at the highest level we have HTTP tests, those that use the HTTP protocol to talk to a running instance of your application (maybe running it on demand, maybe expecting it to be running on a testing server).

There are several ways to write HTTP tests, in two big families: with and without a web browser. Selenium is a popular way to write tests with a browser. A competing product is Web Driver, which I understand can use a browser or other methods. If you’ve never seen Selenium before, it’s pretty impressive. You write a test that says something like:

  1. go to http://…
  2. click here
  3. click there
  4. fill field
  5. fill field
  6. submit form
  7. assert response

and when you run it you actually see a Firefox window pop up and perform that sequence amazingly fast. Well, it’s amazingly fast the first three runs, while you still have two tests or fewer. After that it’s amazingly slow, tedious, flaky and intrusive.
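Here is roughly what such a test might look like with Selenium’s Python bindings; this is only a sketch, and the URL, element ids and asserted text are made-up assumptions rather than a real application:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()  # a real Firefox window pops up
try:
    driver.get("http://localhost:8000/login")                   # go to http://…
    driver.find_element(By.ID, "username").send_keys("john")    # fill field
    driver.find_element(By.ID, "password").send_keys("secret")  # fill field
    driver.find_element(By.ID, "submit").click()                # submit form
    assert "Welcome, john" in driver.page_source                # assert response
finally:
    driver.quit()

Each line maps to one of the steps above.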

For the other family of tests, without a web browser, aside from Web Driver we have HttpUnit, HtmlUnit and most of the Ruby on Rails testing frameworks. The headless solutions tend to be faster and more solid, but the scenarios are not as realistic (only one JavaScript engine, if you are lucky; no rendering issues, like slowdowns; etc.).

When you are testing, as soon as you touch the HTTP protocol everything becomes much harder and less useful. If you want to be totally confident that a web application is working you need to test at the HTTP level, but the return on investment for those tests is very low: they are hard to write and not very useful.

Hard to write

They are hard to write because you are not calling methods with well-defined interfaces (a list of arguments) but essentially calling one method, the HTTP request, passing different parameters to get different results. You don’t have any code completion, and you don’t have any formal way to know which arguments to pass. Anything can be valid.

In a unit test you may have something like:

add_user("john");

while in an HTTP test you’ll have something like

http.send_request("/user/create", "username=john");

When you are writing a unit test, figuring out the name of the add_user function and its arguments is easy. Some IDEs will autocomplete the name and show you the argument list. And if the name of add_user changes, some refactoring tools will even fix your tests for you.
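Here’s a minimal sketch of the two styles side by side, in Python with unittest and the requests library; the add_user function, the route and the parameter name are illustrative assumptions, not any real application’s API:

import unittest
import requests  # any HTTP client would do

class AddUserUnitTest(unittest.TestCase):
    def test_add_user_directly(self):
        # The function name and arguments are checked by the language and
        # autocompleted by the IDE; a rename shows up as a clear error here.
        user = add_user("john")  # hypothetical application function
        self.assertEqual("john", user.name)

class AddUserHttpTest(unittest.TestCase):
    def test_add_user_over_http(self):
        # Everything is strings; the route and the parameter name are only
        # validated by hitting a running server and seeing what comes back.
        response = requests.post("http://localhost:8000/user/create",
                                 data={"username": "john"})
        self.assertEqual(200, response.status_code)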

But “/user/create” and “username=john” are strings. To figure them out you’ll have to know how your application handles routing and how the parameters are passed and parsed. If your application changes from “/user/create” to “/user/add”, the test will just break, most likely with a not-very-useful error message. Which takes us to the next issue…

They are not very useful

They are not very useful because their failures are cryptic. When you write a test that calls method blah, which calls method bleh, which calls method blih, which calls bloh, which calls bluh, and bluh divides by zero, you get an exception and a stack trace. Something like:

bluh:123: Division by zero! I can't divide by zero (I'm not Haskell)
bloh:234: bluh(...)
blih:452: bloh(...)
bleh:34: blih(...)
blah:94: bleh(...)
blah_test:754: blah(...)

You know that the test blah_test failed on line 754 when calling blah, which called bleh on line 94, which called blih on line 34, which called bloh on line 452, which called bluh on line 234, which divided by zero on line 123. You jump to bluh, line 123, and you may find something like:

a = i / 0;

where you replace the zero with something else; or most likely:

a = i / j;

where you have to track where j came from. Either it was calculated there or generated from another method and passed as an argument. The stack-trace gives you all the information you need to find where j was generated or where it came from. That’s a very useful test.

When you have HTTP in the middle, tests become much less useful. The stack trace of a failure would look something like:

http_request:123: Time out, server didn't respond.
blah_test:45: http_request(...)

That means that blah_test failed on line 45 while making an HTTP request that timed out. Did your application divide by zero and crash? Did it try to calculate pi and is it still at it? Did it fail to connect to the database? Where did it actually fail? You don’t know. The only thing you know is that something went wrong. Time to open the log files and figure it out.

You open the log file and you find there’s not enough information there. You make the application log much, much more. So much that you’ll fill a terabyte in an hour. You run the test again and this time it just passes, no errors.

When you are at the HTTP level there are many, many things that are flaky and can go wrong. Let’s invent one example here: the web server you were using for the tests wants to DNS-resolve everything it can. Every host name is resolved to an IP, and every IP is reverse-resolved to a name. When you ran the test there was a glitch and your name servers were down. Now they are working correctly and they won’t fail again for another year. Good luck figuring that out from a time-out message.

The other way in which HTTP tests fail is something like this:

blah_test:74: Index out of bound for this array

You go to line 74 and it’s something like:

assert_equal("username", data[0]);

If data[0] caused an out-of-bound error, then the array data is empty. How can it be empty? It contains the response from the server and you know the server is responding with something usable because you are using the app right now.

What happened is that the log-in box used to have the HTML id "login" and now it is "log-in". That means the HTML-parsing methods in blah_test don’t find the log-in box and fail to properly fill the array data. Yet another case of tests exposing bugs, in the tests. And real-life failures are much, much more complex than this.
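As a sketch of how that kind of breakage plays out (the URL, the id and the parsing code are made up for illustration):

import re
import requests  # assumed HTTP client

def find_login_box(html):
    # The test still looks for id="login"; once the markup changes to
    # id="log-in" this returns an empty list.
    return re.findall(r'<[^>]*id="login"[^>]*>', html)

response = requests.get("http://localhost:8000/")  # hypothetical app under test
data = find_login_box(response.text)
assert data[0]  # IndexError: the failure blames an out-of-bound array, not the new id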

My recommendation

All this makes the return on investment of writing HTTP tests quite low. They are very hard to write and they provide very little information when they fail. They do provide good information when they pass: if it works at the HTTP level, probably everything else works too.

I’d recommend that any project not write any HTTP tests until every other possible test, unit and integration, is already written.

Redirecting on load

Of all the bad practices I see on the web, this one ranks as very bad and I believe it’s not mentioned enough. It’ll easily make it into my personal top 5.

I go to a web site, like example.com, and I immediately get redirected to an ugly beast of a URL, like example.com/news/today?date=2009-06-30&GUID=5584839592719193765662. Wha? Why? First, the site broke any chance I had of bookmarking it with just one click. I don’t want to bookmark yesterday’s news (look at the URL, it has a date), and what’s that GUID? Oh well, I go and make the bookmark, pointing to example.com, by hand, because I have no other way.

Even if it only redirected me to example.com/news/today it’d be pretty bad. That URL may not work tomorrow due to changing software. Or, what can be even worse: the software and the content get revamped, the URLs change and everything is cool again, and since the developers are smart people they leave the old URLs working. So my bookmark works, but shows obsolete information.

With my crazy browsing habits (open a trillion tabs, fast, fast, faster) I go to a page, leave it loading, and when I go back and see a weird URL I end up wondering whether I accidentally clicked on something or something weird happened. I have to go back and check.

It gets even worse when the URL is rather obscure. My e-banking site has this issue. I go to the bank’s home page, where I can find the e-banking link. I click it and it opens the e-banking page, which sells you the service and, in a small corner, has a link to the real e-banking application where you can log in and see the big red numbers. I’d say they have a deeper problem than redirecting: they see the bank as a company with its useless propaganda home page, e-banking as a product with its useless propaganda home page, and then the actual e-banking site somewhere else. They should just have the log-in on their home page, like any other online service. But I digress.

Back to redirecting. I click log in and it opens, in another window, a web site with a URL that is measured in meters. Long, ugly and scary. I never even thought of bookmarking that because I’m sure it wouldn’t work the second time, so my bookmark points to the previous page. Just today, after a year of using it, I discovered that there’s a nice, short, well-formed URL for the log-in page, something like bank.com/ebanking/login, which immediately redirects to the ugly one. Thanks to the amazing speed of Swiss internet connections and today’s browsers, I never noticed.

If the bank had just been serving the content through that URL, they would have saved more time over a year than it took me to write this post. Literally. I can’t understand why they don’t do it properly. If they are passing session information, they should use session state on the server side and a cookie. If they have a modular structure where the app is located elsewhere, instead of redirecting you they should use a reverse proxy. It takes a day to configure Apache for such a thing if you don’t know what you are doing.

I’ve been using it for ages to serve Plone sites that live in a subdirectory of a Zope web server running on an alternate port, yet the front end is Apache and you are never redirected anywhere. You go to example.com, which hits my Apache server, which internally makes a request to zope.example.com:8080/example.com and serves you the result; you never leave example.com. Even if you go to the secure version, the SSL part is handled by Apache, since Zope is not that good (or wasn’t) at it.

There are cases where redirecting someone on a web site makes sense: when the content is no longer available or is temporarily unavailable, or when the user has just submitted a form and it was successfully processed, in which case you redirect to another page that shows the result of the form (the record created or whatever). There are more reasons, but that’s for another post.
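As an illustration of that acceptable case, here’s a minimal redirect-after-POST sketch using Flask; the framework choice, the routes and the save_comment helper are assumptions for illustration only:

from flask import Flask, redirect, request, url_for

app = Flask(__name__)

@app.route("/")
def home():
    # The landing page is served directly at its own URL: no redirect on load.
    return "<h1>Example</h1>"

@app.route("/comments", methods=["POST"])
def create_comment():
    comment_id = save_comment(request.form["text"])  # hypothetical persistence call
    # Redirect after a successful POST so a refresh doesn't resubmit the form.
    return redirect(url_for("show_comment", comment_id=comment_id))

@app.route("/comments/<int:comment_id>")
def show_comment(comment_id):
    return f"Comment {comment_id}"

The only redirect here happens after the form is processed; loading any page serves it at its own URL.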

There’s no reason to redirect on load. Please, don’t do it.

Reviewed by Daniel Magliola. Thank you! “Use Other Door” picture by cobalt123.

Proper linking etiquette

This has been mentioned thousands of times on the interwebs, but in case there’s at least one person reading this who didn’t know it, I’m explaining it again. Using hyperlinks in a piece of text doesn’t mean it has to stop being proper, readable English (or any other language). For example, imagine the phrase:

It was a nice movie, click here to read more about it.

Read it again. Now close your eyes and imagine someone reading it out loud. It doesn’t make any sense, does it?

Hyperlinks already carry the meaning that there’s more information behind them; no need to repeat it with “to read more about it”. They also carry the information about being clickable, so there’s no need to say “click here”. Besides, in some interfaces you don’t click at all, and I can think of at least two cases:

  • People using the keyboard and only the keyboard to navigate. There are more of them than you think. I myself would do it much more if it weren’t so hard on so many broken web sites.
  • People using a phone, like the iPhone. You don’t click, nothing clicks. It’s called tapping.

For computers, “click here” doesn’t provide any useful metadata. There are services that extract a lot of information from links, Google being one example. Let’s analyze what Google would do if you write it correctly, like:

It was a nice movie.

That was short, wasn’t it? Half the size and no nonsense, but I digress. Google would index that link as a “nice movie”, and that’s good because you are adding information to the web: you are expressing your opinion, and when people search for “nice movie” they are more likely to find the movie you pointed to. Maybe you are the only one who believes it’s a nice movie, but when lots of people link to it as a “nice movie”, Google will catch that.

Also, imagine that your page gets turned into plain text, or printed, or spoken, or whatever:

  • It was a nice movie, click here to read more about it.
  • It was a nice movie.

Which one makes more sense?

Now, we can take it a step further. Something else you can do to make your text more readable, more robust and nicer overall is more or less proper attribution. I’m not talking about academically proper attribution, I’m talking about simple things. I recently found this sentence in the Stack Overflow article Advice for Computer Science College Students:

I’ve read an article from Joelonsoftware.com a few years ago http://www.joelonsoftware.com/articles/CollegeAdvice.html

which I promptly edited, thanks to my karma earnings, to be:

I’ve read the article Advice for Computer Science College Students from Joel on Software a few years ago.

Aside from the proper period at the end of the sentence, do you see how and why my version is more readable, contains much more information (while putting less text on the screen) and can resist being turned into text, speech or braille? So, next time you write something, please remember that even if you are using a computer, you are still writing in a proper language.

Sometimes the links are so important that you want them to make it into a text or spoken version. In that case, imagine how you would write it if you were speaking or writing with a pen on paper:

I really like Joel on Software, which you can read on http://joelonsoftware.com.

which you can then later enhance for the web:

I really like Joel on Software, which you can read on http://joelonsoftware.com.

Now there’s extra information in there. The URL is there three times: once in text, twice in hyperlinks. But the text is not longer and it’s not harder to read (unless you pick your hyperlink colors badly), and it gives the user more places to click and gives machines that look for context more information to pick up. It’s a win-win.

Reviewed by Daniel Magliola. Thank you!

Pylons or Django?

I am trying to decide whether to use Pylons or Django. Both are frameworks for building Python web applications, but with opposing philosophies.

Django tries to be everything. It comes with its own ORM, its own template engine, its own everything. That gives you a nice development experience because everything fits together and because very nice applications can be built on top of all those components, like the admin tool, which is amazing.