Erlang, the language for network programming Issue 2: binary pattern matching

Much has been said about Erlang’s excellent capabilities for writing distributed, fault-tolerant programs, but little has been said about how easy and fun it is to write servers (those programs at the other end of the line) with it. And by easy I don’t just mean that you can put up a web server in two lines of code and hope it’ll work; I mean it’ll be easy to build robust servers.

One example of this is ejabberd, a free Jabber server.

I’ll start this second part, the one with real network programming, with a bet. Think about the IPv4 protocol. Its header looks like this:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live |    Protocol   |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Source Address                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Destination Address                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

You can check RFC 791, page 11, for more information. At a glance, the first 4 bits are the version, the next 4 bits are the IHL (Internet Header Length), then we have a whole byte, 8 bits, of Type of Service. The next two bytes are the total length, and I am already tired of it. You get the picture, right?

Pick whatever language you want (except Erlang, that’s mine now, but it can be yours later) and think about how many lines of code it would take you to parse that beast, the IP header. Think about how much time it would take you to write those lines and test them.

Done? Come on! Really think about it, otherwise the game is boring. Close your eyes, picture the lines of code. If you can’t, go and write some pseudo-code similar to your favorite language to do the parsing. Done? OK.

Here’s my bet: I bet that I can do it, in Erlang, in far fewer lines than you! I bet that I can code it so fast that I’d have finished writing the code to parse the whole header before you finish the code to parse the first line. And while you are testing, I’ll go to the beach, because I’ll just trust my code to run without problems.

Do you think I am crazy? I’ll confirm it with another bet: I bet that after reading this article you’ll be able to do the same super-programming that I claimed to be capable of in the previous paragraph. Keep reading!

One of the Erlang features that really helps us write servers is binary pattern matching. To understand it, you first need to understand pattern matching; for that you can read the previous issue. Erlang provides a way to write binary data directly in the source code:

<<"hello">>

That is a binary containing the string “hello” as ASCII. In hexadecimal notation it’d be: 68 65 6c 6c 6f; in decimal: 104 101 108 108 111; and in binary: 01101000 01100101 01101100 01101100 01101111. Another one:

<<1, 2, 3>>

This one contains three bytes, the first one being 1, the second being 2 and the third being 3; in hexadecimal: 01 02 03. So far, nothing impressive. Let’s get there:

<<1, 2, 3:16>>

It contains four bytes: the first and second are 1 and 2 respectively, and the third and fourth together form a 3, so in hexadecimal: 01 02 00 03. The 16 after the colon specifies how many bits the preceding value will use (the default for integers is 8).
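If you want to convince yourself of the size rule, you can compare binaries directly in the Erlang shell. This is just a small sketch using the built-in `=:=` operator and `byte_size/1`; each line is an assertion that fails loudly with a badmatch if I got it wrong.

```erlang
%% Paste these into the Erlang shell (erl), one expression at a time.
%% A value without a size takes 8 bits; 3:16 takes 16 bits, most
%% significant byte first.
true = (<<1, 2, 3:16>> =:= <<1, 2, 0, 3>>).
4 = byte_size(<<1, 2, 3:16>>).
%% A value that does not fit in its field keeps only the low bits.
true = (<<256:8>> =:= <<0>>).
true = (<<257:16>> =:= <<1, 1>>).
```

Note the third assertion: Erlang does not complain when an integer is too big for its field, it silently truncates it, so picking the right size is up to you.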

What did you say? Integers? Yes. Is that a type? Yes. You can also specify types, for instance:

<<1/integer, 2.34/float>>

will generate, in hexadecimal, 01 40 02 b8 51 eb 85 1e b8 (the default size for a float is 64 bits). We can also define endianness, signedness and unit. Enough of that.
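For the curious, here is a quick sketch of those three specifiers, again as shell assertions. The variable names Unsigned and Signed are only for illustration.

```erlang
%% Endianness: the same 16-bit integer laid out both ways.
true = (<<1:16/big>> =:= <<0, 1>>).
true = (<<1:16/little>> =:= <<1, 0>>).
%% Signedness matters when matching: the byte 255 read as signed is -1.
<<Unsigned/unsigned, Signed/signed>> = <<255, 255>>.
255 = Unsigned.
-1 = Signed.
%% Unit multiplies the size: 2 units of 8 bits make a 16-bit integer.
true = (<<3:2/integer-unit:8>> =:= <<0, 3>>).
```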

Binaries, like any other structure such as lists or tuples, can participate in pattern matching. Let’s do some pattern matching on binaries:

Packet = <<"Erlang is a general-purpose programming language.">>,
<<A, B:16, C:32, D:64, E/binary>> = Packet.

When developing a program, Packet would normally come from a file or a network connection; here I just defined it so you can see its contents. The second line breaks Packet into several pieces and binds them to identifiers (A, B, C, D and E). Let’s see the results:

> A.
69
> B.
29292
> C.
1634625312
> D.
7598452597831722350
> E.
<<"eral-purpose programming language.">>

Isn’t it cool? Think about how many lines of code it would have taken you to do the same in any other programming language (if you find anything that can beat Erlang at doing that in one line of code, I am interested in taking a look at it).

A typical problem when learning to code is: how do I check if the third bit is 1 or 0 on that byte? In Erlang:

check_third(<<_:2, 1:1, _:5>>)
    -> "It's a one";
check_third(_)
    -> "It's a zero".

If the argument to check_third is a one-byte binary with a 1 in its third bit (counting from the most significant bit), we’ll get “It’s a one”; otherwise, “It’s a zero”. If you need to do this inside a function (without calling another function), you can use case, another Erlang construct that I am not going to describe here.
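For readers who want a peek at case anyway, here is a minimal sketch of the same check. The module name third_bit and the function name describe are made up for this example.

```erlang
-module(third_bit).
-export([describe/1]).

%% Same check as check_third/1, written with a case expression.
%% Byte is expected to be a one-byte binary such as <<32>>.
describe(Byte) ->
    case Byte of
        <<_:2, 1:1, _:5>> -> "It's a one";
        _ -> "It's a zero"
    end.
```

After compiling it, third_bit:describe(<<32>>) gives "It's a one", because 32 is 00100000 in binary, while third_bit:describe(<<4>>) (00000100) gives "It's a zero".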

You can use the identifiers generated in a pattern in the pattern itself:

<<Size:8, String:Size/binary-unit:8>>

reads as:

The first 8 bits, a byte, will be taken as an integer and named Size. String will be measured in binary units of 8 bits (bytes), and it will contain Size of those units.

For example:

> <<Size:8, String:Size/binary-unit:8>> = <<5, "hello">>.
<<5,104,101,108,108,111>>
> Size.
5
> String.
<<"hello">>
> binary_to_list(String).
"hello"

And if the string is shorter or longer, we’ll have a pattern mismatch, a clean exception. The last call turns the binary into a list (strings are lists of characters in Erlang).
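If you would rather get a return value than an exception, you can put the same pattern in a function head and add a catch-all clause. This is only a sketch; the module name lenstr, the function decode and the {ok, ...} / error return convention are my own choices for the example.

```erlang
-module(lenstr).
-export([decode/1]).

%% Decode a binary made of one length byte followed by exactly
%% that many bytes. Returns {ok, List} on success or the atom error.
decode(<<Size:8, String:Size/binary-unit:8>>) ->
    {ok, binary_to_list(String)};
decode(_) ->
    error.
```

With this, lenstr:decode(<<5, "hello">>) returns {ok, "hello"}, while a payload that is too short or too long, such as <<5, "hell">> or <<5, "hello!">>, falls through to the second clause and returns error.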

By now, you may start to imagine why I’ve made such a bet at the beginning of this document. Here you can see my line that will match an IPv4 header:

<<Version:4, IHL:4, TypeOfService:8, TotalLength:16, Identification:16, FlagX:1, FlagD:1, FlagM:1, FragmentOffset:13, TTL:8, Protocol:8, HeaderCheckSum:16, SourceAddress:32, DestinationAddress:32, Rest/binary>> = Packet.

Impressive, isn’t it? If you want to see more, I can recommend my unfinished DNS parser in the Serlvers project.
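One caveat about that line: IHL counts the header length in 32-bit words, so when it is greater than 5 the header carries options, and they end up at the start of Rest. Here is a rough sketch of how one might split them off, not a complete parser. The module name ipv4_sketch, the function parse and the shape of the returned map are my own inventions for this example, and maps need Erlang/OTP 17 or later.

```erlang
-module(ipv4_sketch).
-export([parse/1]).

%% Split an IPv4 packet into a few header fields, options and payload.
%% The fixed header is 5 words, so options take (IHL - 5) * 4 bytes.
parse(<<4:4, IHL:4, _TypeOfService:8, TotalLength:16,
        _Identification:16, _Flags:3, _FragmentOffset:13,
        TTL:8, Protocol:8, _HeaderCheckSum:16,
        S1, S2, S3, S4, D1, D2, D3, D4,
        Rest/binary>>) when IHL >= 5 ->
    OptionsSize = (IHL - 5) * 4,
    case Rest of
        <<Options:OptionsSize/binary, Payload/binary>> ->
            {ok, #{total_length => TotalLength,
                   ttl => TTL,
                   protocol => Protocol,
                   source => {S1, S2, S3, S4},
                   destination => {D1, D2, D3, D4},
                   options => Options,
                   payload => Payload}};
        _ ->
            %% The packet ends before the announced options do.
            {error, truncated_options}
    end;
parse(_) ->
    %% Not version 4, IHL below 5, or fewer than 20 bytes.
    {error, not_ipv4}.
```

The first field is matched against the literal 4, so anything that is not IPv4 never reaches the body. The checksum is ignored here; verifying it would be the next step in a real parser.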

One thought on “Erlang, the language for network programming Issue 2: binary pattern matching”

  1. Excellent! I worked on half a dozen other languages after zeroing on Erlang for a simple ip decoding program. A swiss army knife for network programming.
