Erlang strings and string formatting troubles

I’m currently diving head-first into Erlang and all its awesomeness. If you’ve never really jumped in, Erlang may seem kinda… weird. And if you take a closer look, it may seem even weirder. Well, I feel like I’m now slowly getting past the weird phase and starting to enjoy it quite a lot.

But there are still some things that surprise me. For example, how strings are built. Any tutorial will tell you that strings are basically just lists of integers, which get interpreted as character codes (ASCII, Latin-1 or Unicode) when printed. That leads to some more weirdness when you print a list of integers that just happen to all be printable characters.

For those who have never even seen a bit of Erlang, here’s an example:

1> [104,101,108,108,111].
"hello"

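To make the equivalence concrete, here is a small sketch. The module and function names are made up for illustration, and the last function is only my understanding of roughly what the shell checks before it prints a list as text.

```erlang
-module(string_demo).
-export([same_term/0, char_code/0, looks_like_text/1]).

%% A string literal and a list of character codes are the same term.
same_term() ->
    "hello" =:= [104,101,108,108,111].

%% The $ syntax gives the integer code of a character.
char_code() ->
    $h.

%% True if List is a flat list of printable characters. This is
%% (as far as I understand) the kind of check that decides whether
%% a list is shown as "text" or as [1,2,3].
looks_like_text(List) ->
    io_lib:printable_list(List).
```

So [104,105] is shown as "hi", while [1,2,3] stays a list of numbers.
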
Oookay, weird, but if you know it, no big deal. But today, when trying to hack up my own small 30-line test framework, I stumbled across something very strange indeed. A part of said framework should format function names into a constant-width string, so I can then display functions and their results in a tabular manner. But it should also be able to break the “tabular” rule if the function’s name is too long. So my first approach was this:

%% Print an identifier like "module:func()",
%% padded to a minimum of 40 chars.
show_name(Mod, Name) ->
    Identifier = io_lib:format("~s:~s()", [Mod, Name]),
    Len = max(40, string:len(Identifier)),

    % That weird syntax is similar to a C format string, but
    % in Erlang's own weird way. The asterisk means that the
    % field width is read from the argument list instead of
    % being written into the format string directly.
    io:format("~-*s", [Len, Identifier]).

Here’s what some of its output looked like:

test_ninetynine:test_dedup()
test_ninetynine:test_dup_to_sublist()
test_ninetynine:test_run_length_encoding
test_ninetynine:test_modified_run_length

Whenever I tested the function, it chopped off the ends of the longer identifiers! What’s going on???

Manually calling string:len() on the formatted identifier as copied from the terminal always gave the right length, and passing that to max() worked as expected. However, passing the freshly generated Identifier straight to string:len() returned only 5. WTF? So our freshly generated string is always shorter than 40 chars? Impossible! Can’t be!

Clearly, io_lib:format() returns something really weird. It prints correctly through io:format(), but when we check its length, it’s way too short.

So it seems to return something other than a flat list of integers. It must! That kinda makes sense: string formatting can be a performance bottleneck, and not flattening the result presumably saves some work. So let’s investigate! This is the moment when you should break the “write module, compile, test” cycle and do some interactive testing.

1> Identifier = io_lib:format("~s:~s()", [some_module, some_function]).
["some_module",58,"some_function",40,41]
2> string:len(Identifier).
5
3> io:format("~s~n", [Identifier]).
some_module:some_function()

Aha! Already at the first step we can see that io_lib:format() doesn’t return a flat string. Instead, it returns a nested list that contains some substrings, then single characters, then more strings, and more characters.

Well, mystery solved! But just to confirm, let’s continue: string:len() is quite naive and just returns the number of top-level elements of the given list, which indeed is 5. So Len always came out as max(40, 5) = 40, and every identifier longer than 40 chars got cut down to that width. Damn! I’d be glad if somebody could explain this to me. Why is this function not returning the actual number of characters in the string? I’ll clearly need to learn a lot more!
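
For what it’s worth, here is my current understanding, sketched as a small module (the names are hypothetical). io_lib:format() is documented to return a possibly deep list of characters, while string:len() on a list behaves like length/1 and so only sees the top level. Flattening first gives the real character count. iolist_size/1 gives a byte count without flattening, which matches the character count only under the assumption that the text is plain ASCII or Latin-1.

```erlang
-module(deep_len).
-export([top_level/1, flat/1, bytes/1]).

%% Counts only the top-level elements, which is what happened
%% with string:len/1 in the shell session above.
top_level(Chars) ->
    length(Chars).

%% Flattens the nested list first, then counts characters.
flat(Chars) ->
    length(lists:flatten(Chars)).

%% Counts bytes without building a flat list. Equals the number
%% of characters only for ASCII/Latin-1 text (assumption).
bytes(Chars) ->
    iolist_size(Chars).
```

With the nested list from the shell session, top_level/1 gives 5 while flat/1 and bytes/1 both give 27. Newer OTP releases also have string:length/1, which as far as I know accepts nested character data directly, but I have not relied on it here.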

Anyway, with the new knowledge, I think I can now at least “fix” my broken show_name function:

%% Print an identifier like "module:func()",
%% padded to a minimum of 40 chars.
show_name(Mod, Name) ->
    % feed the "formatted" string through the flatten() function
    % so we can then actually count the ACTUAL number of chars..
    Identifier = lists:flatten(io_lib:format("~s:~s()", [Mod, Name])),
    Len = max(40, string:len(Identifier)),

    % That weird syntax is similar to a C format string, but
    % in Erlang's own weird way. The asterisk means that the
    % field width is read from the argument list instead of
    % being written into the format string directly.
    io:format("~-*s", [Len, Identifier]).
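
The chopping itself comes from how ~s treats the field width: according to the io:format() documentation, the default precision is the field width, so a string longer than the width is truncated instead of being allowed to overflow. A minimal sketch of both behaviours (pad/2 and pad_min/2 are hypothetical helpers, not part of my framework; they return the string instead of printing it so they are easy to check):

```erlang
-module(pad_demo).
-export([pad/2, pad_min/2]).

%% Left-justify Str in a field of exactly Width characters.
%% Shorter strings are padded with spaces, longer ones truncated.
pad(Width, Str) ->
    lists:flatten(io_lib:format("~-*s", [Width, Str])).

%% Pad to at least Min characters, but never truncate: the field
%% width grows with the flattened string, as in the fixed show_name.
pad_min(Min, Chars) ->
    Str = lists:flatten(Chars),
    pad(max(Min, length(Str)), Str).
```

Here pad(5, "abcdefgh") comes back as "abcde", while pad_min(5, "abcdefgh") keeps the whole string.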

Yep, that seems to work! And now, since I think it may contain some more useful nuggets - the rest of my test runner module:

-module(runtests).
-export([test_module/1]).

%%% A module for running test functions from another module.
%%%
%%% This module is capable of finding all exported test functions
%%% in a given module (identified by arity 0 and the "test_" prefix).
%%%
%%% Compile the modules you want to test, then call test_module/1 to
%%% execute all tests.

show_name(Mod, Name) ->
    Identifier = lists:flatten(io_lib:format("~s:~s()", [Mod, Name])),
    Len = max(40, string:len(Identifier)),
    io:format("~-*s ->", [Len, Identifier]).

%% Runs a test case, prints its result, then returns the result
do_test(Mod, Name) ->
    show_name(Mod, Name),
    Result = Mod:Name(),
    io:format(" ~w~n", [Result]),
    Result.

is_test(Name, Arity) ->
    Arity == 0
    andalso
    string:str(atom_to_list(Name), "test_") == 1.

%% Runs all the tests
test_module(Mod) ->
    Tests = [
         F ||
         {F, A} <- Mod:module_info(exports),
         is_test(F, A)
    ],
    lists:foldl(
        fun (T, R) -> do_test(Mod, T) and R end,
        true,
        Tests
    ).
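
The test discovery part can also be tried in isolation. The following is only a sketch with made-up names; it swaps string:str/2 for lists:prefix/2, which expresses the “starts with test_” check a bit more directly, and it exports two dummy functions so the module can be pointed at itself.

```erlang
-module(find_tests).
-export([is_test/2, tests_in/1, test_sample/0, helper/0]).

%% A function counts as a test if it takes no arguments and its
%% name starts with "test_".
is_test(Name, Arity) ->
    Arity =:= 0 andalso lists:prefix("test_", atom_to_list(Name)).

%% Names of all exported test functions of Mod.
tests_in(Mod) ->
    [F || {F, A} <- Mod:module_info(exports), is_test(F, A)].

%% Two dummy exports: only the first one should be picked up.
test_sample() -> true.
helper() -> true.
```

Calling find_tests:tests_in(find_tests) should return just the test_sample name, since helper/0 lacks the prefix and the other exports have the wrong arity or name.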

Just to show you how it looks now, some final lines of output when running it:

29> runtests:test_module(test_ninetynine).
test_ninetynine:test_last()              -> true
test_ninetynine:test_last_two()          -> true
test_ninetynine:test_dup_to_sublist()    -> true
test_ninetynine:test_run_length_encoding() -> true
test_ninetynine:test_modified_run_length_encoding() -> true
test_ninetynine:test_decode_rle()        -> true

Lesson learned! Don’t assume anything at all when learning a new language! Things may be very different indeed, even where you expect them to be trivial.