We continue our tutorial on Concuerror, explaining the errors it can detect and the options that you can use to filter out those that are not important for your application.
Read also the first part!
Index
Symbolic names
Concuerror makes an effort to replace every Erlang PID that appears in its report with a symbolic name. The first process spawned is labelled P and every other process is named after the process that spawned it, with an integer denoting the order in which it was spawned i.e. P’s first “child” is named P.1, P.1’s third child is named P.1.3 and so on.
If you prefer to see the raw PIDs you can use
--symbolic_names false
. Concuerror is reusing the same processes, so the
results should be consistent across different interleavings.
Abnormal exits
Every time a process exits abnormally, Concuerror will mark the interleaving as erroneous. This means that any exception that escapes to the top-level will trigger a warning. In our example, processes P, P.1.1 and P.1.1.1 exited abnormally.
Erroneous interleaving 1:
* At step 52 process P exited abnormally
Reason:
{timeout,{gen_server,call,[P.1,stop]}}
Stacktrace:
[{gen_server,call,2,[{file,"gen_server.erl"},{line,182}]},
{poolboy_tests_1,pool_startup,0,
[{file,"poolboy_tests_1.erl"},{line,8}]}]
* At step 76 process P.1.1.1 exited abnormally
Reason:
shutdown
Stacktrace:
[]
* At step 85 process P.1.1 exited abnormally
Reason:
shutdown
Stacktrace:
[{proc_lib,exit_p,2,[{file,"proc_lib.erl"},{line,260}]}]
Ignoring “after” timeouts
If we take a look at the trace we can see that P triggered the standard
timeout clause of a gen
call:
50: P: receive timeout expired after 5000 ms
in gen.erl line 213
[...]
52: P: exits abnormally ({timeout,{gen_server,call,[P.1,stop]}})
As explained here, Concuerror by default assumes that any receive statement may trigger the after clause, unless it is impossible for a matching message not to have already arrived.
However, let’s assume that we don’t care about such timeouts. We can use the
--after_timeout 1000
to treat any timeout higher than 1000ms as
infinity. Notice that the one of the tips we got earlier suggests the same
course of action:
Tip: A process crashed with reason '{timeout, ...}'. This may happen when a call
to a gen_server (or similar) does not receive a reply within some standard
timeout. Use the '--after_timeout' option to treat after clauses that exceed some
threshold as 'impossible'.
Treating abnormal exit reasons as normal
The other two processes exited abnormally because they were terminated by events
generated by the delivery of a stop
message to a supervisor. The supervisor
then sent shutdown
exit signals to the relevant processes. Again, let’s assume
that this is acceptable behaviour in our context, using --treat_as_normal
shutdown
(also suggested by a tip).
Tip: A process crashed with reason 'shutdown'. This may happen when a supervisor
is terminating its children. You can use '--treat_as_normal shutdown' if this is
expected behavior.
A report without problems
We now run Concuerror again, using the new options:
poolboy $ concuerror -f poolboy_tests_1.erl -m poolboy_tests_1 -t pool_startup \
--pa .eunit --after_timeout 1000 --treat_as_normal shutdown
This time the output finishes in:
[...]
Warning: Some abnormal exit reasons were treated as normal ('--treat_as_normal').
Done! (Exit status: completed)
Summary: 0 errors, ... interleavings explored
Concuerror finished the exploration, finding all the different states that the
program can reach. None of these interleavings had any processes terminating
abnormally with a reason different from shutdown
, nor any other errors.
Interleavings as a graph
Concuerror can also produce a
DOT file,
including all the interleavings. This feature can be enabled with --graph <filename>
.
You can then use dot
to convert
the DOT file to an image.
Racing events
The report includes a section where each pair of events that are racing is reported, to justify each additional interleaving. Let’s take a look at the output:
[...]
################################################################################
Race Pairs:
--------------------------------------------------------------------------------
You can disable race pair messages with '--show_races false'
* Exit signal (normal) from P.1 reaches P.1.1
is in race with
Message ({'DOWN',#Ref<0.0.0.508>,process,P.1.1.1,shutdown}) from P.1.1.1 reaches P.1.1
* Message ({'DOWN',#Ref<0.0.0.508>,process,P.1.1.1,shutdown}) from P.1.1.1 reaches P.1.1
is in race with
Timer #Ref<0.0.0.522>: expires, delivering {timeout,#Ref<0.0.0.522>,kill} to P.1.1
* P.1: exits normally
is in race with
P: true = erlang:demonitor(#Ref<0.0.0.407>, [flush])
in gen.erl line 215
We can see e.g. that the delivery of an exit signal from a linked process races
with the delivery of an arbitrary message, (since P.1.1
is trapping exits), a
similar race appears between a message from a timer and finally a process
exiting races with the cancellation of a monitor, as the monitor message will be
sent in one case but not in the other.
The same pair of events may be reported multiple times, but this should happen only if we are exploring the reverse interleaving of a different race.
If you see a pair of events that you believe should not be racing, feel free to report the discrepancy! Concuerror’s race detection accuracy is the most sensitive component regarding the total number of interleavings explored!
“Deadlocked” processes
Since we called poolboy:stop/1
, all processes were properly stopped. What
would happen if we had let a process lingering? We already have such a test,
pool_startup_no_stop/0
(we need to use poolboy:start/1
instead of
start_link
, as the normal exit signal from the initial process also causes a
shutdown).
Running this test, yields:
poolboy $ concuerror -f poolboy_tests_1.erl -m poolboy_tests_1 --pa .eunit \
-t pool_startup_no_stop
Concuerror started at 03 Jun 2014 12:00:00
Writing results in concuerror_report.txt
[...]
Done! (Exit status: warning)
Summary: 1 errors, 1/1 interleavings explored
The output file now contains the following description:
Erroneous interleaving 1:
* Blocked at a 'receive' ("deadlocked"; other processes have exited):
P.1 in gen_server.erl line 360
P.1.1 in gen_server.erl line 360
P.1.1.1 in gen_server.erl line 360
Concuerror reports any processes that have not exited, when no other processes
are available to run. Such processes are blocked at a receive statement and
are considered “deadlocked”. These warnings can be turned off by --ignore_error
deadlock
.
Concuerror will always reset the test to the initial state, so leaving some processes running may be a better alternative to suffering from all the races between exit signals that appear at termination.