Home > Replicating experimental results
Replicating experimental results
In this section we explain how to replicate the experimental results summarised by Table 1. Specifically, we have a docker image that can be used to:
- Synthesise litmus tests using Memalloy
- Run transactional litmus tests using
litmus7
- Replicate cut-down or full experiments
Getting started
We assume you have Docker installed locally.
-
Download and untar the material for this part of the artifact
$ wget http://johnwickerson.github.com/transactions/artifacts/reproducibility/docker.tgz $ tar zxvf docker.tgz $ cd docker
-
Either (a) download, untar and load our docker image (fast)
# The following file is 657Mb (md5: 33ba9cfb1661ce50c49fcb8b51534c2c) $ wget https://imperialcollegelondon.box.com/shared/static/ct9hntj1zxbushua90798l9g6gb1tvy5.gz -O image.tar.gz $ tar zxvf image.tar.gz $ docker load -i image.tar
or (b) build the image from scratch using
image/Dockerfile
(slow)$ docker build -t pldi18 image
-
Run the image with
results
andrepro
directories mounted (note you should be in thedocker/
directory)$ mkdir results $ docker run \ -v `pwd`/repro:/home/pldi18/memalloy/repro \ -v `pwd`/results:/home/pldi18/memalloy/results \ -it pldi18 /bin/bash
This drops you into a shell running in the container (notice the command prompt change). We’ve given you sudo rights so you can customise your environment with
sudo apt-get install <favourite_tool>
.NB: from here on we’ll use
$ # to denote your host machine shell > # to denote the docker container shell
Troubleshooting
- On Linux systems you may have to invoke
docker
commands withsudo
(this is the default behaviour). The Docker manual has instructions for enabling non-root users to usedocker
commands directly.
- On Linux systems you may have to invoke
Smoketest
To make sure you’re setup correctly, let’s synthesise litmus tests that differentiate weak and strong isolation (discussed in Section 3.3 of the paper).
-
At the docker container shell:
> cd memalloy > ./comparator -arch HW \ -satisfies models/txn_weak_atomicity.cat \ -satisfies models/sc.cat \ -violates models/txn_strong_atomicity.cat \ -events 3 -iter -expect 4 -stamp weak_vs_strong
You should expect output similar to:
============= Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8 21:37:58: Running Alloy, using Glucose (ho) on command 0. 21:37:59: Translation took 0s (3159 vars, 114 primary vars, 9111 clauses). Solving took 0s. 21:37:59: Solution 25 saved to /home/pldi18/memalloy/results/180221-213757/xml/test_25.xml. 21:37:59: No more solutions found. Alloy found 26 solutions in 2.19 sec Remove duplicates [==============================================] 100% Partitioned to 4 unique solutions in 0.59 sec Converting in /home/pldi18/memalloy/results/weak_vs_strong setup 0.04 sec alloy 2.19 sec rmdups 0.59 sec dump 0.32 sec
Troubleshooting
- If you get the error
ERROR: could not create dir structure
then you should ensure the docker container has write permissions to therepro/
andresults/
directories. On your host shell trychmod a+w
on both directories.
- If you get the error
-
Back on your host shell:
$ cd results/weak_vs_strong/png # and open the images in an image viewer
The four images should match Figure 3 of the paper with minor cosmetic differences: (1)
fr
(from-reads) edges are not drawn and (2)po
(program-order) edges are labelledsb
(sequenced-before). You should expect the following:test_27994c473be6881ee6101f16e9d3b39c.png
corresponding to (a)test_0db6f454ece96d9b6aee4ac17ba361a4.png
corresponding to (b)test_5816c2761ca4fc54653f9860483708f0.png
corresponding to (c)test_ca3865307ea520279d4b020ce57b11aa.png
corresponding to (d)
Troubleshooting
- If the images are not in the
results/
directory on your host machine then double-check that the directory was mounted when starting up the container (Step 3 of Getting started)
What’s inside the container
- The directory
memalloy
contains the source and a build of the Memalloy tool for generating (transactional) litmus tests - The directory
herdtools7
contains the source and a build of the herdtools7 suite, including a modifiedlitmus7
tool for running (transactional) litmus tests
Inside memalloy
The main directories of note are:
src
: the ocaml source forcat2als
,pp_comparator
,gen
, andweaken
tools explained belowtop
: containingcomparator.py
andremove_dups.py
explained belowdoc
: generated documentationmodels
: containing.cat
models including all of the models discussed in our paper. In particular,x86tso_txn.cat
,ppc_txn.cat
andaarch64_txn.cat
corresponding to Figures 5, 6, and 7 of the paper, respectively.alloystar
: the Alloy* solver from MIT
Memalloy consists of the top-level comparator.py
script, which given a user query (e.g., weak atomicity vs strong atomicity), drives:
cat2als
: to convert a.cat
model into an equivalent Alloy.als
modelpp_comparator
: to translate the user query into an Alloycomparator.als
problemalloystar
: to solvecomparator.als
and produce a set of.xml
files each representing an execution that satisfies the Alloy problemremove_dups.py
: to remove duplicate.xml
executions (due to symmetry) in a manner similar to Lustig et al. ASPLOS’17gen
: for converting each.xml
execution into.dot
,.png
, and.litmus
files. Thegen
tool can specialise.litmus
output for the X86, PPC and ARMv8 architectures.weaken
: for converting each forbidden execution into a set of allowed executions by applying relaxations as discussed in Section 4.2 of the paper
Inside herdtools7
We made minimal changes to the herdtools7 codebase to allow litmus7
to execute transactional litmus tests generated by Memalloy. To see these changes:
$ cd herdtools7
$ git diff
Note that other tools in the herdtool7 set (e.g., herd7
and diy7
) are not aware of transactions.
If your host machine has Intel TSX (check your processor on Intel Ark; a coarse check is to grep for the string “rtm
” in /proc/cpuinfo
) then you can try out litmus7
on a transactional variant of the Store Buffering litmus test, as follows:
$ cd herdtools7
$ litmus7 doc/SB+txs.litmus
The output will include a histogram of observed results similar to:
Histogram (3 states)
1735 :>0:EBX=0; 1:EBX=0; ok=0;
499114:>0:EBX=1; 1:EBX=0; ok=1;
499151:>0:EBX=0; 1:EBX=1; ok=1;
Note that the execution result 0:EBX=0 /\ 1:EBX=0 /\ ok=1
should not be observable due to the transactions around each thread.
Replicating Table 1
As described in Section 5.3, Table 1 requires (1) the synthesis of litmus tests using Memalloy; and (2) testing on TM-enabled Intel and IBM hardware. If your host machine does not support TM then you will be able to follow (1), but not (2).
-
Synthesis
We recommend running a cut-down experiment that will generate X86 and PPC litmus tests for |E|=2..4. This takes approximately 20 minutes.
> cd memalloy/repro > RESULTS_DIR=results-mini MAX_N=4 scripts/respin.sh
An even smaller experiment can be set with
MAX_N=3
and takes approximately 10 minutes.Raw results are output in the directory
repro/results-mini/fences-as-nodes/<arch>/<N>ev/
where arch isX86
orPPC
, andN
is the number of events. The file<arch>/<N>ev_experiment.log
gives a timed log and the Forbid and Allow set are given in<N>ev/litmus/<arch>/
and<N>ev/allow/litmus/<arch>/
, respectively.The full results for |E|=2..7 can be replicated using the default arguments of the
respin.sh
script. WARNING: this takes approximately 2 days to complete due to the 2h timeout per row of the table. Note this timeout applies only to the Alloy solve stage.> cd memalloy/repro > scripts/respin.sh
Troubleshooting
- If the directory
repro/
does not exist then double-check that the directory was mounted when starting up the container (Step 3 of Getting started)
- If the directory
-
Testing
Skip this step if your host machine does not support TM
If your host machine has Intel TSX (check your processor on Intel Ark; a coarse check is to grep for the string “
rtm
” in/proc/cpuinfo
) then we recommend running a cut-down experiment that will run the generated X86 litmus tests. This takes approximately 5 minutes.> cd memalloy/repro > scripts/mini_test_x86.sh
A series of log files
result_local.log
are output inrepro/results-mini
.Replicating the full set of results requires access to Intel and IBM hardware. In our experiments, we managed this through scripts to sync with remote machines. For each machine
<mach>
it is necessary to:- Build the litmus7 tool locally
- Define a config file
repro/machines/<mach>.cfg
which defines:SERVER
: the ssh address of the machineSERVERDIR
: a temporary directory on the machineARCH
: eitherX86
orPPC
LITMUS7
: the path of the locallitmus7
toolFLAGS
: additionallitmus7
flagsMACHINE
: the tag of the machine
Then the set of machines can be tested:
> cd memalloy/repro > for m in remote/machines/*.cfg; do ./scripts/remote/0_push_experiments.sh ${m}; ./scripts/remote/1_exec_experiments.sh ${m}; ./scripts/remote/2_pull_results.sh ${m}; done
For completeness, we give the scripts and config files (suitably anonymised) in
repro/scripts/remote
. -
Generating latex
If synthesis (step 1) and, optionally, testing (step 2) have successfully finished then the X86 half of Table 1 can be produced:
> cd memalloy/repro > ./scripts/mk_table.py -arch X86 results-mini/fences-as-nodes/X86 > latex/table.tex > cd latex > pdflatex top # and open `top.pdf` in a pdf viewer on your host machine
Comparing your table against Table 1:
- Synthesis time will depend on your host machine but should follow a similar trend. This is the wall clock time reported at the end of each
<N>ev_experiment.log
file. - The number of tests produced (T) for both the Forbid and Allow sets should match those from Table 1.
- Additionally, if testing (step 2) successfully finished then:
- The number of seen tests (S) for the Forbid set should be zero; hence, the number of not seen tests (!S) should equal (T)
- The number of seen (S) and not seen (!S) tests for the Allow set may differ depending on your host machine
The PPC half of Table 1 can be produced similarly (only the synthesis time and number of test produced (T) columns will contain valid data):
> cd memalloy/repro > ./scripts/mk_table.py -arch PPC results-mini/fences-as-nodes/PPC > latex/table.tex > cd latex > pdflatex top # and open `top.pdf` in a pdf viewer on your host machine
- Synthesis time will depend on your host machine but should follow a similar trend. This is the wall clock time reported at the end of each
-
Querying using sql
Similar to our full result dataset you can build an sqlite database of your results:
> cd memalloy/repro > ./scripts/mk_csv.py results-mini > out.csv > sqlite3 < scripts/sql/create.sql
The following query counts the number of Forbid tests that are seen; this should be zero.
> sqlite3 < scripts/sql/query_forbid_seen.sql
Cleaning up
Remove our docker image from your machine:
$ docker stop $(docker ps -a | awk '/pldi18/{print $1}')
$ docker rm $(docker ps -a | awk '/pldi18/{print $1}')
$ docker rmi pldi18