Home > Replicating experimental results

Replicating experimental results

In this section we explain how to replicate the experimental results summarised in Table 1. Specifically, we provide a Docker image that packages everything needed to do so.

Getting started

We assume you have Docker installed locally.

  1. Download and untar the material for this part of the artifact

    $ wget http://johnwickerson.github.com/transactions/artifacts/reproducibility/docker.tgz
    $ tar zxvf docker.tgz
    $ cd docker
    
  2. Either (a) download, untar and load our docker image (fast)

    # The following file is 657 MB (md5: 33ba9cfb1661ce50c49fcb8b51534c2c)
    $ wget https://imperialcollegelondon.box.com/shared/static/ct9hntj1zxbushua90798l9g6gb1tvy5.gz -O image.tar.gz
    $ tar zxvf image.tar.gz
    $ docker load -i image.tar
    

    or (b) build the image from scratch using image/Dockerfile (slow)

    $ docker build -t pldi18 image
    
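If you chose option (a), a quick integrity check against the md5 quoted above can save a failed docker load later (this assumes md5sum is available on your host; it skips silently if the file is absent):

```shell
# Verify the download from option (a) against the md5 stated above.
# Does nothing if image.tar.gz is not present in the current directory.
if [ -f image.tar.gz ]; then
  echo "33ba9cfb1661ce50c49fcb8b51534c2c  image.tar.gz" | md5sum -c -
fi
```

Note that md5sum -c expects two spaces between the checksum and the filename.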
  3. Run the image with results and repro directories mounted (note you should be in the docker/ directory)

    $ mkdir results
    $ docker run \
      -v `pwd`/repro:/home/pldi18/memalloy/repro \
      -v `pwd`/results:/home/pldi18/memalloy/results \
      -it pldi18 /bin/bash
    

    This drops you into a shell running in the container (notice the command prompt change). We’ve given you sudo rights so you can customise your environment with sudo apt-get install <favourite_tool>.

    NB: from here on we’ll use

    $   # to denote your host machine shell
    >   # to denote the docker container shell
    

    Troubleshooting

    • On Linux systems you may have to invoke docker commands with sudo (this is the default behaviour). The Docker manual has instructions for enabling non-root users to use docker commands directly.

Smoketest

To make sure you’re set up correctly, let’s synthesise litmus tests that differentiate weak and strong isolation (discussed in Section 3.3 of the paper).

  1. At the docker container shell:

    > cd memalloy
    > ./comparator -arch HW \
      -satisfies models/txn_weak_atomicity.cat \
      -satisfies models/sc.cat \
      -violates models/txn_strong_atomicity.cat \
      -events 3 -iter -expect 4 -stamp weak_vs_strong
    

    You should expect output similar to:

    =============
    Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
    21:37:58: Running Alloy, using Glucose (ho) on command 0.
    21:37:59: Translation took 0s (3159 vars, 114 primary vars, 9111 clauses).
    Solving took 0s.
    21:37:59: Solution 25 saved to /home/pldi18/memalloy/results/180221-213757/xml/test_25.xml.
    21:37:59: No more solutions found.
    Alloy found 26 solutions in 2.19 sec
    Remove duplicates
    [==============================================] 100%
    Partitioned to 4 unique solutions in 0.59 sec
    Converting in /home/pldi18/memalloy/results/weak_vs_strong
    setup     0.04 sec
    alloy     2.19 sec
    rmdups    0.59 sec
    dump      0.32 sec
    

    Troubleshooting

    • If you get the error ERROR: could not create dir structure then ensure the docker container has write permissions to the repro/ and results/ directories: on your host shell, run chmod a+w repro results from within the docker/ directory.
  2. Back on your host shell:

    $ cd results/weak_vs_strong/png
    # and open the images in an image viewer
    

    The four images should match Figure 3 of the paper with minor cosmetic differences: (1) fr (from-reads) edges are not drawn and (2) po (program-order) edges are labelled sb (sequenced-before). You should expect the following:

    • test_27994c473be6881ee6101f16e9d3b39c.png corresponding to (a)
    • test_0db6f454ece96d9b6aee4ac17ba361a4.png corresponding to (b)
    • test_5816c2761ca4fc54653f9860483708f0.png corresponding to (c)
    • test_ca3865307ea520279d4b020ce57b11aa.png corresponding to (d)

    Troubleshooting

    • If the images are not in the results/ directory on your host machine then double-check that the directory was mounted when starting up the container (Step 3 of Getting started)

What’s inside the container

Inside memalloy

Memalloy consists of the top-level comparator.py script which, given a user query (e.g., weak atomicity vs. strong atomicity), drives the stages of the synthesis pipeline (setup, alloy, rmdups and dump, whose timings are printed at the end of each run, as in the Smoketest output above).

Inside herdtools7

We made minimal changes to the herdtools7 codebase to allow litmus7 to execute transactional litmus tests generated by Memalloy. To see these changes:

    $ cd herdtools7
    $ git diff

Note that other tools in the herdtools7 suite (e.g., herd7 and diy7) are not aware of transactions.

If your host machine has Intel TSX (check your processor on Intel Ark; a coarse check is to grep for the string “rtm” in /proc/cpuinfo) then you can try out litmus7 on a transactional variant of the Store Buffering litmus test, as follows:

    $ cd herdtools7
    $ litmus7 doc/SB+txs.litmus

The output will include a histogram of observed results similar to:

    Histogram (3 states)
    1735  :>0:EBX=0; 1:EBX=0; ok=0;
    499114:>0:EBX=1; 1:EBX=0; ok=1;
    499151:>0:EBX=0; 1:EBX=1; ok=1;

Note that the execution result 0:EBX=0 /\ 1:EBX=0 /\ ok=1 should not be observable due to the transactions around each thread.
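The coarse RTM check mentioned above can be scripted; has_rtm below is a hypothetical helper for illustration, not part of the artifact:

```shell
# Hypothetical helper: coarse check for Intel TSX (RTM) support.
# Reads a cpuinfo-style file (default /proc/cpuinfo) and succeeds if
# the "rtm" CPU flag is listed as a whole word.
has_rtm() {
  grep -q -w rtm "${1:-/proc/cpuinfo}"
}

if has_rtm; then
  echo "TSX (RTM) detected: litmus7 transactional tests should run"
else
  echo "TSX (RTM) not detected: skip the hardware-testing steps"
fi
```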

Replicating Table 1

As described in Section 5.3, Table 1 requires (1) the synthesis of litmus tests using Memalloy; and (2) testing on TM-enabled Intel and IBM hardware. If your host machine does not support TM then you will be able to follow (1), but not (2).

  1. Synthesis

    We recommend running a cut-down experiment that will generate X86 and PPC litmus tests for |E|=2..4. This takes approximately 20 minutes.

    > cd memalloy/repro
    > RESULTS_DIR=results-mini MAX_N=4 scripts/respin.sh
    

    An even smaller experiment can be run by setting MAX_N=3; it takes approximately 10 minutes.

    Raw results are output in the directory repro/results-mini/fences-as-nodes/<arch>/<N>ev/, where <arch> is X86 or PPC and <N> is the number of events. The file <arch>/<N>ev_experiment.log gives a timed log, and the Forbid and Allow sets are given in <N>ev/litmus/<arch>/ and <N>ev/allow/litmus/<arch>/, respectively.
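    As a quick sanity check, you can count the generated tests per set. The loop below assumes only the directory layout described above; run it inside memalloy/repro, adjusting results-mini if you chose a different RESULTS_DIR:

```shell
# Count generated litmus tests in the Forbid and Allow sets.
# Assumes the layout described above: <arch>/<N>ev/litmus/<arch>/ holds
# the Forbid set and <arch>/<N>ev/allow/litmus/<arch>/ the Allow set.
for d in results-mini/fences-as-nodes/*/*ev/litmus/* \
         results-mini/fences-as-nodes/*/*ev/allow/litmus/*; do
  [ -d "$d" ] || continue
  n=$(find "$d" -name '*.litmus' | wc -l | tr -d ' ')
  printf '%s: %s tests\n' "$d" "$n"
done
```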

    The full results for |E|=2..7 can be replicated using the default arguments of the respin.sh script. WARNING: this takes approximately 2 days to complete due to the 2h timeout per row of the table. Note this timeout applies only to the Alloy solve stage.

    > cd memalloy/repro
    > scripts/respin.sh
    

    Troubleshooting

    • If the directory repro/ does not exist then double-check that the directory was mounted when starting up the container (Step 3 of Getting started)
  2. Testing

    Skip this step if your host machine does not support TM

    If your host machine has Intel TSX (check your processor on Intel Ark; a coarse check is to grep for the string “rtm” in /proc/cpuinfo) then we recommend running a cut-down experiment that will run the generated X86 litmus tests. This takes approximately 5 minutes.

    > cd memalloy/repro
    > scripts/mini_test_x86.sh
    

    A series of log files, each named result_local.log, is output under repro/results-mini.

    Replicating the full set of results requires access to Intel and IBM hardware. In our experiments, we managed this through scripts to sync with remote machines. For each machine <mach> it is necessary to:

    • Build the litmus7 tool locally
    • Define a config file repro/machines/<mach>.cfg which defines:
      • SERVER: the ssh address of the machine
      • SERVERDIR: a temporary directory on the machine
      • ARCH: either X86 or PPC
      • LITMUS7: the path of the local litmus7 tool
      • FLAGS: additional litmus7 flags
      • MACHINE: the tag of the machine
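    For illustration, such a config file might look like the following; every value is a placeholder to adapt to your own setup:

```shell
# repro/machines/example.cfg -- illustrative only; all values are placeholders
SERVER=user@tm-machine.example.org   # ssh address of the machine
SERVERDIR=/tmp/pldi18-experiments    # temporary directory on the machine
ARCH=X86                             # either X86 or PPC
LITMUS7=$HOME/herdtools7/litmus7     # path of the local litmus7 tool
FLAGS=""                             # additional litmus7 flags
MACHINE=example                      # tag of the machine
```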

    Then the set of machines can be tested:

    > cd memalloy/repro
    > for m in machines/*.cfg; do
        ./scripts/remote/0_push_experiments.sh ${m};
        ./scripts/remote/1_exec_experiments.sh ${m};
        ./scripts/remote/2_pull_results.sh ${m};
      done
    

    For completeness, we give the scripts and config files (suitably anonymised) in repro/scripts/remote.

  3. Generating LaTeX

    If synthesis (step 1) and, optionally, testing (step 2) have successfully finished then the X86 half of Table 1 can be produced:

    > cd memalloy/repro
    > ./scripts/mk_table.py -arch X86 results-mini/fences-as-nodes/X86 > latex/table.tex
    > cd latex
    > pdflatex top
    # and open `top.pdf` in a pdf viewer on your host machine
    

    Comparing your table against Table 1:

    • Synthesis time will depend on your host machine but should follow a similar trend. This is the wall clock time reported at the end of each <N>ev_experiment.log file.
    • The number of tests produced (T) for both the Forbid and Allow sets should match those from Table 1.
    • Additionally, if testing (step 2) successfully finished then:
      • The number of seen tests (S) for the Forbid set should be zero; hence, the number of not seen tests (!S) should equal (T)
      • The number of seen (S) and not seen (!S) tests for the Allow set may differ depending on your host machine

    The PPC half of Table 1 can be produced similarly (only the synthesis time and number of tests produced (T) columns will contain valid data):

    > cd memalloy/repro
    > ./scripts/mk_table.py -arch PPC results-mini/fences-as-nodes/PPC > latex/table.tex
    > cd latex
    > pdflatex top
    # and open `top.pdf` in a pdf viewer on your host machine
    
  4. Querying using SQL

    As with our full result dataset, you can build an SQLite database of your results:

    > cd memalloy/repro
    > ./scripts/mk_csv.py results-mini > out.csv
    > sqlite3 < scripts/sql/create.sql
    

    The following query counts the number of Forbid tests that are seen; this should be zero.

    > sqlite3 < scripts/sql/query_forbid_seen.sql
    
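The general pattern, independent of the artifact's schema, is to import the CSV and query it. The table name and CSV columns below are illustrative only; the artifact's real schema lives in scripts/sql/create.sql, and its real CSV is the out.csv produced by mk_csv.py above:

```shell
# Sketch: import a CSV into an SQLite database and run a count query.
# "results" and the demo.csv columns are illustrative, not the real schema.
printf 'arch,set,seen\nX86,Forbid,0\nX86,Allow,1\n' > demo.csv
sqlite3 demo.db <<'EOF'
.mode csv
.import demo.csv results
SELECT COUNT(*) FROM results;
EOF
```

When the target table does not already exist, .import uses the first CSV row as column names and loads the remaining rows as data.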

Cleaning up

Remove our docker image from your machine:

    $ docker stop $(docker ps -a | awk '/pldi18/{print $1}')
    $ docker rm $(docker ps -a | awk '/pldi18/{print $1}')
    $ docker rmi pldi18