Home > Replicating experimental results
Replicating experimental results
In this section we explain how to replicate the experimental results summarised by Table 1. Specifically, we have a docker image that can be used to:
- Synthesise litmus tests using Memalloy
- Run transactional litmus tests using
litmus7 - Replicate cut-down or full experiments
Getting started
We assume you have Docker installed locally.
-
Download and untar the material for this part of the artifact
$ wget http://johnwickerson.github.com/transactions/artifacts/reproducibility/docker.tgz $ tar zxvf docker.tgz $ cd docker -
Either (a) download, untar and load our docker image (fast)
# The following file is 657Mb (md5: 33ba9cfb1661ce50c49fcb8b51534c2c) $ wget https://imperialcollegelondon.box.com/shared/static/ct9hntj1zxbushua90798l9g6gb1tvy5.gz -O image.tar.gz $ tar zxvf image.tar.gz $ docker load -i image.taror (b) build the image from scratch using
image/Dockerfile(slow)$ docker build -t pldi18 image -
Run the image with
resultsandreprodirectories mounted (note you should be in thedocker/directory)$ mkdir results $ docker run \ -v `pwd`/repro:/home/pldi18/memalloy/repro \ -v `pwd`/results:/home/pldi18/memalloy/results \ -it pldi18 /bin/bashThis drops you into a shell running in the container (notice the command prompt change). We’ve given you sudo rights so you can customise your environment with
sudo apt-get install <favourite_tool>.NB: from here on we’ll use
$ # to denote your host machine shell > # to denote the docker container shellTroubleshooting
- On Linux systems you may have to invoke
dockercommands withsudo(this is the default behaviour). The Docker manual has instructions for enabling non-root users to usedockercommands directly.
- On Linux systems you may have to invoke
Smoketest
To make sure you’re setup correctly, let’s synthesise litmus tests that differentiate weak and strong isolation (discussed in Section 3.3 of the paper).
-
At the docker container shell:
> cd memalloy > ./comparator -arch HW \ -satisfies models/txn_weak_atomicity.cat \ -satisfies models/sc.cat \ -violates models/txn_strong_atomicity.cat \ -events 3 -iter -expect 4 -stamp weak_vs_strongYou should expect output similar to:
============= Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8 21:37:58: Running Alloy, using Glucose (ho) on command 0. 21:37:59: Translation took 0s (3159 vars, 114 primary vars, 9111 clauses). Solving took 0s. 21:37:59: Solution 25 saved to /home/pldi18/memalloy/results/180221-213757/xml/test_25.xml. 21:37:59: No more solutions found. Alloy found 26 solutions in 2.19 sec Remove duplicates [==============================================] 100% Partitioned to 4 unique solutions in 0.59 sec Converting in /home/pldi18/memalloy/results/weak_vs_strong setup 0.04 sec alloy 2.19 sec rmdups 0.59 sec dump 0.32 secTroubleshooting
- If you get the error
ERROR: could not create dir structurethen you should ensure the docker container has write permissions to therepro/andresults/directories. On your host shell trychmod a+won both directories.
- If you get the error
-
Back on your host shell:
$ cd results/weak_vs_strong/png # and open the images in an image viewerThe four images should match Figure 3 of the paper with minor cosmetic differences: (1)
fr(from-reads) edges are not drawn and (2)po(program-order) edges are labelledsb(sequenced-before). You should expect the following:test_27994c473be6881ee6101f16e9d3b39c.pngcorresponding to (a)test_0db6f454ece96d9b6aee4ac17ba361a4.pngcorresponding to (b)test_5816c2761ca4fc54653f9860483708f0.pngcorresponding to (c)test_ca3865307ea520279d4b020ce57b11aa.pngcorresponding to (d)
Troubleshooting
- If the images are not in the
results/directory on your host machine then double-check that the directory was mounted when starting up the container (Step 3 of Getting started)
What’s inside the container
- The directory
memalloycontains the source and a build of the Memalloy tool for generating (transactional) litmus tests - The directory
herdtools7contains the source and a build of the herdtools7 suite, including a modifiedlitmus7tool for running (transactional) litmus tests
Inside memalloy
The main directories of note are:
src: the ocaml source forcat2als,pp_comparator,gen, andweakentools explained belowtop: containingcomparator.pyandremove_dups.pyexplained belowdoc: generated documentationmodels: containing.catmodels including all of the models discussed in our paper. In particular,x86tso_txn.cat,ppc_txn.catandaarch64_txn.catcorresponding to Figures 5, 6, and 7 of the paper, respectively.alloystar: the Alloy* solver from MIT
Memalloy consists of the top-level comparator.py script, which given a user query (e.g., weak atomicity vs strong atomicity), drives:
cat2als: to convert a.catmodel into an equivalent Alloy.alsmodelpp_comparator: to translate the user query into an Alloycomparator.alsproblemalloystar: to solvecomparator.alsand produce a set of.xmlfiles each representing an execution that satisfies the Alloy problemremove_dups.py: to remove duplicate.xmlexecutions (due to symmetry) in a manner similar to Lustig et al. ASPLOS’17gen: for converting each.xmlexecution into.dot,.png, and.litmusfiles. Thegentool can specialise.litmusoutput for the X86, PPC and ARMv8 architectures.weaken: for converting each forbidden execution into a set of allowed executions by applying relaxations as discussed in Section 4.2 of the paper
Inside herdtools7
We made minimal changes to the herdtools7 codebase to allow litmus7 to execute transactional litmus tests generated by Memalloy. To see these changes:
$ cd herdtools7
$ git diff
Note that other tools in the herdtool7 set (e.g., herd7 and diy7) are not aware of transactions.
If your host machine has Intel TSX (check your processor on Intel Ark; a coarse check is to grep for the string “rtm” in /proc/cpuinfo) then you can try out litmus7 on a transactional variant of the Store Buffering litmus test, as follows:
$ cd herdtools7
$ litmus7 doc/SB+txs.litmus
The output will include a histogram of observed results similar to:
Histogram (3 states)
1735 :>0:EBX=0; 1:EBX=0; ok=0;
499114:>0:EBX=1; 1:EBX=0; ok=1;
499151:>0:EBX=0; 1:EBX=1; ok=1;
Note that the execution result 0:EBX=0 /\ 1:EBX=0 /\ ok=1 should not be observable due to the transactions around each thread.
Replicating Table 1
As described in Section 5.3, Table 1 requires (1) the synthesis of litmus tests using Memalloy; and (2) testing on TM-enabled Intel and IBM hardware. If your host machine does not support TM then you will be able to follow (1), but not (2).
-
Synthesis
We recommend running a cut-down experiment that will generate X86 and PPC litmus tests for |E|=2..4. This takes approximately 20 minutes.
> cd memalloy/repro > RESULTS_DIR=results-mini MAX_N=4 scripts/respin.shAn even smaller experiment can be set with
MAX_N=3and takes approximately 10 minutes.Raw results are output in the directory
repro/results-mini/fences-as-nodes/<arch>/<N>ev/where arch isX86orPPC, andNis the number of events. The file<arch>/<N>ev_experiment.loggives a timed log and the Forbid and Allow set are given in<N>ev/litmus/<arch>/and<N>ev/allow/litmus/<arch>/, respectively.The full results for |E|=2..7 can be replicated using the default arguments of the
respin.shscript. WARNING: this takes approximately 2 days to complete due to the 2h timeout per row of the table. Note this timeout applies only to the Alloy solve stage.> cd memalloy/repro > scripts/respin.shTroubleshooting
- If the directory
repro/does not exist then double-check that the directory was mounted when starting up the container (Step 3 of Getting started)
- If the directory
-
Testing
Skip this step if your host machine does not support TM
If your host machine has Intel TSX (check your processor on Intel Ark; a coarse check is to grep for the string “
rtm” in/proc/cpuinfo) then we recommend running a cut-down experiment that will run the generated X86 litmus tests. This takes approximately 5 minutes.> cd memalloy/repro > scripts/mini_test_x86.shA series of log files
result_local.logare output inrepro/results-mini.Replicating the full set of results requires access to Intel and IBM hardware. In our experiments, we managed this through scripts to sync with remote machines. For each machine
<mach>it is necessary to:- Build the litmus7 tool locally
- Define a config file
repro/machines/<mach>.cfgwhich defines:SERVER: the ssh address of the machineSERVERDIR: a temporary directory on the machineARCH: eitherX86orPPCLITMUS7: the path of the locallitmus7toolFLAGS: additionallitmus7flagsMACHINE: the tag of the machine
Then the set of machines can be tested:
> cd memalloy/repro > for m in remote/machines/*.cfg; do ./scripts/remote/0_push_experiments.sh ${m}; ./scripts/remote/1_exec_experiments.sh ${m}; ./scripts/remote/2_pull_results.sh ${m}; doneFor completeness, we give the scripts and config files (suitably anonymised) in
repro/scripts/remote. -
Generating latex
If synthesis (step 1) and, optionally, testing (step 2) have successfully finished then the X86 half of Table 1 can be produced:
> cd memalloy/repro > ./scripts/mk_table.py -arch X86 results-mini/fences-as-nodes/X86 > latex/table.tex > cd latex > pdflatex top # and open `top.pdf` in a pdf viewer on your host machineComparing your table against Table 1:
- Synthesis time will depend on your host machine but should follow a similar trend. This is the wall clock time reported at the end of each
<N>ev_experiment.logfile. - The number of tests produced (T) for both the Forbid and Allow sets should match those from Table 1.
- Additionally, if testing (step 2) successfully finished then:
- The number of seen tests (S) for the Forbid set should be zero; hence, the number of not seen tests (!S) should equal (T)
- The number of seen (S) and not seen (!S) tests for the Allow set may differ depending on your host machine
The PPC half of Table 1 can be produced similarly (only the synthesis time and number of test produced (T) columns will contain valid data):
> cd memalloy/repro > ./scripts/mk_table.py -arch PPC results-mini/fences-as-nodes/PPC > latex/table.tex > cd latex > pdflatex top # and open `top.pdf` in a pdf viewer on your host machine - Synthesis time will depend on your host machine but should follow a similar trend. This is the wall clock time reported at the end of each
-
Querying using sql
Similar to our full result dataset you can build an sqlite database of your results:
> cd memalloy/repro > ./scripts/mk_csv.py results-mini > out.csv > sqlite3 < scripts/sql/create.sqlThe following query counts the number of Forbid tests that are seen; this should be zero.
> sqlite3 < scripts/sql/query_forbid_seen.sql
Cleaning up
Remove our docker image from your machine:
$ docker stop $(docker ps -a | awk '/pldi18/{print $1}')
$ docker rm $(docker ps -a | awk '/pldi18/{print $1}')
$ docker rmi pldi18