Home > Replicating experimental results

Replicating experimental results

In this section we explain how to replicate the experimental results summarised in Table 1. Specifically, we provide a Docker image that packages everything needed to do so.

Getting started

We assume you have Docker installed locally.

  1. Download and untar the material for this part of the artifact

    $ wget http://johnwickerson.github.com/transactions/artifacts/reproducibility/docker.tgz
    $ tar zxvf docker.tgz
    $ cd docker
    
  2. Either (a) download, untar and load our docker image (fast)

    # The following file is 657 MB (md5: 33ba9cfb1661ce50c49fcb8b51534c2c)
    $ wget https://imperialcollegelondon.box.com/shared/static/ct9hntj1zxbushua90798l9g6gb1tvy5.gz -O image.tar.gz
    $ tar zxvf image.tar.gz
    $ docker load -i image.tar
    

    or (b) build the image from scratch using image/Dockerfile (slow)

    $ docker build -t pldi18 image
    
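If you chose option (a), a quick integrity check against the md5 quoted above can save a failed docker load later (this assumes md5sum is available on your host; it skips silently if the file is absent):

```shell
# Verify the download from option (a) against the md5 stated above.
# Does nothing if image.tar.gz is not present in the current directory.
if [ -f image.tar.gz ]; then
  echo "33ba9cfb1661ce50c49fcb8b51534c2c  image.tar.gz" | md5sum -c -
fi
```

Note that md5sum -c expects two spaces between the checksum and the filename.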
  3. Run the image with results and repro directories mounted (note you should be in the docker/ directory)

    $ mkdir results
    $ docker run \
      -v `pwd`/repro:/home/pldi18/memalloy/repro \
      -v `pwd`/results:/home/pldi18/memalloy/results \
      -it pldi18 /bin/bash
    

    This drops you into a shell running in the container (notice the command prompt change). We’ve given you sudo rights so you can customise your environment with sudo apt-get install <favourite_tool>.

    NB: from here on we’ll use

    $   # to denote your host machine shell
    >   # to denote the docker container shell
    

    Troubleshooting

    • On Linux systems you may have to invoke docker commands with sudo (this is the default behaviour). The Docker manual has instructions for enabling non-root users to use docker commands directly.

Smoketest

To make sure you’re set up correctly, let’s synthesise litmus tests that differentiate weak and strong isolation (discussed in Section 3.3 of the paper).

  1. At the docker container shell:

    > cd memalloy
    > ./comparator -arch HW \
      -satisfies models/txn_weak_atomicity.cat \
      -satisfies models/sc.cat \
      -violates models/txn_strong_atomicity.cat \
      -events 3 -iter -expect 4 -stamp weak_vs_strong
    

    You should expect output similar to:

    =============
    Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
    21:37:58: Running Alloy, using Glucose (ho) on command 0.
    21:37:59: Translation took 0s (3159 vars, 114 primary vars, 9111 clauses).
    Solving took 0s.
    21:37:59: Solution 25 saved to /home/pldi18/memalloy/results/180221-213757/xml/test_25.xml.
    21:37:59: No more solutions found.
    Alloy found 26 solutions in 2.19 sec
    Remove duplicates
    [==============================================] 100%
    Partitioned to 4 unique solutions in 0.59 sec
    Converting in /home/pldi18/memalloy/results/weak_vs_strong
    setup     0.04 sec
    alloy     2.19 sec
    rmdups    0.59 sec
    dump      0.32 sec
    

    Troubleshooting

    • If you get the error ERROR: could not create dir structure then ensure the docker container has write permissions to the repro/ and results/ directories: on your host shell, run chmod a+w repro results from within the docker/ directory.
  2. Back on your host shell:

    $ cd results/weak_vs_strong/png
    # and open the images in an image viewer
    

    The four images should match Figure 3 of the paper with minor cosmetic differences: (1) fr (from-reads) edges are not drawn and (2) po (program-order) edges are labelled sb (sequenced-before). You should expect the following:

    • test_27994c473be6881ee6101f16e9d3b39c.png corresponding to (a)
    • test_0db6f454ece96d9b6aee4ac17ba361a4.png corresponding to (b)
    • test_5816c2761ca4fc54653f9860483708f0.png corresponding to (c)
    • test_ca3865307ea520279d4b020ce57b11aa.png corresponding to (d)

    Troubleshooting

    • If the images are not in the results/ directory on your host machine then double-check that the directory was mounted when starting up the container (Step 3 of Getting started)

What’s inside the container

Inside memalloy

Memalloy consists of the top-level comparator.py script which, given a user query (e.g., weak atomicity vs. strong atomicity), drives the stages of the synthesis pipeline (setup, alloy, rmdups and dump, whose timings are printed at the end of each run, as in the Smoketest output above).

Inside herdtools7

We made minimal changes to the herdtools7 codebase to allow litmus7 to execute transactional litmus tests generated by Memalloy. To see these changes:

    $ cd herdtools7
    $ git diff

Note that other tools in the herdtools7 suite (e.g., herd7 and diy7) are not aware of transactions.

If your host machine has Intel TSX (check your processor on Intel Ark; a coarse check is to grep for the string “rtm” in /proc/cpuinfo) then you can try out litmus7 on a transactional variant of the Store Buffering litmus test, as follows:

    $ cd herdtools7
    $ litmus7 doc/SB+txs.litmus

The output will include a histogram of observed results similar to:

    Histogram (3 states)
    1735  :>0:EBX=0; 1:EBX=0; ok=0;
    499114:>0:EBX=1; 1:EBX=0; ok=1;
    499151:>0:EBX=0; 1:EBX=1; ok=1;

Note that the execution result 0:EBX=0 /\ 1:EBX=0 /\ ok=1 should not be observable due to the transactions around each thread.
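The coarse RTM check mentioned above can be scripted; has_rtm below is a hypothetical helper for illustration, not part of the artifact:

```shell
# Hypothetical helper: coarse check for Intel TSX (RTM) support.
# Reads a cpuinfo-style file (default /proc/cpuinfo) and succeeds if
# the "rtm" CPU flag is listed as a whole word.
has_rtm() {
  grep -q -w rtm "${1:-/proc/cpuinfo}"
}

if has_rtm; then
  echo "TSX (RTM) detected: litmus7 transactional tests should run"
else
  echo "TSX (RTM) not detected: skip the hardware-testing steps"
fi
```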

Replicating Table 1

As described in Section 5.3, Table 1 requires (1) the synthesis of litmus tests using Memalloy; and (2) testing on TM-enabled Intel and IBM hardware. If your host machine does not support TM then you will be able to follow (1), but not (2).

  1. Synthesis

    We recommend running a cut-down experiment that will generate X86 and PPC litmus tests for |E|=2..4. This takes approximately 20 minutes.

    > cd memalloy/repro
    > RESULTS_DIR=results-mini MAX_N=4 scripts/respin.sh
    

    An even smaller experiment can be run by setting MAX_N=3; it takes approximately 10 minutes.

    Raw results are output in the directory repro/results-mini/fences-as-nodes/<arch>/<N>ev/, where <arch> is X86 or PPC and <N> is the number of events. The file <arch>/<N>ev_experiment.log gives a timed log, and the Forbid and Allow sets are given in <N>ev/litmus/<arch>/ and <N>ev/allow/litmus/<arch>/, respectively.
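    As a quick sanity check, you can count the generated tests per set. The loop below assumes only the directory layout described above; run it inside memalloy/repro, adjusting results-mini if you chose a different RESULTS_DIR:

```shell
# Count generated litmus tests in the Forbid and Allow sets.
# Assumes the layout described above: <arch>/<N>ev/litmus/<arch>/ holds
# the Forbid set and <arch>/<N>ev/allow/litmus/<arch>/ the Allow set.
for d in results-mini/fences-as-nodes/*/*ev/litmus/* \
         results-mini/fences-as-nodes/*/*ev/allow/litmus/*; do
  [ -d "$d" ] || continue
  n=$(find "$d" -name '*.litmus' | wc -l | tr -d ' ')
  printf '%s: %s tests\n' "$d" "$n"
done
```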

    The full results for |E|=2..7 can be replicated using the default arguments of the respin.sh script. WARNING: this takes approximately 2 days to complete due to the 2h timeout per row of the table. Note this timeout applies only to the Alloy solve stage.

    > cd memalloy/repro
    > scripts/respin.sh
    

    Troubleshooting

    • If the directory repro/ does not exist then double-check that the directory was mounted when starting up the container (Step 3 of Getting started)
  2. Testing

    Skip this step if your host machine does not support TM

    If your host machine has Intel TSX (check your processor on Intel Ark; a coarse check is to grep for the string “rtm” in /proc/cpuinfo) then we recommend running a cut-down experiment that will run the generated X86 litmus tests. This takes approximately 5 minutes.

    > cd memalloy/repro
    > scripts/mini_test_x86.sh
    

    A series of log files, each named result_local.log, is output under repro/results-mini.

    Replicating the full set of results requires access to Intel and IBM hardware. In our experiments, we managed this through scripts to sync with remote machines. For each machine <mach> it is necessary to:

    • Build the litmus7 tool locally
    • Define a config file repro/machines/<mach>.cfg which defines:
      • SERVER: the ssh address of the machine
      • SERVERDIR: a temporary directory on the machine
      • ARCH: either X86 or PPC
      • LITMUS7: the path of the local litmus7 tool
      • FLAGS: additional litmus7 flags
      • MACHINE: the tag of the machine
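    For illustration, such a config file might look like the following; every value is a placeholder to adapt to your own setup:

```shell
# repro/machines/example.cfg -- illustrative only; all values are placeholders
SERVER=user@tm-machine.example.org   # ssh address of the machine
SERVERDIR=/tmp/pldi18-experiments    # temporary directory on the machine
ARCH=X86                             # either X86 or PPC
LITMUS7=$HOME/herdtools7/litmus7     # path of the local litmus7 tool
FLAGS=""                             # additional litmus7 flags
MACHINE=example                      # tag of the machine
```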

    Then the set of machines can be tested:

    > cd memalloy/repro
    > for m in machines/*.cfg; do
        ./scripts/remote/0_push_experiments.sh ${m};
        ./scripts/remote/1_exec_experiments.sh ${m};
        ./scripts/remote/2_pull_results.sh ${m};
      done
    

    For completeness, we give the scripts and config files (suitably anonymised) in repro/scripts/remote.

  3. Generating LaTeX

    If synthesis (step 1) and, optionally, testing (step 2) have successfully finished then the X86 half of Table 1 can be produced:

    > cd memalloy/repro
    > ./scripts/mk_table.py -arch X86 results-mini/fences-as-nodes/X86 > latex/table.tex
    > cd latex
    > pdflatex top
    # and open `top.pdf` in a pdf viewer on your host machine
    

    Comparing your table against Table 1:

    • Synthesis time will depend on your host machine but should follow a similar trend. This is the wall clock time reported at the end of each <N>ev_experiment.log file.
    • The number of tests produced (T) for both the Forbid and Allow sets should match those from Table 1.
    • Additionally, if testing (step 2) successfully finished then:
      • The number of seen tests (S) for the Forbid set should be zero; hence, the number of not seen tests (!S) should equal (T)
      • The number of seen (S) and not seen (!S) tests for the Allow set may differ depending on your host machine

    The PPC half of Table 1 can be produced similarly (only the synthesis time and number of tests produced (T) columns will contain valid data):

    > cd memalloy/repro
    > ./scripts/mk_table.py -arch PPC results-mini/fences-as-nodes/PPC > latex/table.tex
    > cd latex
    > pdflatex top
    # and open `top.pdf` in a pdf viewer on your host machine
    
  4. Querying using SQL

    As with our full result dataset, you can build an SQLite database of your results:

    > cd memalloy/repro
    > ./scripts/mk_csv.py results-mini > out.csv
    > sqlite3 < scripts/sql/create.sql
    

    The following query counts the number of Forbid tests that are seen; this should be zero.

    > sqlite3 < scripts/sql/query_forbid_seen.sql
    
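The general pattern, independent of the artifact's schema, is to import the CSV and query it. The table name and CSV columns below are illustrative only; the artifact's real schema lives in scripts/sql/create.sql, and its real CSV is the out.csv produced by mk_csv.py above:

```shell
# Sketch: import a CSV into an SQLite database and run a count query.
# "results" and the demo.csv columns are illustrative, not the real schema.
printf 'arch,set,seen\nX86,Forbid,0\nX86,Allow,1\n' > demo.csv
sqlite3 demo.db <<'EOF'
.mode csv
.import demo.csv results
SELECT COUNT(*) FROM results;
EOF
```

When the target table does not already exist, .import uses the first CSV row as column names and loads the remaining rows as data.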

Cleaning up

Remove our docker image from your machine:

    $ docker stop $(docker ps -a | awk '/pldi18/{print $1}')
    $ docker rm $(docker ps -a | awk '/pldi18/{print $1}')
    $ docker rmi pldi18