Automatically Comparing Memory Consistency Models

This repository contains materials for recreating and building upon the results published in the following paper:

The main results are summarised in Table 2 on page 11, and this page (Section 2) provides instructions for reproducing them. We also provide the data underlying the experimental testing of our new PTX MCM (Section 3).

This repository is maintained on GitHub.

The Alloy models in this repository are divided into four subdirectories:

The diagram below depicts how the models build on one another (an arrow from model A to model B indicates that A imports model B): Model dependencies

1. Getting Started

Installation

Most of our models rely only on the basic Alloy tool, but some depend on the higher-order quantification that is only supported in the AlloyStar tool. An unofficial copy of AlloyStar (incorporating a couple of minor tweaks) can be downloaded from our GitHub repository.

Running Alloy

Going Further

2. Reproducing Results

In the table below, the Task column refers to the row in Table 2 on page 11 of the paper. The File column refers to a file in this repository, and the Command column gives the command in that file that should be executed (by selecting it from the Execute menu). The Solver column identifies the SAT solver that was found to provide the fastest result. The Time column gives the number of seconds to encode (first number) and to solve (second number) the task. These numbers were obtained on a 64-bit Linux machine with four 16-core 2.1 GHz AMD Opteron processors and 128 GB of RAM; results obtained on different machines may vary considerably given the highly unpredictable nature of SAT solving. Finally, the Instance? column reports whether an instance is found or not.

Task File Command Solver Time /s Instance?
1 tests/Q2_c11_sra_simp.als 2nd Glucose 0.7+0.6 yes
2 tests/Q2_c11_swrf_simp.als 3rd Glucose 0.8+625 no
3 tests/Q2_c11_swrf_simp.als 1st Plingeling 2+214 yes
4 tests/Q2_c11_simp_orig.als 2nd Glucose 0.4+0.3 yes
5 tests/Q2_x86_mca.als 2nd Plingeling 0.8+607 no
6 tests/Q2_ppc_mca.als 2nd Glucose 1.5+0.06 yes
7 tests/Q2_sc_c11nodrf.als 1st Glucose 0.4+0.04 yes
8 tests/Q2_ptx.als 2nd Glucose 0.7+4 yes
9 tests/Q3_c11_seq.als 1st MiniSat 0.5+163 no
10 tests/Q3_c11_seq.als 2nd Plingeling 0.7+5 yes
11 tests/Q3_c11_mo.als 2nd Glucose 0.9+51 yes
12 tests/Q4_c11_x86a.als 1st Plingeling 0.7+13029 no
13 tests/Q4_c11_ppc_trimmed.als 1st Plingeling 8+91 yes
14 tests/Q4_opencl_amd.als 2nd Glucose 6+1355 yes
15 tests/Q4_opencl_amd.als 1st Plingeling 16+4743 yes
16 tests/Q4_opencl_ptx_orig.als 2nd Plingeling 2+11 yes
17 tests/Q4_opencl_ptx_cumul.als 1st Plingeling 4+9719 no

3. Testing our stronger PTX MCM

This section shows tests that differentiate the PTX1 model against the PTX2 model. That is, tests that can pass under PTX1, but not under PTX2. The test names refer to Sarkar et al.'s periodic table of litmus tests.

In order to show that PTX2 remains empirically sound, we must show that these tests are not observable on Nvidia chips. Many of the tests are available from Alglave et al.'s publicly-available experimental data. For these tests, we link to the existing results. Some tests are not included in Alglave et al.'s set. These tests are all single-address tests. Alglave et al. only consider the 4 traditional single-address tests (CoRR, CoRW, CoWR, CoWW). However, due to a combination of the PTX model not providing full coherence, and GPU scopes, we obtain larger single-address tests that do not reduce to any of those tests. For tests that are unavailable in Alglave et al.'s set, we run new tests on a GTX Titan chip.

7-event executions

Existing test results   There are a total of 12 tests with 7 events that are allowed under PTX1 but disallowed under PTX2 and for which Alglave et al. provide testing results. In no case is the weak behaviour observed.

Test shape GPU config GTX 540 GTX 660 GTX 750 Ti GTX Titan Tesla C2075
RWC + membar.cta + membar.gl P0 |cta (P1 |warp P2) 0/50k 0/100k 0/50k 0/50k 0/50k
RWC + membar.gl + membar.cta P0 |cta (P1 |warp P2) 0/50k 0/100k 0/50k 0/50k 0/50k
WRR+2W + membar.cta + membar.gl P0 |cta (P1 |warp P2) 0/50k 0/100k 0/50k 0/50k 0/50k
WRR+2W + membar.gl + membar.cta P0 |cta (P1 |warp P2) 0/50k 0/100k 0/50k 0/50k 0/50k
WRC + membar.cta + membar.gl P0 |cta (P1 |warp P2) 0/50k 0/100k 0/50k 0/50k 0/50k
WRC + membar.gl + membar.cta P0 |cta (P1 |warp P2) 0/50k 0/100k 0/50k 0/50k 0/50k
WWC + membar.cta + membar.gl P0 |cta (P1 |warp P2) 0/50k 0/100k 0/50k 0/50k 0/50k
WWC + membar.gl + membar.cta P0 |cta (P1 |warp P2) 0/50k 0/100k 0/50k 0/50k 0/50k
WRW+WR + membar.cta + membar.gl P0 |cta (P1 |warp P2) 0/50k 0/100k 0/50k 0/50k 0/50k
WRW+WR + membar.gl + membar.cta P0 |cta (P1 |warp P2) 0/50k 0/100k 0/50k 0/50k 0/50k
WRW+2W + membar.cta + membar.gl P0 |cta (P1 |warp P2) 0/50k 0/100k 0/50k 0/50k 0/50k
WRW+2W + membar.gl + membar.cta P0 |cta (P1 |warp P2) 0/50k 0/100k 0/50k 0/50k 0/50k

New test results    There are a total of 2 tests with 7 events that are allowed under PTX1 but disallowed under PTX2, and for which Alglave et al. do not provide testing results. We ran these two tests on a GTX Titan using Alglave et al.'s GPU-litmus tool. We do not observe the weak behaviour.

Test shape GPU config GTX Titan
RWC + membar.cta + membar.gl (single-address) P0 |cta (P1 |warp P2) 0/100k
WRR+2W + membar.cta + membar.gl (single-address) P0 |cta (P1 |warp P2) 0/100k

8-event executions

Existing test results   Alloy found 10 tests with 8 events that are allowed under PTX1 but disallowed under PTX2 and for which Alglave et al. provide testing results. In no case do we observe the weak behaviour.

Test shape GPU config GTX 540 GTX 660 GTX 750 Ti GTX Titan Tesla C2075
IRIW + membar.cta + membar.gl P0 |cta (P1 |warp P2 |warp P3) 0/50k 0/100k 0/50k 0/50k 0/50k
IRIW + membar.gl + membar.cta P0 |cta (P1 |warp P2 |warp P3) 0/50k 0/100k 0/50k 0/50k 0/50k
IRRWIW + membar.cta + membar.gl P0 |cta (P1 |warp P2 |warp P3) 0/50k 0/100k 0/50k 0/50k 0/50k
IRRWIW + membar.gl + membar.cta P0 |cta (P1 |warp P2 |warp P3) 0/50k 0/100k 0/50k 0/50k 0/50k
IRRWIW + membar.gl + membar.cta P2 |cta (P0 |warp P1 |warp P3) 0/50k 0/100k 0/50k 0/50k 0/50k
IRRWIW + membar.cta + membar.gl P2 |cta (P0 |warp P1 |warp P3) 0/50k 0/100k 0/50k 0/50k 0/50k
IRWIW + membar.cta + membar.gl P0 |cta (P1 |warp P2 |warp P3) 0/50k 0/100k 0/50k 0/50k 0/50k
IRWIW + membar.gl + membar.cta P0 |cta (P1 |warp P2 |warp P3) 0/50k 0/100k 0/50k 0/50k 0/50k

New test results   There are a total of 2 tests with 8 events that are allowed under PTX1 but disallowed under PTX2, and for which Alglave et al. do not provide testing results. We ran these two tests on a GTX Titan using Alglave et al.'s GPU-litmus tool. We do not observe the weak behaviour.

Test shape GPU config GTX Titan
IRIW + membar.cta + membar.gl (single-address) P0 |cta (P1 |warp P2 |warp P3) 0/100k
IRRWIW + membar.cta + membar.gl (single-address) P0 |cta (P1 |warp P2 |warp P3) 0/100k