The Alloy models in this repository are divided into four subdirectories:

hw: architecture-level memory consistency models
sw: language-level memory consistency models
mappings: language-to-architecture compiler mappings
tests: questions about memory consistency models and the relationships between them

The diagram below depicts how the models build on one another (an arrow from model A to model B indicates that A imports model B): Model dependencies

1. Getting Started

Installation

Most of our models rely only on the basic Alloy tool, but some depend on the higher-order quantification that is only supported in the AlloyStar tool. An unofficial copy of AlloyStar (incorporating a couple of minor tweaks) can be downloaded from our GitHub repository.

Running Alloy

Set the solver to Glucose or Plingeling (via Options → Solver). Set the maximum memory usage and stack size as high as possible, e.g. 4GB of memory and 64MB of stack (via Options → Maximum memory and Options → Maximum stack). Set the maximum number of CEGIS iterations to 0, which indicates ‘no maximum’ (via Options → Max CEGIS iterations).
Open an Alloy model file, e.g. tests/Q2_c11_simp_orig.als (via File → Open...).
Run a command by picking it from the Execute menu (e.g. Execute → Run gp for exactly 1 Exec, 5 E expect 1).
Alloy will respond with either “No instance found” or “Instance found”. In the latter case, click on “Instance” to load the Alloy Visualizer.
When opening an instance in the Alloy Visualizer, change the Projection setting from none to Exec. This greatly improves readability. If the instance involves two executions, click on the << and >> buttons at the bottom to switch between them. If the instance involves separate hardware and software executions, project over both types of execution at the same time for maximum readability.
In the Alloy Visualizer, the Theme button allows nodes and edges to be hidden or restyled according to their type.

Going Further

For learning more about the Alloy language, the definitive reference is Daniel Jackson’s book. There is also a good online tutorial.

2. Reproducing Results

In the table below, the Task column refers to the row in Table 2 on page 11 of the paper. The File column refers to a file in this repository, and the Command column gives the command in that file that should be executed (by selecting it from the Execute menu). The Solver column identifies the SAT solver that was found to provide the fastest result. The Time column gives the number of seconds to encode (first number) and to solve (second number) the task. These numbers were obtained on a 64-bit Linux machine with four 16-core 2.1 GHz AMD Opteron processors and 128 GB of RAM; results obtained on different machines may vary considerably given the highly unpredictable nature of SAT solving. Finally, the Instance? column reports whether an instance is found or not.

Task	File	Command	Solver	Time /s	Instance?
1	`tests/Q2_c11_sra_simp.als`	2nd	Glucose	0.7+0.6	yes
2	`tests/Q2_c11_swrf_simp.als`	3rd	Glucose	0.8+625	no
3	`tests/Q2_c11_swrf_simp.als`	1st	Plingeling	2+214	yes
4	`tests/Q2_c11_simp_orig.als`	2nd	Glucose	0.4+0.3	yes
5	`tests/Q2_x86_mca.als`	2nd	Plingeling	0.8+607	no
6	`tests/Q2_ppc_mca.als`	2nd	Glucose	1.5+0.06	yes
7	`tests/Q2_sc_c11nodrf.als`	1st	Glucose	0.4+0.04	yes
8	`tests/Q2_ptx.als`	2nd	Glucose	0.7+4	yes
9	`tests/Q3_c11_seq.als`	1st	MiniSat	0.5+163	no
10	`tests/Q3_c11_seq.als`	2nd	Plingeling	0.7+5	yes
11	`tests/Q3_c11_mo.als`	2nd	Glucose	0.9+51	yes
12	`tests/Q4_c11_x86a.als`	1st	Plingeling	0.7+13029	no
13	`tests/Q4_c11_ppc_trimmed.als`	1st	Plingeling	8+91	yes
14	`tests/Q4_opencl_amd.als`	2nd	Glucose	6+1355	yes
15	`tests/Q4_opencl_amd.als`	1st	Plingeling	16+4743	yes
16	`tests/Q4_opencl_ptx_orig.als`	2nd	Plingeling	2+11	yes
17	`tests/Q4_opencl_ptx_cumul.als`	1st	Plingeling	4+9719	no

3. Testing our stronger PTX MCM

This section shows tests that differentiate the PTX1 model against the PTX2 model. That is, tests that can pass under PTX1, but not under PTX2. The test names refer to Sarkar et al.'s periodic table of litmus tests.

In order to show that PTX2 remains empirically sound, we must show that these tests are not observable on Nvidia chips. Many of the tests are available from Alglave et al.'s publicly-available experimental data. For these tests, we link to the existing results. Some tests are not included in Alglave et al.'s set. These tests are all single-address tests. Alglave et al. only consider the 4 traditional single-address tests (CoRR, CoRW, CoWR, CoWW). However, due to a combination of the PTX model not providing full coherence, and GPU scopes, we obtain larger single-address tests that do not reduce to any of those tests. For tests that are unavailable in Alglave et al.'s set, we run new tests on a GTX Titan chip.

7-event executions

Existing test results There are a total of 12 tests with 7 events that are allowed under PTX1 but disallowed under PTX2 and for which Alglave et al. provide testing results. In no case is the weak behaviour observed.

Test shape	GPU config	GTX 540	GTX 660	GTX 750 Ti	GTX Titan	Tesla C2075
RWC + membar.cta + membar.gl	P₀ \|_cta (P₁ \|_warp P₂)	0/50k	0/100k	0/50k	0/50k	0/50k
RWC + membar.gl + membar.cta	P₀ \|_cta (P₁ \|_warp P₂)	0/50k	0/100k	0/50k	0/50k	0/50k
WRR+2W + membar.cta + membar.gl	P₀ \|_cta (P₁ \|_warp P₂)	0/50k	0/100k	0/50k	0/50k	0/50k
WRR+2W + membar.gl + membar.cta	P₀ \|_cta (P₁ \|_warp P₂)	0/50k	0/100k	0/50k	0/50k	0/50k
WRC + membar.cta + membar.gl	P₀ \|_cta (P₁ \|_warp P₂)	0/50k	0/100k	0/50k	0/50k	0/50k
WRC + membar.gl + membar.cta	P₀ \|_cta (P₁ \|_warp P₂)	0/50k	0/100k	0/50k	0/50k	0/50k
WWC + membar.cta + membar.gl	P₀ \|_cta (P₁ \|_warp P₂)	0/50k	0/100k	0/50k	0/50k	0/50k
WWC + membar.gl + membar.cta	P₀ \|_cta (P₁ \|_warp P₂)	0/50k	0/100k	0/50k	0/50k	0/50k
WRW+WR + membar.cta + membar.gl	P₀ \|_cta (P₁ \|_warp P₂)	0/50k	0/100k	0/50k	0/50k	0/50k
WRW+WR + membar.gl + membar.cta	P₀ \|_cta (P₁ \|_warp P₂)	0/50k	0/100k	0/50k	0/50k	0/50k
WRW+2W + membar.cta + membar.gl	P₀ \|_cta (P₁ \|_warp P₂)	0/50k	0/100k	0/50k	0/50k	0/50k
WRW+2W + membar.gl + membar.cta	P₀ \|_cta (P₁ \|_warp P₂)	0/50k	0/100k	0/50k	0/50k	0/50k

New test results There are a total of 2 tests with 7 events that are allowed under PTX1 but disallowed under PTX2, and for which Alglave et al. do not provide testing results. We ran these two tests on a GTX Titan using Alglave et al.'s GPU-litmus tool. We do not observe the weak behaviour.

Test shape	GPU config	GTX Titan
RWC + membar.cta + membar.gl (single-address)	P₀ \|_cta (P₁ \|_warp P₂)	0/100k
WRR+2W + membar.cta + membar.gl (single-address)	P₀ \|_cta (P₁ \|_warp P₂)	0/100k

8-event executions

Existing test results Alloy found 10 tests with 8 events that are allowed under PTX1 but disallowed under PTX2 and for which Alglave et al. provide testing results. In no case do we observe the weak behaviour.

Test shape	GPU config	GTX 540	GTX 660	GTX 750 Ti	GTX Titan	Tesla C2075
IRIW + membar.cta + membar.gl	P₀ \|_cta (P₁ \|_warp P₂ \|_warp P₃)	0/50k	0/100k	0/50k	0/50k	0/50k
IRIW + membar.gl + membar.cta	P₀ \|_cta (P₁ \|_warp P₂ \|_warp P₃)	0/50k	0/100k	0/50k	0/50k	0/50k
IRRWIW + membar.cta + membar.gl	P₀ \|_cta (P₁ \|_warp P₂ \|_warp P₃)	0/50k	0/100k	0/50k	0/50k	0/50k
IRRWIW + membar.gl + membar.cta	P₀ \|_cta (P₁ \|_warp P₂ \|_warp P₃)	0/50k	0/100k	0/50k	0/50k	0/50k
IRRWIW + membar.gl + membar.cta	P₂ \|_cta (P₀ \|_warp P₁ \|_warp P₃)	0/50k	0/100k	0/50k	0/50k	0/50k
IRRWIW + membar.cta + membar.gl	P₂ \|_cta (P₀ \|_warp P₁ \|_warp P₃)	0/50k	0/100k	0/50k	0/50k	0/50k
IRWIW + membar.cta + membar.gl	P₀ \|_cta (P₁ \|_warp P₂ \|_warp P₃)	0/50k	0/100k	0/50k	0/50k	0/50k
IRWIW + membar.gl + membar.cta	P₀ \|_cta (P₁ \|_warp P₂ \|_warp P₃)	0/50k	0/100k	0/50k	0/50k	0/50k

New test results There are a total of 2 tests with 8 events that are allowed under PTX1 but disallowed under PTX2, and for which Alglave et al. do not provide testing results. We ran these two tests on a GTX Titan using Alglave et al.'s GPU-litmus tool. We do not observe the weak behaviour.

Test shape	GPU config	GTX Titan
IRIW + membar.cta + membar.gl (single-address)	P₀ \|_cta (P₁ \|_warp P₂ \|_warp P₃)	0/100k
IRRWIW + membar.cta + membar.gl (single-address)	P₀ \|_cta (P₁ \|_warp P₂ \|_warp P₃)	0/100k

Automatically Comparing Memory Consistency Models