This repository contains materials for recreating and building upon the results published in the following paper:
The main results are summarised in Table 2 on page 11, and this page (Section 2) provides instructions for reproducing them. We also provide the data underlying the experimental testing of our new PTX MCM (Section 3).
This repository is maintained on GitHub.
The Alloy models in this repository are divided into four subdirectories:
hw
: architecture-level memory consistency models
sw
: language-level memory consistency models
mappings
: language-to-architecture compiler mappings
tests
: questions about memory consistency models and the
relationships between them
The diagram below depicts how the models build on one another (an arrow from model A to model B indicates that A imports model B):
Most of our models rely only on the basic Alloy tool, but some depend on the higher-order quantification that is only supported in the AlloyStar tool. An unofficial copy of AlloyStar (incorporating a couple of minor tweaks) can be downloaded from our GitHub repository.
Set the solver to Glucose or Plingeling (via Options → Solver
). Set the maximum memory usage and stack size as high as possible,
e.g. 4GB of memory and 64MB of stack (via Options → Maximum memory
and Options → Maximum stack
). Set the maximum number of
CEGIS iterations to 0, which indicates ‘no maximum’ (via Options → Max CEGIS iterations
).
Open an Alloy model file, e.g. tests/Q2_c11_simp_orig.als
(via
File → Open...
).
Run a command by picking it from the Execute
menu (e.g. Execute → Run gp for exactly 1 Exec, 5 E expect 1
).
Alloy will respond with either “No instance found” or “Instance found”. In the latter case, click on “Instance” to load the Alloy Visualizer.
When opening an instance in the Alloy Visualizer, change the
Projection
setting from none
to Exec
. This greatly improves
readability. If the instance involves two executions, click on the
<<
and >>
buttons at the bottom to switch between them. If the
instance involves separate hardware and software executions, project
over both types of execution at the same time for maximum
readability.
In the Alloy Visualizer, the Theme
button allows nodes and edges
to be hidden or restyled according to their type.
In the table below, the Task column refers to the row in Table 2
on
page 11 of the paper.
The File column refers to a file in this repository, and
the Command column gives the command in that file that should be
executed (by selecting it from the Execute
menu). The Solver
column identifies the SAT solver that was found to provide the fastest
result. The Time column gives the number of seconds to encode
(first number) and to solve (second number) the task. These numbers
were obtained on a 64-bit Linux machine with four 16-core 2.1 GHz AMD
Opteron processors and 128 GB of RAM; results obtained on different
machines may vary considerably given the highly unpredictable nature
of SAT solving. Finally, the Instance? column reports whether an
instance is found or not.
Task | File | Command | Solver | Time /s | Instance? |
---|---|---|---|---|---|
1 | tests/Q2_c11_sra_simp.als |
2nd | Glucose | 0.7+0.6 | yes |
2 | tests/Q2_c11_swrf_simp.als |
3rd | Glucose | 0.8+625 | no |
3 | tests/Q2_c11_swrf_simp.als |
1st | Plingeling | 2+214 | yes |
4 | tests/Q2_c11_simp_orig.als |
2nd | Glucose | 0.4+0.3 | yes |
5 | tests/Q2_x86_mca.als |
2nd | Plingeling | 0.8+607 | no |
6 | tests/Q2_ppc_mca.als |
2nd | Glucose | 1.5+0.06 | yes |
7 | tests/Q2_sc_c11nodrf.als |
1st | Glucose | 0.4+0.04 | yes |
8 | tests/Q2_ptx.als |
2nd | Glucose | 0.7+4 | yes |
9 | tests/Q3_c11_seq.als |
1st | MiniSat | 0.5+163 | no |
10 | tests/Q3_c11_seq.als |
2nd | Plingeling | 0.7+5 | yes |
11 | tests/Q3_c11_mo.als |
2nd | Glucose | 0.9+51 | yes |
12 | tests/Q4_c11_x86a.als |
1st | Plingeling | 0.7+13029 | no |
13 | tests/Q4_c11_ppc_trimmed.als |
1st | Plingeling | 8+91 | yes |
14 | tests/Q4_opencl_amd.als |
2nd | Glucose | 6+1355 | yes |
15 | tests/Q4_opencl_amd.als |
1st | Plingeling | 16+4743 | yes |
16 | tests/Q4_opencl_ptx_orig.als |
2nd | Plingeling | 2+11 | yes |
17 | tests/Q4_opencl_ptx_cumul.als |
1st | Plingeling | 4+9719 | no |
Existing test results There are a total of 12 tests with 7 events that are allowed under PTX1 but disallowed under PTX2 and for which Alglave et al. provide testing results. In no case is the weak behaviour observed.
New test results There are a total of 2 tests with 7 events that are allowed under PTX1 but disallowed under PTX2, and for which Alglave et al. do not provide testing results. We ran these two tests on a GTX Titan using Alglave et al.'s GPU-litmus tool. We do not observe the weak behaviour.
Test shape | GPU config | GTX Titan |
---|---|---|
RWC + membar.cta + membar.gl (single-address) | P0 |cta (P1 |warp P2) | 0/100k |
WRR+2W + membar.cta + membar.gl (single-address) | P0 |cta (P1 |warp P2) | 0/100k |
Existing test results Alloy found 10 tests with 8 events that are allowed under PTX1 but disallowed under PTX2 and for which Alglave et al. provide testing results. In no case do we observe the weak behaviour.
Test shape | GPU config | GTX 540 | GTX 660 | GTX 750 Ti | GTX Titan | Tesla C2075 |
---|---|---|---|---|---|---|
IRIW + membar.cta + membar.gl | P0 |cta (P1 |warp P2 |warp P3) | 0/50k | 0/100k | 0/50k | 0/50k | 0/50k |
IRIW + membar.gl + membar.cta | P0 |cta (P1 |warp P2 |warp P3) | 0/50k | 0/100k | 0/50k | 0/50k | 0/50k |
IRRWIW + membar.cta + membar.gl | P0 |cta (P1 |warp P2 |warp P3) | 0/50k | 0/100k | 0/50k | 0/50k | 0/50k |
IRRWIW + membar.gl + membar.cta | P0 |cta (P1 |warp P2 |warp P3) | 0/50k | 0/100k | 0/50k | 0/50k | 0/50k |
IRRWIW + membar.gl + membar.cta | P2 |cta (P0 |warp P1 |warp P3) | 0/50k | 0/100k | 0/50k | 0/50k | 0/50k |
IRRWIW + membar.cta + membar.gl | P2 |cta (P0 |warp P1 |warp P3) | 0/50k | 0/100k | 0/50k | 0/50k | 0/50k |
IRWIW + membar.cta + membar.gl | P0 |cta (P1 |warp P2 |warp P3) | 0/50k | 0/100k | 0/50k | 0/50k | 0/50k |
IRWIW + membar.gl + membar.cta | P0 |cta (P1 |warp P2 |warp P3) | 0/50k | 0/100k | 0/50k | 0/50k | 0/50k |
New test results There are a total of 2 tests with 8 events that are allowed under PTX1 but disallowed under PTX2, and for which Alglave et al. do not provide testing results. We ran these two tests on a GTX Titan using Alglave et al.'s GPU-litmus tool. We do not observe the weak behaviour.
Test shape | GPU config | GTX Titan |
---|---|---|
IRIW + membar.cta + membar.gl (single-address) | P0 |cta (P1 |warp P2 |warp P3) | 0/100k |
IRRWIW + membar.cta + membar.gl (single-address) | P0 |cta (P1 |warp P2 |warp P3) | 0/100k |