## Introduction to Task 9
In Task 9, we parallelize the core DAXPY operation (element-wise `d = a * x + y`) using two complementary CPU-side models: OpenMP for shared-memory threading and MPI for distributed-memory message passing. We then measure each parallel version's execution time and compare it against the baseline serial implementation.
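For reference, the serial baseline is a single loop over the vectors. A minimal sketch, assuming a `vector_sum(a, x, y, d)`-style signature (the actual declaration lives in `include/vector_sum.hpp` and may differ):

```cpp
#include <cstddef>
#include <vector>

// Serial DAXPY baseline: d[i] = a * x[i] + y[i].
// Sketch only -- the signature of the project's vector_sum may differ.
void vector_sum(double a, const std::vector<double>& x,
                const std::vector<double>& y, std::vector<double>& d) {
    d.resize(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        d[i] = a * x[i] + y[i];
    }
}
```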
## OpenMP vs MPI
| Aspect | OpenMP | MPI |
|---|---|---|
| Memory model | Shared: threads access the same address space. | Distributed: each rank has its own memory space. |
| Parallelism scope | Loop-level/task-level within a single node. | Process-level across nodes (or processes). |
| Overhead | Thread spawn/synchronization cost. | Process launch and communication (e.g., broadcast) cost. |
| Use case | Multi-core scaling on one machine. | Multi-node scaling, or when explicit data partitioning is desired. |
## Implementation highlights
- **OpenMP**: In `vector_sum_omp.hpp`, the core loop is decorated with:

  ```cpp
  #pragma omp parallel for
  for (std::size_t i = 0; i < n; ++i) {
      d[i] = a * x[i] + y[i];
  }
  ```

  This single directive forks a team of threads that split the index range among themselves, and the threads synchronize automatically at the implicit barrier at the end of the loop. In `test_vector_sum_omp.cpp`, we time the serial and OpenMP runs back-to-back and confirm `d_ser == d_omp` before reporting the timings (see the timing sketch after this list).
- **MPI**: The MPI implementation lives in `vector_sum_mpi.hpp`. The test driver `test_vector_sum_mpi.cpp` initializes MPI, then:
  - The root process (rank 0) allocates and fills `x` and `y`.
  - Both vectors are broadcast to all ranks.
  - Each rank calls `vector_sum_mpi(a, x, y, d_mpi)` on its local copy.
  - The root measures serial vs. MPI time, asserts equality, and prints the results (see the MPI driver sketch below).
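As a concrete illustration of the back-to-back timing and comparison described for `test_vector_sum_omp.cpp`, here is a minimal sketch. The vector sizes, values, and the `vector_sum`/`vector_sum_omp` call signatures are assumptions for illustration, not the repository's actual test code.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

#include "vector_sum.hpp"      // serial baseline (assumed signature)
#include "vector_sum_omp.hpp"  // OpenMP version (assumed signature)

int main() {
    const std::size_t n = 1 << 24;
    const double a = 2.5;
    std::vector<double> x(n, 1.0), y(n, 2.0), d_ser(n), d_omp(n);

    auto t0 = std::chrono::steady_clock::now();
    vector_sum(a, x, y, d_ser);      // serial run
    auto t1 = std::chrono::steady_clock::now();
    vector_sum_omp(a, x, y, d_omp);  // OpenMP run
    auto t2 = std::chrono::steady_clock::now();

    assert(d_ser == d_omp);          // results must match before timings are reported

    std::cout << "serial: " << std::chrono::duration<double>(t1 - t0).count()
              << " s, OpenMP: " << std::chrono::duration<double>(t2 - t1).count()
              << " s\n";
}
```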
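Likewise, a rough sketch of the MPI driver flow described above (broadcast from rank 0, local computation on every rank, root-side check). The `MPI_Bcast` calls are standard MPI; the surrounding variable names, sizes, and the exact `vector_sum_mpi`/`vector_sum` signatures are assumptions.

```cpp
#include <mpi.h>

#include <cassert>
#include <cstddef>
#include <vector>

#include "vector_sum.hpp"      // serial baseline (assumed signature)
#include "vector_sum_mpi.hpp"  // MPI version (assumed signature)

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const std::size_t n = 1 << 20;
    const double a = 2.5;
    std::vector<double> x(n), y(n), d_mpi(n);

    if (rank == 0) {  // root fills the input vectors
        for (std::size_t i = 0; i < n; ++i) { x[i] = i; y[i] = 2.0 * i; }
    }

    // Broadcast both vectors from rank 0 to all ranks.
    MPI_Bcast(x.data(), static_cast<int>(n), MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(y.data(), static_cast<int>(n), MPI_DOUBLE, 0, MPI_COMM_WORLD);

    // Every rank computes on its local copy.
    vector_sum_mpi(a, x, y, d_mpi);

    if (rank == 0) {  // root compares against the serial result
        std::vector<double> d_ser(n);
        vector_sum(a, x, y, d_ser);
        assert(d_ser == d_mpi);
    }

    MPI_Finalize();
}
```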
## What you will find in this repository
- **Header Files** (`include/`):
  Provides declarations for the `vector_sum_omp` and `vector_sum_mpi` functions.
- **Test Files** (`test/`):
  Contains unit tests for the `vector_sum_omp` and `vector_sum_mpi` functions.
- **Helper Scripts** (`scripts/`):
  - `buildProject.sh`: A script to build the project from scratch.
  - `destroyProject.sh`: A script to completely clean the project, removing build artifacts and installed commands.
- **Docker Environment** (`docker/`):
  A Dockerfile (e.g., `Dockerfile.alma9`) is included to provide a ready-to-use development environment with all required dependencies.
- **Run Script Template** (`commands/run.in`):
  This template is used to generate a wrapper script that is installed to `/usr/local/bin` for easy invocation of the project executables.
- **`CMakeLists.txt`**:
  The CMake build configuration file for the project.
## Project Structure
The project directory structure is as follows:
```
project/                            # Project root directory
│
├── commands/                       # Contains the run script template
│   └── run.in                      # Script to run executables with the correct environment
├── docker/                         # Docker build context
│   └── Dockerfile.alma9            # Dockerfile for building the project
├── include/                        # Header files
│   ├── vector_sum_chunked.hpp      # Declaration of the vector_sum_chunked function
│   ├── vector_sum_mpi.hpp          # Declaration of the vector_sum_mpi function
│   ├── vector_sum_omp.hpp          # Declaration of the vector_sum_omp function
│   └── vector_sum.hpp              # Declaration of the vector_sum function
├── scripts/                        # Helper scripts
│   ├── buildProject.sh             # Script to build the project from scratch
│   └── destroyProject.sh           # Script to completely clean the project
├── test/                           # Unit tests
│   ├── test_vector_sum_mpi.cpp     # Tests for the vector_sum_mpi function
│   ├── test_vector_sum_omp.cpp     # Tests for the vector_sum_omp function
│   └── test_vector_sum.cpp         # Tests for the vector_sum_chunked function
├── CMakeLists.txt                  # CMake build configuration file
└── README.md                       # Project documentation
```