MulticoreBSP for C  Version 2.0.4
Direct Remote Memory Access

Direct Remote Memory Access (DRMA) provides BSP processes with the means to access memory regions of other BSP processes.

Functions

static void bsp_push_reg (void *const address, const bsp_size_t size)
 Registers a memory area for communication.
 
static void bsp_pop_reg (void *const address)
 De-registers a pushed registration.
 
static void bsp_put (const bsp_pid_t pid, const void *const source, const void *const destination, const bsp_size_t offset, const bsp_size_t size)
 Put data in a remote memory location (buffered; executed at the next synchronisation).
 
static void bsp_get (const bsp_pid_t pid, const void *const source, const bsp_size_t offset, void *const destination, const bsp_size_t size)
 Get data from a remote memory location (buffered; executed at the next synchronisation).
 
static void bsp_direct_get (const bsp_pid_t pid, const void *const source, const bsp_size_t offset, void *const destination, const bsp_size_t size)
 Get data from a remote memory location (blocking; executed immediately).
 
static void bsp_hpput (const bsp_pid_t pid, const void *const source, const void *const destination, const bsp_size_t offset, const bsp_size_t size)
 Put data in a remote memory location (high-performance variant; unbuffered).
 
static void bsp_hpget (const bsp_pid_t pid, const void *const source, const bsp_size_t offset, void *const destination, const bsp_size_t size)
 Get data from a remote memory location (high-performance variant; unbuffered).
 

Detailed Description

Direct Remote Memory Access (DRMA) provides BSP processes with the means to access memory regions of other BSP processes.

Remote memory regions must be registered before DRMA primitives are allowed to operate on them:

bsp_push_reg
bsp_pop_reg

To copy local data directly to remote memory areas:

bsp_put
bsp_hpput

To copy data from remote memory areas to local memory:

bsp_get
bsp_hpget
bsp_direct_get

bsp_put and bsp_get are buffered and are considered safe primitives, while bsp_hpget and bsp_hpput may overlap communication with computation and should be used with care.

Function Documentation

static void bsp_direct_get ( const bsp_pid_t  pid,
const void *const  source,
const bsp_size_t  offset,
void *const  destination,
const bsp_size_t  size 
)
inline static

Get data from a remote memory location.

This is a blocking communication primitive: communication is executed immediately and is not queued until the next synchronisation step. The remote memory location must be registered using bsp_push_reg in a previous superstep.

The data retrieved will be the data at the remote memory location at "this" time. There is no guarantee that the remote thread is at the same position in executing the SPMD program; it might be anywhere in the current superstep. If the remote thread writes to the source memory block in this superstep, the retrieved data may be a mix of old and new data; this function neither buffers the data nor is atomic in any way.

Note: if MCBSP_COMPATIBILITY_MODE is defined, pid, offset and size are of type `int'. Otherwise, pid is of type `unsigned int' and offset & size of type `size_t'.

Parameters
pid: The ID number of the remote thread.
source: Pointer to the registered remote memory area to get data from.
offset: Offset (in bytes) into the remote memory area. Must be non-negative and less than the remotely registered memory size.
destination: Pointer to the local destination memory area.
size: Size (in bytes) of the data to be communicated; i.e., all the data from address (source+offset) up to address (source+offset+size) at the remote thread is copied to destination up to (destination+size) at this thread.
Remarks
This function was first proposed in MulticoreBSP for Java by Yzelman & Bisseling (2012), then included in the MulticoreBSP for C library by Yzelman et al. (2014). It is not an original BSPlib primitive as defined by Hill et al. (1998), and should never be introduced or used for distributed-memory communication.

References mcbsp_direct_get.
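
The immediate-copy behaviour described above can be pictured with a small single-process model. This is only a sketch of the semantics, not the library's implementation: model_direct_get is a hypothetical name, there is no pid because the "remote" area is an ordinary local buffer, and the byte range assumes the offset applies to the remote source, as it does for bsp_get.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical single-process model of bsp_direct_get (not the library's code).
 * The bytes [source+offset, source+offset+size) are copied to
 * [destination, destination+size) before the call returns.
 * A plain memcpy is neither buffered nor atomic. */
static void model_direct_get(const void *source, size_t offset,
                             void *destination, size_t size) {
    memcpy(destination, (const char *)source + offset, size);
}
```

In this model, a write to the source area made after the call returns is not reflected in the destination, because the copy has already completed.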

static void bsp_get ( const bsp_pid_t  pid,
const void *const  source,
const bsp_size_t  offset,
void *const  destination,
const bsp_size_t  size 
)
inline static

Get data from a remote memory location.

This is a non-blocking communication request. Communication will be executed during the next synchronisation step. The remote memory location must be registered using bsp_push_reg in a previous superstep.

The data retrieved will be the data at the remote memory location at the time of synchronisation. It will not (and cannot) retrieve data at "this" point in the SPMD program at the remote thread. If other communication at the remote process would change the data at the region of interest, these changes are not included in the retrieved data; in this sense, the get is buffered.

Note: if MCBSP_COMPATIBILITY_MODE is defined, pid, offset and size are of type `int'. Otherwise, pid will be of type `unsigned int' and offset and size of type `size_t'.

Parameters
pid: The ID number of the remote thread.
source: Pointer to the registered remote memory area to get data from.
offset: Offset (in bytes) into the remote memory area. Must be non-negative and less than the remotely registered memory size.
destination: Pointer to the local destination memory area.
size: Size (in bytes) of the data to be communicated; i.e., all the data from address (source+offset) up to address (source+offset+size) at the remote thread is copied to destination up to (destination+size) at this thread.

References mcbsp_get.
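
The request-now, copy-at-synchronisation behaviour of bsp_get can be sketched with a single-process model. The names model_get, model_sync and MODEL_MAX_GETS are hypothetical and exist only for this illustration; the real library additionally handles multiple threads, registration lookup and ordering against puts, none of which is modelled here.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_GETS 8

/* One queued get request (hypothetical model type, not a library type). */
typedef struct {
    const char *source;  /* base address of the registered remote area */
    size_t offset;       /* byte offset into the remote area */
    char *destination;   /* local destination address */
    size_t size;         /* number of bytes to copy */
} model_get_t;

static model_get_t model_queue[MODEL_MAX_GETS];
static size_t model_queued = 0;

/* Models bsp_get: only records the request; no data moves yet.
 * Returns 0 on success, -1 if the model's fixed-size queue is full. */
static int model_get(const char *source, size_t offset,
                     char *destination, size_t size) {
    if (model_queued == MODEL_MAX_GETS) return -1;
    model_queue[model_queued++] =
        (model_get_t){ source, offset, destination, size };
    return 0;
}

/* Models the synchronisation step: every queued get now copies
 * [source+offset, source+offset+size) to [destination, destination+size). */
static void model_sync(void) {
    for (size_t i = 0; i < model_queued; ++i) {
        const model_get_t *r = &model_queue[i];
        memcpy(r->destination, r->source + r->offset, r->size);
    }
    model_queued = 0;
}
```

In this model, a local write to the source area made between model_get and model_sync is visible in the destination, since the data is read at synchronisation time rather than at request time.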

static void bsp_hpget ( const bsp_pid_t  pid,
const void *const  source,
const bsp_size_t  offset,
void *const  destination,
const bsp_size_t  size 
)
inline static

Get data from a remote memory location.

This is a non-blocking communication request. Unlike bsp_get, communication may be executed at any time between this call and the end of the next synchronisation step; it is guaranteed to have finished before the next superstep starts. Consequently, both the source and destination memory areas may be read from or written to at any time after issuing this request. This overlap of communication and computation is the fundamental difference from the standard bsp_get.

This overlap is not guaranteed to result in faster execution. Consider, on a per-application basis, whether using these high-performance primitives makes sense, and factor in the extra cost of structuring your algorithm so that it uses them correctly.

Note the difference between this high-performance get and bsp_direct_get is that the latter function is blocking (performs the communication immediately and waits for it to end).

Otherwise usage is similar to that of bsp_get; please refer to that function for further documentation.

Note: if MCBSP_COMPATIBILITY_MODE is defined, pid, offset and size are of type `int'. Otherwise, pid is of type `unsigned int' and offset & size of type `size_t'.

Parameters
pid: The ID number of the remote thread.
source: Pointer to the registered remote memory area to get data from.
offset: Offset (in bytes) into the remote memory area. Must be non-negative and less than the remotely registered memory size.
destination: Pointer to the local destination memory area.
size: Size (in bytes) of the data to be communicated; i.e., all the data from address (source+offset) up to address (source+offset+size) at the remote thread is copied to destination up to (destination+size) at this thread.

References mcbsp_hpget.

static void bsp_hpput ( const bsp_pid_t  pid,
const void *const  source,
const void *const  destination,
const bsp_size_t  offset,
const bsp_size_t  size 
)
inline static

Put data in a remote memory location.

This is a non-blocking communication request. Unlike bsp_put, communication may be executed at any time between this call and the end of the next synchronisation step; it is guaranteed to have finished before the next superstep starts. Consequently, both the source and destination memory areas may be read from or written to at any time after issuing this request. This overlap of communication and computation is the fundamental difference from the standard bsp_put.

This overlap is not guaranteed to result in faster execution. Consider, on a per-application basis, whether using these high-performance primitives makes sense, and factor in the extra cost of structuring your algorithm so that it uses them correctly.

Otherwise usage is similar to that of bsp_put; please refer to that function for further documentation.

Note: if MCBSP_COMPATIBILITY_MODE is defined, pid, offset and size are of type `int'. Otherwise, pid is of type `unsigned int', offset and size of type `size_t'.

Parameters
pid: The ID number of the remote thread.
source: Pointer to the source data.
destination: Pointer to the registered remote memory area to send data to.
offset: Offset (in bytes) into the remote memory area. Must be non-negative and less than the remotely registered memory size.
size: Size (in bytes) of the data to be communicated; i.e., all the data from address source up to address (source+size) at the current thread is copied to (destination+offset) up to (destination+offset+size) at the thread with ID pid.

References mcbsp_hpput.

static void bsp_pop_reg ( void *const  address)
inline static

De-registers a pushed registration.

Makes a memory region unavailable for communication. The region should first have been registered using bsp_push_reg, otherwise a run-time error will result.

If the same memory address is registered multiple times, only the latest registration is cancelled.

The order of deregistrations must be the same across all threads to ensure correct execution. As with bsp_push_reg, this is entirely the responsibility of the programmer; MulticoreBSP does not check for correctness (it cannot do so efficiently).

Issuing a pop_reg counts as a p-relation during the next bsp_sync, worst case.

Parameters
address: Pointer to the memory region to deregister.

References mcbsp_pop_reg.
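
The rule that only the latest registration of an address is cancelled can be illustrated with a small single-thread model of a registration table. All names here (model_push_reg, model_pop_reg, model_registered_size, MODEL_MAX_REGS) are hypothetical, the model ignores that registrations take effect only after a synchronisation, and the assumption that the most recent remaining registration determines the active size is this sketch's own, not a statement about the library's internals.

```c
#include <stddef.h>

#define MODEL_MAX_REGS 16

/* One registration entry (hypothetical model type, not a library type). */
typedef struct {
    void *address;
    size_t size;
} model_reg_t;

static model_reg_t model_regs[MODEL_MAX_REGS];
static size_t model_nregs = 0;

/* Models bsp_push_reg: appends a registration.
 * Returns 0 on success, -1 if the model's table is full. */
static int model_push_reg(void *address, size_t size) {
    if (model_nregs == MODEL_MAX_REGS) return -1;
    model_regs[model_nregs].address = address;
    model_regs[model_nregs].size = size;
    ++model_nregs;
    return 0;
}

/* Models bsp_pop_reg: removes only the most recent registration of address.
 * Returns 0 on success, -1 if address is not registered (the library
 * reports a run-time error in that case). */
static int model_pop_reg(void *address) {
    for (size_t i = model_nregs; i-- > 0; ) {
        if (model_regs[i].address == address) {
            for (size_t j = i; j + 1 < model_nregs; ++j)
                model_regs[j] = model_regs[j + 1];
            --model_nregs;
            return 0;
        }
    }
    return -1;
}

/* Size of the most recent remaining registration of address, or 0 if none. */
static size_t model_registered_size(const void *address) {
    for (size_t i = model_nregs; i-- > 0; )
        if (model_regs[i].address == address) return model_regs[i].size;
    return 0;
}
```

In this model, registering the same address twice and popping it once leaves the earlier registration in place.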

static void bsp_push_reg ( void *const  address,
const bsp_size_t  size 
)
inline static

Registers a memory area for communication.

If an SPMD program defines a local variable x, each of the P threads has its own memory area associated with that variable. Communication requires threads to know the memory location of the destination variable on the remote thread; registration provides this. The order of variable registration must be the same across all threads in the SPMD program. The size of the registered memory block may differ from thread to thread. Registration takes effect only after a synchronisation.

Issuing a push_reg counts as a p-relation during the next bsp_sync, worst case.

Bug:
Due to an implementation choice, the overhead of pushing k variables across all SPMD processes is k^2. This is no issue if k is small, and scales in a realistic sense if $ k^2 \leq P $. An alternative implementation could reduce this overhead cost to $ k\log k $ (red/black tree, worst case) or k (hashmap, average case) albeit at the cost of using (increasingly) more memory. Contact the maintainers if this variant is indeed preferable.
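
As a rough worked illustration of the costs quoted above, ignoring constant factors: the values P = 64 and k = 100, and the choice of base-2 logarithm, are arbitrary assumptions made for this example and are not measurements.

```latex
\[
  k^2 \leq P \iff k \leq \sqrt{P},
  \qquad \text{e.g. } P = 64 \;\Rightarrow\; k \leq 8 .
\]
\[
  \text{For } k = 100: \qquad
  k^2 = 10\,000, \qquad
  k \log_2 k \approx 664, \qquad
  k = 100 .
\]
```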

Note: if MCBSP_COMPATIBILITY_MODE is defined, size will be of type `int'. Otherwise, it is of type `size_t'.

Parameters
address: Pointer to the memory area to register.
size: Size, in bytes, of the area to register.

References mcbsp_push_reg.

static void bsp_put ( const bsp_pid_t  pid,
const void *const  source,
const void *const  destination,
const bsp_size_t  offset,
const bsp_size_t  size 
)
inline static

Put data in a remote memory location.

This is a non-blocking communication request. Communication will be executed during the next synchronisation step. The remote memory location must be registered using bsp_push_reg in a previous superstep.

The data to be communicated to the remote area will be buffered on request; i.e., the source memory location is free to change after this communication request; the communicated data will not reflect those changes.

Note: if MCBSP_COMPATIBILITY_MODE is defined, pid, offset and size are of type `int'. Otherwise, pid is of type `unsigned int', and offset and size of type `size_t'.

Parameters
pid: The ID number of the remote thread.
source: Pointer to the source data.
destination: Pointer to the registered remote memory area to send data to.
offset: Offset (in bytes) into the remote memory area. Must be non-negative and less than the remotely registered memory size.
size: Size (in bytes) of the data to be communicated; i.e., all the data from address source up to address (source+size) at the current thread is copied to (destination+offset) up to (destination+offset+size) at the thread with ID pid.

References mcbsp_put.
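
The buffered-on-request behaviour of bsp_put can be sketched with a single-process model that handles one pending request. The names model_put, model_put_sync, model_put_t and MODEL_BUF_BYTES are hypothetical, and the fixed-size payload buffer is a simplification chosen for this illustration; nothing here describes how the library actually stores its buffers.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_BUF_BYTES 64

/* One pending put (hypothetical model type, not a library type). */
typedef struct {
    char payload[MODEL_BUF_BYTES]; /* snapshot of the source bytes */
    char *destination;             /* base of the registered remote area */
    size_t offset;                 /* byte offset into the remote area */
    size_t size;                   /* number of bytes to deliver */
    int pending;                   /* nonzero while a request is queued */
} model_put_t;

/* Models bsp_put: snapshots the source bytes at request time.
 * Returns 0 on success, -1 if size exceeds the model's buffer. */
static int model_put(model_put_t *req, const void *source,
                     char *destination, size_t offset, size_t size) {
    if (size > MODEL_BUF_BYTES) return -1;
    memcpy(req->payload, source, size);
    req->destination = destination;
    req->offset = offset;
    req->size = size;
    req->pending = 1;
    return 0;
}

/* Models the synchronisation step: delivers the snapshot to
 * [destination+offset, destination+offset+size). */
static void model_put_sync(model_put_t *req) {
    if (!req->pending) return;
    memcpy(req->destination + req->offset, req->payload, req->size);
    req->pending = 0;
}
```

In this model, changing the source after model_put has no effect on what arrives at the destination, and the destination is untouched until model_put_sync runs.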