MulticoreBSP for C  Version 2.0.4
MulticoreBSP for C extensions for C++ programming

MulticoreBSP for C defines a C++ wrapper for object-oriented SPMD control via the mcbsp::BSP_program class.

Namespaces

 mcbsp
 Namespace in which the C++-style extensions for MulticoreBSP for C reside.
 

Data Structures

class  mcbsp::BSP_program
 Abstract class which a user can extend to write BSP programs.
 

Functions

virtual void mcbsp::BSP_program::spmd ()=0
 The parallel SPMD code to be implemented by the user.
 
virtual BSP_program * mcbsp::BSP_program::newInstance ()=0
 Creates a new instance of the implementing class, which will be used by new threads spawned by bsp_begin().
 
virtual void mcbsp::BSP_program::destroyInstance (BSP_program *const instance)
 Code that destroys instances created using the newInstance() function.
 
void mcbsp::BSP_program::begin (const bsp_pid_t P=bsp_nprocs())
 Initialises and starts the current BSP program.
 
template<typename T >
void bsp_push_reg (T *const address, const size_t size=1)
 Registers a memory area for communication.
 
template<typename T >
bsp_size_t bsp_set_tagsize (const T &tag_type=T())
 Sets the tag size of inter-thread messages.
 
template<typename T >
void bsp_put (const bsp_pid_t pid, const void *const source, T *const destination, const size_t offset=0, const size_t size=1)
 Put data in a remote memory location.
 
template<typename T >
void bsp_get (const bsp_pid_t pid, const T *const source, const size_t offset, void *const destination, const size_t size=1)
 Get data from a remote memory location.
 
template<typename T >
void bsp_direct_get (const bsp_pid_t pid, const T *const source, const bsp_size_t offset, void *const destination, const bsp_size_t size=1)
 Get data from a remote memory location.
 
template<typename T >
void bsp_hpsend (const bsp_pid_t pid, const void *const tag, const T *const payload, const size_t size=1)
 This is a non-buffering and non-blocking send request.
 
template<typename T >
void bsp_move (T *const payload, const size_t max_copy_size=1)
 Retrieves the payload from the first message in the queue of incoming messages, and removes that message.
 
template<typename T >
void bsp_hpput (const bsp_pid_t pid, const void *const source, const T *const destination, const size_t offset=0, const size_t size=1)
 Put data in a remote memory location.
 
template<typename T >
void bsp_hpget (const bsp_pid_t pid, const T *const source, const size_t offset, void *const destination, const size_t size)
 Get data from a remote memory location.
 

Detailed Description

MulticoreBSP for C defines a C++ wrapper for object-oriented SPMD control via the mcbsp::BSP_program class.

Users can extend this class, implement its virtual functions, and call mcbsp::BSP_program::begin.

There are also extensions available that make the common BSPlib primitives like bsp_get and bsp_put work with the number of elements instead of with bytes. See the file mcbsp-templates.hpp for details.
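
The element-based convention can be illustrated with a minimal stand-alone sketch. It is not taken from mcbsp-templates.hpp: raw_copy_bytes is a hypothetical byte-based stand-in for a C primitive, and copy_elements is a hypothetical wrapper that deduces T from the destination pointer and scales offset and size by sizeof(T).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a byte-based primitive; NOT part of MulticoreBSP.
// Copies `bytes` bytes from `source` to `destination + byte_offset`.
inline void raw_copy_bytes(const void *source, void *destination,
                           std::size_t byte_offset, std::size_t bytes) {
    std::memcpy(static_cast<char *>(destination) + byte_offset, source, bytes);
}

// Hypothetical element-based wrapper: T is deduced from the destination
// pointer, so callers pass element counts and the wrapper multiplies them
// by sizeof(T) before calling the byte-based function.
template <typename T>
void copy_elements(const void *source, T *destination,
                   std::size_t offset = 0, std::size_t size = 1) {
    raw_copy_bytes(source, destination, offset * sizeof(T), size * sizeof(T));
}

// Usage: write two doubles starting at element 1 of a four-element array.
inline double demo_copy_elements() {
    double target[4] = {0.0, 0.0, 0.0, 0.0};
    const double values[2] = {1.5, 2.5};
    copy_elements(values, target, 1, 2);
    return target[1] + target[2];
}
```

With this shape, the caller never writes sizeof at the call site; the default arguments cover the common case of a single element at offset 0.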

Function Documentation

void mcbsp::BSP_program::begin ( const bsp_pid_t  P = bsp_nprocs())

Initialises and starts the current BSP program.

Automatically implies (correct) calls to bsp_init() and bsp_begin().

The parallel SPMD section is the overridden function spmd() of this class instance.

The SPMD program makes use of the same globally defined BSP primitives as available in plain C; there are no special C++ wrappers for communication.

bsp_end() is automatically called after the spmd() function exits.

template<typename T >
void bsp_direct_get (const bsp_pid_t pid, const T *const source, const bsp_size_t offset, void *const destination, const bsp_size_t size=1)

Get data from a remote memory location.

This is the templated variant of a regular bsp_direct_get().

This is a blocking communication primitive: communication is executed immediately and is not queued until the next synchronisation step. The remote memory location must be registered using bsp_push_reg in a previous superstep.

The data retrieved will be the data at the remote memory location at "this" time. There is no guarantee that the remote thread is at the same position in executing the SPMD program; it might be anywhere in the current superstep. If the remote thread writes to the source memory block in this superstep, the retrieved data may partially consist of old and new data; this function does not buffer nor is it atomic in any way.

Note: if MCBSP_COMPATIBILITY_MODE is defined, pid, offset and size are of type `int'. Otherwise, pid is of type `unsigned int' and offset & size of type `size_t'.

Parameters
  pid          The ID number of the remote thread.
  source       Pointer to the registered remote memory area from which to get data elements.
  offset       Offset (in number of elements) into the remote memory area. The offset must be non-negative and must not exceed the registered capacity of the remote memory area.
  destination  Pointer to one or more local destination elements.
  size         Number of data elements to be retrieved. I.e., all the data from source[offset] up to source[offset+size-1] at the remote process is copied into destination[0] up to destination[size-1] at this process.

template<typename T >
void bsp_get (const bsp_pid_t pid, const T *const source, const size_t offset, void *const destination, const size_t size=1)

Get data from a remote memory location.

This is the templated variant of a regular bsp_get().

This is a non-blocking communication request. Communication will be executed during the next synchronisation step. The remote memory location must be registered using bsp_push_reg in a previous superstep.

The data retrieved will be the data at the remote memory location at the time of synchronisation. It will not (and cannot) retrieve data at "this" point in the SPMD program at the remote thread. If other communication at the remote process would change the data at the region of interest, these changes are not included in the retrieved data; in this sense, the get is buffered.

Note: if MCBSP_COMPATIBILITY_MODE is defined, pid, offset and size are of type `int'. Otherwise, pid will be of type `unsigned int' and offset and size of type `size_t'.

Parameters
  pid          The ID number of the remote thread.
  source       Pointer to the registered remote memory area from which to get data elements.
  offset       Offset (in number of elements) into the remote memory area. The offset must be non-negative and must not exceed the registered capacity of the remote memory area.
  destination  Pointer to one or more local destination elements.
  size         Number of data elements to be retrieved. I.e., all the data from source[offset] up to source[offset+size-1] at the remote process is copied into destination[0] up to destination[size-1] at this process.
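
The deferred, element-indexed behaviour of bsp_get can be modelled in a single process. The sketch below is an illustration only and does not use MulticoreBSP: GetQueue, PendingGet, get and sync are hypothetical names, and an ordinary local array plays the role of the remote memory area.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical single-process model of deferred gets; NOT MulticoreBSP code.
struct PendingGet {
    const char *source;  // first requested byte in the "remote" memory area
    void *destination;   // local destination
    std::size_t bytes;   // number of bytes to copy at synchronisation
};

struct GetQueue {
    std::vector<PendingGet> pending;

    // Records the request only; nothing is copied yet.
    // offset and size are counted in elements of type T.
    template <typename T>
    void get(const T *source, std::size_t offset, void *destination,
             std::size_t size = 1) {
        pending.push_back({reinterpret_cast<const char *>(source + offset),
                           destination, size * sizeof(T)});
    }

    // Models the synchronisation step: all queued gets are executed now.
    void sync() {
        for (const PendingGet &g : pending) {
            std::memcpy(g.destination, g.source, g.bytes);
        }
        pending.clear();
    }
};

// Usage: request remote[1] and remote[2]; the values read are those present
// at synchronisation time, not those present when get() was called.
inline int demo_get() {
    int remote[4] = {10, 20, 30, 40};
    int local[2] = {0, 0};
    GetQueue queue;
    queue.get(remote, 1, local, 2);
    remote[1] = 21;  // computation at the "remote" side before the sync
    queue.sync();
    return local[0] * 100 + local[1];  // 21 * 100 + 30
}
```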

template<typename T >
void bsp_hpget (const bsp_pid_t pid, const T *const source, const size_t offset, void *const destination, const size_t size)

Get data from a remote memory location.

This is the templated variant of a regular bsp_hpget().

Note: the current implementation performs a normal bsp_get! A communication-overlapping implementation is forthcoming.

This is a non-blocking communication request. Communication will be executed between now and the next synchronisation step. Note that this differs from bsp_get. Communication is guaranteed to have finished before the next superstep. Note this means that both source and destination memory areas might be read and written to at any time after issuing this communication request. This overlap of communication and computation is the fundamental difference with the standard bsp_get.

It is not guaranteed that this overlap results in faster execution time. You should consider whether using these high-performance primitives makes sense on a per-application basis, and factor in the extra costs of structuring your algorithm to enable correct use of these primitives.

Note the difference between this high-performance get and bsp_direct_get is that the latter function is blocking (performs the communication immediately and waits for it to end).

Otherwise usage is similar to that of bsp_get; please refer to that function for further documentation.

Note: if MCBSP_COMPATIBILITY_MODE is defined, pid, offset and size are of type `int'. Otherwise, pid is of type `unsigned int' and offset & size of type `size_t'.

Parameters
  pid          The ID number of the remote thread.
  source       Pointer to the registered remote memory area from which to get data elements.
  offset       Offset (in number of elements) into the remote memory area. The offset must be non-negative and must not exceed the registered capacity of the remote memory area.
  destination  Pointer to one or more local destination elements.
  size         Number of data elements to be retrieved. I.e., all the data from source[offset] up to source[offset+size-1] at the remote process is copied into destination[0] up to destination[size-1] at this process.

template<typename T >
void bsp_hpput (const bsp_pid_t pid, const void *const source, const T *const destination, const size_t offset=0, const size_t size=1)

Put data in a remote memory location.

This is the templated variant of a regular bsp_hpput().

Note: the current implementation performs a normal bsp_put! A communication-overlapping implementation is forthcoming.

This is a non-blocking communication request. Communication will be executed sometime between now and the end of the next synchronisation step. Note that this differs from bsp_put. Communication is guaranteed to have finished before the next superstep. Note this means that both source and destination memory areas might be read and written to at any time after issuing this communication request. This overlap of communication and computation is the fundamental difference with the standard bsp_put.

It is not guaranteed that this overlap results in faster execution time. You should consider whether using these high-performance primitives makes sense on a per-application basis, and factor in the extra costs of structuring your algorithm to enable correct use of these primitives.

Otherwise usage is similar to that of bsp_put; please refer to that function for further documentation.

Note: if MCBSP_COMPATIBILITY_MODE is defined, pid, offset and size are of type `int'. Otherwise, pid is of type `unsigned int', offset and size of type `size_t'.

Parameters
  pid          The ID number of the remote thread.
  source       Pointer to one or more source elements.
  destination  Pointer to the registered remote memory area to send one or more source elements to.
  offset       Offset (in number of elements) into the destination memory area. The offset must be non-negative and must not exceed the registered capacity of the remote memory area.
  size         Number of data elements to be transmitted. I.e., all the data from source[0] up to source[size-1] at this process is copied into destination[offset] up to destination[offset+size-1] at the process with ID pid.

template<typename T >
void bsp_hpsend (const bsp_pid_t pid, const void *const tag, const T *const payload, const size_t size=1)

This is a non-buffering and non-blocking send request.

This is the templated variant of a regular bsp_hpsend().

The function differs from the regular bsp_send in two major ways: (1) the actual send communication may occur at any time between now and the end of the next synchronisation phase, and (2) the tag and payload data are read and sent at some point in that same interval. If you change the contents of the memory areas that tag and payload point to after calling this function, the data that is communicated is undefined. The semantics of BSMP remain unchanged: the sent messages only become available at the remote processor when the next computation superstep begins. The performance gain is two-fold: (a) bsp_hpsend avoids buffering-on-send, and (b) BSP may send messages during a computation phase, thus overlapping computation with communication.

Ad (a): normally, BSMP in BSP copies tag and payload data at least three times. It buffers on bsp_send (buffer-on-send), it buffers at the receiving process's incoming BSMP queue (buffer-on-receive), and finally bsp_get_tag and bsp_move copy the data into the target user-supplied memory areas. To also eliminate this last data movement, please consider using bsp_hpmove.

Ad (b): it is not guaranteed that this overlap results in faster execution time. You should consider whether using these high-performance primitives makes sense on a per-application basis, and factor in the extra costs of structuring your algorithm to enable correct use of these primitives.

See bsp_send for general remarks about using BSMP primitives in BSP. See bsp_hpget and bsp_hpput for equivalent (non-buffering and non-blocking) high-performance communication requests.

Note: If MCBSP_COMPATIBILITY_MODE is defined, then pid and size are of type `int'. Otherwise, pid is of type `unsigned int' and size of type `size_t'.

Parameters
  pid      ID of the remote thread to send this message to.
  tag      Pointer to the tag data. This data will not be buffered.
  payload  Pointer to one or more payload data elements. These elements will not be buffered.
  size     Number of data elements in the payload.

template<typename T >
void bsp_move (T *const payload, const size_t max_copy_size=1)

Retrieves the payload from the first message in the queue of incoming messages, and removes that message.

This is the templated variant of a regular bsp_move().

If the incoming queue is empty, this function has no effect. This function will copy a given maximum of bytes from the message payload into a supplied buffer. This maximum should equal or be larger than the payload size (which can, e.g., be retrieved via bsp_get_tag). The maximum can be 0 bytes; the net effect is the efficient removal of the first message from the queue.

Note that Bulk Synchronous Message Passing (BSMP) is doubly buffered: bsp_send buffers on send and this function buffers again on receive.

See bsp_hpmove if buffer-on-receive is unwanted.

Note: if MCBSP_COMPATIBILITY_MODE is defined, max_copy_size is of type `int'. Otherwise, it is of type `size_t'.

Parameters
  payload        Where to copy the data elements from the payload of a received BSMP message into.
  max_copy_size  The maximum number of elements to copy.
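
The queue behaviour described for bsp_move (copy at most a given number of elements, always remove the first message, do nothing on an empty queue) can be sketched with a small model. MessageQueue and its move member are hypothetical names used for illustration; the model stores payloads as raw bytes and is not MulticoreBSP code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <deque>
#include <vector>

// Hypothetical model of an incoming BSMP queue; NOT MulticoreBSP code.
struct MessageQueue {
    std::deque<std::vector<char>> payloads;

    // Copies at most max_copy_size elements of the first payload into
    // `payload`, then removes that message. Has no effect on an empty queue.
    template <typename T>
    void move(T *payload, std::size_t max_copy_size = 1) {
        if (payloads.empty()) {
            return;
        }
        const std::vector<char> &front = payloads.front();
        const std::size_t bytes =
            std::min(front.size(), max_copy_size * sizeof(T));
        if (bytes > 0) {
            std::memcpy(payload, front.data(), bytes);
        }
        payloads.pop_front();
    }
};

// Usage: two identical messages are queued; the first is copied out, the
// second is discarded with a maximum of 0, and a third call finds the queue
// empty and changes nothing.
inline int demo_move() {
    MessageQueue queue;
    const int message[2] = {7, 8};
    const char *raw = reinterpret_cast<const char *>(message);
    queue.payloads.emplace_back(raw, raw + sizeof(message));
    queue.payloads.emplace_back(raw, raw + sizeof(message));
    int first[2] = {0, 0};
    queue.move(first, 2);
    queue.move(static_cast<int *>(nullptr), 0);
    queue.move(first, 2);
    return first[0] * 10 + first[1] + static_cast<int>(queue.payloads.size());
}
```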

template<typename T >
void bsp_push_reg (T *const address, const size_t size=1)

Registers a memory area for communication.

This is the templated variant of a regular bsp_push_reg().

If an SPMD program defines a local variable x, each of the P threads actually has its own memory areas associated with that variable. Communication requires threads to be aware of the memory location of a destination variable. This function achieves this. The order of variable registration must be the same across all threads in the SPMD program. The size of the registered memory block may differ from thread to thread. Registration takes effect only after a synchronisation.

Note: if MCBSP_COMPATIBILITY_MODE is defined, size will be of type `int'. Otherwise, it is of type `size_t'.

Parameters
  address  Pointer to the memory area to register.
  size     Number of elements to register. This should equal 1 if address points to a single element instead of an array.
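
The rule that a registration only takes effect after a synchronisation can be pictured with a two-stage table. This is a sketch under stated assumptions rather than the library's bookkeeping: RegistrationTable, push_reg, sync and is_registered are hypothetical names, and the model covers a single thread only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of delayed registration; NOT MulticoreBSP code.
struct Registration {
    void *address;      // start of the registered memory area
    std::size_t bytes;  // registered capacity in bytes
};

struct RegistrationTable {
    std::vector<Registration> active;   // usable for communication
    std::vector<Registration> pending;  // requested in the current superstep

    // Queues a registration of `size` elements of type T.
    template <typename T>
    void push_reg(T *address, std::size_t size = 1) {
        pending.push_back({address, size * sizeof(T)});
    }

    // Models the synchronisation step: pending registrations become active,
    // in the order in which they were requested.
    void sync() {
        active.insert(active.end(), pending.begin(), pending.end());
        pending.clear();
    }

    bool is_registered(const void *address) const {
        for (const Registration &r : active) {
            if (r.address == address) {
                return true;
            }
        }
        return false;
    }
};

// Usage: the array is not usable before the sync, and usable after it.
inline int demo_push_reg() {
    double x[8] = {};
    RegistrationTable table;
    table.push_reg(x, 8);
    const bool before = table.is_registered(x);
    table.sync();
    const bool after = table.is_registered(x);
    return (before ? 10 : 0) + (after ? 1 : 0);
}
```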

template<typename T >
void bsp_put (const bsp_pid_t pid, const void *const source, T *const destination, const size_t offset=0, const size_t size=1)

Put data in a remote memory location.

This is the templated variant of a regular bsp_put().

This is a non-blocking communication request. Communication will be executed during the next synchronisation step. The remote memory location must be registered using bsp_push_reg in a previous superstep.

The data to be communicated to the remote area will be buffered on request; i.e., the source memory location is free to change after this communication request; the communicated data will not reflect those changes.

Note: if MCBSP_COMPATIBILITY_MODE is defined, pid, offset and size are of type `int'. Otherwise, pid is of type `unsigned int', and offset and size of type `size_t'.

Parameters
  pid          The ID number of the remote thread.
  source       Pointer to one or more source elements.
  destination  Pointer to the registered remote memory area to send one or more source elements to.
  offset       Offset (in number of elements) into the destination memory area. The offset must be non-negative and must not exceed the registered capacity of the remote memory area.
  size         Number of data elements to be transmitted. I.e., all the data from source[0] up to source[size-1] at this process is copied into destination[offset] up to destination[offset+size-1] at the process with ID pid.
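
The buffer-on-request behaviour of bsp_put can be modelled as follows. The sketch is a single-process illustration, not MulticoreBSP code: PutQueue, PendingPut, put and sync are hypothetical names, and a local array stands in for the remote memory area.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical single-process model of buffered puts; NOT MulticoreBSP code.
struct PendingPut {
    std::vector<char> buffer;  // private copy of the source bytes
    char *destination;         // first byte to overwrite at synchronisation
};

struct PutQueue {
    std::vector<PendingPut> pending;

    // Copies the source data into a private buffer at request time.
    // offset and size are counted in elements of type T.
    template <typename T>
    void put(const void *source, T *destination, std::size_t offset = 0,
             std::size_t size = 1) {
        const char *bytes = static_cast<const char *>(source);
        PendingPut p;
        p.buffer.assign(bytes, bytes + size * sizeof(T));
        p.destination = reinterpret_cast<char *>(destination + offset);
        pending.push_back(std::move(p));
    }

    // Models the synchronisation step: buffered data is written out now.
    void sync() {
        for (const PendingPut &p : pending) {
            std::memcpy(p.destination, p.buffer.data(), p.buffer.size());
        }
        pending.clear();
    }
};

// Usage: the source changes after the request, yet the original value (5)
// is what arrives in remote[2] at synchronisation.
inline int demo_put() {
    int remote[3] = {0, 0, 0};
    int value = 5;
    PutQueue queue;
    queue.put(&value, remote, 2);
    value = 99;
    const int before = remote[2];  // still 0: nothing is written before sync
    queue.sync();
    return before * 10 + remote[2];
}
```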
template<typename T >
bsp_size_t bsp_set_tagsize ( const T &  tag_type = T())

Sets the tag size of inter-thread messages.

This is the templated variant of a regular bsp_set_tagsize().

bsp_send can be used to send message tuples (tag,payload). The tag must be of a fixed size; the payload size may differ per message.

This function sets the tagsize so that it can store an instance of a type equal to the one passed as an argument. This new size will be valid from the next bsp_sync() on.

All processes must still call bsp_set_tagsize with objects of the same size, or MulticoreBSP for C will return an error.

The function will now output the old tag size as the return value.

Note: if MCBSP_COMPATIBILITY_MODE is defined, the return type is an `int'. Otherwise, it is of type `size_t'.

Parameters
  tag_type  An object of the new tag type.

Returns
  The old tag size, in bytes.
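
The delayed effect of bsp_set_tagsize can be sketched with a small state object. Everything here is hypothetical and for illustration only: TagSizeState, set_tagsize and sync are not library names, and the model assumes that the "old" size returned is the size active in the current superstep.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the tag-size state of one thread; NOT MulticoreBSP code.
struct TagSizeState {
    std::size_t active = 0;   // tag size in use during the current superstep
    std::size_t pending = 0;  // tag size that becomes active at the next sync

    // Requests a tag size large enough to hold one object of type T.
    // Assumption of this model: the value returned is the active size.
    template <typename T>
    std::size_t set_tagsize(const T & = T()) {
        pending = sizeof(T);
        return active;
    }

    // Models the synchronisation step: the requested size takes effect.
    void sync() { active = pending; }
};

// Usage: the new size is not visible until after the sync.
inline std::size_t demo_tagsize() {
    TagSizeState state;
    const std::size_t old_size = state.set_tagsize<unsigned int>();
    const std::size_t before_sync = state.active;
    state.sync();
    return old_size + before_sync + state.active;
}
```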

virtual void mcbsp::BSP_program::destroyInstance ( BSP_program *const  instance)
protected virtual

Code that destroys instances created using the newInstance() function.

The default implementation of this virtual function is as follows:

virtual void destroyInstance( BSP_program * instance ) { delete instance; }

Override this implementation only when this would not yield correct behaviour. Note that when the newInstance function is implemented as per its recommendation, no action is required.

Parameters
  instance  The instance, created by newInstance, that should be destroyed.

See Also
  newInstance()

virtual BSP_program* mcbsp::BSP_program::newInstance ( )
protected pure virtual

Creates a new instance of the implementing class, which will be used by new threads spawned by bsp_begin().

Note that this need not be a copy if your BSP program does not require this! The recommended implementation is the following:

virtual BSP_program * newInstance() { return new FinalType(); }

where FinalType is the name of the BSP program you are implementing (FinalType must be a non-abstract subclass of BSP_program).
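
The subclassing pattern can be tried out without the library by using a stand-in base class. In the sketch below, BSP_program_model is a hypothetical class that mirrors the virtual interface documented on this page; its begin() is a sequential stand-in that assumes the calling thread runs spmd() on the current object and every additional thread runs it on a fresh instance. The real mcbsp::BSP_program::begin starts actual threads instead. Hello and spmd_calls are likewise names invented for the example.

```cpp
#include <cassert>

// Hypothetical stand-in that mirrors the documented interface of
// mcbsp::BSP_program; it is NOT the library class.
class BSP_program_model {
protected:
    virtual void spmd() = 0;
    virtual BSP_program_model *newInstance() = 0;
    virtual void destroyInstance(BSP_program_model *const instance) {
        delete instance;
    }

public:
    virtual ~BSP_program_model() {}

    // Sequential stand-in for begin(): runs spmd() on this object, then on
    // P-1 instances obtained from newInstance(), destroying each afterwards.
    void begin(const unsigned int P) {
        spmd();
        for (unsigned int s = 1; s < P; ++s) {
            BSP_program_model *const sibling = newInstance();
            sibling->spmd();
            destroyInstance(sibling);
        }
    }
};

static int spmd_calls = 0;  // counts spmd() invocations in the usage example

// A concrete program following the recommended newInstance implementation.
class Hello : public BSP_program_model {
protected:
    void spmd() override { ++spmd_calls; }
    BSP_program_model *newInstance() override { return new Hello(); }
};

// Usage: one spmd() call per modelled thread.
inline int demo_program() {
    spmd_calls = 0;
    Hello program;
    program.begin(4);
    return spmd_calls;
}
```

Because Hello allocates with new, the default destroyInstance (a plain delete) is sufficient and is not overridden.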

virtual void mcbsp::BSP_program::spmd ( )
protected pure virtual

The parallel SPMD code to be implemented by the user.