parallel_do Template Function

Summary

Template function that processes work items in parallel.

Header

 #include "tbb/parallel_do.h"

Syntax

template<typename InputIterator, typename Body> 
void parallel_do( InputIterator first, InputIterator last,
                 Body body[, task_group_context& group] );

Description

The call parallel_do(first,last,body) applies a function object body to each item in the half-open interval [first,last). Items may be processed in parallel. The body can add further work items if its operator() takes a second argument of type parallel_do_feeder. The function returns once body(x) has returned for every item x that was in the input sequence or was added to it by method parallel_do_feeder::add.

The requirements for input iterators are specified in Section 24.1 of the ISO C++ standard. The table below shows the requirements on type Body.

parallel_do Requirements for Body B and its Argument Type T

Pseudo-Signature                      Semantics
------------------------------------  ------------------------------------------
B::operator()(                        Process item. Template parallel_do may
    cv-qualifiers T& item,            concurrently invoke operator() for the
    parallel_do_feeder<T>& feeder     same this but different item. The
) const                               signature with feeder permits additional
OR                                    work items to be added.
B::operator()(
    cv-qualifiers T& item
) const

T( const T& )                         Copy a work item.

T::~T()                               Destroy a work item.

For example, a unary function object, as defined in Section 20.3 of the C++ standard, models the requirements for B.
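As an illustration of these requirements, the following sketch defines a body with the one-argument form of operator(). The names CountEvens and count_evens_serial are hypothetical, and std::for_each stands in for the parallel call so that the sketch compiles without the library. Because parallel_do may invoke the same body object from several threads at once, the shared counter is a std::atomic reached through a pointer.

```cpp
#include <algorithm>
#include <atomic>
#include <list>

// Hypothetical body type; the work item type T is int here.
struct CountEvens {
    std::atomic<int>* evens;   // shared result, owned by the caller

    // One-argument form. It is const, as the requirements table asks,
    // so results go through the pointer rather than into a data member.
    void operator()( int& item ) const {
        if( item % 2 == 0 )
            evens->fetch_add( 1 );
    }
};

// Serial stand-in for parallel_do( items.begin(), items.end(), body ).
inline int count_evens_serial( std::list<int>& items ) {
    std::atomic<int> evens( 0 );
    CountEvens body = { &evens };
    std::for_each( items.begin(), items.end(), body );
    return evens.load();
}
```

With the library available, the std::for_each line would be the place to call parallel_do with the same three arguments.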

Caution

Defining both the one-argument and two-argument forms of operator() is not permitted.

Note

The parallelism in parallel_do is not scalable if all of the items come from an input stream that does not have random access. To achieve scaling, do one of the following:

  • Use random access iterators to specify the input stream.

  • Design your algorithm such that the body often adds more than one piece of work.

  • Use parallel_for instead.
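One way to follow the first suggestion when the data arrives in a container without random access is to copy it into a std::vector before the parallel call. The following is a minimal sketch, assuming the copy is cheap relative to the work done per item; the helper name to_random_access is hypothetical.

```cpp
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// Hypothetical helper: copy any input range into a std::vector so that the
// iterators later passed as first and last are random access.
template<typename InputIterator>
std::vector<typename std::iterator_traits<InputIterator>::value_type>
to_random_access( InputIterator first, InputIterator last ) {
    typedef typename std::iterator_traits<InputIterator>::value_type T;
    return std::vector<T>( first, last );
}

// std::list iterators are only bidirectional; std::vector iterators are
// random access, which this compile-time check confirms.
static_assert( std::is_same<
    std::iterator_traits<std::vector<int>::iterator>::iterator_category,
    std::random_access_iterator_tag>::value,
    "vector iterators are random access" );
```

The begin() and end() of the returned vector can then be passed as first and last.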

To achieve speedup, the grainsize of B::operator() needs to be at least on the order of 100,000 clock cycles. Otherwise, the internal overheads of parallel_do swamp the useful work.
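As a rough worked example of this guideline: if one item costs about c clock cycles, then roughly 100,000 / c items would have to be handled per call of B::operator() to reach the suggested grain size. The helper below does that arithmetic. Grouping several items into one call is an assumption of this sketch, not something the text above prescribes, and the names kMinCyclesPerCall and items_per_call are hypothetical.

```cpp
#include <cstddef>

// Guideline figure quoted above, in clock cycles per call of operator().
const std::size_t kMinCyclesPerCall = 100000;

// Hypothetical helper: smallest number of items to process per call so that
// one call costs at least kMinCyclesPerCall cycles, given an estimated
// per-item cost. Rounds up; returns 1 when a single item is already enough.
inline std::size_t items_per_call( std::size_t cycles_per_item ) {
    if( cycles_per_item == 0 ) return kMinCyclesPerCall;  // treat as 1 cycle
    if( cycles_per_item >= kMinCyclesPerCall ) return 1;
    return ( kMinCyclesPerCall + cycles_per_item - 1 ) / cycles_per_item;
}
```

For instance, items estimated at about 500 cycles each would need to be grouped about 200 to a call.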

The algorithm can be passed a task_group_context object so that its tasks are executed in this group. By default the algorithm is executed in a bound group of its own.

Example

The following code sketches a body with the two-argument form of operator().

struct MyBody {
    void operator()( item_t item,
                     parallel_do_feeder<item_t>& feeder ) const {
        // Pseudocode: item_t and initializer are application-defined.
        for each new piece of work implied by item do {
            item_t new_item = initializer;
            feeder.add(new_item);
        }
    }
};
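To make the sketch above concrete without depending on the library, the following self-contained example uses a hypothetical stand-in, feeder_model, that offers only add(), and a serial loop, serial_do_model, that mimics the termination rule: it returns once the body has returned for every original item and every item added through the feeder. This is a serial model for illustration only, not how parallel_do is implemented; the real function may process items concurrently. The Node type, the SumTree body, and sum_tree are likewise assumptions.

```cpp
#include <cstddef>
#include <deque>
#include <iterator>
#include <vector>

// Hypothetical stand-in for parallel_do_feeder<T>: only add() is modelled.
template<typename T>
class feeder_model {
    std::deque<T>& pending_;
public:
    explicit feeder_model( std::deque<T>& pending ) : pending_( pending ) {}
    void add( const T& item ) { pending_.push_back( item ); }
};

// Serial model of the termination rule: run until no original or added
// item remains unprocessed.
template<typename InputIterator, typename Body>
void serial_do_model( InputIterator first, InputIterator last, const Body& body ) {
    typedef typename std::iterator_traits<InputIterator>::value_type T;
    std::deque<T> pending( first, last );
    feeder_model<T> feeder( pending );
    while( !pending.empty() ) {
        T item = pending.front();   // copy the work item via T( const T& )
        pending.pop_front();
        body( item, feeder );       // may call feeder.add() to extend the work
    }
}

struct Node {
    int value;
    std::vector<const Node*> children;
};

// Two-argument body: process one node, then feed its children as new work.
struct SumTree {
    long* total;   // a plain long suffices for this serial model only;
                   // a parallel run would need synchronization
    void operator()( const Node* node,
                     feeder_model<const Node*>& feeder ) const {
        *total += node->value;
        for( std::size_t i = 0; i < node->children.size(); ++i )
            feeder.add( node->children[i] );
    }
};

// Sum all values in the tree, starting from a one-element input sequence.
inline long sum_tree( const Node* root ) {
    long total = 0;
    SumTree body = { &total };
    const Node* roots[] = { root };
    serial_do_model( roots, roots + 1, body );
    return total;
}
```

The input sequence holds only the root, so every other node is reached through feeder.add(), which is the situation the two-argument form is meant for.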