Added template class blocked_rangeNd for a generic multi-dimensional
range (requires C++11). Inspired by a contribution from Jeff Hammond.
Fixed a crash in dynamic memory allocation replacement on
Windows* for applications that call the system() function.
Fixed parallel_deterministic_reduce to split range correctly when used
Fixed a synchronization issue in task_group::run_and_wait() which
caused a simultaneous call to task_group::wait() to return
Changes in version 2018U4:
Improved support for Flow Graph Analyzer and Intel(R) VTune(TM)
Amplifier in the task scheduler and generic parallel algorithms.
Default device set for opencl_node now includes all the devices from
the first available OpenCL* platform.
Added a lightweight policy for functional nodes in the flow graph. It
indicates that the node body has little work and should, if possible,
be executed immediately upon receiving a message, avoiding task