Multi-Stream HD Video Enhancement with CUDA® and OpenCL™

Abstract

Video enhancement refers to a family of operations and filters applied to input video streams to improve their quality, whether for display, additional processing, or storage.

Enhancement is applied in many situations; the most typical are:

  1. Improving a general input video stream from an unknown source.
  2. Improving an input video stream from a known video sensor with defined operational characteristics.
  3. Improving a compressed video stream with additional metadata encoded in the bitstream.

Video sensors offer increasingly high throughput (tens of MPixels/s) at the expense of computational capacity. Therefore, video enhancement is performed at a later stage to apply additional filters that cannot be implemented directly in sensor hardware.

The following work was performed for a large international firm through an internal research laboratory.
The goal of the project was to implement a given set of algorithms and their processing flow on the GPU, and to optimize performance to allow faster-than-real-time processing of a single input video.

Results

The algorithm was implemented using NVIDIA® CUDA® on an NVIDIA® Tesla® C1060 commercial device installed in an HP Z800 workstation.

The implementation ran faster than real time (>30 Hz per stream). With additional GPU devices, multi-stream processing was supported on a single machine (up to four HD 1080p@60Hz streams).
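To put these rates in perspective, the following is a minimal sketch of the load they imply. It assumes a 1920x1080 frame (the standard 1080p size); the function names are illustrative only, and since the text does not state how streams were distributed across GPU devices, only aggregate figures are computed.

```python
# Back-of-the-envelope load for multi-stream 1080p processing.
# Assumption: 1080p means 1920x1080 pixels per frame. How streams map to
# GPU devices is not stated in the text, so only aggregate figures are shown.

WIDTH, HEIGHT = 1920, 1080  # 1080p frame dimensions in pixels


def frame_budget_ms(fps: float) -> float:
    """Maximum processing time per frame that still keeps up with one stream."""
    return 1000.0 / fps


def pixel_rate(fps: float, streams: int = 1) -> int:
    """Pixels per second that must be processed across all streams."""
    return int(WIDTH * HEIGHT * fps * streams)


if __name__ == "__main__":
    for fps, streams in [(30, 1), (60, 1), (60, 4)]:
        print(
            f"{streams} stream(s) @ {fps} Hz: "
            f"{pixel_rate(fps, streams) / 1e6:.1f} MPixels/s, "
            f"{frame_budget_ms(fps):.2f} ms per frame"
        )
```

Under these assumptions, four 1080p@60Hz streams amount to roughly 498 MPixels/s in aggregate, and each stream needs a finished frame about every 16.7 ms.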

The resulting GPU code included a few PTX assembly instructions in performance-critical areas where the CUDA® compiler (nvcc) failed to apply further optimizations. Using PTX yielded an additional 17% gain in the overall runtime of the algorithm, leading to a higher frame rate.
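The relationship between a runtime gain and frame rate can be sketched as follows. This assumes the 17% figure means a 17% reduction in per-frame runtime and that frame rate is simply the reciprocal of that runtime; neither assumption is confirmed by the text, and the function name is hypothetical.

```python
def fps_gain_from_runtime_reduction(reduction: float) -> float:
    """Relative frame-rate gain when per-frame runtime shrinks by `reduction`.

    Assumes frame rate is the reciprocal of per-frame runtime, so
    new_fps / old_fps = 1 / (1 - reduction).
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction) - 1.0


if __name__ == "__main__":
    # Assumption: "17% gain" is read as 17% less runtime per frame.
    gain = fps_gain_from_runtime_reduction(0.17)
    print(f"17% shorter runtime -> {gain:.1%} higher frame rate")
```

Read this way, a 17% shorter runtime corresponds to a frame rate about 20% higher, because the gain is measured against the new, smaller runtime.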

With this level of performance per machine, a single system replaced the previous array of workstations, simplifying the architecture and reducing cost by almost a factor of four.

The library was later ported to OpenCL™ to allow cross-platform and cross-vendor deployment.