NVIDIA® CUDA® Development


Introduction

As pioneers of GPU computing in Israel, we have worked with the NVIDIA® CUDA® parallel computing platform since its early days (2007), when the field was still called GPGPU. Since then, we have developed and ported hundreds of algorithms in typical GPU computing fields:

Image & Video Processing
Finance
Geophysics
Medical Imaging
Cloud Computing (on Amazon AWS EC2 and 3rd party providers)
DirectX / OpenGL Graphics Interoperability
And much more

With this expertise and experience we can deliver GPU-based solutions for most industries and technologies.

Supported CUDA hardware profiles include embedded (HPeC, low power), desktop / workstation, and server-side / cluster scale. Multi-GPU scenarios are supported in all profiles to achieve high-performance, asynchronous, low-latency processing.
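
The multi-GPU scenario mentioned above can be illustrated with a minimal sketch. It assumes the standard CUDA runtime API and splits a simple SAXPY operation (y = a*x + y) evenly across every visible GPU, one stream per device. The names multi_gpu_saxpy, saxpy_kernel and GpuChunk are hypothetical and chosen only for this illustration; it is not taken from any production code, and error handling is kept to a minimum.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Computes y[i] = a * x[i] + y[i] for one chunk of the data.
__global__ void saxpy_kernel(float a, const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Per-GPU bookkeeping for one chunk (hypothetical helper type).
struct GpuChunk {
    cudaStream_t stream = nullptr;
    float* d_x = nullptr;
    float* d_y = nullptr;
    size_t offset = 0;
    size_t count = 0;
};

// Hypothetical helper: runs y = a*x + y, split evenly across all visible GPUs.
// Returns the number of GPUs found; 0 means the CPU fallback was used.
// CUDA return codes are mostly ignored here to keep the sketch short.
int multi_gpu_saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const size_t n = x.size();

    int device_count = 0;
    if (cudaGetDeviceCount(&device_count) != cudaSuccess || device_count < 1) {
        for (size_t i = 0; i < n; ++i) y[i] = a * x[i] + y[i];  // CPU fallback
        return 0;
    }

    const size_t per_gpu = (n + device_count - 1) / device_count;
    std::vector<GpuChunk> chunks(device_count);

    // Phase 1: queue copy-in, kernel and copy-out on each GPU's own stream.
    // Note: with pageable host memory, as used here, the copies may block the
    // host; pinned memory (cudaMallocHost or cudaHostRegister) is what allows
    // genuine overlap between GPUs.
    for (int d = 0; d < device_count; ++d) {
        GpuChunk& c = chunks[d];
        c.offset = std::min(n, d * per_gpu);
        c.count = std::min(per_gpu, n - c.offset);
        if (c.count == 0) continue;  // more GPUs than work
        const size_t bytes = c.count * sizeof(float);
        cudaSetDevice(d);
        cudaStreamCreate(&c.stream);
        cudaMalloc(&c.d_x, bytes);
        cudaMalloc(&c.d_y, bytes);
        cudaMemcpyAsync(c.d_x, x.data() + c.offset, bytes,
                        cudaMemcpyHostToDevice, c.stream);
        cudaMemcpyAsync(c.d_y, y.data() + c.offset, bytes,
                        cudaMemcpyHostToDevice, c.stream);
        const int threads = 256;
        const int blocks = static_cast<int>((c.count + threads - 1) / threads);
        saxpy_kernel<<<blocks, threads, 0, c.stream>>>(
            a, c.d_x, c.d_y, static_cast<int>(c.count));
        cudaMemcpyAsync(y.data() + c.offset, c.d_y, bytes,
                        cudaMemcpyDeviceToHost, c.stream);
    }

    // Phase 2: wait for each GPU to finish and release its resources.
    for (int d = 0; d < device_count; ++d) {
        GpuChunk& c = chunks[d];
        if (c.count == 0) continue;
        cudaSetDevice(d);
        cudaStreamSynchronize(c.stream);
        cudaFree(c.d_x);
        cudaFree(c.d_y);
        cudaStreamDestroy(c.stream);
    }
    return device_count;
}
```

Calling multi_gpu_saxpy(3.0f, x, y) with two equally sized std::vector<float> buffers updates y in place. The same split-by-device structure applies whether the target is an embedded board with a single GPU or a multi-GPU server, which is why the sketch falls back to the CPU when no device is visible.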

Development Methodology

Over the years we have created and tuned dedicated software development methodologies that keep development cycles robust, reduce cost and shorten development time for HPC projects involving GPUs.

Development stages follow the sequence below (high-level overview):

Requirements collection
System level analysis
Algorithm level analysis
Prototype development
Optimization and fine-tuning
Testing (system and unit level; also accompanies the earlier stages)
Integration
Deployment

For organizations with strict quality assurance requirements, development processes can be carried out in compliance with the ISO 9000/9001 standards.

Cloud Computing

Our CUDA solutions can be developed for hosting on Amazon AWS EC2 HPC instances equipped with NVIDIA Tesla family GPUs.

The main benefits of using AWS services are better scalability and lower development costs, since there is no need to purchase high-end systems.

In production, AWS can add further value through an effective cost model, higher acceleration and scalability.

Visit Amazon AWS EC2.

Other 3rd party cloud service providers can also be used on request.

Frameworks and Programming Languages

C / C++
Microsoft .NET (C#, Visual Basic, Python and more)
Java
FORTRAN
Python
MathWorks MATLAB

Operating Systems

Microsoft Windows:

Desktop: XP, Vista, 7
Server: 2003, 2008, 2008 R2, HPC Server 2008
Embedded editions of all the types above

Linux (most distributions):

RHEL
CentOS
Fedora
Ubuntu
And others…

Real-Time embedded Linux
Wind River
Real-time (soft and hard) mode support in Windows/Linux editions
Android (from 2.0 and newer)
Sun Solaris, OpenSolaris
Apple Mac OS X
Support for 32-bit and 64-bit architectures

Toolkits & Libraries

A comprehensive range of libraries is available with CUDA-accelerated features and functionality. The list below briefly describes the capabilities we have gained over the years; it is not exhaustive.

NVIDIA NPP – Image Processing Primitives
NVIDIA CUFFT – Fast Fourier Transform
NVIDIA CUBLAS – Basic Linear Algebra Subroutines
NVIDIA CUVID – Video decoding/encoding (H.264, MPEG-2, VC-1 etc.)
NVIDIA SDI – Video capture from SDI links
NVIDIA SceniX – Real-time realistic rendering
NVIDIA CompleX – Render scenes with large scale data-sets
NVIDIA OptiX – Interactive Real-Time Raytracing
OpenCV
OpenVIDIA
Additional 3rd party libraries with CUDA acceleration

Graphics Interoperability

Integrating CUDA into visualization systems is useful in many cases: it can replace existing CPU-intensive processing stages or provide additional acceleration where ordinary shaders are too limited or too complex.

It is also possible to integrate CUDA functionality where high-level graphics frameworks are used, such as OpenSceneGraph, Ogre, Unity, SlimDX and more.

Supported APIs (across multiple operating systems):

Microsoft DirectX 9, 10 and 11
OpenGL

Supported GPU Families

NVIDIA® GeForce® (by different brands or manufacturers)
NVIDIA® Tesla® – Official GPU computing solution
NVIDIA® Quadro® – Official Visualization and computing solution
NVIDIA® QuadroPlex – Official Visualization and computing solution for cluster environments
NVIDIA® Tegra®, including ARM processor support
CUDA® on ARM – CARMA (based on ARM architecture)

Supported architectures range from the first NVIDIA® Tesla® architecture to Fermi® and the recent Kepler®.

Complementary GPU Solutions

NVIDIA® SDI Capture – High-quality video acquisition from SDI links
GPUDirect – Asynchronous, low-latency data transfers using GPUs and InfiniBand
NVIDIA® SLI® – Combined multi-GPU management