01 – What is CUDA.NET

CUDA.NET is a library that provides access to GPU computing resources on top (using) CUDA API by NVIDIA.

This article is divided into the following topics:

  • What is a GPU?
  • Overview of CUDA
  • Introduction to CUDA.NET
  • Typical Applications
  • Supported Platforms

What is a GPU?

GPU stands for Graphics Processing Unit.
It is a special, dedicated hardware usually used for graphics (2D, 3D, gaming) but now also employed to computing purposes as well.

GPU is used as a general term to represent a hardware solution and there are various vendors worldwide manufacturing them – although there are many types of GPUs only specific models or generations can be used for computing or with CUDA.

There are benefits for using the GPU as a computing resource – It provides strong computing power compared to other equivalents such as CPU, DSP or other dedicated chips with somewhat ease of programming.
For example, a reasonable GPU with 128 cores can provide about 500 GFLOPS (500 billion floating point operations per second), whereas a 4 core CPU can provide about 90 GFLOPS. The numbers can vary based on multiple parameters, but by means of raw computing power, these numbers provide a rough estimate for the potential in using the GPU.

Overview of CUDA

CUDA stands for Compute Unified Device Architecture and is a software environment created by NVIDIA to provide developers with specific API to utilize the GPU for computing directly, rather than doing graphics (the main purpose of GPUs).

This software environment provides API to enumerate the GPUs available in a system as computational devices, initialize them, allocate memory for each and execute code, actually full management aspects of these computing resources accessible on a computer.

CUDA itself is built with C, provides defined API and further libraries to assist developers, such as FFT and BLAS to perform Fourier transforms or linear algebra calculation, accelerated, on the GPU.

For further, deeper reading of these topics (GPU / CUDA), please follow this link: CUDA.

Introduction to CUDA.NET

As outlined above, the environments available today to GPU developers are mostly based on C and meant for native applications. However there is a need to have the same capabilities from managed (.NET/Java) applications. This is where CUDA.NET enters.

CUDA.NET is mostly an interfacing library, providing the same set of API as CUDA for low-level access, using the same terms and concepts. It is also a pure .NET implementation so one can use it from any .NET language or platform that supports CUDA and .NET (Linux, MacOSX etc.).

In addition to a low-level interface, CUDA.NET provides an object-oriented abstraction over CUDA, using the same objects and terms, but with simplifed access for .NET based applications. The same objects can be shared between both environments, but developers would find the OO interface much more friendly and intuitive for use.

The same set of libraries covered by CUDA is also accessible from CUDA.NET – FFT, BLAS and upcoming support for new libraries.

Typical Applications

The GPU can be beneficial for applications where computing takes a significant amount of time or is a bottleneck, as well when looking to free other resources and offload computations to the GPU (as it doesn’t affect the system while working in the background).

Fields where a sort of accelerated computing is needs, or processing of multiple elements can benefit the GPU.
To name a few:

  • Image/Video processing (filters, encoding, decoding)
  • Signal processing
  • Finance
  • Oil & gas (Geophysics)
  • Medical imaging
  • Scientific computations, simulations and research

Supported Platforms

As mentioned earlier, CUDA.NET is based on a pure .NET implementation.

It can be used on (assuming the OS supports CUDA): 

  • Windows
    • For desktops/embedded: XP and above
    • For servers: 2003 and above
  • Linux and other UNIX variants
  • Macintosh (MacOSX)

The library is fully compatible with 32 and 64 bit systems of all kinds mentioned above.

00 – Preface

The new CUDA.NET Tutorials category was created to collect and manage resources and materials for developers starting to work and develop with CUDA.NET library for various platforms.

The usual composition will be of articles on specific topics and gradually increasing complexity.

This post will include an additional Table of Contents for published articles as we go.

Table of Contents

  1. Preface


For any question or comment, please contact us through our email address: support (at) cass-hpc.com.

OpenCL.NET 1.0.48 Released


We are happy to announce the availability of the so long waiting OpenCL.NET 1.0.48 library.

This version aligns with OpenCL 1.0.48 standard, and fully conforms with latest NVIDIA drivers for OpenCL (and as well on supported platforms).

In brief, this release of the standard added few API functions and modified some, to truly allow heterogeneous computing on a single system. An application can query for the existence of multiple computing devices on the system, also by different vendors (recognize the CPU and a GPU as compute resources) regardless of the vendor. Such that consuming different computing resources can be transparent.

For further details about standard features and changes please consult Khronos website.

For OpenCL.NET page and download, click here.

As always, you are invited to contact us at: support@cass-hpc.com.

World Cloud Computing Summit 2009

The 2nd annual cloud computing summit is about to take place in Shfayim, Israel, between December 2-3, 2009.

Following last year success, the event will cover recent developments and progress in cloud technologies. Presenting with top-of-the-line companies active in this field, including (partial list): Amazon, Google, eBay, IBM, HP, Sun, RedHat and more.

Additional “hands-on” labs and workshops are offered during the event for participants that would like to learn more about cloud technologies and integration possibilities.

We are also presenting Hoopoe at the summit, for GPU Cloud Computing, and providing a workshop on GPU Computing in general and Hoopoe as well.

This event ends 2009 and symbolically the last decade, marking cloud computing as a major development that we are about to see more and more in the next years.

You are invited to join us during the event.

SizeT – .NET and native code


In this post I wanted to introduce you with a new construct we added to the latest release of CUDA.NET (2.3.6) and will be available with the published OpenCL.NET library.

The problem

.NET is a very fixed environment, defining well known types, such that an int is always 4 bytes long (32 bit) and a long is always 8 bytes long (64 bit).

This is not the case with native code, for developers of C/C++. Writing a program in 32 bit environment, will always yield 32 bit types, unless using specific directives to get 64 bit variables. When writing 64 bit programs, they do get access to 64 bit wide variables as primitives supported by the compiler.

This clearly creates a portability problem for code and applications written in 32 and 64 bit environments.

Another example, is pointer size, where in C/C++ environments, under 32 bit the pointer is 4 bytes wide (int) and under 64 bit systems it is 8 bytes wide (long). The .NET environment (through different languages) provides a simple construct to overcome this problem, namely the IntPtr object, which some of you may be familiar with.

Now, coming back to our domain, the runtime API (also the driver in a new CUDA 2.3 function) and OpenCL makes extensive use of the C/C++ size_t data type. This data type ensures for developers that under different environments they will get the maximum width of the supported data type, unsigned int for 32 bit systems and unsigned long for 64 bit systems.

Possible options

By means of the interoprating library (wrapper), such as CUDA.NET, it creates a problem, since the API should provide several versions of the function, one given an uint (to map to 32 bit with unsigned int C/C++ type), and ulong (to map against unsigned long in 64 bit C/C++ systems). Supplying such an interface to the user will have to force him a specific behavior and system, since in .NET, the uint is always 32 bit wide, and ulong is 64 bit wide, no matter what.

Another option can be to provide a unique, standalone interface, using the IntPtr object, since .NET takes care to make it 32 bit wide in 32 bit systems and 64 bit wide for 64 bit systems, dynamically, without user intervention.

But using the IntPtr and a very serious downside, it is not dynamic, once it’s value is set, it cannot be changed through simple arithmetic operators, like +,-,*,/ or else.

The solution

Exactly for this purpose we created the SizeT object (structure). First, it maps to the same name as it’s native counterpart (size_t) and second it provides the dynamic mechanisms we want for working with 32 or 64 bit systems transparently.

SizeT can serve just like any other basic primitive in .NET.
For example:

SizeT temp = 15;
uint value = (uint)temp;
ulong value2 = (ulong)temp;
temp = value;

Internally, the SizeT wraps the IntPtr object to provide the same dynamic capabilities under 32 and 64 bit platforms.
It can host the required .NET primitives (int, uint, long, ulong), so when programming, one will make a good habit for using the SizeT instead of other data types (working with the runtime CUDA API).

For OpenCL the interface was built from the first place to use SizeT in mind, as the OpenCL API uses only size_t data types for cross platform functions.

CUDA.NET 2.2 released

We are happy to announce the release of CUDA.NET version 2.2.

This release aligns with CUDA 2.2 API and features, and provides further improvements with CUDA.NET.
To download page.

Few of the additions/changes:

  • Supporting CUDA 2.2 API (zero copy etc.)
  • CUDA class supports all driver functions, adding few missing texture functions into the API
  • Removing double precision FFT routines from CUFFT – the functions were there for future support, but are no longer available
  • Adding MSDN/CHM based documentation for the library
  • Extending the runtime API support to allow various memory copies and the latest 2.2 API

We you will all find that release useful.

You are invited to provide us comments for usage and in general about the library to improve it.
You can send all that information to support@cass-hpc.com.