CUDA.NET examples issues

The examples shipped with the CUDA.NET library demonstrate simple aspects of programming the GPU with CUDA.NET.
They mostly consist of code that runs on the GPU itself, written in the CUDA language. These files end with the *.cu suffix.

Before these files can be used with the GPU, they have to be compiled by the nvcc compiler (included in the CUDA Toolkit) into a cubin file (the binary file the GPU uses).
To operate properly, the nvcc compiler needs access to the cl compiler (Visual C++, which ships with Visual Studio or can be downloaded as a standalone package).

If nvcc cannot find the cl compiler, or the environment is not fully configured, it fails.
This can happen when nvcc is executed from a C# or VB.NET project (where the environment is not configured for C++).
To overcome the errors, it is possible to pass nvcc a command line parameter that specifies the path to the cl compiler.
For example (considering a Visual Studio 2008 installation), add the following parameter:
--compiler-bindir="C:\Program Files\Microsoft Visual Studio 9.0\VC\bin"

On different platforms/installations this path can be different. Older versions of Visual Studio will have a different path as well.
The complete command line to execute nvcc with is:
nvcc --cubin --compiler-bindir="C:\Program Files\Microsoft Visual Studio 9.0\VC\bin"
If compiling a CUDA file named “”.
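
When nvcc has to be launched from .NET code rather than from a configured C++ command prompt, the same command line can be assembled programmatically. The following is only a minimal sketch and is not part of CUDA.NET: the NvccLauncher class and its method names are hypothetical, and nvcc is assumed to be reachable through the PATH.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of CUDA.NET) that runs nvcc with an
// explicit path to the cl compiler.
public static class NvccLauncher
{
    // Builds the argument string: --cubin, the compiler directory, then the source file.
    public static string BuildArguments(string cuFile, string compilerBinDir)
    {
        return string.Format("--cubin --compiler-bindir=\"{0}\" \"{1}\"", compilerBinDir, cuFile);
    }

    // Starts nvcc (assumed to be on the PATH) and returns its exit code.
    public static int Compile(string cuFile, string compilerBinDir)
    {
        ProcessStartInfo info = new ProcessStartInfo("nvcc", BuildArguments(cuFile, compilerBinDir));
        info.UseShellExecute = false;
        info.RedirectStandardError = true;
        using (Process process = Process.Start(info))
        {
            string errors = process.StandardError.ReadToEnd();
            process.WaitForExit();
            if (process.ExitCode != 0)
            {
                // Surface the compiler diagnostics when the build fails.
                Console.Error.WriteLine(errors);
            }
            return process.ExitCode;
        }
    }
}
```

A call such as NvccLauncher.Compile("kernel.cu", @"C:\Program Files\Microsoft Visual Studio 9.0\VC\bin") would then produce the cubin file, where kernel.cu is a placeholder file name.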

Problems with the CHM documentation


From time to time we are asked about errors when viewing the CHM documentation of CUDA.NET or OpenCL.NET.
Usually an Internet Explorer-style message appears, stating that the page cannot be displayed.

This happens because the Internet Explorer security configuration blocks CHM content that is opened directly from the Web.
The best way to resolve it is to download the ZIP file to a safe folder on your computer (not the temporary internet folders), unzip it and then open the file outside of Internet Explorer.

This should resolve the problem in most cases.

OpenCL.NET Released

Hello everyone,

We are happy to announce the immediate availability of OpenCL.NET for the public.
This library provides a .NET implementation and wrapping of the OpenCL interface for GPU computing (and general computing as well).

Currently, the library supports revision 1.0.43 of the Khronos OpenCL specification (the latest version of the standard).

Users may test the library with the OpenCL drivers released by NVIDIA, or on other architectures that support OpenCL (Intel, AMD CPU etc.).

The API in this release was designed with cross-platform use in mind, using the new SizeT construct for transparent handling of 32/64 bit platforms.

In addition, there is only one version of the library for all operating systems that support OpenCL, whether Windows, Linux or Mac.

For any question, request, bug report or anything else, please contact us at:

We hope you will find this library useful.

SizeT – .NET and native code


In this post I want to introduce you to a new construct we added to the latest release of CUDA.NET (2.3.6), which will also be available with the published OpenCL.NET library.

The problem

.NET is a very fixed environment, defining well known types, such that an int is always 4 bytes long (32 bit) and a long is always 8 bytes long (64 bit).

This is not the case with native code, for developers of C/C++. A program written for a 32 bit environment gets 32 bit types by default, unless specific directives are used to get 64 bit variables. 64 bit programs, on the other hand, get access to 64 bit wide variables as primitives supported by the compiler.

This clearly creates a portability problem for code and applications written in 32 and 64 bit environments.

Another example is pointer size: in C/C++ environments a pointer is 4 bytes wide under 32 bit systems (the width of a .NET int) and 8 bytes wide under 64 bit systems (the width of a .NET long). The .NET environment (through different languages) provides a simple construct to overcome this problem, namely the IntPtr object, which some of you may be familiar with.
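
The difference between the fixed .NET primitives and the platform-dependent IntPtr can be observed directly. The sketch below is illustrative only; the PointerWidth class name is hypothetical and not part of CUDA.NET.

```csharp
using System;

// Hypothetical helper showing which sizes are fixed and which depend on the process.
public static class PointerWidth
{
    // Size of IntPtr in bytes: 4 in a 32 bit process, 8 in a 64 bit process.
    public static int PointerBytes()
    {
        return IntPtr.Size;
    }

    // int and long never change; only the IntPtr entry varies between platforms.
    public static string Describe()
    {
        return string.Format("int={0} long={1} IntPtr={2}", sizeof(int), sizeof(long), IntPtr.Size);
    }
}
```

Calling PointerWidth.Describe() in a 32 bit and in a 64 bit build of the same program shows that only the last number differs.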

Now, coming back to our domain, the runtime API (also the driver API, in a new CUDA 2.3 function) and OpenCL make extensive use of the C/C++ size_t data type. This data type gives developers an unsigned integer that matches the native width of the platform: 32 bits wide on 32 bit systems (unsigned int) and 64 bits wide on 64 bit systems (unsigned long on Linux, unsigned long long on 64 bit Windows).

Possible options

For an interoperating library (wrapper) such as CUDA.NET, this creates a problem, since the API would have to provide several versions of each function: one taking a uint (to map to the 32 bit unsigned C/C++ type) and one taking a ulong (to map to the 64 bit unsigned type in 64 bit C/C++ systems). Supplying such an interface forces the user into a specific behavior and system, since in .NET a uint is always 32 bits wide and a ulong is always 64 bits wide, no matter what.

Another option is to provide a single interface using the IntPtr object, since .NET takes care to make it 32 bits wide on 32 bit systems and 64 bits wide on 64 bit systems, dynamically, without user intervention.

But IntPtr has a very serious downside: it is not dynamic. Once its value is set, it cannot be changed through simple arithmetic operators such as +, -, *, / and so on.

The solution

Exactly for this purpose we created the SizeT object (structure). First, it carries the same name as its native counterpart (size_t), and second, it provides the dynamic mechanisms we want for working with 32 or 64 bit systems transparently.

SizeT can serve just like any other basic primitive in .NET.
For example:

SizeT temp = 15;
uint value = (uint)temp;
ulong value2 = (ulong)temp;
temp = value;

Internally, SizeT wraps the IntPtr object to provide the same dynamic capabilities under 32 and 64 bit platforms.
It can host the required .NET primitives (int, uint, long, ulong), so it is a good habit to use SizeT instead of other data types when working with the CUDA runtime API.
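
To make the idea concrete, here is a minimal sketch of how such a wrapper could be written. It is not the actual CUDA.NET SizeT source: the SizeTSketch name, the chosen set of conversions and the addition operator are assumptions made for illustration.

```csharp
using System;

// Illustrative only: a pointer-sized value type with conversions and arithmetic.
// NOT the real CUDA.NET SizeT implementation.
public struct SizeTSketch
{
    // IntPtr is 4 bytes in a 32 bit process and 8 bytes in a 64 bit process.
    private readonly IntPtr value;

    private SizeTSketch(IntPtr v) { value = v; }

    // Implicit conversions from .NET primitives.
    // Note: new IntPtr(long) throws OverflowException in a 32 bit process
    // when the value does not fit into 32 bits.
    public static implicit operator SizeTSketch(int v) { return new SizeTSketch(new IntPtr(v)); }
    public static implicit operator SizeTSketch(uint v) { return new SizeTSketch(new IntPtr((long)v)); }
    public static implicit operator SizeTSketch(long v) { return new SizeTSketch(new IntPtr(v)); }

    // Explicit conversions back to primitives, since they may lose information.
    public static explicit operator uint(SizeTSketch s) { return (uint)s.value.ToInt64(); }
    public static explicit operator ulong(SizeTSketch s) { return (ulong)s.value.ToInt64(); }

    // Arithmetic is performed on the 64 bit value and stored back pointer-sized.
    public static SizeTSketch operator +(SizeTSketch a, SizeTSketch b)
    {
        return new SizeTSketch(new IntPtr(a.value.ToInt64() + b.value.ToInt64()));
    }
}
```

With this sketch, the statements from the example above (assigning 15, casting to uint and ulong, assigning a uint back) compile unchanged when SizeT is replaced by SizeTSketch.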

For OpenCL, the interface was built from the start with SizeT in mind, as the OpenCL API uses only size_t data types for cross platform functions.

Advanced data types with CUDA

Following the CUDA.NET 2.3.6 release, this article is meant to show you some of the more advanced constructs .NET offers developers who want advanced interoperability with native code.
As most of you know, CUDA.NET can copy many types of arrays and data types to GPU memory (through the different memcpy functions). These are based on well defined data types, mostly for numerical purposes.

Consider the basic data type float: the corresponding array is declared as float[] in C#, or otherwise in different languages, but the principle is the same. In addition to these primitives (byte, short, int, long, float, double) there is also support for the vector data types that CUDA supports, such as Float2, which is composed of 2 consecutive float elements.

What happens when you want to pass more complex data types that are not supported by CUDA.NET?

In this case, there are several techniques to achieve this goal, some maybe more complex to employ than others, and the choice mostly depends on your expected usage.

1. Declaring a new copy function

Well, that’s always an option if you wish to extend the API with more functions. In such a case, the developer declares a new copy function with the expected parameters and consumes it.

The following example shows a little more:

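// This is a dummy, complex data type
[StructLayout(LayoutKind.Sequential)]
struct Test
{
    public int value1;
    public float value2;
}

// Define a new copy function to use with CUDA, assuming running under Linux
// (the library name "cuda" is an assumption; it is expected to resolve to libcuda.so there)
[DllImport("cuda")]
public static extern CUResult cuMemcpyHtoD(CUdeviceptr dst, Test[] src, uint bytes);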

The definition above declares a function capable of copying data from an array of Test objects to device memory.
But it may not always be convenient.

2. The dynamic, simpler way

Well, .NET offers one more possibility for converting .NET objects into their native representation, without using “unsafe” mechanisms.

For this purpose, there is an object called “GCHandle”. This object provides advanced control over the .NET garbage collector, to lock objects in memory and get their native pointer (IntPtr in .NET).

Since all copy functions in CUDA.NET support the IntPtr data type, one can use this mechanism as a generic way to copy data to the GPU. In practice, when a user calls one of the existing copy functions, this exact process is performed.

Again, consider the Test structure we created before.

// Getting native handle from an array
Test[] data = new Test[100];
// Fill in the array values...
GCHandle ptr = GCHandle.Alloc(data, GCHandleType.Pinned);
IntPtr src = ptr.AddrOfPinnedObject();
// Now copy to the GPU memory from this pointer...
// When finished, don't forget to free the GCHandle!

This is a simple process for exposing complex .NET data types to CUDA and CUDA.NET to be processed by the GPU.
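
The pinning pattern can be tried without a GPU. The following sketch pins an array of structures, reads the first field back through the native pointer, and always releases the handle in a finally block. The SampleItem structure and the PinningSketch class are hypothetical names used only for illustration; a real program would pass the pointer and byte count to one of the CUDA.NET copy functions instead of reading from it.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical blittable structure, laid out sequentially like its native counterpart.
[StructLayout(LayoutKind.Sequential)]
public struct SampleItem
{
    public int Value1;
    public float Value2;
}

public static class PinningSketch
{
    // Pins the array, reads the first int through the native pointer, then unpins.
    public static int ReadFirstValueThroughPointer(SampleItem[] data)
    {
        GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            IntPtr src = handle.AddrOfPinnedObject();
            // A copy function taking an IntPtr would be called here instead.
            return Marshal.ReadInt32(src);
        }
        finally
        {
            // Always release the handle, otherwise the array stays pinned.
            handle.Free();
        }
    }

    // Number of bytes a native copy of the whole array would transfer.
    public static int ByteCount(SampleItem[] data)
    {
        return data.Length * Marshal.SizeOf(typeof(SampleItem));
    }
}
```

Wrapping the pinned region in try/finally keeps the array pinned only for the duration of the copy, which limits the impact on the garbage collector.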

In the next article we will present the new SizeT object we added for portability between 32 and 64 bit systems.

New CUDA.NET Release (2.3.6)

Hi Everyone,

We’ve just released the latest version (2.3.6) of the CUDA.NET library.

Besides supporting the latest features of CUDA 2.3 (double precision FFT, advanced memory allocation and more), we added more features to the runtime and graphics (DX/GL) APIs to better support 32/64 bit systems and be portable.

Following this article we will publish a series of articles presenting the new constructs we added, as well as native interoperability, which is always an issue with .NET code and the advanced demands of applications. They intend to show how to create code that is portable between systems and how to pass complex structures and data types to the GPU.


CUDA.NET – Case studies, call for contribution

We are pleased to announce a call for contribution for case studies and customer stories using CUDA.NET to be presented in our web site.

We invite organizations, research institutes and individuals to tell us about their use of CUDA.NET for different purposes – developing a product, researching a variety of scientific fields and more.

Users willing to contribute their story are invited to send their details to the following address: and we will contact them soon.

Thank you for your cooperation.

CUDA.NET 2.2 released

We are happy to announce the release of CUDA.NET version 2.2.

This release aligns with CUDA 2.2 API and features, and provides further improvements with CUDA.NET.
To download page.

A few of the additions/changes:

  • Supporting CUDA 2.2 API (zero copy etc.)
  • CUDA class supports all driver functions, adding a few missing texture functions into the API
  • Removing double precision FFT routines from CUFFT – the functions were there for future support, but are no longer available
  • Adding MSDN/CHM based documentation for the library
  • Extending the runtime API support to allow various memory copies and the latest 2.2 API

We hope you will all find this release useful.

You are invited to send us comments about usage, and about the library in general, to help improve it.
You can send all that information to

jCUDA 1.1 released

jCUDA version 1.1 is released to the public. This version adds many improvements over the previous 1.0.1 release.


  • Adding object oriented support for CUDA, OpenGL and CUFFT functionality
  • Splitting FFT and CUDA native libraries to operate as standalone
  • Extending native interface to provide more functionality (NativeUtiles.getPointerSize method)

You may download the new release from: jCUDA.

Using Hoopoe File System (HoopoeFS)

The Hoopoe File System service reference can be found at:

This post will present the File System interface to Hoopoe distributing engine. The File System (FS) interface can be used by users to transfer data files to be processed by Hoopoe with CUDA computing kernels. After processing completes, the same interface can be used to read back computed results.


  • Features
  • General terms
  • API description
  • API examples
    • Creating new instance
    • Authenticating
    • Creating a file
    • Creating a directory
    • Creating a file under a sub-directory
    • Deleting a file
    • Writing data into files
    • Reading data from files

1. Features

HoopoeFS exposes a simple interface for data and file management. In general, most features offered by general OS file systems are provided by the HoopoeFS service, giving users high flexibility.

Taking security into consideration, every user is provided with a completely isolated environment, so no special security functions need to be used or exposed: every user sees, and is able to access, only the files he generated or uploaded.

To keep things simple, the API provided by HoopoeFS is generic, but there are a few limitations on user operations and capabilities. For example, a user is allowed to place files in the root directory or under sub-directories, but may create only one level of sub-directories, each able to contain additional files.

2. General terms

As previously mentioned, HoopoeFS provides all general constructs for working with files and directories.

A file is simply a container for data, either raw or compressed, and can be named using every supported character.

A directory is a container for files. The root directory is provided, and further sub-directories can be created by the user.

3. API description

For the data constructs (File, Directory), the following management functions are provided:

  • CreateFile/CreateDirectory – creates a new file or directory, respectively. Calling these functions is required as a first step prior to accessing the file or directory.
  • DeleteFile/DeleteDirectory – deletes a previously created file or directory.
  • RenameFile/RenameDirectory – renames an existing file or directory.
  • WriteData/ReadData – modifies the content of a file (write) or reads content from a specific file.
  • GetFileSize – returns the number of bytes in a file.

For general operation, a few more functions are provided:

  • Authenticate – returns a value indicating whether the user is registered and recognized by HoopoeFS
  • IsUserOverQuota – returns a value indicating whether the user has exceeded the allowed storage space. In such case the user cannot create new files or directories, but can delete and read contents of files.

4. API examples

4.1 Creating new instance

In order to work with HoopoeFS, it is necessary to create a new instance of the HoopoeFS class:
HoopoeFS hfs = new HoopoeFS();

4.2 Authenticating

It is good practice to check with HoopoeFS that we are authenticated before performing further operations. Every operation to be performed must use this authentication level:

Authentication a = new Authentication();
a.User = "test@company_alias";
a.Password = "my_password";
hfs.AuthenticationValue = a;

4.3 Creating a file 

Creating a file is a simple task with HoopoeFS API:


4.4 Creating a directory

Following the previous example, a similar API can be used to create a new sub-directory (all directories are created under the root):


4.5 Creating a file under a sub-directory

Once a sub-directory has been created, any number of files can be created under it.

To do that, the following operations are necessary:

Directory d = new Directory();
d.Name = "test_data";
hfs.DirectoryValue = d;

Note that once hfs.DirectoryValue is set, all file related operations (creating new files, deleting, modifying etc.) apply to that directory, so when work with the directory ends, hfs.DirectoryValue should be set to null.

4.6 Deleting a file

A very straightforward operation:


4.7 Writing data into files

The concept of writing data to files within Hoopoe maps to the real world, with a simplified API.

byte[] data = new byte[512*1024];
// Load/generate data

// Write the data, starting at offset 0 of the file
long offset = 0;
hfs.WriteFile("temp.dat", data, offset);

// To write more data, advance to a new offset
// past the bytes already written
offset += data.Length;

4.8 Reading data from files

The same rules for writing data apply to reading it from files.

// Read the data, starting at offset 0 of the file
long offset = 0;
// Determines the amount of bytes to read
int length = 512*1024;
byte[] data = hfs.ReadFile("temp.dat", offset, length);

// Past this point, data will contain the bytes
// that were read.
// In case fewer bytes than requested were read,
// the size of data will be consistent with the actual
// bytes read.

// To read more data, advance to a new offset
// past the bytes already read
offset += data.Length;
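
The offset handling from sections 4.7 and 4.8 can be combined into loops that transfer a large buffer in several chunks. The sketch below does not talk to the Hoopoe service: InMemoryFileStore is a hypothetical in-memory stand-in whose WriteFile and ReadFile methods only mirror the call shapes used in the examples above, and ChunkedTransfer is likewise an illustrative name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a file service; NOT the HoopoeFS API.
public class InMemoryFileStore
{
    private readonly Dictionary<string, byte[]> files = new Dictionary<string, byte[]>();

    // Writes data at the given offset, growing the file when needed.
    public void WriteFile(string name, byte[] data, long offset)
    {
        byte[] existing;
        if (!files.TryGetValue(name, out existing)) existing = new byte[0];
        long newLength = Math.Max(existing.Length, offset + data.Length);
        byte[] updated = new byte[newLength];
        Array.Copy(existing, updated, existing.Length);
        Array.Copy(data, 0, updated, offset, data.Length);
        files[name] = updated;
    }

    // Returns up to 'length' bytes; fewer (possibly none) near the end of the file.
    public byte[] ReadFile(string name, long offset, int length)
    {
        byte[] content = files[name];
        long available = Math.Max(0, Math.Min(length, content.Length - offset));
        if (available == 0) return new byte[0];
        byte[] result = new byte[available];
        Array.Copy(content, offset, result, 0, available);
        return result;
    }
}

public static class ChunkedTransfer
{
    // Writes 'source' in pieces of at most chunkSize bytes, advancing the offset each time.
    public static void WriteInChunks(InMemoryFileStore store, string name, byte[] source, int chunkSize)
    {
        long offset = 0;
        while (offset < source.Length)
        {
            int count = (int)Math.Min(chunkSize, source.Length - offset);
            byte[] chunk = new byte[count];
            Array.Copy(source, offset, chunk, 0, count);
            store.WriteFile(name, chunk, offset);
            offset += count;
        }
    }

    // Reads until an empty chunk signals the end of the file.
    public static byte[] ReadInChunks(InMemoryFileStore store, string name, int chunkSize)
    {
        List<byte> all = new List<byte>();
        long offset = 0;
        while (true)
        {
            byte[] chunk = store.ReadFile(name, offset, chunkSize);
            if (chunk.Length == 0) break;
            all.AddRange(chunk);
            offset += chunk.Length;
        }
        return all.ToArray();
    }
}
```

The read loop relies on the behavior described in section 4.8, where the returned array is as long as the number of bytes actually read, so an empty result marks the end of the data.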