Announcing CLFORTRAN

We are pleased to announce CLFORTRAN for GPGPU.

CLFORTRAN is a new and elegant Fortran module that allows integration of OpenCL with Fortran programs easier than ever.
Taking advantage of Fortran language features, it is written in pure Fortran – aka no C/C++ code is required to utilize the GPGPU.

CLFORTRAN is compatible with all major compilers: GNU, Intel and IBM, and supporting OpenCL 1.2 API.
In addition, it is provided as open source and licensed under LGPL, to allow scientific computing at massive scales and all supported vendors.

You may read more at CLFORTRAN.

Intel® Released the Xeon® Phi™ Processor (MIC)

Intel® have just introduced their Xeon® Phi™ processor to the market, targeting HPC and scientific computing. It is available for purchase and integration into existing systems/platforms/servers.

The new co-processor is a discrete device that runs an operating system of its own and functions as a fully functional computer (though being a co-processor).

Why should anyone be interested in Xeon® Phi™?

It is based on the most common x86 architecture, therefore porting existing code and algorithms should be the easiest possible.
One may also utilize OpenCL™ algorithms to take advantage of the high-parallelism of the Phi™ processor.

In addition, it features 60 cores, 8GB of internal memory (with 320 GB/s) and uses PCIe x16 slot to provide high performance bandwidth throughput.
With almost 1 TFLOPS of double precision, Phi™ is competent to very high end GPUs in the market today, but on some aspects, provides better performance and industrial matching than other vendors.

You can read more at:

Contact us for more details and projects regarding Intel® Xeon® Phi™ family of processors.


We are happy to announce the availability of NVIDIA® CARMA® platforms, providing an ARM based system (Tegra® 3) cooperating with an internal and a discrete GPU (NVIDIA® Quadro® 1000M utilizing 96 cores).

NVIDIA® CARMA® aims to provide an energy-effecient HPC solution, targeting low-power, small form factor environments.
Software support is provided for CUDA® 5, with FFT, BLAS, image processing, accelerated video decoding/encoding (H.264, H.265, VC-1, MPEG-2 etc.) and much more.

Both Linux, Android and Windows® 8 operating systems are supported.

Hardware platforms are available in ruggedized form factor as well for outdoor, defense or military purposes.

This opens new opportunities for embedded like GPGPU solutions, for digital signage, medical imaging and other handheld/mobile usage.
For more information:

OpenCL.NET 1.1.44 Released

We are pleased to annouce the release of a new OpenCL.NET library supporting OpenCL™ 1.1 (revision 44) driver API by the Khronos group.

This release includes all API functions described by the standard and targets cross-platform HPC and GPGPU applications.
OpenCL.NET can be used to develop solutions under Windows, Linux or Mac, also integrating with Matlab to accelerate algorithms of different kinds.
All vendors are supported, including NVIDIA®, Intel® and AMD.

For more information: OpenCL.NET.

01 – What is CUDA.NET

CUDA.NET is a library that provides access to GPU computing resources on top (using) CUDA API by NVIDIA.

This article is divided into the following topics:

  • What is a GPU?
  • Overview of CUDA
  • Introduction to CUDA.NET
  • Typical Applications
  • Supported Platforms

What is a GPU?

GPU stands for Graphics Processing Unit.
It is a special, dedicated hardware usually used for graphics (2D, 3D, gaming) but now also employed to computing purposes as well.

GPU is used as a general term to represent a hardware solution and there are various vendors worldwide manufacturing them – although there are many types of GPUs only specific models or generations can be used for computing or with CUDA.

There are benefits for using the GPU as a computing resource – It provides strong computing power compared to other equivalents such as CPU, DSP or other dedicated chips with somewhat ease of programming.
For example, a reasonable GPU with 128 cores can provide about 500 GFLOPS (500 billion floating point operations per second), whereas a 4 core CPU can provide about 90 GFLOPS. The numbers can vary based on multiple parameters, but by means of raw computing power, these numbers provide a rough estimate for the potential in using the GPU.

Overview of CUDA

CUDA stands for Compute Unified Device Architecture and is a software environment created by NVIDIA to provide developers with specific API to utilize the GPU for computing directly, rather than doing graphics (the main purpose of GPUs).

This software environment provides API to enumerate the GPUs available in a system as computational devices, initialize them, allocate memory for each and execute code, actually full management aspects of these computing resources accessible on a computer.

CUDA itself is built with C, provides defined API and further libraries to assist developers, such as FFT and BLAS to perform Fourier transforms or linear algebra calculation, accelerated, on the GPU.

For further, deeper reading of these topics (GPU / CUDA), please follow this link: CUDA.

Introduction to CUDA.NET

As outlined above, the environments available today to GPU developers are mostly based on C and meant for native applications. However there is a need to have the same capabilities from managed (.NET/Java) applications. This is where CUDA.NET enters.

CUDA.NET is mostly an interfacing library, providing the same set of API as CUDA for low-level access, using the same terms and concepts. It is also a pure .NET implementation so one can use it from any .NET language or platform that supports CUDA and .NET (Linux, MacOSX etc.).

In addition to a low-level interface, CUDA.NET provides an object-oriented abstraction over CUDA, using the same objects and terms, but with simplifed access for .NET based applications. The same objects can be shared between both environments, but developers would find the OO interface much more friendly and intuitive for use.

The same set of libraries covered by CUDA is also accessible from CUDA.NET – FFT, BLAS and upcoming support for new libraries.

Typical Applications

The GPU can be beneficial for applications where computing takes a significant amount of time or is a bottleneck, as well when looking to free other resources and offload computations to the GPU (as it doesn’t affect the system while working in the background).

Fields where a sort of accelerated computing is needs, or processing of multiple elements can benefit the GPU.
To name a few:

  • Image/Video processing (filters, encoding, decoding)
  • Signal processing
  • Finance
  • Oil & gas (Geophysics)
  • Medical imaging
  • Scientific computations, simulations and research

Supported Platforms

As mentioned earlier, CUDA.NET is based on a pure .NET implementation.

It can be used on (assuming the OS supports CUDA): 

  • Windows
    • For desktops/embedded: XP and above
    • For servers: 2003 and above
  • Linux and other UNIX variants
  • Macintosh (MacOSX)

The library is fully compatible with 32 and 64 bit systems of all kinds mentioned above.

00 – Preface

The new CUDA.NET Tutorials category was created to collect and manage resources and materials for developers starting to work and develop with CUDA.NET library for various platforms.

The usual composition will be of articles on specific topics and gradually increasing complexity.

This post will include an additional Table of Contents for published articles as we go.

Table of Contents

  1. Preface


For any question or comment, please contact us through our email address: support (at)

CUDA.NET 3.0.0 Released

Dear all,

We are happy to announce the release of CUDA.NET version 3.0.0.
This release provides support for latest CUDA 3.0 API and few more updates that will make programming with CUDA from .NET easier and faster.


  • Support for CUDA 3.0 API
  • Added memset functions for CUDA class
  • Supporting new graphics interoperability functions
  • Improved generics support for memory operations
  • Added CUDAContextSynchronizer class

Improved memory operations
We employ GCHandle class to be used with generic memory copies in CUDA class. This method allows to work with every data type (existing vectors or user defined) natively in .NET. The implication is that now you can copy existing custom arrays of structures/classes (user data-types) to device with memory copy functions.

This class was added to assist developers in multi-GPU and multi-threaded environments sharing the same device. It uses existing CUDA API to manipulate the context each thread is attached to and provides .NET means to synchronize between threads sharing the same device for different computations.
Find it under the Tools namespace, the documentation includes a description of how to use it.

We hope you will enjoy this release.
As always, please send us comments or suggestions to: