CUDA – CASS

August 20, 2018

OpenCL.NET is now on GitHub

The OpenCL.NET library was recently added to GitHub with full source code, under the MIT license.
The latest version provides access to the OpenCL 2.2 driver API.

March 18, 2014

New CLFORTRAN examples and update

New examples are available on the CLFORTRAN page.

These include:

Querying platform and device information
Creating OpenCL context and command queue
Basic device IO with Fortran arrays and validity testing

In addition, the CLFORTRAN API was improved to better support OpenCL functionality in Fortran.

January 23, 2014

Announcing CLFORTRAN

We are pleased to announce CLFORTRAN for GPGPU.

CLFORTRAN is a new and elegant Fortran module that makes integrating OpenCL with Fortran programs easier than ever.
Taking advantage of Fortran language features, it is written in pure Fortran, so no C/C++ code is required to utilize the GPGPU.

CLFORTRAN is compatible with all major compilers (GNU, Intel and IBM) and supports the OpenCL 1.2 API.
In addition, it is provided as open source and licensed under the LGPL, to allow scientific computing at massive scale across all supported vendors.

You may read more at CLFORTRAN.

February 4, 2013

Intel® Released the Xeon® Phi™ Processor (MIC)

Intel® has just introduced its Xeon® Phi™ processor to the market, targeting HPC and scientific computing. It is available for purchase and integration into existing systems, platforms and servers.

The new co-processor is a discrete device that runs its own operating system and behaves like a complete computer, even though it is attached to a host as a co-processor.

Why should anyone be interested in Xeon® Phi™?

It is based on the widely used x86 architecture, so porting existing code and algorithms should be comparatively easy.
One may also use OpenCL™ algorithms to take advantage of the high parallelism of the Phi™ processor.

In addition, it features 60 cores and 8 GB of on-board memory (with 320 GB/s of bandwidth), and uses a PCIe x16 slot for high-throughput transfers to and from the host.
With almost 1 TFLOPS of double precision, Phi™ is competitive with the very high-end GPUs on the market today, and in some respects may provide better performance and a better fit for industrial use than other vendors' products.
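
As a rough illustration of where a figure like "almost 1 TFLOPS" can come from, the sketch below multiplies cores by clock rate by floating-point operations per cycle, then divides by the memory bandwidth to see how many operations a kernel must perform per byte before compute, rather than memory, becomes the limit. The core count and bandwidth are taken from the post; the clock rate and the per-cycle throughput are assumptions made for this example only.

```python
# Back-of-the-envelope peak throughput for a many-core co-processor.
# CLOCK_GHZ and DP_FLOPS_PER_CYCLE are illustrative assumptions,
# not figures taken from the post.

def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak = cores x clock (GHz) x FLOPs per core per cycle."""
    return cores * clock_ghz * flops_per_cycle

CORES = 60                  # from the post
MEM_BANDWIDTH_GBS = 320.0   # from the post
CLOCK_GHZ = 1.053           # assumed clock rate
DP_FLOPS_PER_CYCLE = 16     # assumed: 8 doubles per vector x 2 (fused multiply-add)

phi_peak = peak_gflops(CORES, CLOCK_GHZ, DP_FLOPS_PER_CYCLE)

# FLOPs a kernel must perform per byte read from on-board memory
# before compute, rather than memory bandwidth, becomes the limit.
flops_per_byte = phi_peak / MEM_BANDWIDTH_GBS

print(f"peak: {phi_peak:.0f} GFLOPS, balance: {flops_per_byte:.2f} FLOP/byte")
```

Under these assumptions the peak lands just above 1000 GFLOPS, and a kernel needs roughly three floating-point operations per byte of memory traffic to approach it.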

You can read more at:

http://www.intel.com/content/www/us/en/processors/xeon/xeon-phi-detail.html

September 16, 2010

01 – What is CUDA.NET

CUDA.NET is a library that provides access to GPU computing resources on top of (that is, using) the CUDA API by NVIDIA.

This article is divided into the following topics:

What is a GPU?
Overview of CUDA
Introduction to CUDA.NET
Typical Applications
Supported Platforms

What is a GPU?

GPU stands for Graphics Processing Unit.
It is special, dedicated hardware usually used for graphics (2D, 3D, gaming) but now also employed for computing purposes.

GPU is used as a general term for a hardware solution, and various vendors worldwide manufacture them. Although there are many types of GPUs, only specific models or generations can be used for computing or with CUDA.

There are benefits to using the GPU as a computing resource: it provides strong computing power compared to alternatives such as a CPU, DSP or other dedicated chips, and it is relatively easy to program.
For example, a reasonable GPU with 128 cores can provide about 500 GFLOPS (500 billion floating-point operations per second), whereas a 4-core CPU can provide about 90 GFLOPS. The numbers vary with multiple parameters, but in terms of raw computing power they give a rough estimate of the potential of the GPU.
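
To make the estimate above concrete, here is a minimal sketch of the usual peak-throughput arithmetic: cores times clock rate times floating-point operations per core per cycle. The clock rates and per-cycle figures are assumptions chosen to land near the numbers quoted above; they do not describe any specific product.

```python
# Rough peak-throughput estimate: cores x clock x FLOPs per core per cycle.
# Clock rates and per-cycle figures are illustrative assumptions;
# real hardware varies.

def peak_gflops(cores, clock_ghz, flops_per_cycle):
    return cores * clock_ghz * flops_per_cycle

# Assumed GPU: 128 cores at 1.35 GHz, 3 single-precision FLOPs per cycle.
gpu = peak_gflops(cores=128, clock_ghz=1.35, flops_per_cycle=3)

# Assumed CPU: 4 cores at 2.8 GHz, 8 single-precision FLOPs per cycle (SIMD).
cpu = peak_gflops(cores=4, clock_ghz=2.8, flops_per_cycle=8)

print(f"GPU ~{gpu:.0f} GFLOPS, CPU ~{cpu:.0f} GFLOPS, ratio ~{gpu / cpu:.1f}x")
```

These are theoretical peaks. Sustained performance depends on memory access patterns and on how well the algorithm maps to the hardware.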

Overview of CUDA

CUDA stands for Compute Unified Device Architecture. It is a software environment created by NVIDIA that gives developers a dedicated API for using the GPU directly for computing, rather than for graphics (the main purpose of GPUs).

This software environment provides an API to enumerate the GPUs available in a system as computational devices, initialize them, allocate memory on each and execute code. In short, it covers the full management of these computing resources on a computer.
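
The enumeration and initialization steps can be sketched by calling the CUDA driver API directly through Python's ctypes module. This is only an illustration of the native calls (cuInit, cuDeviceGetCount, cuDeviceGet, cuDeviceGetName), not CUDA.NET code. The library file names tried below are assumptions about a typical installation, and the sketch simply returns an empty list when no driver is present. Memory allocation and kernel execution are left out.

```python
import ctypes
import sys

def load_cuda_driver():
    """Try to load the CUDA driver library; return None if unavailable."""
    if sys.platform.startswith("win"):
        loader, names = ctypes.WinDLL, ["nvcuda.dll"]           # assumed name
    else:
        loader, names = ctypes.CDLL, ["libcuda.so.1", "libcuda.so",
                                      "libcuda.dylib"]          # assumed names
    for name in names:
        try:
            return loader(name)
        except OSError:
            continue
    return None

def list_cuda_devices():
    """Return the names of CUDA devices, or [] when no driver is usable."""
    cuda = load_cuda_driver()
    if cuda is None:
        return []
    # cuInit must be called before any other driver function; 0 = success.
    if cuda.cuInit(0) != 0:
        return []
    count = ctypes.c_int(0)
    if cuda.cuDeviceGetCount(ctypes.byref(count)) != 0:
        return []
    names = []
    for ordinal in range(count.value):
        device = ctypes.c_int(0)            # a device handle is a plain int
        if cuda.cuDeviceGet(ctypes.byref(device), ordinal) != 0:
            continue
        buf = ctypes.create_string_buffer(256)
        if cuda.cuDeviceGetName(buf, len(buf), device) == 0:
            names.append(buf.value.decode("utf-8", "replace"))
    return names

if __name__ == "__main__":
    print(list_cuda_devices() or "no CUDA devices found")
```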

CUDA itself is built with C. It provides a defined API and further libraries to assist developers, such as FFT and BLAS for performing Fourier transforms or linear algebra calculations, accelerated on the GPU.

For further, deeper reading of these topics (GPU / CUDA), please follow this link: CUDA.

Introduction to CUDA.NET

As outlined above, the environments available today to GPU developers are mostly based on C and meant for native applications. However, there is a need for the same capabilities in managed (.NET/Java) applications. This is where CUDA.NET comes in.

CUDA.NET is mostly an interfacing library, providing the same API set as CUDA for low-level access, using the same terms and concepts. It is also a pure .NET implementation, so one can use it from any .NET language and on any platform that supports both CUDA and .NET (Linux, MacOSX etc.).

In addition to a low-level interface, CUDA.NET provides an object-oriented abstraction over CUDA, using the same objects and terms, but with simplified access for .NET-based applications. The same objects can be shared between both environments, but developers will find the OO interface much friendlier and more intuitive to use.

The same set of libraries covered by CUDA is also accessible from CUDA.NET: FFT and BLAS, with support for new libraries upcoming.

Typical Applications

The GPU can be beneficial for applications where computing takes a significant amount of time or is a bottleneck, as well as when looking to free other resources by offloading computations to the GPU (since it works in the background without loading the rest of the system).

Fields that need some form of accelerated computing, or that process many elements at once, can benefit from the GPU.
To name a few:

Image/Video processing (filters, encoding, decoding)
Signal processing
Finance
Oil & gas (Geophysics)
Medical imaging
Scientific computations, simulations and research

Supported Platforms

As mentioned earlier, CUDA.NET is based on a pure .NET implementation.

It can be used on (assuming the OS supports CUDA):

Windows
- For desktops/embedded: XP and above
- For servers: 2003 and above
Linux and other UNIX variants
Macintosh (MacOSX)

The library is fully compatible with 32-bit and 64-bit systems of all the kinds mentioned above.

August 3, 2010

00 – Preface

The new CUDA.NET Tutorials category was created to collect and manage resources and materials for developers starting to work with the CUDA.NET library on various platforms.

It will usually consist of articles on specific topics, gradually increasing in complexity.

This post will include an additional Table of Contents for published articles as we go.

Preface

For any question or comment, please contact us through our email address: support (at) cass-hpc.com.

June 21, 2010

CUDA.NET 3.0.0 Released

Dear all,

We are happy to announce the release of CUDA.NET version 3.0.0.
This release provides support for the latest CUDA 3.0 API and a few more updates that make programming with CUDA from .NET easier and faster.

Additions:

Support for CUDA 3.0 API
Added memset functions for CUDA class
Supporting new graphics interoperability functions
Improved generics support for memory operations
Added CUDAContextSynchronizer class

Improved memory operations
We employ the GCHandle class for generic memory copies in the CUDA class. This approach lets you work with every data type (existing vectors or user-defined types) natively in .NET. As a result, you can now copy existing custom arrays of structures/classes (user data types) to the device with the memory copy functions.
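
The idea of copying an array of user-defined structures as raw bytes can be illustrated with an analogy in Python's ctypes module. This is not CUDA.NET code and does not use GCHandle: the Particle type, the copy_to_buffer helper and the "device" buffer are all hypothetical, and the destination is ordinary host memory standing in for device memory. The point is only that once an array has a fixed binary layout and a stable address, a generic copy needs nothing more than that address and a byte count.

```python
import ctypes

class Particle(ctypes.Structure):
    """Hypothetical user-defined element type with a fixed binary layout."""
    _fields_ = [("x", ctypes.c_float),
                ("y", ctypes.c_float),
                ("mass", ctypes.c_float),
                ("id", ctypes.c_int32)]

def copy_to_buffer(host_array, dest):
    """Copy the raw bytes of a ctypes array into a destination buffer.

    `dest` stands in for device memory; a real host-to-device copy would
    hand the same address and byte count to the driver instead.
    """
    nbytes = ctypes.sizeof(host_array)
    if nbytes > ctypes.sizeof(dest):
        raise ValueError("destination buffer is too small")
    ctypes.memmove(dest, ctypes.addressof(host_array), nbytes)
    return nbytes

particles = (Particle * 3)(
    Particle(0.0, 1.0, 2.5, 1),
    Particle(3.0, 4.0, 1.5, 2),
    Particle(5.0, 6.0, 0.5, 3),
)
fake_device = ctypes.create_string_buffer(ctypes.sizeof(particles))
copied = copy_to_buffer(particles, fake_device)

# Reinterpret the copied bytes as the same struct type to verify the layout.
round_trip = (Particle * 3).from_buffer_copy(fake_device)
print(copied, round_trip[1].x, round_trip[2].id)
```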

CUDAContextSynchronizer
This class was added to assist developers in multi-GPU and multi-threaded environments where threads share the same device. It uses the existing CUDA API to manipulate the context each thread is attached to, and provides .NET means to synchronize between threads sharing the same device for different computations.
You can find it under the Tools namespace; the documentation describes how to use it.
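
The general pattern, several threads taking turns on one shared device context, can be sketched as follows. This is a simplified illustration and not the CUDAContextSynchronizer API: the class name, its methods and the dictionary standing in for a device context are all hypothetical, and the points where a real implementation would attach or detach the context are only marked with comments.

```python
import threading

class ContextSynchronizer:
    """Hypothetical helper: lets one thread at a time own a shared context."""

    def __init__(self, context):
        self._context = context          # stand-in for a device context handle
        self._lock = threading.Lock()
        self.owner = None                # name of the thread holding the context

    def __enter__(self):
        self._lock.acquire()             # wait until no other thread owns it
        self.owner = threading.current_thread().name
        # A real implementation would attach the context to this thread here.
        return self._context

    def __exit__(self, exc_type, exc, tb):
        # A real implementation would detach the context from this thread here.
        self.owner = None
        self._lock.release()
        return False                     # do not swallow exceptions

def worker(sync, results, value):
    with sync as ctx:
        # Only one thread at a time runs this block against the shared context.
        ctx["launches"] += 1
        results.append(value * value)

shared_context = {"launches": 0}         # simulated context state
sync = ContextSynchronizer(shared_context)
results = []
threads = [threading.Thread(target=worker, args=(sync, results, i))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared_context["launches"], sorted(results))
```

Wrapping the acquire and release in a context manager means the context is released even when the work inside the block raises an exception.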

We hope you will enjoy this release.
As always, please send us comments or suggestions to: support@cass-hpc.com.

May 11, 2010

Vicodeo™ – Accelerated Video Decoding Library

Dear all,

We are glad to introduce a new library for video decoding, Vicodeo™, featuring accelerated performance for faster-than-real-time decoding of H.264, MPEG-2 (and more) video streams in managed environments (.NET / Java).

Video processing has become a computing-intensive task. Being able to accelerate decoding and other processing tasks opens the door to many types of applications and uses of video: high-quality films, security/surveillance cameras, live events, video conversations over the web and much more.

Our library provides many capabilities beyond faster-than-real-time decoding of 1080p (Full HD) streams:

Codec support: H.264, MPEG-2, VC-1 and more
Color space conversion from YUV 4:2:0 to RGB (accelerated)
Integrated parser for elementary/transport streams and video packets
Simple integration with DirectX or OpenGL
Faster than real-time decoding for 1080p even on low-end platforms
Optional immediate decoding of frames, without buffering
And more!
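
For readers unfamiliar with the color space conversion mentioned in the list above, here is a minimal, unaccelerated sketch of converting a planar YUV 4:2:0 frame to RGB. It is not Vicodeo™ code. The function names are hypothetical, the full-range BT.601 coefficients are an assumption (decoders may use other matrices or ranges), and the frame dimensions are assumed to be even.

```python
def clamp8(value):
    """Round and clamp to the 0..255 range of an 8-bit channel."""
    return max(0, min(255, int(round(value))))

def yuv420_to_rgb(y_plane, u_plane, v_plane, width, height):
    """Convert planar YUV 4:2:0 to a list of rows of (R, G, B) tuples.

    y_plane has width*height samples; u_plane and v_plane each have
    (width//2)*(height//2) samples, one per 2x2 block of pixels.
    Uses full-range BT.601 coefficients (an assumption for this sketch).
    """
    chroma_width = width // 2
    rows = []
    for row in range(height):
        out = []
        for col in range(width):
            y = y_plane[row * width + col]
            # Each chroma sample is shared by a 2x2 block of luma samples.
            c = (row // 2) * chroma_width + (col // 2)
            u = u_plane[c] - 128
            v = v_plane[c] - 128
            r = y + 1.402 * v
            g = y - 0.344136 * u - 0.714136 * v
            b = y + 1.772 * u
            out.append((clamp8(r), clamp8(g), clamp8(b)))
        rows.append(out)
    return rows

# 2x2 frame: four luma samples sharing one neutral chroma pair (gray pixels).
frame = yuv420_to_rgb([16, 128, 200, 255], [128], [128], width=2, height=2)
print(frame)
```

Every output pixel is computed independently of the others, which is why this conversion is a natural candidate for GPU acceleration.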

For more information: Video Decoding.

January 15, 2010

GECCO 2010 – GPU Competition

Dear all,

The "GPUs for Genetic and Evolutionary Computation" competition at GECCO (Genetic and Evolutionary Computation Conference) will take place this year on July 7th to 11th in Portland, Oregon, USA.

Rules and competition guidelines are published on the website provided by the link below.
Registration is open until June 4th, 2010.

Link to the competition GECCO 2010.

Thanks to Dr. Simon Harding, Memorial University, Canada, for the notes and update.

December 7, 2009

OpenCL.NET 1.0.48 Released

Hello,

We are happy to announce the availability of the long-awaited OpenCL.NET 1.0.48 library.

This version aligns with the OpenCL 1.0.48 standard and fully conforms to the latest NVIDIA drivers for OpenCL (as well as other supported platforms).

In brief, this release of the standard added a few API functions and modified some others, to truly allow heterogeneous computing on a single system. An application can query for multiple computing devices on the system, including devices from different vendors (for example, recognizing both the CPU and a GPU as compute resources), so that consuming different computing resources can be transparent.
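
The kind of query described above can be sketched against the native OpenCL API using Python's ctypes module. This is an illustration of the underlying calls (clGetPlatformIDs, clGetPlatformInfo, clGetDeviceIDs, clGetDeviceInfo), not OpenCL.NET code. The library file names are assumptions about a typical installation, the helper names are hypothetical, and the sketch returns an empty result when no OpenCL runtime is found.

```python
import ctypes
import sys

CL_SUCCESS = 0
CL_PLATFORM_VENDOR = 0x0903
CL_DEVICE_NAME = 0x102B
CL_DEVICE_TYPE_ALL = 0xFFFFFFFF

def load_opencl():
    """Try to load the OpenCL runtime; return None if it is not installed."""
    if sys.platform.startswith("win"):
        loader, names = ctypes.WinDLL, ["OpenCL.dll"]            # assumed name
    elif sys.platform == "darwin":
        loader = ctypes.CDLL
        names = ["/System/Library/Frameworks/OpenCL.framework/OpenCL"]
    else:
        loader, names = ctypes.CDLL, ["libOpenCL.so.1", "libOpenCL.so"]
    for name in names:
        try:
            return loader(name)
        except OSError:
            continue
    return None

def info_string(getter, handle, param):
    """Fetch a string property through a clGet*Info-style function."""
    size = ctypes.c_size_t(0)
    if getter(handle, param, 0, None, ctypes.byref(size)) != CL_SUCCESS:
        return ""
    buf = ctypes.create_string_buffer(size.value)
    if getter(handle, param, size.value, buf, None) != CL_SUCCESS:
        return ""
    return buf.value.decode("utf-8", "replace")

def list_opencl_devices():
    """Return {platform vendor: [device names]} for every OpenCL platform."""
    cl = load_opencl()
    if cl is None:
        return {}
    u32, vp, sz = ctypes.c_uint32, ctypes.c_void_p, ctypes.c_size_t
    # Declare the C signatures so 64-bit handles and sizes are passed intact.
    cl.clGetPlatformIDs.argtypes = [u32, ctypes.POINTER(vp), ctypes.POINTER(u32)]
    cl.clGetDeviceIDs.argtypes = [vp, ctypes.c_uint64, u32,
                                  ctypes.POINTER(vp), ctypes.POINTER(u32)]
    for fn in (cl.clGetPlatformInfo, cl.clGetDeviceInfo):
        fn.argtypes = [vp, u32, sz, vp, ctypes.POINTER(sz)]

    count = u32(0)
    if cl.clGetPlatformIDs(0, None, ctypes.byref(count)) != CL_SUCCESS:
        return {}
    if count.value == 0:
        return {}
    platforms = (vp * count.value)()
    if cl.clGetPlatformIDs(count.value, platforms, None) != CL_SUCCESS:
        return {}

    found = {}
    for platform in platforms:
        vendor = info_string(cl.clGetPlatformInfo, platform, CL_PLATFORM_VENDOR)
        names, n = [], u32(0)
        status = cl.clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 0, None,
                                   ctypes.byref(n))
        if status == CL_SUCCESS and n.value:
            devices = (vp * n.value)()
            if cl.clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, n.value,
                                 devices, None) == CL_SUCCESS:
                names = [info_string(cl.clGetDeviceInfo, d, CL_DEVICE_NAME)
                         for d in devices]
        found.setdefault(vendor, []).extend(names)
    return found

if __name__ == "__main__":
    print(list_opencl_devices() or "no OpenCL platforms found")
```

Grouping devices by platform vendor shows CPUs and GPUs from different vendors side by side as compute resources.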

For further details about the standard's features and changes, please consult the Khronos website.

For OpenCL.NET page and download, click here.

As always, you are invited to contact us at: support@cass-hpc.com.