
CUDA Driver API


CUresult cuCtxGetApiVersion ( CUcontext ctx, unsigned int* version ) gets the context's API version. If ctx is NULL, it returns the API version used to create the currently bound context. CUresult cuCtxCreate ( CUcontext* pctx, unsigned int flags, CUdevice dev ) creates a CUDA context. Among the creation flags, CU_CTX_SCHED_BLOCKING_SYNC instructs CUDA to block the CPU thread on a synchronization primitive when waiting for the GPU to finish work.
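A minimal sketch of this pairing, assuming a system with at least one CUDA device (error checking abbreviated for brevity):

```c
/* Hedged sketch: create a context and query its API version. */
#include <cuda.h>
#include <stdio.h>

int main(void) {
    CUdevice dev;
    CUcontext ctx;
    unsigned int version;

    cuInit(0);                    /* must precede any other driver call */
    cuDeviceGet(&dev, 0);
    /* CU_CTX_SCHED_BLOCKING_SYNC: block the CPU thread while waiting */
    cuCtxCreate(&ctx, CU_CTX_SCHED_BLOCKING_SYNC, dev);

    cuCtxGetApiVersion(ctx, &version);
    printf("context API version: %u\n", version);

    cuCtxDestroy(ctx);
    return 0;
}
```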

CUresult cuCtxDestroy ( CUcontext ctx ) destroys a CUDA context. See also: cuCtxCreate, cuCtxGetApiVersion, cuCtxGetCacheConfig, cuCtxGetDevice, cuCtxGetFlags, cuCtxGetLimit, cuCtxPopCurrent, cuCtxPushCurrent, cuCtxSetCacheConfig, cuCtxSetLimit, cuCtxSynchronize.

CU_DEVICE_ATTRIBUTE_MULTI_GPU_BOARD_GROUP_ID: unique identifier for a group of devices associated with the same board. CU_LIMIT_MALLOC_HEAP_SIZE: size in bytes of the heap used by the malloc() and free() device system calls. The function cuDeviceGetAttribute() can be used with CU_DEVICE_ATTRIBUTE_COMPUTE_MODE to determine the compute mode of the device. http://docs.nvidia.com/cuda/cuda-driver-api/

CUDA Driver API Example

CUresult cuDeviceGet ( CUdevice* device, int ordinal ) returns a handle to a compute device. CU_COMPUTEMODE_PROHIBITED: compute-prohibited mode; the device is prohibited from creating new CUDA contexts. CU_CTX_SCHED_SPIN: instruct CUDA to actively spin when waiting for results from the GPU. For cache configuration requests, the driver will use the requested configuration if possible, but it is free to choose a different configuration if required to execute the function.
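As a sketch of device enumeration with these calls (assuming the driver is installed; error checking omitted):

```c
/* Hedged sketch: enumerate devices and print each one's compute mode. */
#include <cuda.h>
#include <stdio.h>

int main(void) {
    int count, mode;
    CUdevice dev;
    char name[256];

    cuInit(0);
    cuDeviceGetCount(&count);
    for (int i = 0; i < count; ++i) {
        cuDeviceGet(&dev, i);
        cuDeviceGetName(name, sizeof name, dev);
        cuDeviceGetAttribute(&mode, CU_DEVICE_ATTRIBUTE_COMPUTE_MODE, dev);
        printf("device %d: %s, compute mode %d\n", i, name, mode);
    }
    return 0;
}
```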

cuCtxGetStreamPriorityRange will return '0' in both *leastPriority and *greatestPriority if the current context's device does not support stream priorities (see cuDeviceGetAttribute). For cuCtxDestroy, if ctx is current to the calling thread then ctx will also be popped from the current thread's context stack (as though cuCtxPopCurrent() were called). CUresult cuCtxGetLimit ( size_t* pvalue, CUlimit limit ) returns resource limits. The device-runtime synchronization depth limit can be set smaller than the default or up to the maximum launch depth of 24.
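Querying limits with cuCtxGetLimit might look like the following sketch (assuming a working context; error checking omitted):

```c
/* Hedged sketch: read a few resource limits of the current context. */
#include <cuda.h>
#include <stdio.h>

int main(void) {
    CUdevice dev;
    CUcontext ctx;
    size_t stack, heap;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    cuCtxGetLimit(&stack, CU_LIMIT_STACK_SIZE);       /* per-thread stack */
    cuCtxGetLimit(&heap, CU_LIMIT_MALLOC_HEAP_SIZE);  /* device malloc heap */
    printf("stack: %zu bytes, heap: %zu bytes\n", stack, heap);

    cuCtxDestroy(ctx);
    return 0;
}
```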

See also: cuCtxCreate, cuCtxDestroy, cuCtxGetApiVersion, cuCtxGetCacheConfig, cuCtxGetDevice, cuCtxGetFlags, cuCtxGetLimit, cuCtxPopCurrent, cuCtxPushCurrent, cuCtxSetLimit, cuCtxSynchronize, cuFuncSetCacheConfig. CUresult cuCtxSetCurrent ( CUcontext ctx ) binds the specified CUDA context to the calling CPU thread. Attempting to set an unsupported limit on devices of compute capability less than 2.0 will result in the error CUDA_ERROR_UNSUPPORTED_LIMIT being returned. Setting CU_LIMIT_PRINTF_FIFO_SIZE must be performed before launching any kernel that uses the printf() device system call, otherwise CUDA_ERROR_INVALID_VALUE will be returned. The supported cache configurations are: CU_FUNC_CACHE_PREFER_NONE: no preference for shared memory or L1 (default); CU_FUNC_CACHE_PREFER_SHARED: prefer larger shared memory and smaller L1 cache; CU_FUNC_CACHE_PREFER_L1: prefer larger L1 cache and smaller shared memory.

Changing the shared memory configuration between launches may insert a device-side synchronization point between those launches. CU_LIMIT_STACK_SIZE controls the stack size in bytes of each GPU thread. Spin-waiting can decrease latency when waiting for the GPU, but may lower the performance of CPU threads if they are performing work in parallel with the CUDA thread.

CUDA Runtime API

cuCtxSynchronize() returns an error if one of the preceding tasks failed. The blocking-sync flag in its earlier form was deprecated as of CUDA 4.0 and was replaced with CU_CTX_SCHED_BLOCKING_SYNC. For cuCtxDestroy: Parameters: ctx - context to destroy. Returns: CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE. Description: destroys the CUDA context specified by ctx.

For cuDeviceGet: Parameters: device - returned device handle; ordinal - device number to get handle for. Returns: CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_DEVICE. Description: returns in *device a device handle given an ordinal. Note that this function may also return error codes from previous, asynchronous launches. For cuCtxSetCurrent, if ctx is NULL then this is equivalent to popping the top of the calling CPU thread's CUDA context stack (or a no-op if the calling CPU thread's CUDA context stack is empty). Stream priorities follow a convention where lower numbers imply greater priorities.
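The priority convention can be seen directly by querying the range; a sketch (assuming a device and context, error checking omitted):

```c
/* Hedged sketch: query the stream priority range; lower = higher priority. */
#include <cuda.h>
#include <stdio.h>

int main(void) {
    CUdevice dev;
    CUcontext ctx;
    int least, greatest;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    cuCtxGetStreamPriorityRange(&least, &greatest);
    /* Both are 0 if the device does not support stream priorities. */
    printf("least priority: %d, greatest priority: %d\n", least, greatest);

    cuCtxDestroy(ctx);
    return 0;
}
```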

Available compute modes are as follows: CU_COMPUTEMODE_DEFAULT: default mode; the device is not restricted and can have multiple CUDA contexts present at a single time. CU_CTX_LMEM_RESIZE_TO_MAX: instruct CUDA to not reduce local memory after resizing local memory for a kernel. This is only a preference. Changing the shared memory bank size will not increase shared memory usage or affect occupancy of kernels, but may have major effects on performance.

If more pending launches than the default (2048 launches) are needed for a module using the device runtime, the pending-launch-count limit can be increased. Note that this function may also return error codes from previous, asynchronous launches. When cuCtxGetSharedMemConfig is called on devices without configurable shared memory, it will return the fixed bank size of the hardware.
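A sketch of reading the bank-size configuration (assuming a context exists; error checking omitted):

```c
/* Hedged sketch: query the current shared memory bank size configuration. */
#include <cuda.h>
#include <stdio.h>

int main(void) {
    CUdevice dev;
    CUcontext ctx;
    CUsharedconfig cfg;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    /* On hardware without configurable banks this reports the fixed size. */
    cuCtxGetSharedMemConfig(&cfg);
    printf("shared memory bank size config: %d\n", (int)cfg);

    cuCtxDestroy(ctx);
    return 0;
}
```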

See also: cuCtxCreate, cuCtxDestroy, cuCtxGetApiVersion, cuCtxGetCacheConfig, cuCtxGetDevice, cuCtxGetFlags, cuCtxGetLimit, cuCtxPopCurrent, cuCtxPushCurrent, cuCtxSetLimit, cuCtxSynchronize, cuCtxGetSharedMemConfig, cuFuncSetCacheConfig. CUresult cuCtxGetStreamPriorityRange ( int* leastPriority, int* greatestPriority ) returns numerical values that correspond to the least and greatest stream priorities.

However, on low power devices like Tegra, the scheduling flag always defaults to CU_CTX_SCHED_BLOCKING_SYNC. Setting the synchronization depth limit must be performed before any launch of a kernel that uses the device runtime and calls cudaDeviceSynchronize() above the default sync depth of two levels of grids. For cuCtxSetCacheConfig: Parameters: config - requested cache configuration. Returns: CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE. Description: on devices where the L1 cache and shared memory use the same hardware resources, this sets through config the preferred cache configuration for the current context. See also: cuCtxCreate, cuCtxDestroy, cuCtxGetDevice, cuCtxGetFlags, cuCtxGetLimit, cuCtxPopCurrent, cuCtxPushCurrent, cuCtxSetCacheConfig, cuCtxSetLimit, cuCtxSynchronize. CUresult cuCtxGetCacheConfig ( CUfunc_cache* pconfig ) returns the preferred cache configuration for the current context.
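Setting and reading back the preference might look like this sketch (assuming a Fermi-class or later device; error checking omitted):

```c
/* Hedged sketch: set and read back the context-wide cache preference. */
#include <cuda.h>
#include <stdio.h>

int main(void) {
    CUdevice dev;
    CUcontext ctx;
    CUfunc_cache cfg;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    /* Ask for a larger L1 cache; the driver may still choose differently. */
    cuCtxSetCacheConfig(CU_FUNC_CACHE_PREFER_L1);
    cuCtxGetCacheConfig(&cfg);
    printf("cache config: %d\n", (int)cfg);

    cuCtxDestroy(ctx);
    return 0;
}
```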

CU_DEVICE_ATTRIBUTE_HOST_NATIVE_ATOMIC_SUPPORTED: 1 if the link between the device and the host supports native atomic operations. CUresult cuCtxGetCurrent ( CUcontext* pctx ) returns the CUDA context bound to the calling CPU thread. See also: cuCtxCreate, cuCtxDestroy, cuCtxGetApiVersion, cuCtxGetDevice, cuCtxGetFlags, cuCtxGetLimit, cuCtxPopCurrent, cuCtxPushCurrent, cuCtxSetCacheConfig, cuCtxSetLimit, cuCtxSynchronize, cuFuncSetCacheConfig. CUresult cuDeviceGetCount ( int* count ) returns the number of compute-capable devices. See also: cuDeviceGetCount, cuDeviceGetName, cuDeviceGet, cuDeviceTotalMem. CUresult cuCtxSynchronize ( void ) blocks for a context's tasks to complete.

CUresult cuCtxGetDevice ( CUdevice* device ) returns the device ID for the current context. CU_COMPUTEMODE_EXCLUSIVE_PROCESS: compute-exclusive-process mode; the device can have only one context, used by a single process at a time. See also: cuDeviceGetAttribute, cuDeviceGetCount, cuDeviceGetName, cuDeviceGet. Note that this function may also return error codes from previous, asynchronous launches.

CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS: the device supports coherently accessing pageable memory without calling cudaHostRegister on it. If these reservations fail, cuCtxSetLimit will return CUDA_ERROR_OUT_OF_MEMORY, and the limit can be reset to a lower value. Setting CU_LIMIT_MALLOC_HEAP_SIZE must be performed before launching any kernel that uses the malloc() or free() device system calls, otherwise CUDA_ERROR_INVALID_VALUE will be returned. See also: cuDeviceGetAttribute, cuDeviceGetCount, cuDeviceGet, cuDeviceTotalMem. CUresult cuDeviceTotalMem ( size_t* bytes, CUdevice dev ) returns the total amount of memory on the device.
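The ordering constraint on the heap limit can be sketched as follows (the 64 MiB figure is an arbitrary illustration, not a recommended value):

```c
/* Hedged sketch: raise the device malloc() heap before any kernel uses it. */
#include <cuda.h>
#include <stdio.h>

int main(void) {
    CUdevice dev;
    CUcontext ctx;
    size_t total, heap;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    cuDeviceTotalMem(&total, dev);
    /* Must happen before launching kernels that call malloc()/free(). */
    if (cuCtxSetLimit(CU_LIMIT_MALLOC_HEAP_SIZE, (size_t)64 << 20) != CUDA_SUCCESS)
        fprintf(stderr, "could not reserve heap\n");
    cuCtxGetLimit(&heap, CU_LIMIT_MALLOC_HEAP_SIZE);
    printf("device memory: %zu bytes, malloc heap: %zu bytes\n", total, heap);

    cuCtxDestroy(ctx);
    return 0;
}
```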

For example, it is valid for the API version to be 3020 while the driver version is 4020. For cuInit: Parameters: Flags - initialization flag for CUDA. The context is created with a usage count of 1, and the caller of cuCtxCreate() must call cuCtxDestroy() when done using the context. See also: cuCtxCreate, cuCtxDestroy, cuCtxGetApiVersion, cuCtxGetCacheConfig, cuCtxGetFlags, cuCtxGetLimit, cuCtxPopCurrent, cuCtxPushCurrent, cuCtxSetCacheConfig, cuCtxSetLimit, cuCtxSynchronize. CUresult cuCtxGetFlags ( unsigned int* flags ) returns the flags for the current context.
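The version distinction above can be observed directly; a sketch (error checking omitted):

```c
/* Hedged sketch: compare the driver version with a context's API version. */
#include <cuda.h>
#include <stdio.h>

int main(void) {
    CUdevice dev;
    CUcontext ctx;
    int driver;
    unsigned int api, flags;

    cuInit(0);
    cuDriverGetVersion(&driver);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    cuCtxGetApiVersion(ctx, &api);
    cuCtxGetFlags(&flags);
    /* The two versions need not match, e.g. API 3020 under driver 4020. */
    printf("driver %d, context API %u, flags 0x%x\n", driver, api, flags);

    cuCtxDestroy(ctx);
    return 0;
}
```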

CU_CTX_MAP_HOST: instruct CUDA to support mapped pinned allocations. For cuCtxGetStreamPriorityRange: Parameters: leastPriority - pointer to an int in which the numerical value for least stream priority is returned; greatestPriority - pointer to an int in which the numerical value for greatest stream priority is returned. Note that this function may also return error codes from previous, asynchronous launches.

A grid is outstanding from the point of launch up until the grid is known to have been completed. If no context is bound to the calling CPU thread then *pctx is set to NULL and CUDA_SUCCESS is returned. The three LSBs of the flags parameter can be used to control how the OS thread, which owns the CUDA context at the time of an API call, interacts with the OS scheduler while waiting for results from the GPU.