Memory model
Registers
Like a CPU register file, registers are private to each thread and can be both read and written. The number of registers available to a thread is limited and depends on the occupancy, the kernel complexity and the GPU generation. If the register file is exhausted, data spills into local memory.
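The following is a minimal sketch of the register/occupancy trade-off. The names polyKernel and runPoly are hypothetical, and error checking is omitted for brevity. The scalars x and acc are the kind of private values the compiler normally keeps in registers. The __launch_bounds__ qualifier asks the compiler to cap register usage so that more blocks can be resident, which may force spills. Compiling with nvcc -Xptxas -v reports the registers used and any spill stores/loads.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: x and acc are private scalars, normally held in registers.
// __launch_bounds__(256, 2): at most 256 threads per block, and the compiler should
// limit registers so that at least 2 such blocks fit on a multiprocessor.
// If that limit is too tight, values spill to local memory.
__global__ void __launch_bounds__(256, 2) polyKernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = in[i];
        float acc = 1.0f;              // Horner scheme for 1 + x + x^2 + x^3
        acc = acc * x + 1.0f;
        acc = acc * x + 1.0f;
        acc = acc * x + 1.0f;
        out[i] = acc;
    }
}

// Host helper (hypothetical): evaluates the inputs 0, 1, ..., n-1 and returns out[idx].
float runPoly(int n, int idx) {
    std::vector<float> h(n);
    for (int i = 0; i < n; ++i) h[i] = static_cast<float>(i);
    float *dIn, *dOut;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMemcpy(dIn, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    polyKernel<<<(n + 255) / 256, 256>>>(dIn, dOut, n);   // 256 threads: within the bound
    cudaMemcpy(h.data(), dOut, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return h[idx];
}
```

For example, runPoly(1000, 2) evaluates 1 + 2 + 4 + 8 and returns 15.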
Local Memory
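Local memory is per-thread storage that extends the register file beyond its hardware limits: it receives register spills as well as private arrays the compiler cannot keep in registers. Because it physically resides in device memory, the price to be paid is a loss in performance.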
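A minimal sketch of how local memory typically comes into play, assuming nothing beyond the standard CUDA runtime. The names localArrayKernel and runLocal are hypothetical, and error checking is omitted. The private array buf is indexed with a value known only at run time, so the compiler usually cannot map it to registers and places it in local memory instead. Whether it actually does so can be checked with nvcc -Xptxas -v, which reports the per-thread stack frame.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: buf is private to each thread. Because it is indexed with
// a run-time value, the compiler typically places it in local memory
// (physically in device memory) rather than in registers.
__global__ void localArrayKernel(const int* sel, int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int buf[32];
    for (int k = 0; k < 32; ++k) buf[k] = k * k + i;
    out[i] = buf[sel[i] & 31];   // dynamic index into the private array
}

// Host helper (hypothetical): uses sel[i] = i and returns out[idx],
// which equals (idx % 32)^2 + idx.
int runLocal(int n, int idx) {
    std::vector<int> h(n);
    for (int i = 0; i < n; ++i) h[i] = i;
    int *dSel, *dOut;
    cudaMalloc(&dSel, n * sizeof(int));
    cudaMalloc(&dOut, n * sizeof(int));
    cudaMemcpy(dSel, h.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    localArrayKernel<<<(n + 127) / 128, 128>>>(dSel, dOut, n);
    cudaMemcpy(h.data(), dOut, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dSel);
    cudaFree(dOut);
    return h[idx];
}
```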
Shared Memory
Shared memory can be used for communication between all threads of a thread block as well as fast on-chip scratch storage, and it is generally the lowest-latency way for threads to communicate. It is readable and writable, but nothing is guaranteed when two threads access the same location at the same time without synchronization: two threads incrementing the same counter, for example, can lose an update. Therefore the framework provides barrier synchronization (__syncthreads()) and atomic functions.
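A sketch of a per-block histogram in shared memory, under the assumption of a 16-bin histogram over byte values. The names histogramKernel, runHistogram and BINS are hypothetical, and error checking is omitted. Many threads may hit the same bin simultaneously, so the increments use atomicAdd on shared memory instead of a plain ++. __syncthreads() separates the clear, accumulate and merge phases.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define BINS 16

__global__ void histogramKernel(const unsigned char* data, int n, unsigned int* hist) {
    __shared__ unsigned int local[BINS];              // one copy per thread block
    for (int b = threadIdx.x; b < BINS; b += blockDim.x) local[b] = 0;
    __syncthreads();                                  // bins cleared before use

    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += gridDim.x * blockDim.x)
        atomicAdd(&local[data[i] % BINS], 1u);        // shared-memory atomic
    __syncthreads();                                  // all block updates finished

    for (int b = threadIdx.x; b < BINS; b += blockDim.x)
        atomicAdd(&hist[b], local[b]);                // merge into the global result
}

// Host helper (hypothetical): histogram of (i % 256) for i = 0..n-1; returns hist[bin].
unsigned int runHistogram(int n, int bin) {
    std::vector<unsigned char> h(n);
    for (int i = 0; i < n; ++i) h[i] = static_cast<unsigned char>(i % 256);
    unsigned char* dData;
    unsigned int* dHist;
    cudaMalloc(&dData, n);
    cudaMalloc(&dHist, BINS * sizeof(unsigned int));
    cudaMemcpy(dData, h.data(), n, cudaMemcpyHostToDevice);
    cudaMemset(dHist, 0, BINS * sizeof(unsigned int));
    histogramKernel<<<8, 128>>>(dData, n, dHist);
    unsigned int hist[BINS];
    cudaMemcpy(hist, dHist, sizeof(hist), cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dHist);
    return hist[bin];
}
```

With n = 4096 every bin receives exactly 256 values. Without the atomics, concurrent increments of the same shared bin could be lost.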
Constant Memory
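Constant memory is one of the read-only address spaces: kernels can only read it, while the host initializes it (for example with cudaMemcpyToSymbol). It is small (64 KB) and cached, and it is fastest when all threads of a warp read the same address, because such a read is broadcast.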
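A minimal sketch of a typical constant-memory use: filter weights shared by all threads. This assumes a 3-tap filter with clamped borders. The names weights, convolve1D, runConvolve and RADIUS are hypothetical, and error checking is omitted. The host writes the weights with cudaMemcpyToSymbol. Inside the kernel every thread of a warp reads the same weights[k] at the same time, which is the broadcast-friendly access pattern.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define RADIUS 1
__constant__ float weights[2 * RADIUS + 1];     // read-only for kernels

__global__ void convolve1D(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float acc = 0.0f;
    for (int k = -RADIUS; k <= RADIUS; ++k) {
        int j = min(max(i + k, 0), n - 1);      // clamp at the borders
        acc += weights[k + RADIUS] * in[j];     // same address for the whole warp
    }
    out[i] = acc;
}

// Host helper (hypothetical): filters the inputs 0, 1, ..., n-1 and returns out[idx].
float runConvolve(int n, int idx) {
    const float hw[2 * RADIUS + 1] = {0.25f, 0.5f, 0.25f};
    cudaMemcpyToSymbol(weights, hw, sizeof(hw));   // host -> constant memory
    std::vector<float> h(n);
    for (int i = 0; i < n; ++i) h[i] = static_cast<float>(i);
    float *dIn, *dOut;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMemcpy(dIn, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    convolve1D<<<(n + 127) / 128, 128>>>(dIn, dOut, n);
    cudaMemcpy(h.data(), dOut, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return h[idx];
}
```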
1-D Texture Array
In contrast to constant memory, a 1-D texture array (a texture backed by a 1-D CUDA array) supports automatic linear interpolation between neighboring values, performed in hardware, depending on the given fractional position.
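A minimal sketch of hardware interpolation on a 1-D CUDA array. It uses the texture object API (CUDA 5.0 or later, compute capability 3.0 or higher) rather than the older texture references. The names sample1D and runTex1D and the texel values are hypothetical, and error checking is omitted. With unnormalized coordinates, texel i is centred at i + 0.5, so sampling between two centres blends both neighbours.

```cuda
#include <cuda_runtime.h>

// Samples the texture at one unnormalized position; linear filtering makes the
// hardware blend the two nearest texels.
__global__ void sample1D(cudaTextureObject_t tex, float x, float* out) {
    *out = tex1D<float>(tex, x);
}

// Host helper (hypothetical): texels {0, 10, 20, 30}, sampled at position x.
float runTex1D(float x) {
    const int N = 4;
    const float h[N] = {0.0f, 10.0f, 20.0f, 30.0f};
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t arr;
    cudaMallocArray(&arr, &desc, N);                  // 1-D CUDA array
    cudaMemcpy2DToArray(arr, 0, 0, h, N * sizeof(float), N * sizeof(float), 1,
                        cudaMemcpyHostToDevice);

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;
    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModeLinear;             // interpolation in hardware
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;
    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    float* dOut;
    float result = 0.0f;
    cudaMalloc(&dOut, sizeof(float));
    sample1D<<<1, 1>>>(tex, x, dOut);
    cudaMemcpy(&result, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    cudaDestroyTextureObject(tex);
    cudaFreeArray(arr);
    cudaFree(dOut);
    return result;
}
```

runTex1D(1.0f) lies halfway between the first two texel centres and yields approximately 5. The interpolation weights are stored in low-precision fixed point, so results are best compared with a tolerance.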
1-D Linear Texture
In contrast to the 1-D texture array, a 1-D linear texture is bound to ordinary linear device memory, which kernel functions can write to. Since the texture cache is not kept coherent with these writes, the behavior is undefined if a thread reads a position through the texture that was written during the same kernel launch.
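A sketch of the safe pattern for a texture over linear memory, assuming texture objects (CUDA 5.0+). The names writeLinear, readLinear and runLinearTex are hypothetical, and error checking is omitted. One launch writes the buffer through an ordinary pointer. A second launch in the same stream reads it through the texture, so the stores are complete and visible before any texture fetch.

```cuda
#include <cuda_runtime.h>
#include <vector>

// First launch: ordinary global stores into the linear buffer.
__global__ void writeLinear(float* buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = 2.0f * i;
}

// Second launch: reads the same buffer through the texture cache.
// Fetching through tex inside writeLinear would be undefined, because the
// texture cache is not coherent with stores made in the same launch.
__global__ void readLinear(cudaTextureObject_t tex, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tex1Dfetch<float>(tex, i) + 1.0f;
}

// Host helper (hypothetical): returns out[idx], i.e. 2 * idx + 1.
float runLinearTex(int n, int idx) {
    float *dBuf, *dOut;
    cudaMalloc(&dBuf, n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;             // texture over linear memory
    res.res.linear.devPtr = dBuf;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = n * sizeof(float);
    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;
    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    int blocks = (n + 255) / 256;
    writeLinear<<<blocks, 256>>>(dBuf, n);
    readLinear<<<blocks, 256>>>(tex, dOut, n);        // runs after writeLinear completes
    std::vector<float> h(n);
    cudaMemcpy(h.data(), dOut, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaDestroyTextureObject(tex);
    cudaFree(dBuf);
    cudaFree(dOut);
    return h[idx];
}
```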
2-D Texture Array
Similar to the 1-D texture array, it provides bilinear interpolation in hardware.
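A minimal sketch of bilinear interpolation on a 2-D CUDA array, again assuming texture objects (CUDA 5.0+). The names sample2D and runTex2D and the 2x2 texel values are hypothetical, and error checking is omitted. Texel (x, y) is centred at (x + 0.5, y + 0.5), and the hardware blends the four nearest texels.

```cuda
#include <cuda_runtime.h>

__global__ void sample2D(cudaTextureObject_t tex, float x, float y, float* out) {
    *out = tex2D<float>(tex, x, y);   // bilinear blend of the 4 nearest texels
}

// Host helper (hypothetical): 2x2 texture {{0, 10}, {20, 30}} (row-major) sampled at (x, y).
float runTex2D(float x, float y) {
    const int W = 2, H = 2;
    const float h[H][W] = {{0.0f, 10.0f}, {20.0f, 30.0f}};
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t arr;
    cudaMallocArray(&arr, &desc, W, H);               // 2-D CUDA array
    cudaMemcpy2DToArray(arr, 0, 0, h, W * sizeof(float), W * sizeof(float), H,
                        cudaMemcpyHostToDevice);

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;
    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModeLinear;             // bilinear in hardware
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;
    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    float* dOut;
    float result = 0.0f;
    cudaMalloc(&dOut, sizeof(float));
    sample2D<<<1, 1>>>(tex, x, y, dOut);
    cudaMemcpy(&result, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    cudaDestroyTextureObject(tex);
    cudaFreeArray(arr);
    cudaFree(dOut);
    return result;
}
```

Sampling at (1.0, 1.0), the point equidistant from all four texel centres, averages them to approximately 15.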
2-D Texture from Pitch-Linear Memory
Similar to the 1-D linear texture, the underlying pitch-linear memory can be written by the kernel, with the same coherence caveat.
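A sketch of a 2-D texture over pitch-linear memory, assuming texture objects (CUDA 5.0+). The names writePitched, readPitched and runPitched and the 64x32 size are hypothetical, and error checking is omitted. cudaMallocPitch chooses a row pitch suitable for texture binding. The first launch writes through an ordinary pointer. The second launch, in the same stream, reads the data back through tex2D with point sampling at texel centres.

```cuda
#include <cuda_runtime.h>
#include <vector>

// First launch: write through an ordinary pointer; rows are `pitch` bytes apart.
__global__ void writePitched(float* base, size_t pitch, int W, int H) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < W && y < H) {
        float* row = (float*)((char*)base + y * pitch);
        row[x] = x + 100.0f * y;
    }
}

// Second launch: read the same memory through the texture (point sampling).
__global__ void readPitched(cudaTextureObject_t tex, float* out, int W, int H) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < W && y < H) out[y * W + x] = tex2D<float>(tex, x + 0.5f, y + 0.5f);
}

// Host helper (hypothetical): returns the value read back at (x, y), i.e. x + 100 * y.
float runPitched(int x, int y) {
    const int W = 64, H = 32;
    float* base;
    size_t pitch;
    cudaMallocPitch(&base, &pitch, W * sizeof(float), H);   // pitch chosen by the runtime

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypePitch2D;
    res.res.pitch2D.devPtr = base;
    res.res.pitch2D.desc = cudaCreateChannelDesc<float>();
    res.res.pitch2D.width = W;
    res.res.pitch2D.height = H;
    res.res.pitch2D.pitchInBytes = pitch;
    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;
    td.readMode = cudaReadModeElementType;
    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    float* dOut;
    cudaMalloc(&dOut, W * H * sizeof(float));
    dim3 block(16, 16), grid((W + 15) / 16, (H + 15) / 16);
    writePitched<<<grid, block>>>(base, pitch, W, H);
    readPitched<<<grid, block>>>(tex, dOut, W, H);          // separate launch: writes visible
    std::vector<float> h(W * H);
    cudaMemcpy(h.data(), dOut, W * H * sizeof(float), cudaMemcpyDeviceToHost);
    cudaDestroyTextureObject(tex);
    cudaFree(base);
    cudaFree(dOut);
    return h[y * W + x];
}
```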
3-D Texture Array
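Unfortunately, there is no writable 3-D texture: a 3-D texture can only be backed by a 3-D CUDA array, which kernels cannot write through an ordinary pointer. Devices that support surfaces can, however, write into such an array through a surface object (surf3Dwrite).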
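A sketch of the surface-based workaround for 3-D data. It assumes texture and surface objects (CUDA 5.0 or later, compute capability 3.0 or higher) and an 8x8x4 float array; the names fill3D, read3D and run3D are hypothetical, and error checking is omitted. The array is allocated with the cudaArraySurfaceLoadStore flag. One launch writes it through a surface object, where the x coordinate is a byte offset. A later launch reads it through a texture object bound to the same array.

```cuda
#include <cuda_runtime.h>

// Writes x + 10*y + 100*z into every cell through the surface object.
__global__ void fill3D(cudaSurfaceObject_t surf, int W, int H, int D) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z;
    if (x < W && y < H && z < D)
        surf3Dwrite(float(x + 10 * y + 100 * z), surf, x * (int)sizeof(float), y, z);
}

// Reads one texel back through a texture object bound to the same array.
__global__ void read3D(cudaTextureObject_t tex, float x, float y, float z, float* out) {
    *out = tex3D<float>(tex, x + 0.5f, y + 0.5f, z + 0.5f);   // point sample at texel centre
}

// Host helper (hypothetical): returns the value stored at (x, y, z).
float run3D(int x, int y, int z) {
    const int W = 8, H = 8, D = 4;
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t arr;
    cudaMalloc3DArray(&arr, &desc, make_cudaExtent(W, H, D), cudaArraySurfaceLoadStore);

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;
    cudaSurfaceObject_t surf = 0;
    cudaCreateSurfaceObject(&surf, &res);
    fill3D<<<dim3(1, 1, D), dim3(W, H)>>>(surf, W, H, D);   // one block per z-slice

    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.addressMode[2] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;
    td.readMode = cudaReadModeElementType;
    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    float* dOut;
    float result = 0.0f;
    cudaMalloc(&dOut, sizeof(float));
    read3D<<<1, 1>>>(tex, (float)x, (float)y, (float)z, dOut);   // runs after fill3D
    cudaMemcpy(&result, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    cudaDestroyTextureObject(tex);
    cudaDestroySurfaceObject(surf);
    cudaFreeArray(arr);
    cudaFree(dOut);
    return result;
}
```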
Global Memory
Compared to the other memory spaces, global memory offers the slowest access. Its size is limited only by the amount of memory available on the graphics card.
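A minimal sketch of working with global memory. The names scale, totalGlobalMemory and runScale are hypothetical, and error checking is omitted. cudaMemGetInfo reports how much device memory is free and in total, which is the upper bound on global allocations. Because global accesses are slow, the kernel uses a grid-stride loop in which consecutive threads touch consecutive addresses. This lets the accesses of a warp be coalesced into few memory transactions.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Grid-stride loop: thread k of the grid handles elements k, k + stride, ...
// so neighbouring threads always access neighbouring addresses (coalesced).
__global__ void scale(float* data, float factor, size_t n) {
    for (size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x; i < n;
         i += (size_t)gridDim.x * blockDim.x)
        data[i] *= factor;
}

// Returns the total device memory in bytes, as reported by cudaMemGetInfo.
size_t totalGlobalMemory() {
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);
    return totalBytes;
}

// Host helper (hypothetical): scales 0, 1, ..., n-1 in global memory; returns element idx.
float runScale(size_t n, float factor, size_t idx) {
    std::vector<float> h(n);
    for (size_t i = 0; i < n; ++i) h[i] = static_cast<float>(i);
    float* d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    scale<<<256, 256>>>(d, factor, n);
    cudaMemcpy(h.data(), d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h[idx];
}
```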
Conclusion: a parallel implementation is possible without detailed knowledge of this memory model, but a large loss in performance is very likely.