OpenCL Notes 1: Memory Model
Published: 2019-05-10


Memory model

Registers

Like a CPU register file, registers are private to each thread and can be both read and written. The number of registers available to a thread is limited and depends on the occupancy, the kernel complexity, and the GPU generation. If a kernel needs more registers than are available, the excess data spills into local memory.
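To make the dependence on occupancy concrete, here is a back-of-the-envelope sketch. The register file size and the thread counts are hypothetical round numbers chosen for illustration, not figures for any particular GPU, and the model is a simplification: a real compiler may also lower occupancy instead of spilling.

```python
# Hypothetical numbers for illustration only; real values depend on the GPU generation.
REGISTERS_PER_MULTIPROCESSOR = 65536  # assumed size of the register file


def registers_per_thread(resident_threads):
    """Upper bound on registers each thread can use at a given occupancy."""
    return REGISTERS_PER_MULTIPROCESSOR // resident_threads


def spilled_registers(needed_per_thread, resident_threads):
    """Values that no longer fit and would have to spill to local memory."""
    return max(0, needed_per_thread - registers_per_thread(resident_threads))


for threads in (512, 1024, 2048):
    print(threads, "resident threads ->", registers_per_thread(threads), "registers each")
```

Under these assumed numbers, a kernel that needs 40 registers per thread fits at 1024 resident threads but would spill 8 values per thread at 2048.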

Local Memory

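Local memory is a dynamic extension of the register file, introduced to overcome its hardware limits: values that do not fit into registers are placed here. The price is a loss in performance. (This follows CUDA naming; OpenCL calls this per-thread space private memory and uses the name local memory for what is called shared memory below.)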

Shared Memory

Shared memory can be used for communication between all threads of a thread block, as well as a primary scratch storage space. It is generally the lowest-latency way for threads to communicate. It can be both read and written, but no coherence is guaranteed if two threads access the same location at the same time; the framework therefore provides atomic functions.
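Why atomic functions matter can be shown with a deterministic toy model. This is plain Python, not GPU code: the function names are made up for illustration, and the "unsynchronized" case hard-codes one possible interleaving in which every thread reads before any thread writes. In a real kernel the serialized version would be an atomic function such as CUDA's atomicAdd or OpenCL's atomic_add.

```python
def unsynchronized_increment(shared, index, num_threads):
    """Every thread reads the old value first, then every thread writes old + 1."""
    observed = [shared[index] for _ in range(num_threads)]  # all reads happen first
    for value in observed:
        shared[index] = value + 1  # later writes overwrite earlier ones


def atomic_increment(shared, index, num_threads):
    """Each read-modify-write completes before the next one starts."""
    for _ in range(num_threads):
        shared[index] = shared[index] + 1


racy = [0]
unsynchronized_increment(racy, 0, num_threads=4)

safe = [0]
atomic_increment(safe, 0, num_threads=4)

print(racy[0], safe[0])  # 1 4
```

Three of the four increments are lost in the unsynchronized case, which is the kind of outcome the missing coherence guarantee allows.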

Constant Memory

Constant memory is one of the read-only address spaces: kernels can read it but cannot write to it.

1-D Texture Array

In contrast to constant memory, a texture array supports automatic interpolation between neighboring values, performed in hardware, based on the requested position.
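The interpolation can be sketched on the CPU as follows. This is a reference model in plain Python, not a GPU API: the function name is hypothetical, and it assumes that texel i is centred at i + 0.5 and that out-of-range positions are clamped to the border. Real hardware typically computes the weights at reduced precision, so results can differ slightly.

```python
import math


def fetch_linear_1d(texels, x):
    """Linearly interpolate between the two texels nearest to position x.

    Assumes texel i is centred at i + 0.5 and clamps at the borders.
    """
    xb = x - 0.5
    i = math.floor(xb)
    alpha = xb - i  # fractional distance towards the next texel
    lo = min(max(i, 0), len(texels) - 1)
    hi = min(max(i + 1, 0), len(texels) - 1)
    return (1.0 - alpha) * texels[lo] + alpha * texels[hi]


texels = [10.0, 20.0, 40.0]
print(fetch_linear_1d(texels, 0.5))  # 10.0, exactly on the first texel centre
print(fetch_linear_1d(texels, 1.0))  # 15.0, halfway between texels 0 and 1
print(fetch_linear_1d(texels, 2.0))  # 30.0, halfway between texels 1 and 2
```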

1-D Linear Texture

In contrast to the 1-D texture array, a 1-D linear texture is writable from kernel functions. Because the texture caches do not enforce coherence, the behavior is undefined if one thread writes to a position while another thread reads that same position.

2-D Texture Array

Similar to the 1-D texture array, but it provides bilinear interpolation in hardware.
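Bilinear interpolation blends the four texels surrounding the requested position, first along x and then along y. The sketch below is a CPU reference model in plain Python with a hypothetical function name; it assumes texel centres at +0.5 and clamping at the borders, and ignores the reduced weight precision of real hardware.

```python
import math


def fetch_bilinear_2d(texels, x, y):
    """Bilinear fetch from a row-major 2-D list, texel centres at +0.5, clamped."""
    height, width = len(texels), len(texels[0])

    def clamp(v, upper):
        return min(max(v, 0), upper - 1)

    xb, yb = x - 0.5, y - 0.5
    i, j = math.floor(xb), math.floor(yb)
    a, b = xb - i, yb - j  # fractional weights along x and y
    x0, x1 = clamp(i, width), clamp(i + 1, width)
    y0, y1 = clamp(j, height), clamp(j + 1, height)
    top = (1 - a) * texels[y0][x0] + a * texels[y0][x1]
    bottom = (1 - a) * texels[y1][x0] + a * texels[y1][x1]
    return (1 - b) * top + b * bottom


texels = [[0.0, 10.0],
          [20.0, 30.0]]
print(fetch_bilinear_2d(texels, 1.0, 1.0))  # 15.0, the centre of the four texels
```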

2-D Texture from Pitch-Linear Memory

Similar to the 1-D linear texture, these textures are writable by the kernel.
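Pitch-linear memory stores a 2-D array row by row, with each row padded so that consecutive rows start a fixed number of bytes (the pitch) apart. The sketch below shows the resulting address arithmetic on the CPU. The 256-byte alignment and the helper names are assumptions for illustration; on a real device the pitch is chosen by the allocator (for example cudaMallocPitch in CUDA) and should be taken from there.

```python
import struct

ELEMENT_SIZE = 4  # bytes per 32-bit float


def padded_pitch(width, alignment=256):
    """Row size in bytes, rounded up to a multiple of the (assumed) alignment."""
    row_bytes = width * ELEMENT_SIZE
    return (row_bytes + alignment - 1) // alignment * alignment


def element_offset(row, col, pitch):
    """Byte offset of element (row, col): rows are pitch bytes apart."""
    return row * pitch + col * ELEMENT_SIZE


width, height = 100, 4
pitch = padded_pitch(width)  # 400 bytes of data padded to 512
buffer = bytearray(pitch * height)

# Write and read back element (row 2, column 7) through the pitched layout.
struct.pack_into("<f", buffer, element_offset(2, 7, pitch), 3.5)
value = struct.unpack_from("<f", buffer, element_offset(2, 7, pitch))[0]
print(pitch, element_offset(2, 7, pitch), value)  # 512 1052 3.5
```

Indexing with row * width instead of row * pitch is the usual mistake with this layout, since it lands in the padding of an earlier row.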

3-D Texture Array

Unfortunately, no writable 3-D texture is available.

Global Memory

Compared to the other memory types, global memory has the slowest access. Its size is limited only by the amount of memory available on the graphics card.

Conclusion: a parallel implementation is still possible without detailed knowledge of this memory model, but a large loss in performance is very likely.
