Software pipelining (SWP) is an effective loop optimization technique and an important compilation technique for exploiting instruction-level parallelism in loops.
Software pipelining is a loop scheduling technique that extracts ILP by overlapping the execution of several consecutive iterations. It is one of the key compiler optimizations for exploiting instruction-level parallelism in loop programs.
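The following minimal Python sketch makes the overlap concrete by printing the timing diagram of a software-pipelined loop. It assumes a hypothetical three-stage loop body (`load`, `mul`, `store`), one cycle per stage, and no resource conflicts between stages. `pipeline_schedule` and its initiation-interval parameter `ii` are illustrative names, not a compiler's scheduler, and the sketch does not generate prologue or epilogue code.

```python
from collections import defaultdict


def pipeline_schedule(stages, n_iters, ii=1):
    """Map each cycle to the (iteration, stage) pairs that execute in it.

    Assumptions (illustrative only): iteration i starts at cycle i * ii,
    every stage takes exactly one cycle, and each stage uses its own
    functional unit, so overlapping stages never conflict.
    """
    table = defaultdict(list)
    for i in range(n_iters):
        for s, name in enumerate(stages):
            table[i * ii + s].append((i, name))
    return dict(sorted(table.items()))


stages = ["load", "mul", "store"]
sched = pipeline_schedule(stages, n_iters=4)
for cycle, ops in sched.items():
    print(cycle, ops)
```

With `ii=1`, cycles 0 and 1 form the prologue. Cycles 2 and 3 are the steady-state kernel, in which one stage from each of three different iterations runs at the same time. Cycles 4 and 5 form the epilogue. Four iterations therefore finish in 6 cycles instead of the 12 a strictly sequential loop would need.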
EPIC (Explicitly Parallel Instruction Computing) defines a new style of architecture that enables higher levels of instruction-level parallelism (ILP) without unacceptable hardware complexity. Its central idea is to raise ILP through cooperation between the compiler and the processor.
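One way an EPIC compiler makes parallelism explicit is to group instructions that may issue together and mark the group boundaries; IA-64 calls these boundaries "stops". The Python sketch below is a simplified model. It checks only register RAW and WAW conflicts inside a group and ignores memory dependences, bundle templates, and the architecture's special cases. `form_groups` is a hypothetical name.

```python
def form_groups(instrs):
    """Split straight-line code into explicitly parallel instruction groups.

    instrs: list of (name, dest, srcs) in program order. A new group starts
    (i.e., a stop is placed) before any instruction that reads or overwrites
    a register already written inside the current group (RAW or WAW).
    Simplified model: memory dependences are ignored.
    """
    groups, current, written = [], [], set()
    for name, dest, srcs in instrs:
        if written & set(srcs) or dest in written:
            groups.append(current)  # close the current group with a stop
            current, written = [], set()
        current.append(name)
        written.add(dest)
    if current:
        groups.append(current)
    return groups


prog = [
    ("i1", "r1", ["r2", "r3"]),  # r1 = r2 op r3
    ("i2", "r4", ["r5"]),        # independent of i1
    ("i3", "r6", ["r1", "r4"]),  # reads r1 and r4 -> a stop is needed first
    ("i4", "r7", ["r8"]),        # independent
]
print(form_groups(prog))  # [['i1', 'i2'], ['i3', 'i4']]
```

In this model the compiler guarantees that no register dependences exist within a group. The hardware can then issue a group's instructions together without checking them against each other, which moves that work from the processor to the compiler.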
Multimedia and network services of all kinds are flourishing today, and exploiting traditional ILP alone is far from enough to meet the performance these services demand of microprocessors.
Building on the DLX simulator, smarcof was extended with SMA-specific features and heuristic optimization rules. Simulation of SPEC code shows that these rules exploit hybrid parallelism effectively with rather low overhead, tapping the system's potential for deep exploitation of both instruction-level and thread-level parallelism.
State-of-the-art microprocessors exploit instruction-level parallelism (ILP) to achieve high application performance. They search a dynamic window of instructions for independent ones and execute those on a wide-issue pipeline. For the sequential programs that make up most of today's software, ILP is the main source of performance improvement.
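A toy cycle-level model shows how a dynamic window finds independent instructions behind a stalled one. This Python sketch assumes perfect register renaming, so only RAW dependences delay issue. It also assumes enough functional units for `width` instructions per cycle and uses hypothetical latencies. `simulate` is an illustrative name, not a real simulator's API.

```python
def simulate(instrs, window=4, width=2):
    """Cycle-by-cycle sketch of out-of-order issue from a dynamic window.

    instrs: list of (name, dest, srcs, latency) in program order, latency >= 1.
    Assumes perfect register renaming, so only true (RAW) dependences delay
    issue, and enough functional units for `width` instructions per cycle.
    Returns {name: issue_cycle}.
    """
    # Index of the most recent earlier producer for each source register.
    producers, last_writer = [], {}
    for idx, (_, dest, srcs, _) in enumerate(instrs):
        producers.append([last_writer[r] for r in srcs if r in last_writer])
        last_writer[dest] = idx

    issued = {}      # instruction index -> issue cycle
    win = []         # indices currently in the window, oldest first
    next_fetch = 0   # next instruction to enter the window
    cycle = 0
    while len(issued) < len(instrs):
        while len(win) < window and next_fetch < len(instrs):
            win.append(next_fetch)  # refill the window in program order
            next_fetch += 1
        picked = []
        for idx in win:  # oldest-first selection of ready instructions
            if len(picked) == width:
                break
            if all(p in issued and issued[p] + instrs[p][3] <= cycle
                   for p in producers[idx]):
                picked.append(idx)
        for idx in picked:
            issued[idx] = cycle
            win.remove(idx)
        cycle += 1
    return {instrs[i][0]: c for i, c in issued.items()}


prog = [
    ("load", "r1", ["r0"], 3),  # long-latency load
    ("add",  "r2", ["r1"], 1),  # needs the load result
    ("mul",  "r3", ["r4"], 1),  # independent
    ("sub",  "r5", ["r6"], 1),  # independent
]
print(simulate(prog))                     # {'load': 0, 'mul': 0, 'sub': 1, 'add': 3}
print(simulate(prog, window=1, width=1))  # {'load': 0, 'add': 3, 'mul': 4, 'sub': 5}
```

With `window=4, width=2`, the independent `mul` and `sub` issue while `add` waits for the three-cycle load, so all four instructions have issued by cycle 3. Shrinking the model to `window=1, width=1` forces in-order issue and delays the last instruction until cycle 5.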
A multithreaded microprocessor, in which several hardware contexts share one execution core, can efficiently exploit both instruction-level and thread-level parallelism to achieve higher performance and a better performance/power ratio.
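Under a deliberately simple model, the benefit of sharing the core can be counted in issue slots. If each thread can supply only a few independent instructions per cycle, combining threads fills slots that a single thread would leave empty. The per-cycle ILP figures and the function `smt_slots` below are hypothetical. The model ignores contention in caches, fetch, and other shared structures.

```python
def smt_slots(ready_per_thread, width):
    """Issue slots filled per cycle when threads share one execution core.

    ready_per_thread[t][c] is the number of independent instructions thread t
    could issue in cycle c (its ILP in that cycle). Any mix of threads may
    share the `width` slots, so the core fills min(width, total ready) slots.
    """
    return [min(width, sum(cycle)) for cycle in zip(*ready_per_thread)]


thread_a = [2, 0, 1, 3]  # hypothetical per-cycle ILP of thread A
thread_b = [1, 2, 0, 2]  # hypothetical per-cycle ILP of thread B
print(smt_slots([thread_a], width=4))            # [2, 0, 1, 3] -> 6 of 16 slots
print(smt_slots([thread_a, thread_b], width=4))  # [3, 2, 1, 4] -> 10 of 16 slots
```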
3) Using the Streaming SIMD Extensions (SSE) of the Intel Pentium III/4 processors, instruction-level parallel calculation of streamlines on 3D curvilinear grids was implemented for the first time. Compared with the conventional algorithm, the SSE-based version written with a vector class library improves performance by about 55%, and the version written in inline assembly improves it by about 75%.
A key to higher microprocessor performance is executing more instructions per cycle. However, dependences among instructions, the varying latencies of certain instructions, and execution resource constraints limit this parallelism considerably. To exploit instruction-level parallelism, the processor must use data dependence analysis to identify independent instructions that can execute in parallel. How much ILP is exploited therefore largely determines how well the hardware is used and how fast programs run.
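The dependence analysis described above can be sketched in Python for straight-line register code. Each hypothetical instruction is a `(dest, srcs)` pair. The sketch classifies RAW, WAR, and WAW dependences conservatively: every earlier instruction is checked, not just the nearest writer. It then places each instruction on the earliest level its dependences allow, so instructions on the same level are independent. Memory dependences and latencies are ignored, and the function names are illustrative.

```python
def dependences(instrs):
    """List (kind, i, j) for every pair i < j whose order must be preserved.

    instrs: list of (dest, srcs) in program order.
    RAW: j reads a register i writes (true dependence).
    WAR: j writes a register i reads (anti-dependence).
    WAW: i and j write the same register (output dependence).
    """
    deps = []
    for j, (dest_j, srcs_j) in enumerate(instrs):
        for i in range(j):
            dest_i, srcs_i = instrs[i]
            if dest_i in srcs_j:
                deps.append(("RAW", i, j))
            if dest_j in srcs_i:
                deps.append(("WAR", i, j))
            if dest_j == dest_i:
                deps.append(("WAW", i, j))
    return deps


def parallel_levels(instrs):
    """Group instruction indices by the earliest level their dependences allow."""
    level = [0] * len(instrs)
    # deps are produced in order of increasing j, so level[i] is already final.
    for _, i, j in dependences(instrs):
        level[j] = max(level[j], level[i] + 1)
    groups = [[] for _ in range(max(level, default=-1) + 1)]
    for idx, lv in enumerate(level):
        groups[lv].append(idx)
    return groups


prog = [
    ("r1", ["r2", "r3"]),  # 0: r1 = r2 op r3
    ("r4", ["r1"]),        # 1: reads r1  -> RAW on 0
    ("r2", ["r5"]),        # 2: writes r2 -> WAR on 0
    ("r6", ["r7"]),        # 3: independent
]
print(dependences(prog))      # [('RAW', 0, 1), ('WAR', 0, 2)]
print(parallel_levels(prog))  # [[0, 3], [1, 2]]
```

WAR and WAW are name dependences. With register renaming, only the RAW edge would remain, and instruction 2 could move up to level 0.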
An adaptive stack cache with a fast address generation policy separates stack references from other data references. By exploiting the access characteristics of stack data, it improves instruction-level parallelism, reduces data cache pollution, and lowers the data cache miss ratio. The proposed fast address generation scheme also reduces stack access latency by shortening the hit time of stack accesses.
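A toy cache model can illustrate the pollution argument. This Python sketch assumes direct-mapped caches and a hypothetical boundary `stack_lo`, above which references count as stack references. It uses a trace in which one heap block and one stack block map to the same set. It models only the decoupling idea, not the adaptive sizing or the fast address generation scheme. The class and function names are illustrative.

```python
class DirectMappedCache:
    """Minimal direct-mapped cache that only counts misses."""

    def __init__(self, n_lines, line_size=16):
        self.n_lines = n_lines
        self.line_size = line_size
        self.tags = [None] * n_lines
        self.misses = 0

    def access(self, addr):
        block = addr // self.line_size
        index = block % self.n_lines
        tag = block // self.n_lines
        if self.tags[index] != tag:  # miss: fetch block, evicting the old one
            self.misses += 1
            self.tags[index] = tag


def total_misses(trace, stack_lo, split):
    """Unified 8-line cache vs. a 4-line data cache plus a 4-line stack cache."""
    if not split:
        unified = DirectMappedCache(8)
        for addr in trace:
            unified.access(addr)
        return unified.misses
    data, stack = DirectMappedCache(4), DirectMappedCache(4)
    for addr in trace:
        (stack if addr >= stack_lo else data).access(addr)
    return data.misses + stack.misses


trace = [0x0000, 0x8000] * 4  # heap and stack blocks that collide in set 0
print(total_misses(trace, stack_lo=0x8000, split=False))  # 8
print(total_misses(trace, stack_lo=0x8000, split=True))   # 2
```

Both configurations have the same total capacity of eight lines. The split version wins here only because the trace was built so that stack and heap blocks conflict, which is exactly the interference the decoupling removes.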