

Code Optimization in Embedded C/C++ Development (Translated)

Author: Anonymous   Source: this site   Updated: August 18, 2014
 
Original text:
 
Programming Embedded Systems in C and C++, O'Reilly, 1999
 
Chapter 10: Optimizing Your Code
 
    Things should be made as simple as possible, but not any simpler.
 
    —Albert Einstein
 
Though getting the software to work correctly seems like the logical last step for a project, this is not always the case in embedded systems development. The need for low-cost versions of our products drives hardware designers to provide just barely enough memory and processing power to get the job done. Of course, during the software development phase of the project it is more important to get the program to work correctly. And toward that end there are usually one or more "development" boards around, each with additional memory, a faster processor, or both. These boards are used to get the software working correctly, and then the final phase of the project becomes code optimization. The goal of this final step is to make the working program run on the lower-cost "production" version of the hardware.
10.1 Increasing Code Efficiency
 
Some degree of code optimization is provided by all modern C and C++ compilers. However, most of the optimization techniques that are performed by a compiler involve a tradeoff between execution speed and code size. Your program can be made either faster or smaller, but not both. In fact, an improvement in one of these areas can have a negative impact on the other. It is up to the programmer to decide which of these improvements is most important to her. Given that single piece of information, the compiler's optimization phase can make the appropriate choice whenever a speed versus size tradeoff is encountered.
 
Because you can't have the compiler perform both types of optimization for you, I recommend letting it do what it can to reduce the size of your program. Execution speed is usually important only within certain time-critical or frequently executed sections of the code, and there are many things you can do to improve the efficiency of those sections by hand. However, code size is a difficult thing to influence manually, and the compiler is in a much better position to make this change across all of your software modules.
 
By the time your program is working you might already know, or have a pretty good idea, which subroutines and modules are the most critical for overall code efficiency. Interrupt service routines, high-priority tasks, calculations with real-time deadlines, and functions that are either compute-intensive or frequently called are all likely candidates. A tool called a profiler, included with some software development suites, can be used to narrow your focus to those routines in which the program spends most (or too much) of its time.
 
Once you've identified the routines that require greater code efficiency, one or more of the following techniques can be used to reduce their execution time:
 
Inline functions
 
    In C++, the keyword inline can be added to any function declaration. This keyword makes a request to the compiler to replace all calls to the indicated function with copies of the code that is inside. This eliminates the runtime overhead associated with the actual function call and is most effective when the inline function is called frequently but contains only a few lines of code.
 
    Inline functions provide a perfect example of how execution speed and code size are sometimes inversely linked. The repetitive addition of the inline code will increase the size of your program in direct proportion to the number of times the function is called. And, obviously, the larger the function, the more significant the size increase will be. The resulting program runs faster, but now requires more ROM.
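As a sketch of the kind of function that fits this profile, consider the following. The helper and its name are ours, not the chapter's: it is tiny and would typically be called frequently, which is exactly the case where the call overhead can exceed the cost of the body itself.

```cpp
#include <cstdint>

// Hypothetical helper -- small, frequently called, a good inlining
// candidate. The inline keyword asks the compiler to paste the body
// at each call site instead of emitting a function call.
inline uint16_t swapBytes(uint16_t value)
{
    // Exchange the high and low bytes of a 16-bit word.
    return static_cast<uint16_t>((value << 8) | (value >> 8));
}
```

Each call to swapBytes can then compile down to a shift-and-or sequence with no call, return, or argument-passing instructions at all.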
Table lookups
 
    A switch statement is one common programming technique to be used with care. Each test and jump that makes up the machine language implementation uses up valuable processor time simply deciding what work should be done next. To speed things up, try to put the individual cases in order by their relative frequency of occurrence. In other words, put the most likely cases first and the least likely cases last. This will reduce the average execution time, though it will not improve at all upon the worst-case time.
 
    If there is a lot of work to be done within each case, it might be more efficient to replace the entire switch statement with a table of pointers to functions. For example, the following block of code is a candidate for this improvement:
 
    enum NodeType { NodeA, NodeB, NodeC };
 
    switch (getNodeType())
    {
        case NodeA:
            .
            .
        case NodeB:
            .
            .
        case NodeC:
            .
            .
    }
 
    To speed things up, we would replace this switch statement with the following alternative. The first part of this is the setup: the creation of an array of function pointers. The second part is a one-line replacement for the switch statement that executes more efficiently.
 
    int processNodeA(void);
    int processNodeB(void);
    int processNodeC(void);
 
 
    int (* nodeFunctions[])() = { processNodeA, processNodeB, processNodeC };
 
    .
    .
 
 
    status = nodeFunctions[getNodeType()]();
 
Hand-coded assembly
 
    Some software modules are best written in assembly language. This gives the programmer an opportunity to make them as efficient as possible. Though most C/C++ compilers produce much better machine code than the average programmer, a good programmer can still do better than the average compiler for a given function. For example, early in my career I implemented a digital filtering algorithm in C and targeted it to a TI TMS320C30 DSP. The compiler we had back then was either unaware or unable to take advantage of a special instruction that performed exactly the mathematical operations I needed. By manually replacing one loop of the C program with inline assembly instructions that did the same thing, I was able to decrease the overall computation time by more than a factor of ten.
Register variables
 
    The keyword register can be used when declaring local variables. This asks the compiler to place the variable into a general-purpose register, rather than on the stack. Used judiciously, this technique provides hints to the compiler about the most frequently accessed variables and will somewhat enhance the performance of the function. The more frequently the function is called, the more likely such a change is to improve the code's performance.
Global variables
 
    It is more efficient to use a global variable than to pass a parameter to a function. This eliminates the need to push the parameter onto the stack before the function call and pop it back off once the function is completed. In fact, the most efficient implementation of any subroutine would have no parameters at all. However, the decision to use a global variable can also have some negative effects on the program. The software engineering community generally discourages the use of global variables, in an effort to promote the goals of modularity and reentrancy, which are also important considerations.
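The tradeoff can be sketched as follows; the names are hypothetical. The parameterless version reads a global instead of having the caller push an argument, at the cost of the modularity and reentrancy concerns just mentioned.

```cpp
// Global shared between caller and callee -- saves the push/pop of a
// parameter, but couples every caller to this one variable.
int gSample = 0;

int scaleParam(int sample) { return sample * 4; }   // argument on the stack
int scaleGlobal(void)      { return gSample * 4; }  // argument via global
```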
Polling
 
    Interrupt service routines are often used to improve program efficiency. However, there are some rare cases in which the overhead associated with the interrupts actually causes an inefficiency. These are cases in which the average time between interrupts is of the same order of magnitude as the interrupt latency. In such cases it might be better to use polling to communicate with the hardware device. Of course, this too leads to a less modular software design.
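A polling loop can be as simple as the sketch below. On real hardware the pointer would be initialized to a fixed, memory-mapped address (the address and the "data ready" bit position here are assumptions); the volatile qualifier forces a fresh read of the register on every pass.

```cpp
#include <cstdint>

// Busy-wait until the device reports data ready (bit 0, assumed).
// No interrupt vector, no context save/restore -- just read and test.
void waitForDataReady(volatile uint8_t *pStatus)
{
    while ((*pStatus & 0x01) == 0)
        ;   // spin until the hardware sets the bit
}
```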
Fixed-point arithmetic
 
    Unless your target platform includes a floating-point coprocessor, you'll pay a very large penalty for manipulating float data in your program. The compiler-supplied floating-point library contains a set of software subroutines that emulate the instruction set of a floating-point coprocessor. Many of these functions take a long time to execute relative to their integer counterparts and also might not be reentrant.
 
    If you are only using floating-point for a few calculations, it might be better to reimplement the calculations themselves using fixed-point arithmetic only. Although it might be difficult to see just how this can be done, it is theoretically possible to perform any floating-point calculation with fixed-point arithmetic. (After all, that's how the floating-point software library does it, right?) Your biggest advantage is that you probably don't need to implement the entire IEEE 754 standard just to perform one or two calculations. If you do need that kind of complete functionality, stick with the compiler's floating-point library and look for other ways to speed up your program.
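A minimal fixed-point sketch might look like this. The Q16.16 format (16 integer bits, 16 fraction bits in a 32-bit word) is our choice for illustration, not something the chapter prescribes.

```cpp
#include <cstdint>

typedef int32_t fix16;                  // Q16.16 fixed-point value

const fix16 FIX_ONE = 1 << 16;          // the value 1.0 in Q16.16

fix16 fixFromInt(int n)  { return (fix16)(n << 16); }
int   fixToInt(fix16 a)  { return (int)(a >> 16); }

// Multiply in a 64-bit intermediate, then discard the extra 16
// fraction bits that the product carries.
fix16 fixMul(fix16 a, fix16 b)
{
    return (fix16)(((int64_t)a * (int64_t)b) >> 16);
}
```

Addition and subtraction of Q16.16 values need no scaling at all, which is much of the appeal on a processor without floating-point hardware.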
 
10.2 Decreasing Code Size
 
As I said earlier, when it comes to reducing code size your best bet is to let the compiler do the work for you. However, if the resulting program is still too large for your available ROM, there are several programming techniques you can use to further reduce the size of your program. In this section we'll discuss both automatic and manual code size optimizations.
 
Of course, Murphy's Law dictates that the first time you enable the compiler's optimization feature your previously working program will suddenly fail. Perhaps the most notorious of the automatic optimizations is " dead code elimination." This optimization eliminates code that the compiler believes to be either redundant or irrelevant. For example, adding zero to a variable requires no runtime calculation whatsoever. But you might still want the compiler to generate those "irrelevant" instructions if they perform some function that the compiler doesn't know about.
 
For example, given the following block of code, most optimizing compilers would remove the first statement because the value of *pControl is not used before it is overwritten (on the third line):
 
    *pControl = DISABLE;
    *pData    = 'a';
    *pControl = ENABLE;
 
But what if pControl and pData are actually pointers to memory-mapped device registers? In that case, the peripheral device would not receive the DISABLE command before the byte of data was written. This could potentially wreak havoc on all future interactions between the processor and this peripheral. To protect yourself from such problems, you must declare all pointers to memory-mapped registers and global variables that are shared between threads (or a thread and an ISR) with the keyword volatile. And if you miss just one of them, Murphy's Law will come back to haunt you in the final days of your project. I guarantee it.
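The fix can be sketched like this. In a real system the pointers would be initialized to fixed device addresses; here two ordinary variables stand in for the registers so the sketch is self-contained.

```cpp
#include <cstdint>

#define DISABLE 0x00
#define ENABLE  0x01

// Stand-ins for memory-mapped device registers. On real hardware these
// would live at fixed addresses supplied by the hardware designer.
volatile uint8_t controlReg;
volatile uint8_t dataReg;

// Because both pointers are volatile, the optimizer must keep all
// three stores, in this order -- the DISABLE write is no longer
// removable as dead code.
void sendByte(volatile uint8_t *pControl, volatile uint8_t *pData,
              uint8_t byte)
{
    *pControl = DISABLE;
    *pData    = byte;
    *pControl = ENABLE;
}
```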
 
Never make the mistake of assuming that the optimized program will behave the same as the unoptimized one. You must completely retest your software at each new optimization level to be sure its behavior hasn't changed.
 
To make matters worse, debugging an optimized program is challenging, to say the least. With the compiler's optimization enabled, the correlation between a line of source code and the set of processor instructions that implements that line is much weaker. Those particular instructions might have moved or been split up, or two similar code blocks might now share a common implementation. In fact, some lines of the high-level language program might have been removed from the program altogether (as they were in the previous example)! As a result, you might be unable to set a breakpoint on a particular line of the program or examine the value of a variable of interest.
 
Once you've got the automatic optimizations working, here are some tips for further reducing the size of your code by hand:
 
Avoid standard library routines
 
    One of the best things you can do to reduce the size of your program is to avoid using large standard library routines. Many of the largest are expensive only because they try to handle all possible cases. It might be possible to implement a subset of the functionality yourself with significantly less code. For example, the standard C library's sprintf routine is notoriously large. Much of this bulk is located within the floating-point manipulation routines on which it depends. But if you don't need to format and display floating-point values (%f or %d), you could write your own integer-only version of sprintf and save several kilobytes of code space. In fact, a few implementations of the standard C library (Cygnus' newlib comes to mind) include just such a function, called siprintf.
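The integer-only idea can be sketched with a tiny decimal formatter that does the job of sprintf(buf, "%d", n) without dragging in any floating-point machinery. The name and interface are hypothetical.

```cpp
#include <cstddef>

// Format a signed int as decimal text into the caller's buffer.
// Returns the number of characters written (excluding the '\0').
size_t intToString(int value, char *buf)
{
    char   tmp[12];     // worst case: 10 digits + sign for 32-bit int
    size_t i = 0, n = 0;
    unsigned int u = (value < 0) ? -(unsigned int)value
                                 : (unsigned int)value;

    do {                          // emit digits least-significant first
        tmp[i++] = (char)('0' + (u % 10));
        u /= 10;
    } while (u != 0);

    if (value < 0)
        buf[n++] = '-';
    while (i > 0)                 // reverse into the output buffer
        buf[n++] = tmp[--i];
    buf[n] = '\0';
    return n;
}
```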
Native word size
 
    Every processor has a native word size, and the ANSI C and C++ standards state that data type int must always map to that size. Manipulation of smaller and larger data types sometimes requires the use of additional machine-language instructions. By consistently using int whenever possible in your program, you might be able to shave a precious few hundred bytes from your program.
Goto statements
 
    As with global variables, good software engineering practice dictates against the use of this technique. But in a pinch, goto statements can be used to remove complicated control structures or to share a block of oft repeated code.
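The classic "in a pinch" use is a single shared error-exit block instead of duplicated cleanup code on every failure path. The function and resource names below are hypothetical.

```cpp
#include <cstdlib>

// One copy of the cleanup code, reached by goto from every failure
// point -- smaller than repeating the free() calls at each early return.
int processRecord(size_t size)
{
    int   status  = -1;
    char *buffer1 = NULL;
    char *buffer2 = NULL;

    buffer1 = (char *)malloc(size);
    if (buffer1 == NULL)
        goto cleanup;

    buffer2 = (char *)malloc(size);
    if (buffer2 == NULL)
        goto cleanup;

    /* ... real work on the two buffers would go here ... */
    status = 0;

cleanup:
    free(buffer2);    // free(NULL) is a safe no-op
    free(buffer1);
    return status;
}
```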
 
In addition to these techniques, several of the ones described in the previous section could be helpful, specifically table lookups, hand-coded assembly, register variables, and global variables. Of these, the use of hand-coded assembly will usually yield the largest decrease in code size.
10.3 Reducing Memory Usage
 
In some cases, it is RAM rather than ROM that is the limiting factor for your application. In these cases, you'll want to reduce your dependence on global data, the stack, and the heap. These are all optimizations better made by the programmer than by the compiler.
 
Because ROM is usually cheaper than RAM (on a per-byte basis), one acceptable strategy for reducing the amount of global data might be to move constant data into ROM. This can be done automatically by the compiler if you declare all of your constant data with the keyword const. Most C/C++ compilers place all of the constant global data they encounter into a special data segment that is recognizable to the locator as ROM-able. This technique is most valuable if there are lots of strings or table-oriented data that does not change at runtime.
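In code this is nothing more than a const declaration at file scope; the table contents below are illustrative placeholders.

```cpp
// Declared const, so a typical cross-compiler and locator can place
// both objects in a ROM-able data segment instead of copying them
// into scarce RAM at startup.
const char kGreeting[] = "READY";
const unsigned short kSineQuarter[5] = { 0, 16384, 23170, 28378, 32767 };
```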
 
If some of the data is fixed once the program is running but not necessarily constant, the constant data segment could be placed in a hybrid memory device instead. This memory device could then be updated over a network or by a technician assigned to make the change. An example of such data is the sales tax rate for each locale in which your product will be deployed. If a tax rate changes, the memory device can be updated, but additional RAM can be saved in the meantime.
 
Stack size reductions can also lower your program's RAM requirement. One way to figure out exactly how much stack you need is to fill the entire memory area reserved for the stack with a special data pattern. Then, after the software has been running for a while—preferably under both normal and stressful conditions—use a debugger to examine the modified stack. The part of the stack memory area that still contains your special data pattern has never been overwritten, so it is safe to reduce the size of the stack area by that amount.[1]
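The technique can be sketched as follows. Here an ordinary byte array stands in for the reserved stack area so the example can run anywhere; the pattern value and the downward-growth assumption are ours.

```cpp
#include <cstdint>
#include <cstddef>

#define STACK_PATTERN 0xAA   // assumed fill value

// Fill the reserved stack area with the known pattern before running.
void paintStack(uint8_t *stack, size_t size)
{
    for (size_t i = 0; i < size; ++i)
        stack[i] = STACK_PATTERN;
}

// After the software has run, count untouched bytes. Assuming the
// stack grows downward, the never-used region is at the low end.
size_t unusedStackBytes(const uint8_t *stack, size_t size)
{
    size_t unused = 0;
    while (unused < size && stack[unused] == STACK_PATTERN)
        ++unused;
    return unused;
}
```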
 
Be especially conscious of stack space if you are using a real-time operating system. Most operating systems create a separate stack for each task. These stacks are used for function calls and interrupt service routines that occur within the context of a task. You can determine the amount of stack required for each task stack in the manner described earlier. You might also try to reduce the number of tasks or switch to an operating system that has a separate "interrupt stack" for execution of all interrupt service routines. The latter method can significantly reduce the stack size requirement of each task.
 
The size of the heap is limited to the amount of RAM left over after all of the global data and stack space has been allocated. If the heap is too small, your program will not be able to allocate memory when it is needed, so always be sure to compare the result of malloc or new with NULL before dereferencing it. If you've tried all of these suggestions and your program is still requiring too much memory, you might have no choice but to eliminate the heap altogether.
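The NULL check looks like this in practice; the function is a hypothetical example, not code from the chapter.

```cpp
#include <cstdlib>
#include <cstddef>

// On a small heap, allocation failure is an expected case. Check the
// result before touching the memory; never dereference blindly.
int *allocCounters(size_t count)
{
    int *p = (int *)malloc(count * sizeof(int));
    if (p == NULL)
        return NULL;          // caller must handle the failure path
    for (size_t i = 0; i < count; ++i)
        p[i] = 0;             // safe: p is known to be valid here
    return p;
}
```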
10.4 Limiting the Impact of C++
 
One of the biggest issues I faced upon deciding to write this book was whether or not to include C++ in the discussion. Despite my familiarity with C++, I had written almost all of my embedded software in C and assembly. In addition, there has been much debate within the embedded software community about whether C++ is worth the performance penalty. It is generally agreed that C++ programs produce larger executables that run more slowly than programs written entirely in C. However, C++ has many benefits for the programmer, and I wanted to talk about some of those benefits in the book. So I ultimately decided to include C++ in the discussion, but to use in my examples only those features with the least performance penalty.
 
I believe that many readers will face the same issue in their own embedded systems programming. Before ending the book, I wanted to briefly justify each of the C++ features I have used and to warn you about some of the more expensive features that I did not use.
The Embedded C++ Standard
 
You might be wondering why the creators of the C++ language included so many expensive—in terms of execution time and code size—features. You are not alone; people around the world have wondered the same thing—especially the users of C++ for embedded programming. Many of these expensive features are recent additions that are neither strictly necessary nor part of the original C++ specification. These features have been added one by one as part of the ongoing "standardization" process.
 
In 1996, a group of Japanese processor vendors joined together to define a subset of the C++ language and libraries that is better suited for embedded software development. They call their new industry standard Embedded C++. Surprisingly, for its young age, it has already generated a great deal of interest and excitement within the C++ user community.
 
A proper subset of the draft C++ standard, Embedded C++ omits pretty much anything that can be left out without limiting the expressiveness of the underlying language. This includes not only expensive features like multiple inheritance, virtual base classes, runtime type identification, and exception handling, but also some of the newest additions like templates, namespaces, and new-style casts. What's left is a simpler version of C++ that is still object-oriented and a superset of C, but with significantly less runtime overhead and smaller runtime libraries.
 
A number of commercial C++ compilers already support the Embedded C++ standard specifically. Several others allow you to manually disable individual language features, thus enabling you to emulate Embedded C++ or create your very own flavor of the C++ language.
 
Of course, not everything introduced in C++ is expensive. Many older C++ compilers incorporate a technology called C-front that turns C++ programs into C and feeds the result into a standard C compiler. The mere fact that this is possible should suggest that the syntactical differences between the languages have little or no runtime cost associated with them.[2] It is only the newest C++ features, like templates, that cannot be handled in this manner.
 
For example, the definition of a class is completely benign. The list of public and private member data and functions is not much different than a struct and a list of function prototypes. However, the C++ compiler is able to use the public and private keywords to determine which method calls and data accesses are allowed and disallowed. Because this determination is made at compile time, there is no penalty paid at runtime. The addition of classes alone does not affect either the code size or efficiency of your programs.
 
Default parameter values are also penalty-free. The compiler simply inserts code to pass the default value whenever the function is called without an argument in that position. Similarly, function name overloading is a compile-time modification. Functions with the same names but different parameters are each assigned unique names during the compilation process. The compiler alters the function name each time it appears in your program, and the linker matches them up appropriately. I haven't used this feature of C++ in any of my examples, but I could have done so without affecting performance.
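Both features can be sketched together; the function names and bodies are hypothetical illustrations.

```cpp
#include <cstddef>

// Default parameter value: callers may omit 'base', and the compiler
// inserts 10 at each such call site during compilation -- no runtime
// cost. (Digits-only input is assumed for this sketch.)
long toNumber(const char *digits, int base = 10);

// Overloading: same name, different parameter list. Each version is
// given a unique mangled name at compile time, so the linker never
// sees a clash.
long toNumber(long value);

long toNumber(const char *digits, int base)
{
    long result = 0;
    for (size_t i = 0; digits[i] != '\0'; ++i)
        result = result * base + (digits[i] - '0');
    return result;
}

long toNumber(long value) { return value; }   // identity, for illustration
```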
 
Operator overloading is another feature I could have used but didn't. Whenever the compiler sees such an operator, it simply replaces it with the appropriate function call. So in the code listing that follows, the last two lines are equivalent and the performance penalty is easily understood:
 
Complex  a, b, c;
 
c = operator+(a, b);                 // The traditional way: Function Call
c = a + b;                           // The C++ way: Operator Overloading
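To make the listing above concrete, here is a minimal Complex that could sit behind it (the class is our sketch, not the book's): operator+ is just a function with an unusual name, so the two calls compile to the same code.

```cpp
// Bare-bones complex number type for the operator-overloading example.
class Complex {
public:
    Complex(double re = 0.0, double im = 0.0) : re_(re), im_(im) {}
    double re() const { return re_; }
    double im() const { return im_; }
private:
    double re_, im_;
};

// The overloaded operator is an ordinary function; 'a + b' and
// 'operator+(a, b)' both resolve to a call to it.
Complex operator+(const Complex &a, const Complex &b)
{
    return Complex(a.re() + b.re(), a.im() + b.im());
}
```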
 
Constructors and destructors also have a slight penalty associated with them. These special methods are guaranteed to be called each time an object of the type is created or goes out of scope, respectively. However, this small amount of overhead is a reasonable price to pay for fewer bugs. Constructors eliminate an entire class of C programming errors having to do with uninitialized data structures. This feature has also proved useful for hiding the awkward initialization sequences that are associated with complex classes like Timer and Task.
 
Virtual functions also have a reasonable cost/benefit ratio. Without going into too much detail about what virtual functions are, let's just say that polymorphism would be impossible without them. And without polymorphism, C++ would not be a true object-oriented language. The only significant cost of virtual functions is one additional memory lookup before a virtual function can be called. Ordinary function and method calls are not affected.
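A minimal illustration of the cost being described, with hypothetical class names: the only runtime overhead of the call through the base pointer is the one extra lookup in the class's table of virtual functions.

```cpp
// Abstract base: 'area' is resolved at runtime via the vtable.
class Shape {
public:
    virtual ~Shape() {}
    virtual long area() const = 0;
};

class Rect : public Shape {
public:
    Rect(long w, long h) : w_(w), h_(h) {}
    long area() const { return w_ * h_; }
private:
    long w_, h_;
};

// The caller sees only Shape; which 'area' runs is decided at runtime.
long areaOf(const Shape &s) { return s.area(); }
```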
 
The features of C++ that are too expensive for my taste are templates, exceptions, and runtime type identification. All three of these negatively impact code size, and exceptions and runtime type identification also increase execution time. Before deciding whether to use these features, you might want to do some experiments to see how they will affect the size and speed of your own application.
 
[1]   Of course, you might want to leave a little extra space on the stack—just in case your testing didn't last long enough or did not accurately reflect all possible runtime scenarios. Never forget that a stack overflow is a potentially fatal event for your software and to be avoided at all costs.
 
[2]  Moreover, it should be clear that there is no penalty for compiling an ordinary C program with a C++ compiler.
 