Reverse Engineering Memory Layout of C++ Local Variables

October 30, 2023

The stack and the esp register

See the figure. In the whole memory region, the top is the high-address end and the bottom is the low-address end. The region from the high address down to esp is the stack, the shaded part is unused space, and below that come, in order, the heap, uninitialized data, initialized data, and text.[1] When an executable is loaded into memory, its .text section is loaded into the text segment, its .data section into initialized data, and its .bss section into uninitialized data. The system also allocates additional memory to the program for the heap and the stack.[2] The stack grows downward, while the heap grows upward.

The pgf/TikZ source of the figure:

    \documentclass{standalone}
    \usepackage[svgnames]{xcolor}
    \usepackage{tikz}
    \usetikzlibrary{patterns}
    \begin{document}
    \newcommand{\Width}{4cm}
    \newcommand{\HalfWidth}{2cm}
    % #1: y coordinate
    % #2: address
    % #3: value name
    \newcommand{\valueAt}[3]{
      \node[anchor=east] at (0, #1) {\tt #2};
      \node at (\HalfWidth, #1) {#3};
      %\draw[->,thick, blue] (4, #1) -- ++(-1,0);
    }
    \begin{tikzpicture}
    %\path[xstep=\Width, ystep=0.5cm, draw=yellow] (0,0)grid (\Width,22cm);
    %\path[xstep=\Width, ystep=0.5cm, draw=black] (0,10.5) grid (\Width,16.5);
    \draw (0,0) rectangle (\Width, 21cm);
    \node[anchor=east] at (-0.5,20) {\large high address};
    \node[anchor=east] at (-0.5,1) {\large low address};
    \draw (0,2.5) -- ++(\Width,0);
    \node[anchor=north] at (\HalfWidth,2.5) {\color{red} text};
    \draw[<-,thick, blue] (\Width, 2) -- ++(1,0) node[anchor=west] {\large eip};
    \draw (0,4.5) -- ++(\Width,0);
    \node[anchor=north] at (\HalfWidth, 4.5) {\color{red}initialized data};
    \draw (0,6.5) -- ++(\Width,0);
    \node[anchor=north] at (\HalfWidth, 6.5) {\color{red}uninitialized data};
    \draw[pattern=north east lines] (0,7) rectangle (\Width, 9);
    \draw (0,7) -- ++(\Width,0);
    \draw[->,thick,red] (\HalfWidth, 7) -- ++(0,0.5);
    \node[anchor=south] at (\HalfWidth, 6.5) {\color{red}heap};
    \draw (0,9) -- ++(\Width,0);
    \draw[->,thick,red] (\HalfWidth, 9) -- ++(0,-0.5);
    \node[anchor=south] at (\HalfWidth, 9) {\color{red}stack};
    \draw[<-,thick, blue] (\Width, 9) -- ++(1,0) node[anchor=west] {\large esp};
    \valueAt{10.5}{ebp-20h}{junk[5]};
    \node[anchor=east] at (0, 11) {\tt ebp-1Ch};
    \node[anchor=east] at (0, 11.5) {\tt ebp-18h};
    \node[anchor=east] at (0, 12) {\tt ebp-14h};
    \node[anchor=east] at (0, 12.5) {\tt ebp-10h};
    \valueAt{13}{ebp-Ch}{value[3]};
    \node[anchor=east] at (0, 13.5) {\tt ebp-8h};
    \node[anchor=east] at (0, 14) {\tt ebp-4h};
    \valueAt{14.5}{ebp}{s (caller's ebp)};
    \valueAt{15}{ebp+4h}{r (return address of code)};
    \valueAt{15.5}{ebp+8h}{argc};
    \valueAt{16}{ebp+Ch}{argv};
    \valueAt{16.5}{ebp+10h}{envp};
    \draw[<-,thick, blue] (\Width, 14.5) -- ++(1,0) node[anchor=west] {\large ebp};
    \draw[dashed] (0,17) -- ++(\Width,0);
    \draw[->>,thick] (\HalfWidth,14.5) .. controls (-\HalfWidth,17.25) .. (\HalfWidth,20);
    \draw[<-,thick, blue] (\Width, 20) -- ++(1,0) node[anchor=west] {\large caller's ebp};
    \draw[->>,thick] (\HalfWidth,15) .. controls (-\HalfWidth,8) .. (\HalfWidth,1);
    \end{tikzpicture}
    \end{document}

Figure: The simplified virtual memory layout of the sample C++ program in Linux

Note that the figure shows a 1970s-era layout[3] (the traditional Unix picture, since Linux itself only dates from 1991) and lacks many modern structures. Windows memory is laid out differently: for example, heap addresses can be greater than stack addresses, and both the heap and the stack are split into several blocks in virtual memory.[4]

The esp register points to the top of the stack. The ebp register points to the base of the current function's frame, that is, the value esp held when the function started running (right after the prologue's push ebp). With ebp saved, esp can be restored on return to the state it had before the function was called.

Because esp points to the top of the stack, its value changes automatically with instructions such as push and pop. It can of course also be modified explicitly.

Local Variables

    #include <iostream>
    #include <iomanip>
    using std::cout;
    using std::endl;
    using std::setw;

    int main(int argc)
    {
        int value[] = { 1,2,3 };
        int junk[5];  // deliberately left uninitialized
        cout << endl;
        // Deliberately reads past the end of value (undefined behavior).
        for (int i = 0; i < 5; i++)
            cout << setw(12) << value[i];
        cout << endl;
        // Deliberately reads past the end of junk (undefined behavior).
        for (int i = 0; i < 10; i++)
            cout << setw(12) << junk[i];
        cout << endl;
        cout << endl << "argc: " << argc;
        return 0;
    }

Compile this code in release mode, with the following changes:

- Either disable optimization or use /Ox.
- Disable whole program optimization.
- Disable frame-pointer omission.

Figure: Modifying the release build settings

Compiling in debug mode avoids changing all the options shown in the figure, but it does not reflect real-world binaries; for example, the assembly code contains some extra variables.

After compiling, open the binary in IDA Pro, then rename the variable at -20h to junk and the variable at -0Ch to value.

    .text:80 ; int __cdecl main(int argc, const char **argv, const char **envp)
    .text:80 _main proc near ; CODE XREF: sub_4015FC+F5↓p
    .text:80
    .text:80 junk = dword ptr -20h
    .text:80 var_1C = byte ptr -1Ch
    .text:80 value = dword ptr -0Ch
    .text:80 var_8 = dword ptr -8
    .text:80 var_4 = dword ptr -4
    .text:80 argc = dword ptr 8
    .text:80 argv = dword ptr 0Ch
    .text:80 envp = dword ptr 10h
    .text:80
    .text:80 push ebp
    .text:81 mov ebp, esp
    .text:83 sub esp, 20h
    .text:86 push esi
    .text:87 push edi
    .text:88 push offset sub_4011F0
    .text:8D push offset aValue ; "value:"
    .text:92 push ds:?cout@std@@3V?$basic_ostream@DU?$char_traits@D@std@@@1@A ; std::basic_ostream<char,std::char_traits<char>> std::cout
    .text:98 mov [ebp+value], 1
    .text:9F mov [ebp+var_8], 2
    .text:A6 mov [ebp+var_4], 3
    .text:AD call sub_401000
    .text:B2 add esp, 8
    .text:B5 mov ecx, eax
    .text:B7 call ds:??6?$basic_ostream@DU?$char_traits@D@std@@@std@@QAEAAV01@P6AAAV01@AAV01@@Z@Z ; std::basic_ostream<char,std::char_traits<char>>::operator<<(std::basic_ostream<char,std::char_traits<char>> & (*)(std::basic_ostream<char,std::char_traits<char>> &))
    .text:BD xor edi, edi
    .text:BF nop
    .text:C0
    .text:C0 loc_4013C0: ; CODE XREF: _main+78↓j
    .text:C0 push 0
    .text:C2 lea eax, [ebp+var_1C]
    .text:C5 push 0Ch
    .text:C7 push eax
    .text:C8 call ?setw@std@@YA?AU?$_Smanip@_J@1@_J@Z ; std::setw(__int64)
    .text:CD mov esi, ds:?cout@std@@3V?$basic_ostream@DU?$char_traits@D@std@@@1@A ; std::basic_ostream<char,std::char_traits<char>> std::cout
    .text:D3 push dword ptr [eax+0Ch]
    .text:D6 mov edx, [esi]
    .text:D8 push dword ptr [eax+8]
    .text:DB mov eax, [eax]
    .text:DD mov edx, [edx+4]
    .text:E0 add edx, esi
    .text:E2 push edx
    .text:E3 call eax
    .text:E5 add esp, 18h
    .text:E8 mov ecx, esi
    .text:EA push [ebp+edi*4+value]
    .text:EE call ds:??6?$basic_ostream@DU?$char_traits@D@std@@@std@@QAEAAV01@H@Z ; std::basic_ostream<char,std::char_traits<char>>::operator<<(int)
    .text:F4 inc edi

IDA also lists a few extra variables (var_8 and var_4), used at .text:9F and .text:A6. According to the source code, these are elements of the value array. We can go to Edit -> Functions -> Stack variables and define value and junk as arrays, which makes the listing much cleaner.

The figure already has these local variables filled in. junk[0] is at ebp-20h, and junk[1] is just above it at ebp-1Ch. Clearly, if an array access goes out of bounds, junk can reach value, and also s (the saved caller's ebp), r (the return address), argc and so on. With a negative index it can in principle also reach lower addresses, down toward the text section.

Out-of-bounds access

The following code uses junk to access other variables, including the caller's ebp and the return address.

    #include <iostream>
    #include <iomanip>
    using std::cout;
    using std::endl;
    using std::setw;

    int main()
    {
        int junk[5];
        int value[] = { 1,2,3 };
        cout << "value:" << endl;
        // Deliberately reads past the end of value (undefined behavior).
        for (int i = 0; i < 5; i++)
            cout << setw(12) << value[i];
        cout << endl;
        cout << "junk:" << endl;
        for (int i = 0; i < 5; i++)
            cout << setw(12) << junk[i];
        cout << endl;
        // junk[5..7] overlap value[0..2] in the layout shown in the figure.
        cout << "junk can access value:" << endl;
        for (int i = 5; i < 5+3; i++)
            cout << setw(12) << junk[i];
        cout << endl;
        // junk[8] is at ebp, junk[9] at ebp+4, junk[10] at ebp+8.
        cout << "caller's ebp is " << junk[8] << endl;
        cout << "return address of code is " << junk[9] << endl;
        cout << "argc is " << junk[10] << endl;
        return 0;
    }

Running it several times shows that the return address is always smaller than the caller's ebp. This matches the theory: the return address points into the text section, which sits at low addresses, while the caller's ebp points into the stack.

Question: when I write int value[] = { 1,2,3 }, how does the compiler set the size of value?

Answer: the compiler does not record the size of value anywhere in the generated code. The instructions from .text:98 to .text:A6 are indeed the three assignments, but the assembly code alone does not say whether they belong to one group of related variables. A human reader can only guess from how they are used. The only trace of the size is the frame size: sub esp, 20h reserves 32 bytes, which is 20 bytes for junk plus 12 bytes for value.

.text:EA push [ebp+edi*4+value] is the typical sign of an array access, but it does not say how long the array is either.

Question: what happens if compile-time optimization is enabled? (esp addressing)

Answer: without the build settings shown in the earlier figure, the assembly code accesses local variables through esp instead of ebp, and esp changes with every push and pop.

Figure: Debugging the sample code compiled with esp addressing

The figure shows the program running in x64dbg (x32dbg). The line sub esp,24 (the debugger shows hexadecimal, so this is 24h) allocates 24h bytes for local variables, because the stack grows downward. Which variables they are is not yet clear at this point. Five pushes follow, and each push decreases esp by 4. So by the time execution reaches the line eip points to, esp is 24h + 5*4h = 38h below ebp, which can be verified in the watch panel in the figure.

Figure: IDA splits the esp addressing into base and offset

In IDA, the corresponding line is displayed as mov [esp+38h+value], 1. IDA's display is "prettified": esp+38h = ebp.

The program then executes call sub_4012D0; add esp, 4. IDA tells us that the calling convention of sub_4012D0 is cdecl, meaning the caller cleans the stack, hence the add esp, 4. But there were five pushes before the call, so why is only one of them cleaned up? That is the compiler's optimization logic, which we set aside for now.

One detail follows. The basic_ostream call adds 4 to esp. Below it is .text:50 push 0, which subtracts 4 from esp. In total, +4+4-4 = +4, so at .text:52 esp is 4 greater than it was at .text:2D. Therefore esp+34h = ebp there, rather than the earlier esp+38h. It is not yet clear why junk is accessed here.

Summary

IDA Pro lists the local variables and arguments at the function prologue, with offsets relative to ebp. The program itself may use esp addressing. Even so, IDA writes the operand as esp+xxx, where it computes the ebp-equivalent base on its own, and then applies the variable's offset.

References

[1] Jenny Chen, Ruohao Guo. Stack and Heap Memory. Introduction to Data Structures and Algorithms with C++. [2023-10-31].
[2] RISHABH TRIPATHI. Memory Layout of C Program. 2015-03-09 [2023-10-31].
[3] Jamie Hanrahan. Understanding Windows Process Memory Layout. 2019-03-11 [2023-11-09].
[4] GMasucci. memory layout of stack and heap in user space. 2014-04-02 [2023-11-09].
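
As a supplement to the esp addressing question above, the running gap between esp and the frame base can be recomputed mechanically. The following is a minimal sketch under stated assumptions, not a description of what IDA does internally. The names Op, Insn, gap_below_base and the two instruction arrays are hypothetical. The instruction sequences are reconstructed from the prose of this article rather than copied from the listing. Every push is assumed to move esp by 4 bytes (32-bit x86), and the 4 bytes released by the basic_ostream call are modeled as an add esp, 4.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model of the instructions that move esp (illustration only).
enum class Op { Push, Pop, SubEsp, AddEsp };

struct Insn {
    Op op;
    std::int32_t bytes;  // used by SubEsp/AddEsp only; Push/Pop always move 4
};

// Returns how far esp lies below the frame base after running the
// instructions in order, starting with esp equal to the frame base.
template <std::size_t N>
constexpr std::int32_t gap_below_base(const Insn (&insns)[N]) {
    std::int32_t gap = 0;
    for (std::size_t i = 0; i < N; ++i) {
        switch (insns[i].op) {
            case Op::Push:   gap += 4; break;
            case Op::Pop:    gap -= 4; break;
            case Op::SubEsp: gap += insns[i].bytes; break;
            case Op::AddEsp: gap -= insns[i].bytes; break;
        }
    }
    return gap;
}

// Assumed sequence up to the first store: sub esp, 24h and then five pushes.
constexpr Insn kAtFirstStore[] = {
    {Op::SubEsp, 0x24},
    {Op::Push, 0}, {Op::Push, 0}, {Op::Push, 0}, {Op::Push, 0}, {Op::Push, 0},
};

// Assumed sequence up to the later access: the above, then add esp, 4,
// then the 4 bytes released by the basic_ostream call, then push 0.
constexpr Insn kAtLaterAccess[] = {
    {Op::SubEsp, 0x24},
    {Op::Push, 0}, {Op::Push, 0}, {Op::Push, 0}, {Op::Push, 0}, {Op::Push, 0},
    {Op::AddEsp, 4},
    {Op::AddEsp, 4},
    {Op::Push, 0},
};

// Compile-time checks against the numbers quoted in the article.
static_assert(gap_below_base(kAtFirstStore) == 0x38, "24h + 5*4h = 38h");
static_assert(gap_below_base(kAtLaterAccess) == 0x34, "38h - 4 - 4 + 4 = 34h");
```

Both checks are evaluated at compile time (C++14 or later), so a successful build confirms the two base offsets, 38h and 34h, that the article reads off IDA's esp+38h and esp+34h operands.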