Memory Layout of C++ Local Variables

October 30, 2023

The stack and the esp register

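See the figure. Across the whole memory region, the top is the high address and the bottom is the low address. The stack extends from the high address down to esp; the shaded area is unused space; below it come, in order, the heap, uninitialized data, initialized data, and text.^[1] When an executable is loaded into memory, its .text section is loaded into the text segment, its .data section into initialized data, and its .bss section into uninitialized data. The system also allocates additional memory for the program to use as heap and stack.^[2] The stack grows downward, while the heap grows upward.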

Figure: The simplified virtual memory layout of the sample C++ program in Linux

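Note that the figure shows a 1970s-style memory layout (the era of early Unix; Linux itself dates from 1991)^[3] and omits many modern structures. Windows lays out memory differently: for example, heap addresses can be greater than stack addresses, and both the heap and the stack are split into several regions of virtual memory^[4].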

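The esp register points to the top of the stack. The ebp register points to the base of the current function's frame, i.e. the value of esp at the start of the function, captured by the prologue's push ebp; mov ebp, esp. With ebp, the function can restore esp to its pre-call state when it returns.

Because esp points to the top of the stack, its value changes automatically with instructions such as push and pop; it can of course also be modified explicitly.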

Local Variables

#include <iostream>
#include <iomanip>

using std::cout;
using std::endl;
using std::setw;

int main(int argc)
{
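	int value[] = { 1,2,3 };
	int junk[5];	// never initialized; holds whatever was on the stack

	cout << endl;
	// Reads 2 elements past the end of value (undefined behavior) to expose neighboring stack slots
	for (int i = 0; i < 5; i++)
		cout << setw(12) << value[i];

	cout << endl;
	// Reads 5 elements past the end of junk (undefined behavior) for the same reason
	for (int i = 0; i < 10; i++)
		cout << setw(12) << junk[i];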

	cout << endl;

	cout << endl << "argc: " << argc;
	return 0;
}

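Compile this code in release mode, with the following settings:

Either disable optimization or use /Ox.
Disable whole program optimization.
Do not omit frame pointers.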

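Compiling in debug mode avoids changing the options shown in the figure, but it does not reflect real-world conditions; for example, the assembly code contains extra variables.

After compiling, open the assembly code in IDA Pro, then rename the variable at -20h to junk and the variable at -0Ch to value.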

.text:80 ; int __cdecl main(int argc, const char **argv, const char **envp)
.text:80 _main           proc near               ; CODE XREF: sub_4015FC+F5↓p
.text:80
.text:80 junk            = dword ptr -20h
.text:80 var_1C          = byte ptr -1Ch
.text:80 value           = dword ptr -0Ch
.text:80 var_8           = dword ptr -8
.text:80 var_4           = dword ptr -4
.text:80 argc            = dword ptr  8
.text:80 argv            = dword ptr  0Ch
.text:80 envp            = dword ptr  10h
.text:80
.text:80                 push    ebp
.text:81                 mov     ebp, esp
.text:83                 sub     esp, 20h
.text:86                 push    esi
.text:87                 push    edi
.text:88                 push    offset sub_4011F0
.text:8D                 push    offset aValue   ; "value:"
.text:92                 push    ds:?cout@std@@3V?$basic_ostream@DU?$char_traits@D@std@@@1@A ; std::basic_ostream<char,std::char_traits<char>> std::cout
.text:98                 mov     [ebp+value], 1
.text:9F                 mov     [ebp+var_8], 2
.text:A6                 mov     [ebp+var_4], 3
.text:AD                 call    sub_401000
.text:B2                 add     esp, 8
.text:B5                 mov     ecx, eax
.text:B7                 call    ds:??6?$basic_ostream@DU?$char_traits@D@std@@@std@@QAEAAV01@P6AAAV01@AAV01@@Z@Z ; std::basic_ostream<char,std::char_traits<char>>::operator<<(std::basic_ostream<char,std::char_traits<char>> & (*)(std::basic_ostream<char,std::char_traits<char>> &))
.text:BD                 xor     edi, edi
.text:BF                 nop
.text:C0
.text:C0 loc_4013C0:                             ; CODE XREF: _main+78↓j
.text:C0                 push    0
.text:C2                 lea     eax, [ebp+var_1C]
.text:C5                 push    0Ch
.text:C7                 push    eax
.text:C8                 call    ?setw@std@@YA?AU?$_Smanip@_J@1@_J@Z ; std::setw(__int64)
.text:CD                 mov     esi, ds:?cout@std@@3V?$basic_ostream@DU?$char_traits@D@std@@@1@A ; std::basic_ostream<char,std::char_traits<char>> std::cout
.text:D3                 push    dword ptr [eax+0Ch]
.text:D6                 mov     edx, [esi]
.text:D8                 push    dword ptr [eax+8]
.text:DB                 mov     eax, [eax]
.text:DD                 mov     edx, [edx+4]
.text:E0                 add     edx, esi
.text:E2                 push    edx
.text:E3                 call    eax
.text:E5                 add     esp, 18h
.text:E8                 mov     ecx, esi
.text:EA                 push    [ebp+edi*4+value]
.text:EE                 call    ds:??6?$basic_ostream@DU?$char_traits@D@std@@@std@@QAEAAV01@H@Z ; std::basic_ostream<char,std::char_traits<char>>::operator<<(int)
.text:F4                 inc     edi

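IDA also shows a few extra variables, notably var_8 and var_4, which are written at .text:9F and .text:A6; according to the source code, these are elements of the value array. Under Edit -> Functions -> Stack variables, value and junk can be redefined as arrays.

The display then becomes much cleaner. The figure already has these local variables filled in. junk[0] is at ebp-20h, and junk[1] sits above it at ebp-1Ch. Clearly, if the array is indexed out of bounds, junk can reach value, as well as s (the saved ebp), r (the return address), argc, and so on; with a negative index it can also reach downward, toward the text section.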

Out-of-Bounds Access

The code below uses junk to access other variables, including the caller's ebp, the return address, and so on.

#include <iostream>
#include <iomanip>

using std::cout;
using std::endl;
using std::setw;

int main()
{
	int junk[5];
	int value[] = { 1,2,3 };

	cout << "value:" << endl;
	for (int i = 0; i < 5; i++)
		cout << setw(12) << value[i];
	cout << endl;

	cout << "junk:" << endl;
	for (int i = 0; i < 5; i++)
		cout << setw(12) << junk[i];
	cout << endl;

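	// junk[5..7] alias value[0..2] in the layout shown in the listing
	cout << "junk can access value:" << endl;
	for (int i = 5; i < 5+3; i++)
		cout << setw(12) << junk[i];
	cout << endl;

	// Indices 8, 9, 10 correspond to ebp+0, ebp+4, ebp+8 in that layout
	cout << "caller's ebp is " << junk[8] << endl;
	cout << "return address of code is " << junk[9] << endl;
	cout << "argc is " << junk[10] << endl;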
	return 0;
}

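Over many runs, the return address is always smaller than the caller's ebp. This matches the theory: the return address points into the text section, which sits at low addresses, while the caller's ebp points into the stack.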

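Question: when I write `int value[] = { 1,2,3 }`, how does the compiler set the size of value?

Answer: the compiler knows the size at compile time (three ints, 12 bytes) and reserves that space as part of sub esp, 20h (20h = 5*4 + 3*4), but it does not record the size anywhere in the emitted code. .text:98 through .text:A6 are indeed the assignments, yet the assembly code alone does not say whether they form a group of related variables. A human reader can only guess from how they are used.

.text:EA push [ebp+edi*4+value] is the hallmark of an array access, but it likewise does not say how long the array is.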

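Question: what happens if compile-time optimization is enabled? (esp addressing)

Answer: without the settings shown in the figure, the assembly code accesses local variables through esp. Since esp changes with every push and pop, the offset from esp to a given variable changes along the way.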

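Figure: Debugging the sample code compiled with esp addressing

The figure shows the program running under x32dbg/x64dbg. Because the stack grows downward, sub esp,24 allocates 24h bytes for local variables; which variables these are has not been identified yet. Five pushes follow, and each push decreases esp by 4. By the time execution reaches the line at eip, esp is therefore 24h + 5*4h = 38h below ebp, which can be checked in the watch panel of the figure.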

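Figure: IDA splits the esp addressing into base and offset

In IDA, the corresponding line is displayed as mov [esp+38h+value], 1. IDA's display is "optimized": esp+38h equals ebp, and value is the usual ebp-relative offset.

The program then executes call sub_4012D0; add esp, 4. IDA reports that the calling convention of sub_4012D0 is cdecl, i.e. the caller cleans the stack, hence the add esp, 4. But there were five pushes before the call, so why is only one of them cleaned up? This is part of the compiler's optimization logic and is set aside for now.

One detail follows: the call to basic_ostream increases esp by 4, presumably because operator<< is a __thiscall member function that pops its own stack argument. Next comes .text:50 push 0, which decreases esp by 4.

In total, +4+4-4 = +4, so at .text:52 esp is 4 greater than at .text:2D. As a result, esp+34h = ebp here, rather than esp+38h as before. It is not yet clear why junk is accessed at this point.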

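Summary

IDA Pro lists the local variables and arguments at the top of a function, with offsets relative to ebp. The program may use esp-based addressing instead; even then, IDA writes the operand as esp+xxx, where esp+xxx works out to the value of ebp, and then applies the variable's offset.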

References

[1] Jenny Chen, Ruohao Guo. Stack and Heap Memory. Introduction to Data Structures and Algorithms with C++. [2023-10-31].
[2] Rishabh Tripathi. Memory Layout of C Program. 2015-03-09 [2023-10-31].
[3] Jamie Hanrahan. Understanding Windows Process Memory Layout. 2019-03-11 [2023-11-09].
[4] GMasucci. memory layout of stack and heap in user space. 2014-04-02 [2023-11-09].