How does Linux find a program to run

July 4, 2024


In a shell, if I type a-command-to-run, how does Linux find this program?

Exit code

Z Shell has an option named print_exit_value. When it is set, Z Shell prints the exit code of any command that exits with a non-zero status, as the following listing shows.


$ zsh --no-rcs
$ setopt print_exit_value
$ some-command
zsh: command not found: some-command
zsh: exit 127   some-command
Listing: Z Shell has native support for showing the exit code

If you load oh-my-zsh, the output may instead be [1] 179519 exit 127 some-command. The deviation from standard zsh seems to be caused by async_prompt.zsh. Here [1] is the job number, and 179519 is the process ID. Z Shell follows the Bash convention that the exit code is 127 when a command is not found.[1]

The PATH environment variable

In Bash, PATH is one of the Bourne shell variables: a colon-separated list of directories that the shell searches for commands. In addition to PATH, which is a string, Z Shell provides path, which is an array, so that users can add or remove directories easily. Z Shell keeps path and PATH synchronized internally.

The PATH variable is special to the Bourne shell, Bash, and Z Shell[2][3], but is it special to the Linux kernel itself? We have to check the manual.

The exec() family of functions is used to start a program[4]. I use execl() in a small C++ program.

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main() {
    pid_t pid = fork();
    // On success, the PID of the child process is returned in the parent, and 0 is returned in the child.

    if (pid == -1) {
        perror("fork failed");
        return 1;
    } else if (pid == 0) {
        // The argument list must end with a null pointer cast to (char *).
        execl("uname", "uname", (char *)NULL);
        // perror will append the message of errno
        perror("execl failed with the following error message");
        return 1;
    } else {
        wait(NULL);
        printf("Child process completed\n");
    }

    return 0;
}
Listing: use fork() and execl() to run a program named “uname”
$ sudo apt install --yes build-essential
$ g++ p.cpp
$ ./a.out
execl failed with the following error message: No such file or directory
Child process completed
Listing: install the C++ development tools, then build and run the program

Save the C++ listing above as p.cpp, then run the commands in the shell listing that follows it. If you have already installed the C++ development tools, you don’t have to install build-essential.

In the C++ listing, I ask to run “uname” without giving a full path. The result shows that execl() failed and set errno to ENOENT, “No such file or directory”[5].

How execl() resolves a pathname is defined in path_resolution(7). In short, if the pathname doesn’t start with /, it is resolved relative to the current working directory. My “uname” does not start with /, so execl() just tries to run uname in the current directory, and since the current directory doesn’t contain this file, the call failed. Notice that this whole process does not involve the PATH environment variable at all.

I can make a.out succeed without recompiling the binary. I only need to provide my own version of uname in the current directory.

$ ll
total 28K
-rwxrwxr-x 1 vm vm 17K Jul  8 05:26 a.out
-rw-rw-r-- 1 vm vm 647 Jul  8 05:27 p.cpp
-rwxrwxr-x 1 vm vm  34 Jul  8 06:57 uname
$ cat uname
#!/bin/bash

echo "this is uname"
$ ./a.out
this is uname
Child process completed
Listing: create my own version of uname and run a.out

The listing above shows that I created an executable file named uname, which starts with a shebang and echoes a message. I then ran a.out again. This time execl() found my uname and executed it.

Now we have hands-on evidence that the PATH environment variable is not special to the Linux kernel. The PATH variable is probably rooted in the Bourne shell and was adopted by Bash and Z Shell.

How other programs use PATH

Take whereis as an example. It attempts to locate the desired program in hard-coded standard Linux places, as well as in the places specified by $PATH and $MANPATH.

$ zsh --no-rcs
$ path+=(~/dev)
$ whereis uname
uname: /usr/bin/uname /home/vm/dev/uname /usr/share/man/man2/uname.2.gz /usr/share/man/man1/uname.1.gz
$ path=()
$ whereis uname
zsh: command not found: whereis
$ /bin/whereis uname
uname: /usr/bin/uname /usr/share/man/man2/uname.2.gz /usr/share/man/man1/uname.1.gz
Listing: even when PATH is empty, whereis can still find uname in /usr/bin because this path is hard-coded

My own uname is in ~/dev, so I appended this directory to the path shell variable. whereis then finds two uname programs, one in /usr/bin and the other in /home/vm/dev. Next, I emptied path, after which the shell can no longer find whereis itself. But if I give the full path of whereis, it runs and still finds uname in the hard-coded system directories, as the listing above shows.

How PHP uses PATH

env is a utility that runs a program in a modified environment. In particular, its --ignore-environment (-i) option starts the program with an empty environment. For instance, env --ignore-environment printenv prints nothing. Similarly, the PHP function getenv() returns an empty array under -i.

$ env php -r 'print_r(getenv());'
Array
(
    [USER] => vm
    [LOGNAME] => vm
    [HOME] => /home/vm
    [PATH] => /home/vm/bin:/usr/local/bin:/usr/local/sbin:/usr/local/bin:
              /usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
	...
)
$ env -i php -r 'print_r(getenv());'
Array
(
)
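
Next, the script p.php starts env through proc_open(), passing the command as an array:

<?php
$handle = proc_open(array('env'), [
    0 => ["pipe", "r"],  // stdin
    1 => ["pipe", "w"],  // stdout
    2 => ["pipe", "w"],  // stderr
], $pipes, null, null, ['bypass_shell' => true]);

if (!is_resource($handle)) {
    echo 'cannot create process';
    return;
}

// Close the child's stdin, collect its output, then reap it for the exit code.
fclose($pipes[0]);
$stdout = stream_get_contents($pipes[1]);
$stderr = stream_get_contents($pipes[2]);
fclose($pipes[1]);
fclose($pipes[2]);
$exit_code = proc_close($handle);

echo "exit code: $exit_code\n";
echo "\nstdout:\n";
echo $stdout;

echo "\nstderr:\n";
echo $stderr;
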
$ env -i php p.php 
exit code: 0

stdout:

stderr:

This behavior is quite different. Even though the PATH environment variable is unset, PHP can still find the env program. Let’s dig into the source code of PHP.

The PHP function proc_open() calls the C library function posix_spawnp() if an array is provided as the command[6]. According to the man page, posix_spawnp() takes a simple filename and searches for it in the directories listed in the PATH environment variable, in the same way as execvp(3)[7].

However, our experiment showed that the PHP function getenv() returns an empty array. So how does posix_spawnp() make use of the PATH environment variable?

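The execvp(3) man page (https://manpages.ubuntu.com/manpages/focal/en/man3/execvp.3.html) says: “If this [PATH] variable isn’t defined, the path list defaults to a list that includes the directories returned by confstr(_CS_PATH)”. So when PATH is not set, posix_spawnp() falls back to this default list, which explains why the program is still found.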

References

  1. Bash Reference Manual. [accessed 2024-07-08].
  2. The Z Shell Manual. 2022-05-14 [accessed 2024-07-08].
  3. Bash Reference Manual. 2022-09-19 [accessed 2024-07-08].
  4. Ubuntu Manpage: execl, execlp, execle, execv, execvp, execvpe - execute a file. [accessed 2024-07-08].
  5. Ubuntu Manpage: errno - number of last error. [accessed 2024-07-08].
  6. php-src/ext/standard/proc_open.c · php/php-src. [accessed 2024-07-08].
  7. posix_spawn, posix_spawnp - spawn a process. [accessed 2024-07-08].