Orphan Processes in Linux

Orphan processes can sometimes cause confusion when analyzing live Linux systems. But during a recent run of my Linux Forensics class, one of my students showed me an interesting trick that I wanted to make more generally known.

Consider a simple hierarchy of processes:

UID          PID    PPID  C STIME TTY          TIME CMD
root         725       1  0 12:53 ?        00:00:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
root        1285     725  0 12:55 ?        00:00:00 sshd: lab [priv]
lab         1335    1285  0 12:55 ?        00:00:00 sshd: lab@pts/0
lab         1352    1335  0 12:55 pts/0    00:00:00 -bash
lab         1415    1352  0 12:55 pts/0    00:00:00 ping 192.168.10.137

At the top is the master SSH server for this system. Its parent process ID (PPID) is one, because it was started by systemd when the machine booted. Then we have the root-owned sshd process that was started when I connected to the system, and an unprivileged sshd because of the PrivilegeSeparation feature. That unprivileged sshd starts my bash login shell, and from that shell I fired up a ping command to run in the background. You can walk all the way back up the chain and in each case the PPID of each process is the PID of the process before it.

This makes it easy to view these processes as a hierarchy using a tool like pstree:

systemd(1)─┬─ModemManager(704)─┬─{ModemManager}(718)
           │                   └─{ModemManager}(723)
           ├─NetworkManager(669)─┬─{NetworkManager}(694)
           │                     └─{NetworkManager}(701)
           ├─...
           ├─sshd(725)───sshd(1285)───sshd(1335)───bash(1352)───ping(1415)
           ├─...
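The same parent-to-child walk that pstree does can be sketched in a few lines of Python. This is only an illustration: it assumes `ps -ef` style text as input (PID in the second column, PPID in the third), and the helper names `parse_ps` and `ancestors` are my own, not part of any tool.

```python
def parse_ps(ps_output):
    """Build a PID -> PPID map from `ps -ef` style text."""
    parents = {}
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row (its PID/PPID columns are not numeric).
        if len(fields) < 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        parents[int(fields[1])] = int(fields[2])
    return parents


def ancestors(pid, parents):
    """Return the chain of PIDs from `pid` up to the last ancestor we know about."""
    chain = [pid]
    # Stop when the parent is unknown, or if a loop would be formed.
    while chain[-1] in parents and parents[chain[-1]] not in chain:
        chain.append(parents[chain[-1]])
    return chain
```

Fed the listing above, `ancestors(1415, parents)` walks from ping back through bash and the three sshd processes to PID 1.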

But what happens when I exit my bash shell and leave the ping process running in the background?

lab         1415       1  0 12:55 ?        00:00:00 ping 192.168.10.137

My poor little ping process has become an orphan, and its PPID is now shown as one. This creates kind of a strange look in the pstree output:

systemd(1)─┬─ModemManager(704)─┬─{ModemManager}(718)
           │                   └─{ModemManager}(723)
           ├─NetworkManager(669)─┬─{NetworkManager}(694)
           │                     └─{NetworkManager}(701)
           ├─...
           ├─ping(1415)
           ├─...

Analysts can confuse an orphaned process with one that was started by systemd, and this creates an opportunity for bad actors to obfuscate processes that they started interactively.

/proc to the Rescue!

What my student pointed out to me is that the original PPID of the ping process is still tracked under the /proc/<pid> directory for the process. For example, /proc/1415/status shows the original PPID under the NSsid (Namespace Session ID) field:

Name:   ping
Umask:  0022
State:  S (sleeping)
Tgid:   1415
Ngid:   0
Pid:    1415
PPid:   1
TracerPid:      0
Uid:    1000    1000    1000    1000
Gid:    1000    1000    1000    1000
FDSize: 256
Groups: 24 25 27 29 30 44 46 108 113 117 120 1000
NStgid: 1415
NSpid:  1415
NSpgid: 1415
NSsid:  1352
...

More tersely, you can see the original PPID as the sixth field in /proc/1415/stat:

1415 (ping) S 1 1415 1352 0 -1 4194560 ...
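If you want to pull that field out in a script, here’s a small Python sketch. The one thing to watch is that the command name in the second field is wrapped in parentheses and can itself contain spaces or parentheses, so it’s safer to split on the last `)` than on whitespace. The function names `parse_stat` and `read_stat` are illustrative only.

```python
def parse_stat(stat_line):
    """Extract the first six fields from a /proc/<pid>/stat line."""
    # Everything after the LAST ')' is plain whitespace-separated fields.
    head, _, tail = stat_line.rpartition(")")
    pid_str, _, comm = head.partition("(")
    rest = tail.split()
    return {
        "pid": int(pid_str),
        "comm": comm,
        "state": rest[0],
        "ppid": int(rest[1]),
        "pgrp": int(rest[2]),
        "session": int(rest[3]),  # the sixth field overall
    }


def read_stat(pid):
    """Read and parse /proc/<pid>/stat for a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as fh:
        return parse_stat(fh.read())
```

For the line above, `parse_stat(...)["session"]` gives 1352 and `["ppid"]` gives 1.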

If you compare this with the output of the master sshd process that was actually started by systemd, you will notice a difference in this field. Here’s /proc/725/status:

Name:   sshd
Umask:  0022
State:  S (sleeping)
Tgid:   725
Ngid:   0
Pid:    725
PPid:   1
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
FDSize: 128
Groups:
NStgid: 725
NSpid:  725
NSpgid: 725
NSsid:  725
...

And here’s /proc/725/stat:

725 (sshd) S 1 725 725 0 -1 4194560 ...

For a process like this one that was started directly by systemd, the NSsid is the process’s own PID.
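That comparison is easy to automate. Below is a hedged Python sketch that parses the text of a status file and checks whether NSsid matches Pid. The names `status_fields` and `is_session_leader` are mine. I’m also assuming that when NSsid lists more than one value (nested PID namespaces), the first value is the one to compare against Pid.

```python
def status_fields(status_text):
    """Parse /proc/<pid>/status text into {field name: list of values}."""
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.split()
    return fields


def is_session_leader(status_text):
    """True when the process's session ID equals its own PID."""
    fields = status_fields(status_text)
    # Assumption: the first NSsid value is the one in the same namespace as Pid.
    return fields["NSsid"][0] == fields["Pid"][0]
```

With the two status listings above, the orphaned ping comes back False and the master sshd comes back True.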

It’s the Session ID

There’s one more subtlety at play here. I’m going to start two new ping processes: one in my login shell, and then one after I use sudo to become root:

lab@LAB:~$ ping 192.168.10.137 >/dev/null &
[1] 1492
lab@LAB:~$ sudo -s
root@LAB:/home/lab# ping 192.168.10.137 >/dev/null &
[1] 1495
root@LAB:/home/lab# pstree -p
...
───bash(1465)─┬─ping(1492)
              └─sudo(1493)───bash(1494)─┬─ping(1495)
                                        └─pstree(1497)
...
root@LAB:/home/lab# exit
exit
lab@LAB:~$ logout

I’ve logged out of both shells to orphan both ping processes.

Now I’ll log back in and check the NSsid of the two orphaned processes:

lab@LAB:~$ grep NSsid: /proc/1492/status /proc/1495/status
/proc/1492/status:NSsid:        1465
/proc/1495/status:NSsid:        1465

The parameter is called Namespace Session ID because it applies to the entire user session that is initiated when the user logs in. So even though I used sudo to become the root user, the NSsid is still the PID of my unprivileged login shell. For example, if an attacker manages to escalate privilege in the middle of their session, you can still use the NSsid to tie together processes they started in their root shell with processes from the original unprivileged session.
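To tie processes together this way across a whole system, you can group every PID by its session ID. Here’s a rough Python sketch. It uses `os.getsid()`, which asks the kernel for a process’s session ID, rather than parsing /proc. The function names `group_by_session` and `live_sessions` are illustrative.

```python
import os
from collections import defaultdict


def group_by_session(pid_sid_pairs):
    """Group (pid, session_id) pairs into {session_id: [pids]}."""
    sessions = defaultdict(list)
    for pid, sid in pid_sid_pairs:
        sessions[sid].append(pid)
    return dict(sessions)


def live_sessions():
    """Group every process currently visible under /proc by session ID."""
    pairs = []
    for name in os.listdir("/proc"):
        if not name.isdigit():
            continue  # not a per-process directory
        try:
            pairs.append((int(name), os.getsid(int(name))))
        except OSError:
            continue  # process exited, or we are not allowed to query it
    return group_by_session(pairs)
```

On the lab system above, both orphaned pings would land in the same bucket, keyed by 1465.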

Web Shells and cron Jobs

That got me curious about some other potential scenarios: web shells and cron jobs. I installed the nginx web server along with PHP-FPM. I created a simple web shell in PHP that just invoked any command I passed to it with system(). This means that PHP-FPM will invoke /bin/sh first, which will then execute the command I pass in. The process hierarchy ended up looking like this:

           |-php-fpm(8918)-+-php-fpm(8919)---sh(9074)---ping(9075)
           |               `-php-fpm(8920)

The NSsid of the ping process was 8918, the PID of the master PHP-FPM process that was started by systemd.

Next I set up a cron job that ran ping every minute in the background. I quickly stacked up a number of ping processes:

           |-ping(9199)
           |-ping(9209)
           |-ping(9216)
           |-ping(9225)

Notice that the ping processes here have been orphaned and show no relation to the original cron process that launched them. When I checked the NSsid values of these processes, here is what I found:

root@LAB:~# grep NSsid: /proc/9199/status /proc/9209/status /proc/9216/status /proc/9225/status
/proc/9199/status:NSsid:        9198
/proc/9209/status:NSsid:        9208
/proc/9216/status:NSsid:        9214
/proc/9225/status:NSsid:        9224
root@LAB:~# ps -ef | grep cron
root         665       1  0 12:53 ?        00:00:00 /usr/sbin/cron -f
root@LAB:~# ls -ld /proc/9198
ls: cannot access '/proc/9198': No such file or directory

Each process had a distinct NSsid and none of them matched the PID of the parent cron process. Like system(), cron runs a shell to execute each cron job. But unlike the PHP-FPM example, each shell spawned by cron starts a new session with a new NSsid value. The ping processes were orphaned when those shells exited after launching them.
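You can reproduce the difference between the two behaviors without cron or PHP-FPM. The Python sketch below is only an analogy, not how either program is implemented: it launches a child with and without `start_new_session=True`, which asks `subprocess` to call setsid() in the child, and then reads back the child’s session ID. The helper name `spawn_and_get_sid` is made up for this example.

```python
import os
import subprocess
import sys


def spawn_and_get_sid(new_session):
    """Start a short-lived child and return (child_pid, child_session_id)."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        start_new_session=new_session,  # True -> child calls setsid()
    )
    try:
        return child.pid, os.getsid(child.pid)
    finally:
        # Clean up the helper process; we only needed its session ID.
        child.kill()
        child.wait()
```

With `new_session=True` the child’s session ID is its own PID, like the shells cron spawned. With `new_session=False` the child inherits the caller’s session, like the shell under PHP-FPM.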

The bottom line is that processes started by systemd have their own PID as the NSsid. Processes started interactively, or launched from other services such as PHP-FPM or cron, have some other PID in the NSsid field. Exactly what PID that is can vary depending upon how the process was launched. The NSsid field persists even when the process has been orphaned.
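That rule of thumb can be turned into a quick triage script. The Python sketch below flags processes whose PPid is 1 but whose NSsid is not their own PID. Treat it as a heuristic under my own assumptions: a daemon that deliberately forks again after calling setsid() will also match, and on systems where orphans are reparented to something other than PID 1 (a subreaper) this check will miss them. The function names are illustrative.

```python
import os


def looks_orphaned(pid, ppid, sid):
    """Heuristic: reparented to PID 1, but not the leader of its own session."""
    return ppid == 1 and sid != pid


def read_ids(pid, proc_root="/proc"):
    """Pull Pid, PPid and the first NSsid value from a status file."""
    ids = {}
    with open(os.path.join(proc_root, str(pid), "status")) as fh:
        for line in fh:
            key, _, value = line.partition(":")
            if key in ("Pid", "PPid", "NSsid") and value.split():
                ids[key] = int(value.split()[0])
    return ids


def orphan_candidates(proc_root="/proc"):
    """Return the ID triples of processes that match the heuristic."""
    hits = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        try:
            ids = read_ids(name, proc_root)
        except OSError:
            continue  # process exited while we were scanning
        if len(ids) == 3 and looks_orphaned(ids["Pid"], ids["PPid"], ids["NSsid"]):
            hits.append(ids)
    return hits
```

Anything this returns still needs a human look, but the NSsid value gives you a session to start pulling on.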

Unfortunately, once the process has been orphaned, we can’t recreate the entire original process hierarchy without additional data that’s not available by default. To rebuild the process hierarchies you would need to be tracking exec() system calls with auditd or eBPF, or using a third-party tool.