Pipes and file descriptors

IPC in Unix - part 2

Pipes are a simple way to connect the output of one process to the input of another. You have probably used pipes at some point, for example when executing commands such as ls | grep foo. Here, the output of the ls command is fed into the grep command through a pipe, which the shell creates when it sees the | symbol.

To implement pipes successfully we need to understand the pipe() system call, file descriptors, and some functions related to writing to a file. Let's get started!

File descriptors

A file descriptor is a number that uniquely represents an input/output resource, such as an open file or a network socket. It acts as a handle that a process uses to read from or write to the resource. File descriptors are process-specific and each process has its own descriptor table. Forked processes inherit their parent's file descriptors.

By convention, every process starts with three file descriptors already open (they are normally inherited from the parent process, such as the shell):

  • 0 - stdin (standard input, usually keyboard)
  • 1 - stdout (standard output, usually the screen)
  • 2 - stderr (standard error, usually the screen)

Other file descriptors are created and returned by system calls that open files or create network connections, such as open(), creat(), socket(). They are used in system calls such as read() and write(), and closed with close().
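To make the life cycle of a descriptor concrete, here is a minimal sketch that opens a file, writes to it, reads the data back, and closes the descriptor. The function name fd_roundtrip and the path passed to it are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Writes msg to the file at path, reads it back into buf (capacity cap)
 * and returns the number of bytes read, or -1 on error.
 * fd_roundtrip is a hypothetical helper, not part of any library. */
ssize_t fd_roundtrip(const char *path, const char *msg, char *buf, size_t cap)
{
    assert(path != NULL && msg != NULL && buf != NULL && cap > 0);

    /* open() returns the lowest unused descriptor number. That is often 3,
     * because 0, 1 and 2 are usually taken, but this is not guaranteed. */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1) {
        perror("open");
        return -1;
    }
    fprintf(stderr, "opened %s as descriptor %d\n", path, fd);

    size_t len = strlen(msg);
    if (write(fd, msg, len) != (ssize_t)len) {
        perror("write");
        close(fd);
        return -1;
    }

    /* Writing moved the file offset to the end, so rewind before reading. */
    if (lseek(fd, 0, SEEK_SET) == -1) {
        perror("lseek");
        close(fd);
        return -1;
    }

    ssize_t n = read(fd, buf, cap - 1);
    if (n >= 0)
        buf[n] = '\0'; /* read() does not add a terminator */
    else
        perror("read");

    close(fd); /* Release the descriptor so its number can be reused */
    return n;
}
```

Calling fd_roundtrip("/tmp/demo.txt", "hello", buf, sizeof(buf)) should leave "hello" in buf. Note that read() and write() work on raw bytes, which is why the sketch adds the terminating '\0' itself.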

Astute readers may notice that the process-specific nature of file descriptors could cause trouble when sharing underlying resources, for example when multiple processes write to the same file. There are multiple ways to handle this, but we'll cover that in one of the future articles in the series.

It's also worth mentioning the difference between a file descriptor and a file handle. The system calls mentioned above work with file descriptors, but there is a set of higher-level functions, such as fopen(), fread(), fwrite(), and fclose(), that work with file handles. A file handle is a more abstract and general term for a program's reference to an opened file. In C it is represented by a FILE pointer (the C standard calls it a stream), which adds buffering on top of the underlying descriptor. Because FILE is part of the C standard library, code that uses it is portable across operating systems, including those that have no Unix-style file descriptors.
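The two layers can be bridged in both directions on POSIX systems: fdopen() wraps an existing descriptor in a FILE stream, and fileno() returns the descriptor behind a stream. The sketch below illustrates this under the assumption of a POSIX environment; the function name write_line_via_stream is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L /* ask for fdopen() and fileno() */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Opens path with open(), wraps the descriptor in a FILE stream and writes
 * one formatted line through the stream. Returns 0 on success, -1 on error.
 * write_line_via_stream is a hypothetical helper used for illustration. */
int write_line_via_stream(const char *path, int value)
{
    assert(path != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd == -1) {
        perror("open");
        return -1;
    }

    /* Descriptor -> stream */
    FILE *fp = fdopen(fd, "w");
    if (fp == NULL) {
        perror("fdopen");
        close(fd);
        return -1;
    }

    /* Stream -> descriptor: fileno() gives back the number we started with */
    assert(fileno(fp) == fd);

    /* Formatted, buffered output is only available on the FILE side */
    if (fprintf(fp, "value=%d\n", value) < 0) {
        fclose(fp);
        return -1;
    }

    /* fclose() flushes the buffer and also closes the underlying descriptor,
     * so there must be no separate close(fd) afterwards. */
    return fclose(fp) == 0 ? 0 : -1;
}
```

A practical consequence of the buffering is that data written with fprintf() may not reach the descriptor until the stream is flushed or closed, which matters when mixing stdio calls with write() on the same descriptor.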

pipe()

The pipe() system call takes a 2-element integer array as an argument and populates it with 2 file descriptors which together form a pipe. The first file descriptor (index 0) is used to read data and the second one (index 1) to write it. Data written to the write end is buffered by the kernel until it is read from the read end. pipe() is usually used with fork() to enable inter-process communication.
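Before bringing fork() into the picture, it may help to see that a pipe is just a pair of descriptors, so even a single process can write into one end and read from the other. This is a minimal sketch; the function name pipe_roundtrip is hypothetical, and it assumes the message is short enough to fit into the kernel's pipe buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Sends msg through a pipe inside a single process and copies whatever
 * comes out of the read end into buf. Returns the number of bytes read,
 * or -1 on error. pipe_roundtrip is a hypothetical helper. */
ssize_t pipe_roundtrip(const char *msg, char *buf, size_t cap)
{
    assert(msg != NULL && buf != NULL && cap > 0);

    int pfds[2];
    if (pipe(pfds) == -1) {
        perror("pipe");
        return -1;
    }

    /* pfds[1] is the write end. A short message fits into the kernel's
     * pipe buffer, so this write does not block waiting for a reader. */
    size_t len = strlen(msg);
    if (write(pfds[1], msg, len) != (ssize_t)len) {
        perror("write");
        close(pfds[0]);
        close(pfds[1]);
        return -1;
    }
    close(pfds[1]); /* No writers left: the reader sees EOF after the data */

    /* pfds[0] is the read end */
    ssize_t n = read(pfds[0], buf, cap - 1);
    if (n >= 0)
        buf[n] = '\0';
    else
        perror("read");

    close(pfds[0]);
    return n;
}
```

With a larger message this single-process arrangement would deadlock, because write() blocks once the pipe buffer is full and nobody is reading. That is one reason pipes are normally paired with fork().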

In the following example we create a pipe and then use fork() to create a child process. The child process takes a string from the standard input and writes it to the pipe, while the parent process waits for something to appear on the read end of the pipe. Both child and parent close the end of the pipe they don't need (for the child it's the reading end, and for the parent it's the writing end). Let's see the code:

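#include <stdio.h>
#include <stdlib.h>
#include <string.h>    // strlen()
#include <sys/types.h> // pid_t
#include <sys/wait.h>  // wait()
#include <unistd.h>    // pipe(), fork(), read(), write(), close()

void pipes_demo_1()
{
    int pfds[2];
    char s[200];

    if (pipe(pfds) == -1)
    {
        perror("pipe");
        exit(1);
    }

    pid_t pid = fork();
    switch (pid)
    {
    case -1:
        perror("fork");
        exit(1);
    case 0:
        close(pfds[0]); // Child can close reading end of the pipe
        printf("CHILD: Write a string to the pipe\n");
        if (fgets(s, sizeof(s), stdin) == NULL)
            s[0] = '\0'; // No input (EOF or error): send an empty string
        write(pfds[1], s, strlen(s) + 1); // +1 also sends the terminating '\0'
        close(pfds[1]);
        printf("CHILD: exiting\n");
        exit(0);
    default:
        close(pfds[1]); // Parent can close writing end of the pipe
        printf("PARENT: reading from pipe\n");
        if (read(pfds[0], s, sizeof(s)) > 0) // Blocks until the child writes
            printf("PARENT: Pipe contents: %s", s);
        close(pfds[0]);
        wait(NULL); // Reap the child so it does not linger as a zombie
    }
}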

For a more realistic example, we can implement the ls | wc -l command, which lists the contents of a directory and pipes them to wc -l, which counts the number of lines. Before looking at the code, let's explain 2 more system calls: dup() and execlp().

dup() is used to duplicate a file descriptor. It takes a file descriptor as input and returns a copy that uses the lowest available descriptor number, so that both descriptors point to the same underlying resource. In the ls | wc -l example we first create a child process, close the stdout file descriptor, then call dup() and pass the write end of the pipe. Because descriptor 1 has just been freed (and descriptor 0 is still open), the copy becomes the new stdout. Now when we execute the ls command within the child process its output will go to the write end of the pipe. In the parent process we do the same but with the stdin and the reading end of the pipe.
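Before the full ls | wc -l example, here is the close-then-dup() trick in isolation. This sketch temporarily points stdout at a pipe, prints a short string, and then restores the original stdout. The function name capture_stdout is hypothetical. It assumes the text is short enough to fit into the pipe buffer, and it uses dup2() (a related POSIX call that names the target descriptor explicitly) for the restore step, which is a choice made for this sketch rather than something the example above requires.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Prints text while stdout is redirected into a pipe and stores what was
 * captured in buf. Returns the number of bytes captured, or -1 on error.
 * capture_stdout is a hypothetical helper; text must be short enough to
 * fit into the pipe buffer, otherwise the write would block forever. */
ssize_t capture_stdout(const char *text, char *buf, size_t cap)
{
    assert(text != NULL && buf != NULL && cap > 0);

    int pfds[2];
    if (pipe(pfds) == -1) {
        perror("pipe");
        return -1;
    }

    fflush(stdout);                 /* Push pending output to the real stdout */
    int saved = dup(STDOUT_FILENO); /* Keep a copy so we can restore it later */
    if (saved == -1) {
        perror("dup");
        close(pfds[0]);
        close(pfds[1]);
        return -1;
    }

    close(STDOUT_FILENO);  /* Free descriptor 1 ... */
    int fd = dup(pfds[1]); /* ... so dup() picks it, if 0 is still open */
    close(pfds[1]);        /* The original write end is no longer needed */

    int ok = (fd == STDOUT_FILENO);
    if (ok) {
        fputs(text, stdout); /* Goes into the pipe, not to the screen */
        fflush(stdout);
    } else if (fd != -1) {
        close(fd); /* A lower descriptor was free, so we missed stdout */
    }

    /* dup2() names its target explicitly: descriptor 1 refers to the saved
     * stdout again, and the pipe's last write descriptor is closed. */
    dup2(saved, STDOUT_FILENO);
    close(saved);

    ssize_t n = ok ? read(pfds[0], buf, cap - 1) : -1;
    close(pfds[0]);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return n;
}
```

The sketch also shows the weak spot of the close() plus dup() pair: it only works when the wanted descriptor number happens to be the lowest free one, which is why the result of dup() is checked here.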

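execlp(), similar to other exec functions, replaces the program running in the current process with the one named in its arguments. The process ID and the open file descriptors stay the same, which is what makes the dup() trick above work. This specific function looks up the program by name in the directories listed in the PATH environment variable. On success execlp() never returns, so any code placed after it runs only if the call failed.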
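The usual pattern around exec functions is fork, exec in the child, wait in the parent. The following sketch shows it without any pipes involved. The function name run_program is hypothetical, and the exit code 127 for a failed exec is a shell convention adopted here as an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs prog with at most one argument in a child process and returns its
 * exit status, or -1 if it could not be started or did not exit normally.
 * run_program is a hypothetical helper used for illustration. */
int run_program(const char *prog, const char *arg)
{
    assert(prog != NULL);

    fflush(stdout); /* Avoid the child inheriting unflushed stdio output */

    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return -1;
    }

    if (pid == 0) {
        /* The list after the program name becomes argv[]: argv[0] is by
         * convention the program name and a NULL pointer ends the list.
         * If arg is NULL the list simply ends one element earlier. */
        execlp(prog, prog, arg, (char *)NULL);

        /* Reached only when execlp() failed, e.g. program not in PATH */
        perror("execlp");
        _exit(127); /* 127 mirrors the shell's "command not found" code */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_program("ls", "-l") lists the current directory and returns the exit status of ls to the caller.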

With that clarified, let's look at the example:

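#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h> // pid_t
#include <unistd.h>    // pipe(), fork(), dup(), close(), execlp()

// Note: the calling process itself becomes 'wc -l', so this function never returns
void pipes_demo_2()
{
    int pfds[2];

    if (pipe(pfds) == -1)
    {
        perror("pipe");
        exit(1);
    }

    pid_t pid = fork();
    switch (pid)
    {
    case -1:
        perror("fork");
        exit(1);
    case 0:
        printf("CHILD: Executing 'ls' command\n");
        fflush(stdout); // Buffered output would be lost once execlp() succeeds
        close(pfds[0]); // Child can close reading end of the pipe
        close(1);       // Close stdout to make it available for dup() system call
        dup(pfds[1]);   // Duplicate writing end to first available file descriptor
        close(pfds[1]); // The original descriptor is no longer needed
        execlp("ls", "ls", (char *)NULL);
        perror("execlp"); // Reached only if execlp() failed
        exit(1);
    default:
        printf("PARENT: executing 'wc -l' command\n");
        fflush(stdout); // Buffered output would be lost once execlp() succeeds
        close(pfds[1]); // Parent can close writing end of the pipe
        close(0);       // Close stdin to make it available for dup() system call
        dup(pfds[0]);   // Duplicate reading end to first available file descriptor
        close(pfds[0]); // The original descriptor is no longer needed
        execlp("wc", "wc", "-l", (char *)NULL);
        perror("execlp"); // Reached only if execlp() failed
        exit(1);
    }
}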

This example is simplified, but in essence this is how pipes work in Unix shells. To illustrate further, let's take this command as an example:

ls -l | grep -v '^d' | awk '{print $9}' | sort

This command lists the contents of the current directory in long format with ls, filters out the directories with grep (their lines start with d), prints the ninth column, which holds the filename, using awk, and then sorts the filenames with sort.

When executed, the shell will fork 4 processes and create 3 pipes. The standard input and output of each command will be redirected to the file descriptors of the respective pipes, and all commands will start in parallel. The standard input of the first command and the standard output of the final command will not be changed, so the user can see the result on the screen. A pipe goes away once every descriptor that refers to it has been closed, which happens at the latest when the commands on both of its ends have exited.
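The description above can be sketched as a loop that creates one pipe per pair of neighbouring commands and forks one child per command. This is a simplified illustration under stated assumptions, not how any particular shell is implemented: the function name run_pipeline is hypothetical, dup2() and execvp() are used instead of the close() plus dup() pair and execlp() because they take an explicit target descriptor and an argument array, and error paths do not clean up children that were already started.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs n commands connected by n - 1 pipes, like cmd1 | cmd2 | ... in a
 * shell. Each cmds[i] is a NULL-terminated argv array. Returns the exit
 * status of the last command, or -1 on error.
 * run_pipeline is a hypothetical helper; error handling is simplified. */
int run_pipeline(char *const *cmds[], int n)
{
    assert(cmds != NULL && n > 0);

    int prev_read = -1; /* Read end of the pipe feeding the next command */
    pid_t last = -1;

    fflush(stdout);
    for (int i = 0; i < n; i++) {
        int pfds[2] = {-1, -1};
        if (i < n - 1 && pipe(pfds) == -1) { /* No pipe after the last one */
            perror("pipe");
            return -1;
        }

        pid_t pid = fork();
        if (pid == -1) {
            perror("fork");
            return -1;
        }
        if (pid == 0) {
            if (prev_read != -1) { /* Not the first: read from previous pipe */
                dup2(prev_read, STDIN_FILENO);
                close(prev_read);
            }
            if (pfds[1] != -1) {   /* Not the last: write into the next pipe */
                close(pfds[0]);
                dup2(pfds[1], STDOUT_FILENO);
                close(pfds[1]);
            }
            execvp(cmds[i][0], cmds[i]);
            perror("execvp");      /* Reached only if execvp() failed */
            _exit(127);
        }

        /* Parent: drop its copies, otherwise readers never see EOF */
        if (prev_read != -1)
            close(prev_read);
        if (pfds[1] != -1)
            close(pfds[1]);
        prev_read = pfds[0];
        last = pid;
    }

    /* All commands run in parallel. Reap every child (this also reaps any
     * unrelated children of the caller) and remember the last one's status. */
    int status, last_status = -1;
    pid_t w;
    while ((w = wait(&status)) > 0) {
        if (w == last && WIFEXITED(status))
            last_status = WEXITSTATUS(status);
    }
    return last_status;
}
```

Each command is passed as its own argument array, for example char *const ls[] = {"ls", "-l", NULL}; and char *const sort[] = {"sort", NULL};, collected in char *const *cmds[] = {ls, sort};. The detail that is easiest to get wrong is that the parent must close its own copies of the pipe ends: as long as any write descriptor stays open anywhere, the reading command will wait for more input instead of seeing end of file.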

After clarifying the | and its inner workings, let's look at another IPC technique called named pipes or FIFOs.
