A Little About Linux: Streaming, Piping and Redirecting Data

Streaming, piping and redirecting data is one of the most powerful aspects of the Linux command line.  The Windows command prompt offered only a basic version of this capability until PowerShell came along in 2006.  The whole idea of this topic is that Linux can treat any output from programs, or any input into programs, as a stream of data which can be manipulated: for example, the data can be redirected to a file or piped to another program.

Types Of Streams
There are three main types of streams:
Standard Input (stdin): Data going into a program, by default from the keyboard.
Standard Output (stdout): The normal output of a program, by default displayed on the screen.
Standard Error (stderr): This stream is designed to carry error messages so that you can split them out from stdout and perhaps redirect the errors to a file whilst all other messages go to the screen.
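
To make the difference between stdout and stderr concrete, here is a minimal sketch.  The function name report is made up for illustration, and the redirection symbols it relies on are explained in the Basic Redirection section.

```shell
# report is a hypothetical helper that writes one line to each output stream.
report() {
    echo "normal message"        # goes to stdout (file descriptor 1)
    echo "error message" >&2     # goes to stderr (file descriptor 2)
}

# Capture only stdout; stderr is thrown away.
out_only=$(report 2>/dev/null)

# Capture only stderr: point it at the capture first, then throw stdout away.
err_only=$(report 2>&1 >/dev/null)

echo "stdout carried: $out_only"
echo "stderr carried: $err_only"
```

The two variables end up holding different lines, which shows that the shell really does treat the two outputs as separate streams.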

Basic Redirection
To redirect data, you use a small set of symbols.  The direction in which the symbol points tells you whether data is being redirected out of a program or into it.  The main ones are shown below:

>     Redirect stdout to a file, create a new file if it doesn't exist already or overwrite if it does.
>>   Append stdout to a file, create a new file if it doesn't already exist.

2>   Redirect stderr to a file, create a new file if it doesn't exist already or overwrite if it does.

2>> Append stderr to a file, create a new file if it doesn't already exist. 

&>   Redirect both stdout and stderr to the same file, overwriting it if it already exists.  (This is a bash shorthand for > file 2>&1.)

<      Sends the contents of a file as stdin.

<<    Usually used in a script, to allow a lot of data to be redirected into a command (a 'here document').  This would normally be of the form command << EOF, which means: take all of the data after this line and pass it into the command until you hit EOF again on a line of its own (with nothing else), then stop.  (EOF here is just a conventional marker word and any word will do; it is not the same thing as pressing CTRL+D.)
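
The symbols above can be tried out safely in a throwaway directory.  This is only a sketch: the file names and the warn helper are invented for the example, and the portable > file 2>&1 spelling is used in place of &> so that it also runs in shells other than bash.

```shell
# Work in a temporary directory so no real files are touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# warn is a hypothetical helper that writes its arguments to stderr.
warn() { echo "warning: $*" >&2; }

echo "first"  >  out.txt             # > creates out.txt (or overwrites it)
echo "second" >> out.txt             # >> appends a second line

warn "disk nearly full" 2>  err.txt  # 2> captures only the error stream
warn "disk nearly full" 2>> err.txt  # 2>> appends a second copy

# Both streams into one file (bash also accepts the shorthand &> both.txt).
{ echo "to stdout"; warn "to stderr"; } > both.txt 2>&1

# < feeds out.txt to wc as its stdin.
line_count=$(wc -l < out.txt)

# << starts a here document: every line up to the one containing only EOF
# is passed to sort as stdin.
sort << EOF > sorted.txt
pear
apple
EOF

echo "out.txt has $line_count lines"
cat sorted.txt
```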

If you want to 'get rid of data' you can redirect output to /dev/null e.g. ls > /dev/null
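
Two common reasons to do this are sketched below: testing whether a command succeeded without seeing its output, and silencing errors while keeping normal output.  The sample strings are made up for the example.

```shell
# Discard the matching line itself; the if statement only needs grep's exit status.
if echo "stdin stdout stderr" | grep "stderr" > /dev/null; then
    found="yes"
else
    found="no"
fi
echo "found: $found"

# Discard only stderr: the first line survives, the second vanishes.
visible=$( { echo "kept"; echo "dropped" >&2; } 2> /dev/null )
echo "visible: $visible"
```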

The final redirection tool is one of the most useful and one which I use in scripts much of the time.  The tee command allows stdout to appear on the screen and also to be logged to one or more files.


As an example:

echo "Hello World" | tee file.txt file2.txt 

will print Hello World on the screen but will also write it to the two files file.txt and file2.txt.  Use the -a option if you want to append to the files rather than overwrite them.
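
A small sketch of the -a option in use, assuming a log file called run.log in a temporary directory (both names are invented for the example):

```shell
demo_dir=$(mktemp -d)
log="$demo_dir/run.log"

echo "first run"  | tee "$log"       # prints the line and creates run.log
echo "second run" | tee -a "$log"    # prints the line and appends it to run.log

# The log now holds both lines, because the second tee did not overwrite it.
cat "$log"
```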

Piping Data
Piping data from one program to another is one of the fundamental ways that Linux works.  Using the output of one application as the input of another is the basis of its simple design: why duplicate functionality in a core app when another app already has it?  Just send the output from one app to the other.  The pipe is the way that this happens and is represented by the | symbol on the keyboard.

The way it works is quite simple (we showed it when demonstrating the tee command).  Take the output of app1, pipe it into app2 and pipe the output of that into app3 e.g.

app1 | app2 | app3

As a simple example, this lists all installed RPM packages and pipes the output to grep, which searches the package names for the string 'gnu':

rpm -qa | grep gnu
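
If you are not on an RPM-based system, the same idea can be sketched with a made-up list of package names standing in for the output of rpm -qa.  Each stage reads the stdout of the stage before it.

```shell
# The five names are sample data, not real query output.
unique_gnu=$(printf '%s\n' gnupg glibc gnutls gnupg bash |
    grep gnu |      # keep lines containing "gnu"
    sort |          # sort so that duplicates sit next to each other
    uniq |          # drop the duplicates
    wc -l)          # count what is left

echo "distinct matches: $unique_gnu"
```

Here gnupg (twice) and gnutls match, so two distinct names are counted.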

Building Command Lines from STDIN
Occasionally, the command you want to pipe data into doesn't read what it needs from stdin; it expects it as arguments on the command line instead.  In that situation you have to build the command line from the piped data using the xargs command.

One of the best ways to show this is if you want to find all of the files that match a specific pattern on a partition and then delete them.  rm (the command to delete) doesn't have any ability to search, so you have to search for a matching pattern using another tool (find in this example) and then pass the results to rm.

Let's break it down and try it a step at a time.  First, we need to get the search phrase correct so we can find the files we want to delete.  I want to delete all of the files with the .bak extension anywhere on my file system:

find / -iname '*.bak'

When I run this on its own, it returns 6 files on my machine and prints their names on the screen.  Now all we need to do is pipe it to rm using xargs:

find / -iname '*.bak' | xargs rm

Once that has completed, run find / -iname '*.bak' again and you will see that all of those files have been deleted.
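
Because this deletes files, it is worth rehearsing somewhere harmless first.  The sketch below assumes the find and xargs versions found on most Linux systems, which support -print0 and -0; the directory and file names are invented for the example.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/notes.bak" "$demo_dir/my report.bak" "$demo_dir/keep.txt"

# Dry run: putting echo in front of rm shows the command xargs would build.
find "$demo_dir" -iname '*.bak' -print0 | xargs -0 echo rm

# Real run: -print0 and -0 separate names with NUL bytes, so the space in
# "my report.bak" does not split it into two arguments.
find "$demo_dir" -iname '*.bak' -print0 | xargs -0 rm

remaining=$(ls "$demo_dir")
echo "left behind: $remaining"
```

Quoting '*.bak' matters as well: without the quotes the shell may expand the pattern against the current directory before find ever sees it.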

You can use the 'backtick' character to do much the same thing, although this form breaks if any file name contains a space, e.g.

rm `find / -iname '*.bak'`

N.B. Please note that the backtick ` is not the same as an apostrophe '.  They are found at different places on the keyboard and perform different functions.  Please be careful if you are pasting something in from Microsoft Word, which tries to pre-empt what you mean and quite often converts the backtick ` to an apostrophe ', with horrible consequences.
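
One way to sidestep the backtick and apostrophe mix-up is the $(...) form, which does the same job and is harder to mistype.  A minimal sketch, using invented file names in a temporary directory:

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/a.bak" "$demo_dir/b.bak"

# Backticks and $(...) both substitute the output of the inner command.
count_backtick=`find "$demo_dir" -iname '*.bak' | wc -l`
count_dollar=$(find "$demo_dir" -iname '*.bak' | wc -l)

# $(...) can be nested without any escaping.
dir_name=$(basename "$(dirname "$demo_dir/a.bak")")

echo "backticks counted $count_backtick, dollar form counted $count_dollar"
echo "files live in $dir_name"
```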
