How to Use the Gawk Command on Linux

Gawk is the GNU Project's implementation of the awk programming language. You can use this powerful command in Linux for pattern scanning and text processing. Gawk programs don't need to be compiled, and they can use variables, numeric functions, string functions, and logical operators.

Working with text files often involves repetitive tasks. You might want to extract specific lines and discard the rest, or you might need to make changes wherever certain patterns appear and leave the rest of the file as it is.

Writing single-use programs for these kinds of tasks in languages such as C, C++, or Java can be time-consuming and inconvenient. This is where the Gawk command comes in handy.

The Gawk command can be used to: 

  • Manage personal databases.
  • Generate formatted reports.
  • Validate data in files.
  • Scan files line by line.
  • Split each line into fields.
  • Sort data from files.
  • Compare lines or fields to patterns.
  • Perform actions for matched patterns.
  • Make changes to data files.
  • Format outputs.
  • Work with arithmetic and string operations.
  • Work with conditionals and loops.

Basic Syntax

This is what the basic syntax for Gawk looks like:

$ gawk '/Pattern/ {Action}' filename

The Gawk Command: Different Options

There are many options available for Gawk. Before you learn how to use the Gawk command on Linux, it is crucial to understand the following options.

  • -f program-file, --file program-file: Reads the Gawk program source from program-file instead of from the first command-line argument.
  • -F fs, --field-separator fs: Sets the predefined FS variable (the input field separator) to fs.
  • -v var=val, --assign var=val: Sets the variable var to the value val before the program begins executing. Variables assigned this way are available in the BEGIN rule.
  • -b, --characters-as-bytes: Treats all input data as single-byte characters. Output from print or printf is also treated as single-byte characters. Gawk normally follows the POSIX standard and processes input according to the current locale, which often involves converting multibyte characters to wide characters and can cause problems when the input does not contain valid multibyte characters. This option is a simple way to tell Gawk to keep its hands off your data.
  • -c, --traditional: Enables compatibility mode, in which Gawk behaves like BWK awk and GNU-specific extensions are not recognized.
  • -C, --copyright: Prints the short version of the General Public License on standard output, then exits.
  • -e program-text, --source program-text: Uses program-text as program source code. This option lets you mix library functions (loaded with the -f or --file option) with source code entered on the command line.
  • -E file, --exec file: Works like -f, but it ends option processing, so everything after it is passed to the program. It is intended for CGI applications, where it prevents mischievous users from passing in assignments, options, or source code.
  • -h, --help: Prints a summary of the short-style and long-style options, then exits.
  • -O, --optimize: Enables Gawk's default optimizations on the internal representation of the program. Optimization is enabled by default.
  • -P, --posix: Runs in strict POSIX mode and disables all Gawk extensions. It is similar to the --traditional option, with some additional restrictions.
  • --: Marks the end of all options.

Prerequisite

All you need is a Linux system with Gawk installed and a text file to run the gawk command on.

How to Use the Gawk Command on Linux?

Now that you’ve got enough information, let’s dive in and learn how to use the gawk command on Linux.

The power of Gawk starts to become evident when you use its column processing features. Gawk automatically splits each line (record) into fields. By default, fields are separated by whitespace (spaces or tabs).

You can change that with the command-line option “-F” followed by the desired separator. For starters, though, let’s try printing a text file exactly as it is.

First, find where your file is located and verify if the file exists using the ls command.

Syntax:

$ ls

Running the ls command lets you verify that you are in the right directory. For our example, the name of our text file is “NameList”.

In this example, the Documents directory contains two files: a .rpm file and a file called NameList.

Next, use the gawk command to print all contents in a text file.

Syntax:

$ gawk '{ action }' filename

Example:

$ gawk '{ print }' NameList

As you can see, there are five names in this text file, formatted as first name followed by the last name, and a single space separates them.

Syntax:

$ gawk '{ action column_number }' filename

Example:

$ gawk '{ print $1 }' NameList

Here you can see that the gawk command printed only the contents of the 1st column by indicating $1 after the action print.

Syntax:

$ gawk '{ action column_number2 }' filename

Example:

$ gawk '{ print $2 }' NameList

If you specify “$2” instead of “$1”, Gawk prints everything from the second column instead.

But what happens if you include the 2nd column by adding “$2” after declaring the first column “$1”? Let’s find out in the next section.

Syntax:

$ gawk '{ action column_number1 column_number2 }' filename

Example:

$ gawk '{ print $1 $2 }' NameList

As you can see, Gawk printed out everything from column 1 and column 2. However, there are no spaces between the two columns, because writing two values next to each other joins them into a single string.

To add a space between them, place a comma (“,”) between the two columns. Gawk then separates the values with its output field separator, which is a single space by default.

Syntax:

$ gawk '{ action column_number1, column_number2 }' filename

Example:

$ gawk '{ print $1, $2 }' NameList

For this file, the output is pretty much the same as running the command below.

Input:

$ gawk '{ print }' NameList

The following basic examples show how powerful the gawk command is when it comes to pattern scanning or managing a small personal database.

Syntax:

$ gawk '/pattern/ { action }' filename

Example:

$ gawk '/E/ { print }' NameList

As you can see, declaring a pattern narrows the result down to what you need. In this example, the pattern is the capital letter “E”, so Gawk printed every line that contains a capital “E”.

Just a reminder: Gawk patterns are case-sensitive, so using the lowercase letter “e” as your pattern matches a different set of lines.

Syntax:

$ gawk '/pattern/ { action }' filename

Example:

$ gawk '/e/ { print }' NameList

As an additional example, here is what you get if you use “El” (a capital E followed by a lowercase l) as the pattern.

Syntax:

$ gawk '/pattern/ { action }' filename

Example:

$ gawk '/El/ { print }' NameList

To further narrow down the result, you can even tell Gawk to print out only the contents from the second column based on the pattern you declare.

Syntax:

$ gawk '/pattern/ { action column_number }' filename

Example:

$ gawk '/El/ { print $2 }' NameList

And that’s about it for this tutorial. We looked at what the Gawk command is all about. We’ve also learned how to use the Gawk command on Linux and perform different tasks as per our needs.
