How to Count Word Occurrences in a Text File on Linux

Counting how often a word occurs in a text file on Linux is a quick way to learn about the content it holds. Although the task sounds heavy, the grep and tr commands make things simple.

Linux ships with hundreds of commands, and the sub-commands and flags of each multiply the possible combinations. This article concentrates on the commands that help us work with text files; the goal is to learn how to count word occurrences in a text file.

We will also look at which method fits best and is the most time-efficient. These commands let us count the total number of occurrences in a text file, as well as the number of lines that contain the word, regardless of how many times it appears on each line.

How to Count the Word Occurrences in a Text File on Linux

With that background in place, it is time to look at the best ways to count word occurrences in a text file.

Count the Word Occurrences in a Text File: The grep Command Method

The first method uses the grep command. The process is pretty simple, provided you follow the steps below.

Creating a Text File 

Before you start counting how often a word occurs in a text file, it is worth knowing how to create a text file and how to write contents to it. A simple way is to use the touch command to create an empty text file.

Launch the Terminal by using the “Ctrl+Alt+T” key combination

Pass the following command:

$ touch name_of_file.txt

Editing Contents Inside

The next task is adding contents to the text file using the echo command and output redirection, something like $ echo "These are the contents to be written." > name_of_file.txt

Note: You can also create the file and write to it in a single step with $ cat > name_of_file.txt. The cursor is then placed on the next line, where you can enter the text just like in a word processor or a notepad; press Ctrl+D on an empty line to finish. Run without the redirection, as in $ cat name_of_file.txt, the cat command prints all the contents of the text file.
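
To tie the steps above together, here is a minimal sketch that creates a file, writes to it, appends a second line, and prints it back. The scratch directory from mktemp and the sample sentences are assumptions made up for this illustration; any file name works the same way.

```shell
# Work in a throwaway directory (an assumption for this sketch) so that
# no existing file gets overwritten.
workdir=$(mktemp -d)
file="$workdir/sample.txt"

touch "$file"                                         # create an empty file
echo "Search the log, then search again." > "$file"   # > overwrites the file
echo "Nothing to find on this line." >> "$file"       # >> appends a new line
cat "$file"                                           # print the contents

line_count=$(wc -l < "$file")                         # number of lines written
echo "lines: $line_count"
```

Keep in mind that a single > replaces whatever the file held before, while >> adds to the end.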

View the Content and Count the Word Occurrences

After creating the text file and adding the contents to it, you can start counting the occurrences of a word. All you need to do is invoke the grep command, which treats its first argument as a pattern and searches the file for it. Suppose we are searching for the string ‘search’ in a text file named sample.txt.

Here is what the command should look like:

$ grep -o 'search' sample.txt | wc -l

Approaching with the grep Command

The -o option instructs grep to print only the matched part of each line, with every match on a separate line. The output is then piped to the wc command with the -l flag, which counts those lines, so the result is the number of matches.

Remember, the -l flag tells wc to count lines, which is why each match needs to be on a line of its own. You can also add the -i flag to grep, which makes the search case-insensitive, so uppercase and lowercase matches are both counted.

$ grep -o -i 'search' sample.txt | wc -l
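
Here is a small sketch of how these flags change the result. The sample file and its three lines are assumptions made up for this illustration, and the -c and -w flags are extra grep options that this article does not otherwise cover; they are shown only for comparison.

```shell
workdir=$(mktemp -d)
sample="$workdir/sample.txt"
printf '%s\n' 'search and search again' \
              'Search the research notes' \
              'nothing here' > "$sample"

# Every match goes on its own line, then wc -l counts the lines.
exact=$(grep -o 'search' "$sample" | wc -l)

# grep -c counts matching LINES, not matches, so it reports less here.
matching_lines=$(grep -c 'search' "$sample")

# -i ignores case, so "Search" is counted too.
any_case=$(grep -o -i 'search' "$sample" | wc -l)

# -w matches whole words only, so "research" is left out.
whole_words=$(grep -o -i -w 'search' "$sample" | wc -l)

echo "exact=$exact lines=$matching_lines any_case=$any_case whole_words=$whole_words"
```

Note that a plain pattern also matches inside longer words such as research, so pick the variant that fits what you mean by a word.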

You can pass several input files to the same grep command. It then searches every file in turn and prefixes each match with the name of the file it came from, so wc -l returns the combined count for all the files. The command for this purpose will look something like this:

$ grep -o -i 'search' sample1.txt sample2.txt | wc -l

Here, we’ve created two new files, sample1.txt and sample2.txt, and counted the occurrences across both. When you pass the two files as arguments to the grep command, it searches each of them, and the pipeline reports the total count.
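
The sketch below shows the combined count and, as one possible extension, a per-file count obtained by running the same pipeline once per file. The file contents are assumptions made up for this illustration.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'search here, search there' > "$workdir/sample1.txt"
printf '%s\n' 'Search everywhere' 'no match' > "$workdir/sample2.txt"

# With several files, grep -o prefixes each match with its file name,
# so wc -l still sees one line per match and reports the combined total.
total=$(grep -o -i 'search' "$workdir/sample1.txt" "$workdir/sample2.txt" | wc -l)
echo "total: $total"

# Per-file counts: run the same pipeline once for each file.
for f in "$workdir/sample1.txt" "$workdir/sample2.txt"; do
    n=$(grep -o -i 'search' "$f" | wc -l)
    echo "$(basename "$f"): $n"
done
```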

Count the Word Occurrences on Linux: The tr Command Method

Apart from the grep command, there is another handy command-line utility called tr. On its own it counts characters rather than words, and combined with grep it can count word occurrences as well.

Here you’ll be using two flags, -c and -d. The former takes the complement of the set, that is, every character that is not listed in the given string, and the latter deletes those complemented characters.

The structure of the tr command to count the word occurrences in a text file:

$ tr -c -d 'search' < sample.txt | wc -c

-c: The c flag takes the complement of the set

-d: It deletes all the characters in the resulting set

Note that tr treats its argument as a set of individual characters, not as a word. The string ‘search’ therefore stands for the six characters s, e, a, r, c, and h. Since we’re combining the -c and -d options, tr deletes every character except the ones defined in the set. The resulting string is then passed to the wc command, whose name stands for ‘word count.’

The -c flag of the wc command returns the total byte count, which for plain ASCII text equals the number of characters. This pipeline therefore prints how many times the characters of the set occur in the file; to count a single character, pass just that character as the set.
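
To make the difference between characters and words concrete, here is a short sketch. The sample sentences are assumptions made up for this illustration.

```shell
workdir=$(mktemp -d)
sample="$workdir/sample.txt"
printf '%s\n' 'she sells sea shells' > "$sample"

# Delete everything except the letter s, then count what is left.
s_count=$(tr -c -d 's' < "$sample" | wc -c)

# With a longer set, every character of the set is kept and counted,
# which is not the same as counting the word itself.
printf '%s\n' 'research each' > "$sample"
set_count=$(tr -c -d 'search' < "$sample" | wc -c)
word_count=$(grep -o 'search' "$sample" | wc -l)

echo "s_count=$s_count set_count=$set_count word_count=$word_count"
```

On the second sample, the tr pipeline reports 12 characters from the set, while grep finds the string search only once.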

You can also combine the tr command with the grep command to search for whole text patterns in the desired file.

Sample command:

$ tr '[:space:]' '[\n*]' < sample.txt | grep -i -c search

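Here tr first replaces every whitespace character with a newline, which puts each word of sample.txt on its own line. grep -c then counts the lines that contain search, so the printed number is the total for the whole file. Words that merely contain the string, such as research, are counted too.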
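
The two stages of this pipeline can be tried out with the sketch below. The sample lines are assumptions made up for this illustration, and the -x flag is an extra grep option, not part of the command above, that restricts matches to whole lines.

```shell
workdir=$(mktemp -d)
sample="$workdir/sample.txt"
printf '%s\n' 'search and search again' \
              'Search the research notes' > "$sample"

# Step 1: turn every whitespace character into a newline, one word per line.
# Step 2: grep -c counts the lines (words) that contain the pattern.
containing=$(tr '[:space:]' '[\n*]' < "$sample" | grep -i -c search)

# Assumption beyond the article: -x makes grep match whole lines only,
# so words such as "research" no longer count.
exact_words=$(tr '[:space:]' '[\n*]' < "$sample" | grep -i -x -c search)

echo "containing=$containing exact_words=$exact_words"
```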

Using awk Command to Count the Word Occurrences

Another command, awk, proves useful when you want the frequency of a word’s occurrences. It is a utility that takes input data, processes it, and returns the desired output.

Compared to the methods I’ve already discussed, though, this one is harder to understand. For that reason, it is among the least used methods, and I’d recommend sticking to either the grep command or the tr command. However, having an idea won’t hurt.

Sample command format:

$ awk -F 'search' 'NF {s+=(NF-1)} END {print s+0}' sample.txt

Understanding awk Command

Over here, I’ve changed the field separator to ‘search’ using the -F flag, which takes the phrase that needs to be searched for. Each line is then split at every occurrence of the word search.

Coming to the {s+=(NF-1)} END {print s+0} section, NF is the number of fields that the split produces on a line. A single match splits a line into two parts, so a line with NF fields contains NF-1 matches, and that is the value added to s. The leading NF condition skips empty lines, which have zero fields and would otherwise subtract one from the total.

Finally, the count for each line is added up, and at the end we print the total number of occurrences for the entire text; s+0 makes awk print 0 rather than an empty line when there is no match at all.
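
As a sanity check, the sketch below runs the awk approach next to the grep method on the same file. The sample lines are assumptions made up for this illustration, and the command is a guarded variant: the NF condition and s+0 are there so that empty lines and files without matches are handled.

```shell
workdir=$(mktemp -d)
sample="$workdir/sample.txt"
printf '%s\n' 'search and search again' \
              '' \
              'no match on this line' \
              'research' > "$sample"

# -F 'search' splits each line at the pattern; NF-1 is the number of splits.
# The NF guard skips the empty line, which has zero fields and would
# otherwise contribute -1. s+0 prints 0 instead of nothing if s is unset.
awk_count=$(awk -F 'search' 'NF { s += NF - 1 } END { print s + 0 }' "$sample")

# Cross-check against the grep method from earlier in the article.
grep_count=$(grep -o 'search' "$sample" | wc -l)

echo "awk=$awk_count grep=$grep_count"
```

Both methods treat search as a plain substring here, so they agree on the result.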

That’s basically how you can count word occurrences in a text file on Linux. If you ask me to help you choose one, I’d say go for the ‘tr’ command, as it seems more efficient. However, if you’re not after efficiency, all three methods serve the purpose.

If this guide helped you, please share it.
