Trimming a string with Bash

Note: the below solution should actually work on all POSIX-compatible shells (including Bash and Zsh).

The problem

Let’s say we have a piece of text that looks like this:

			 
this    string    	
  has  	 whitespace

 everywhere 
 why

and the task is to delete all of the leading and trailing whitespace characters, that is, trim the string, using Bash.

To make it clear which characters are actually present in the above, here’s the original string with escape sequences shown:

\n\t\t\t \nthis    string    \t\n  has  \t whitespace\n\n everywhere \r\n why\t \t\n\t\t 
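To reproduce this string in a shell, one option is to hand the escaped form to printf, which expands \n, \t and \r. This is only a sketch: the variable name sample is made up for illustration, and od is used merely to make the whitespace visible.

```shell
# Build the sample string from its escaped form; printf expands \n, \t and \r.
# "sample" is a hypothetical name, not something used elsewhere in this post.
# Command substitution strips trailing newlines, but the string ends in a
# space, so nothing is lost here.
sample="$(printf '\n\t\t\t \nthis    string    \t\n  has  \t whitespace\n\n everywhere \r\n why\t \t\n\t\t ')"

# Dump the string byte by byte so that every whitespace character is visible.
printf '%s' "${sample}" | od -c
```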

I’m restricting this problem to Bash because in Python it takes a single call to the built-in strip method:

s.strip() # returns 'this    string    \t\n  has  \t whitespace\n\n everywhere \r\n why'

so let’s see how we can do it in Bash.

Finding solutions online

Googling the term ‘trim string bash’ gives this article as the first hit, which offers a couple of solutions.

Much to my surprise, when I ran their code on my seemingly simple example, none of the solutions worked as expected: each removed either too few or too many characters. Other solutions I’ve encountered seem to assume that the string doesn’t contain any newline characters (\n). Additionally, they often rely on other utilities (commonly sed, awk, and xargs), so it would be useful to have a solution using only Bash-isms instead.

String removal

In Bash, we can remove a given character or string from the front and the back of an input string using the following syntax:

x='some string'
echo "${x#s}" # will print 'ome string'
echo "${x%g}" # will print 'some strin'

We can also remove a single character out of a set listed in square brackets, as demonstrated in this SO answer:

# removes the specified (whitespace) characters from the beginning
echo "${x#[$'\r\t\n ']}"

Note that the list of characters is written as a string with a $ prefix ($'...', which the Bash manual calls ANSI-C quoting), because then escape sequences such as \t and \n are expanded to the actual characters, as explained in this Unix.SE answer.

Digging through the Bash man page reveals that it’s instead possible to use the character class [:space:] inside the brackets to match any whitespace character, so the above generalizes to:

# removes _any_ whitespace character from the beginning
echo "${x#[[:space:]]}"
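One detail worth checking before building on this: a single expansion removes at most one character, and leaves the string untouched when the pattern does not match. A small sketch (the variable names are illustrative):

```shell
# Two tabs and a space in front of 'some string' (14 characters in total).
x="$(printf '\t\t some string')"

# One expansion strips exactly one leading whitespace character.
y="${x#[[:space:]]}"
echo "${#x} -> ${#y}" # prints '14 -> 13'

# When the string doesn't start with whitespace, it comes back unchanged.
z='abc'
echo "${z#[[:space:]]}" # prints 'abc'
```

So a single expansion is not enough when a string has several leading or trailing whitespace characters.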

Test code

Since we can get the size of a string using ${#variable}, our task is straightforward - keep removing whitespace characters until there’s nothing else to remove, i.e. until size(string before trimming) == size(string after trimming), which is achieved with the following code:

s='		some string  '

size_before=${#s}
size_after=0
while [ ${size_before} -ne ${size_after} ]
do
	size_before=${#s}
	s="${s#[[:space:]]}"
	s="${s%[[:space:]]}"
	size_after=${#s}
done

echo "${s}" # prints 'some string'
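As a sanity check, the same loop can be run on the string from the problem statement and compared with what Python's strip returns. The sketch below assumes both strings are rebuilt from their escaped forms with printf; the name expected is only for illustration.

```shell
# The string from the problem statement, rebuilt from its escaped form.
s="$(printf '\n\t\t\t \nthis    string    \t\n  has  \t whitespace\n\n everywhere \r\n why\t \t\n\t\t ')"

# What Python's strip() returns for it, written the same way.
expected="$(printf 'this    string    \t\n  has  \t whitespace\n\n everywhere \r\n why')"

# Same loop as above: strip one whitespace character from each end
# until the size stops changing.
size_before=${#s}
size_after=0
while [ "${size_before}" -ne "${size_after}" ]
do
	size_before=${#s}
	s="${s#[[:space:]]}"
	s="${s%[[:space:]]}"
	size_after=${#s}
done

if [ "${s}" = "${expected}" ]; then
	echo "trimmed correctly"
else
	echo "mismatch"
fi
```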

Note that using something like ${s##[[:space:]]} instead doesn’t help. According to the Bash manual, ## removes the longest prefix matching the pattern, but the pattern [[:space:]] matches exactly one character, so the longest match is still a single character. If our original string was, say, \t\t\n\t actual string, it would remove only the first \t and leave the rest as-is, exactly like the single # form.
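The behaviour of ## with a one-character pattern can be checked directly. In this sketch (variable names are illustrative), both forms shorten the string by exactly one character:

```shell
# Two tabs, a newline, a tab and a space, then 'actual string'
# (18 characters in total).
s="$(printf '\t\t\n\t actual string')"

shortest="${s#[[:space:]]}"  # shortest matching prefix
longest="${s##[[:space:]]}"  # longest matching prefix, still one character

echo "${#s} ${#shortest} ${#longest}" # prints '18 17 17'
```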

Putting it all together

To make things handy, we can put everything in a function called trimstring, which can then be added to a ~/.bashrc file or similar:

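trimstring(){
    # expects exactly one argument: the string to trim
    if [ $# -ne 1 ]
    then
        echo "USAGE: trimstring STRING" >&2
        return 1
    fi
    s="${1}"
    size_before=${#s}
    size_after=0
    # strip one whitespace character from each end until the size stops changing
    while [ "${size_before}" -ne "${size_after}" ]
    do
        size_before=${#s}
        s="${s#[[:space:]]}"
        s="${s%[[:space:]]}"
        size_after=${#s}
    done
    # printf instead of echo, so that results such as '-n' are printed as-is
    printf '%s\n' "${s}"
    return 0
}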

After a lot of testing, below is a table of commonly used shells in which the above works and doesn’t work.

Shell	Ash	Bash	Dash	Csh	Tcsh	Ksh	Zsh	Fish
Works?	✓	✓	✓	✗	✗	✓	✓	✗
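
As a possible variation, the loop can be avoided with nested parameter expansions, which are also part of POSIX. This is a sketch rather than a tested replacement: the name trimstring_noloop is made up here, and the compatibility table above was not produced for it.

```shell
# Hypothetical loop-free variant of trimstring; not covered by the table above.
trimstring_noloop(){
    if [ $# -ne 1 ]
    then
        echo "USAGE: trimstring_noloop STRING" >&2
        return 1
    fi
    s="${1}"
    # ${s%%[![:space:]]*} drops everything from the first non-whitespace
    # character onwards, leaving only the leading whitespace; removing that
    # (quoted, so it is taken literally) as a prefix trims the front.
    s="${s#"${s%%[![:space:]]*}"}"
    # ${s##*[![:space:]]} drops everything up to the last non-whitespace
    # character, leaving only the trailing whitespace; removing that as a
    # suffix trims the back.
    s="${s%"${s##*[![:space:]]}"}"
    printf '%s\n' "${s}"
    return 0
}

trimstring_noloop "$(printf '\t\t some string \n ')" # prints 'some string'
```

The idea is to first isolate the run of whitespace at each end, and then remove that run as a literal prefix or suffix, so each end needs two expansions no matter how long the run is.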

Appendix: solution using GNU Sed

After some more googling, it turns out that the Sed-based solution from the article mentioned above does work if we restrict ourselves to GNU Sed, which has a -z option¹ that treats the null character, rather than the newline, as the line terminator. As long as the text contains no null characters, $ then matches only the end of the whole text stream instead of the end of each line, and ^ matches only its beginning. This allows us to write the following function:

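trimstring_sed(){
	s="${1}"
	# pass the string as an argument, not as the printf format,
	# so that any % or \ in it is kept literally
	s="$(printf '%s' "${s}" | sed -z 's/^[[:space:]]*//')"
	s="$(printf '%s' "${s}" | sed -z 's/[[:space:]]*$//')"
	printf '%s\n' "${s}"
	return 0
}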

The above basically matches zero or more instances of any whitespace character at the beginning and end of the input string, and removes them.
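
To see what -z changes, the same substitution can be run with and without it on a two-line input. This sketch assumes GNU Sed, as the function above does; the variable names are illustrative.

```shell
# Without -z, sed works line by line, so ^ matches at the start of every line.
per_line="$(printf '  a\n  b\n' | sed 's/^[[:space:]]*//')"

# With -z (GNU Sed), the input is split on null characters instead, so the
# whole text is one record and ^ matches only at its very beginning.
whole="$(printf '  a\n  b\n' | sed -z 's/^[[:space:]]*//')"

printf '%s\n' "${per_line}" # prints 'a' and 'b' on separate lines
printf '%s\n' "${whole}"    # prints 'a' and '  b' on separate lines
```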

Comparing the timings of trimstring with trimstring_sed gives trimstring an obvious edge when it comes to speed though:

string="$(cat test.txt)" # contains the initial string

time for i in {1..1000}; do trimstring "${string}" > /dev/null; done
real	0m0.222s
user	0m0.217s
sys	0m0.006s

time for i in {1..1000}; do trimstring_sed "${string}" > /dev/null; done
real	0m3.521s
user	0m2.853s
sys	0m1.270s

Thus, in this run the built-in Bash-isms are about 16 times faster than an external tool like Sed (0.222 s vs. 3.521 s for 1000 calls), which has to be started as a separate process twice per call; of course, the Python solution is by far the fastest, but only if we’re already inside the interpreter, and aren’t using a shell in the first place.

¹ See this SO answer and/or this Unix.SE answer.