8.1. Readings#

8.1.1. What are strings?#

We have used strings extensively throughout this book, but now it is finally time to introduce them more formally and thoroughly. Strings are one of the most important built-in data types in Python. They sit somewhere between lists and tuples: like tuples, they are immutable; like lists, they can be viewed as a sequence of characters.
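Both similarities can be checked with a minimal sketch such as the following. The variable names are chosen only for illustration.

```python
word = 'hello'

# Like a list, a string supports indexing and can be viewed as a sequence of characters.
first_char = word[0]
characters = list(word)

# Like a tuple, a string is immutable: item assignment raises a TypeError.
try:
    word[0] = 'j'
    mutated = True
except TypeError:
    mutated = False

print(first_char, characters, mutated)
```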

8.1.2. Formatting string outputs#

8.1.2.1. Format specification#

Previously, we have seen strings of the form f'{...}'. These are called formatted strings, or f-strings, and hold placeholders inside the {} part whose values are formatted according to the user's specification. One usage example would be:

f'{10:d}'
'10'

This code formats the integer value 10 as a string. d is the presentation type for integers: it tells the underlying formatting function that the value to be formatted (10 in our case) is an integer. After running the code, you can see that the result is no longer an integer but a string, since the value is enclosed in ''. We can also check this with the type() function.

type(f'{10:d}')
str

In Table 8.1, you will find some of the data types and their corresponding presentation types for string formatting.

Table 8.1 Presentation types#

| Variable Type | Presentation Type |
| --- | --- |
| integer | d |
| float | f |
| string | s |
| character (given as an integer code) | c |

Below we will look at some usage examples for them:

f'{2.589:.2f}'
'2.59'

As you can see, the format specification lets us not only represent floating-point values as strings but also round them at the same time. In this case, the .2 before the presentation type indicates that we want the value rounded to 2 decimal places, while f says that the value to be formatted is a float.

Another interesting example would be the usage of format specification with characters:

f'{67:c}'
'C'

In this example, we print the uppercase letter C by using its corresponding ASCII code. ASCII codes are essentially the numbers that the computer uses to represent characters. Here, c tells the interpreter to produce the character that corresponds to the number 67. The interpreter looks up that code (Python uses Unicode, of which ASCII is a subset) and converts it to the letter C, which is returned as a string.
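The same conversion can be made with the built-in functions chr() and ord(), which the text above does not mention. This is only a small sketch to show that the c presentation type and chr() agree.

```python
code = 67

# chr() maps an integer code to its character; ord() goes the other way.
letter = chr(code)
back_to_code = ord(letter)

# The c presentation type yields the same character as chr().
formatted = f'{code:c}'

print(letter, back_to_code, formatted)
```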

8.1.2.2. Spacing and alignment#

By default, Python aligns numbers to the right of the printing space and other data-types to the left.

f'{10:5d}'
'   10'

As you can see above, the value 10 is right-aligned in a field of 5 characters, which means that it is preceded by 3 space characters (try selecting the text between ' and 10). The number 5 after the colon (:) tells the interpreter to use a field width of 5 characters. Since numbers are right-aligned by default, 10 occupies the last 2 positions, which leaves the 3 spaces in front of it.

Strings on the other hand are left-aligned:

f'{"Food Science":15s}'
'Food Science   '

As you can see, Food Science is on the left of the printing field and is followed by 3 spaces (15-12=3).

If we want to align a value differently from its default, we can use < (left-aligned), > (right-aligned), or ^ (centered). These signs are placed right after the colon (:).

f'{10:<5d}' 
'10   '
f'{"Food Science":>15s}'
'   Food Science'
f'{10:^5d}'
' 10  '
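Alignment is most useful when several lines must form columns. The sketch below uses made-up data (the item names and quantities are hypothetical) and combines a left-aligned string column with a right-aligned integer column.

```python
# Hypothetical data, used only to illustrate column alignment.
items = [('Apple', 3), ('Watermelon', 12), ('Fig', 150)]

lines = []
for name, quantity in items:
    # Name left-aligned in 12 characters, quantity right-aligned in 5.
    lines.append(f'{name:<12s}{quantity:>5d}')

for line in lines:
    print(line)
```

Every line has the same total width (17 characters), so the quantities line up on the right regardless of how long each name is.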

8.1.2.3. Further number formatting options#

If we would like to explicitly show the sign of a positive number, we can do so by adding a plus sign (+) after the colon (:).

f'{10:+5d}' 
'  +10'

For negative numbers, the minus sign is always placed directly in front of the number:

f'{-10:5d}' 
'  -10'

In addition, if we put a space where the plus sign would go, positive numbers get a leading space in place of the sign, so they line up easily with negative numbers:

print(f'{10: 5d}' )
print(f'{-10:5d}') 
   10
  -10

Another useful option is the thousands separator, a comma (,), when dealing with big numbers (notice how you can also use the underscore (_) to make the number easier to read when typing it out):

print(f'{1_000_000_000: ,.2f}' )
 1,000,000,000.00

As you can see, the above code displays the number with 2 decimal places and at the same time adds a thousands separator every 3 digits, counting from the right of the integer part.
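The sign, separator, and precision options can be combined in one format specification. A short sketch with arbitrary example values:

```python
# Arbitrary example values: a positive float, a negative float, and zero.
values = [1234.5, -98765.432, 0]

# + forces a sign, the comma adds the thousands separator, .2f fixes 2 decimal places.
formatted = [f'{value:+,.2f}' for value in values]

for text in formatted:
    print(text)
```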

8.1.2.4. format() method#

All of the above can be achieved using the format() method as well, although nowadays the f-string syntax f'{...}' is preferred.

'{:.2f}'.format(12.45778)
'12.46'

This is equivalent to writing:

f'{12.45778:.2f}'
'12.46'

We can also use multiple placeholders with the format() method. Each placeholder will correspond to the arguments passed to the format() method from left to right.

'{0:.2f}, {1:.2f}'.format(12.45778, 14.6587)
'12.46, 14.66'

The 0 before the : specifies that we are referencing the first argument passed to the format() method, while the 1 references the second argument.

In addition, we can specify keyword placeholders:

'{age:d}, {name}'.format(age=28, name='John')
'28, John'
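As a small sketch of how the two styles relate, the keyword example can be written both ways and compared. The last line also shows that a positional index may be reused within one template.

```python
age = 28
name = 'John'

# Keyword placeholders with format() and the equivalent f-string.
with_format = '{age:d}, {name}'.format(age=age, name=name)
with_fstring = f'{age:d}, {name}'

# A positional index can appear more than once in the template.
repeated = '{0}-{0}-{1}'.format('a', 'b')

print(with_format, with_fstring, repeated)
```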

8.1.2.5. Escape characters#

Escape characters are preceded by a backslash character (\). They are typically used for characters that would otherwise cause an error in a string literal, such as double quotes inside a double-quoted string or the backslash itself. Escape characters can also be used to control whitespace in strings. For example, \n adds a new line and \t adds a tab. In Table 8.2, you will find some of the escape characters and a description of their functionality:

Table 8.2 Escape characters#

| Escape Character | Description |
| --- | --- |
| \n | add a new line; the text that comes after it will be added in a new line |
| \t | add a tab separator |
| \\ | the backslash will be printed in the string |
| \" | the double quote will be printed together with the string, even if it is enclosed in double quotes |
| \' | the single quote will be printed together with the string, even if it is enclosed in single quotes |

Let us look at some usage examples below:

print('Welcome to 'Food Science'!')
  Cell In[18], line 1
    print('Welcome to 'Food Science'!')
                       ^
SyntaxError: invalid syntax

The above code throws a SyntaxError because we are enclosing some text in a string with single quotes, when the string itself is enclosed in single quotes. To circumvent this, we must escape the single quotes with the \ character. As a result, the single quotes will be printed as well.

print('Welcome to \'Food Science\'!')
Welcome to 'Food Science'!

An example of the tab and new line escape characters would be:

print('Welcome \t to \n Food Science!')
Welcome 	 to 
 Food Science!

As you can see, a tab was inserted after Welcome and a newline after to, so the rest of the text continues on the next line.
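Although an escape character is typed as two symbols, it is stored as a single character. The sketch below checks this with len(). It also shows that quotes of the other kind need no escaping, which is an alternative to escaping them.

```python
newline = '\n'
tab = '\t'
backslash = '\\'

# Each escape sequence counts as exactly one character.
lengths = (len(newline), len(tab), len(backslash))

# Double quotes inside a single-quoted string need no backslash ...
quoted = 'Welcome to "Food Science"!'
# ... while inside a double-quoted string they must be escaped.
escaped = "Welcome to \"Food Science\"!"
same = quoted == escaped

print(lengths, same)
```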

8.1.3. Substrings#

A substring is a part of a string. More precisely, a substring is formed by consecutive characters of a string. E.g., if 'hello' is the string, then some of its substrings are 'he', 'h', 'hel', 'ello', etc.

8.1.3.1. Searching for substrings#

8.1.3.1.1. in operator#

We can use the in operator to search for a substring in a string:

food_science_message = 'Welcome to Food Science!'
'Food' in food_science_message
True
'Frowns' in food_science_message
False

8.1.3.1.2. startswith() and endswith()#

In order to check if a string starts with a specific substring we can use the startswith() method:

food_science_message.startswith('we')
False

As you can see, we get False because w has a different underlying numerical value than W. In other words, this method is case-sensitive.

Similarly, we can check if a string ends with a specific substring using the endswith() method.

food_science_message.endswith('ence!')
True

8.1.3.1.3. index() and rindex() vs find() and rfind()#

The index() and rindex() methods search for the substring passed as the argument in the string on which they are called and return the index where the match starts. If the substring is not found, a ValueError is raised. The difference between them is that index() searches from the beginning of the string and returns the first (lowest) match, while rindex() searches from the end and returns the last (highest) match.

food_science_message.index('to')
8
food_science_message.rindex('to')
8

Although rindex() starts the search from the end, the index it returns is still counted from the beginning of the string. Here 'to' occurs only once, so both methods return 8. Let us look at another example:

food_science_message.index('o')
4
food_science_message.rindex('o')
13

Fig. 8.1 illustrates the results of both methods.


Fig. 8.1 Illustration of index() and rindex()#

food_science_message.index('x')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/var/folders/7f/7nw_x13n5q965rss_qz6061m0000gq/T/ipykernel_67164/1873182247.py in <module>
----> 1 food_science_message.index('x')

ValueError: substring not found

As you can see, a ValueError with substring not found message is raised.

On the other hand, find() and rfind() serve exactly the same purpose but instead of raising a ValueError when the substring is not found, they just return -1.

food_science_message.find('o')
4
food_science_message.rfind('o')
13
food_science_message.find('x')
-1

Here we do not get an error anymore, just -1 to indicate that the substring was not found.
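The practical difference shows up in how a missing substring is handled. The two helper functions below are hypothetical (they are not part of Python) and sketch both styles: checking for -1 after find(), and catching the ValueError from index().

```python
food_science_message = 'Welcome to Food Science!'

def locate_with_find(text, substring):
    """Hypothetical helper: return the index of substring, or None if absent."""
    position = text.find(substring)
    return position if position != -1 else None

def locate_with_index(text, substring):
    """Hypothetical helper: same result, built on index() and exception handling."""
    try:
        return text.index(substring)
    except ValueError:
        return None

print(locate_with_find(food_science_message, 'o'))
print(locate_with_index(food_science_message, 'x'))
```

Checking for -1 explicitly matters: -1 is also a valid index (the last character), so using the result of find() directly for indexing would silently pick the wrong position.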

8.1.3.2. Counting occurrences#

To see how many times a substring appears in a string we can use the count() method that we have seen in lists and tuples as well.

food_science_message.count('el')
1

We see that el appears only once in the Welcome to Food Science! string.

food_science_message.count('e')
4

The substring e, on the other hand, appears 4 times.

8.1.3.3. Replacing substrings#

Important

Strings are immutable, so none of the methods discussed in the subsequent sections modify the original string. Instead, they return a new string that contains the changes.

Python provides the replace() string method that allows us to replace a substring with another one.

food_science_message_1 = food_science_message.replace(' ', ', ')
food_science_message_1
'Welcome, to, Food, Science!'

This method replaces every occurrence of the first substring with the second substring provided as argument.
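A short sketch of this behaviour follows. It also uses the optional third argument of replace(), not covered above, which limits how many occurrences are replaced.

```python
food_science_message = 'Welcome to Food Science!'

# Replace every 'o' with '0'.
replaced_all = food_science_message.replace('o', '0')

# The optional third argument limits the number of replacements (here: only the first).
replaced_first = food_science_message.replace('o', '0', 1)

print(replaced_all)
print(replaced_first)
print(food_science_message)  # the original string is unchanged
```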

8.1.4. Other string methods#

8.1.4.1. Joining strings#

To join two or more strings we can either use the concatenation operator (+) or the join() method.

welcome = 'Welcome'
to = 'to'
food_science = 'Food Science!'
message_1 = welcome + ' ' + to + ' '+ food_science
message_1
'Welcome to Food Science!'
message_2 = ' '.join([welcome, to, food_science])
message_2
'Welcome to Food Science!'

The join() method is more versatile. In our case, it concatenates all the strings we provided in a list, inserting between them the space character on which we called the method. In place of the space we can use any other string:

', '.join([welcome, to, food_science])
'Welcome, to, Food Science!'

As you can see now, instead of space the words are separated by a comma (,) and a space.
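One point worth a sketch: join() only accepts strings, so other values have to be converted first. The measurements below are made-up numbers.

```python
# Made-up numbers, used only for illustration.
measurements = [4.5, 3.25, 6]

# Joining non-string elements raises a TypeError.
try:
    ', '.join(measurements)
    join_failed = False
except TypeError:
    join_failed = True

# Converting each element with str() first makes the join work.
as_text = [str(value) for value in measurements]
joined = '; '.join(as_text)

print(join_failed, joined)
```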

Tip

Ever wondered what happens if you multiply a string by an integer? Check here:

'*'*5
'*****'

8.1.4.2. Splitting strings#

8.1.4.2.1. split()#

You can split a string using the split() method. It takes as an argument the separator that will be used to split the string, and it returns a list of the substrings that were separated by it. To better understand this, let us look at an example:

tokens_1 = food_science_message.split(' ')
tokens_1
['Welcome', 'to', 'Food', 'Science!']

In this example, the splitting character was the space ' '. The method goes over the string and cuts it at all places where it finds a space. All the remaining parts are put in a list afterwards. In our case there were 3 spaces, as a result we have 4 tokens in the list, which are substrings of the original string.

We can also call split() with two arguments, where the second one is an integer. In this case, the integer specifies the maximum number of splits to perform, counting from the left. For example:

tokens_2 = food_science_message.split(' ', 2)
tokens_2
['Welcome', 'to', 'Food Science!']

This time, we were interested in splitting only at the first two spaces. These occur after Welcome and to, so those two words are split off and put into the list as separate elements. Having made the 2 splits we asked for, the method stops and puts the remainder of the string into the list as a single element.
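When a string contains several spaces in a row, the choice of separator matters. The sketch below, with a deliberately messy example string, compares split(' ') with split() called without arguments, which splits on any run of whitespace.

```python
# A deliberately messy string with repeated spaces.
messy = 'Welcome   to Food  Science!'

# Splitting on a single space yields empty strings between consecutive spaces.
by_space = messy.split(' ')

# Without arguments, split() treats any run of whitespace as one separator.
by_whitespace = messy.split()

print(by_space)
print(by_whitespace)
```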

8.1.4.2.2. partition() and rpartition()#

These methods work similarly to the split() method, but they split only once and return a tuple with 3 elements instead. Again we provide the separator on which to split, and the method returns a tuple that contains the part of the string before the separator, the separator itself, and the part after the separator.

food_science_message.partition(' ')
('Welcome', ' ', 'to Food Science!')

As you can see, it finds the first occurrence of the splitting character and makes the split there.

A more useful case is probably when we have to separate text from numerical content, like here:

name_and_string = 'John - 25 years'
name_and_string.partition(' - ')
('John', ' - ', '25 years')

rpartition() does the same but starts the search from the end of the string.

food_science_message.rpartition(' ')
('Welcome to Food', ' ', 'Science!')

Whenever the separator is not present in the string we are trying to split, the string is not split, but a tuple with three elements is still returned. Depending on whether we used partition() or rpartition(), it contains the empty string '' in the last two or the first two entries.

food_science_message.partition('hello')
('Welcome to Food Science!', '', '')
food_science_message.rpartition('hello')
('', '', 'Welcome to Food Science!')
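Because the result always has three elements, it can be unpacked directly, and the middle element tells us whether the separator was found. A minimal sketch with hypothetical records:

```python
# Hypothetical records, used only for illustration.
record = 'temperature=72.5'
broken_record = 'no separator here'

# Unpack the three parts; the separator part is '' when nothing was found.
key, separator, value = record.partition('=')
found = separator != ''

_, missing_separator, _ = broken_record.partition('=')
found_in_broken = missing_separator != ''

print(key, value, found, found_in_broken)
```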

8.1.4.2.3. splitlines()#

This method splits large amounts of text, stored in strings, into a list of lines. The splits are made at line boundaries such as the newline character (\n), and the line breaks themselves are not included in the result.

text = """To be, or not to be, that is the question: 
Whether tis nobler in the mind to suffer 
The slings and arrows of outrageous fortune,
Or to take Arms against a Sea of troubles,
And by opposing end them: to die, to sleep
No more; and by a sleep, to say we end."""
text.splitlines()
['To be, or not to be, that is the question: ',
 'Whether tis nobler in the mind to suffer ',
 'The slings and arrows of outrageous fortune,',
 'Or to take Arms against a Sea of troubles,',
 'And by opposing end them: to die, to sleep',
 'No more; and by a sleep, to say we end.']

As you can see, the text is split in lines and each line is put in a list as a separate element.
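For comparison, here is a small sketch of how splitlines() differs from split('\n') when the text ends with a newline. The example text is made up.

```python
# Made-up text that ends with a newline character.
text = 'first line\nsecond line\n'

# splitlines() does not produce an empty entry for the final newline ...
with_splitlines = text.splitlines()

# ... whereas split('\n') does.
with_split = text.split('\n')

print(with_splitlines)
print(with_split)
```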

8.1.4.3. strip(), lstrip() and rstrip()#

These methods are used to remove whitespace (be it a space, a tab, or a newline) from the ends of a string. The strip() method removes both leading and trailing whitespace. rstrip() removes only trailing whitespace, while lstrip() removes only leading whitespace.

food_science_message = '   Welcome to Food Science!   '
food_science_message
'   Welcome to Food Science!   '
food_science_message.strip()
'Welcome to Food Science!'

The strip() method removed both the leading and trailing spaces.

print(food_science_message.lstrip()) # remove leading
print(food_science_message.rstrip()) # remove trailing
Welcome to Food Science!   
   Welcome to Food Science!

These methods can be used to remove any leading and/or trailing character, not only whitespaces (which are stripped by default when no argument is provided). Note that the argument is a string specifying the set of characters to be removed, not a substring. Hence the order and composition are irrelevant:

print(food_science_message.lstrip('l eW!#@$')) # remove leading
print(food_science_message.rstrip('! ec48n092Si'), end='!!!!') # remove trailing
come to Food Science!   
   Welcome to Food!!!!
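Treating the argument as a set of characters can be surprising when the intent is to remove an exact ending. The sketch below uses a hypothetical file name. The alternative shown, endswith() combined with slicing, is just one possible way to remove an exact suffix.

```python
# Hypothetical file name, used only for illustration.
filename = 'data_report.txt'

# rstrip('.txt') removes any trailing '.', 't' or 'x', including the final 't' of 'report'.
stripped = filename.rstrip('.txt')

# To remove the exact suffix, check for it and slice it off.
suffix = '.txt'
if filename.endswith(suffix):
    without_suffix = filename[:-len(suffix)]
else:
    without_suffix = filename

print(stripped)
print(without_suffix)
```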

8.1.4.4. lower(), upper(), capitalize(), title() and swapcase()#

The lower() method converts all letters of the string to lowercase.

food_science_message = 'Welcome to Food Science!'
food_science_message.lower()
'welcome to food science!'

The upper() method converts all letters of the string to uppercase.

food_science_message.upper()
'WELCOME TO FOOD SCIENCE!'

The capitalize() method converts the string to sentence case. This means that the first letter of the string is capitalized and all other letters are converted to lowercase.

food_science_message.capitalize()
'Welcome to food science!'

The title() method will capitalize the first letter of every word in a string.

food_science_message.title()
'Welcome To Food Science!'

The swapcase() method will convert all uppercase letters of the string to lowercase letters and vice-versa.

food_science_message.swapcase()
'wELCOME TO fOOD sCIENCE!'
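A common use of these methods is to make searches case-insensitive: since in and startswith() are case-sensitive, both sides can be converted to lowercase first. A minimal sketch:

```python
food_science_message = 'Welcome to Food Science!'

# The case-sensitive check fails because of the capital W ...
sensitive = food_science_message.startswith('we')

# ... but succeeds once the string is converted to lowercase first.
insensitive = food_science_message.lower().startswith('we')

# The same idea works for the in operator.
contains_food = 'FOOD'.lower() in food_science_message.lower()

print(sensitive, insensitive, contains_food)
```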