Lesson A3 – Working with strings

Storing strings

Sequences of characters (that is, text in a broad sense) are represented as string objects in Python. Characters enclosed in single quotes ('), double quotes ("), or triple quotes (''' or """) are recognised as a string.

[1]:
# string
s = 'Hello, World!'
print(type(s))
<class 'str'>

Combining single and double quotes also works, and is a convenient way to use single or double quotes as literal characters inside a string.

[2]:
print("""Huxley's "Brave New World" """)
Huxley's "Brave New World"

Strings in triple quotes can even span multiple lines.

[3]:
print("""
    You're doing a good job!
        Have some ice cream ...

        @
      (' .)
     (*.`. )
    (*.~.*. )
     \#####/
      \###/
       \#/
        V
      """)

    You're doing a good job!
        Have some ice cream ...

        @
      (' .)
     (*.`. )
    (*.~.*. )
     \#####/
      \###/
       \#/
        V

Within a string you can use special character combinations, starting with a backslash (\), that are not taken literally but have a special meaning. Such a combination is called an escape sequence. For example, to generate a line break you can use \n (n = newline).

Since a backslash in a string is interpreted as the start of an escape sequence, how could you then type in a literal backslash "\"? Simply use another backslash to escape the backslash: (\\).

[4]:
print("Hello Aldous,\ndo you know this character: '\\'?")
Hello Aldous,
do you know this character: '\'?
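
As a small sketch of how escape sequences behave (this cell is an illustration and the variable names are made up), the following shows that an escape sequence counts as a single character. It also uses a raw string, written with an r prefix, which is an addition to the text above: in a raw string the backslash is not treated as the start of an escape sequence.

```python
# Illustrative sketch: an escape sequence is stored as one character
line_break = "a\nb"      # 'a', newline, 'b'
backslash = "\\"         # one literal backslash
raw = r"a\nb"            # raw string: backslash and 'n' stay two characters

print(len(line_break))   # 3
print(len(backslash))    # 1
print(len(raw))          # 4
```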

A string in Python is a sequence. That means we can access the individual elements of the sequence by indexing. We use square brackets to denote an index. Indices are zero-based, meaning in Python the indices of a sequence with \(n\) elements go from 0 to \(n-1\). Negative indices count from the end, so -1 refers to the last element. A range of elements (a slice) is selected with the general scheme string[start:stop:step], where start is included and stop is excluded.

[5]:
s = "AGGAVAA"   # Peptide sequence
print(s)        # print the whole string
print(s[0])     # print the first character of the string
print(s[-1])    # print the last character of the string
print(s[:2])    # print the first two characters of the string
print(s[1:-1])  # print from the second to the second-to-last character
print(s[::2])   # print every second character
print(s[::-1])  # print every character in reversed order
print(s[10])    # raises IndexError
AGGAVAA
A
A
AG
GGAVA
AGVA
AAVAGGA
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-5-806a4d387826> in <module>
      7 print(s[::2])   # print every second character
      8 print(s[::-1])  # print every character in reversed order
----> 9 print(s[10])    # raises IndexError

IndexError: string index out of range

Note: Strings are sequences, that is, ordered collections of individual characters.
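
Indexing and slicing differ in how they treat positions outside the string. The following sketch, which reuses the peptide sequence from the cell above, illustrates that an out-of-range index raises an IndexError while an out-of-range slice is simply clipped. The guard with len() is only one possible way to avoid the error.

```python
s = "AGGAVAA"            # peptide sequence from the example above

print(len(s))            # 7, so valid indices are 0 to 6
print(s[5:100])          # slices are clipped to the string: AA
print(repr(s[10:]))      # a slice entirely out of range is empty: ''

# Check the index first instead of letting s[i] raise IndexError
i = 10
if i < len(s):
    print(s[i])
else:
    print("index", i, "is out of range")
```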

Operations with strings

Strings can be combined, transformed and formatted in many ways in Python. Perhaps unexpectedly, the operators * and + also work on strings: * repeats a string and + joins two strings.

[6]:
print("=" * 72)  # Use * to repeat a string
========================================================================
[7]:
print("a" + "b")  # Use + to join strings
ab

String literals placed next to each other without an operator are joined as well, which is especially useful when splitting a long string over multiple lines.

[8]:
print(
    " 20\n"
    "+ 5\n"
    "---\n"
    " 25\n"
    )
 20
+ 5
---
 25
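
Joining without an operator only works for string literals written directly in the code. A minimal sketch with made-up variable names shows the difference: strings stored in variables need an explicit + to be joined.

```python
# Adjacent string literals are merged into one string
header = ("residue "
          "count")
print(header)             # residue count

# Variables are not literals, so they need an explicit operator
first = "AGG"
second = "AVAA"
print(first + second)     # AGGAVAA
print((first + "-") * 2)  # AGG-AGG-
```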

String methods

String objects come with many useful functions, so-called methods, for manipulating the string. These come in handy when you need to work with text input. In scientific computing, string methods can be used, for example, to automatically process large amounts of data, such as the content of big text files, or to handle the filenames of many (often hundreds of) files.

Advanced: Methods are function attributes attached to objects that can be accessed via dot-notation.

[9]:
print("AGGTVAA".replace("T", "A"))
# Substitute a substring with another string
AGGAVAA
[10]:
print("AGGTVAA".strip("A"))
print("AGGTVAA".lstrip("A"))
print("AGGTVAA".rstrip("A"))
# Remove the given characters from both ends, the left end, or the right end
GGTV
GGTVAA
AGGTV
[11]:
print("AGGTVAA".count("A"))
# Occurrence of substring in string
3
[12]:
print("AGGTVAA".lower())
# Convert to lower case
aggtvaa

Note: You can use dir(str) to get a list of the available string methods.
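
Methods can be chained, because each one returns a new string (the original is left unchanged). As a sketch of the filename handling mentioned above, the following cell tidies up a hypothetical, made-up filename. It uses strip() without an argument, which removes surrounding whitespace, together with startswith(), endswith() and split(), which were not shown in the cells above.

```python
filename = "  Run_001.TXT\n"       # hypothetical, untidy input

# Remove surrounding whitespace, then convert to lower case
clean = filename.strip().lower()
print(clean)                       # run_001.txt

# Check how the name begins and ends
print(clean.startswith("run_"))    # True
print(clean.endswith(".txt"))      # True

# Drop the extension and split the rest at the underscore
stem = clean.replace(".txt", "")
print(stem.split("_"))             # ['run', '001']
```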

String formatting

Whenever you want to combine strings with objects of other types, string formatting may be what you need, that is, the insertion of values of other types into strings. The most convenient way to do this is with so-called f-strings. Prefix a string with the letter f to make it an f-string. Such a string can contain ordinary characters as well as variable names (or other expressions) enclosed in curly brackets. After a colon inside the brackets, a format specification gives additional options for how the value is displayed, for example .2f for two decimal places.

[13]:
result = 3.14159
print(f"The result is {result:.2f}")
The result is 3.14
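
The curly brackets of an f-string accept expressions as well as plain variable names, and the format specification can also control width and alignment. The following sketch uses made-up example values to illustrate both.

```python
name = "glycine"       # example values chosen for illustration
mass = 75.0666

# Any expression can appear inside the curly brackets
print(f"{name} has {len(name)} letters")   # glycine has 7 letters

# .1f rounds the displayed value to one decimal place
print(f"mass: {mass:.1f} g/mol")           # mass: 75.1 g/mol

# <, > and ^ align the value left, right or centred in a field of width 10
print(f"|{name:<10}|{name:>10}|{name:^10}|")
```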

You can also, for example, pad numbers with leading zeros to a fixed number of digits, which is useful if you want, say, filenames to be numbered consistently.

[14]:
n = 1
print(f"mydata_{n:0>5}.txt")
n = 2
print(f"mydata_{n:0>5}.txt")
print("...")
n = 99999
print(f"mydata_{n:0>5}.txt")
mydata_00001.txt
mydata_00002.txt
...
mydata_99999.txt
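
In the format specification 0>5 used above, 0 is the fill character, > means right-aligned and 5 is the minimum width. As a brief sketch, the cell below compares it with the shorthand 05d, which gives the same result for non-negative integers, and shows that the width is only a minimum.

```python
n = 42
print(f"mydata_{n:0>5}.txt")    # mydata_00042.txt
print(f"mydata_{n:05d}.txt")    # mydata_00042.txt

# The width is a minimum only; longer numbers are not truncated
big = 123456
print(f"mydata_{big:0>5}.txt")  # mydata_123456.txt
```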

Note: These f-strings are really a nice thing. Have a look at this guide if you want to learn more about them.