Lesson A5 – Data containers

We already know that we can assign single values to variables in Python. Let’s assume now that we have many values to work on. Of course, we can use an individual variable for each of these values.

[1]:
water = "water"
etoh = "ethanol"
dmf = "N,N-dimethylformamide"
dcm = "dichloromethane"

While this might seem fine if the number of objects we are handling is small, you can probably see that this can quickly get out of hand. In particular, to access a value we have to remember the name of the variable it was assigned to.

[2]:
print(water.upper())
WATER

Luckily, Python offers a variety of containers to store values together in a single object. These containers are also called collections.

Lists

Lists are probably the most commonly used type of container in Python. We create a list by enclosing comma-separated values in square brackets.

[3]:
# List elements enclosed in square brackets
list_ = [1, "water", "Hey!", 2.0]

Lists are sequences (like strings), so indexing works as we have seen before.

[4]:
print(list_[0])     # print the first element of the list
print(list_[-1])    # print the last element of the list
print(list_[:2])    # print the first two elements of the list
print(list_[1:-1])  # print the second through the second-to-last element
print(list_[::2])   # print every second element
print(list_[::-1])  # print all elements in reversed order
print(list_[10])    # raises IndexError
1
2.0
[1, 'water']
['water', 'Hey!']
[1, 'Hey!']
[2.0, 'Hey!', 'water', 1]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-4-8dcb81973ad3> in <module>
      5 print(list_[::2])   # print every second element
      6 print(list_[::-1])  # print all elements in reversed order
----> 7 print(list_[10])    # raises IndexError

IndexError: list index out of range

The index() method works for lists, too.

[5]:
# Only the index of the first occurrence of an element is returned
list_.index(1)
[5]:
0

The crucial thing about lists is that they are mutable, which means we can modify the elements of an existing list. Use a list whenever you want to maintain an ordered sequence of objects that needs to be flexible about what and how much is stored.

[6]:
list_[2] += "!!"
print(list_)
[1, 'water', 'Hey!!!', 2.0]

We can also add elements to the list.

[7]:
# Add an element to the list
x = "NewElement"
list_.append(x)
print(list_)
[1, 'water', 'Hey!!!', 2.0, 'NewElement']
[8]:
# Add elements from another list to the list
tmp_list = [3, 2, 1]
list_.extend(tmp_list)
print(list_)
[1, 'water', 'Hey!!!', 2.0, 'NewElement', 3, 2, 1]

If we don’t want to add the element at the end of the list, we can use a different method.

[9]:
# Add element at first position
list_.insert(0, 1)
print(list_)
[1, 1, 'water', 'Hey!!!', 2.0, 'NewElement', 3, 2, 1]

And we can delete elements by index. Note that the pop() method modifies the list in-place and returns the deleted element.

[10]:
# Remove an element from the list by index
deleted = list_.pop(0)  # deletes the first element in-place
print(list_)
print("Deleted element: ", deleted)
[1, 'water', 'Hey!!!', 2.0, 'NewElement', 3, 2, 1]
Deleted element:  1

Lists even have a sort() method directly attached (which requires, however, that the elements are sortable, i.e. comparable with each other).

[11]:
list_.sort()
print(list_)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-4c75802b3e12> in <module>
----> 1 list_.sort()
      2 print(list_)

TypeError: '<' not supported between instances of 'str' and 'int'
[12]:
l = ["x", "y", "a"]
l.sort()  # In-place sorting
print(l)
['a', 'x', 'y']

Advanced: Sorting is very important and a topic in its own right. More on how to sort here.
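
For a list that mixes types, like list_ above, one possible workaround is to tell Python how to compare the elements. The following is a minimal sketch, assuming it is acceptable to compare the elements by their string representation. It uses the built-in sorted() function with the key argument; the name mixed is chosen only for this illustration.

```python
mixed = [1, "water", "Hey!", 2.0]

# sorted() returns a new list and leaves the original untouched;
# key=str compares the elements by their string representation
by_text = sorted(mixed, key=str)
print(by_text)
print(mixed)
```

Whether ordering by string representation is meaningful depends on the data, so treat this as an illustration rather than a recommendation.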

Finally, the + and * operators work for lists just as they do for strings.

[13]:
# Concatenate lists
print([1, 2, 3] + ["1", "2", "3"])
[1, 2, 3, '1', '2', '3']
[14]:
# multiply lists
print(["1"] * 10)
['1', '1', '1', '1', '1', '1', '1', '1', '1', '1']

Note: A list in Python is an ordered collection that can hold an arbitrary number of objects without restriction on the type. Lists are mutable.
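
Coming back to the solvents from the beginning of the lesson, the sketch below shows how a single list could replace the four individual variables. The name solvent_names is made up for this example.

```python
# One list instead of four separate variables
solvent_names = ["water", "ethanol", "N,N-dimethylformamide", "dichloromethane"]

first = solvent_names[0]          # access by position instead of by variable name
n_solvents = len(solvent_names)   # number of stored elements

solvent_names.append("pyridine")  # the list can grow later on
print(first.upper(), n_solvents, len(solvent_names))
```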

Dictionaries

Dictionaries are probably the most versatile type of container we want to discuss here. What would you do if you had elements to store and wanted to access them not via indexing, but rather by using a unique name? You would use a dictionary! We define dictionaries in Python by using curly brackets (as for sets, see section Advanced), and by using an identifier (a key) for every element (value).

[15]:
solvents_dict = {
    "water": 100,
    "ethanol": 78,
    "N,N-dimethylformamide": 153,
    "dichloromethane": 40,
    }
print(solvents_dict)
{'water': 100, 'ethanol': 78, 'N,N-dimethylformamide': 153, 'dichloromethane': 40}

You can immediately see that this can be extremely useful. A dictionary is a mapping of keys to values. Dictionaries are not sequences, so indexing by position does not work: an integer in square brackets is treated as a key, which raises a KeyError if no such key exists. But (even better) we can access elements by their key.

[16]:
solvents_dict[0]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-16-c53a4a94cc93> in <module>
----> 1 solvents_dict[0]

KeyError: 0
[17]:
solvents_dict["water"]
[17]:
100
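
Asking for a key that does not exist raises a KeyError, just like solvents_dict[0] above. Two common ways to guard against this are the in operator and the get() method, sketched below. The dictionary bp and the key "benzene" are made up for this illustration.

```python
bp = {"water": 100, "ethanol": 78}

has_water = "water" in bp          # True: membership is tested on the keys
has_benzene = "benzene" in bp      # False

missing = bp.get("benzene")        # returns None instead of raising KeyError
fallback = bp.get("benzene", -1)   # or a default value of our choice
print(has_water, has_benzene, missing, fallback)
```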

Dictionaries are mutable. We can add and remove key-value pairs.

[18]:
# Add a new key-value pair
solvents_dict["dimethyl sulfoxide"] = 189
print(solvents_dict)
{'water': 100, 'ethanol': 78, 'N,N-dimethylformamide': 153, 'dichloromethane': 40, 'dimethyl sulfoxide': 189}
[19]:
# Update the dictionary with key-value pairs from another one
solvents_dict.update({"diethyl ether": 35, "pyridine": 115})
print(solvents_dict)
{'water': 100, 'ethanol': 78, 'N,N-dimethylformamide': 153, 'dichloromethane': 40, 'dimethyl sulfoxide': 189, 'diethyl ether': 35, 'pyridine': 115}
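
The cells above only add pairs. Removing a pair can be done, for example, with the del statement or with the pop() method, which behaves much like pop() for lists but takes a key instead of an index. A minimal sketch on a small stand-in dictionary (the name bp is an assumption for this example):

```python
bp = {"water": 100, "ethanol": 78, "pyridine": 115}

del bp["pyridine"]            # remove a pair by key, returns nothing
removed = bp.pop("ethanol")   # remove a pair by key and return its value
print(bp)
print("Removed value:", removed)
```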

We can use anything as a value; there is no restriction on the type. Keys, on the other hand, must be hashable, which in practice means we can only use immutable types. So allowed objects for keys are, for example, integers, strings, and tuples (see section Advanced), but lists are not allowed.

[20]:
# You can use tuples as keys
{(1, "one"): "eins",}
[20]:
{(1, 'one'): 'eins'}
[21]:
# You cannot use lists as keys
{[1, "one"]: "eins",}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-76ef99c2bb3e> in <module>
      1 # You cannot use lists as keys
----> 2 {[1, "one"]: "eins",}

TypeError: unhashable type: 'list'
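
If the data for a key happens to live in a list, one possible way around this error is to convert the list to a tuple first (the tuple() function is covered in the section Type conversion). The names pair and translations are chosen only for this sketch.

```python
pair = [1, "one"]                      # a list cannot be used as a key directly
translations = {tuple(pair): "eins"}   # the converted tuple can
print(translations[(1, "one")])
```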

To extract the keys or the values (or both) from a dictionary, you can use the keys(), values(), and items() methods.

[22]:
solvents_dict.keys()
[22]:
dict_keys(['water', 'ethanol', 'N,N-dimethylformamide', 'dichloromethane', 'dimethyl sulfoxide', 'diethyl ether', 'pyridine'])
[23]:
solvents_dict.values()
[23]:
dict_values([100, 78, 153, 40, 189, 35, 115])
[24]:
solvents_dict.items()
[24]:
dict_items([('water', 100), ('ethanol', 78), ('N,N-dimethylformamide', 153), ('dichloromethane', 40), ('dimethyl sulfoxide', 189), ('diethyl ether', 35), ('pyridine', 115)])

Note that dict_keys, dict_values, and dict_items are their own container types (so-called view objects), which you can however convert to other containers (see the section Type conversion below).

[25]:
type(solvents_dict.keys())
[25]:
dict_keys
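
The objects returned by keys(), values(), and items() are often called views, because they stay connected to the dictionary they came from. The sketch below, using a small stand-in dictionary named bp, contrasts a view with a list created from it.

```python
bp = {"water": 100, "ethanol": 78}
keys_view = bp.keys()

keys_list = list(keys_view)   # a snapshot: an ordinary, independent list
bp["pyridine"] = 115          # change the dictionary afterwards

print(keys_view)   # the view reflects the new key
print(keys_list)   # the list does not
```

Converting to a list is therefore a reasonable choice when you need indexing or a copy that no longer changes.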

Advanced

Tuples

A tuple is a sequence of comma-separated values, usually enclosed in parentheses.

[26]:
solvents = (
    "water",
    "ethanol",
    "N,N-dimethylformamide",
    "dichloromethane",
    )
print(solvents)
('water', 'ethanol', 'N,N-dimethylformamide', 'dichloromethane')

Tuples are sequences (like strings). That means we can access the elements of the tuple by indexing.

[27]:
print(solvents[0])     # print the first element of the tuple
print(solvents[-1])    # print the last element of the tuple
print(solvents[:2])    # print the first two elements of the tuple
print(solvents[1:-1])  # print the second through the second-to-last element
print(solvents[::2])   # print every second element
print(solvents[::-1])  # print all elements in reversed order
print(solvents[10])    # raises IndexError
water
dichloromethane
('water', 'ethanol')
('ethanol', 'N,N-dimethylformamide')
('water', 'N,N-dimethylformamide')
('dichloromethane', 'N,N-dimethylformamide', 'ethanol', 'water')
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-27-4aa2d42ec81b> in <module>
      5 print(solvents[::2])   # print every second element
      6 print(solvents[::-1])  # print all elements in reversed order
----> 7 print(solvents[10])    # raises IndexError

IndexError: tuple index out of range

Indexable sequences also have an index() method. It returns the index of a queried element if it exists in the sequence, and raises a ValueError otherwise.

[28]:
solvents.index("ethanol")
[28]:
1
[29]:
solvents.index("pyridine")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-29-bc6fa6dca466> in <module>
----> 1 solvents.index("pyridine")

ValueError: tuple.index(x): x not in tuple

Tuples have no restriction on the type of objects stored in them. We can even mix objects of different types.

[30]:
solvents_bp = (  # solvents and boiling points
    "water", 100,
    "ethanol", 78,
    "N,N-dimethylformamide", 153,
    "dichloromethane", 40,
    )
print(solvents_bp)
('water', 100, 'ethanol', 78, 'N,N-dimethylformamide', 153, 'dichloromethane', 40)

Tuples are immutable, so once we have created a tuple we cannot modify its elements. The number of elements and the stored values are fixed. When we try to assign a new value to one of the elements, a TypeError is raised.

[31]:
solvents[0] = "benzene"
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-31-3b4fc110d65a> in <module>
----> 1 solvents[0] = "benzene"

TypeError: 'tuple' object does not support item assignment

This is a limitation that, depending on your use case, can be helpful or hindering. Using a tuple is a good idea if you want to store objects whose number and value should not be changed later. For example, point coordinates in arbitrarily high-dimensional spaces can be well represented by tuples.

[32]:
q = (0, 1)  # Point in 2D space
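
Because the number of elements in a tuple is fixed, its values can be assigned to individual variables in one step, which is often called unpacking. A minimal sketch with the point from above; the names x and y are chosen for illustration.

```python
q = (0, 1)   # Point in 2D space
x, y = q     # unpack the two coordinates into two variables
print(x, y)
```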

For strings, which are sequences, we saw that the + and the * operators have a defined meaning. This holds true for tuples, too.

[33]:
# Concatenate tuples
print((1, 2, 3) + ("1", "2", "3"))
(1, 2, 3, '1', '2', '3')
[34]:
# multiply tuples
print(("1", ) * 10)
('1', '1', '1', '1', '1', '1', '1', '1', '1', '1')

As you can see from the last example, a tuple with just one element needs a trailing comma, and forgetting it is a common source of problems. A single value enclosed in parentheses is not interpreted as a tuple when no comma is used. This is mainly because parentheses are also used to group expressions (like when performing arithmetic).

[35]:
# Without the trailing comma, a string is multiplied, not a tuple
print(("1") * 10)
1111111111
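
When in doubt, the type() function shows what the parentheses actually produced. A short sketch (the variable names are made up for this example):

```python
with_comma = ("1",)     # trailing comma: a tuple with one element
without_comma = ("1")   # no comma: just a string in parentheses

print(type(with_comma))
print(type(without_comma))
```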

Note: A tuple in Python can hold an ordered arbitrary number of objects without restriction on the type and is immutable.

Sets

Yet another very handy container in Python is the set. We indicate that we want to use a set with curly brackets.

[36]:
# Set elements enclosed in curly brackets
set_ = {1, "water", "Hey!", "Hey!", 2.0, 1}
print(set_)
{'Hey!', 1, 2.0, 'water'}

What happened here? A set is fundamentally different from tuples and lists as it stores every unique element only once. Adding the same element twice leaves the set unchanged. You may have also noticed that the order of the elements in the created set is not the same as in the input. You should use a set whenever you want to maintain a collection of unique objects and you do not care about the ordering. A set is not a sequence. As a consequence, it does not support indexing.

[37]:
set_[0]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-37-f2cbabde3797> in <module>
----> 1 set_[0]

TypeError: 'set' object does not support indexing

A set is, however, mutable. So we can add and remove elements.

[38]:
# Add an element to the set
set_.add(1)
print(set_)
{'Hey!', 1, 2.0, 'water'}
[39]:
# Add elements from a list to the set
set_.update([1, 2, 3])
print(set_)
{1, 2.0, 3, 'water', 'Hey!'}
[40]:
# Remove an element from the set
set_.remove(1)
print(set_)
{2.0, 3, 'water', 'Hey!'}
[41]:
# Discard an element from the set.
# Does not raise a KeyError if the element is not present
set_.discard(1)
print(set_)
{2.0, 3, 'water', 'Hey!'}

The + and * operators are not defined for sets.

[42]:
{1, 2, 3} + {"1", "2", "3"}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-42-dcfad9d889e7> in <module>
----> 1 {1, 2, 3} + {"1", "2", "3"}

TypeError: unsupported operand type(s) for +: 'set' and 'set'
[43]:
{1} * 10
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-43-38e1338db26d> in <module>
----> 1 {1} * 10

TypeError: unsupported operand type(s) for *: 'set' and 'int'

Instead, we can use the | operator to combine sets.

[44]:
# Return set with elements in set1 OR in set2
{1, 2, 3} | {"1", "2", "3"}
[44]:
{'1', 1, 2, '2', 3, '3'}

Or the & operator to get the common elements of two sets.

[45]:
# Return set with elements in set1 AND in set2
{1, 2, 3} & {"1", "2", "3"}
[45]:
set()

Subtracting one set from another with the - operator gives the difference.

[46]:
# Return set with elements in set1 but NOT in set2
{1, 2, 3} - {1, 2}
[46]:
{3}

Sets also provide handy methods for different operations. For example:

[47]:
# Give me elements that are not in another set!
{1, 2}.difference({2})
[47]:
{1}
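
The operators |, &, and - shown above each have a method counterpart. The sketch below puts them side by side; the names a and b are arbitrary. One practical difference is that the methods also accept other containers such as lists, whereas the operators require sets on both sides.

```python
a = {1, 2, 3}
b = {3, 4}

union = a.union(b)             # same as a | b
common = a.intersection(b)     # same as a & b
only_a = a.difference(b)       # same as a - b
from_list = a.union([3, 4])    # methods accept any iterable, not only sets

print(union, common, only_a, from_list)
```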

Note: A set in Python can hold an unordered arbitrary number of unique objects. As for dictionary keys, the objects must be hashable, so lists, for example, cannot be stored in a set. Sets are mutable.
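
A typical use of this uniqueness is removing duplicates from a list. The sketch below assumes a hypothetical list named used and relies on the set() function, which is covered in the section Type conversion, and on the built-in sorted() function to get a reproducible order back.

```python
# Hypothetical list of solvent names with repeated entries
used = ["water", "ethanol", "water", "dichloromethane", "ethanol"]

unique = set(used)               # duplicates collapse, order is not preserved
unique_sorted = sorted(unique)   # sorted() returns a list in a defined order

print(len(used), len(unique))
print(unique_sorted)
```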

Type conversion

We have seen functions before that allow the transformation of one type into another. Similar functions exist for containers as well.

  • list()

  • tuple()

  • set()

  • dict()

Advanced: Maybe you noticed that we named our list in the list examples list_ with a trailing underscore. This prevents a naming conflict with the built-in list() function, which would otherwise no longer be reachable under its name.

[48]:
# Convert to list
string_list = list("chloroform")
print(string_list)
['c', 'h', 'l', 'o', 'r', 'o', 'f', 'o', 'r', 'm']
[49]:
# Convert to tuple
string_tuple = tuple(string_list)
print(string_tuple)
('c', 'h', 'l', 'o', 'r', 'o', 'f', 'o', 'r', 'm')
[50]:
# Convert to set
string_set = set(string_tuple)
print(string_set)
{'o', 'm', 'l', 'r', 'f', 'c', 'h'}
[51]:
# Convert to dictionary
string_dict = dict(zip(string_set, string_set))
print(string_dict)
{'o': 'o', 'm': 'm', 'l': 'l', 'r': 'r', 'f': 'f', 'c': 'c', 'h': 'h'}

The conversion to a dictionary may look a bit strange to you. Remember that we need two things to build a dictionary: keys and values. More precisely, the dict() function expects a sequence of key-value pairs, and this is exactly what the zip() function gives us when we pass it two containers.

[52]:
list(zip((1, 2, 3, ), ("a", "b", "c", )))
[52]:
[(1, 'a'), (2, 'b'), (3, 'c')]
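
Putting slicing, zip(), and dict() together, the flat tuple of solvents and boiling points from the tuple section can be turned into a dictionary. This is a sketch of one possible way to do it; the names names and boiling_points are introduced here only for illustration.

```python
solvents_bp = (  # solvents and boiling points
    "water", 100,
    "ethanol", 78,
    "N,N-dimethylformamide", 153,
    "dichloromethane", 40,
)

names = solvents_bp[::2]             # every second element, starting at index 0
boiling_points = solvents_bp[1::2]   # every second element, starting at index 1

solvents_dict = dict(zip(names, boiling_points))
print(solvents_dict)
```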