{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lesson A5 – Data containers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We already know that we can assign single values to variables in Python. Now let's assume we have many values to work with. Of course, we could use an individual variable for each of these values." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "water = \"water\"\n", "etoh = \"ethanol\"\n", "dmf = \"N,N-dimethylformamide\"\n", "dcm = \"dichloromethane\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While this might seem fine if the number of objects we are handling is small,\n", "you can probably see that this quickly gets out of hand. In particular, to access\n", "a value we have to remember the name of the variable that holds it." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WATER\n" ] } ], "source": [ "print(water.upper())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Luckily, Python offers a variety of *containers* to store values together\n", "in a single object. These containers are also called *collections*." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lists" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lists are probably the most commonly used type of container in Python. We create a list by enclosing its elements, separated by commas, in square brackets." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# List elements enclosed in square brackets\n", "list_ = [1, \"water\", \"Hey!\", 2.0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lists are sequences (like strings), so indexing and slicing work as we have seen before."
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\n", "2.0\n", "[1, 'water']\n", "['water', 'Hey!']\n", "[1, 'Hey!']\n", "[2.0, 'Hey!', 'water', 1]\n" ] }, { "ename": "IndexError", "evalue": "list index out of range", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mIndexError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlist_\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# print every second element\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlist_\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# print all elements in reversed order\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 7\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlist_\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m10\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# raises IndexError\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mIndexError\u001b[0m: list index out of range" ] } ], "source": [
"print(list_[0]) # print the first element of the list\n", "print(list_[-1]) # print the last element of the list\n", "print(list_[:2]) # print the first two elements of the list\n", "print(list_[1:-1]) # print all elements except the first and the last\n", "print(list_[::2]) # print every second element\n", "print(list_[::-1]) # print all elements in reversed order\n", "print(list_[10]) # raises IndexError" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `index()` method, which returns the position of an element, works for lists, too." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Only the index of the first occurrence of an element is returned\n", "list_.index(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The crucial thing about lists is that they are *mutable*, which means we can modify the elements of an existing list. Use a list whenever you want to maintain an ordered sequence of objects that needs to be flexible about what and how much is stored." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 'water', 'Hey!!!', 2.0]\n" ] } ], "source": [ "# Replace the element at index 2 with an extended string\n", "list_[2] += \"!!\"\n", "print(list_)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also add elements to the list." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 'water', 'Hey!!!', 2.0, 'NewElement']\n" ] } ], "source": [ "# Add an element to the end of the list\n", "x = \"NewElement\"\n", "list_.append(x)\n", "print(list_)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 'water', 'Hey!!!', 2.0, 'NewElement', 3, 2, 1]\n" ] } ], "source": [ "# Add elements from another list to the list\n", "tmp_list = [3, 2, 1]\n", "list_.extend(tmp_list)\n", "print(list_)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we don't want to add the element at the end of the list, we can use the `insert()` method, which takes the target index and the new element."
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 1, 'water', 'Hey!!!', 2.0, 'NewElement', 3, 2, 1]\n" ] } ], "source": [ "# Add element at first position\n", "list_.insert(0, 1)\n", "print(list_)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also delete elements by index. Note that the `pop()` method modifies the list in-place and returns the deleted element." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 'water', 'Hey!!!', 2.0, 'NewElement', 3, 2, 1]\n", "Deleted element: 1\n" ] } ], "source": [ "# Remove an element from the list by index\n", "deleted = list_.pop(0) # deletes the first element in-place\n", "print(list_)\n", "print(\"Deleted element:\", deleted)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lists even have a `sort()` method, which sorts the list in-place. It requires, however, that the elements can be compared with each other."
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "'<' not supported between instances of 'str' and 'int'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mlist_\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msort\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlist_\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mTypeError\u001b[0m: '<' not supported between instances of 'str' and 'int'" ] } ], "source": [ "list_.sort()\n", "print(list_)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['a', 'x', 'y']\n" ] } ], "source": [ "l = [\"x\", \"y\", \"a\"]\n", "l.sort() # In-place sorting\n", "print(l)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**Advanced:** Sorting is an important topic in its own right. More on how to sort can be found in the [Python documentation](https://docs.python.org/3/howto/sorting.html).\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, the `+` and `*` operators work for lists just as they do for strings." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 2, 3, '1', '2', '3']\n" ] } ], "source": [ "# Concatenate lists\n", "print([1, 2, 3] + [\"1\", \"2\", \"3\"])" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['1', '1', '1', '1', '1', '1', '1', '1', '1', '1']\n" ] } ], "source": [ "# Repeat a list\n", "print([\"1\"] * 10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**Note:** A list in Python is an __ordered__ container that can hold an arbitrary number of objects without restriction on their type. Lists are __mutable__.\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dictionaries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dictionaries are probably the most versatile type of container we discuss here. What would you do if you had elements to store and wanted to access them not by index, but by a unique name? You would use a dictionary! We define a dictionary in Python with curly brackets (which are also used for sets, see the Advanced section) and give every element (the value) an identifier (the key), written as `key: value`." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'water': 100, 'ethanol': 78, 'N,N-dimethylformamide': 153, 'dichloromethane': 40}\n" ] } ], "source": [ "# Solvent names (keys) mapped to boiling points in °C (values)\n", "solvents_dict = {\n", "    \"water\": 100,\n", "    \"ethanol\": 78,\n", "    \"N,N-dimethylformamide\": 153,\n", "    \"dichloromethane\": 40,\n", "    }\n", "print(solvents_dict)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can immediately see that this can be extremely useful. A dictionary is a mapping from keys to values. Dictionaries are *not* sequences, so indexing by position does not work. But (even better) we can access values by their key."
] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "ename": "KeyError", "evalue": "0", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0msolvents_dict\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mKeyError\u001b[0m: 0" ] } ], "source": [ "solvents_dict[0]" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "100" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "solvents_dict[\"water\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dictionaries are mutable. We can add and remove key-value pairs." 
] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'water': 100, 'ethanol': 78, 'N,N-dimethylformamide': 153, 'dichloromethane': 40, 'dimethyl sulfoxide': 189}\n" ] } ], "source": [ "# Add a new key-value pair\n", "solvents_dict[\"dimethyl sulfoxide\"] = 189\n", "print(solvents_dict)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'water': 100, 'ethanol': 78, 'N,N-dimethylformamide': 153, 'dichloromethane': 40, 'dimethyl sulfoxide': 189, 'diethyl ether': 35, 'pyridine': 115}\n" ] } ], "source": [ "# Update the dictionary with the key-value pairs of another one\n", "solvents_dict.update({\"diethyl ether\": 35, \"pyridine\": 115})\n", "print(solvents_dict)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use anything as a value; there is no restriction on the type.\n", "Keys, on the other hand, must be *hashable*, which for the built-in types means immutable. Allowed keys\n", "are, for example, integers, strings, and tuples (see the Advanced section), but lists are not allowed."
] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{(1, 'one'): 'eins'}" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# You can use tuples as keys\n", "{(1, \"one\"): \"eins\",}" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "unhashable type: 'list'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# You cannot use lists as keys\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;34m{\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"one\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m\"eins\"\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m}\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: unhashable type: 'list'" ] } ], "source": [ "# You cannot use lists as keys\n", "{[1, \"one\"]: \"eins\",}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To extract the keys or the values (or both) from a dictionary, you can use the `keys()`, `values()`, and `items()` methods." 
] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['water', 'ethanol', 'N,N-dimethylformamide', 'dichloromethane', 'dimethyl sulfoxide', 'diethyl ether', 'pyridine'])" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "solvents_dict.keys()" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_values([100, 78, 153, 40, 189, 35, 115])" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "solvents_dict.values()" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_items([('water', 100), ('ethanol', 78), ('N,N-dimethylformamide', 153), ('dichloromethane', 40), ('dimethyl sulfoxide', 189), ('diethyl ether', 35), ('pyridine', 115)])" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "solvents_dict.items()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that `dict_keys`, `dict_values`, and `dict_items` are container types of their own, which you can convert to other containers such as lists (see the section on type conversion below)." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(solvents_dict.keys())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Advanced" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tuples" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A tuple is a sequence of values separated by commas and enclosed in parentheses."
] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('water', 'ethanol', 'N,N-dimethylformamide', 'dichloromethane')\n" ] } ], "source": [ "solvents = (\n", "    \"water\",\n", "    \"ethanol\",\n", "    \"N,N-dimethylformamide\",\n", "    \"dichloromethane\",\n", "    )\n", "print(solvents)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Tuples are *sequences* (like strings). That means we can access the elements\n", "of the tuple by indexing and slicing." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "water\n", "dichloromethane\n", "('water', 'ethanol')\n", "('ethanol', 'N,N-dimethylformamide')\n", "('water', 'N,N-dimethylformamide')\n", "('dichloromethane', 'N,N-dimethylformamide', 'ethanol', 'water')\n" ] }, { "ename": "IndexError", "evalue": "tuple index out of range", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mIndexError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msolvents\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# print every second element\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msolvents\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# print all elements in reversed order\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 7\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msolvents\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m10\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# raises IndexError\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mIndexError\u001b[0m: tuple index out of range" ] } ], "source": [
"print(solvents[0]) # print the first element of the tuple\n", "print(solvents[-1]) # print the last element of the tuple\n", "print(solvents[:2]) # print the first two elements of the tuple\n", "print(solvents[1:-1]) # print all elements except the first and the last\n", "print(solvents[::2]) # print every second element\n", "print(solvents[::-1]) # print all elements in reversed order\n", "print(solvents[10]) # raises IndexError" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sequences also have an `index()` method. It returns the index of the first occurrence of a queried element and raises a `ValueError` if the element is not in the sequence." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "solvents.index(\"ethanol\")" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "ename": "ValueError", "evalue": "tuple.index(x): x not in tuple", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0msolvents\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mindex\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"pyridine\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mValueError\u001b[0m: tuple.index(x): x not in tuple" ] } ], "source": [ "solvents.index(\"pyridine\")"
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Tuples have no restriction on the type of objects stored in them. We can even\n", "mix objects of different types." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('water', 100, 'ethanol', 78, 'N,N-dimethylformamide', 153, 'dichloromethane', 40)\n" ] } ], "source": [ "solvents_bp = ( # solvents and boiling points\n", "    \"water\", 100,\n", "    \"ethanol\", 78,\n", "    \"N,N-dimethylformamide\", 153,\n", "    \"dichloromethane\", 40,\n", "    )\n", "print(solvents_bp)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Tuples are *immutable*, so once we have created a tuple we cannot modify its\n", "elements. The number of elements and the stored values are fixed. When we try to\n", "assign a new value to one of the elements, a `TypeError` is raised." ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "scrolled": true }, "outputs": [ { "ename": "TypeError", "evalue": "'tuple' object does not support item assignment", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0msolvents\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"benzene\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: 'tuple' object does not support item assignment" ] } ], "source": [ "solvents[0] = \"benzene\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Depending on your use case, this limitation can be helpful or a hindrance.\n", "Using a tuple is a good idea if you want to store objects whose number and value\n", "should not be changed later. 
For example, point coordinates in\n", "arbitrarily high-dimensional spaces can be well represented by tuples." ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "q = (0, 1) # Point in 2D space" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For strings, which are sequences, we saw that the `+` and `*` operators have a defined meaning. This holds true for tuples, too." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(1, 2, 3, '1', '2', '3')\n" ] } ], "source": [ "# Concatenate tuples\n", "print((1, 2, 3) + (\"1\", \"2\", \"3\"))" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('1', '1', '1', '1', '1', '1', '1', '1', '1', '1')\n" ] } ], "source": [ "# Repeat a tuple (note the trailing comma)\n", "print((\"1\", ) * 10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note the trailing comma in the last example. Forgetting it when creating a tuple with just one element is a common source of problems: a single value enclosed in parentheses is not interpreted as a tuple when no comma is used, because parentheses are also used to group expressions (as in arithmetic). Without the comma, the following cell repeats a string instead of a tuple." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1111111111\n" ] } ], "source": [ "# Without the trailing comma this is a string, not a tuple\n", "print((\"1\") * 10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**Note:** A tuple in Python is an __ordered__ container that can hold an arbitrary number of objects without restriction on their type. Tuples are __immutable__.\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sets are yet another very handy container in Python. We create a set by enclosing its elements in curly brackets." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'Hey!', 1, 2.0, 'water'}\n" ] } ], "source": [ "# Set elements enclosed in curly brackets\n", "set_ = {1, \"water\", \"Hey!\", \"Hey!\", 2.0, 1}\n", "print(set_)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What happened here? A set is fundamentally different from tuples and lists in that it stores every unique element only once. Adding the same element twice leaves the set unchanged. You may also have noticed that the order of the elements in the created set is not the same as in the input. Use a set whenever you want to maintain a collection of unique objects and do not care about their order. A set is *not* a sequence. As a consequence, it does not support indexing." ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "'set' object does not support indexing", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mset_\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: 'set' object does not support indexing" ] } ], "source": [ "set_[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A set is, however, mutable, so we can add and remove elements."
] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'Hey!', 1, 2.0, 'water'}\n" ] } ], "source": [ "# Add an element to the set (1 is already present, so nothing changes)\n", "set_.add(1)\n", "print(set_)" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{1, 2.0, 3, 'water', 'Hey!'}\n" ] } ], "source": [ "# Add elements from a list to the set\n", "# (1 is already present and 2 compares equal to 2.0, so only 3 is new)\n", "set_.update([1, 2, 3])\n", "print(set_)" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{2.0, 3, 'water', 'Hey!'}\n" ] } ], "source": [ "# Remove an element from the set.\n", "# Raises a KeyError if the element is not present\n", "set_.remove(1)\n", "print(set_)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{2.0, 3, 'water', 'Hey!'}\n" ] } ], "source": [ "# Discard an element from the set.\n", "# Does not raise a KeyError if the element is not present\n", "set_.discard(1)\n", "print(set_)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `+` and `*` operators are not defined for sets."
] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "unsupported operand type(s) for +: 'set' and 'set'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;34m{\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m}\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;34m{\u001b[0m\u001b[0;34m\"1\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"2\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"3\"\u001b[0m\u001b[0;34m}\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: unsupported operand type(s) for +: 'set' and 'set'" ] } ], "source": [ "{1, 2, 3} + {\"1\", \"2\", \"3\"}" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "unsupported operand type(s) for *: 'set' and 'int'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;34m{\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m}\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0;36m10\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: unsupported operand type(s) for *: 'set' and 'int'" ] } ], "source": [ "{1} * 10" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Instead, we can use the `|` operator to combine sets." 
] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'1', 1, 2, '2', 3, '3'}" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Return set with elements in set1 OR in set2\n", "{1, 2, 3} | {\"1\", \"2\", \"3\"}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or we can use the `&` operator to get the elements common to two sets (the intersection)." ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "set()" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Return set with elements in set1 AND in set2\n", "{1, 2, 3} & {\"1\", \"2\", \"3\"}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Subtracting one set from another with the `-` operator gives the difference." ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{3}" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Return set with elements in set1 but NOT in set2\n", "{1, 2, 3} - {1, 2}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sets also provide methods for these operations. For example, `difference()` is the method equivalent of the `-` operator:" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{1}" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Return the elements of the first set that are not in the other set\n", "{1, 2}.difference({2})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**Note:** A set in Python is an __unordered__ container that can hold an arbitrary number of __unique__\n", "objects. Like dictionary keys, its elements must be hashable. Sets are __mutable__.\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Type conversion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have already seen functions that convert one type into another. Similar functions exist for containers as well.\n", "\n", " - `list()`\n", " - `tuple()`\n", " - `set()`\n", " - `dict()`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**Advanced:** You may have noticed that in the list examples we named our list `list_`, with a trailing underscore. This prevents a naming conflict with the built-in `list()` function.\n", "\n", "
\n" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['c', 'h', 'l', 'o', 'r', 'o', 'f', 'o', 'r', 'm']\n" ] } ], "source": [ "# Convert to list\n", "string_list = list(\"chloroform\")\n", "print(string_list)" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('c', 'h', 'l', 'o', 'r', 'o', 'f', 'o', 'r', 'm')\n" ] } ], "source": [ "# Convert to tuple\n", "string_tuple = tuple(string_list)\n", "print(string_tuple)" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'o', 'm', 'l', 'r', 'f', 'c', 'h'}\n" ] } ], "source": [ "# Convert to set\n", "string_set = set(string_tuple)\n", "print(string_set)" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'o': 'o', 'm': 'm', 'l': 'l', 'r': 'r', 'f': 'f', 'c': 'c', 'h': 'h'}\n" ] } ], "source": [ "# Convert to dictionary\n", "string_dict = dict(zip(string_set, string_set))\n", "print(string_dict)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The conversion to a dictionary may look a bit strange to you. Remember that a dictionary needs two things: keys and values. More precisely, the `dict()` function expects a sequence of key-value pairs, and this is exactly what we get when we combine two sequences with the `zip()` function."
] }, { "cell_type": "code", "execution_count": 52, "metadata": { "run_control": { "marked": false } }, "outputs": [ { "data": { "text/plain": [ "[(1, 'a'), (2, 'b'), (3, 'c')]" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(zip((1, 2, 3, ), (\"a\", \"b\", \"c\", )))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.10" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "165px" }, "toc_section_display": true, "toc_window_display": true }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }