{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lesson A3 – Working with strings" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Storing strings" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sequences of characters (text in a broad sense) are represented as *string* objects in Python. Characters enclosed in single quotes (`'`), double quotes (`\"`), or triple double quotes (`\"\"\"`)\n", "are recognised as strings." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'str'>\n" ] } ], "source": [ "# string\n", "s = 'Hello, World!'\n", "print(type(s))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Combining single and double quotes also works, and is in fact necessary if we want to use single or double quotes as literal characters." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Huxley's \"Brave New World\" \n" ] } ], "source": [ "print(\"\"\"Huxley's \"Brave New World\" \"\"\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Strings in triple quotes can even span multiple lines." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " You're doing a good job!\n", " Have some ice cream ...\n", "\n", " @\n", " (' .)\n", " (*.`. )\n", " (*.~.*. )\n", " \#####/\n", " \###/\n", " \#/\n", " V\n", " \n" ] } ], "source": [ "print(\"\"\"\n", " You're doing a good job!\n", " Have some ice cream ...\n", "\n", " @\n", " (' .)\n", " (*.`. )\n", " (*.~.*. )\n", " \#####/\n", " \###/\n", " \#/\n", " V\n", " \"\"\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Within a string you can use several special character combinations, starting with a backslash (`\\`), that have a special meaning.
For example, to generate a line break you can use `\\n` (n = *newline*). This is called an escape sequence.\n", "\n", "Since a backslash in a string is interpreted as the start of an escape sequence, how can you then type a literal backslash `\"\\\"`? Simply use another backslash to escape the backslash (`\\\\`)." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello Aldous,\n", "do you know this character: '\'?\n" ] } ], "source": [ "print(\"Hello Aldous,\\ndo you know this character: '\\\\'?\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A string in Python is a *sequence*. That means we can access the individual\n", "elements of the sequence by indexing. We use square brackets\n", "to denote an index. Indices are zero-based, meaning that in Python the indices of a sequence with $n$ elements go from 0 to $n-1$. Negative indices count from the end of the sequence. Slicing follows the general scheme\n", "`string[start:stop:step]`, where the element at `stop` is not included." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "AGGAVAA\n", "A\n", "A\n", "AG\n", "GGAVA\n", "AGVA\n", "AAVAGGA\n" ] }, { "ename": "IndexError", "evalue": "string index out of range", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mIndexError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 7\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ms\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# print every second character\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 8\u001b[0m 
\u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ms\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# print every character in reversed order\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 9\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ms\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m10\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# raises IndexError\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mIndexError\u001b[0m: string index out of range" ] } ], "source": [ "s = \"AGGAVAA\" # Peptide sequence\n", "print(s) # print the whole string\n", "print(s[0]) # print the first character of the string\n", "print(s[-1]) # print the last character of the string\n", "print(s[:2]) # print the first two characters of the string\n", "print(s[1:-1]) # print the second through the second-to-last character\n", "print(s[::2]) # print every second character\n", "print(s[::-1]) # print every character in reversed order\n", "print(s[10]) # raises IndexError" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**Note:** Strings are __sequences__, that is, ordered collections of individual characters.\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Operations with strings" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Strings can be combined, transformed and formatted in many ways in Python. Perhaps unexpectedly, the operators `*` and `+` also work on strings: `*` repeats a string and `+` concatenates two strings." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "========================================================================\n" ] } ], "source": [ "print(\"=\" * 72) # Use * to repeat a string" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ab\n" ] } ], "source": [ "print(\"a\" + \"b\") # Use + to join strings" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Adjacent string literals with no operator between them are joined as well, which is especially\n", "useful when splitting long strings over multiple lines." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 20\n", "+ 5\n", "---\n", " 25\n", "\n" ] } ], "source": [ "print(\n", " \" 20\\n\"\n", " \"+ 5\\n\"\n", " \"---\\n\"\n", " \" 25\\n\"\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## String methods" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "String objects come with many useful functions, so-called *methods*, for working with their content. Because strings cannot be changed in place, these methods return a new string and leave the original untouched. This comes in handy when you need to work with text input. In scientific computing, string methods can for example be used to automatically process large amounts of data, such as the content of big text files, or to handle the filenames of many (often hundreds of) files." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**Advanced:** Methods are function attributes attached to objects that can be\n", " accessed via dot-notation.\n", "
" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "AGGAVAA\n" ] } ], "source": [ "print(\"AGGTVAA\".replace(\"T\", \"A\"))\n", "# Substitute every occurrence of a substring with another string" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "GGTV\n", "GGTVAA\n", "AGGTV\n" ] } ], "source": [ "print(\"AGGTVAA\".strip(\"A\"))\n", "print(\"AGGTVAA\".lstrip(\"A\"))\n", "print(\"AGGTVAA\".rstrip(\"A\"))\n", "# Remove the given characters from both ends, the left end, or the right end" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3\n" ] } ], "source": [ "print(\"AGGTVAA\".count(\"A\"))\n", "# Count the occurrences of a substring in the string" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "aggtvaa\n" ] } ], "source": [ "print(\"AGGTVAA\".lower())\n", "# Convert to lower case" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**Note:** You can use `dir(str)` to get a list of available string methods.\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## String formatting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Whenever you want to combine strings with objects of other types, string\n", "formatting is what you need: it inserts values of other types into\n", "strings. The most convenient way to do this is with so-called *f-strings*. Prefix\n", "a string literal with the letter `f` to make it an f-string. Such a string can\n", "contain ordinary characters as well as variable names (or other expressions) enclosed in curly brackets. After a colon,\n", "a format specification controls how the value is rendered." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The result is 3.14\n" ] } ], "source": [ "result = 3.14159\n", "print(f\"The result is {result:.2f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also *pad* numbers to a fixed number of digits this way, which is useful if you want, say, filenames to be numbered consistently." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "mydata_00001.txt\n", "mydata_00002.txt\n", "...\n", "mydata_99999.txt\n" ] } ], "source": [ "n = 1\n", "print(f\"mydata_{n:0>5}.txt\")\n", "n = 2\n", "print(f\"mydata_{n:0>5}.txt\")\n", "print(\"...\")\n", "n = 99999\n", "print(f\"mydata_{n:0>5}.txt\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**Note:** f-strings offer many more formatting options. Have a look at this\n", " [guide](https://realpython.com/python-f-strings/) if you want to learn more about them.\n", "\n", "
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.10" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "165px" }, "toc_section_display": true, "toc_window_display": true }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }