{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lesson A11 – NumPy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This tutorial will focus on the NumPy module. NumPy is fundamental for scientific computing and will provide the basis for our everyday work, as well as offer advanced functionality for numerical operations." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Refer also to the NumPy [Quickstart](https://docs.scipy.org/doc/numpy/user/quickstart.html) tutorial online or to\n", "\n", "\"Travis E. Oliphant. __2015__. *Guide to NumPy* (2nd. ed.). CreateSpace Independent Publishing Platform, North Charleston, SC, USA.\"\n", "\n", "if you prefer books." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To import the module, run the cell below. NumPy is a third-party Python module that you might need to install first (e.g. using the conda package manager with `conda install numpy`)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:08.869224Z", "start_time": "2021-05-06T06:24:08.856739Z" } }, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now call any function `x` implemented in NumPy with `np.x()`. Let's generate some lists to get started. You should already know about lists. Recall that a list in Python is a mutable, ordered collection of objects with no restriction on the type." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:08.891981Z", "start_time": "2021-05-06T06:24:08.871528Z" } }, "outputs": [], "source": [ "list1 = [1, 3, 5]\n", "list2 = [0.0, 2.0, 4.0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For numerical operations, this data type is of limited use." 
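] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The cell below is a minimal sketch of this limitation, not part of the original lesson flow: applied to lists, the operators `+` and `*` concatenate and repeat the container instead of doing arithmetic on its elements. The element-wise sum at the end is only an illustration of the extra work that plain lists require." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Illustration only: list operators act on the container, not on the numbers\n", "list1 = [1, 3, 5]\n", "list2 = [0.0, 2.0, 4.0]\n", "print('list1 + list2 = ', list1 + list2)  # concatenates the two lists\n", "print('list1 * 2 = ', list1 * 2)  # repeats the list\n", "# Element-wise addition needs an explicit loop or comprehension\n", "print('element-wise sum = ', [x + y for x, y in zip(list1, list2)])" 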
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Array basics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to turn these lists into *NumPy arrays*. Arrays are a type of collection better suited for certain tasks and are essential for our day-to-day work. They offer a lot of functionality. To convert these lists to arrays we run:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:08.906625Z", "start_time": "2021-05-06T06:24:08.895167Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A = [1 3 5] \n", "B = [0. 2. 4.] \n" ] } ], "source": [ "A = np.asarray(list1)\n", "B = np.asarray(list2)\n", "print('A = ', A, type(A))\n", "print('B = ', B, type(B))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We could have also created a NumPy array directly." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:08.917838Z", "start_time": "2021-05-06T06:24:08.909362Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A = [1 3 5] \n" ] } ], "source": [ "A = np.array([1, 3, 5])\n", "print('A = ', A, type(A))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At first glance, the arrays we created look like lists when printed, but they are in fact very different.\n", "The type of objects collected in an array is constrained to one specific type. We can discover the data type associated with an array from the `dtype` array attribute." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:08.929590Z", "start_time": "2021-05-06T06:24:08.920244Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Type stored in A: [1 3 5] int64\n", "Type stored in B: [0. 2. 4.] 
float64\n" ] } ], "source": [ "print('Type stored in A: ', A, A.dtype)\n", "print('Type stored in B: ', B, B.dtype)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**Note:** Arrays are mutable, ordered sequences in which all stored objects\n", " need to have the same type.\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**Advanced:** The Python standard library also offers a type of collection\n", " called array. These arrays are similar to the arrays from the NumPy module,\n", " with respect to the data type restriction, but they are handled\n", " differently. You can use the array module with `import array`. Do not\n", " confuse arrays of the type `numpy.ndarray` with those of the type\n", " `array.array`.\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because of the restriction to only one type, lists and other sequences can only\n", "be converted to NumPy arrays straight away if they fulfill this criterion. Otherwise the objects they contain are transformed into a common basic type if possible." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:08.947145Z", "start_time": "2021-05-06T06:24:08.932010Z" } }, "outputs": [ { "data": { "text/plain": [ "array(['2', 'True', 'x'], dtype='\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[0;32m 6\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mA\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;36m2\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m)\u001b[0m \u001b[1;31m# print every second character\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 7\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mA\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m-\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m)\u001b[0m \u001b[1;31m# print every character in reversed order\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 8\u001b[1;33m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mA\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m10\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m)\u001b[0m \u001b[1;31m# raises IndexError\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;31mIndexError\u001b[0m: index 10 is out of bounds for axis 0 with size 3" ] } ], "source": [ "print(A) # print the whole string\n", "print(A[0]) # print the first character of the string\n", "print(A[-1]) # print the last character of the string\n", "print(A[:2]) # print the first two characters of the string\n", "print(A[1:-1]) # print the second to the last character of the string\n", "print(A[::2]) # 
print every second character\n", "print(A[::-1]) # print every character in reversed order\n", "print(A[10]) # raises IndexError" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The indexing capabilities of arrays go even further. For example, we can use another sequence (e.g. another array or list)\n", "to specify the indices of the elements we want to get." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:19.774967Z", "start_time": "2021-05-06T06:24:19.769032Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[3 7]\n" ] } ], "source": [ "# Using a sequence of indices to index an array\n", "A = np.array([1, 3, 5, 7]) # Array of elements\n", "B = [1, 3] # List of indices\n", "print(A[B]) # Get all elements in A with indices given in B" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This does not work for lists, for example." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:20.516082Z", "start_time": "2021-05-06T06:24:20.504970Z" } }, "outputs": [ { "ename": "TypeError", "evalue": "list indices must be integers or slices, not list", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[0;32m 2\u001b[0m \u001b[0mA\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;33m[\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;36m3\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;36m5\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;36m7\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;31m# List of elements\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 3\u001b[0m \u001b[0mB\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;33m[\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1;33m,\u001b[0m 
\u001b[1;36m3\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;31m# List of indices\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 4\u001b[1;33m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mA\u001b[0m\u001b[1;33m[\u001b[0m\u001b[0mB\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m)\u001b[0m \u001b[1;31m# Can't get elements in A with indices given in B\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;31mTypeError\u001b[0m: list indices must be integers or slices, not list" ] } ], "source": [ "# Trying to use a sequence of indices to index a list\n", "A = [1, 3, 5, 7] # List of elements\n", "B = [1, 3] # List of indices\n", "print(A[B]) # Can't get elements in A with indices given in B" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Also very useful, we can use a sequence of objects of the boolean type (`True` and `False`)\n", "with the same length as the array we want to index to get only specific values." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:20.854295Z", "start_time": "2021-05-06T06:24:20.848088Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 3 7]\n" ] } ], "source": [ "# Using a boolean sequence of indices to index an array\n", "A = np.asarray([1, 3, 5, 7])\n", "B = [True, True, False, True]\n", "# A and B must be of same length\n", "print(A[B])" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:21.036285Z", "start_time": "2021-05-06T06:24:21.024870Z" } }, "outputs": [ { "ename": "TypeError", "evalue": "list indices must be integers or slices, not list", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in 
\u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[0;32m 2\u001b[0m \u001b[0mA\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;33m[\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;36m3\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;36m5\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;36m7\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 3\u001b[0m \u001b[0mB\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;33m[\u001b[0m\u001b[1;32mTrue\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;32mTrue\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;32mFalse\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;32mTrue\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 4\u001b[1;33m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mA\u001b[0m\u001b[1;33m[\u001b[0m\u001b[0mB\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;31mTypeError\u001b[0m: list indices must be integers or slices, not list" ] } ], "source": [ "# Trying to use a boolean sequence of indices to index a list\n", "A = [1, 3, 5, 7]\n", "B = [True, True, False, True]\n", "print(A[B])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This extends to testing an array against a condition to get a boolean sequence for indexing (see also [Boolean expressions and np.where() to find them](#sec:Boolean_expr))." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:21.662038Z", "start_time": "2021-05-06T06:24:21.656285Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[3 5 7]\n" ] } ], "source": [ "A = np.asarray([1, 3, 5, 7])\n", "print(A[A > 1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One of the most striking advantages of arrays over other sequences, is that we can perform\n", "certain operations on all of their elements at the same time. 
We can add, subtract, multiply or divide an array and a scalar. The operation is performed element-wise." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:22.060039Z", "start_time": "2021-05-06T06:24:22.049941Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A + 5.0 = [ 6. 8. 10. 12.]\n", "A - 5.0 = [-4. -2. 0. 2.]\n", "A * 5.0 = [ 5. 15. 25. 35.]\n", "A / 5.0 = [0.2 0.6 1. 1.4]\n" ] } ], "source": [ "A = np.asarray([1, 3, 5, 7])\n", "print('A + 5.0 = ', A + 5.0)\n", "print('A - 5.0 = ', A - 5.0)\n", "print('A * 5.0 = ', A * 5.0)\n", "print('A / 5.0 = ', A / 5.0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also add, subtract, multiply or divide arrays of the same shape. Again, the operation is performed element-wise." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:22.821734Z", "start_time": "2021-05-06T06:24:22.810819Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A + B = [ 1. 5. 9. 15.5]\n", "A - B = [ 1. 1. 1. -1.5]\n", "A * B = [ 0. 6. 20. 59.5]\n", "A / B = [ inf 1.5 1.25 0.82352941]\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ ":6: RuntimeWarning: divide by zero encountered in true_divide\n", " print('A / B = ', A / B)\n" ] } ], "source": [ "A = np.asarray([1, 3, 5, 7])\n", "B = np.asarray([0, 2.0, 4.0, 8.5])\n", "print('A + B = ', A + B)\n", "print('A - B = ', A - B)\n", "print('A * B = ', A * B)\n", "print('A / B = ', A / B)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The division by zero in the last line causes the warning message (`RuntimeWarning`). Note that a division by zero in plain Python raises a `ZeroDivisionError`. NumPy instead returns a special floating-point value that represents infinity (`np.inf`)." 
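] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the infinite entries are expected, the warning can be silenced for a single block. The cell below is a minimal sketch, assuming the arrays `A` and `B` from above and the purely illustrative name `ratio`; it uses `np.errstate` to ignore the division warning and `np.isinf` to locate the infinite entries." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "A = np.asarray([1, 3, 5, 7])\n", "B = np.asarray([0, 2.0, 4.0, 8.5])\n", "# Illustration only: ignore the divide-by-zero warning inside this block\n", "with np.errstate(divide='ignore'):\n", "    ratio = A / B\n", "print('np.isinf(ratio) = ', np.isinf(ratio))  # marks the infinite entries\n", "print('ratio[0] == np.inf: ', ratio[0] == np.inf)" 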
] }, { "cell_type": "code", "execution_count": 17, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:23.139584Z", "start_time": "2021-05-06T06:24:23.131913Z" } }, "outputs": [ { "ename": "ZeroDivisionError", "evalue": "division by zero", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mZeroDivisionError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[1;36m1\u001b[0m \u001b[1;33m/\u001b[0m \u001b[1;36m0\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;31mZeroDivisionError\u001b[0m: division by zero" ] } ], "source": [ "1 / 0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The operations above can be translated to vector operations in linear algebra. Let's define $\\mathbf{a}$ and $\\mathbf{b}$ as the vectors corresponding to the NumPy arrays `A` and `B`.\n", "\n", "\\begin{equation*}\n", "\\mathbf{a} = \\begin{pmatrix} 1\\\\ 3\\\\ 5\\\\ 7 \\end{pmatrix},~~~~~~~\n", "\\mathbf{b} = \\begin{pmatrix} 0\\\\ 2.0\\\\ 4.0\\\\ 8.5 \\end{pmatrix}\n", "\\end{equation*}\n", "\n", "Let $\\mathbf{e} = \\mathbf{1}$ be a vector of all ones and $\\mathbf{I}$ the identity matrix.\n", "\n", "\\begin{equation*}\n", "\\mathbf{e} = \\begin{pmatrix} 1\\\\ 1\\\\ 1\\\\ 1 \\end{pmatrix},~~~~~~~\n", "\\mathbf{I} = \\begin{pmatrix} 1& 0& 0& 0\\\\ 0& 1& 0& 0\\\\ 0& 0& 1& 0\\\\ 0& 0& 0& 1 \\end{pmatrix}\n", "\\end{equation*}\n", "\n", "Then, the NumPy operations have the following algebraic expressions:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`A + B` $\\rightarrow$ $\\mathbf{a} + \\mathbf{b}$\n", "\n", "`A - B` $\\rightarrow$ $\\mathbf{a} - \\mathbf{b}$\n", "\n", "`A * 5` $\\rightarrow$ $5 \\mathbf{a}$\n", "\n", "`A / 5` $\\rightarrow$ $\\frac{1}{5} \\mathbf{a}$\n", "\n", "`A + 5` $\\rightarrow$ $\\mathbf{a} + 
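5\\cdot\\mathbf{e}$\n", "\n", "`A - 5` $\\rightarrow$ $\\mathbf{a} - 5\\cdot\\mathbf{e}$\n", "\n", "`A * B` $\\rightarrow$ $\\mathbf{a} \\circ \\mathbf{b} = \\mathrm{diag}(\\mathbf{b}) \\cdot \\mathbf{a}$\n", "\n", "`A / B` $\\rightarrow$ $\\mathbf{a} \\oslash \\mathbf{b} = \\mathrm{diag}(\\mathbf{b})^{-1} \\cdot \\mathbf{a}$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here $\\circ$ and $\\oslash$ denote the element-wise (Hadamard) product and division, and $\\mathrm{diag}(\\mathbf{b})$ is the diagonal matrix with the entries of $\\mathbf{b}$ on its diagonal. Its inverse $\\mathrm{diag}(\\mathbf{b})^{-1}$ exists only if no entry of $\\mathbf{b}$ is zero." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "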
\n", "\n", "**Note:** Be careful: `A * B` is not the scalar product $\\mathbf{a} \\cdot \\mathbf{b}$. You get the scalar product with `np.dot(A, B)` or `A @ B`.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Arrays come with many handy methods like `sum()`, `mean()`, `std()` etc. Here are some examples." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**Note:** Remember that you can get an overview of all these methods by using\n", " `dir(object)`. You can also explore them by typing `A.` and then pressing\n", " Tab to use the interactive help in a Jupyter notebook.\n", "
" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:24.306038Z", "start_time": "2021-05-06T06:24:24.299000Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A.sum() = 16\n", "A.max() = 7\n", "A.argmax() = 3\n" ] } ], "source": [ "print(\"A.sum() = \", A.sum())\n", "# The largest Entry of the Array\n", "print(\"A.max() = \", A.max())\n", "# The index of the largest entry\n", "print(\"A.argmax() = \", A.argmax())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Arrays can be multidimensional. This is how you represent matrices in NumPy. The dimensionality of an array can be viewed with the `shape` property. We are going to make a 2D-array `C`, which will be the array made from a list of lists. " ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:24.645082Z", "start_time": "2021-05-06T06:24:24.632288Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "C = \n", "[[1. 3. 5.]\n", " [0. 2. 4.]]\n", "\n", "\n", "C.shape (rows, columns) = (2, 3)\n", "\n", "\n", "C[0] = [1. 3. 5.]\n", "C[1] = [0. 2. 4.]\n", "C[:, 0] = [1. 0.]\n" ] } ], "source": [ "list1 = [1, 3, 5]\n", "list2 = [0.0, 2.0, 4.0]\n", "\n", "C = np.array([\n", " list1,\n", " list2,\n", " ])\n", "\n", "print(\"C = \")\n", "print(C)\n", "print(\"\\n\")\n", "print('C.shape (rows, columns) = ', C.shape)\n", "print(\"\\n\")\n", "print('C[0] = ', C[0])\n", "print('C[1] = ', C[1])\n", "print('C[:, 0] = ', C[:, 0]) # [rowindex, columnindex]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that this is fundamentally different to having a nested list of lists. 
" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:24.973263Z", "start_time": "2021-05-06T06:24:24.963422Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "nested_list[0][0] (nested index) => 1\n" ] }, { "ename": "TypeError", "evalue": "list indices must be integers or slices, not tuple", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[0;32m 5\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 6\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"nested_list[0][0] (nested index) =>\"\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mnested_list\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m0\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m0\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 7\u001b[1;33m \u001b[0mnested_list\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;36m0\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;31m# raises TypeError\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;31mTypeError\u001b[0m: list indices must be integers or slices, not tuple" ] } ], "source": [ "nested_list = [\n", " list1,\n", " list2,\n", " ]\n", "\n", "print(\"nested_list[0][0] (nested index) =>\", nested_list[0][0])\n", "nested_list[:, 0] # raises TypeError" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you depend on fast row- or column-wise indexing, e.g. when working with matrices,\n", "NumPy arrays can make your life much easier. 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Boolean expressions and np.where() to find them" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another essential function that deserves extra mentioning is `np.where()`. In the most simple case, it takes a boolean sequence, i.e. a sequence of `True` and `False`, as an input and returns the indices where the sequence is *true*." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:25.665788Z", "start_time": "2021-05-06T06:24:25.660079Z" }, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on function where in module numpy:\n", "\n", "where(...)\n", " where(condition, [x, y])\n", " \n", " Return elements chosen from `x` or `y` depending on `condition`.\n", " \n", " .. note::\n", " When only `condition` is provided, this function is a shorthand for\n", " ``np.asarray(condition).nonzero()``. Using `nonzero` directly should be\n", " preferred, as it behaves correctly for subclasses. The rest of this\n", " documentation covers only the case where all three arguments are\n", " provided.\n", " \n", " Parameters\n", " ----------\n", " condition : array_like, bool\n", " Where True, yield `x`, otherwise yield `y`.\n", " x, y : array_like\n", " Values from which to choose. 
`x`, `y` and `condition` need to be\n", " broadcastable to some shape.\n", " \n", " Returns\n", " -------\n", " out : ndarray\n", " An array with elements from `x` where `condition` is True, and elements\n", " from `y` elsewhere.\n", " \n", " See Also\n", " --------\n", " choose\n", " nonzero : The function that is called when x and y are omitted\n", " \n", " Notes\n", " -----\n", " If all the arrays are 1-D, `where` is equivalent to::\n", " \n", " [xv if c else yv\n", " for c, xv, yv in zip(condition, x, y)]\n", " \n", " Examples\n", " --------\n", " >>> a = np.arange(10)\n", " >>> a\n", " array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])\n", " >>> np.where(a < 5, a, 10*a)\n", " array([ 0, 1, 2, 3, 4, 50, 60, 70, 80, 90])\n", " \n", " This can be used on multidimensional arrays too:\n", " \n", " >>> np.where([[True, False], [True, True]],\n", " ... [[1, 2], [3, 4]],\n", " ... [[9, 8], [7, 6]])\n", " array([[1, 8],\n", " [3, 4]])\n", " \n", " The shapes of x, y, and the condition are broadcast together:\n", " \n", " >>> x, y = np.ogrid[:3, :4]\n", " >>> np.where(x < y, x, 10 + y) # both x and 10+y are broadcast\n", " array([[10, 0, 0, 0],\n", " [10, 11, 1, 1],\n", " [10, 11, 12, 2]])\n", " \n", " >>> a = np.array([[0, 1, 2],\n", " ... [0, 2, 4],\n", " ... [0, 3, 6]])\n", " >>> np.where(a < 4, a, -1) # -1 is broadcast\n", " array([[ 0, 1, 2],\n", " [ 0, 2, -1],\n", " [ 0, 3, -1]])\n", "\n" ] } ], "source": [ "help(np.where)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:25.830762Z", "start_time": "2021-05-06T06:24:25.824566Z" } }, "outputs": [ { "data": { "text/plain": [ "(array([0, 2]),)" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.where([True, False, True]) # returns a tuple with one index array per dimension" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is most often used when the boolean sequence is derived from a boolean expression involving an array. 
" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:26.180571Z", "start_time": "2021-05-06T06:24:26.173923Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[False True False False]\n", "(array([1]),)\n" ] } ], "source": [ "A = np.array([1, 3, 5, 7])\n", "print(A == 3)\n", "print(np.where(A == 3)) # returns (x-indices, y-indices)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is of course also working in two dimensions. Lets see where `C` is greater than or equal to 3." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:26.552670Z", "start_time": "2021-05-06T06:24:26.545732Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1. 3. 5.]\n", " [0. 2. 4.]]\n" ] } ], "source": [ "list1 = [1, 3, 5]\n", "list2 = [0.0, 2.0, 4.0]\n", "\n", "C = np.array([\n", " list1,\n", " list2,\n", " ])\n", "\n", "print(C)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This can be also expressed as the corresponding matrix:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\\begin{equation*}\n", "\\mathbf{C} = \\begin{pmatrix} 1& 3& 5\\\\ 0& 2& 4 \\end{pmatrix}\n", "\\end{equation*}" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:27.098310Z", "start_time": "2021-05-06T06:24:27.092815Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(array([0, 0, 1]), array([1, 2, 2]))\n" ] } ], "source": [ "print(np.where(C >= 3)) # returns (x-indices, y-indices)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result are two arrays with three indices each. We get *two arrays* because `C` is *two-dimensional*. Each array has *three entries* because in total there are *three hits*. \n", "This means that the statement is `True` for the indices `C[0, 1]`, `C[0, 2]` and `C[1, 2]`. 
Let's check:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:27.473754Z", "start_time": "2021-05-06T06:24:27.466788Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "C[0, 1] = 3.0\n", "C[0, 2] = 5.0\n", "C[1, 2] = 4.0\n" ] } ], "source": [ "# All greater than or equal to three?\n", "print(\"C[0, 1] =\", C[0, 1])\n", "print(\"C[0, 2] =\", C[0, 2])\n", "print(\"C[1, 2] =\", C[1, 2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since indices in Python are zero-based but one-based in linear algebra, we need to shift the indices by one to get the corresponding matrix elements." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$\\mathbf{C}_{12} = 3$\n", "\n", "$\\mathbf{C}_{13} = 5$\n", "\n", "$\\mathbf{C}_{23} = 4$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Advanced" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Optionally, we can pass two additional sequences to `np.where`, from which values are returned\n", "instead of just the indices of the *true* entries." 
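] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a minimal sketch of this three-argument form on numeric data (the array `A` and the replacement value `0` are chosen only for illustration), `np.where(condition, x, y)` takes values from `x` where the condition is `True` and from `y` elsewhere. A scalar such as `0` is broadcast to the shape of the condition." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "A = np.array([1, 3, 5, 7])\n", "# Keep the entries greater than 3 and replace all others with 0\n", "print(np.where(A > 3, A, 0))" 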
] }, { "cell_type": "code", "execution_count": 27, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:28.408851Z", "start_time": "2021-05-06T06:24:28.402862Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['yes' 'no' 'yes']\n" ] } ], "source": [ "print(np.where([True, False, True], [\"yes\", \"yes\", \"yes\"], [\"no\", \"no\", \"no\"]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or simply:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:28.762355Z", "start_time": "2021-05-06T06:24:28.756712Z" } }, "outputs": [ { "data": { "text/plain": [ "array(['yes', 'no', 'yes'], dtype='\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[1;31m# C.T = C transposed\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m \u001b[0mC\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mT\u001b[0m \u001b[1;33m-\u001b[0m \u001b[0mA\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;31mValueError\u001b[0m: operands could not be broadcast together with shapes (3,2) (3,) " ] } ], "source": [ "# C.T = C transposed\n", "C.T - A" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:40.227409Z", "start_time": "2021-05-06T06:24:40.221750Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 4]\n", " [2 5]\n", " [3 6]] (3, 2)\n" ] } ], "source": [ "print(C.T, np.shape(C.T))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are somehow trying to combine apples and oranges here, but NumPy provides a fix. `np.newaxis` can be used to extend an array in a dimension. In this case we would like A to be of shape `(3, 1)`, a one dimensional array in two dimensions so to say. 
Let's try:" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:40.987863Z", "start_time": "2021-05-06T06:24:40.977212Z" } }, "outputs": [ { "ename": "ValueError", "evalue": "operands could not be broadcast together with shapes (3,2) (1,3) ", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mC\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mT\u001b[0m \u001b[1;33m-\u001b[0m \u001b[0mA\u001b[0m\u001b[1;33m[\u001b[0m\u001b[0mnp\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mnewaxis\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;31mValueError\u001b[0m: operands could not be broadcast together with shapes (3,2) (1,3) " ] } ], "source": [ "C.T - A[np.newaxis]" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:41.244279Z", "start_time": "2021-05-06T06:24:41.238414Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 2 3]] (1, 3)\n" ] } ], "source": [ "print(A[np.newaxis], np.shape(A[np.newaxis]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This failed too, because the new axis (or new dimension) is in the wrong place! We want to keep the first dimension and append the new one after it. 
" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:42.668753Z", "start_time": "2021-05-06T06:24:42.662320Z" } }, "outputs": [ { "data": { "text/plain": [ "array([[0, 3],\n", " [0, 3],\n", " [0, 3]])" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "C.T - A[:, np.newaxis]" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "ExecuteTime": { "end_time": "2021-05-06T06:24:42.894534Z", "start_time": "2021-05-06T06:24:42.888647Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1]\n", " [2]\n", " [3]] (3, 1)\n" ] } ], "source": [ "print(A[:, np.newaxis], np.shape(A[:, np.newaxis]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, our operation was succesfull." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**Note:** Since `C` is a 2D array, we can think of it as a matrix. `C.T` is the transpose of this matrix. In higher dimensions, `array.T` reverses the order of all dimensions. \n", "
" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }