{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "TD06UcxH2Aap"
   },
   "source": [
    "# 8. Operating on DataFrames\n",
    "\n",
    "We have seen in the very first chapter that we could easily import CSV or Excel sheets as DataFrames in Python. We have also seen that those dataframes are essentially two-dimensional tables where each element can be located via an index and a column name. We have also seen that each column is in fact a Numpy array and have learned how to handle and process those arrays.\n",
    "\n",
    "Here we will show how most of the concepts we learned for Numpy are directly translateable into Pandas."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "executionInfo": {
     "elapsed": 722,
     "status": "ok",
     "timestamp": 1614349113268,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "PppCId1k23Bb",
    "tags": []
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ohXAnizD235F"
   },
   "source": [
    "We here again use our composer data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "executionInfo": {
     "elapsed": 2000,
     "status": "ok",
     "timestamp": 1614349123619,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "IE2lYZGo3ywH",
    "tags": []
   },
   "outputs": [],
   "source": [
    "composers = pd.read_excel('https://github.com/guiwitz/NumpyPandas_course/blob/master/Data/composers.xlsx?raw=true')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 173
    },
    "executionInfo": {
     "elapsed": 1604,
     "status": "ok",
     "timestamp": 1614349123620,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "M-T0Ua8J31xZ",
    "outputId": "1f850085-1c6f-4c7c-a42b-bf1d1cfbbefe",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>composer</th>\n",
       "      <th>birth</th>\n",
       "      <th>death</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Mahler</td>\n",
       "      <td>1860</td>\n",
       "      <td>1911</td>\n",
       "      <td>Kaliste</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Beethoven</td>\n",
       "      <td>1770</td>\n",
       "      <td>1827</td>\n",
       "      <td>Bonn</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Puccini</td>\n",
       "      <td>1858</td>\n",
       "      <td>1924</td>\n",
       "      <td>Lucques</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Shostakovich</td>\n",
       "      <td>1906</td>\n",
       "      <td>1975</td>\n",
       "      <td>Saint-Petersburg</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       composer  birth  death              city\n",
       "0        Mahler   1860   1911           Kaliste\n",
       "1     Beethoven   1770   1827              Bonn\n",
       "2       Puccini   1858   1924           Lucques\n",
       "3  Shostakovich   1906   1975  Saint-Petersburg"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "composers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "sHx_oGrs37bT"
   },
   "source": [
    "Just like with Numpy arrays, we can apply various mathematical operations on DataFrames columns or entire DataFrames. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HbHCGJNEtO3W"
   },
   "source": [
    "## Arithmetics single columns\n",
    "Just like we could do an operation like ```my_array = my_array + 1```, we can do the same with Pandas columns. Doing such operations, we directly reaplace an existing column or create new one like here:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 173
    },
    "executionInfo": {
     "elapsed": 466,
     "status": "ok",
     "timestamp": 1614349212760,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "sMZmTQ894M4M",
    "outputId": "1748c398-4385-4aed-970f-e17a2825bc48",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>composer</th>\n",
       "      <th>birth</th>\n",
       "      <th>death</th>\n",
       "      <th>city</th>\n",
       "      <th>death_100</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Mahler</td>\n",
       "      <td>1860</td>\n",
       "      <td>1911</td>\n",
       "      <td>Kaliste</td>\n",
       "      <td>2011</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Beethoven</td>\n",
       "      <td>1770</td>\n",
       "      <td>1827</td>\n",
       "      <td>Bonn</td>\n",
       "      <td>1927</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Puccini</td>\n",
       "      <td>1858</td>\n",
       "      <td>1924</td>\n",
       "      <td>Lucques</td>\n",
       "      <td>2024</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Shostakovich</td>\n",
       "      <td>1906</td>\n",
       "      <td>1975</td>\n",
       "      <td>Saint-Petersburg</td>\n",
       "      <td>2075</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       composer  birth  death              city  death_100\n",
       "0        Mahler   1860   1911           Kaliste       2011\n",
       "1     Beethoven   1770   1827              Bonn       1927\n",
       "2       Puccini   1858   1924           Lucques       2024\n",
       "3  Shostakovich   1906   1975  Saint-Petersburg       2075"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "composers['death_100'] = composers['death'] + 100\n",
    "composers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "0Ra395dx4Qb0"
   },
   "source": [
    "### Arithmetics multiple columns\n",
    "\n",
    "We can also do maths with multiple columns as we have previously done with arrays. Just like with arrays, **operations are performed element-wise**. We can for example compute the composers' age at death:\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 544,
     "status": "ok",
     "timestamp": 1614349226184,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "ijOU4ik34qoB",
    "outputId": "b32ef230-f677-4b13-9257-2cd98e2882d0",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    51\n",
       "1    57\n",
       "2    66\n",
       "3    69\n",
       "dtype: int64"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "composers['death']-composers['birth']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "WfEsNtkCZVZd"
   },
   "source": [
    "Note that the output here is a series with the correct length, so we can simply add it to our original table directly!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 173
    },
    "executionInfo": {
     "elapsed": 531,
     "status": "ok",
     "timestamp": 1614349231334,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "Ie2iIPfNZfCf",
    "outputId": "d6b800ed-b07f-4a45-8c7b-13ae768f453d",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>composer</th>\n",
       "      <th>birth</th>\n",
       "      <th>death</th>\n",
       "      <th>city</th>\n",
       "      <th>death_100</th>\n",
       "      <th>age</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Mahler</td>\n",
       "      <td>1860</td>\n",
       "      <td>1911</td>\n",
       "      <td>Kaliste</td>\n",
       "      <td>2011</td>\n",
       "      <td>51</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Beethoven</td>\n",
       "      <td>1770</td>\n",
       "      <td>1827</td>\n",
       "      <td>Bonn</td>\n",
       "      <td>1927</td>\n",
       "      <td>57</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Puccini</td>\n",
       "      <td>1858</td>\n",
       "      <td>1924</td>\n",
       "      <td>Lucques</td>\n",
       "      <td>2024</td>\n",
       "      <td>66</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Shostakovich</td>\n",
       "      <td>1906</td>\n",
       "      <td>1975</td>\n",
       "      <td>Saint-Petersburg</td>\n",
       "      <td>2075</td>\n",
       "      <td>69</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       composer  birth  death              city  death_100  age\n",
       "0        Mahler   1860   1911           Kaliste       2011   51\n",
       "1     Beethoven   1770   1827              Bonn       1927   57\n",
       "2       Puccini   1858   1924           Lucques       2024   66\n",
       "3  Shostakovich   1906   1975  Saint-Petersburg       2075   69"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "composers['age'] = composers['death']-composers['birth']\n",
    "composers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ZmvIVmKj4tjy"
   },
   "source": [
    "## Applying functions\n",
    "\n",
    "Again, just like we could previously apply a function to an array and recover an array of the same size where the function had been applied **element-wise** we can do that same with a DataFrame column and recover a column. And we can simply use Numpy functions!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 497,
     "status": "ok",
     "timestamp": 1614349243384,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "3SPc0aEJ58QH",
    "outputId": "8d71ab22-75b7-4196-c4e6-72ac9fe345be",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    3.269513\n",
       "1    3.247973\n",
       "2    3.269046\n",
       "3    3.280123\n",
       "Name: birth, dtype: float64"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.log10(composers['birth'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "GtTVjZyRts_w"
   },
   "source": [
    "## Applying array wide\n",
    "\n",
    "Not that you should be careful if you want to apply a function to more than one column. For example if we want to apply the above operation to the entire array:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 249
    },
    "executionInfo": {
     "elapsed": 543,
     "status": "error",
     "timestamp": 1614349303728,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "zyvq9zDzt2Jb",
    "outputId": "79d527ec-466d-4741-87d6-4f1f3e326c4e",
    "tags": []
   },
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "loop of ufunc does not support argument 0 of type str which has no callable log method",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mAttributeError\u001b[0m                            Traceback (most recent call last)",
      "\u001b[0;31mAttributeError\u001b[0m: 'str' object has no attribute 'log'",
      "\nThe above exception was the direct cause of the following exception:\n",
      "\u001b[0;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "Cell \u001b[0;32mIn[8], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mnp\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mlog\u001b[49m\u001b[43m(\u001b[49m\u001b[43mcomposers\u001b[49m\u001b[43m)\u001b[49m\n",
      "File \u001b[0;32m~/mambaforge/envs/DAVPy2023/lib/python3.10/site-packages/pandas/core/generic.py:2113\u001b[0m, in \u001b[0;36mNDFrame.__array_ufunc__\u001b[0;34m(self, ufunc, method, *inputs, **kwargs)\u001b[0m\n\u001b[1;32m   2109\u001b[0m \u001b[38;5;129m@final\u001b[39m\n\u001b[1;32m   2110\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m__array_ufunc__\u001b[39m(\n\u001b[1;32m   2111\u001b[0m     \u001b[38;5;28mself\u001b[39m, ufunc: np\u001b[38;5;241m.\u001b[39mufunc, method: \u001b[38;5;28mstr\u001b[39m, \u001b[38;5;241m*\u001b[39minputs: Any, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs: Any\n\u001b[1;32m   2112\u001b[0m ):\n\u001b[0;32m-> 2113\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43marraylike\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43marray_ufunc\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mufunc\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mmethod\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43minputs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n",
      "File \u001b[0;32m~/mambaforge/envs/DAVPy2023/lib/python3.10/site-packages/pandas/core/arraylike.py:410\u001b[0m, in \u001b[0;36marray_ufunc\u001b[0;34m(self, ufunc, method, *inputs, **kwargs)\u001b[0m\n\u001b[1;32m    405\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m method \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m__call__\u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m kwargs:\n\u001b[1;32m    406\u001b[0m     \u001b[38;5;66;03m# for np.<ufunc>(..) calls\u001b[39;00m\n\u001b[1;32m    407\u001b[0m     \u001b[38;5;66;03m# kwargs cannot necessarily be handled block-by-block, so only\u001b[39;00m\n\u001b[1;32m    408\u001b[0m     \u001b[38;5;66;03m# take this path if there are no kwargs\u001b[39;00m\n\u001b[1;32m    409\u001b[0m     mgr \u001b[38;5;241m=\u001b[39m inputs[\u001b[38;5;241m0\u001b[39m]\u001b[38;5;241m.\u001b[39m_mgr\n\u001b[0;32m--> 410\u001b[0m     result \u001b[38;5;241m=\u001b[39m \u001b[43mmgr\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mapply\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mgetattr\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mufunc\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mmethod\u001b[49m\u001b[43m)\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m    411\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m    412\u001b[0m     \u001b[38;5;66;03m# otherwise specific ufunc methods (eg np.<ufunc>.accumulate(..))\u001b[39;00m\n\u001b[1;32m    413\u001b[0m     \u001b[38;5;66;03m# Those can have an axis keyword and thus can't be called block-by-block\u001b[39;00m\n\u001b[1;32m    414\u001b[0m     result \u001b[38;5;241m=\u001b[39m default_array_ufunc(inputs[\u001b[38;5;241m0\u001b[39m], ufunc, method, \u001b[38;5;241m*\u001b[39minputs, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs)\n",
      "File \u001b[0;32m~/mambaforge/envs/DAVPy2023/lib/python3.10/site-packages/pandas/core/internals/managers.py:350\u001b[0m, in \u001b[0;36mBaseBlockManager.apply\u001b[0;34m(self, f, align_keys, ignore_failures, **kwargs)\u001b[0m\n\u001b[1;32m    348\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m    349\u001b[0m     \u001b[38;5;28;01mif\u001b[39;00m callable(f):\n\u001b[0;32m--> 350\u001b[0m         applied \u001b[38;5;241m=\u001b[39m \u001b[43mb\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mapply\u001b[49m\u001b[43m(\u001b[49m\u001b[43mf\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m    351\u001b[0m     \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m    352\u001b[0m         applied \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mgetattr\u001b[39m(b, f)(\u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs)\n",
      "File \u001b[0;32m~/mambaforge/envs/DAVPy2023/lib/python3.10/site-packages/pandas/core/internals/blocks.py:351\u001b[0m, in \u001b[0;36mBlock.apply\u001b[0;34m(self, func, **kwargs)\u001b[0m\n\u001b[1;32m    345\u001b[0m \u001b[38;5;129m@final\u001b[39m\n\u001b[1;32m    346\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mapply\u001b[39m(\u001b[38;5;28mself\u001b[39m, func, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m \u001b[38;5;28mlist\u001b[39m[Block]:\n\u001b[1;32m    347\u001b[0m \u001b[38;5;250m    \u001b[39m\u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[1;32m    348\u001b[0m \u001b[38;5;124;03m    apply the function to my values; return a block if we are not\u001b[39;00m\n\u001b[1;32m    349\u001b[0m \u001b[38;5;124;03m    one\u001b[39;00m\n\u001b[1;32m    350\u001b[0m \u001b[38;5;124;03m    \"\"\"\u001b[39;00m\n\u001b[0;32m--> 351\u001b[0m     result \u001b[38;5;241m=\u001b[39m \u001b[43mfunc\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mvalues\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m    353\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_split_op_result(result)\n",
      "\u001b[0;31mTypeError\u001b[0m: loop of ufunc does not support argument 0 of type str which has no callable log method"
     ]
    }
   ],
   "source": [
    "np.log(composers)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "83cz99D6t4Ox"
   },
   "source": [
    "We get an error because some columns are **not numerical** and therefore Pandas doesn't know what to do. If we only operate on numerical columns this however works:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 173
    },
    "executionInfo": {
     "elapsed": 583,
     "status": "ok",
     "timestamp": 1614349363686,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "W0XgXednuByx",
    "outputId": "6547dd41-e2f8-4a73-80a8-121e04f29761",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>birth</th>\n",
       "      <th>death</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1860</td>\n",
       "      <td>1911</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1770</td>\n",
       "      <td>1827</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1858</td>\n",
       "      <td>1924</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1906</td>\n",
       "      <td>1975</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   birth  death\n",
       "0   1860   1911\n",
       "1   1770   1827\n",
       "2   1858   1924\n",
       "3   1906   1975"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sub_df = composers[['birth', 'death']]\n",
    "sub_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 173
    },
    "executionInfo": {
     "elapsed": 534,
     "status": "ok",
     "timestamp": 1614349372379,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "NAwWKMq5uGwa",
    "outputId": "f288b080-56ab-4864-e4f7-06ee325fd46e",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>birth</th>\n",
       "      <th>death</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3.269513</td>\n",
       "      <td>3.281261</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>3.247973</td>\n",
       "      <td>3.261739</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3.269046</td>\n",
       "      <td>3.284205</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3.280123</td>\n",
       "      <td>3.295567</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      birth     death\n",
       "0  3.269513  3.281261\n",
       "1  3.247973  3.261739\n",
       "2  3.269046  3.284205\n",
       "3  3.280123  3.295567"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.log10(sub_df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "cmW_C6DtZvqA"
   },
   "source": [
    "## The ```apply``` function\n",
    "\n",
    "There is also a more general way of applying **custom functions** using the ```apply()``` function. There are many ways of using this function: on the full DataFrame, on single columns, with a single output value, a series etc. Here we look at the simplest case, where we apply a given function to each element of a given column, thus generating a new *modified* column.\n",
    "\n",
    "First we need to define a custom function. Here we want a function to tell us whether a composer is born before or after 1900 by looking at the ```birth``` column. So we define a standard function that takes a year as input and returns True or False:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "executionInfo": {
     "elapsed": 630,
     "status": "ok",
     "timestamp": 1614349600530,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "UKxF4zklbSch",
    "tags": []
   },
   "outputs": [],
   "source": [
    "def age_fun(age):\n",
    "    if age < 1900:\n",
    "        return False\n",
    "    else:\n",
    "        return True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "MAK29LitbbT-"
   },
   "source": [
    "Let's test it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 632,
     "status": "ok",
     "timestamp": 1614349602992,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "7_Dmazwobdvr",
    "outputId": "8193a749-42f6-4533-f083-06231bd4ba7e",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "age_fun(1700)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 648,
     "status": "ok",
     "timestamp": 1614349603293,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "tPFgSrwBbf3E",
    "outputId": "6b824e2e-d1e7-47bc-d5c9-9be7898beeab",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "age_fun(1950)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "B3iXvKscbhmi"
   },
   "source": [
    "Now we simply pass this function to the ```apply``` method on a specific column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 593,
     "status": "ok",
     "timestamp": 1614349605341,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "59R2Gyjubsu6",
    "outputId": "35b78608-71be-42ac-8602-f20ac12d28e5",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    False\n",
       "1    False\n",
       "2    False\n",
       "3     True\n",
       "Name: birth, dtype: bool"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "composers['birth'].apply(age_fun)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "tRcncZvmbxcX"
   },
   "source": [
    "Again we can directly create a new columns with this output:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 173
    },
    "executionInfo": {
     "elapsed": 637,
     "status": "ok",
     "timestamp": 1614349608516,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "5bp_L05sb39A",
    "outputId": "952aefc4-56fe-444f-b6f2-c01c112bfc12",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>composer</th>\n",
       "      <th>birth</th>\n",
       "      <th>death</th>\n",
       "      <th>city</th>\n",
       "      <th>death_100</th>\n",
       "      <th>age</th>\n",
       "      <th>20th century</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Mahler</td>\n",
       "      <td>1860</td>\n",
       "      <td>1911</td>\n",
       "      <td>Kaliste</td>\n",
       "      <td>2011</td>\n",
       "      <td>51</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Beethoven</td>\n",
       "      <td>1770</td>\n",
       "      <td>1827</td>\n",
       "      <td>Bonn</td>\n",
       "      <td>1927</td>\n",
       "      <td>57</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Puccini</td>\n",
       "      <td>1858</td>\n",
       "      <td>1924</td>\n",
       "      <td>Lucques</td>\n",
       "      <td>2024</td>\n",
       "      <td>66</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Shostakovich</td>\n",
       "      <td>1906</td>\n",
       "      <td>1975</td>\n",
       "      <td>Saint-Petersburg</td>\n",
       "      <td>2075</td>\n",
       "      <td>69</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       composer  birth  death              city  death_100  age  20th century\n",
       "0        Mahler   1860   1911           Kaliste       2011   51         False\n",
       "1     Beethoven   1770   1827              Bonn       1927   57         False\n",
       "2       Puccini   1858   1924           Lucques       2024   66         False\n",
       "3  Shostakovich   1906   1975  Saint-Petersburg       2075   69          True"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "composers['20th century'] = composers['birth'].apply(age_fun)\n",
    "composers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "1QIL-5Ff6Ljs"
   },
   "source": [
    "## Summarizing functions\n",
    "Finally, we can summarized columns by using either Numpy function or the methods attached to a Dataframe or a column. For example we can take the mean:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 479,
     "status": "ok",
     "timestamp": 1614349618875,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "fBLc13Vv6fXm",
    "outputId": "c34bf20c-b038-4b23-c980-ae1ee137e736",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1848.5"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.mean(composers['birth'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 829,
     "status": "ok",
     "timestamp": 1614349619785,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "56rd355o6hwx",
    "outputId": "537a25e1-c699-4da9-b2f3-1c7750e0f85f",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1848.5"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "composers['birth'].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "611XOtz_6kJp"
   },
   "source": [
    "## Exercise\n",
    "\n",
    "1. Import the penguin dataset https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/inst/extdata/penguins.csv\n",
    "2. Create a new columns called ```body_mass_kg``` by transforming the ```body_mass_g``` column into kilograms.\n",
    "3. Create a new column containiing the division the ```bill_length_mm``` by the ```bill_depth_mm``` column\n",
    "4. Calculate the median value of the ```body_mass_g``` column both by using a Numpy function and by using the method attached to the column. What do you observe? Can you infer what the problem is?"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "authorship_tag": "ABX9TyOW6mzLgtxOfld0ibM0SUYj",
   "collapsed_sections": [],
   "name": "07-Operate_on_DataFrames.ipynb",
   "provenance": [],
   "toc_visible": true
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}