{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "2KGt0M0jmhmT"
   },
   "source": [
    "# 11. Real-world problems and additional topics\n",
    "\n",
    "During this course we have either used simple demo examples designed to illustrate specific aspects of Pandas, or real-world data where we ignored some of the potentials problems. In this notebook, we address a few of the common problems that you might encounter. We also address some other common situations that you might encounter such as dealing with specific formats like dates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "id": "ZOGIWe8fm-lT",
    "tags": []
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "nO7Di3iDm7Y1"
   },
   "source": [
    "## Absent values\n",
    "\n",
    "Very often, data that you need are imperfect: they migth have missing values or some values might be \"incorrect\", e.g. in a column supposed to contain numbers you suddently encounter text. Pandas offers many ways to fix these problems.\n",
    "\n",
    "Let's once again import our penguin data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "5m76IE2cnadj",
    "tags": []
   },
   "outputs": [],
   "source": [
    "penguins = pd.read_csv('https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/inst/extdata/penguins.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 1565,
     "status": "ok",
     "timestamp": 1614505363769,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "vnTRg-CyoEHg",
    "outputId": "d35e11a0-880c-4758-d9c6-736899f80bb1",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(344, 8)"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "penguins.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 204
    },
    "executionInfo": {
     "elapsed": 2066,
     "status": "ok",
     "timestamp": 1614505364499,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "DCZsRpzbnkoW",
    "outputId": "cdaa2e60-3d24-4f20-f47d-0caade5317ef",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>species</th>\n",
       "      <th>island</th>\n",
       "      <th>bill_length_mm</th>\n",
       "      <th>bill_depth_mm</th>\n",
       "      <th>flipper_length_mm</th>\n",
       "      <th>body_mass_g</th>\n",
       "      <th>sex</th>\n",
       "      <th>year</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.1</td>\n",
       "      <td>18.7</td>\n",
       "      <td>181.0</td>\n",
       "      <td>3750.0</td>\n",
       "      <td>male</td>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.5</td>\n",
       "      <td>17.4</td>\n",
       "      <td>186.0</td>\n",
       "      <td>3800.0</td>\n",
       "      <td>female</td>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>40.3</td>\n",
       "      <td>18.0</td>\n",
       "      <td>195.0</td>\n",
       "      <td>3250.0</td>\n",
       "      <td>female</td>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>36.7</td>\n",
       "      <td>19.3</td>\n",
       "      <td>193.0</td>\n",
       "      <td>3450.0</td>\n",
       "      <td>female</td>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  \\\n",
       "0  Adelie  Torgersen            39.1           18.7              181.0   \n",
       "1  Adelie  Torgersen            39.5           17.4              186.0   \n",
       "2  Adelie  Torgersen            40.3           18.0              195.0   \n",
       "3  Adelie  Torgersen             NaN            NaN                NaN   \n",
       "4  Adelie  Torgersen            36.7           19.3              193.0   \n",
       "\n",
       "   body_mass_g     sex  year  \n",
       "0       3750.0    male  2007  \n",
       "1       3800.0  female  2007  \n",
       "2       3250.0  female  2007  \n",
       "3          NaN     NaN  2007  \n",
       "4       3450.0  female  2007  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "penguins.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HCWF9lrYnncQ"
   },
   "source": [
    "Here we see that some values are missing and filled with ```NaN```. As we have seen previously, Pandas is capable of handling ```NaN``` values e.g. when calculating statistics. But sometimes we want to remove those values. We can for example use ```.dropna()``` for this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "id": "wXxAP6Uvn_IN",
    "tags": []
   },
   "outputs": [],
   "source": [
    "penguins_clean = penguins.dropna()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 204
    },
    "executionInfo": {
     "elapsed": 1431,
     "status": "ok",
     "timestamp": 1614505364963,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "LKYIaY3ooBmV",
    "outputId": "38c24c5e-9242-4683-9cf2-d743bae43186",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>species</th>\n",
       "      <th>island</th>\n",
       "      <th>bill_length_mm</th>\n",
       "      <th>bill_depth_mm</th>\n",
       "      <th>flipper_length_mm</th>\n",
       "      <th>body_mass_g</th>\n",
       "      <th>sex</th>\n",
       "      <th>year</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.1</td>\n",
       "      <td>18.7</td>\n",
       "      <td>181.0</td>\n",
       "      <td>3750.0</td>\n",
       "      <td>male</td>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.5</td>\n",
       "      <td>17.4</td>\n",
       "      <td>186.0</td>\n",
       "      <td>3800.0</td>\n",
       "      <td>female</td>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>40.3</td>\n",
       "      <td>18.0</td>\n",
       "      <td>195.0</td>\n",
       "      <td>3250.0</td>\n",
       "      <td>female</td>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>36.7</td>\n",
       "      <td>19.3</td>\n",
       "      <td>193.0</td>\n",
       "      <td>3450.0</td>\n",
       "      <td>female</td>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.3</td>\n",
       "      <td>20.6</td>\n",
       "      <td>190.0</td>\n",
       "      <td>3650.0</td>\n",
       "      <td>male</td>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  \\\n",
       "0  Adelie  Torgersen            39.1           18.7              181.0   \n",
       "1  Adelie  Torgersen            39.5           17.4              186.0   \n",
       "2  Adelie  Torgersen            40.3           18.0              195.0   \n",
       "4  Adelie  Torgersen            36.7           19.3              193.0   \n",
       "5  Adelie  Torgersen            39.3           20.6              190.0   \n",
       "\n",
       "   body_mass_g     sex  year  \n",
       "0       3750.0    male  2007  \n",
       "1       3800.0  female  2007  \n",
       "2       3250.0  female  2007  \n",
       "4       3450.0  female  2007  \n",
       "5       3650.0    male  2007  "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "penguins_clean.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "KciDiGOJoXGn"
   },
   "source": [
    "We see that for example the index 3 has disappeared. ```.dropna()``` drops each line that has an ```NaN``` in any column.\n",
    "\n",
    "You can also be more precise and check for ```NaN``` in a given column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 1224,
     "status": "ok",
     "timestamp": 1614505365771,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "ZXT99zLAaGqh",
    "outputId": "fd3dcee8-4af2-4165-ad09-271aeca75889",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      False\n",
       "1      False\n",
       "2      False\n",
       "3       True\n",
       "4      False\n",
       "       ...  \n",
       "339    False\n",
       "340    False\n",
       "341    False\n",
       "342    False\n",
       "343    False\n",
       "Name: bill_length_mm, Length: 344, dtype: bool"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "penguins.bill_length_mm.isna()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "e8ONUqmWaMhA"
   },
   "source": [
    "Using indexing, we can now remove **only lines** which have ```NA``` in that column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 204
    },
    "executionInfo": {
     "elapsed": 1220,
     "status": "ok",
     "timestamp": 1614505366575,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "obBjaiLnaU-7",
    "outputId": "378550b2-dbae-4dfa-9f3f-8504fb5402c5",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>species</th>\n",
       "      <th>island</th>\n",
       "      <th>bill_length_mm</th>\n",
       "      <th>bill_depth_mm</th>\n",
       "      <th>flipper_length_mm</th>\n",
       "      <th>body_mass_g</th>\n",
       "      <th>sex</th>\n",
       "      <th>year</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.1</td>\n",
       "      <td>18.7</td>\n",
       "      <td>181.0</td>\n",
       "      <td>3750.0</td>\n",
       "      <td>male</td>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.5</td>\n",
       "      <td>17.4</td>\n",
       "      <td>186.0</td>\n",
       "      <td>3800.0</td>\n",
       "      <td>female</td>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>40.3</td>\n",
       "      <td>18.0</td>\n",
       "      <td>195.0</td>\n",
       "      <td>3250.0</td>\n",
       "      <td>female</td>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>36.7</td>\n",
       "      <td>19.3</td>\n",
       "      <td>193.0</td>\n",
       "      <td>3450.0</td>\n",
       "      <td>female</td>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Adelie</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>39.3</td>\n",
       "      <td>20.6</td>\n",
       "      <td>190.0</td>\n",
       "      <td>3650.0</td>\n",
       "      <td>male</td>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  \\\n",
       "0  Adelie  Torgersen            39.1           18.7              181.0   \n",
       "1  Adelie  Torgersen            39.5           17.4              186.0   \n",
       "2  Adelie  Torgersen            40.3           18.0              195.0   \n",
       "4  Adelie  Torgersen            36.7           19.3              193.0   \n",
       "5  Adelie  Torgersen            39.3           20.6              190.0   \n",
       "\n",
       "   body_mass_g     sex  year  \n",
       "0       3750.0    male  2007  \n",
       "1       3800.0  female  2007  \n",
       "2       3250.0  female  2007  \n",
       "4       3450.0  female  2007  \n",
       "5       3650.0    male  2007  "
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "penguins_clean2 = penguins[~penguins.bill_length_mm.isna()]\n",
    "penguins_clean2.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zSV-nsx0admX"
   },
   "source": [
    "## Replacing *bad* values\n",
    "\n",
    "When importing a table, Pandas is capable of recognizing specific values as being ```NaN```. For example the text NaN or missing values are turned in \"offficial\" ```NaN```. But often there are just mistakes in the table. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "id": "3Ns1W95HdLzx",
    "tags": []
   },
   "outputs": [],
   "source": [
    "composer_url = 'https://github.com/guiwitz/NumpyPandas_course/blob/master/Data/composers.xlsx?raw=true'\n",
    "composers = pd.read_excel(composer_url, sheet_name=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 235
    },
    "executionInfo": {
     "elapsed": 1698,
     "status": "ok",
     "timestamp": 1614505368437,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "-jx4v6LD8nhn",
    "outputId": "0a6f5976-6f64-4053-a2f5-9dbd69ac5ed4",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>composer</th>\n",
       "      <th>birth</th>\n",
       "      <th>death</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Mahler</td>\n",
       "      <td>1860.0</td>\n",
       "      <td>1911</td>\n",
       "      <td>Kaliste</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Beethoven</td>\n",
       "      <td>1770.0</td>\n",
       "      <td>1827</td>\n",
       "      <td>Bonn</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Puccini</td>\n",
       "      <td>1858.0</td>\n",
       "      <td>1924</td>\n",
       "      <td>Lucques</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Shostakovich</td>\n",
       "      <td>1906.0</td>\n",
       "      <td>1975</td>\n",
       "      <td>Saint-Petersburg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Sibelius</td>\n",
       "      <td>10.0</td>\n",
       "      <td>unknown</td>\n",
       "      <td>unknown</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Haydn</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Röhrau</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       composer   birth    death              city\n",
       "0        Mahler  1860.0     1911           Kaliste\n",
       "1     Beethoven  1770.0     1827              Bonn\n",
       "2       Puccini  1858.0     1924           Lucques\n",
       "3  Shostakovich  1906.0     1975  Saint-Petersburg\n",
       "4      Sibelius    10.0  unknown           unknown\n",
       "5         Haydn     NaN      NaN            Röhrau"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "composers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "r3qwJzlodZZY"
   },
   "source": [
    "Here we see that we have both ```NaN``` values and other \"bad\" values in the form of ```uknown``` text. This is a problem because now the columns containing that text are not \"number-columns\" anymore and for example we can't do operations:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 368
    },
    "executionInfo": {
     "elapsed": 805,
     "status": "error",
     "timestamp": 1614505368437,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "e8THU0978j_V",
    "outputId": "a14cd42f-ba74-494f-ffad-0e0e240f7a45",
    "tags": []
   },
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "loop of ufunc does not support argument 0 of type int which has no callable log method",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mAttributeError\u001b[0m                            Traceback (most recent call last)",
      "\u001b[0;31mAttributeError\u001b[0m: 'int' object has no attribute 'log'",
      "\nThe above exception was the direct cause of the following exception:\n",
      "\u001b[0;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "Cell \u001b[0;32mIn[11], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mnp\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mlog\u001b[49m\u001b[43m(\u001b[49m\u001b[43mcomposers\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mdeath\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m)\u001b[49m\n",
      "File \u001b[0;32m~/mambaforge/envs/DAVPy2023/lib/python3.10/site-packages/pandas/core/generic.py:2113\u001b[0m, in \u001b[0;36mNDFrame.__array_ufunc__\u001b[0;34m(self, ufunc, method, *inputs, **kwargs)\u001b[0m\n\u001b[1;32m   2109\u001b[0m \u001b[38;5;129m@final\u001b[39m\n\u001b[1;32m   2110\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m__array_ufunc__\u001b[39m(\n\u001b[1;32m   2111\u001b[0m     \u001b[38;5;28mself\u001b[39m, ufunc: np\u001b[38;5;241m.\u001b[39mufunc, method: \u001b[38;5;28mstr\u001b[39m, \u001b[38;5;241m*\u001b[39minputs: Any, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs: Any\n\u001b[1;32m   2112\u001b[0m ):\n\u001b[0;32m-> 2113\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43marraylike\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43marray_ufunc\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mufunc\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mmethod\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43minputs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n",
      "File \u001b[0;32m~/mambaforge/envs/DAVPy2023/lib/python3.10/site-packages/pandas/core/arraylike.py:402\u001b[0m, in \u001b[0;36marray_ufunc\u001b[0;34m(self, ufunc, method, *inputs, **kwargs)\u001b[0m\n\u001b[1;32m    399\u001b[0m \u001b[38;5;28;01melif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mndim \u001b[38;5;241m==\u001b[39m \u001b[38;5;241m1\u001b[39m:\n\u001b[1;32m    400\u001b[0m     \u001b[38;5;66;03m# ufunc(series, ...)\u001b[39;00m\n\u001b[1;32m    401\u001b[0m     inputs \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mtuple\u001b[39m(extract_array(x, extract_numpy\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m) \u001b[38;5;28;01mfor\u001b[39;00m x \u001b[38;5;129;01min\u001b[39;00m inputs)\n\u001b[0;32m--> 402\u001b[0m     result \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mgetattr\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mufunc\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mmethod\u001b[49m\u001b[43m)\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43minputs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m    403\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m    404\u001b[0m     \u001b[38;5;66;03m# ufunc(dataframe)\u001b[39;00m\n\u001b[1;32m    405\u001b[0m     \u001b[38;5;28;01mif\u001b[39;00m method \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m__call__\u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m kwargs:\n\u001b[1;32m    406\u001b[0m         \u001b[38;5;66;03m# for np.<ufunc>(..) calls\u001b[39;00m\n\u001b[1;32m    407\u001b[0m         \u001b[38;5;66;03m# kwargs cannot necessarily be handled block-by-block, so only\u001b[39;00m\n\u001b[1;32m    408\u001b[0m         \u001b[38;5;66;03m# take this path if there are no kwargs\u001b[39;00m\n",
      "\u001b[0;31mTypeError\u001b[0m: loop of ufunc does not support argument 0 of type int which has no callable log method"
     ]
    }
   ],
   "source": [
    "np.log(composers['death'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "B_egq9LI8t3b"
   },
   "source": [
    "We therefore have first to clean-up the table. One solution is to replace all bade values. For that we can use ```.replace```:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 235
    },
    "executionInfo": {
     "elapsed": 681,
     "status": "ok",
     "timestamp": 1614505369387,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "H7ohQK6U86Rv",
    "outputId": "03352ec2-57bd-415b-ce32-be54b75e2d54",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>composer</th>\n",
       "      <th>birth</th>\n",
       "      <th>death</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Mahler</td>\n",
       "      <td>1860.0</td>\n",
       "      <td>1911.0</td>\n",
       "      <td>Kaliste</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Beethoven</td>\n",
       "      <td>1770.0</td>\n",
       "      <td>1827.0</td>\n",
       "      <td>Bonn</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Puccini</td>\n",
       "      <td>1858.0</td>\n",
       "      <td>1924.0</td>\n",
       "      <td>Lucques</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Shostakovich</td>\n",
       "      <td>1906.0</td>\n",
       "      <td>1975.0</td>\n",
       "      <td>Saint-Petersburg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Sibelius</td>\n",
       "      <td>10.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Haydn</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Röhrau</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       composer   birth   death              city\n",
       "0        Mahler  1860.0  1911.0           Kaliste\n",
       "1     Beethoven  1770.0  1827.0              Bonn\n",
       "2       Puccini  1858.0  1924.0           Lucques\n",
       "3  Shostakovich  1906.0  1975.0  Saint-Petersburg\n",
       "4      Sibelius    10.0     NaN               NaN\n",
       "5         Haydn     NaN     NaN            Röhrau"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "composers_clean = composers.replace('unknown', np.nan)\n",
    "composers_clean"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 447,
     "status": "ok",
     "timestamp": 1614505370487,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "6pkqrtZk9ACD",
    "outputId": "cafe8ae3-4523-4a4a-ccdc-4944ad33768e",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    7.555382\n",
       "1    7.510431\n",
       "2    7.562162\n",
       "3    7.588324\n",
       "4         NaN\n",
       "5         NaN\n",
       "Name: death, dtype: float64"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.log(composers_clean['death'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "MIhMecR79Jo0"
   },
   "source": [
    "Alternatively we can fix this problem at import time as well because we might have a lsit of all \"bad entries\". For that we can use the option"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 235
    },
    "executionInfo": {
     "elapsed": 653,
     "status": "ok",
     "timestamp": 1614505372062,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "oYfmjghQ9VRh",
    "outputId": "f7a66442-7a7f-40a9-fa04-a5acfee8264d",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>composer</th>\n",
       "      <th>birth</th>\n",
       "      <th>death</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Mahler</td>\n",
       "      <td>1860.0</td>\n",
       "      <td>1911.0</td>\n",
       "      <td>Kaliste</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Beethoven</td>\n",
       "      <td>1770.0</td>\n",
       "      <td>1827.0</td>\n",
       "      <td>Bonn</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Puccini</td>\n",
       "      <td>1858.0</td>\n",
       "      <td>1924.0</td>\n",
       "      <td>Lucques</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Shostakovich</td>\n",
       "      <td>1906.0</td>\n",
       "      <td>1975.0</td>\n",
       "      <td>Saint-Petersburg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Sibelius</td>\n",
       "      <td>10.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Haydn</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Röhrau</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       composer   birth   death              city\n",
       "0        Mahler  1860.0  1911.0           Kaliste\n",
       "1     Beethoven  1770.0  1827.0              Bonn\n",
       "2       Puccini  1858.0  1924.0           Lucques\n",
       "3  Shostakovich  1906.0  1975.0  Saint-Petersburg\n",
       "4      Sibelius    10.0     NaN               NaN\n",
       "5         Haydn     NaN     NaN            Röhrau"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.read_excel(composer_url, na_values=['unknown'], sheet_name=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "0Hf1hMjx9qk7"
   },
   "source": [
    "## Dealing with time\n",
    "\n",
    "Pandas offers many possibilities to help dealing with data containing time, e.g. to calculate how many days have passed between two dates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "id": "yfidLUDaJ5jO",
    "tags": []
   },
   "outputs": [],
   "source": [
    "composers = pd.read_excel(composer_url)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 173
    },
    "executionInfo": {
     "elapsed": 686,
     "status": "ok",
     "timestamp": 1614505634547,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "97miVvL5J9VD",
    "outputId": "073c4d19-9937-4ab6-d576-68a11f6186bb",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>composer</th>\n",
       "      <th>birth</th>\n",
       "      <th>death</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Mahler</td>\n",
       "      <td>1860</td>\n",
       "      <td>1911</td>\n",
       "      <td>Kaliste</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Beethoven</td>\n",
       "      <td>1770</td>\n",
       "      <td>1827</td>\n",
       "      <td>Bonn</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Puccini</td>\n",
       "      <td>1858</td>\n",
       "      <td>1924</td>\n",
       "      <td>Lucques</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Shostakovich</td>\n",
       "      <td>1906</td>\n",
       "      <td>1975</td>\n",
       "      <td>Saint-Petersburg</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       composer  birth  death              city\n",
       "0        Mahler   1860   1911           Kaliste\n",
       "1     Beethoven   1770   1827              Bonn\n",
       "2       Puccini   1858   1924           Lucques\n",
       "3  Shostakovich   1906   1975  Saint-Petersburg"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "composers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "5e0RLSNkBn5X"
   },
   "source": [
    "We transform now the ```birth``` and ```death``` columns into ```datetime``` objects using ```pd.to_datetime``` and by specifying a format, here just a year ```%Y```:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 173
    },
    "executionInfo": {
     "elapsed": 476,
     "status": "ok",
     "timestamp": 1614505634913,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "ambmnTlRI8KR",
    "outputId": "2f574dcd-5160-498e-d8b3-61b89078da6c",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>composer</th>\n",
       "      <th>birth</th>\n",
       "      <th>death</th>\n",
       "      <th>city</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Mahler</td>\n",
       "      <td>1860-01-01</td>\n",
       "      <td>1911-01-01</td>\n",
       "      <td>Kaliste</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Beethoven</td>\n",
       "      <td>1770-01-01</td>\n",
       "      <td>1827-01-01</td>\n",
       "      <td>Bonn</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Puccini</td>\n",
       "      <td>1858-01-01</td>\n",
       "      <td>1924-01-01</td>\n",
       "      <td>Lucques</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Shostakovich</td>\n",
       "      <td>1906-01-01</td>\n",
       "      <td>1975-01-01</td>\n",
       "      <td>Saint-Petersburg</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       composer      birth      death              city\n",
       "0        Mahler 1860-01-01 1911-01-01           Kaliste\n",
       "1     Beethoven 1770-01-01 1827-01-01              Bonn\n",
       "2       Puccini 1858-01-01 1924-01-01           Lucques\n",
       "3  Shostakovich 1906-01-01 1975-01-01  Saint-Petersburg"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "composers['birth'] = pd.to_datetime(composers['birth'], format = '%Y')\n",
    "composers['death'] = pd.to_datetime(composers['death'], format = '%Y')\n",
    "composers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "5XyQR8OmB9EJ"
   },
   "source": [
    "You can see that the year has been turned into an actual date (using 1st January as default). We create now an age column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 173
    },
    "executionInfo": {
     "elapsed": 785,
     "status": "ok",
     "timestamp": 1614505666064,
     "user": {
      "displayName": "Guillaume Witz",
      "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GgT0K2JVYzEsjzsS5nhkUVjUrSIJ5jHzXnBoYrmVf8=s64",
      "userId": "16033870147214403532"
     },
     "user_tz": -60
    },
    "id": "xwgvIrdIJDH3",
    "outputId": "d6259546-67fa-4e95-c7bb-13f707957518",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>composer</th>\n",
       "      <th>birth</th>\n",
       "      <th>death</th>\n",
       "      <th>city</th>\n",
       "      <th>age</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Mahler</td>\n",
       "      <td>1860-01-01</td>\n",
       "      <td>1911-01-01</td>\n",
       "      <td>Kaliste</td>\n",
       "      <td>18627 days</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Beethoven</td>\n",
       "      <td>1770-01-01</td>\n",
       "      <td>1827-01-01</td>\n",
       "      <td>Bonn</td>\n",
       "      <td>20818 days</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Puccini</td>\n",
       "      <td>1858-01-01</td>\n",
       "      <td>1924-01-01</td>\n",
       "      <td>Lucques</td>\n",
       "      <td>24105 days</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Shostakovich</td>\n",
       "      <td>1906-01-01</td>\n",
       "      <td>1975-01-01</td>\n",
       "      <td>Saint-Petersburg</td>\n",
       "      <td>25202 days</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       composer      birth      death              city        age\n",
       "0        Mahler 1860-01-01 1911-01-01           Kaliste 18627 days\n",
       "1     Beethoven 1770-01-01 1827-01-01              Bonn 20818 days\n",
       "2       Puccini 1858-01-01 1924-01-01           Lucques 24105 days\n",
       "3  Shostakovich 1906-01-01 1975-01-01  Saint-Petersburg 25202 days"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "composers['age'] = (composers.death - composers.birth)\n",
    "composers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "XC4LJx9lJy0R"
   },
   "source": [
    "Since the columns are datetime objects, Pandas know that it should actually compute difference in dates and by defaults gives days. Here the computation is trivial, but if you deal with formats such as ```HH:MM:SS``` this is very useful."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "4JvfNZ5wCumv"
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "colab": {
   "authorship_tag": "ABX9TyODbHn5Qz1JFAEv8MvYp96r",
   "collapsed_sections": [],
   "name": "10-Real_world.ipynb",
   "provenance": [],
   "toc_visible": true
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}