Repeating and controlling#

In the previous chapter, we recovered lists of files that we might want to import for analysis. Of course, we could manually access each file using file[0], file[1], etc., but that is inefficient and is better done with loops. We may also want to analyze only files that have specific properties, in which case we have to impose conditions. The syntax of loops and conditions is very similar, so we treat them together here.

For loops#

Let’s create a list of files again. Here we want to point to the notebook folder, so we can use the Path object to get the current working directory:

from pathlib import Path

current_folder = Path.cwd()
current_folder
PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy')

And we again create a list of its contents:

files_in_folder = list(current_folder.iterdir())
files_in_folder
[PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/22-Seaborn_distributions_relations.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/12-Minimal_plotting.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/30-AI_assistants.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/19-Matplotlib_content.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/environment.yml'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/26-Alternatives.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/14-Back_to_Pandas.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/09-Numpy_arrays.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/images'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/16-DataFrame_indexing.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/Solutions'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/21-Seaborn_concept.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/17-Pandas_combine.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/29-Image_processing.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/05-Data_structures.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/02-Notebooks.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/06-File_handling.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/Readme.md'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/_toc.yml'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/27-scipy_statsmodels.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/13-Images_as_arrays.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/.gitignore'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/24-Matplotlib_statistics.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/23-Seaborn_regression.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/_config.yml'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/08-Classes.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/.github'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/book'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/01-Introduction.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/newfolder'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/11-Numpy_indexing.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/10-Numpy_maths.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/28-scikit-learn.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/03-Variables.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/18-Real_world.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/25-Matplotlib_annotations.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/.git'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/15-Operate_on_DataFrames.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/04-Functions_packages.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/07-Flow_control.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/plots'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/20-Matplotlib_non_data_elements.ipynb')]

Let’s say we now want to find the extension of each of these files: we need to traverse the full list and read the suffix attribute of each element:

files_in_folder[0].suffix
'.ipynb'
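Note that suffix is an attribute and not a function, so it is written without parentheses. As a small sketch of how it behaves, here are a few made-up file names (they are illustrations, not the contents of the folder above); PurePosixPath is used so that no real files are needed:

```python
from pathlib import PurePosixPath

# Hypothetical names, used only to illustrate the suffix attribute
print(PurePosixPath('07-Flow_control.ipynb').suffix)   # .ipynb
print(PurePosixPath('archive.tar.gz').suffix)          # .gz (only the last extension)
print(repr(PurePosixPath('images').suffix))            # '' (no extension)
print(repr(PurePosixPath('.gitignore').suffix))        # '' (a leading dot is not an extension)
```

Entries without an extension, such as folders, therefore give an empty string.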

We can now write:

for f in files_in_folder:
    mysuffix = f.suffix

As you can see, for loops are written in a relatively “natural” way in Python, stating that “for each element f in files_in_folder execute the following lines”. Note that:

  1. f just stands for the currently selected element from files_in_folder.

  2. The for loop starts with the for statement.

  3. The list used for iteration is specified, here files_in_folder.

  4. Like a function definition, the for loop definition ends with :

  5. The content of the loop is indented.

You may also notice that nothing happens when we execute the cell. This is because a for loop does not display any output by itself. If we want to see the actual values, we have to use the print() function:

for f in files_in_folder:
    mysuffix = f.suffix
print(mysuffix)
.ipynb

Only the last value is printed because we put the print() function outside the loop. If we want to see each value we have to indent the print() call so that it is included in the loop:

for f in files_in_folder:
    mysuffix = f.suffix
    print(mysuffix)
.ipynb
.ipynb
.ipynb
.ipynb
.yml
.ipynb
.ipynb
.ipynb

.ipynb

.ipynb
.ipynb
.ipynb
.ipynb
.ipynb
.ipynb
.md
.yml
.ipynb
.ipynb

.ipynb
.ipynb
.yml
.ipynb


.ipynb

.ipynb
.ipynb
.ipynb
.ipynb
.ipynb
.ipynb

.ipynb
.ipynb
.ipynb

.ipynb
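The empty lines in the output above are entries without an extension (folders, or names such as .gitignore), for which suffix is an empty string. Instead of printing each value, a loop can also accumulate a result. The following is a minimal sketch that counts how often each suffix occurs; the file names in example_files are made up for illustration:

```python
from pathlib import PurePosixPath

# Hypothetical file names, not the actual folder contents
example_files = [
    PurePosixPath('a.ipynb'),
    PurePosixPath('b.ipynb'),
    PurePosixPath('env.yml'),
    PurePosixPath('images'),
]

counts = {}
for f in example_files:
    # dict.get returns 0 the first time a suffix is encountered
    counts[f.suffix] = counts.get(f.suffix, 0) + 1

print(counts)
# {'.ipynb': 2, '.yml': 1, '': 1}
```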

Looping using a range#

Often we don’t want to loop over the content of a list but just want to do some operation N times, or loop over the indexes from 0 to N-1. To do that, we can use the built-in range() function, which does just this: it provides numbers within a certain range. The function doesn’t really produce a list per se but can be used as if it were one. For example:

for x in range(8):
    print(x)
0
1
2
3
4
5
6
7

Note that, as always, counting starts at 0, so the last value is 7 and not 8. Of course we could use these indexes to access specific parts of a list. Coming back to the previous example, we might want to calculate the square of only the first three numbers:

mylist = [1, 2, 3]

for i in range(3):
    result = mylist[i] ** 2
    print(result)
1
4
9

With mylist[i] we simply use the numbers generated by range() as indexes of our list.
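range() also accepts an optional start and step, and it can be combined with len() so that the number of elements does not have to be hard-coded. A short sketch (the variable name squares is only an illustration):

```python
# range(stop): from 0 up to, but not including, stop
print(list(range(4)))          # [0, 1, 2, 3]

# range(start, stop)
print(list(range(2, 6)))       # [2, 3, 4, 5]

# range(start, stop, step)
print(list(range(0, 10, 3)))   # [0, 3, 6, 9]

mylist = [1, 2, 3]
squares = []
# len(mylist) is 3, so i takes the values 0, 1, 2
for i in range(len(mylist)):
    squares.append(mylist[i] ** 2)

print(squares)                 # [1, 4, 9]
```

Wrapping range() in list() is only needed here to display the numbers; in a for loop, range() can be used directly.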

Using conditions#

if statement#

Now that we can get the extension of each file, we can run our workflow only if the extension is really .ipynb! For this we need another very common statement in programming languages: the if statement. Let’s do a simple example first:

a = 3
if a > 4:
    print('Large')
if a < 4:
    print('Small')
Small

We see that the structure of the if statement is very similar to that of functions and for loops:

  • a condition is stated and ends with :

  • the block executed only if the condition is True is indented
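The condition itself is an ordinary expression that evaluates to True or False, and it can be inspected or stored in a variable like any other value. A minimal sketch (the name is_large is only an illustration):

```python
a = 3

# A comparison produces a boolean value
is_large = a > 4
print(is_large)   # False
print(a < 4)      # True
print(a == 3)     # True  (equality is tested with ==, not =)
print(a != 3)     # False

# The stored boolean can be used directly as a condition
if is_large:
    print('Large')   # not executed, since is_large is False
```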

In some cases, we want to execute a different block of code when the condition of the if statement is False. For that we use the else statement, which has the same structure:

a = 10

if a < 4:
    print('Small')
else:
    print('Large')
Large

You can even add multiple sub-cases with elif:

a = 10

if a < 4:
    print('Small')
elif a < 20:
    print('Intermediate')
else:
    print('Large')
Intermediate
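The branches are tested from top to bottom and only the first one whose condition is True is executed. As a sketch, the same logic can be wrapped in a function and tried on several values; classify is a hypothetical helper name introduced here only for illustration:

```python
def classify(a):
    """Return a size label for a number (hypothetical helper)."""
    if a < 4:
        return 'Small'
    elif a < 20:
        # Only reached when a < 4 is False
        return 'Intermediate'
    else:
        return 'Large'

for value in [2, 10, 50]:
    print(value, classify(value))
# 2 Small
# 10 Intermediate
# 50 Large
```

The value 2 satisfies both a < 4 and a < 20, but only the first matching branch runs.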

Back to files#

So now we want to add an if statement to our routine that checks the file format. Let’s see if we can come up with a check, e.g.:

files_in_folder[2]
PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/30-AI_assistants.ipynb')
files_in_folder[2].suffix == '.ipynb'
True
files_in_folder[4]
PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/environment.yml')
files_in_folder[4].suffix == '.ipynb'
False

So we can compare the suffix to the string '.ipynb', and that should work:

for f in files_in_folder:
    
    print(f)
    if f.suffix == '.ipynb':
        print('Is notebook file')
    else:
        print('Is NOT notebook file')
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/22-Seaborn_distributions_relations.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/12-Minimal_plotting.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/30-AI_assistants.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/19-Matplotlib_content.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/environment.yml
Is NOT notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/26-Alternatives.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/14-Back_to_Pandas.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/09-Numpy_arrays.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/images
Is NOT notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/16-DataFrame_indexing.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/Solutions
Is NOT notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/21-Seaborn_concept.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/17-Pandas_combine.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/29-Image_processing.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/05-Data_structures.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/02-Notebooks.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/06-File_handling.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/Readme.md
Is NOT notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/_toc.yml
Is NOT notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/27-scipy_statsmodels.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/13-Images_as_arrays.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/.gitignore
Is NOT notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/24-Matplotlib_statistics.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/23-Seaborn_regression.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/_config.yml
Is NOT notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/08-Classes.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/.github
Is NOT notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/book
Is NOT notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/01-Introduction.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/newfolder
Is NOT notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/11-Numpy_indexing.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/10-Numpy_maths.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/28-scikit-learn.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/03-Variables.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/18-Real_world.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/25-Matplotlib_annotations.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/.git
Is NOT notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/15-Operate_on_DataFrames.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/04-Functions_packages.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/07-Flow_control.ipynb
Is notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/plots
Is NOT notebook file
/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/20-Matplotlib_non_data_elements.ipynb
Is notebook file
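The same pattern extends to more than two cases by using elif, and several extensions can be tested at once with the in operator. A sketch under assumptions: the file names and the labels below are made up for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical file names, not the actual folder contents
example_files = [
    PurePosixPath('01-Introduction.ipynb'),
    PurePosixPath('Readme.md'),
    PurePosixPath('_toc.yml'),
    PurePosixPath('images'),
]

labels = []
for f in example_files:
    if f.suffix == '.ipynb':
        label = 'notebook'
    elif f.suffix in ('.yml', '.md'):
        # True if the suffix equals any element of the tuple
        label = 'text file'
    else:
        label = 'other'
    labels.append(label)
    print(f.name, '->', label)
```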

Complete routine#

So now we can finally go through all the files, check the extension, and execute the workflow (here, simply adding the file to a list) only if the file is a notebook (.ipynb) file:

keep_files = []

for f in files_in_folder:
    
    if f.suffix == '.ipynb':
        
        keep_files.append(f)
keep_files
[PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/22-Seaborn_distributions_relations.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/12-Minimal_plotting.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/30-AI_assistants.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/19-Matplotlib_content.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/26-Alternatives.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/14-Back_to_Pandas.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/09-Numpy_arrays.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/16-DataFrame_indexing.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/21-Seaborn_concept.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/17-Pandas_combine.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/29-Image_processing.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/05-Data_structures.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/02-Notebooks.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/06-File_handling.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/27-scipy_statsmodels.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/13-Images_as_arrays.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/24-Matplotlib_statistics.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/23-Seaborn_regression.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/08-Classes.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/01-Introduction.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/11-Numpy_indexing.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/10-Numpy_maths.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/28-scikit-learn.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/03-Variables.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/18-Real_world.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/25-Matplotlib_annotations.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/15-Operate_on_DataFrames.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/04-Functions_packages.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/07-Flow_control.ipynb'),
 PosixPath('/Users/gw18g940/GoogleDrive/DSL/Trainings/DAVPy/20-Matplotlib_non_data_elements.ipynb')]
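For this specific task, pathlib also offers the glob() method, which filters by a pattern directly. The sketch below compares it with the loop above; to keep it self-contained it works in a temporary folder with made-up file names rather than in the course folder:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)

    # Create a few empty, hypothetical files to work with
    for name in ['a.ipynb', 'b.ipynb', 'notes.md']:
        (folder / name).touch()

    # Version 1: loop, condition and append, as above
    keep_files = []
    for f in folder.iterdir():
        if f.suffix == '.ipynb':
            keep_files.append(f)

    # Version 2: let glob() do the filtering
    globbed = list(folder.glob('*.ipynb'))

    # Sort the names, since the listing order is not guaranteed
    loop_names = sorted(f.name for f in keep_files)
    glob_names = sorted(f.name for f in globbed)

print(loop_names)   # ['a.ipynb', 'b.ipynb']
print(glob_names)   # ['a.ipynb', 'b.ipynb']
```

The explicit loop remains the more general tool, since any condition can be placed in the if statement.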

Exercise#

Here’s a list of numbers. Create two new lists, one containing the numbers larger than 50 and the other the numbers smaller than 50, using a for loop and if statements.

exercise_list = [53, 2, 9, 21, 35, 97, 46, 101, 43]