7. Creating a pipeline
Contents
7. Creating a pipeline¶
In the previous chapter we created a procedure to isolate nuclei from images. Very often the goal is to repeat the exercise on multiple images in order to compare different conditions. Here we are going to see how to package our pipeline into a function, how to apply it to multiple images and how to plot the result for comparison.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import skimage
import skimage.io
import skimage.morphology
from scipy import ndimage as ndi
Creating an image processing function¶
We have seen previously how to define a function. It needs the def
descriptor, a name and inputs. What we are doing here is just copying all the lines that we used for our pipeline in the last chapter into this function. At the moment the only input is the path to a given file to analyze. We also try to document what the function does, what its input/output are etc.
def my_pipeline(image_path):
"""
This function extracts information about nuclei found in the third channel of an image
loaded at the given path
Parameters
----------
image_path: str
path to image
Returns
-------
my_table: pandas dataframe
dataframe with label, area, mean_intensity and extent information for each nucleus
"""
image_stack = skimage.io.imread(image_path)
image_nuclei = image_stack[:,:,2]#blue channel in RGB
image_signal = image_stack[:,:,1]#green channel in RGB
# filter image
image_nuclei = skimage.filters.median(image_nuclei, skimage.morphology.disk(5))
# create mask and clean-up
mask_nuclei = image_nuclei > skimage.filters.threshold_otsu(image_nuclei)
mask_nuclei = skimage.morphology.binary_closing(mask_nuclei, footprint=skimage.morphology.disk(5))
mask_nuclei = ndi.binary_fill_holes(mask_nuclei, skimage.morphology.disk(5))
# label image
my_labels = skimage.morphology.label(mask_nuclei)
# measure
my_regions = skimage.measure.regionprops_table(my_labels, image_signal, properties=('label','area', 'mean_intensity'))
plt.subplots(figsize=(10,10))
plt.imshow(mask_nuclei)
plt.show()
my_table = pd.DataFrame(my_regions)
return my_table
We can now test our function with a single image:
table = my_pipeline('https://github.com/guiwitz/PyImageCourse_beginner/raw/master/images/46658_784_B12_1.tif')
table
label | area | mean_intensity | |
---|---|---|---|
0 | 1 | 4221 | 79.265103 |
1 | 2 | 8386 | 65.997734 |
2 | 3 | 18258 | 70.261091 |
3 | 4 | 4043 | 63.026713 |
4 | 5 | 73 | 47.438356 |
5 | 6 | 49056 | 53.714510 |
6 | 7 | 45671 | 53.807099 |
7 | 8 | 20537 | 70.453912 |
8 | 9 | 46853 | 66.137729 |
9 | 10 | 43277 | 41.412552 |
10 | 11 | 40768 | 58.675039 |
11 | 12 | 15090 | 44.198144 |
12 | 13 | 35404 | 74.771551 |
13 | 14 | 47642 | 41.701902 |
14 | 15 | 54978 | 35.904107 |
15 | 16 | 26774 | 53.077874 |
16 | 17 | 4 | 112.000000 |
17 | 18 | 812 | 102.986453 |
18 | 19 | 54298 | 56.110207 |
19 | 20 | 29638 | 66.283791 |
Adjusting the function behavior¶
We have recovered the same result as previously and can analyze again our data as we did before. The mask is also plotted by default when calling the function. This is helpful to test the function and verify that nothing went dramatically wrong but we probably don’t want to see this image if we analyze hundreds of images. What we can do is leave the user the choice to see it or not. Let’s adjust our function to do that: we add an additional optional parameter do_plotting
which is simply a boolean (True
/False
). If True, plotting is happening, if False it’s not.
def my_pipeline(image_path, do_plotting=False):
"""
This function extracts information about nuclei found in the third channel of an image
loaded at the given path
Parameters
----------
image_path: str
path to image
do_plotting: bool
show segmentation or not
Returns
-------
my_table: pandas dataframe
dataframe with label, area, mean_intensity and extent information for each nucleus
"""
image_stack = skimage.io.imread(image_path)
image_nuclei = image_stack[:,:,2]#blue channel in RGB
image_signal = image_stack[:,:,1]#green channel in RGB
# filter image
image_nuclei = skimage.filters.median(image_nuclei, skimage.morphology.disk(5))
# create mask and clean-up
mask_nuclei = image_nuclei > skimage.filters.threshold_otsu(image_nuclei)
mask_nuclei = skimage.morphology.binary_closing(mask_nuclei, selem=skimage.morphology.disk(5))
mask_nuclei = ndi.binary_fill_holes(mask_nuclei, skimage.morphology.disk(5))
# label image
my_labels = skimage.morphology.label(mask_nuclei)
# measure
my_regions = skimage.measure.regionprops_table(my_labels, image_signal, properties=('label','area', 'mean_intensity'))
if do_plotting:
plt.subplots(figsize=(10,10))
plt.imshow(mask_nuclei)
plt.show()
my_table = pd.DataFrame(my_regions)
return my_table
You see that we added an ìf
statment in our code. This is the place where we control for plotting or not. To learn more about it, visit the next notebook!
table = my_pipeline('https://github.com/guiwitz/PyImageCourse_beginner/raw/master/images/46658_784_B12_1.tif')
Now indeed our function doesn’t produce any output if we don’t explicitly ask for it.
Analyzing multiple images¶
The point of creating an image processing pipeline was to be able to easily analyze multiple images. We can do that now by using a simple for loop. First we create a list of files we want to analyze and that we can see here and here.
files_to_analyze = [
'https://github.com/guiwitz/PyImageCourse_beginner/raw/master/images/46658_784_B12_1.tif',
'https://github.com/guiwitz/PyImageCourse_beginner/raw/master/images/27897_273_C8_2.tif'
]
Now we create a for
loop that will go through this list of files and analyze each of them with our function. To learn more about for
loops visit the next chapter on Logical Flow. Before the loop, we create an empty list that is going to be filled with the output of the function. We also add to the output DataFrame a column with the filename that we can recover by splitting the address files_to_analyze[0].split('/')[-1]
. Note that the split
function is a method of string objects that allows to split string where we find a certain character (here /):
all_tables = []
for file in files_to_analyze:
#use the function
new_table = my_pipeline(file)
#add image index
new_table['filename'] = file.split('/')[-1]
#append the result to the list
all_tables.append(new_table)
Finally we can concatenate our results into a single table. For this we use a Pandas function called concat
which just glues two DataFrames together:
complete_info = pd.concat(all_tables)
complete_info
label | area | mean_intensity | filename | |
---|---|---|---|---|
0 | 1 | 4221 | 79.265103 | 46658_784_B12_1.tif |
1 | 2 | 8386 | 65.997734 | 46658_784_B12_1.tif |
2 | 3 | 18258 | 70.261091 | 46658_784_B12_1.tif |
3 | 4 | 4043 | 63.026713 | 46658_784_B12_1.tif |
4 | 5 | 73 | 47.438356 | 46658_784_B12_1.tif |
5 | 6 | 49056 | 53.714510 | 46658_784_B12_1.tif |
6 | 7 | 45671 | 53.807099 | 46658_784_B12_1.tif |
7 | 8 | 20537 | 70.453912 | 46658_784_B12_1.tif |
8 | 9 | 46853 | 66.137729 | 46658_784_B12_1.tif |
9 | 10 | 43277 | 41.412552 | 46658_784_B12_1.tif |
10 | 11 | 40768 | 58.675039 | 46658_784_B12_1.tif |
11 | 12 | 15090 | 44.198144 | 46658_784_B12_1.tif |
12 | 13 | 35404 | 74.771551 | 46658_784_B12_1.tif |
13 | 14 | 47642 | 41.701902 | 46658_784_B12_1.tif |
14 | 15 | 54978 | 35.904107 | 46658_784_B12_1.tif |
15 | 16 | 26774 | 53.077874 | 46658_784_B12_1.tif |
16 | 17 | 4 | 112.000000 | 46658_784_B12_1.tif |
17 | 18 | 812 | 102.986453 | 46658_784_B12_1.tif |
18 | 19 | 54298 | 56.110207 | 46658_784_B12_1.tif |
19 | 20 | 29638 | 66.283791 | 46658_784_B12_1.tif |
0 | 1 | 31346 | 91.784279 | 27897_273_C8_2.tif |
1 | 2 | 1022 | 118.964775 | 27897_273_C8_2.tif |
2 | 3 | 32149 | 108.144048 | 27897_273_C8_2.tif |
3 | 4 | 21866 | 87.044407 | 27897_273_C8_2.tif |
4 | 5 | 25406 | 82.585177 | 27897_273_C8_2.tif |
5 | 6 | 26270 | 93.764675 | 27897_273_C8_2.tif |
6 | 7 | 19557 | 101.465102 | 27897_273_C8_2.tif |
7 | 8 | 2 | 20.500000 | 27897_273_C8_2.tif |
8 | 9 | 16367 | 70.146331 | 27897_273_C8_2.tif |
9 | 10 | 28357 | 77.683218 | 27897_273_C8_2.tif |
10 | 11 | 19552 | 73.210158 | 27897_273_C8_2.tif |
11 | 12 | 19288 | 75.883036 | 27897_273_C8_2.tif |
12 | 13 | 58364 | 87.722003 | 27897_273_C8_2.tif |
13 | 14 | 22228 | 105.550837 | 27897_273_C8_2.tif |
14 | 15 | 49020 | 73.625255 | 27897_273_C8_2.tif |
15 | 16 | 2009 | 68.762071 | 27897_273_C8_2.tif |
We can then check if there’s any difference in intensity in the nucleus between the two images:
plt.hist(all_tables[0]['mean_intensity'], bins = np.arange(0,150,10), color = 'red')
plt.hist(all_tables[1]['mean_intensity'], bins = np.arange(0,150,10), alpha = 0.5, color = 'blue')
plt.show()
Of course our actual goal is not just to measure the intensity within the nucleus but to compare the intensity inside and on the edge of the nuclei. This is demonstrated in the Demo notebooks.