6. Region properties

Once you have segmented an image, you usually want to gather information on the objects that you “discovered”. Instead of painstakingly doing this manually, skimage offers a simplified way to do it with its regionprops_table tool.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import skimage
import skimage.io
import skimage.filters
import skimage.measure
import skimage.morphology
import scipy.ndimage as ndi

Let’s first create a mask of the nuclei and clean it up with morphological operations:

#image_stack = skimage.io.imread('images/46658_784_B12_1.tif')
image_stack = skimage.io.imread('https://github.com/guiwitz/PyImageCourse_beginner/raw/master/images/46658_784_B12_1.tif')

image_nuclei = image_stack[:, :, 2]  # blue channel in RGB
image_signal = image_stack[:, :, 1]  # green channel in RGB

# filter image
image_nuclei = skimage.filters.median(image_nuclei, skimage.morphology.disk(5))

# create mask and clean-up
mask_nuclei = image_nuclei > skimage.filters.threshold_otsu(image_nuclei)
mask_nuclei = skimage.morphology.binary_closing(mask_nuclei, footprint=skimage.morphology.disk(5))
mask_nuclei = ndi.binary_fill_holes(mask_nuclei, skimage.morphology.disk(5))
plt.subplots(figsize=(10,10))
plt.imshow(mask_nuclei, cmap = 'gray');
[Figure: binary mask of the nuclei displayed in grayscale]

Labelling

In order to measure objects in the image separately, we first need to label them individually. For that we can use the skimage.morphology.label() function, which looks for connected groups of white pixels and assigns each group an integer number:

my_labels = skimage.morphology.label(mask_nuclei)
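
Since label() assigns consecutive integers starting at 1 (0 is reserved for the background), the maximum value of the label map tells us how many objects were found:

my_labels.max()
20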

The label map shows that numbers are assigned from top to bottom in the image:

plt.subplots(figsize=(10,10))
plt.imshow(my_labels);
[Figure: label map of the nuclei, colored by label value]

Region properties

Now that we have each region labeled with a different number, we can use the skimage.measure.regionprops_table() function, which takes such a label map and analyzes the geometric properties of each region. We have to specify which properties we want to extract, for example the label and area:

my_regions = skimage.measure.regionprops_table(my_labels, properties=('label','area'))

The output is a dictionary containing all the properties we asked for:

my_regions
{'label': array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
        18, 19, 20]),
 'area': array([ 4221,  8386, 18258,  4043,    73, 49056, 45671, 20537, 46853,
        43277, 40768, 15090, 35404, 47642, 54978, 26774,     4,   812,
        54298, 29638])}

Dictionaries

Until now, in terms of data structures, we have briefly seen lists (mylist = [5, 4, 2]) and Numpy arrays via the images. However, Python offers additional types of data structures, and dictionaries are one of them. As you can see in the output above, they are defined with curly braces {} and contain pairs of elements: keys like label and area, and a value for each key, here two Numpy arrays. To better understand, let's create a simple one:

my_dict = {'fruit name': 'apple', 'weight': 50, 'types': ['golden', 'gala', 'braeburn']}
my_dict
{'fruit name': 'apple', 'weight': 50, 'types': ['golden', 'gala', 'braeburn']}

As you can see these dictionaries can contain all types of variables: strings, numbers, lists etc. They are for this reason ideal to hold information of various types and useful to describe entities thanks to the dictionary keys. Each entry in the dictionary can then be recovered via its key:

my_dict['weight']
50
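
As with any Python dictionary, we can also list the available keys, and add new entries at any time by assigning to a new key (the color entry below is just for illustration):

my_dict.keys()
dict_keys(['fruit name', 'weight', 'types'])

my_dict['color'] = 'red'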

Recovering image intensity information

In what we did above, we only recovered information about our mask. However often we want to obtain information on pixel values of the original image. For example, “what is the average intensity of each nucleus?”

Luckily, regionprops_table allows us to pass the image whose intensities we want to quantify as the additional argument intensity_image. Then we can, for example, add mean_intensity to the list of properties to extract:

my_regions = skimage.measure.regionprops_table(
    my_labels,intensity_image=image_signal, properties=('label','area','mean_intensity'))

Now we have access to additional properties, for example the mean_intensity in each nucleus:

my_regions
{'label': array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
        18, 19, 20]),
 'area': array([ 4221,  8386, 18258,  4043,    73, 49056, 45671, 20537, 46853,
        43277, 40768, 15090, 35404, 47642, 54978, 26774,     4,   812,
        54298, 29638]),
 'mean_intensity': array([ 79.26510306,  65.99773432,  70.26109103,  63.02671284,
         47.43835616,  53.71450995,  53.8070986 ,  70.45391245,
         66.13772864,  41.4125517 ,  58.67503925,  44.19814447,
         74.77155124,  41.70190168,  35.9041071 ,  53.07787406,
        112.        , 102.9864532 ,  56.11020664,  66.28379108])}

Now that we have this information, we can of course plot it. For example, we can produce a histogram of mean nuclei intensities:

plt.hist(my_regions['mean_intensity']);
[Figure: histogram of mean nuclei intensities]

Filtering information

Obviously, we had some “bad segmentations”, i.e. some fragments remaining from the processing that are not actual nuclei. We can easily filter those out, for example based on size, using Numpy logical indexing:

my_regions['area']
array([ 4221,  8386, 18258,  4043,    73, 49056, 45671, 20537, 46853,
       43277, 40768, 15090, 35404, 47642, 54978, 26774,     4,   812,
       54298, 29638])

We create a logical array by setting a condition on one dictionary entry:

selected = my_regions['area'] > 100
selected
array([ True,  True,  True,  True, False,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True, False,  True,
        True,  True])

And then use it for logical indexing:

my_regions['mean_intensity'][selected]
array([ 79.26510306,  65.99773432,  70.26109103,  63.02671284,
        53.71450995,  53.8070986 ,  70.45391245,  66.13772864,
        41.4125517 ,  58.67503925,  44.19814447,  74.77155124,
        41.70190168,  35.9041071 ,  53.07787406, 102.9864532 ,
        56.11020664,  66.28379108])
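
Note that this only filters the mean_intensity values. If we also wanted the matching areas, we would have to apply the same selection again:

my_regions['area'][selected]
array([ 4221,  8386, 18258,  4043, 49056, 45671, 20537, 46853, 43277,
       40768, 15090, 35404, 47642, 54978, 26774,   812, 54298, 29638])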

One step further: Pandas

In the above example, if we wanted to use one measurement to filter all the other measurements, we would have to repeat the selection multiple times. Ideally, we would put all the measured properties into a table, one column per property, and then do typical database operations to sub-select parts of the data. This can be done using yet another data structure, the DataFrame. These structures are provided by the Pandas library, the main data science library in Python. Here we give only a very brief insight into that library. First we import it:

import pandas as pd

To understand what a DataFrame is, let’s transform a plain Numpy array into a DataFrame:

np.random.seed(42)
my_array = np.random.randint(0,100, (3,5))
my_array
array([[51, 92, 14, 71, 60],
       [20, 82, 86, 74, 74],
       [87, 99, 23,  2, 21]])

We can simply turn this array into a DataFrame by using:

pd.DataFrame(my_array)
    0   1   2   3   4
0  51  92  14  71  60
1  20  82  86  74  74
2  87  99  23   2  21

We see that the array content is still there, but in addition we now have column and row names, currently just indices. We can however give the columns specific names:

my_df = pd.DataFrame(my_array, columns=['a', 'b', 'c', 'd', 'e'])
my_df
    a   b   c   d   e
0  51  92  14  71  60
1  20  82  86  74  74
2  87  99  23   2  21

The difference with Numpy arrays is that DataFrames can contain different types of information (text, numbers, etc.) and that they should really be seen as “organized” data. For example, we can recover a column of the table without resorting to the type of indexing we did before:

my_df['c']
0    14
1    86
2    23
Name: c, dtype: int64
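
To illustrate the point about mixed data types, here is a small DataFrame built from scratch (the fruit data are invented purely for this example):

pd.DataFrame({'fruit': ['apple', 'pear', 'banana'], 'weight': [50, 120, 90]})
    fruit  weight
0   apple      50
1    pear     120
2  banana      90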

Now, how can such a structure help us do the sort of data filtering we mentioned before? Just like with arrays, we can apply conditions. For example, we can ask: which data points in column c are smaller than 50?

my_df['c'] < 50
0     True
1    False
2     True
Name: c, dtype: bool

Similarly to what happened with arrays, we get a new column that is boolean. And again, just like with arrays, we can use it for logical indexing with square brackets:

my_df[my_df['c'] < 50]
    a   b   c   d   e
0  51  92  14  71  60
2  87  99  23   2  21

What happened here is that we kept only those entries in the table where the values in the c column were smaller than 50: we filtered all the properties (columns) in our table in one go!
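
Conditions can also be combined with the element-wise operators & (and) and | (or), with each condition in its own parentheses. For example, this illustrative query keeps rows where c is smaller than 50 and a is larger than 60:

my_df[(my_df['c'] < 50) & (my_df['a'] > 60)]
    a   b   c  d   e
2  87  99  23  2  21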

Back to our problem

In our analysis we ended up with a dictionary:

my_regions
{'label': array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
        18, 19, 20]),
 'area': array([ 4221,  8386, 18258,  4043,    73, 49056, 45671, 20537, 46853,
        43277, 40768, 15090, 35404, 47642, 54978, 26774,     4,   812,
        54298, 29638]),
 'mean_intensity': array([ 79.26510306,  65.99773432,  70.26109103,  63.02671284,
         47.43835616,  53.71450995,  53.8070986 ,  70.45391245,
         66.13772864,  41.4125517 ,  58.67503925,  44.19814447,
         74.77155124,  41.70190168,  35.9041071 ,  53.07787406,
        112.        , 102.9864532 ,  56.11020664,  66.28379108])}

We can also easily turn this dictionary into a DataFrame:

my_regions_df = pd.DataFrame(my_regions)
my_regions_df
    label   area  mean_intensity
0       1   4221       79.265103
1       2   8386       65.997734
2       3  18258       70.261091
3       4   4043       63.026713
4       5     73       47.438356
5       6  49056       53.714510
6       7  45671       53.807099
7       8  20537       70.453912
8       9  46853       66.137729
9      10  43277       41.412552
10     11  40768       58.675039
11     12  15090       44.198144
12     13  35404       74.771551
13     14  47642       41.701902
14     15  54978       35.904107
15     16  26774       53.077874
16     17      4      112.000000
17     18    812      102.986453
18     19  54298       56.110207
19     20  29638       66.283791

And now we can use what we have just learned: let’s remove tiny regions with an area smaller than 100:

my_regions_df[my_regions_df['area'] > 100]
    label   area  mean_intensity
0       1   4221       79.265103
1       2   8386       65.997734
2       3  18258       70.261091
3       4   4043       63.026713
5       6  49056       53.714510
6       7  45671       53.807099
7       8  20537       70.453912
8       9  46853       66.137729
9      10  43277       41.412552
10     11  40768       58.675039
11     12  15090       44.198144
12     13  35404       74.771551
13     14  47642       41.701902
14     15  54978       35.904107
15     16  26774       53.077874
17     18    812      102.986453
18     19  54298       56.110207
19     20  29638       66.283791

We see that we indeed removed two elements from the table, those at indices 4 and 16.
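
We can store this filtered table in a new variable (clean_regions is just a name we choose here) and, for example, redo the histogram of mean intensities without the spurious regions:

clean_regions = my_regions_df[my_regions_df['area'] > 100]
plt.hist(clean_regions['mean_intensity']);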

Pandas is a very powerful library and in this course we can’t offer more than this brief insight into how it can be useful for data post-processing. To learn more you can also visit this other course: https://guiwitz.github.io/DAVPy/Readme.html