Creating linked plots using Python's bokeh library

In this post, I am going to create interlinked, interactive scatter plots using the Bokeh library. Below is the description of the library from the homepage.

Bokeh is a Python interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of novel graphics in the style of D3.js, and to extend this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications.

I quite like its clean look and, more than anything, its interactive visualization capabilities. It also allows JavaScript-based browser interactions without learning JavaScript. I have been picking up what it can do from its documentation and the tutorials available on the Bokeh NBViewer Gallery.

Load libraries

First, I am going to load the libraries I need and run the output_notebook function from the bokeh library. The function configures Bokeh plot objects to be displayed in the notebook.

import pandas as pd
from bokeh.io import output_notebook, output_file, show
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource
from bokeh.models import CategoricalColorMapper
from bokeh.models import Plot, Range1d, HoverTool
from bokeh.layouts import gridplot
from bokeh.palettes import Set2
output_notebook()

Load data

To link plots, they need to share a common ColumnDataSource as their data source. You can create one from a pandas DataFrame or a dictionary. I am going to use the diabetes dataset originally from here to demonstrate this. Below is a brief description of the dataset from the original source.

Ten baseline variables, age, sex, body mass index, average blood pressure, and six blood serum measurements were obtained for each of n = 442 diabetes patients, as well as the response of interest, a quantitative measure of disease progression one year after baseline.

I am going to plot each of the 9 numeric features against the response variable on individual scatter plots. In the code block below, the dataset is loaded as a pandas DataFrame and a ColumnDataSource is defined from that DataFrame.

df = pd.read_table('../data/diabetes_tab.txt')
# assuming 1 is female and 2 is male
df['Gender'] = ['FEMALE' if x == 1 else 'MALE'
                for x in df.SEX.values]
df.rename(columns={'AGE': 'Age'}, inplace=True)
one_source = ColumnDataSource(df)
df.head()

	Age	SEX	BMI	BP	S1	S2	S3	S4	S5	S6	Y	Gender
0	59	2	32.1	101.0	157	93.2	38.0	4.0	4.8598	87	151	MALE
1	48	1	21.6	87.0	183	103.2	70.0	3.0	3.8918	69	75	FEMALE
2	72	2	30.5	93.0	156	93.6	41.0	4.0	4.6728	85	141	MALE
3	24	1	25.3	84.0	198	131.4	40.0	5.0	4.8903	89	206	FEMALE
4	50	1	23.0	101.0	192	125.4	52.0	4.0	4.2905	80	135	FEMALE
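
The same kind of source can also be built from a plain dictionary instead of a DataFrame. Below is a minimal sketch using the first three rows of the table above; the name small_source is made up for illustration. As far as I can tell, a source built from a DataFrame additionally gets a column for the DataFrame index, whereas a dictionary gives exactly the columns you pass in.

```python
from bokeh.models import ColumnDataSource

# Each key becomes a column; all lists must have the same length.
# Values are copied from the first three rows of the table above.
data = {
    'Age': [59, 48, 72],
    'BMI': [32.1, 21.6, 30.5],
    'Y': [151, 75, 141],
    'Gender': ['MALE', 'FEMALE', 'MALE'],
}
small_source = ColumnDataSource(data=data)

# Glyphs later refer to these columns by name.
print(sorted(small_source.column_names))
```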

Create an interactive scatter plot

Next, I am going to create a single scatter plot of age against the response variable. I am going to add a hover effect showing the x, y values of each point, along with the following interaction tools:

Box select: Highlight data points selected in a rectangular box by dragging the mouse
Lasso select: Highlight data points selected in a lasso shape by dragging the mouse
Tap: Highlight selected data points by clicking the mouse
Wheel zoom: Zoom in and out of the plot using the mouse wheel
Reset: Reset the plot to its default state

# define a color map for the Gender column
cmap = CategoricalColorMapper(
    factors=('FEMALE', 'MALE'),
    palette=Set2[3]
)
# define a function to enable reuse
def plot_diabetes(x, width=480, height=320,
                  legend=None, legend_location=None,
                  legend_orientation='vertical'):
    hover = HoverTool(
        tooltips=[('Index', '$index'),
                  (x, '$x'),
                  ('Progression', '$y'),
                  ('Gender', '@Gender')
                 ])
    tools = [hover, 'box_select', 'lasso_select', 'tap',
             'wheel_zoom', 'reset', 'help']
    plt = figure(width=width, height=height,
                 title=x +' vs. diabetes progression',
                 tools=tools)
    plt.circle(x, 'Y', alpha=0.8, source=one_source,
               fill_color={'field': 'Gender', 'transform': cmap},
               line_color={'field': 'Gender', 'transform': cmap},
               # highlight when selected
               selection_alpha=1,
               selection_fill_color={'field': 'Gender', 'transform': cmap},
               selection_line_color={'field': 'Gender', 'transform': cmap},
               # mute when not selected
               nonselection_alpha=0.2,
               nonselection_fill_color={'field': 'Gender', 'transform': cmap},
               nonselection_line_color=None,
               legend=legend)
    plt.xaxis.axis_label = x
    plt.xaxis.axis_label_text_font_style = 'normal'
    plt.yaxis.axis_label = 'Diabetes progression'
    plt.yaxis.axis_label_text_font_style = 'normal'
    if legend:
        plt.legend.location = legend_location
        plt.legend.orientation = legend_orientation
        plt.legend.background_fill_alpha = 0.7
    return plt

p1 = plot_diabetes('Age', legend='Gender', legend_location='top_left',
                   legend_orientation='horizontal')
output_file('../html/01-bokeh-plot-example-plot-01.html')
show(p1)

You can now see an interactive scatter plot. A toolbar is placed beside the plot, where you can switch the tools we included on and off. In particular, hovering over a data point shows its values. You can choose which values are shown by passing a list of (label, value) pairs as tooltips to the HoverTool object.

You can refer to columns in the source dataset by prefixing their names with @. Names starting with $ are “special fields” such as the index and the coordinates, as used above. (There is also a color special field, but apparently its values are pulled from the data source, not from the figure’s fill_color.)
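
One subtlety: $x and $y report the position of the mouse cursor in data coordinates, so the numbers shown can differ slightly from the values stored for the point under the cursor. Below is a sketch of an alternative that reads the stored values with @ instead. make_tooltips is a hypothetical helper written for this post, not part of Bokeh, and it assumes the response column is named Y as in the DataFrame above.

```python
def make_tooltips(x_column, y_column='Y'):
    """Build (label, value) pairs for a HoverTool (hypothetical helper).

    '@{name}' reads the value stored in the data source column, while
    '$index' is a special field giving the row number of the point.
    The braces allow column names that contain spaces.
    """
    return [
        ('Index', '$index'),
        (x_column, '@{' + x_column + '}'),
        ('Progression', '@{' + y_column + '}'),
        ('Gender', '@Gender'),
    ]


tooltips = make_tooltips('Age')
print(tooltips)
```

The returned list could be passed as HoverTool(tooltips=make_tooltips(x)) inside plot_diabetes in place of the hard-coded list.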

Create multiple linked plots

Now, I am going to create multiple plots and place them in a single grid using bokeh library’s gridplot. The plots are linked by a single data source. Selecting data points in one plot will highlight the same data points in all.

plots = [plot_diabetes(x, 240, 180)
         for x in df.columns
         if x not in ['SEX', 'Gender', 'Y']]

# create an empty plot with only the title
gtitle = figure(width=240, height=80, title='Linked scatter plots')
gtitle.circle(0, 0, fill_color=None, line_color=None)
gtitle.title.text_font_size = '18px'
gtitle.border_fill_color = None
gtitle.grid.visible = False
gtitle.axis.visible = False
gtitle.outline_line_color = None

# create an empty plot with only the legend
glegend = figure(width=240, height=80, title=None)
glegend.circle(0,0, fill_color=Set2[3][0], line_color=Set2[3][0], legend='FEMALE')
glegend.circle(0,0, fill_color=Set2[3][1], line_color=Set2[3][1], legend='MALE')
glegend.border_fill_color = None
glegend.grid.visible = False
glegend.axis.visible = False
glegend.outline_line_color = None
glegend.legend.border_line_color = None
glegend.legend.location = 'center'

output_file('../html/01-bokeh-plot-example-plot-02.html')
show(gridplot([gtitle, None, glegend] + plots, ncols=3))

You can now see nine different plots linked with a single data source. When you select any data points in one plot the same data points are highlighted across all while the rest are ‘muted’.
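
The linking works because the selection is stored on the shared data source rather than on any one figure. Below is a small sketch of that idea with made-up columns a, b and y. It uses scatter instead of circle and sets the selection from Python rather than with the mouse, both of which are my choices for the illustration.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Toy data; the column names are invented for this example.
source = ColumnDataSource(data={
    'a': [1, 2, 3, 4],
    'b': [4, 3, 2, 1],
    'y': [10, 20, 30, 40],
})

# Two figures whose glyphs read from the same source object.
left = figure(tools='box_select,tap,reset')
right = figure(tools='box_select,tap,reset')
r_left = left.scatter('a', 'y', source=source)
r_right = right.scatter('b', 'y', source=source)

# Selecting rows 0 and 2 on the source affects every glyph that
# uses it, because both renderers point at the same object.
source.selected.indices = [0, 2]
print(r_left.data_source is r_right.data_source)
print(r_right.data_source.selected.indices)
```

If each figure were given its own ColumnDataSource, even one built from the same DataFrame, the selections would no longer be shared.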

This could be useful when inspecting data with multiple dimensions. For example, when I clicked on the person with the highest S1 measurement, I could see that he also had the highest measurements of S2 and S4. Besides, it is just fun playing with these plots. I am looking forward to going through more of the library examples and tutorials.
