dotplot_util

Collection of utility functions for the pairwise dotplots

This module contains a collection of utility functions for the pairwise dotplots, as well as some more general functions that are used in the dotplot generation process. I wrote documentation for the functions here, but I didn’t write tests for them. I’m not sure this page is very valuable to end users.


source

highlight_cluster

 highlight_cluster (clusters:numpy.ndarray, cluster:Optional[str]=None,
                    bg:Union[str,tuple]='black',
                    hl:Union[str,tuple]='red')

Highlight a cluster in a list of clusters by setting the color of the cluster to hl and the color of the rest to bg.

Type Default Details
clusters ndarray The array of cluster names.
cluster typing.Optional[str] None The cluster to highlight (default: None).
bg typing.Union[str, tuple] black The background color. Accepts all matplotlib-compatible color formats (default: “black”).
hl typing.Union[str, tuple] red The highlight color. Accepts all matplotlib-compatible color formats (default: “red”).
Returns ndarray The array of colors, with the same length as clusters.

source

unique_genes

 unique_genes (connections:numpy.ndarray)

Extract the unique gene names from an array of connections.

Type Details
connections ndarray The array of connections. Columns should be (query_genes, target_genes, connection_strength). The last column is optional.
Returns ndarray The array of unique genes.

source

map_to_colormap

 map_to_colormap (x:numpy.ndarray,
                  cmap:Union[str,matplotlib.colors.Colormap]='magma_r',
                  vmin:float=0, vmax:Optional[float]=None)

Map an array of values to a color palette.

Type Default Details
x ndarray The array to map.
cmap typing.Union[str, matplotlib.colors.Colormap] magma_r The color map to use. Should be a matplotlib colormap object or a string with the name of a matplotlib colormap (default: “magma_r”).
vmin float 0 The value to obtain the minimum color in the colormap. Should be <= np.min(x) to avoid truncation (default: 0).
vmax typing.Optional[float] None The value to obtain the maximum color in the colormap. Should be >= np.max(x), and will use np.max(x) if set to None (default: None).
Returns ndarray Array of RGBA values with a shape of x.shape + (4, ).

source

map_array_to_color

 map_array_to_color (x:numpy.ndarray, palette:matplotlib.colors.Colormap,
                     xmax:Optional[float]=None)

Map an array of values to a color palette.

Type Default Details
x ndarray The array to map.
palette Colormap The color map to use. Should be a matplotlib colormap object.
xmax typing.Optional[float] None The maximum value to use for normalization. Should be >= np.max(x), and will use np.max(x) if set to None (default: None).
Returns ndarray Array of RGBA values with a shape of x.shape + (4, ).

source

add_homology_context

 add_homology_context (connections:numpy.ndarray,
                       orthology:pandas.core.frame.DataFrame)

Add homology context to the given connections based on the orthology information.

Type Details
connections ndarray The connections between genes. The columns should be (query_gene, target_gene).
orthology DataFrame The orthology information as a DataFrame.
Returns ndarray The connections array with homology context added. The columns will be (query_gene,
target_gene, connection_strength), and the values in connection_strength will depend on the
content of the orthology DataFrame.

source

plot_dotplot

 plot_dotplot (query_avg_expr:numpy.ndarray,
               target_avg_expr:numpy.ndarray,
               query_perc_expr:numpy.ndarray,
               target_perc_expr:numpy.ndarray, query_genes:List[str],
               target_genes:List[str], connections:numpy.ndarray,
               query_cluster_colors:Dict[str,str],
               target_cluster_colors:Dict[str,str],
               query_gene_colors:Dict[str,str],
               target_gene_colors:Dict[str,str], query_species:str,
               target_species:str, x_offset:float=1, y_offset:float=0,
               grid_offset:int=30, query_clustering:str='leiden',
               target_clustering:str='leiden',
               output:str='./paired_dotplot.png',
               title:Optional[str]=None, title_font_size:int=16,
               center:bool=True,
               cmap:matplotlib.colors.Colormap='magma_r')

Plot the paired dotplot based on the given data.

Type Default Details
query_avg_expr ndarray Array representing the average expression values of query genes.
target_avg_expr ndarray Array representing the average expression values of target genes.
query_perc_expr ndarray Array representing the percentage expression values of query genes.
target_perc_expr ndarray Array representing the percentage expression values of target genes.
query_genes typing.List[str] List of query gene names.
target_genes typing.List[str] List of target gene names.
connections ndarray An array where each row contains two genes and (optionally) the strength of their
connection.
query_cluster_colors typing.Dict[str, str] Dictionary mapping query cluster names to their colors.
target_cluster_colors typing.Dict[str, str] Dictionary mapping target cluster names to their colors.
query_gene_colors typing.Dict[str, str] Dictionary mapping query gene names to their colors.
target_gene_colors typing.Dict[str, str] Dictionary mapping target gene names to their colors.
query_species str Species name of the query genes.
target_species str Species name of the target genes.
x_offset float 1 Offset for the x-axis (default: 1).
y_offset float 0 Offset for the y-axis (default: 0).
grid_offset int 30 Offset for the grid spacing (default: 30).
query_clustering str leiden Clustering method for the query genes (default: “leiden”).
target_clustering str leiden Clustering method for the target genes (default: “leiden”).
output str ./paired_dotplot.png Output file path for the plot (default: “./paired_dotplot.png”).
title typing.Optional[str] None Title of the plot (default: None).
title_font_size int 16 Font size of the plot title (default: 16).
center bool True Whether to center the dotplots when the number of genes exceeds the maximum (default: True).
cmap Colormap magma_r
Returns None

source

add_connections

 add_connections (fig:matplotlib.figure.Figure, connections:numpy.ndarray,
                  query_gene_names:List[str],
                  query_gene_colors:Dict[str,str], label_offset:float)

Add connections between genes to the given paired dotplot figure.

Type Details
fig Figure The paired dotplot figure to which the connections will be added.
connections ndarray An array where each row contains two genes and (optionally) the strength of their
connection.
query_gene_names typing.List[str] The list of query gene names.
query_gene_colors typing.Dict[str, str] The dictionary mapping query gene names to their colors.
label_offset float The offset for label positioning.
Returns None

source

make_dotplot

 make_dotplot (ax:matplotlib.axes._axes.Axes, avg:numpy.ndarray,
               perc:numpy.ndarray, gene_names:List[str], species:str,
               clustering:str, clust_color:List[str],
               gene_color:List[str], side:str='left',
               cmap:matplotlib.colors.Colormap='magma_r')

Make a dotplot on the given Axes object based on the average and percentage expression values.

Type Default Details
ax Axes The Axes object on which to create the dotplot.
avg ndarray The average expression values.
perc ndarray The percentage expression values.
gene_names typing.List[str] The list of gene names.
species str The species name.
clustering str The clustering information.
clust_color typing.List[str] The list of colors for clusters.
gene_color typing.List[str] The list of colors for genes.
side str left The side to place the y-axis labels, either “left” or “right” (default: “left”).
cmap Colormap magma_r
Returns None

source

plot_colorbar_legend

 plot_colorbar_legend (cbar_legend:matplotlib.axes._axes.Axes,
                       query_avg_expr:numpy.ndarray,
                       target_avg_expr:numpy.ndarray,
                       cmap:matplotlib.colors.Colormap='magma_r')

Plot the colorbar legend based on the average expression values of query and target genes.

Type Default Details
cbar_legend Axes The Axes object representing the colorbar legend.
query_avg_expr ndarray Array representing the average expression values of query genes.
target_avg_expr ndarray Array representing the average expression values of target genes.
cmap Colormap magma_r The Colormap instance or registered colormap name used to map scalar data to colors
(default: “magma_r”).
Returns None

source

plot_dot_legend

 plot_dot_legend (dot_legend, size_exponent=1.5, dot_size=200)

Create the dotplot legend, explaining dot size.

Type Default Details
dot_legend matplotlib axis The subplot of the grid that contains the dotplot legend.
size_exponent float 1.5 The exponent to raise the fraction of cells in a group to, to get the dot size. The default
is 1.5.
dot_size int 200 The size of the largest dot. The default is 200.

source

get_dot_color

 get_dot_color (query:anndata._core.anndata.AnnData,
                target:anndata._core.anndata.AnnData,
                query_clustering:str, target_clustering:str,
                query_genes:Optional[numpy.ndarray]=None,
                target_genes:Optional[numpy.ndarray]=None,
                query_gene_names:Optional[numpy.ndarray]=None,
                target_gene_names:Optional[numpy.ndarray]=None,
                layer:Optional[str]=None)

Calculate average expression in each cluster and translate that to dot color for the dotplot. Note that this function does not know what you did with the matrix before; if you have log-transformed the data it will calculate an average of logs, not the log of the exp-transformed average.

Type Default Details
query AnnData The query dataset.
target AnnData The target dataset.
query_clustering str The .obs column name to use for the query dataset.
target_clustering str The .obs column name to use for the target dataset.
query_genes typing.Optional[numpy.ndarray] None Array of query genes to subset the data, if any. If None, use all genes (default: None).
target_genes typing.Optional[numpy.ndarray] None Array of target genes to subset the data, if any. If None, use all genes (default: None).
query_gene_names typing.Optional[numpy.ndarray] None Array of query gene names (default: None).
target_gene_names typing.Optional[numpy.ndarray] None Array of target gene names (default: None).
layer typing.Optional[str] None The layer to use for the average expression calculation. If not specified, it will use the
.X slot of the AnnData objects. It is vital to set this correctly to avoid calculating
average expression on log1p-transformed data (default: None).
Returns typing.Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame] A tuple containing the dot color values for query and target datasets, respectively.

source

get_dot_size

 get_dot_size (query:pandas.core.frame.DataFrame,
               target:pandas.core.frame.DataFrame, query_clustering:str,
               target_clustering:str,
               query_genes:Optional[numpy.ndarray]=None,
               target_genes:Optional[numpy.ndarray]=None,
               query_gene_names:Optional[numpy.ndarray]=None,
               target_gene_names:Optional[numpy.ndarray]=None)

Calculate which percentage of cells in each cluster express each gene, and translate that to dot size for the dotplot.

Type Default Details
query DataFrame The query dataset.
target DataFrame The target dataset.
query_clustering str The .obs column name to use for the query dataset.
target_clustering str The .obs column name to use for the target dataset.
query_genes typing.Optional[numpy.ndarray] None Array of query genes to subset the data, if any. If None, use all genes (default: None).
target_genes typing.Optional[numpy.ndarray] None Array of target genes to subset the data, if any. If None, use all genes (default: None).
query_gene_names typing.Optional[numpy.ndarray] None Array of query gene names (default: None).
target_gene_names typing.Optional[numpy.ndarray] None Array of target gene names (default: None).
Returns typing.Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame] A tuple containing the dot size values for query and target datasets, respectively.

source

feature_colors

 feature_colors (components:numpy.ndarray, query_G:int, seed:int=42)

Assign colors to the components based on the given array of components.

Type Default Details
components ndarray The array of components.
query_G int The number of components for the query genes.
seed int 42 The seed value for the random number generator (default: 42).
Returns typing.Tuple[numpy.ndarray, numpy.ndarray] A tuple containing the colored components for query genes and target genes, respectively.

source

gene_order

 gene_order (full_adjacency:numpy.ndarray, components:numpy.ndarray,
             query_G:int)

Calculate the order of genes based on the given full adjacency matrix and components. Highly connected genes are placed first, genes without any connections are randomly ordered in the bottom of the plot.

Type Details
full_adjacency ndarray The full adjacency matrix represented as a 2D numpy array.
components ndarray An array representing the components.
query_G int The number of query genes.
Returns typing.Tuple[numpy.ndarray, numpy.ndarray] A tuple containing the query gene order and the target gene order as numpy arrays.

source

calculate_adjacency_matrix

 calculate_adjacency_matrix (connections:numpy.ndarray,
                             query_genes:List[Any],
                             target_genes:List[Any])

Calculate the adjacency matrix based on the given connections, query genes, and target genes.

Type Details
connections ndarray The 2D array representing the connections between genes. Each row contains two gene
identifiers indicating a connection, and optionally the strength of that connection.
query_genes typing.List[typing.Any] A list of genes that act as queries.
target_genes typing.List[typing.Any] A list of genes that act as targets.
Returns ndarray The adjacency matrix represented as a 2D numpy array. It has dimensions (query_G + target_G)
x (query_G + target_G), where query_G and target_G are the lengths of query_genes and
target_genes, respectively.

source

label_pos

 label_pos (display_coords:Dict[str,Tuple[float,float,float,float]],
            key:str, side:str='left')

Get the edge coordinates of a label. Keep either the left or the right end of the word.

Type Default Details
display_coords typing.Dict[str, typing.Tuple[float, float, float, float]] A dictionary that holds the window extents of tick labels.
key str The label to retrieve; a gene name.
side str left One of “left” or “right”; depending on orientation will return the leftmost or rightmost
position of the label (default: “left”).
Returns typing.Tuple[float, float] A tuple containing the x and y coordinates of the label.

source

prepare_dotplot

 prepare_dotplot (avg_expr:pandas.core.frame.DataFrame,
                  perc_expr:pandas.core.frame.DataFrame,
                  cmap:Union[str,matplotlib.colors.Colormap]='magma_r',
                  vmin:float=0, vmax:Optional[float]=None,
                  size_exponent:float=1.5, dot_size:float=200)

Pivots average expression and percent expressed tables to make them dotplot-friendly.

Type Default Details
avg_expr DataFrame Data frame that holds average expression for all genes and all clusters.
perc_expr DataFrame Data frame that tracks the percentage of cells expressing each gene in every cluster.
cmap typing.Union[str, matplotlib.colors.Colormap] magma_r The Colormap instance or registered colormap name used to map scalar data to colors
(default: “magma_r”).
vmin float 0 Minimum average expression value to show (default: 0).
vmax typing.Optional[float] None Maximum average expression value to show (default: maximum average expr. value).
size_exponent float 1.5 Dot size is computed as fraction ** size_exponent * dot_size (default: 1.5).
dot_size float 200 The size of the largest dot (default: 200).
Returns typing.Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, numpy.ndarray] A tuple containing the melted average expression data frame, the melted percentage
expression data frame, and the array of RGBA-coded color values for the average expression
in a cluster/gene combination, according to the input color map.