dotplot_util

Collection of utility functions for the pairwise dotplots

This module contains a collection of utility functions for the pairwise dotplots, as well as some more general functions that are used in the dotplot generation process. I wrote documentation for the functions here, but I didn’t write tests for them. I’m not sure this page is very valuable to end users.

highlight_cluster

 highlight_cluster (clusters:numpy.ndarray, cluster:Optional[str]=None,
                    bg:Union[str,tuple]='black',
                    hl:Union[str,tuple]='red')

Highlight a cluster in a list of clusters by setting the color of the cluster to hl and the color of the rest to bg.

	Type	Default	Details
clusters	ndarray		The array of cluster names.
cluster	typing.Optional[str]	None	The cluster to highlight (default: None).
bg	typing.Union[str, tuple]	black	The background color. Accepts all matplotlib-compatible color formats (default: “black”).
hl	typing.Union[str, tuple]	red	The highlight color. Accepts all matplotlib-compatible color formats (default: “red”).
Returns	ndarray		The array of colors, with the same length as `clusters`.

unique_genes

 unique_genes (connections:numpy.ndarray)

Extract the unique gene names from an array of connections.

	Type	Details
connections	ndarray	The array of connections. Columns should be (query_genes, target_genes, connection_strength). The last column is optional.
Returns	ndarray	The array of unique genes.

map_to_colormap

 map_to_colormap (x:numpy.ndarray,
                  cmap:Union[str,matplotlib.colors.Colormap]='magma_r',
                  vmin:float=0, vmax:Optional[float]=None)

Map an array of values to a color palette.

	Type	Default	Details
x	ndarray		The array to map.
cmap	typing.Union[str, matplotlib.colors.Colormap]	magma_r	The color map to use. Should be a matplotlib colormap object or a string with the name of a matplotlib colormap (default: “magma_r”).
vmin	float	0	The value to obtain the minimum color in the colormap. Should be <= np.min(x) to avoid truncation (default: 0).
vmax	typing.Optional[float]	None	The value to obtain the maximum color in the colormap. Should be >= np.max(x), and will use np.max(x) if set to `None` (default: None).
Returns	ndarray		Array of RGBA values with a shape of `x.shape + (4, )`.

map_array_to_color

 map_array_to_color (x:numpy.ndarray, palette:matplotlib.colors.Colormap,
                     xmax:Optional[float]=None)

Map an array of values to a color palette.

	Type	Default	Details
x	ndarray		The array to map.
palette	Colormap		The color map to use. Should be a matplotlib colormap object.
xmax	typing.Optional[float]	None	The maximum value to use for normalization. Should be >= np.max(x), and will use np.max(x) if set to `None` (default: None).
Returns	ndarray		Array of RGBA values with a shape of `x.shape + (4, )`.

add_homology_context

 add_homology_context (connections:numpy.ndarray,
                       orthology:pandas.core.frame.DataFrame)

Add homology context to the given connections based on the orthology information.

	Type	Details
connections	ndarray	The connections between genes. The columns should be (query_gene, target_gene).
orthology	DataFrame	The orthology information as a DataFrame.
Returns	ndarray	The connections array with homology context added. The columns will be (query_gene, target_gene, connection_strength), and the values in connection_strength will depend on the content of the orthology DataFrame.

plot_dotplot

 plot_dotplot (query_avg_expr:numpy.ndarray,
               target_avg_expr:numpy.ndarray,
               query_perc_expr:numpy.ndarray,
               target_perc_expr:numpy.ndarray, query_genes:List[str],
               target_genes:List[str], connections:numpy.ndarray,
               query_cluster_colors:Dict[str,str],
               target_cluster_colors:Dict[str,str],
               query_gene_colors:Dict[str,str],
               target_gene_colors:Dict[str,str], query_species:str,
               target_species:str, x_offset:float=1, y_offset:float=0,
               grid_offset:int=30, query_clustering:str='leiden',
               target_clustering:str='leiden',
               output:str='./paired_dotplot.png',
               title:Optional[str]=None, title_font_size:int=16,
               center:bool=True,
               cmap:matplotlib.colors.Colormap='magma_r')

Plot the paired dotplot based on the given data.

	Type	Default	Details
query_avg_expr	ndarray		Array representing the average expression values of query genes.
target_avg_expr	ndarray		Array representing the average expression values of target genes.
query_perc_expr	ndarray		Array representing the percentage expression values of query genes.
target_perc_expr	ndarray		Array representing the percentage expression values of target genes.
query_genes	typing.List[str]		List of query gene names.
target_genes	typing.List[str]		List of target gene names.
connections	ndarray		An array where each row contains two genes and (optionally) the strength of their connection.
query_cluster_colors	typing.Dict[str, str]		Dictionary mapping query cluster names to their colors.
target_cluster_colors	typing.Dict[str, str]		Dictionary mapping target cluster names to their colors.
query_gene_colors	typing.Dict[str, str]		Dictionary mapping query gene names to their colors.
target_gene_colors	typing.Dict[str, str]		Dictionary mapping target gene names to their colors.
query_species	str		Species name of the query genes.
target_species	str		Species name of the target genes.
x_offset	float	1	Offset for the x-axis (default: 1).
y_offset	float	0	Offset for the y-axis (default: 0).
grid_offset	int	30	Offset for the grid spacing (default: 30).
query_clustering	str	leiden	Clustering method for the query genes (default: “leiden”).
target_clustering	str	leiden	Clustering method for the target genes (default: “leiden”).
output	str	./paired_dotplot.png	Output file path for the plot (default: “./paired_dotplot.png”).
title	typing.Optional[str]	None	Title of the plot (default: None).
title_font_size	int	16	Font size of the plot title (default: 16).
center	bool	True	Whether to center the dotplots when the number of genes exceeds the maximum (default: True).
cmap	Colormap	magma_r
Returns	None

add_connections

 add_connections (fig:matplotlib.figure.Figure, connections:numpy.ndarray,
                  query_gene_names:List[str],
                  query_gene_colors:Dict[str,str], label_offset:float)

Add connections between genes to the given paired dotplot figure.

	Type	Details
fig	Figure	The paired dotplot figure to which the connections will be added.
connections	ndarray	An array where each row contains two genes and (optionally) the strength of their connection.
query_gene_names	typing.List[str]	The list of query gene names.
query_gene_colors	typing.Dict[str, str]	The dictionary mapping query gene names to their colors.
label_offset	float	The offset for label positioning.
Returns	None

make_dotplot

 make_dotplot (ax:matplotlib.axes._axes.Axes, avg:numpy.ndarray,
               perc:numpy.ndarray, gene_names:List[str], species:str,
               clustering:str, clust_color:List[str],
               gene_color:List[str], side:str='left',
               cmap:matplotlib.colors.Colormap='magma_r')

Make a dotplot on the given Axes object based on the average and percentage expression values.

	Type	Default	Details
ax	Axes		The Axes object on which to create the dotplot.
avg	ndarray		The average expression values.
perc	ndarray		The percentage expression values.
gene_names	typing.List[str]		The list of gene names.
species	str		The species name.
clustering	str		The clustering information.
clust_color	typing.List[str]		The list of colors for clusters.
gene_color	typing.List[str]		The list of colors for genes.
side	str	left	The side to place the y-axis labels, either “left” or “right” (default: “left”).
cmap	Colormap	magma_r
Returns	None

plot_colorbar_legend

 plot_colorbar_legend (cbar_legend:matplotlib.axes._axes.Axes,
                       query_avg_expr:numpy.ndarray,
                       target_avg_expr:numpy.ndarray,
                       cmap:matplotlib.colors.Colormap='magma_r')

Plot the colorbar legend based on the average expression values of query and target genes.

	Type	Default	Details
cbar_legend	Axes		The Axes object representing the colorbar legend.
query_avg_expr	ndarray		Array representing the average expression values of query genes.
target_avg_expr	ndarray		Array representing the average expression values of target genes.
cmap	Colormap	magma_r	The Colormap instance or registered colormap name used to map scalar data to colors (default: “magma_r”).
Returns	None

plot_dot_legend

 plot_dot_legend (dot_legend, size_exponent=1.5, dot_size=200)

Create the dotplot legend, explaining dot size.

	Type	Default	Details
dot_legend	matplotlib `axis`		The subplot of the grid that contains the dotplot legend.
size_exponent	float	1.5	The exponent to raise the fraction of cells in a group to, to get the dot size. The default is 1.5.
dot_size	int	200	The size of the largest dot. The default is 200.

get_dot_color

 get_dot_color (query:anndata._core.anndata.AnnData,
                target:anndata._core.anndata.AnnData,
                query_clustering:str, target_clustering:str,
                query_genes:Optional[numpy.ndarray]=None,
                target_genes:Optional[numpy.ndarray]=None,
                query_gene_names:Optional[numpy.ndarray]=None,
                target_gene_names:Optional[numpy.ndarray]=None,
                layer:Optional[str]=None)

Calculate average expression in each cluster and translate that to dot color for the dotplot. Note that this function does not know what you did with the matrix before; if you have log-transformed the data it will calculate an average of logs, not the log of the exp-transformed average.

	Type	Default	Details
query	AnnData		The query dataset.
target	AnnData		The target dataset.
query_clustering	str		The .obs column name to use for the query dataset.
target_clustering	str		The .obs column name to use for the target dataset.
query_genes	typing.Optional[numpy.ndarray]	None	Array of query genes to subset the data, if any. If None, use all genes (default: None).
target_genes	typing.Optional[numpy.ndarray]	None	Array of target genes to subset the data, if any. If None, use all genes (default: None).
query_gene_names	typing.Optional[numpy.ndarray]	None	Array of query gene names (default: None).
target_gene_names	typing.Optional[numpy.ndarray]	None	Array of target gene names (default: None).
layer	typing.Optional[str]	None	The layer to use for the average expression calculation. If not specified, it will use the `.X` slot of the `AnnData` objects. It is vital to set this correctly to avoid calculating average expression on log1p-transformed data (default: None).
Returns	typing.Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]		A tuple containing the dot color values for query and target datasets, respectively.

get_dot_size

 get_dot_size (query:pandas.core.frame.DataFrame,
               target:pandas.core.frame.DataFrame, query_clustering:str,
               target_clustering:str,
               query_genes:Optional[numpy.ndarray]=None,
               target_genes:Optional[numpy.ndarray]=None,
               query_gene_names:Optional[numpy.ndarray]=None,
               target_gene_names:Optional[numpy.ndarray]=None)

Calculate which percentage of cells in each cluster express each gene, and translate that to dot size for the dotplot.

	Type	Default	Details
query	DataFrame		The query dataset.
target	DataFrame		The target dataset.
query_clustering	str		The .obs column name to use for the query dataset.
target_clustering	str		The .obs column name to use for the target dataset.
query_genes	typing.Optional[numpy.ndarray]	None	Array of query genes to subset the data, if any. If None, use all genes (default: None).
target_genes	typing.Optional[numpy.ndarray]	None	Array of target genes to subset the data, if any. If None, use all genes (default: None).
query_gene_names	typing.Optional[numpy.ndarray]	None	Array of query gene names (default: None).
target_gene_names	typing.Optional[numpy.ndarray]	None	Array of target gene names (default: None).
Returns	typing.Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]		A tuple containing the dot size values for query and target datasets, respectively.

feature_colors

 feature_colors (components:numpy.ndarray, query_G:int, seed:int=42)

Assign colors to the components based on the given array of components.

	Type	Default	Details
components	ndarray		The array of components.
query_G	int		The number of components for the query genes.
seed	int	42	The seed value for the random number generator (default: 42).
Returns	typing.Tuple[numpy.ndarray, numpy.ndarray]		A tuple containing the colored components for query genes and target genes, respectively.

gene_order

 gene_order (full_adjacency:numpy.ndarray, components:numpy.ndarray,
             query_G:int)

Calculate the order of genes based on the given full adjacency matrix and components. Highly connected genes are placed first, genes without any connections are randomly ordered in the bottom of the plot.

	Type	Details
full_adjacency	ndarray	The full adjacency matrix represented as a 2D numpy array.
components	ndarray	An array representing the components.
query_G	int	The number of query genes.
Returns	typing.Tuple[numpy.ndarray, numpy.ndarray]	A tuple containing the query gene order and the target gene order as numpy arrays.

calculate_adjacency_matrix

 calculate_adjacency_matrix (connections:numpy.ndarray,
                             query_genes:List[Any],
                             target_genes:List[Any])

Calculate the adjacency matrix based on the given connections, query genes, and target genes.

	Type	Details
connections	ndarray	The 2D array representing the connections between genes. Each row contains two gene identifiers indicating a connection, and optionally the strength of that connection.
query_genes	typing.List[typing.Any]	A list of genes that act as queries.
target_genes	typing.List[typing.Any]	A list of genes that act as targets.
Returns	ndarray	The adjacency matrix represented as a 2D numpy array. It has dimensions (query_G + target_G) x (query_G + target_G), where query_G and target_G are the lengths of query_genes and target_genes, respectively.

label_pos

 label_pos (display_coords:Dict[str,Tuple[float,float,float,float]],
            key:str, side:str='left')

Get the edge coordinates of a label. Keep either the left or the right end of the word.

	Type	Default	Details
display_coords	typing.Dict[str, typing.Tuple[float, float, float, float]]		A dictionary that holds the window extents of tick labels.
key	str		The label to retrieve; a gene name.
side	str	left	One of “left” or “right”; depending on orientation will return the leftmost or rightmost position of the label (default: “left”).
Returns	typing.Tuple[float, float]		A tuple containing the x and y coordinates of the label.

prepare_dotplot

 prepare_dotplot (avg_expr:pandas.core.frame.DataFrame,
                  perc_expr:pandas.core.frame.DataFrame,
                  cmap:Union[str,matplotlib.colors.Colormap]='magma_r',
                  vmin:float=0, vmax:Optional[float]=None,
                  size_exponent:float=1.5, dot_size:float=200)

Pivots average expression and percent expressed tables to make them dotplot-friendly.

	Type	Default	Details
avg_expr	DataFrame		Data frame that holds average expression for all genes and all clusters.
perc_expr	DataFrame		Data frame that tracks the percentage of cells expressing each gene in every cluster.
cmap	typing.Union[str, matplotlib.colors.Colormap]	magma_r	The Colormap instance or registered colormap name used to map scalar data to colors (default: “magma_r”).
vmin	float	0	Minimum average expression value to show (default: 0).
vmax	typing.Optional[float]	None	Maximum average expression value to show (default: maximum average expr. value).
size_exponent	float	1.5	Dot size is computed as fraction ** size_exponent * dot_size (default: 1.5).
dot_size	float	200	The size of the largest dot (default: 200).
Returns	typing.Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, numpy.ndarray]		A tuple containing the melted average expression data frame, the melted percentage expression data frame, and the array of RGBA-coded color values for the average expression in a cluster/gene combination, according to the input color map.