dotplot_util
This module collects utility functions for the pairwise dotplots, along with some more general functions used during dotplot generation. I documented the functions here but did not write tests for them, and I am not sure this page is very valuable to end users.
highlight_cluster
def highlight_cluster(
clusters:ndarray, # The array of cluster names.
cluster:Optional=None, # The cluster to highlight (default: None).
bg:Union='black', # The background color. Accepts all matplotlib-compatible color formats (default: "black").
hl:Union='red', # The highlight color. Accepts all matplotlib-compatible color formats (default: "red").
)->ndarray: # The array of colors, with the same length as `clusters`.
Highlight one cluster in an array of cluster names by setting its color to `hl` and the color of every other entry to `bg`.
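The behavior described above can be sketched with `np.where`. This is only an illustration under assumptions: `highlight_cluster_sketch` is a hypothetical stand-in, not the module's implementation, and returning an all-background array when `cluster` is `None` is a guess.

```python
import numpy as np

def highlight_cluster_sketch(clusters, cluster=None, bg="black", hl="red"):
    """Hypothetical stand-in for highlight_cluster, for illustration only."""
    clusters = np.asarray(clusters)
    if cluster is None:
        # Assumption: with nothing to highlight, everything gets the background color.
        return np.full(clusters.shape, bg)
    # Entries equal to `cluster` get the highlight color, all others the background.
    return np.where(clusters == cluster, hl, bg)

colors = highlight_cluster_sketch(np.array(["0", "1", "2", "1"]), cluster="1")
```

The result has one color per entry of `clusters`, so it can be passed directly as a per-point color array to a scatter plot.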
unique_genes
def unique_genes(
connections:ndarray, # The array of connections. Columns should be (query_genes, target_genes, connection_strength). The last column is optional.
)->ndarray: # The array of unique genes.
Extract the unique gene names from an array of connections.
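A minimal sketch of the idea, assuming the gene names sit in the first two columns and that query and target names are pooled into one result (the actual function may handle them differently):

```python
import numpy as np

# Example connections: (query_gene, target_gene, connection_strength).
connections = np.array([
    ["q1", "t1", "0.9"],
    ["q2", "t1", "0.4"],
    ["q1", "t2", "0.7"],
])

# Only the first two columns hold gene names; the optional third column is ignored.
genes = np.unique(connections[:, :2])
```

Note that `np.unique` also sorts its output, which may or may not match the ordering the module uses.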
map_to_colormap
def map_to_colormap(
x:ndarray, # The array to map.
cmap:Union='magma_r', # The color map to use. Should be a matplotlib colormap object or a string with the name of a matplotlib colormap (default: "magma_r").
vmin:float=0, # The value that maps to the lowest color in the colormap. Should be <= np.min(x) to avoid truncation (default: 0).
vmax:Optional=None, # The value that maps to the highest color in the colormap. Should be >= np.max(x); if `None`, np.max(x) is used (default: None).
)->ndarray: # Array of RGBA values with a shape of ``x.shape + (4, )``.
Map an array of values to a color palette.
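One way to obtain RGBA values of shape `x.shape + (4, )` with matplotlib is to normalize the data and call the colormap on the result. The sketch below shows that general technique; whether the module does exactly this is an assumption.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

x = np.array([[0.0, 1.0], [2.0, 4.0]])

cmap = plt.get_cmap("magma_r")            # look up a registered colormap by name
norm = Normalize(vmin=0, vmax=x.max())    # mirrors vmin=0 and vmax=None -> np.max(x)
rgba = cmap(norm(x))                      # one RGBA quadruple per element of x
```

Choosing `vmin` above `np.min(x)` or `vmax` below `np.max(x)` would push the out-of-range values onto the end colors of the colormap, which is the truncation the parameter descriptions warn about.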
map_array_to_color
def map_array_to_color(
x:ndarray, # The array to map.
palette:Colormap, # The color map to use. Should be a matplotlib colormap object.
xmax:Optional=None, # The maximum value to use for normalization. Should be >= np.max(x), and will use np.max(x) if set to `None` (default: None).
)->ndarray: # Array of RGBA values with a shape of ``x.shape + (4, )``.
Map an array of values to a color palette.
add_homology_context
def add_homology_context(
connections:ndarray, # The connections between genes. The columns should be (query_gene, target_gene).
orthology:DataFrame, # The orthology information as a DataFrame.
)->ndarray: # The connections array with homology context added. The columns will be (query_gene,
target_gene, connection_strength), and the values in connection_strength will depend on the
content of the orthology DataFrame.
Add homology context to the given connections based on the orthology information.
plot_dotplot
def plot_dotplot(
query_avg_expr:ndarray, # Array representing the average expression values of query genes.
target_avg_expr:ndarray, # Array representing the average expression values of target genes.
query_perc_expr:ndarray, # Array representing the percentage expression values of query genes.
target_perc_expr:ndarray, # Array representing the percentage expression values of target genes.
query_genes:List, # List of query gene names.
target_genes:List, # List of target gene names.
connections:ndarray, # An array where each row contains two genes and (optionally) the strength of their
connection.
query_cluster_colors:Dict, # Dictionary mapping query cluster names to their colors.
target_cluster_colors:Dict, # Dictionary mapping target cluster names to their colors.
query_gene_colors:Dict, # Dictionary mapping query gene names to their colors.
target_gene_colors:Dict, # Dictionary mapping target gene names to their colors.
query_species:str, # Species name of the query genes.
target_species:str, # Species name of the target genes.
x_offset:float=1, # Offset for the x-axis (default: 1).
y_offset:float=0, # Offset for the y-axis (default: 0).
grid_offset:int=30, # Offset for the grid spacing (default: 30).
query_clustering:str='leiden', # Clustering method for the query genes (default: "leiden").
target_clustering:str='leiden', # Clustering method for the target genes (default: "leiden").
output:str='./paired_dotplot.png', # Output file path for the plot (default: "./paired_dotplot.png").
title:Optional=None, # Title of the plot (default: None).
title_font_size:int=16, # Font size of the plot title (default: 16).
center:bool=True, # Whether to center the dotplots when the number of genes exceeds the maximum (default: True).
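cmap:Colormap='magma_r', # The Colormap instance or registered colormap name used to map scalar data to colors (default: "magma_r").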
)->None:
Plot the paired dotplot based on the given data.
add_connections
def add_connections(
fig:Figure, # The paired dotplot figure to which the connections will be added.
connections:ndarray, # An array where each row contains two genes and (optionally) the strength of their
connection.
query_gene_names:List, # The list of query gene names.
query_gene_colors:Dict, # The dictionary mapping query gene names to their colors.
label_offset:float, # The offset for label positioning.
)->None:
Add connections between genes to the given paired dotplot figure.
make_dotplot
def make_dotplot(
ax:Axes, # The Axes object on which to create the dotplot.
avg:ndarray, # The average expression values.
perc:ndarray, # The percentage expression values.
gene_names:List, # The list of gene names.
species:str, # The species name.
clustering:str, # The clustering information.
clust_color:List, # The list of colors for clusters.
gene_color:List, # The list of colors for genes.
side:str='left', # The side to place the y-axis labels, either "left" or "right" (default: "left").
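cmap:Colormap='magma_r', # The Colormap instance or registered colormap name used to map scalar data to colors (default: "magma_r").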
)->None:
Make a dotplot on the given Axes object based on the average and percentage expression values.
plot_colorbar_legend
def plot_colorbar_legend(
cbar_legend:Axes, # The Axes object representing the colorbar legend.
query_avg_expr:ndarray, # Array representing the average expression values of query genes.
target_avg_expr:ndarray, # Array representing the average expression values of target genes.
cmap:Colormap='magma_r', # The Colormap instance or registered colormap name used to map scalar data to colors
(default: "magma_r").
)->None:
Plot the colorbar legend based on the average expression values of query and target genes.
plot_dot_legend
def plot_dot_legend(
dot_legend, # The subplot of the grid that contains the dotplot legend.
size_exponent:float=1.5, # The exponent applied to the fraction of cells in a group to obtain the dot size
(default: 1.5).
dot_size:int=200, # The size of the largest dot (default: 200).
):
Create the dotplot legend, explaining dot size.
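The relation between fraction and dot size can be illustrated with the default parameters. The legend fractions below are made-up example values, not necessarily the ones the legend shows.

```python
import numpy as np

size_exponent = 1.5
dot_size = 200

# Hypothetical legend entries: fraction of cells in a group expressing a gene.
fractions = np.array([0.2, 0.4, 0.6, 0.8, 1.0])

# Dot size grows as fraction ** size_exponent, scaled so a fraction of 1 gives dot_size.
sizes = fractions ** size_exponent * dot_size
```

With an exponent above 1, small fractions shrink faster than linearly, which makes weakly expressed genes visually recede.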
get_dot_color
def get_dot_color(
query:AnnData, # The query dataset.
target:AnnData, # The target dataset.
query_clustering:str, # The .obs column name to use for the query dataset.
target_clustering:str, # The .obs column name to use for the target dataset.
query_genes:Optional=None, # Array of query genes to subset the data, if any. If None, use all genes (default: None).
target_genes:Optional=None, # Array of target genes to subset the data, if any. If None, use all genes (default: None).
query_gene_names:Optional=None, # Array of query gene names (default: None).
target_gene_names:Optional=None, # Array of target gene names (default: None).
layer:Optional=None, # The layer to use for the average expression calculation. If not specified, it will use the
`.X` slot of the `AnnData` objects. It is vital to set this correctly to avoid calculating
average expression on log1p-transformed data (default: None).
)->Tuple: # A tuple containing the dot color values for query and target datasets, respectively.
Calculate the average expression in each cluster and translate it to dot color for the dotplot. Note that this function does not know how the matrix was processed beforehand: if the data have been log-transformed, it will calculate the average of the logs, not the log of the average of the back-transformed values.
get_dot_size
def get_dot_size(
query:DataFrame, # The query dataset.
target:DataFrame, # The target dataset.
query_clustering:str, # The .obs column name to use for the query dataset.
target_clustering:str, # The .obs column name to use for the target dataset.
query_genes:Optional=None, # Array of query genes to subset the data, if any. If None, use all genes (default: None).
target_genes:Optional=None, # Array of target genes to subset the data, if any. If None, use all genes (default: None).
query_gene_names:Optional=None, # Array of query gene names (default: None).
target_gene_names:Optional=None, # Array of target gene names (default: None).
)->Tuple: # A tuple containing the dot size values for query and target datasets, respectively.
Calculate the percentage of cells in each cluster that express each gene, and translate that to dot size for the dotplot.
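A minimal sketch of the per-cluster percentage, assuming that a cell "expresses" a gene when its value is greater than zero (the module's actual threshold is not stated here) and using a small invented cells x genes matrix:

```python
import numpy as np

# Rows are cells, columns are genes (made-up values).
expr = np.array([
    [0, 3],
    [2, 1],
    [0, 0],
    [5, 0],
])
clusters = np.array(["a", "a", "b", "b"])   # cluster label per cell

names = np.unique(clusters)
# For each cluster: fraction of its cells with a non-zero value, per gene, as a percentage.
perc = np.array([(expr[clusters == c] > 0).mean(axis=0) * 100 for c in names])
```

Here `perc` has one row per cluster and one column per gene.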
feature_colors
def feature_colors(
components:ndarray, # The array of components.
query_G:int, # The number of components for the query genes.
seed:int=42, # The seed value for the random number generator (default: 42).
)->Tuple: # A tuple containing the colored components for query genes and target genes, respectively.
Assign colors to the components based on the given array of components.
gene_order
def gene_order(
full_adjacency:ndarray, # The full adjacency matrix represented as a 2D numpy array.
components:ndarray, # An array representing the components.
query_G:int, # The number of query genes.
)->Tuple: # A tuple containing the query gene order and the target gene order as numpy arrays.
Calculate the order of genes based on the given full adjacency matrix and components. Highly connected genes are placed first; genes without any connections are ordered randomly at the bottom of the plot.
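The ordering rule can be sketched as a sort by degree with a shuffled tail. This is a simplified illustration: it ignores `components` and the query/target split entirely, and the seed and tie-breaking are assumptions.

```python
import numpy as np

# Toy symmetric adjacency matrix: gene 1 is a hub, gene 3 has no connections.
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
])

degree = (adjacency > 0).sum(axis=1)          # number of connections per gene
connected = np.flatnonzero(degree > 0)
isolated = np.flatnonzero(degree == 0)

# Most connected genes first; a stable sort keeps the input order among ties.
connected = connected[np.argsort(-degree[connected], kind="stable")]

# Unconnected genes go last, in random order.
rng = np.random.default_rng(42)
order = np.concatenate([connected, rng.permutation(isolated)])
```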
calculate_adjacency_matrix
def calculate_adjacency_matrix(
connections:ndarray, # The 2D array representing the connections between genes. Each row contains two gene
identifiers indicating a connection, and optionally the strength of that connection.
query_genes:List, # A list of genes that act as queries.
target_genes:List, # A list of genes that act as targets.
)->ndarray: # The adjacency matrix represented as a 2D numpy array. It has dimensions (query_G + target_G)
x (query_G + target_G), where query_G and target_G are the lengths of query_genes and
target_genes, respectively.
Calculate the adjacency matrix based on the given connections, query genes, and target genes.
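A sketch of how such a matrix could be assembled. It assumes query genes occupy the first `query_G` rows and columns, that connections are undirected, and that a missing strength column counts as 1; none of these choices is confirmed by the description above.

```python
import numpy as np

query_genes = ["q1", "q2"]
target_genes = ["t1", "t2", "t3"]
connections = np.array([["q1", "t1"], ["q2", "t1"], ["q2", "t3"]])

# Query genes take the first block of indices, target genes the block after it.
q_index = {g: i for i, g in enumerate(query_genes)}
t_index = {g: len(query_genes) + j for j, g in enumerate(target_genes)}

n = len(query_genes) + len(target_genes)
adjacency = np.zeros((n, n))
for row in connections:
    i, j = q_index[row[0]], t_index[row[1]]
    weight = float(row[2]) if len(row) > 2 else 1.0   # optional strength column
    adjacency[i, j] = adjacency[j, i] = weight         # assumed symmetric
```

Keeping separate index maps for the two species avoids collisions when a query gene and a target gene share the same name.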
label_pos
def label_pos(
display_coords:Dict, # A dictionary that holds the window extents of tick labels.
key:str, # The label to retrieve; a gene name.
side:str='left', # One of "left" or "right"; depending on orientation will return the leftmost or rightmost
position of the label (default: "left").
)->Tuple: # A tuple containing the x and y coordinates of the label.
Get the edge coordinates of a label. Keep either the left or the right end of the word.
prepare_dotplot
def prepare_dotplot(
avg_expr:DataFrame, # Data frame that holds average expression for all genes and all clusters.
perc_expr:DataFrame, # Data frame that tracks the percentage of cells expressing each gene in every cluster.
cmap:Union='magma_r', # The Colormap instance or registered colormap name used to map scalar data to colors
(default: "magma_r").
vmin:float=0, # Minimum average expression value to show (default: 0).
vmax:Optional=None, # Maximum average expression value to show (default: maximum average expr. value).
size_exponent:float=1.5, # Dot size is computed as fraction ** size_exponent * dot_size (default: 1.5).
dot_size:float=200, # The size of the largest dot (default: 200).
)->Tuple: # A tuple containing the melted average expression data frame, the melted percentage
expression data frame, and the array of RGBA-coded color values for the average expression
in a cluster/gene combination, according to the input color map.
Pivot the average expression and percent expressed tables into melted (long) form to make them dotplot-friendly.
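Melting a genes x clusters table into one row per gene/cluster combination might look like the sketch below. The column names, the toy values, and the assumption that percentages are stored on a 0 to 100 scale are all illustrative.

```python
import pandas as pd

# Toy table: rows are genes, columns are clusters, values are percent of cells expressing.
perc_expr = pd.DataFrame(
    {"c0": [100.0, 25.0], "c1": [0.0, 50.0]},
    index=["geneA", "geneB"],
)

# Long format: one row per (gene, cluster) pair, which is what a scatter call needs.
melted = (
    perc_expr.rename_axis("gene")
    .reset_index()
    .melt(id_vars="gene", var_name="cluster", value_name="percent")
)

# Dot size as fraction ** size_exponent * dot_size, with the documented defaults.
melted["size"] = (melted["percent"] / 100) ** 1.5 * 200
```

Each row of `melted` then corresponds to a single dot, so its `size` column can be passed as the marker sizes of a scatter plot.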