geneorder

The Python Gene Order plotter

This package was born out of my continuous efforts to write streamlined code to plot Hox gene clusters. It seems like this is a task I will be performing many times, and there don’t seem to be too many tools out there, so I wrote my own. This is my matplotlib-based answer, and I hope that it is useful to you, too.

Input

Generally, the package expects you to have, at least, knowledge of the gene IDs that you want to visualize, and knowledge of their coordinates in the genome. This can be manually encoded or come in table form - crucially, it can be read from a GFF3 file.

Usage

Installation

Install latest from the GitHub repository:

$ pip install git+https://github.com/galicae/geneorder.git

or from pypi

$ pip install geneorder

Documentation

Documentation can be found hosted on this GitHub repository’s pages. Additionally you can find package manager specific guidelines on pypi.

How to use

The most basic geneorder use case is when you have a list of genes that are co-linear and would like to plot them in their chromosomal context. The least information that is needed for that is

  • the chromosome name
  • the gene IDs/names
  • the starts/ends of all genes
  • the strand and orientation of the genes
import pandas as pd

from geneorder.core import plot_synteny_schematic
gene_dict = {
    "gene_name": [
        "Hox1",
        "Hox2",
        "Hox3",
        "Hox4",
        "Hox5",
        "Hox6",
        "Hox7",
        "Hox8",
        "Hox10",
    ],
    "gene_id": [
        "PB.8615",
        "g9718",
        "PB.8616",
        "g9720",
        "g9721",
        "PB.8617",
        "g9723",
        "g9724",
        "g9725",
    ],
    "start": [
        1927066,
        1998922,
        2058396,
        2195412,
        2351936,
        2373415,
        2565196,
        2916314,
        2986021,
    ],
    "end": [
        1936157,
        2024148,
        2065953,
        2206712,
        2354374,
        2375678,
        2594468,
        2926445,
        2996225,
    ],
}

minimal = pd.DataFrame(gene_dict)
minimal["seqid"] = "pseudochrom_56"
minimal["strand"] = "-"
minimal
gene_name gene_id start end seqid strand
0 Hox1 PB.8615 1927066 1936157 pseudochrom_56 -
1 Hox2 g9718 1998922 2024148 pseudochrom_56 -
2 Hox3 PB.8616 2058396 2065953 pseudochrom_56 -
3 Hox4 g9720 2195412 2206712 pseudochrom_56 -
4 Hox5 g9721 2351936 2354374 pseudochrom_56 -
5 Hox6 PB.8617 2373415 2375678 pseudochrom_56 -
6 Hox7 g9723 2565196 2594468 pseudochrom_56 -
7 Hox8 g9724 2916314 2926445 pseudochrom_56 -
8 Hox10 g9725 2986021 2996225 pseudochrom_56 -
plot_synteny_schematic(minimal)

The plot can be customized, e.g. by including color:

minimal["color"] = [
    "red",
    "orange",
    "gold",
    "lightgreen",
    "forestgreen",
    "royalblue",
    "darkblue",
    "darkmagenta",
    "magenta",
]

plot_synteny_schematic(minimal)

We can also edit the dataframe to indicate missing genes:

from geneorder import util
minimal = util.insert_gap(
    minimal,
    "Hox8",
    "Hox10",
    "gene_name",
    no_gaps=1,
    purge_columns=["gene_id", "color"],
)
plot_synteny_schematic(minimal)

For more details, please refer to the documentation.