Calling intervals

By default, ENEO performs variant calling on the exons of protein-coding genes, with the exception of two types of regions:

  • Known hard-to-call regions of the human genome. These regions were extensively profiled by the GIAB consortium and are publicly available. Variants in these regions are removed from the VCF file and therefore do not appear in the final VCF output.

  • Genes involved in antigen presentation: because the pipeline was initially designed for use in a personalized cancer vaccine setting, we discarded these genes as a source of unwanted variation. We obtained the set of genes using the KEGG pathway annotated as

Full set of excluded genes in ENEO:
symbol description
IFNG interferon gamma
TNF tumor necrosis factor
PSME1 proteasome activator subunit 1
PSME2 proteasome activator subunit 2
PSME3 proteasome activator subunit 3
HSPA8 heat shock protein family A (Hsp70) member 8
HSPA1A heat shock protein family A (Hsp70) member 1A
HSPA1L heat shock protein family A (Hsp70) member 1 like
HSPA1B heat shock protein family A (Hsp70) member 1B
HSPA6 heat shock protein family A (Hsp70) member 6
HSPA2 heat shock protein family A (Hsp70) member 2
HSPA4 heat shock protein family A (Hsp70) member 4
HSP90AA1 heat shock protein 90 alpha family class A member 1
HSP90AB1 heat shock protein 90 alpha family class B member 1
HLA-A major histocompatibility complex, class I, A
HLA-B major histocompatibility complex, class I, B
HLA-C major histocompatibility complex, class I, C
HLA-F major histocompatibility complex, class I, F
HLA-G major histocompatibility complex, class I, G
HLA-E major histocompatibility complex, class I, E
HSPA5 heat shock protein family A (Hsp70) member 5
CANX calnexin
B2M beta-2-microglobulin
PDIA3 protein disulfide isomerase family A member 3
CALR calreticulin
TAPBP TAP binding protein
TAP1 transporter 1, ATP binding cassette subfamily B member
TAP2 transporter 2, ATP binding cassette subfamily B member
CD8A CD8 subunit alpha
CD8B CD8 subunit beta
CD8B2 CD8B family member 2
KIR3DL2 killer cell immunoglobulin like receptor, three Ig domains and long cytoplasmic tail 2
KIR3DL1 killer cell immunoglobulin like receptor, three Ig domains and long cytoplasmic tail 1
KIR3DL3 killer cell immunoglobulin like receptor, three Ig domains and long cytoplasmic tail 3
KIR3DS1 killer cell immunoglobulin like receptor, three Ig domains and short cytoplasmic tail 1
KIR2DL2 killer cell immunoglobulin like receptor, two Ig domains and long cytoplasmic tail 2
KIR2DL1 killer cell immunoglobulin like receptor, two Ig domains and long cytoplasmic tail 1
KIR2DL3 killer cell immunoglobulin like receptor, two Ig domains and long cytoplasmic tail 3
KIR2DL4 killer cell immunoglobulin like receptor, two Ig domains and long cytoplasmic tail 4
KIR2DL5A killer cell immunoglobulin like receptor, two Ig domains and long cytoplasmic tail 5A
KLRC1 killer cell lectin like receptor C1
KLRC2 killer cell lectin like receptor C2
KLRC3 killer cell lectin like receptor C3
KLRC4 killer cell lectin like receptor C4
KLRD1 killer cell lectin like receptor D1
KIR2DS1 killer cell immunoglobulin like receptor, two Ig domains and short cytoplasmic tail 1
KIR2DS3 killer cell immunoglobulin like receptor, two Ig domains and short cytoplasmic tail 3
KIR2DS4 killer cell immunoglobulin like receptor, two Ig domains and short cytoplasmic tail 4
KIR2DS5 killer cell immunoglobulin like receptor, two Ig domains and short cytoplasmic tail 5
KIR2DS2 killer cell immunoglobulin like receptor, two Ig domains and short cytoplasmic tail 2
IFI30 IFI30 lysosomal thiol reductase
LGMN legumain
CTSB cathepsin B
HLA-DMA major histocompatibility complex, class II, DM alpha
HLA-DMB major histocompatibility complex, class II, DM beta
HLA-DOA major histocompatibility complex, class II, DO alpha
HLA-DOB major histocompatibility complex, class II, DO beta
HLA-DPA1 major histocompatibility complex, class II, DP alpha 1
HLA-DPB1 major histocompatibility complex, class II, DP beta 1
HLA-DQA1 major histocompatibility complex, class II, DQ alpha 1
HLA-DQA2 major histocompatibility complex, class II, DQ alpha 2
HLA-DQB1 major histocompatibility complex, class II, DQ beta 1
HLA-DRA major histocompatibility complex, class II, DR alpha
HLA-DRB1 major histocompatibility complex, class II, DR beta 1
HLA-DRB3 major histocompatibility complex, class II, DR beta 3
HLA-DRB4 major histocompatibility complex, class II, DR beta 4
HLA-DRB5 major histocompatibility complex, class II, DR beta 5
CD74 CD74 molecule
CTSL cathepsin L
CTSS cathepsin S
CD4 CD4 molecule
CIITA class II major histocompatibility complex transactivator
RFX5 regulatory factor X5
RFXANK regulatory factor X associated ankyrin containing protein
RFXAP regulatory factor X associated protein
CREB1 cAMP responsive element binding protein 1
NFYA nuclear transcription factor Y subunit alpha
NFYB nuclear transcription factor Y subunit beta
NFYC nuclear transcription factor Y subunit gamma
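
As an illustration of the gene-level exclusion described above, the following sketch drops the exons of listed genes while converting GTF records to BED. It is only a sketch under stated assumptions, not necessarily how ENEO applies the exclusion internally: the file names toy.gtf and excluded_genes.txt are hypothetical, the symbol list holds one gene symbol per line, and the GTF is assumed to carry a GENCODE-style gene_name "SYMBOL"; attribute in column 9.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical toy inputs: a two-exon GTF and a short list of symbols to exclude.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf 'chr6\tHAVANA\texon\t100\t200\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding"; gene_name "HLA-A";\n' > "$workdir/toy.gtf"
printf 'chr1\tHAVANA\texon\t500\t900\t.\t-\t.\tgene_id "G2"; gene_type "protein_coding"; gene_name "TP53";\n' >> "$workdir/toy.gtf"
printf 'HLA-A\nB2M\n' > "$workdir/excluded_genes.txt"

# First file: load the symbols into a lookup table.
# Second file: pull gene_name out of column 9 and keep exons of genes not in the table.
# GTF coordinates are 1-based inclusive, BED is 0-based half-open, hence $4 - 1.
awk -F'\t' '
    NR == FNR { excluded[$1] = 1; next }
    $3 == "exon" && match($9, /gene_name "[^"]+"/) {
        name = substr($9, RSTART + 11, RLENGTH - 12)
        if (!(name in excluded)) print $1, $4 - 1, $5, name
    }
' OFS='\t' "$workdir/excluded_genes.txt" "$workdir/toy.gtf" > "$workdir/kept.bed"

cat "$workdir/kept.bed"
```

In this toy run only the TP53 exon survives, because HLA-A is on the exclusion list. Matching on the gene symbol is a design shortcut; matching on stable gene identifiers would be more robust to symbol changes between annotation releases.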

Generate a custom interval set

To build a custom set of intervals from a GTF file (here referred to as gencode.gtf.gz), you need tabix and bgzip installed. If you built the conda environment for the setup using the setup_env.yml file, both tools are already available.

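# BED needs chrom, start, end: GTF columns 1, 4, 5 (start shifted to 0-based); tabix needs position-sorted input
zgrep "protein_coding" gencode.gtf.gz | awk -F'\t' '{ if ($3 == "exon") print $1, $4 - 1, $5 }' OFS='\t' - | sort -k1,1 -k2,2n | bgzip -c > calling_intervals.BED.gz &&\
tabix -p bed calling_intervals.BED.gz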

Then you can use the calling_intervals.BED.gz file as the input for the variant calling step by setting its path in config_main.yaml.
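
Before pointing the pipeline at a custom interval file, it can help to confirm that it is a well-formed BED file: three tab-separated columns, numeric coordinates, start smaller than end, and records sorted by start within each chromosome block. The check below is a minimal sketch; the function name check_bed is hypothetical and not part of ENEO, tabix, or bgzip.

```shell
#!/usr/bin/env bash
set -euo pipefail

# check_bed: read BED records on stdin; report the first problem and return non-zero.
check_bed() {
    awk -F'\t' '
        BEGIN { bad = 0 }
        NF < 3 { print "line " NR ": fewer than 3 columns"; bad = 1; exit }
        $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ { print "line " NR ": non-numeric coordinates"; bad = 1; exit }
        $2 + 0 >= $3 + 0 { print "line " NR ": start >= end"; bad = 1; exit }
        $1 == chrom && $2 + 0 < prev { print "line " NR ": not sorted by start"; bad = 1; exit }
        { chrom = $1; prev = $2 + 0 }
        END { exit bad }
    '
}

# A valid, sorted two-record example.
printf 'chr1\t10\t20\nchr1\t15\t30\n' | check_bed && echo "ok"

# A record whose second and third columns are not coordinates.
printf 'chr1\tHAVANA\texon\n' | check_bed || echo "malformed"
```

Because bgzip output is gzip-compatible, the real file can be checked with `gzip -dc calling_intervals.BED.gz | check_bed`.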