PepX - Tutorial

1. Introduction
The Peptide eXpression annotator (PepX) takes a peptide as input, identifies the proteins from which the peptide can be derived, and returns an estimate of the expression level of those source proteins from selected public databases ("Peptide/Gene Summary" tab on the results page). PepX also aggregates those expression levels to estimate the abundance of the peptide itself ("Peptide Summary" tab on the results page).
The PepX database is built from the Ensembl GRCh38 (release 106) FASTA file. It currently contains all peptides that can be derived from that file, excluding those that are shorter than 8 amino acids or contain 'X'.
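The filtering rule above can be illustrated with a minimal sketch. It is not the PepX build code: the function name `enumerate_peptides` and the `max_len` cap are hypothetical (the tutorial states only the minimum length of 8), and the example sequence is a toy string rather than a real Ensembl entry.

```python
def enumerate_peptides(protein_seq, min_len=8, max_len=15):
    """Return all distinct substrings of protein_seq with length in
    [min_len, max_len] that do not contain the residue code 'X'.

    min_len=8 follows the tutorial; max_len is a hypothetical cap added
    only to keep this sketch bounded.
    """
    peptides = set()
    n = len(protein_seq)
    for length in range(min_len, min(max_len, n) + 1):
        for start in range(n - length + 1):
            peptide = protein_seq[start:start + length]
            if "X" in peptide:
                continue  # peptides containing 'X' are excluded
            peptides.add(peptide)
    return peptides


# Toy sequence: the only 8-mer without 'X' is MQKEITAL.
example = enumerate_peptides("MQKEITALX")
```

With the toy input, the 8-mer `QKEITALX` and the full 9-mer are both dropped because they contain 'X'.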
2. Input
PepX accepts a list of peptides as input, either entered directly or uploaded as a text file or a CSV file. A CSV file must contain a "Peptide" header, with one peptide per row in the cells below it.
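Before uploading, it can help to check locally that a CSV file has the layout described above. The sketch below is a pre-submission check written for illustration only; `read_peptides` is a hypothetical helper and does not reflect how PepX itself parses uploads.

```python
import csv
import io


def read_peptides(csv_text):
    """Return the peptides listed under the "Peptide" header of a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or "Peptide" not in reader.fieldnames:
        raise ValueError('CSV must contain a "Peptide" header')
    peptides = []
    for row in reader:
        value = (row["Peptide"] or "").strip()
        if value:  # ignore empty cells
            peptides.append(value)
    return peptides


# One peptide per row below the header, as the tutorial requires.
example_csv = "Peptide\nHETTFNSI\nMQKEITAL\n"
peptides = read_peptides(example_csv)
```

A file without the "Peptide" header raises a `ValueError` in this sketch, which makes the formatting problem visible before submission.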
3. Expression Datasets
Pre-calculated gene-level and transcript-level TPM values for the TCGA Pan-cancer cohort for 33 cancer types were downloaded from the UCSC Xena data pages (1). Pre-calculated gene-level and transcript-level TPM values for 256 healthy tissues were downloaded from the Human Protein Atlas (HPA)(2). Pre-calculated gene-level and transcript-level TPM values for 54 healthy tissue subtypes were downloaded from The Genotype-Tissue Expression (GTEx) project data portal (3). Median TPM values were calculated for each of the 31 main tissue types. Pre-calculated gene-level and transcript-level TPM values for 1019 cell lines were downloaded from the Cancer Cell Line Encyclopedia (CCLE) (4). All datasets were downloaded in July 2022.

  1. M. J. Goldman, B. Craft, M. Hastie, K. Repecka, F. McDade, A. Kamath, A. Banerjee, Y. Luo, D. Rogers, A. N. Brooks, J. Zhu and D. Haussler: Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol, 38(6), 675-678 (2020) doi:10.1038/s41587-020-0546-8
  2. M. Uhlen, P. Oksvold, L. Fagerberg, E. Lundberg, K. Jonasson, M. Forsberg, M. Zwahlen, C. Kampf, K. Wester, S. Hober, H. Wernerus, L. Bjorling and F. Ponten: Towards a knowledge-based Human Protein Atlas. Nat Biotechnol, 28(12), 1248-50 (2010) doi:10.1038/nbt1210-1248
  3. L. J. Carithers and H. M. Moore: The Genotype-Tissue Expression (GTEx) Project. Biopreserv Biobank, 13(5), 307-8 (2015) doi:10.1089/bio.2015.29031.hmm
  4. M. Ghandi, F. W. Huang, J. Jane-Valbuena, G. V. Kryukov, C. C. Lo, E. R. McDonald, 3rd, J. Barretina, E. T. Gelfand, C. M. Bielski, H. Li, K. Hu, A. Y. Andreev-Drakhlin, J. Kim, J. M. Hess, B. J. Haas, F. Aguet, B. A. Weir, M. V. Rothberg, B. R. Paolella, M. S. Lawrence, R. Akbani, Y. Lu, H. L. Tiv, P. C. Gokhale, A. de Weck, A. A. Mansour, C. Oh, J. Shih, K. Hadi, Y. Rosen, J. Bistline, K. Venkatesan, A. Reddy, D. Sonkin, M. Liu, J. Lehar, J. M. Korn, D. A. Porter, M. D. Jones, J. Golji, G. Caponigro, J. E. Taylor, C. M. Dunning, A. L. Creech, A. C. Warren, J. M. McFarland, M. Zamanighomi, A. Kauffmann, N. Stransky, M. Imielinski, Y. E. Maruvka, A. D. Cherniack, A. Tsherniak, F. Vazquez, J. D. Jaffe, A. A. Lane, D. M. Weinstock, C. M. Johannessen, M. P. Morrissey, F. Stegmeier, R. Schlegel, W. C. Hahn, G. Getz, G. B. Mills, J. S. Boehm, T. R. Golub, L. A. Garraway and W. R. Sellers: Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature, 569(7757), 503-508 (2019) doi:10.1038/s41586-019-1186-3
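The median step mentioned for the main tissue types can be sketched as follows. This is an illustration under stated assumptions: the tutorial does not say whether the median is taken over samples or over subtype-level values, so the sketch assumes subtype-level values, and the TPM numbers and the subtype-to-tissue mapping are invented toy data.

```python
from collections import defaultdict
from statistics import median


def median_tpm_by_main_tissue(subtype_tpm, subtype_to_main):
    """Collapse subtype-level TPM values to one median per main tissue type."""
    grouped = defaultdict(list)
    for subtype, tpm in subtype_tpm.items():
        grouped[subtype_to_main[subtype]].append(tpm)
    return {main: median(values) for main, values in grouped.items()}


# Invented toy values for a single gene (not real expression data).
toy_tpm = {
    "Brain - Cortex": 4.0,
    "Brain - Cerebellum": 10.0,
    "Brain - Amygdala": 6.0,
    "Liver": 2.5,
}
toy_mapping = {
    "Brain - Cortex": "Brain",
    "Brain - Cerebellum": "Brain",
    "Brain - Amygdala": "Brain",
    "Liver": "Liver",
}
main_tissue_tpm = median_tpm_by_main_tissue(toy_tpm, toy_mapping)
```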
4. Result Table Description
1. Peptide/Gene Summary:
This table has 1 row per input peptide and matched gene.
| Field | Description | Example |
|---|---|---|
| Peptide | Peptide sequence | HETTFNSI |
| Gene ENSG ID | Ensembl gene identifier | ENSG00000075624 |
| Gene Symbol | HGNC gene symbol | ACTB |
| Proteins Encoded by Gene | Number of proteins/transcripts associated with the gene | 17 |
| Proteins Containing Peptide | Number of proteins/transcripts associated with the gene that also contain the peptide | 9 |
| Fraction of Matching Proteins | Fraction of proteins/transcripts associated with the gene that also contain the peptide | 0.529 |
| Mean Occurrences per Protein | The total number of occurrences of this peptide divided by 'Proteins Containing Peptide'. This is usually 1, except in unusual cases (e.g., low-complexity peptides, repetitive genes). | 1 |
| Gene TPM | TPM of the gene | 5209 |
| Peptide TPM | Gene TPM x Mean Occurrences per Protein | 5209 |
| Scaled Peptide TPM | Gene TPM x Fraction of Matching Proteins | 2755.561 |
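The derived columns of this table follow directly from the definitions above. The sketch below recomputes them for the ACTB example; the function name is hypothetical, and `total_occurrences=9` is inferred from the example values (a mean of 1 over 9 matching proteins) rather than stated in the tutorial.

```python
def peptide_gene_row(gene_tpm, proteins_encoded, proteins_containing,
                     total_occurrences):
    """Derive the computed columns of one Peptide/Gene Summary row."""
    fraction = proteins_containing / proteins_encoded
    mean_occurrences = total_occurrences / proteins_containing
    return {
        "fraction_of_matching_proteins": fraction,
        "mean_occurrences_per_protein": mean_occurrences,
        "peptide_tpm": gene_tpm * mean_occurrences,
        "scaled_peptide_tpm": gene_tpm * fraction,
    }


# ACTB example from the table (total_occurrences is an inferred value).
row = peptide_gene_row(gene_tpm=5209, proteins_encoded=17,
                       proteins_containing=9, total_occurrences=9)

# The table's 2755.561 matches the product with the displayed, rounded
# fraction (0.529); the unrounded fraction 9/17 gives about 2757.706.
scaled_from_rounded = 5209 * round(row["fraction_of_matching_proteins"], 3)
```

Whether PepX rounds the fraction before multiplying is not stated in the tutorial; the comparison above only shows which calculation reproduces the example value.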

2. Peptide Summary (gene):
This table has 1 row per input peptide. Data for all genes in which the peptide is found are collapsed here. Many of the fields are lists of values derived from the peptide/gene summary table, where you will find associated descriptions.
| Field | Description | Example |
|---|---|---|
| Peptide | Peptide sequence | MQKEITAL |
| Gene Symbol | List of gene symbols where peptide is found. | ACTB;ACTA2;ACTA1;ACTC1;ACTG2;ACTG1 |
| Total Peptide TPM | Sum of Peptide TPMs for all genes. | 9093.988 |
| Median Peptide TPM | Median Peptide TPM for all genes. | 37.008 |
| Total Scaled Peptide TPM | Sum of Scaled Peptide TPMs for all genes. | 6498.506 |
| Median Scaled Peptide TPM | Median Scaled Peptide TPM for all genes. | 12.408 |
| Gene ENSG IDs | List of corresponding Ensembl gene identifiers. | ENSG00000075624;ENSG00000107796;ENSG00000143632;ENSG00000159251;ENSG00000163017;ENSG00000184009 |
| Gene TPMs | List of Gene TPMs for corresponding genes. | 5209;73.763;0.252;0.0045;0.048;3810.92 |
| Peptide TPMs | List of Peptide TPMs for corresponding genes. | 5209.000;73.763;0.252;0.005;0.048;3810.920 |
| Scaled Peptide TPMs | List of Scaled Peptide TPMs for corresponding genes. | 3062.892;24.563;0.252;0.005;0.021;3410.773 |
| Proteins Encoded by Gene | List of 'Proteins Encoded by Gene' for corresponding genes. | 17;3;3;1;7;19 |
| Proteins Containing Peptide (per Gene) | List of 'Proteins Containing Peptide' for corresponding genes. | 10;1;3;1;3;17 |
| Fraction of Proteins Containing Peptide (per Gene) | List of 'Fraction of Matching Proteins' for corresponding genes. | 0.588;0.333;1.000;1.000;0.429;0.895 |
| Gene Mean Occurrences per Protein | List of 'Mean Occurrences per Protein' for corresponding genes. | 1.000;1.000;1.000;1.000;1.000;1.000 |
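The list-valued fields in this table are semicolon-separated, so the totals and medians can be recomputed from them. A minimal sketch using the example values for MQKEITAL follows; the helper names are hypothetical and not part of PepX.

```python
from statistics import median


def parse_list(field):
    """Split a semicolon-separated list field into floats."""
    return [float(value) for value in field.split(";") if value]


def total_and_median(field):
    """Return (sum, median) of a semicolon-separated numeric field."""
    values = parse_list(field)
    return sum(values), median(values)


# 'Peptide TPMs' and 'Scaled Peptide TPMs' from the example row.
peptide_tpms = "5209.000;73.763;0.252;0.005;0.048;3810.920"
scaled_peptide_tpms = "3062.892;24.563;0.252;0.005;0.021;3410.773"

total_tpm, median_tpm = total_and_median(peptide_tpms)
total_scaled, median_scaled = total_and_median(scaled_peptide_tpms)
```

Up to display rounding, the results agree with the 'Total' and 'Median' fields of the example. With six genes, the median is the mean of the two middle values, for example (0.252 + 73.763) / 2 for the Peptide TPMs.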

3. Peptide/Transcript Summary:
This table has 1 row per input peptide and matched transcript.
| Field | Description | Example |
|---|---|---|
| Peptide | See peptide/gene summary | HETTFNSI |
| Gene ENSG ID | See peptide/gene summary | ENSG00000184009 |
| Protein ENSP ID | Ensembl protein identifier | ENSP00000458435 |
| Gene Symbol | See peptide/gene summary | ACTG1 |
| Number of Occurrences | The number of times the peptide appears in the transcript/protein. In most cases, this will be 1. | 1 |
| Transcript TPM | TPM of the transcript. | 2951.5 |
| Peptide TPM | Transcript TPM x Number of Occurrences. | 2951.5 |
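The transcript-level Peptide TPM can be sketched as an occurrence count followed by a multiplication. The sketch assumes that overlapping matches are counted separately, which the tutorial does not specify, and the protein sequence is an invented toy string, not the sequence of ENSP00000458435.

```python
def count_occurrences(peptide, protein_seq):
    """Count matches of peptide in protein_seq, including overlapping ones
    (counting overlaps is an assumption of this sketch)."""
    count = 0
    start = protein_seq.find(peptide)
    while start != -1:
        count += 1
        start = protein_seq.find(peptide, start + 1)
    return count


def transcript_peptide_tpm(transcript_tpm, occurrences):
    """Peptide TPM = Transcript TPM x Number of Occurrences."""
    return transcript_tpm * occurrences


toy_protein = "AAHETTFNSIKK"  # invented sequence for illustration
occurrences = count_occurrences("HETTFNSI", toy_protein)
peptide_tpm = transcript_peptide_tpm(2951.5, occurrences)
```

Counts above 1 arise only for repeated matches, such as a low-complexity peptide in a repetitive sequence.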

4. Peptide Summary (transcript):
This table has 1 row per input peptide. Data for all transcripts in which the peptide is found, across all genes, are collapsed here. Many of the fields are lists of values derived from the peptide/transcript summary table, where you will find associated descriptions.
| Field | Description | Example |
|---|---|---|
| Peptide | See peptide summary for genes | MQKEITAL |
| Gene Symbols | See peptide summary for genes | ACTA1;ACTA2;ACTB;ACTC1;ACTG1;ACTG2 |
| Total Peptide TPM | Sum of the Peptide TPMs for all transcripts in all genes where the peptide occurs. | 5815.28 |
| Median Peptide TPM | Median Peptide TPM over all transcripts in all genes in which the peptide occurs. | 0.83 |
| Number of Genes | Number of genes with transcripts encoding the peptide. | 6 |
| Number of Transcripts | Number of transcripts encoding the peptide. | 35 |
| Gene ENSG IDs | See peptide summary for genes. | ENSG00000075624;ENSG00000107796;ENSG00000143632;ENSG00000159251;ENSG00000163017;ENSG00000184009 |
| Protein ENSP IDs | List of Ensembl protein identifiers containing the peptide. | ENSP00000224784;ENSP00000290378;ENSP00000295137;ENSP00000355644;ENSP00000355645;ENSP00000386857;ENSP00000386929;ENSP00000407473;ENSP00000458162;ENSP00000458435;ENSP00000459119;ENSP00000459124;ENSP00000460464;ENSP00000460660;ENSP00000461407;ENSP00000461672;ENSP00000466346;ENSP00000477968;ENSP00000493648;ENSP00000494269;ENSP00000494750;ENSP00000495059;ENSP00000495995;ENSP00000496101;ENSP00000501773;ENSP00000501862;ENSP00000502286;ENSP00000502821;ENSP00000505060;ENSP00000505193;ENSP00000505235;ENSP00000506126;ENSP00000506201;ENSP00000506253;ENSP00000508084 |
| Number of Transcript Occurrences | List of 'Number of Occurrences' for corresponding transcripts. | 1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1 |
| Transcript TPMs | List of individual Transcript TPM values. | 0.37;0;0;0.04;0;0;0;14.62;1.26;2951.5;0.48;0.31;0;1.48;78.18;156.79;8.04;0.16;34.47;5.07;2468.8;0.69;1.54;39.26;1.37;0;34.65;6.61;2.69;5.95;0.12;0;0.83;0;0 |
| Transcript Peptide TPMs | List of individual Peptide TPM values. | 0.370;0.000;0.000;0.040;0.000;0.000;0.000;14.620;1.260;2951.500;0.480;0.310;0.000;1.480;78.180;156.790;8.040;0.160;34.470;5.070;2468.800;0.690;1.540;39.260;1.370;0.000;34.650;6.610;2.690;5.950;0.120;0.000;0.830;0.000;0.000 |
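As a consistency check, the summary fields of this table can be recomputed from the per-transcript lists in the MQKEITAL example. The sketch below is illustrative only; `summarize_transcripts` is a hypothetical helper, not a PepX function.

```python
from statistics import median


def summarize_transcripts(tpms_field, occurrences_field):
    """Combine per-transcript TPMs and occurrence counts (both given as
    semicolon-separated lists) into the peptide-level summary values."""
    tpms = [float(v) for v in tpms_field.split(";")]
    counts = [int(v) for v in occurrences_field.split(";")]
    if len(tpms) != len(counts):
        raise ValueError("lists must describe the same transcripts")
    # Peptide TPM per transcript = Transcript TPM x Number of Occurrences
    peptide_tpms = [tpm * n for tpm, n in zip(tpms, counts)]
    return {
        "number_of_transcripts": len(peptide_tpms),
        "total_peptide_tpm": sum(peptide_tpms),
        "median_peptide_tpm": median(peptide_tpms),
    }


# 'Transcript TPMs' and 'Number of Transcript Occurrences' from the example.
transcript_tpms = (
    "0.37;0;0;0.04;0;0;0;14.62;1.26;2951.5;0.48;0.31;0;1.48;78.18;156.79;"
    "8.04;0.16;34.47;5.07;2468.8;0.69;1.54;39.26;1.37;0;34.65;6.61;2.69;"
    "5.95;0.12;0;0.83;0;0"
)
occurrences = ";".join(["1"] * 35)
summary = summarize_transcripts(transcript_tpms, occurrences)
```

Ten of the 35 transcripts have a TPM of 0, which is why the median (0.83) is far below the total: two transcripts account for most of the summed expression.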