Axel-F binding predictions - Tutorial
The CSV file you provide to Axel-F can take several forms. The simplest is a CSV file containing peptide sequences only. Depending on the data you have available, you may also include allele names, TPM values, and/or gene names.
Example 1.
Contains peptide sequence only.
(Simplest form of data.)
Example 2.
Contains peptide sequence and allele names.
Example 3.
Contains peptide sequence, allele names, and TPM values.
(Using custom TPM values.)
Example 4.
Contains peptide sequence, allele names, and gene names.
(Using TPM values from TCGA data.)
Please note that these are CSV-formatted files, so TSV format will not work. If you want to create a CSV-formatted file in a text editor or enter the data directly on the form, separate each column with a comma.
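If your data is currently tab-separated, a short script can convert it before upload. The following is a minimal sketch using Python's standard `csv` module. The function name `tsv_to_csv`, the peptide rows, and the column order (peptide, allele, TPM) are illustrative assumptions, not something Axel-F prescribes.

```python
import csv
import io

def tsv_to_csv(tsv_text):
    """Re-emit tab-separated text as comma-separated text."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        if row:  # skip blank lines
            writer.writerow(row)
    return out.getvalue()

# Hypothetical input: peptide, allele, TPM (made-up values for illustration).
tsv_text = "GILGFVFTL\tHLA-A*02:01\t12.5\nNLVPMVATV\tHLA-A*02:01\t3.0\n"
csv_text = tsv_to_csv(tsv_text)
print(csv_text)
```

The resulting text has one row per line with each column separated by a comma, and can be pasted into the form or saved as a `.csv` file.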
CSV Format Example In Plain Text
If you are entering data manually in a text file/editor or directly on the Axel-F form,
make sure you separate the values with commas if you have more than one column.
Because Axel-F requires an allele and a TPM value, you must specify them on the form if they are not already included in the CSV file. Take Example 1, which includes peptide sequences only: Axel-F requires you to specify both the allele and the TPM value on the form. If Example 2 is provided, you don't need to specify the allele on the form, but you still need to provide a TPM value. If Example 3 is provided, you don't have to specify anything, since all required information (peptide sequences, allele names, TPM values) is already inside the CSV file.
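The rule above can be summarized as a small check. This is only a sketch of the logic as described in this tutorial, not Axel-F's actual code: the function name `fields_needed_on_form` and the column labels `"peptide"`, `"allele"`, `"tpm"`, and `"gene"` are hypothetical. Treating a gene name column as a substitute for TPM is an assumption based on the note under Example 4.

```python
def fields_needed_on_form(csv_columns):
    """Return the form fields still required for a CSV with these columns."""
    columns = {c.lower() for c in csv_columns}
    needed = []
    if "allele" not in columns:
        needed.append("allele")
    # Assumption: a gene column stands in for TPM, because Example 4
    # is described as using TPM values from TCGA data.
    if "tpm" not in columns and "gene" not in columns:
        needed.append("tpm")
    return needed

example_1 = fields_needed_on_form(["peptide"])
example_2 = fields_needed_on_form(["peptide", "allele"])
example_3 = fields_needed_on_form(["peptide", "allele", "tpm"])
print(example_1)  # ['allele', 'tpm']
print(example_2)  # ['tpm']
print(example_3)  # []
```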
As you type a gene name, the form autosuggests available genes.
FASTA-formatted input is more straightforward than CSV input, as there is only one format that Axel-F accepts. A valid FASTA record is a single-line description starting with the ">" character, followed by lines of sequence data.
** Note: Axel-F currently accepts only one FASTA sequence at a time.
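To check input before pasting it into the form, the two constraints above (a ">" description line followed by sequence lines, and a single record only) can be sketched as follows. The function `parse_single_fasta` is a hypothetical helper written for this tutorial. It does not reproduce how the Axel-F form itself detects FASTA input.

```python
def parse_single_fasta(text):
    """Return (description, sequence) for input holding exactly one FASTA record."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA input must start with a '>' description line")
    if sum(line.startswith(">") for line in lines) != 1:
        raise ValueError("only one FASTA sequence is accepted at a time")
    sequence = "".join(lines[1:])  # sequence data may span several lines
    if not sequence:
        raise ValueError("no sequence data after the description line")
    return lines[0][1:].strip(), sequence

# Made-up record for illustration.
description, sequence = parse_single_fasta(">example protein\nMKTAYIAK\nQRQISFVK\n")
print(description, sequence)
```

Input with a second ">" line raises a `ValueError`, mirroring the one-sequence limit noted above.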
If you fill the text box with FASTA-formatted data, the Axel-F form detects the FASTA format and changes the available options. Compared to the CSV format options, you can see from the following image that the form now includes a length option.
Rank EL values come directly from the neural networks that NetMHCpan uses. Because these numbers are abstract and cannot be used directly in the biological context of Axel-F, Rank EL values are translated to IC50 values by comparing the percentile ranks of the two metrics in the Trolle dataset, then using an interpolation function to map each percentile rank to the corresponding IC50 value. The resulting values are stored in the Rank_mapped_to_IC50 column. The same interpolation function, based on the Trolle dataset, was used to calculate Axel-F scores.
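One plausible reading of this percentile-matching step is sketched below in plain Python. It is not Axel-F's implementation. The helper `build_rank_to_ic50` is hypothetical, the reference values are made up rather than taken from the Trolle dataset, and the use of linear interpolation with clamping at the ends is an assumption. The idea is that the value at a given percentile of the Rank EL distribution is paired with the value at the same percentile of the IC50 distribution, and queries between reference points are interpolated.

```python
from bisect import bisect_left

def build_rank_to_ic50(ref_ranks, ref_ic50s):
    """Pair the two metrics by percentile and return an interpolating function."""
    xs = sorted(ref_ranks)   # i-th smallest rank ...
    ys = sorted(ref_ic50s)   # ... is matched to the i-th smallest IC50
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two reference lists of equal length (at least 2)")

    def rank_to_ic50(rank):
        # Clamp queries outside the reference range (an assumption).
        if rank <= xs[0]:
            return ys[0]
        if rank >= xs[-1]:
            return ys[-1]
        i = bisect_left(xs, rank)  # guarantees xs[i-1] < rank <= xs[i]
        x0, x1 = xs[i - 1], xs[i]
        y0, y1 = ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (rank - x0) / (x1 - x0)

    return rank_to_ic50

# Made-up reference values, NOT from the Trolle dataset.
ref_ranks = [0.1, 0.5, 1.0, 2.0, 10.0]
ref_ic50s = [5000.0, 50.0, 500.0, 5.0, 20000.0]
rank_to_ic50 = build_rank_to_ic50(ref_ranks, ref_ic50s)
mapped = rank_to_ic50(0.75)
print(mapped)  # 275.0, halfway between the IC50s matched to ranks 0.5 and 1.0
```

In both metrics a lower value indicates a stronger binder, so sorting both in ascending order keeps the pairing consistent.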
Axel-F scores estimate the likelihood of a peptide being presented on HLA and being an epitope.