
A step-by-step usage guideline for mirTools

Users can navigate the platform in two modes:

Based on an individual sample

In this mode, only one sample is submitted for small RNA annotation. Results are delivered by email: when the job is finished, a notification email is sent to you. Users can also retrieve the corresponding results on the download page using the job ID.

Step 1: Enter your email address. The automatically generated job ID is sent to this address and is needed to retrieve the results later.

Step 2: Description of data input

Users can choose the reference genome for read alignment using the scroll bar. The read length interval is user-selectable within the range 16-32 nt.

For sample upload, the source data must be in one of the specified input file formats (*.fa, *.zip or *.gz). Users can also enter the sequence reads directly, subject to a maximum size of 5M.

An example sequence tag format is:
>sample_260_x80 ("sample_260" represents a unique ID, "80" is the read count)
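
The header convention above can be read programmatically. The following is a minimal Python sketch, assuming that the read count always follows the final `_x` in the header; the function name `parse_tag_header` is hypothetical and is not part of mirTools.

```python
def parse_tag_header(header):
    """Split a header such as '>sample_260_x80' into (tag_id, read_count)."""
    name = header.strip().lstrip(">")
    # Assumption: the count is whatever follows the last '_x' in the header.
    tag_id, sep, count = name.rpartition("_x")
    if not sep or not count.isdigit():
        raise ValueError(f"header does not match <id>_x<count>: {header!r}")
    return tag_id, int(count)


print(parse_tag_header(">sample_260_x80"))  # ('sample_260', 80)
```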

In addition, we provide a Perl script that filters out low-quality reads, adapters and polyA sequences to generate mirTools input files from the initial FASTQ files. The corresponding Perl script, an example sample and its output are available in the download section.
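
The Perl script itself is not reproduced here. As a rough illustration of only the last step, collapsing already-cleaned reads into the `>ID_xCOUNT` format, a Python sketch could look like the following. The function name, the `sample` prefix, the running-number IDs and the ordering by abundance are all assumptions made for this example; quality filtering and adapter/polyA trimming are omitted. The 16-32 length bounds follow the interval mentioned in Step 2.

```python
from collections import Counter


def collapse_reads(reads, prefix="sample", min_len=16, max_len=32):
    """Collapse identical reads into FASTA records named <prefix>_<n>_x<count>."""
    # Keep only reads inside the length interval, then count identical sequences.
    counts = Counter(r for r in reads if min_len <= len(r) <= max_len)
    lines = []
    # Assumption: tags are numbered from 1 in order of decreasing abundance.
    for n, (seq, count) in enumerate(counts.most_common(), start=1):
        lines.append(f">{prefix}_{n}_x{count}")
        lines.append(seq)
    return "\n".join(lines)


# Toy input: three copies of one read, one read that is too short, one singleton.
reads = ["TGAGGTAGTAGGTTGTATAGTT"] * 3 + ["ACGT", "TAGCTTATCAGACTGATGTTGA"]
print(collapse_reads(reads))
```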


Step 3: Set algorithm parameters for alignment.


Step 4: Description of data output

Once a job is finished, users automatically receive an email with a URL link, through which the annotation results are presented in a user-friendly interface containing the following 5 parts.

Part one: Length distribution

An overview of the sequence read length distribution is shown as bar graphs.

Unique reads: distinct sequence tags, each counted once regardless of how many times it was sequenced.

Expression levels: the number of reads for each tag, which reflects its relative abundance.
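
To make the distinction between the two measures concrete, the sketch below tallies a length distribution both ways from `(sequence, read_count)` pairs. The function name and the toy data are illustrative assumptions, not mirTools output.

```python
from collections import defaultdict


def length_distribution(tags):
    """Return {length: (unique_tags, total_reads)} for (sequence, count) pairs."""
    unique = defaultdict(int)  # each distinct tag counted once
    total = defaultdict(int)   # each tag weighted by its read count
    for seq, count in tags:
        unique[len(seq)] += 1
        total[len(seq)] += count
    return {n: (unique[n], total[n]) for n in sorted(unique)}


tags = [
    ("TGAGGTAGTAGGTTGTATAGTT", 80),
    ("TAGCTTATCAGACTGATGTTGA", 15),
    ("TGAGGTAGTAGGTTGTATAGT", 5),
]
print(length_distribution(tags))  # {21: (1, 5), 22: (2, 95)}
```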


Part two: Mapping to the reference genome

The percentage of reads mapped to the reference genome.


Part three: Annotation

The classification of the large-scale short reads into known categories.

Reads are sorted into 5 categories.

Non-coding RNA: sequence tags mapped to Rfam 9.1.

microRNA: sequence tags aligned to miRBase 14.

Repeat associated: sequence tags matched to RepBase 14.09.

mRNA associated: sequence tags mapped to the human genes (UCSC annotation, hg18).

Unclassified: sequence tags that aligned to the reference genome but could not be assigned to any of the above categories.


Part four: Known miRNAs

Visual alignments of the sequences matched to a specific miRNA are listed in the tabulated text files on the left. The HTML tables on the right provide detailed annotation information, mainly including:

Absolute count: total number of sequence tags mapped to a specific microRNA.

Relative count: the matched read count divided by the total number of microRNA reads and then multiplied by 10E6.

Most abundant tag: columns showing the ID of the most abundant tag together with its absolute/relative count and tag sequence.
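
As a worked illustration of this normalization, the sketch below computes relative counts for a small set of miRNAs. The function name and the counts are made up for the example. The scaling factor is exposed as the `scale` parameter; its default of one million (a reads-per-million convention) is an assumption about how the factor written above is meant, so adjust it if a different factor is intended.

```python
def relative_counts(absolute_counts, scale=1_000_000):
    """Normalize per-miRNA read counts to the total miRNA reads, times `scale`."""
    total = sum(absolute_counts.values())
    if total == 0:
        raise ValueError("no microRNA reads to normalize against")
    # Multiply before dividing to keep the arithmetic exact for round numbers.
    return {mirna: count * scale / total for mirna, count in absolute_counts.items()}


# Hypothetical absolute counts for three miRNAs in one sample.
counts = {"hsa-let-7a": 600, "hsa-miR-21": 300, "hsa-miR-16": 100}
for mirna, value in relative_counts(counts).items():
    print(mirna, value)
```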


Part five: Novel miRNA

All unclassified reads are considered when detecting candidate novel miRNAs. The sequences of the novel miRNA and miRNA star are provided, along with the corresponding tag number, tag count and hairpin structure.


In addition, users can download their results using the unique job ID. Each relevant sub-file can be downloaded individually, or all results can be downloaded as a single .gz archive, as you prefer.

Based on two samples

In this mode, users need to submit two samples as input. The output pages are similar to those of the single-sample analysis.

Additional section: differential expression detection

A scatter plot displays the differentially expressed miRNAs between the samples.


Based on total tag count: differentially expressed microRNAs are detected between samples according to the relative count of all sequence tags of a specific microRNA.

Based on the most abundant tag: differentially expressed microRNAs are detected between samples according to the relative count of the most abundant tag.

Each individual point in the scatter plot corresponds to a miRNA ID. The statistical significance (P-value) is inferred using a Bayesian method.
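
This guide does not name the specific Bayesian method. One widely used Bayesian test for comparing tag counts between two libraries is that of Audic and Claverie (1997), in which the probability of observing y reads in sample 2 given x reads in sample 1 is p(y|x) = (N2/N1)^y (x+y)! / (x! y! (1+N2/N1)^(x+y+1)), where N1 and N2 are the total read counts of the two samples. The sketch below assumes that test purely for illustration; the function names and the tail-summing P-value are assumptions and may differ from what mirTools actually computes.

```python
from math import exp, lgamma, log


def ac_prob(x, y, n1, n2):
    """Audic-Claverie probability p(y | x) for library sizes n1 and n2."""
    ratio = n2 / n1
    # Work in log space: lgamma(k + 1) == log(k!) avoids overflow for large counts.
    log_p = (y * log(ratio) + lgamma(x + y + 1) - lgamma(x + 1) - lgamma(y + 1)
             - (x + y + 1) * log(1 + ratio))
    return exp(log_p)


def ac_pvalue(x, y, n1, n2):
    """One-sided tail probability of a count at least as extreme as y, given x."""
    lower = sum(ac_prob(x, k, n1, n2) for k in range(y + 1))  # P(Y <= y | x)
    # Upper tail via the complement, so precision is limited to about 1e-16.
    upper = 1.0 - lower + ac_prob(x, y, n1, n2)               # P(Y >= y | x)
    return max(0.0, min(lower, upper, 1.0))


# Hypothetical counts: 10 reads of a miRNA in sample 1, 40 in sample 2,
# with one million total reads in each sample.
print(ac_pvalue(10, 40, 1_000_000, 1_000_000))
```

A small P-value here indicates that the count in the second sample is unlikely given the count in the first and the two library sizes.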


Institute of Genomic Medicine/Zhejiang Provincial Key Laboratory of Medical Genetics,
Wenzhou Medical College, Wenzhou 325035, China