EMSAR is a C program that quantifies RNA abundance (gene & RNA isoform expression levels) from RNA-seq data. It takes a transcriptome reference fasta file and an alignment file (BAM/SAM/default bowtie output), and generates RNA abundance estimates in FPKM, TPM and inferred read count.
Usage1 : EMSAR [options] fastafile outdir outprefix bowtieoutfile|SAMfile|BAMfile
Usage2 : bowtie_command | EMSAR [options] fastafile outdir outprefix
ex : EMSAR -p 4 -h R human.rna.fna RNAseq sample22 sample22.bowtieout
ex2: EMSAR -p 4 -B -q human.rna.fna RNAseq sample22 sample22.BAM
ex3: EMSAR -p 4 -S --PE human.rna.fna RNAseq sample22 sample22.SAM
ex4: bowtie -v 2 -a -m 100 -p 8 human.rna sample22.fastq | EMSAR -p 4 -h R human.rna.fna RNAseq sample22
When RNA-seq reads are aligned to the transcriptome reference sequences, a read may map to multiple transcripts. Based on the set of transcripts it maps to, each read is assigned to a group, which we term a 'segment'. For example, a read mapped to isoforms A and B only is in segment 1, and a read mapped to isoforms A, B and C is in segment 2. The length of a segment is computed as the number of all possible unique reads with the same transcript sharing. The read count for a segment is expected to depend on the segment length and the summed abundance of the transcripts that define the segment. 'Sequence-sharing sets' are then defined as groups of transcripts such that any two transcripts that together define a segment belong to the same set. A sequence-sharing set usually consists of the isoforms of a single gene or of multiple genes. A Poisson-based likelihood function is defined as the joint probability over all segments in a sequence-sharing set, and the abundance estimates of the transcripts in the set that maximize this likelihood are found.
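The grouping steps above can be illustrated with a minimal Python sketch. This is not EMSAR's C implementation: the function names (build_segments, sequence_sharing_sets, poisson_log_likelihood) and the toy read table are hypothetical, segment lengths are taken as given inputs rather than computed from sequences, and the Poisson rate is assumed to be the segment length times the summed abundance of its transcripts.

```python
from collections import Counter, defaultdict
from math import lgamma, log


def build_segments(read_hits):
    """Count reads per segment; a segment is keyed by the set of
    transcripts that a read maps to."""
    return Counter(frozenset(hits) for hits in read_hits.values())


def sequence_sharing_sets(segments):
    """Group transcripts so that any two transcripts that together
    define a segment land in the same set (union-find)."""
    parent = {}

    def find(t):
        parent.setdefault(t, t)
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for seg in segments:
        members = sorted(seg)
        root = find(members[0])  # also registers single-transcript segments
        for other in members[1:]:
            parent[find(other)] = root

    groups = defaultdict(set)
    for t in parent:
        groups[find(t)].add(t)
    return sorted(sorted(g) for g in groups.values())


def poisson_log_likelihood(counts, lengths, abundance):
    """Joint Poisson log-likelihood over the segments of one set.
    Assumed rate per segment: length * sum of transcript abundances.
    Segments with no observed reads still contribute through -rate."""
    total = 0.0
    for seg, length in lengths.items():
        n = counts.get(seg, 0)
        rate = length * sum(abundance[t] for t in seg)  # assumed > 0
        total += n * log(rate) - rate - lgamma(n + 1)
    return total


# Toy input (hypothetical): read name -> transcripts it aligned to.
reads = {
    "r1": ["A", "B"],
    "r2": ["A", "B"],
    "r3": ["A", "B", "C"],
    "r4": ["D"],
    "r5": ["A"],
}
segments = build_segments(reads)
sets = sequence_sharing_sets(segments)
print(sets)
```

Here A, B and C end up in one sequence-sharing set because segments tie them together, while D forms a set of its own. Maximizing poisson_log_likelihood over the abundances within each set could be done with any numerical optimizer; the sketch only evaluates the function.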