Commit c0e04970 authored by Rebecca E Batorsky

added material

parent d487341b
---
title: "Bioinformatics for RNAseq Workshop"
author: "Rebecca Batorsky, Sr. Bioinformatics Specialist at Tufts"
date: "April 2019"
output:
  xaringan::moon_reader:
    css: ["default", "custom.css","default-fonts", "hygge","footer-header.css"]
    lib_dir: libs
    nature:
      highlightStyle: github
      highlightLines: true
      countIncrementalSlides: false
---
## Experimental Methods
RNA Extraction + QC
Library Prep methods
Illumina Sequencing
---
## Experimental Design
Avoiding bias: Randomization, blocking
Statistical power
Capturing variability
---
## Bioinformatics Workflow
Our analysis will have the following steps with QC at each stage
```{r, out.width = "500px",echo=FALSE,fig.align="center"}
knitr::include_graphics("fig/wf0.png")
```
---
## Data for the course
.pull-left[
- mRNA data from 48 replicates of two Saccharomyces cerevisiae populations
- Wildtype (WT) and $\Delta$SNF2
- Unusually comprehensive analysis of variability in sequencing replicates
]
.pull-right[
<img src="fig/paper.png" width="100%">
<img src="fig/gier.png" width="100%">
]
<div class="my-footer"><span>Gierlinski et al Bioinformatics 2015&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4754627</span></div>
---
## Start Jupyter Lab on the HPC cluster On Demand
.pull-left[
1. In **Chrome** web browser: [https://ondemand.cluster.tufts.edu](http://ondemand.cluster.tufts.edu)
2. Interactive Apps -> Jupyter Lab
    + Hours: 3 hours
    + Core: 8 cores
    + Memory: 64 GB
3. Press "Connect to Jupyter Lab"
4. Choose "Terminal" from the Launcher menu
5. A terminal will appear on the compute node where your job is running, where you can type bash commands
]
.pull-right[
```{r, out.width = "70%",echo=FALSE,fig.align="center"}
knitr::include_graphics("fig/jn.png")
knitr::include_graphics("fig/terminal.png")
```
]
---
## Setting up our working directory
Make a directory in the training space named after your user name, e.g. `rbator01`:
```
cd /cluster/tufts/bio/tools/
mkdir training/<your-user-name>
```
Copy the workshop materials into the new directory:
```
cp -r tutorials/bioinformatics-rnaseq/ training/<your-user-name>
```
Make a soft link into your home directory:
```
ln -s /cluster/tufts/bio/tools/training/<your-user-name>/bioinformatics-rnaseq/ ~/
```
You now have a directory called "bioinformatics-rnaseq" in your home directory containing:
```
tree .
```
---
## Downloading data from a public archive
In "bioinformatics-rnaseq/data" we have a tab-delimited file with sample information for study ERP004763 from the [European Nucleotide Archive](https://www.ebi.ac.uk/ena).
Use the bash utility **head** to look at the first few lines:
```{bash, echo=TRUE, eval=FALSE}
cd ~/bioinformatics-rnaseq/data
head ERP004763_info.txt
```
```
run_accession condition biol_rep fastq_ftp
ERR458493 WT 1 ftp.sra.ebi.ac.uk/vol1/fastq/ERR458/ERR458493/ERR458493.fastq.gz
ERR458494 WT 1 ftp.sra.ebi.ac.uk/vol1/fastq/ERR458/ERR458494/ERR458494.fastq.gz
ERR458495 WT 1 ftp.sra.ebi.ac.uk/vol1/fastq/ERR458/ERR458495/ERR458495.fastq.gz
ERR458496 WT 1 ftp.sra.ebi.ac.uk/vol1/fastq/ERR458/ERR458496/ERR458496.fastq.gz
ERR458497 WT 1 ftp.sra.ebi.ac.uk/vol1/fastq/ERR458/ERR458497/ERR458497.fastq.gz
ERR458498 WT 1 ftp.sra.ebi.ac.uk/vol1/fastq/ERR458/ERR458498/ERR458498.fastq.gz
ERR458499 WT 1 ftp.sra.ebi.ac.uk/vol1/fastq/ERR458/ERR458499/ERR458499.fastq.gz
ERR458500 SNF2 1 ftp.sra.ebi.ac.uk/vol1/fastq/ERR458/ERR458500/ERR458500.fastq.gz
ERR458501 SNF2 1 ftp.sra.ebi.ac.uk/vol1/fastq/ERR458/ERR458501/ERR458501.fastq.gz
```
---
## Downloading data from a public archive
Find the accession numbers corresponding to WT using the bash utility **awk**:
```{bash, echo=TRUE, eval=FALSE}
cat ERP004763_info.txt | awk '$2=="WT"'
```
```
ERR458493 WT 1 ftp.sra.ebi.ac.uk/vol1/fastq/ERR458/ERR458493/ERR458493.fastq.gz
...
ERR459206 WT 48 ftp.sra.ebi.ac.uk/vol1/fastq/ERR459/ERR459206/ERR459206.fastq.gz
```
Find the accession numbers corresponding to the first biological replicate:
```bash
cat ERP004763_info.txt | awk '$3==1'
```
```
ERR458493 WT 1 ftp.sra.ebi.ac.uk/vol1/fastq/ERR458/ERR458493/ERR458493.fastq.gz
...
ERR458506 SNF2 1 ftp.sra.ebi.ac.uk/vol1/fastq/ERR458/ERR458506/ERR458506.fastq.gz
```
The pipe symbol "|" will tell bash to send the output of one program to another program for further processing
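
As a small illustration of a pipe, the sketch below sends the rows selected by awk to `wc -l`, which counts them. The three-row table and the run names `RUN_A` to `RUN_C` are made up for the example and stand in for ERP004763_info.txt.

```bash
# Build a tiny tab-delimited table (hypothetical stand-in for ERP004763_info.txt)
tmp_info=$(mktemp)
printf 'run_accession\tcondition\tbiol_rep\n' >  "$tmp_info"
printf 'RUN_A\tWT\t1\n'                       >> "$tmp_info"
printf 'RUN_B\tSNF2\t1\n'                     >> "$tmp_info"
printf 'RUN_C\tWT\t2\n'                       >> "$tmp_info"

# awk keeps the rows whose second column is WT; the pipe hands them to wc -l
wt_count=$(awk '$2=="WT"' "$tmp_info" | wc -l | tr -d ' ')
echo "WT rows: $wt_count"

rm -f "$tmp_info"
```

On the workshop data, the same pattern would read `awk '$2=="WT"' ERP004763_info.txt | wc -l`.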
---
.content-box-yellow[
**Exercise** (2 minutes):
Can you combine the above commands using a pipe in order to output only the lines corresponding to WT replicate 1?
]
---
## Downloading fastq files
We'll open a script in a text editor in Jupyter Lab:
.pull-left[
1. Click on "Files" in the left menu
2. Click the "Home" icon and navigate to "bioinformatics-rnaseq/scripts"
3. Double click the file "download_samples.sh" and it will open in a text editor
]
.pull-right[
```{r, out.width = "70%",echo=FALSE}
knitr::include_graphics("fig/jn_files.png")
```
]
```{bash, echo=TRUE, eval=FALSE}
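#!/bin/bash
# Run from the bioinformatics-rnaseq directory
mkdir WT_1
cd WT_1
# Select replicate 1 of WT, keep the FTP path (column 4) and download each file
cat ../data/ERP004763_info.txt | awk '$3==1' | awk '$2=="WT"' | cut -f4 | xargs wget
cd ..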
--
Run the script by returning to your terminal and typing:
```{r, engine = 'bash', echo=TRUE, eval=FALSE}
cd ../
./scripts/download_samples.sh
```
You should now have a folder WT_1 in your bioinformatics-rnaseq directory with seven fastq.gz files.
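
The two chained awk filters can also be written as one awk call with a combined condition. This is only a sketch of an alternative, not the workshop script; the table below and its `example.org` paths are made up for illustration.

```bash
# Hypothetical four-column table in the same layout as ERP004763_info.txt
info=$(mktemp)
printf 'run_accession\tcondition\tbiol_rep\tfastq_ftp\n' >  "$info"
printf 'RUN_A\tWT\t1\texample.org/RUN_A.fastq.gz\n'      >> "$info"
printf 'RUN_B\tSNF2\t1\texample.org/RUN_B.fastq.gz\n'    >> "$info"
printf 'RUN_C\tWT\t2\texample.org/RUN_C.fastq.gz\n'      >> "$info"

# Both conditions must hold; only column 4 (the download path) is printed
urls=$(awk -F'\t' '$2=="WT" && $3==1 {print $4}' "$info")
echo "$urls"

rm -f "$info"
```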
---
## Fastq files
```{r, out.width = "500px",echo=FALSE,fig.align="center"}
knitr::include_graphics("fig/wf1.png")
```
---
## Fastq format
View the first 4 lines of one of the gzipped fastq files without decompressing:
```{bash, echo=TRUE, eval=FALSE}
gzip -cd WT_1/ERR458493.fastq.gz | head -n 4
```
```
@ERR458493.1 DHKW5DQ1:219:D0PT7ACXX:1:1101:1724:2080/1
CGCAAGACAAGGCCCAAACGAGAGATTGAGCCCAATCGGCAGTGTAGTGAA
+
B@@FFFFFHHHGHJJJJJJIJJGIGIIIGI9DGGIIIEIGIIFHHGGHJIB
```
The four lines corresponding to a single read are:
1. Sequence identifier: \@Read ID and sequencing run info
2. Sequence
3. \+ optionally followed by sequence identifier again
4. Quality scores
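
Because every read takes exactly four lines, the number of reads in a file is its line count divided by four. The sketch below checks this on a temporary file holding only the single record shown above; the file name `example.fastq.gz` is an assumption for the example.

```bash
# Write a one-read FASTQ (the record shown above) and compress it
tmp_dir=$(mktemp -d)
printf '%s\n' \
  '@ERR458493.1 DHKW5DQ1:219:D0PT7ACXX:1:1101:1724:2080/1' \
  'CGCAAGACAAGGCCCAAACGAGAGATTGAGCCCAATCGGCAGTGTAGTGAA' \
  '+' \
  'B@@FFFFFHHHGHJJJJJJIJJGIGIIIGI9DGGIIIEIGIIFHHGGHJIB' \
  | gzip > "$tmp_dir/example.fastq.gz"

# Each read occupies four lines, so reads = lines / 4
n_lines=$(gzip -cd "$tmp_dir/example.fastq.gz" | wc -l | tr -d ' ')
n_reads=$(( n_lines / 4 ))
echo "reads: $n_reads"

# Length of the first read: the sequence is line 2 of the record
read_len=$(gzip -cd "$tmp_dir/example.fastq.gz" | awk 'NR==2 {print length($0)}')
echo "read length: $read_len"

rm -rf "$tmp_dir"
```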
---
## Quality Scores
Base quality scores encode the predicted probability of an error in base calling by the sequencing instrument.
The sequencing quality score of a given base, Q, is defined by: $$Q = -10\log_{10}(e)$$ where e is the estimated probability that the call is wrong.
| Quality Score | Probability of an Incorrect Base Call | Inferred Base Call Accuracy |
| ----- |:-----:|:-----:|
| 10 (Q10) | 1 in 10 | 90% |
| 20 (Q20) | 1 in 100 | 99% |
| 30 (Q30) | 1 in 1000 | 99.9% |
--
So why don't we see numbers in the quality score?
```
B@@FFFFFHHHGHJJJJJJIJJGIGIIIGI9DGGIIIEIGIIFHHGGHJIB
```
---
### Quality Scores
In FASTQ files produced by Illumina software 1.8+, each quality score is encoded as the character whose ASCII code equals the score + 33.
```{r, out.width = "60%",echo=FALSE}
knitr::include_graphics("fig/illumina_fastq_coding.png")
```
```
@ERR458493.1 DHKW5DQ1:219:D0PT7ACXX:1:1101:1724:2080/1
CGCAAGACAAGGCCCAAACGAGAGATTGAGCCCAATCGGCAGTGTAGTGAA
+
B@@FFFFFHHHGHJJJJJJIJJGIGIIIGI9DGGIIIEIGIIFHHGGHJIB
```
B = Q(33), probability of error of 1 in 2000
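
The decoding of a single quality character can be reproduced in the terminal. This is a minimal sketch assuming the +33 offset described above; the variable names are chosen for the example.

```bash
# Decode one quality character under the +33 offset
qual_char='B'
# A leading single quote makes printf return the character's ASCII code
ascii_code=$(printf '%d' "'$qual_char")
phred=$(( ascii_code - 33 ))
# Invert Q = -10 log10(e) to get the error probability e = 10^(-Q/10)
err_prob=$(awk -v q="$phred" 'BEGIN { printf "%.6f", 10^(-q/10) }')
echo "$qual_char -> ASCII $ascii_code -> Q$phred -> error probability $err_prob"
```

An error probability of about 0.0005 corresponds to roughly 1 wrong call in 2000.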
<div class="my-footer"><span>https://www.illumina.com/science/education/sequencing-quality-scores.html/</span></div>
---
## FastQC
A widely used tool for both DNA and RNA sequencing data that is run on each fastq file.
To use it, type in the terminal:
```{bash, echo=TRUE, eval=FALSE}
module load fastqc/0.11.8
mkdir fastqc
```
Since FastQC can run on multiple files at once, we'll use a wildcard "*" to indicate each file in the folder "WT_1":
```{bash, echo=TRUE, eval=FALSE}
fastqc WT_1/*.fastq.gz -o fastqc --extract
```
For a quick check, we can look at the "summary.txt" file for each fastq, which has a line for each test that FastQC performs:
```{bash, echo=TRUE, eval=FALSE}
cat fastqc/ERR458498_fastqc/summary.txt
```
.small[
```
PASS Basic Statistics ERR458498.fastq.gz
PASS Per base sequence quality ERR458498.fastq.gz
WARN Per tile sequence quality ERR458498.fastq.gz
PASS Per sequence quality scores ERR458498.fastq.gz
FAIL Per base sequence content ERR458498.fastq.gz
PASS Per sequence GC content ERR458498.fastq.gz
PASS Per base N content ERR458498.fastq.gz
PASS Sequence Length Distribution ERR458498.fastq.gz
WARN Sequence Duplication Levels ERR458498.fastq.gz
PASS Overrepresented sequences ERR458498.fastq.gz
PASS Adapter Content ERR458498.fastq.gz
```
]
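
With several samples it can help to list only the tests that did not pass. The sketch below assumes the tab-delimited layout of summary.txt shown above and builds a mock output folder so that it runs on its own; the folder and its three lines are illustrative, not real results.

```bash
# Mock FastQC output folder (hypothetical; a real run writes fastqc/<sample>_fastqc/summary.txt)
qc_dir=$(mktemp -d)
mkdir -p "$qc_dir/ERR458498_fastqc"
summary="$qc_dir/ERR458498_fastqc/summary.txt"
printf 'PASS\tBasic Statistics\tERR458498.fastq.gz\n'          >  "$summary"
printf 'WARN\tPer tile sequence quality\tERR458498.fastq.gz\n' >> "$summary"
printf 'FAIL\tPer base sequence content\tERR458498.fastq.gz\n' >> "$summary"

# Keep every line whose first column is not PASS, across all samples
flagged=$(cat "$qc_dir"/*_fastqc/summary.txt | awk -F'\t' '$1!="PASS"')
echo "$flagged"
n_flagged=$(printf '%s\n' "$flagged" | wc -l | tr -d ' ')

rm -rf "$qc_dir"
```

On the workshop output, the input would be `fastqc/*_fastqc/summary.txt`.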
---
## MultiQC
This tool combines QC output across multiple samples.
To use on the HPC (not installed as a module yet):
```{bash, echo=TRUE, eval=FALSE}
module load anaconda
source activate /cluster/tufts/bio/tools/conda_envs/multiqc/1.7/
```
To run MultiQC on the FastQC output for WT_1:
```{r, engine = 'bash', echo=TRUE, eval=FALSE}
multiqc fastqc/ -o multiqc
```
Deactivate the conda environment:
```
source deactivate
```
---
## Download MultiQC report
- Click on Files -> Home Directory -> bioinformatics-rnaseq -> multiqc
- Right click on multiqc_report.html -> Download
- Double click on file and it will open in a web browser
```{r, out.width = "40%",echo=FALSE,fig.align="left"}
knitr::include_graphics("fig/jn_files_2.png")
```
<div class="my-footer"><span>https://multiqc.info/</span></div>
---
## FastQC results
We'll go through two of the plots that FastQC produces
- Sequence Quality Histograms
- Per Base Sequence Content
---
## FastQC results - Sequence Quality Histograms
- Distribution of quality scores across all bases at each position in the reads.
- Drops at the end of reads due to molecules in a given sequencing cluster getting out of sync
- Drops in quality below Phred score of ~20 can be handled by read trimming.
```{r, out.width = "100%",echo=FALSE,fig.align="left"}
knitr::include_graphics("fig/fastqc0.png")
```
<div class="my-footer"><span>https://sequencing.qcfail.com//</span></div>
---
## FastQC results - Sequence Quality Histograms
Drops in quality below Phred score of ~20 can be handled by read trimming.
```{r, out.width = "100%",echo=FALSE,fig.align="left"}
knitr::include_graphics("fig/fastqc4.png")
```
<div class="my-footer"><span>https://sequencing.qcfail.com//</span></div>
---
## FastQC results - Per Base Sequence Content
```{r, out.width = "60%",echo=FALSE,fig.align="left"}
knitr::include_graphics("fig/fastqc1.png")
```
- Proportion of each position for which each DNA base has been called
- RNAseq data tends to show a positional sequence bias in the first ~12 bases
- The "random" priming step during library construction is not truly random and certain hexamers are more prevalent than others
- Studies have shown that this does NOT cause mis-called bases or drastic bias in sequenced fragments
<div class="my-footer"><span>https://sequencing.qcfail.com/articles/positional-sequence-bias-in-random-primed-libraries/</span></div>
---
## FastQC results - Per Base Sequence Content
The plot on the right shows a strong positional bias throughout the reads, which in this case is due to an overrepresented sequence in the library.
```{r, out.width = "100%",echo=FALSE,fig.align="left"}
knitr::include_graphics("fig/fastqc5.png")
```
<div class="my-footer"><span>https://sequencing.qcfail.com/articles/positional-sequence-bias-in-random-primed-libraries/</span></div>
---
## Read Alignment
```{r, out.width = "500px",echo=FALSE,fig.align="center"}
knitr::include_graphics("fig/wf2.png")
```
---
## Read Alignment
.pull-left[
- Find the genomic origin of sequence fragments
- RNAseq data originates from spliced mRNA, which has no introns
- When aligning to the genome, our aligner must find a spliced alignment for reads
]
.pull-right[
```{r, out.width = "100%",echo=FALSE,fig.align="center"}
knitr::include_graphics("fig/align0.png")
```
<div class="my-footer"><span>http://chagall.med.cornell.edu/RNASEQcourse/</span></div>
]
---
## STAR (Spliced Transcripts Alignment to a Reference)
.pull-left[
- Highly accurate, memory intensive aligner
- Two phase mapping process
1. Find Maximum Mappable Prefix (MMP) in a read
.small[ The MMP is a contiguous sequence in the read that matches a segment of the genome.
The search then continues with the unmapped portion of the read. If a read is not completely covered by MMPs, the MMPs are extended with mismatches (a) or indels (b), or the read is soft-clipped (c in the Figure). ]
2. Clustering, stitching and scoring
.small[
Using the MMPs as anchors, reads are stitched together. All seeds that fall within a user-defined genomic window (which determines the maximum intron length) are clustered. If the seeds of a read do not all fall within the window, a chimeric alignment is produced, as would happen with a gene fusion.
]
]
.pull-right[
```{r,echo=FALSE,fig.align="left"}
knitr::include_graphics("fig/align1.png")
```
]
<div class="my-footer"><span>Dobin et al Bioinformatics 2013&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3530905/</span></div>
---
## Genome annotation standards
- STAR can use an annotation file that gives the location and structure of genes in order to improve alignment at known splice junctions
- Annotation is dynamic and there are at least three major sources of annotation
```{r, out.width = "50%",echo=FALSE,fig.align="center"}
knitr::include_graphics("fig/ann0.png")
```
- The intersection among RefGene, UCSC, and Ensembl annotations shows high overlap. RefGene has the fewest unique genes, while more than 50% of genes in Ensembl are unique
- Be consistent!
<div class="my-footer"><span>Zhao et al BMC Genomics 2015&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-015-1308-8</span></div>
---
## Genome annotation standards
RefSeq and Ensembl have different gene definitions for the gene PIK3CA, which can give rise to differences in gene quantification.
```{r, out.width = "800px",echo=FALSE,fig.align="left"}
knitr::include_graphics("fig/ann1.png")
```
.small["We demonstrated that the choice of a gene model has a dramatic effect on both gene quantification and differential analysis. Our research will help RNA-Seq data analysts to make an informed choice of gene model in practical RNA-Seq data analysis."]
<div class="my-footer"><span>https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-015-1308-8</span></div>
---
## Final note on standards
```{r, out.width = "600px",echo=FALSE,fig.align="center"}
knitr::include_graphics("fig/ann2.png")
```
<div class="my-footer"><span>https://xkcd.com/927/</span></div>
---
## Reference files on the HPC
Tufts HPC hosts genome reference data from [UCSC](https://genome.ucsc.edu/cgi-bin/hgTables) at the following location:
```
/cluster/tufts/bio/data/genomes
```
For our data, we will need reference files from Saccharomyces_cerevisiae genome version sacCer3.
We can explore available files like this:
```{bash, echo=TRUE, eval=FALSE}
cd /cluster/tufts/bio/data/genomes/Saccharomyces_cerevisiae/UCSC/sacCer3/
tree -d
```
.pull-left[
```{bash, echo=TRUE, eval=FALSE}
├── Annotation
│   ├── Genes
│   └── SmallRNA
└── Sequence
    ├── AbundantSequences
    ├── Bowtie2Index
    ├── BowtieIndex
    ├── BWAIndex
    │   ├── version0.5.x
    │   └── version0.6.0
    ├── Chromosomes
    ├── HISAT2
    ├── STAR
    └── WholeGenomeFasta
```
]
.pull-right[
The reference files that we need for this analysis are:
1. Genome indexed for STAR aligner, under Sequence/STAR
2. Annotation file in GTF and BED formats, under Annotation/Genes/
]
---
## Annotation file formats
STAR uses the GTF format for genome annotation:
```{bash, echo=TRUE, eval=FALSE}
cd /cluster/tufts/bio/data/genomes/Saccharomyces_cerevisiae/UCSC/sacCer3/Annotation/Genes
head sacCer3.gtf | column -ts $'\t'
```
```
chrI sacCer3.genepred transcript 335 649 . + . gene_id "YAL069W"; transcript_id "YAL069W";
chrI sacCer3.genepred exon 335 649 . + . gene_id "YAL069W"; transcript_id "YAL069W"; exon_number "1"; exon_id "YAL069W.1";
chrI sacCer3.genepred CDS 335 646 . + 0 gene_id "YAL069W"; transcript_id "YAL069W"; exon_number "1"; exon_id "YAL069W.1";
chrI sacCer3.genepred start_codon 335 337 . + 0 gene_id "YAL069W"; transcript_id "YAL069W"; exon_number "1"; exon_id "YAL069W.1";
chrI sacCer3.genepred stop_codon 647 649 . + 0 gene_id "YAL069W"; transcript_id "YAL069W"; exon_number "1"; exon_id "YAL069W.1";
chrI sacCer3.genepred transcript 538 792 . + . gene_id "YAL068W-A"; transcript_id "YAL068W-A";
chrI sacCer3.genepred exon 538 792 . + . gene_id "YAL068W-A"; transcript_id "YAL068W-A"; exon_number "1"; exon_id "YAL068W-A.1";
chrI sacCer3.genepred CDS 538 789 . + 0 gene_id "YAL068W-A"; transcript_id "YAL068W-A"; exon_number "1"; exon_id "YAL068W-A.1";
chrI sacCer3.genepred start_codon 538 540 . + 0 gene_id "YAL068W-A"; transcript_id "YAL068W-A"; exon_number "1"; exon_id "YAL068W-A.1";
chrI sacCer3.genepred stop_codon 790 792 . + 0 gene_id "YAL068W-A"; transcript_id "YAL068W-A"; exon_number "1"; exon_id "YAL068W-A.1";
```
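
The third column of a GTF file holds the feature type, so the feature types in a file can be tallied with standard utilities. The sketch below writes four lines in the layout shown above to a temporary file; the attribute column is shortened for the example.

```bash
# Four GTF lines in the layout of sacCer3.gtf (attributes shortened for the example)
gtf=$(mktemp)
printf 'chrI\tsacCer3.genepred\ttranscript\t335\t649\t.\t+\t.\tgene_id "YAL069W";\n'   >  "$gtf"
printf 'chrI\tsacCer3.genepred\texon\t335\t649\t.\t+\t.\tgene_id "YAL069W";\n'         >> "$gtf"
printf 'chrI\tsacCer3.genepred\tCDS\t335\t646\t.\t+\t0\tgene_id "YAL069W";\n'          >> "$gtf"
printf 'chrI\tsacCer3.genepred\ttranscript\t538\t792\t.\t+\t.\tgene_id "YAL068W-A";\n' >> "$gtf"

# Take column 3, then count how often each feature type occurs
feature_counts=$(cut -f3 "$gtf" | sort | uniq -c | awk '{print $2 "=" $1}' | paste -sd' ' -)
echo "$feature_counts"

rm -f "$gtf"
```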
BED is a simpler format that has no feature-type column.
Because of this, we download a separate table for each feature type of interest, e.g.:
```{r, engine = 'bash', echo=TRUE, eval=FALSE}
head sacCer3.sgdGene.wholegene.bed
```
```
chrI 130798 131983 YAL012W 0 + 130798 131983 0 1 1185, 0,
chrI 334 649 YAL069W 0 + 334 649 0 1 315, 0,
chrI 537 792 YAL068W-A 0 + 537 792 0 1 255, 0,
```
---
.content-box-yellow[
**Exercise** 3 (10 minutes):
The gene we'll be analyzing is called SNF2 or YOR290C. Check to make sure it's represented consistently in the GTF and BED files using bash utility **grep**, e.g.:
```
grep YOR290C <file name>
```
Using the files above, how long is the gene? Does it have any introns?
]
--
```{r, out.width = "500px",echo=FALSE,fig.align="left"}
knitr::include_graphics("fig/ucsc_conventions.png")
```
<div class="my-footer"><span>http://genome.ucsc.edu/blog/the-ucsc-genome-browser-coordinate-counting-systems/</span></div>
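
The two coordinate conventions can be checked on YAL069W, the first gene in the GTF listing shown earlier. The sketch below copies its coordinates from the GTF and BED listings and computes the length under each convention; the variable names are chosen for the example.

```bash
# YAL069W coordinates copied from the GTF and BED listings shown earlier
bed_start=334; bed_end=649   # BED: 0-based start, end position not included
gtf_start=335; gtf_end=649   # GTF: 1-based start, end position included

bed_len=$(( bed_end - bed_start ))
gtf_len=$(( gtf_end - gtf_start + 1 ))
echo "BED length: $bed_len"
echo "GTF length: $gtf_len"
```

Both conventions give the same length, which also matches the block size listed in the BED line for this gene.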
---
## Read Alignment
STAR alignment is a two-step process:
1. Index Genome: genomeGenerate, needs to be done only once per genome
2. Align reads
Although we'll use a pre-indexed genome, it's worth knowing how it's done.
--
Open up bioinformatics-rnaseq/scripts/sbatch_star_index.sh in a text editor
```{bash, echo=TRUE, eval=FALSE}
# load the module
module load STAR/2.5.2b
# reference directory; the index is written to ${REF_DIR}/Sequence/STAR
REF_DIR=/cluster/tufts/bio/data/genomes/Saccharomyces_cerevisiae/UCSC/sacCer3
# Run STAR in "genomeGenerate" mode
STAR --runMode genomeGenerate \
--genomeDir ${REF_DIR}/Sequence/STAR \
--genomeFastaFiles ${REF_DIR}/Sequence/WholeGenomeFasta/genome.fa \
--runThreadN 12
```
---
## Read Alignment: running the command
The basic command:
```{bash, echo=TRUE, eval=FALSE}
STAR --genomeDir <directory of indexed genome> \
--readFilesIn <CSV list of gzipped fastq files> \
--readFilesCommand zcat \
--outFileNamePrefix <prefix for output file> \
--outFilterMultimapNmax <max number of alignment positions to allow> \
--outSAMtype BAM SortedByCoordinate \
--runThreadN <threads to use - must match SLURM request> \
--alignIntronMin <min size of intron> \
--alignIntronMax <max size of intron> \
--sjdbGTFfile <path to GTF file> \
--sjdbOverhang <read length - 1>
```
---
Open up bioinformatics-rnaseq/scripts/sbatch_star_align.sh in a text editor
.small[
```{bash, echo=TRUE, eval=FALSE}
## Use STAR aligner to align all fastq files in a directory
## This step must be done for each sample
module load STAR/2.7.0a
mkdir -p STAR
## File is run as sbatch sbatch_star_align.sh <folder>
## Where <folder> contains fastq.gz files for 1 sample
SAMPLE=$1
## Obtain a comma separated list of files
FILES=$(ls -m "${SAMPLE}"/*fastq.gz | tr -d ' \n')
## Name the output file, for example if the folder is /cluster/tufts/sample_1
## The output will have prefix sample_1_
OUT=$(basename $SAMPLE)
echo "Starting to align: $FILES"
echo "Output file will have prefix: $OUT"
REF_DIR=/cluster/tufts/bio/data/genomes/Saccharomyces_cerevisiae/UCSC/sacCer3
# execute STAR in the runMode "alignReads"
STAR --genomeDir ${REF_DIR}/Sequence/STAR \
--readFilesIn $FILES \
--readFilesCommand zcat \
--outFileNamePrefix STAR/${OUT}_ \
--outFilterMultimapNmax 1 \
--outSAMtype BAM SortedByCoordinate \
--runThreadN 12 \
--alignIntronMin 1 \
--alignIntronMax 2500 \
--sjdbGTFfile ${REF_DIR}/Annotation/Genes/sacCer3.gtf \
--sjdbOverhang 49
# generate the bam index
module load samtools/1.2
samtools index STAR/${OUT}_Aligned.sortedByCoord.out.bam
```
]
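
The comma-separated list that STAR expects for `--readFilesIn` can also be built with `paste`. This is a sketch of an alternative to the `ls -m` line in the script, not a change to it; the temporary folder and the two empty files are made up for the example.

```bash
# Mock sample folder with two empty files (hypothetical names)
sample_dir=$(mktemp -d)
touch "$sample_dir/a.fastq.gz" "$sample_dir/b.fastq.gz"

# ls prints one file per line when piped; paste joins the lines with commas
files=$(ls "$sample_dir"/*fastq.gz | paste -sd, -)
echo "$files"

rm -rf "$sample_dir"
```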
---
## Read Alignment
.content-box-yellow[
**Exercise** (5 minutes): Run the command in the terminal using sbatch
```
sbatch scripts/sbatch_star_align.sh WT_1
```
Check the result of your job submission
```
squeue -u <your user name>
```
View the outputs of your job while it's running like this:
```
cat <job-number>.err
cat <job-number>.out
```
Modify the script to ignore small introns and run it again:
```
--outFileNamePrefix STAR/${OUT}_nosmallintron_ \
--alignIntronMin 2500 \
--alignIntronMax 2500 \
```
]
---
## Read Alignment: Result files
```
ls -lh STAR/
```
.small[
```
-rw-rw-r-- 1 rbator01 biotools 272M Mar 25 15:45 WT_1_Aligned.sortedByCoord.out.bam
-rw-rw-r-- 1 rbator01 biotools 34K Mar 25 15:45 WT_1_Aligned.sortedByCoord.out.bam.bai
-rw-rw-r-- 1 rbator01 biotools 1.8K Mar 25 15:45 WT_1_Log.final.out
-rw-rw-r-- 1 rbator01 biotools 24K Mar 25 15:45 WT_1_Log.out
-rw-rw-r-- 1 rbator01 biotools 364 Mar 25 15:45 WT_1_Log.progress.out
-rw-rw-r-- 1 rbator01 biotools 46K Mar 25 15:45 WT_1_SJ.out.tab
drwx------ 2 rbator01 biotools 4.0K Mar 25 15:43 WT_1__STARgenome
```
]
---
```bash
cat STAR/WT_1_Log.final.out
```
.small[
```bash
Number of input reads | 7014609
Average input read length | 51
UNIQUE READS:
Uniquely mapped reads number | 6014703
Uniquely mapped reads % | 85.75%
Average mapped length | 50.72
Number of splices: Total | 55354
Number of splices: Annotated (sjdb) | 47840
Number of splices: GT/AG | 50848