Rebecca E Batorsky committed
# sdmmej

This is a repository for work on the error-prone DNA repair project with the McVey lab at Tufts University.

**Installation**  
Installation via Miniconda3 is recommended:

- Configure git on the command line for use with GitLab  
`git config --global user.name "tufts username"` (Tufts username is usually 5 letters followed by 2 numbers)  
`git config --global user.email "first.last@tufts.edu"`  

- Download repository  
`git clone https://gitlab.tufts.edu/rbator01/sdmmej.git`  
You will be prompted for your password.
Note that you can also download the repository from the web browser if there are problems configuring command line git access.

- Download and install Miniconda3. The following commands work on macOS:  
`curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh --output Miniconda3-latest-MacOSX-x86_64.sh`  
`bash Miniconda3-latest-MacOSX-x86_64.sh`  
Follow the prompts to accept the license, and answer "yes" to running "conda init" on startup.  
`source ~/.bash_profile`  

- Create a Conda environment called sdmmej  
`conda config --add channels conda-forge`  
`conda config --add channels r`  
`conda config --add channels bioconda`  
`conda create -n sdmmej r-base=4.0.3 r-dplyr r-stringr bioconductor-biostrings python=2.7 pandas r-tidyverse`  


- To activate and deactivate the environment, use  
`conda activate sdmmej`  
`conda deactivate`  
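
After activating the environment, it can help to confirm that the interpreters it should provide are actually on the `PATH`. The following is a minimal sketch, not part of the repository: the function name `check_tools` is hypothetical, and which tools the pipeline needs (here assumed to be `python` and `Rscript`, based on the packages installed above) is an assumption.

```shell
# check_tools TOOL... : report whether each named command is on the PATH.
# Returns non-zero if at least one tool is not found.
# (Hypothetical helper; not part of the sdmmej repository.)
check_tools() {
    rc=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "not found: $tool"
            rc=1
        fi
    done
    return "$rc"
}
```

A possible use is `conda activate sdmmej` followed by `check_tools python Rscript`. To additionally check that the Bioconductor package loads, `Rscript -e 'library(Biostrings)'` should exit without an error.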

**Pipeline Script**

The pipeline script, `run_pipeline.sh`, takes the path to the HiFibr output file as its single command line argument.
All other arguments have defaults that are set within the script.

Example usage on test data: 

`conda activate sdmmej`  
`cd sdmmej-master`  
`sh run_pipeline.sh test_data/polyA1Seq/PolyA1Seq_testdata.csv`  
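
Because the script takes a single input file, processing several HiFibr output files means calling it once per file. The following is a rough sketch of such a loop, assuming it is run from the repository directory that contains `run_pipeline.sh`. The function name `run_batch` is hypothetical and not part of the repository.

```shell
# run_batch DIR : run the pipeline once for every .csv file directly inside DIR.
# Must be called from the directory that contains run_pipeline.sh.
# (Hypothetical helper; not part of the sdmmej repository.)
run_batch() {
    for csv in "$1"/*.csv; do
        # If the glob matched nothing, $csv is the literal pattern; skip it.
        [ -e "$csv" ] || continue
        echo "Processing $csv"
        # Keep going if one file fails, but report it on stderr.
        sh run_pipeline.sh "$csv" || echo "FAILED: $csv" >&2
    done
}
```

For the test data this would be called as `run_batch test_data/polyA1Seq`.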

Running the pipeline on the test data creates a directory `PolyA1Seq_testdata_output` containing the following output files:  

- Outputs from HiFibr processing script  
`PolyA1Seq_testdata_reclassified.csv` Same format as the input, but adds an “ID” column as well as a column for how the sequence was reclassified  
`PolyA1Seq_testdata_complex.txt` Sequences that could not be reclassified as insertions or deletions  
`PolyA1Seq_testdata_insertion.txt` All insertion sequences  
`PolyA1Seq_testdata_deletion.txt` All deletion sequences, with dashes  

- Outputs from Deletion script  
`PolyA1Seq_testdata_deletion_consistency_log.txt`  
`PolyA1Seq_testdata_deletion_consistency_table.txt`  

- Outputs from Insertion script, which is run separately on the insertion and complex sequences  
`PolyA1Seq_testdata_insertion_insertion_consistency2.csv`  
`PolyA1Seq_testdata_insertion_insertion_consistency_long2.csv`  
`PolyA1Seq_testdata_insertion_insertion_alignment2.csv`  
`PolyA1Seq_testdata_complex_insertion_consistency2.csv`  
`PolyA1Seq_testdata_complex_insertion_consistency_long2.csv`  
`PolyA1Seq_testdata_complex_insertion_alignment2.csv`
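
The file names listed above all follow the pattern of the input file name (without `.csv`) plus a fixed suffix, so a finished run can be checked for missing files. The following is a minimal sketch under that assumption. The function names are hypothetical, and where the output directory is created (here assumed to be the current directory unless a second argument is given) is not stated in this README.

```shell
# expected_outputs PREFIX : print the expected output file names, one per line.
# The suffixes are copied from the list in this README.
expected_outputs() {
    prefix=$1
    for suffix in \
        _reclassified.csv \
        _complex.txt \
        _insertion.txt \
        _deletion.txt \
        _deletion_consistency_log.txt \
        _deletion_consistency_table.txt \
        _insertion_insertion_consistency2.csv \
        _insertion_insertion_consistency_long2.csv \
        _insertion_insertion_alignment2.csv \
        _complex_insertion_consistency2.csv \
        _complex_insertion_consistency_long2.csv \
        _complex_insertion_alignment2.csv
    do
        printf '%s%s\n' "$prefix" "$suffix"
    done
}

# check_outputs INPUT_CSV [OUT_PARENT] : print each missing output file and
# return non-zero if any is missing. OUT_PARENT defaults to the current
# directory (an assumption). File names must not contain spaces.
check_outputs() {
    prefix=$(basename "$1" .csv)
    out_dir="${2:-.}/${prefix}_output"
    missing=0
    for f in $(expected_outputs "$prefix"); do
        if [ ! -f "$out_dir/$f" ]; then
            echo "missing: $out_dir/$f"
            missing=1
        fi
    done
    return "$missing"
}
```

For the test data, `check_outputs test_data/polyA1Seq/PolyA1Seq_testdata.csv` prints nothing and returns zero when all twelve files are present in `./PolyA1Seq_testdata_output`.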