QuickStart
You can use the demo data to run a Vireobulk test.
Installation
git clone https://github.com/chengarthur/Vireobulk_analysis.git
cd Vireobulk_analysis/
pip install -r requirements.txt
Run test1 donor decomposition
python vireobulk.py -c "data/cellSNP.base.vcf.gz" -n test1 -d "data/filter_pbmc10donors.vcf.gz"
This step generates a summary file, a demultiplexed donor ratio file, and a model.theta file.
The demultiplexed donor ratio file contains a sample X ratio matrix in piled-up sample mode.
Theta represents the B allele frequency (BAF) inferred by the model. The closer the values are to [0, 0.5, 1], the more reliable the result; usually BAF > 0.40 represents a confident inference.
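As a rough illustration of the "closer to [0, 0.5, 1]" criterion, the sketch below measures how far each theta value lies from its nearest ideal BAF. It does not read the model.theta file, whose layout is not described here; the function name `theta_deviation` and the example values are hypothetical and only show the arithmetic.

```python
# Ideal B allele frequencies named in the text above.
IDEAL_BAF = (0.0, 0.5, 1.0)


def theta_deviation(theta_values):
    """Return each theta value's absolute distance to the nearest ideal BAF."""
    return [min(abs(t - ideal) for ideal in IDEAL_BAF) for t in theta_values]


# Hypothetical theta values for illustration only (not real Vireobulk output).
example_theta = [0.02, 0.47, 0.98]
deviations = theta_deviation(example_theta)

# A small maximum deviation means theta sits close to [0, 0.5, 1].
print("largest deviation from an ideal BAF:", max(deviations))
```

Smaller deviations correspond to the more reliable case described above. Any cutoff you apply to them is your own choice and is not defined by Vireobulk.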
Run test1 gene abundance estimate
python vireobulk.py -c "data/cellSNP.base.vcf.gz" -n test1 -d "data/filter_pbmc10donors.vcf.gz" -a "data/scvcf_annotated_processed.csv" --gene
This step generates a summary file for the demultiplexed donor ratio, a model.theta file, and a gene X donor ratio matrix, along with a p-value for whether each gene is differentially expressed among donors.
Tag usage
--cellData -c
The cell.vcf file of the bulk RNA-seq data, piled up by genotyping tools such as samtools and freebayes. For mixed bulk data, however, genotyping is unnecessary, and cellSNP-lite is strongly recommended for generating this input.
--donorFile -d
The genotype reference file for the donors present in the pooled bulk samples; you can extract the genotyping information.
Important: you can use -R from samtools to avoid accidental issues in data input.
--outDir -o
Directory for output files; defaults to the current working directory.
Optional
--gene
When this flag is set, run in gene-level demultiplexing mode.
--notLearnGT
If set, turn off learnGT mode for ASE in donor ratio demultiplexing; learned mode is used by default.
--annotation -a
Annotation for SNPs and genes. This file can be generated by ANNOVAR; detailed information is listed in preprocessing.
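To show how the flags above combine, here is a minimal sketch that assembles a vireobulk.py command line from Python. The helper `build_vireobulk_command` is hypothetical and not part of Vireobulk. It uses only the flags shown in this README (-c, -n, -d, -o, -a, --gene, --notLearnGT) and builds the command without running it.

```python
import shlex


def build_vireobulk_command(cell_data, donor_file, name, out_dir=None,
                            annotation=None, gene=False, learn_gt=True):
    """Assemble the argument list for vireobulk.py (hypothetical helper)."""
    cmd = ["python", "vireobulk.py", "-c", cell_data, "-n", name, "-d", donor_file]
    if out_dir is not None:
        cmd += ["-o", out_dir]        # output directory
    if annotation is not None:
        cmd += ["-a", annotation]     # SNP and gene annotation file
    if gene:
        cmd.append("--gene")          # gene-level demultiplexing mode
    if not learn_gt:
        cmd.append("--notLearnGT")    # turn off learnGT mode
    return cmd


# Rebuild the gene abundance example from the QuickStart.
cmd = build_vireobulk_command(
    "data/cellSNP.base.vcf.gz",
    "data/filter_pbmc10donors.vcf.gz",
    "test1",
    annotation="data/scvcf_annotated_processed.csv",
    gene=True,
)
print(shlex.join(cmd))
```

To actually run the command, pass the returned list to `subprocess.run(cmd, check=True)` from inside the Vireobulk_analysis directory.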