QuickStart

Use the bundled demo data to run a Vireobulk test.

Installation



git clone https://github.com/chengarthur/Vireobulk_analysis.git
cd Vireobulk_analysis/
pip install -r requirements.txt

Run test1 donor decomposition


python vireobulk.py -c "data/cellSNP.base.vcf.gz" -n test1 -d "data/filter_pbmc10donors.vcf.gz"

This step generates a summary file, a demultiplexed donor ratio file, and a model.theta file.

The demultiplexed donor ratio file contains a sample × ratio matrix in piled-up sample mode.

Theta represents the inferred B allele frequency (BAF) of the model. The closer the values are to [0, 0.5, 1], the more reliable the result; a BAF > 0.40 usually indicates a confident inference.
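One way to read the theta values is to measure how far each one lies from the nearest target value in [0, 0.5, 1]. The sketch below is illustrative only: the helper name `theta_deviation` and the example numbers are hypothetical, and it assumes the theta values have already been loaded into a plain Python list. It does not parse the model.theta file, whose layout is not described here.

```python
# Target B allele frequencies named in the text above.
IDEAL_BAF = (0.0, 0.5, 1.0)


def theta_deviation(thetas):
    """Return, for each theta value, its distance to the nearest target BAF.

    Hypothetical helper: a smaller distance means theta sits closer to
    [0, 0.5, 1], which the text above associates with a more reliable result.
    """
    return [min(abs(t - ideal) for ideal in IDEAL_BAF) for t in thetas]


if __name__ == "__main__":
    # Made-up example values, not output from Vireobulk.
    example = [0.02, 0.47, 0.98]
    for theta, dev in zip(example, theta_deviation(example)):
        print(f"theta={theta:.2f}  distance to nearest target BAF={dev:.2f}")
```

A distance-based summary like this is only a convenience for eyeballing the output; it is not a statistic reported by Vireobulk itself.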

Run test1 gene abundance estimate

python vireobulk.py -c "data/cellSNP.base.vcf.gz" -n test1 -d "data/filter_pbmc10donors.vcf.gz" -a "data/scvcf_annotated_processed.csv" --gene



This step generates a summary file for the demultiplexed donor ratio, a model.theta file, and a gene × donor ratio matrix, together with a p-value indicating whether each gene is differentially expressed among donors.

Tag usage

--cellData -c 

The cell.vcf file of bulk RNA-seq data, piled up by genotyping methods such as samtools and freebayes. For mixed bulk data, however, genotyping is unnecessary; cellSNP-lite is strongly recommended for generating this input.

--donorFile -d

The genotype reference file for the donors present in the pooled bulk samples; you could extract genotyping information

Important: you can use -R from samtools to avoid accidental issues in the input data.

--outDir -o 

Directory for output files; defaults to the current working directory.

Optional

--gene

When this flag is set, run in gene-level demultiplexing mode.

--notLearnGT

If set, disable learnGT mode for ASE in donor ratio demultiplexing; by default, the learned mode is used.

--annotation -a

Annotation for SNPs and genes. This file can be generated by ANNOVAR; detailed information is listed in preprocessing.
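The flags above can also be assembled programmatically, for example when launching several runs from a script. The following is a minimal sketch, not part of Vireobulk: the function `build_vireobulk_command` and its parameter names are hypothetical, and `-n` is taken from the QuickStart commands rather than from the flag list above. It only builds and prints the command line.

```python
import shlex


def build_vireobulk_command(cell_data, donor_file, name, out_dir=None,
                            annotation=None, gene=False, not_learn_gt=False):
    """Assemble a vireobulk.py argument list from the flags documented above.

    Hypothetical helper: only flags that appear in this README are emitted.
    """
    cmd = ["python", "vireobulk.py", "-c", cell_data, "-n", name, "-d", donor_file]
    if out_dir is not None:
        cmd += ["-o", out_dir]          # --outDir
    if annotation is not None:
        cmd += ["-a", annotation]       # --annotation
    if gene:
        cmd.append("--gene")            # gene-level demultiplexing mode
    if not_learn_gt:
        cmd.append("--notLearnGT")      # disable learnGT mode
    return cmd


if __name__ == "__main__":
    cmd = build_vireobulk_command(
        "data/cellSNP.base.vcf.gz",
        "data/filter_pbmc10donors.vcf.gz",
        "test1",
        annotation="data/scvcf_annotated_processed.csv",
        gene=True,
    )
    # Print the shell-quoted command; to execute it, pass `cmd` to
    # subprocess.run(cmd, check=True) from the repository directory.
    print(shlex.join(cmd))
```

Returning a list rather than a single string avoids shell-quoting problems when file paths contain spaces.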