Preprocessing

VireoBulk requires at least two input files to estimate donor composition: a reference genotype file and a result pile-up file. We strongly recommend using cellsnp-lite to generate both the result pile-up and the reference pile-up.

Reference file

Format: one VCF file containing the genotypes of all donors, which can be generated by cellsnp-lite, GATK, FreeBayes, or other genotyping tools.

You can use the following command to filter out all multiallelic sites and indels, which are not counted by VireoBulk:


bcftools view -Oz --max-alleles 2 --exclude-types indels input.vcf -o output.vcf.gz
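
As a quick sanity check after filtering, the sketch below counts records in an uncompressed VCF that are still multiallelic or indels. It relies only on awk. The function name `count_non_biallelic_snps` and the toy VCF are illustrative assumptions and are not part of VireoBulk or bcftools. Symbolic alleles such as `*` are not handled.

```shell
# Hypothetical helper: count VCF records that are NOT biallelic SNPs.
# A record passes only if REF (column 4) and ALT (column 5) are each a
# single character and ALT lists exactly one allele (no comma).
count_non_biallelic_snps() {
    awk -F'\t' '
        /^#/ { next }    # skip header lines
        ($5 ~ /,/) || (length($4) != 1) || (length($5) != 1) { bad++ }
        END { print bad + 0 }
    ' "$1"
}

# Toy VCF for illustration only (made-up positions and alleles).
demo_vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$demo_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$demo_vcf"
printf 'chr1\t100\t.\tA\tG\t.\tPASS\t.\n' >> "$demo_vcf"    # biallelic SNP
printf 'chr1\t200\t.\tA\tG,T\t.\tPASS\t.\n' >> "$demo_vcf"  # multiallelic site
printf 'chr1\t300\t.\tAT\tA\t.\tPASS\t.\n' >> "$demo_vcf"   # deletion

count_non_biallelic_snps "$demo_vcf"
```

A correctly filtered file should report 0. The helper expects plain text, so a bgzip-compressed VCF should be decompressed first, for example with `gzip -dc`.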

Result piled up file

Output format: a VCF file generated by cellsnp-lite.

Input: an aligned BAM file from NGS data such as bulk RNA-seq, scRNA-seq, ATAC-seq, etc.

Please use the reference VCF to pile up the result. Example command:


cellsnp-lite -S $BAMfile -O $OUT_DIR -R $REGION_VCF -p 20 --cellTAG None --UMItag None --gzip
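
Once the pile-up finishes, it can help to check how many sites were actually covered. The sketch below counts the non-header records in a gzipped VCF. It assumes the output directory contains `cellSNP.base.vcf.gz` (the file name used in the annotation step). The helper name `count_vcf_records` and the toy file contents are hypothetical.

```shell
# Hypothetical helper: count non-header records in a gzipped VCF.
count_vcf_records() {
    gzip -dc "$1" | awk '!/^#/ { n++ } END { print n + 0 }'
}

# Toy stand-in for a pile-up output (made-up sites and counts).
work_dir=$(mktemp -d)
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'chr1\t100\t.\tA\tG\t.\tPASS\tAD=3;DP=10;OTH=0\n'
    printf 'chr1\t250\t.\tC\tT\t.\tPASS\tAD=1;DP=7;OTH=0\n'
} | gzip -c > "$work_dir/cellSNP.base.vcf.gz"

count_vcf_records "$work_dir/cellSNP.base.vcf.gz"
```

With real data, `count_vcf_records "$OUT_DIR/cellSNP.base.vcf.gz"` gives the number of piled-up sites, which can be compared against the number of sites in the reference VCF.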



Gene annotation

In the gene abundance step, VireoBulk requires an additional gene annotation to obtain the gene-SNP correspondence. The annotation file can be generated with ANNOVAR and shell commands:

## after downloading and unzipping annovar
## download and build the annovar databases

perl annovar/annotate_variation.pl -buildver hg38 -downdb -webfrom annovar refGene annovar/humandb/
perl annovar/annotate_variation.pl -buildver hg38 -downdb -webfrom annovar avsnp150 annovar/humandb/

## convert and annotate with annovar

perl annovar/convert2annovar.pl -format vcf4 cellSNP.base.vcf.gz > base.annovar 
perl annovar/table_annovar.pl base.annovar annovar/humandb/ -buildver hg38 -out filtered_table -remove -protocol refGene,avsnp150 -operation g,f -nastring . -csvout -polish


## convert to VireoBulk input 
Rscript scripts/convert.R --annofile filtered_table.hg38_multianno.csv --out scvcf_annotated_processed.csv