Preprocessing
VireoBulk requires at least two input files to estimate donor composition. We strongly recommend using cellsnp-lite to generate both the result pileup file and the reference pileup file.
Reference file
Format: a single VCF file containing the genotypes of all donors, which can be generated by cellsnp-lite, GATK, FreeBayes, or other genotyping tools.
Use the following command to filter out multiallelic sites and indels, which VireoBulk does not count:
bcftools view -Oz --max-alleles 2 --exclude-types indels input.vcf > output.vcf.gz
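As a quick sanity check after filtering, the sketch below counts records that are still multiallelic or that carry an allele longer than one base. It uses only awk and is not part of VireoBulk. The function name `count_non_biallelic_snps` is hypothetical, and the helper assumes an uncompressed VCF stream on standard input. It is slightly stricter than the bcftools filter, because it also flags multi-base substitutions.

```shell
# Hypothetical helper (not part of VireoBulk): count VCF records that are
# not single-base, biallelic sites. Reads an uncompressed VCF on stdin.
count_non_biallelic_snps() {
    awk -F'\t' '
        /^#/ { next }   # skip header lines
        {
            ref = $4; alt = $5
            # flag multiallelic sites (comma in ALT) and any allele longer than one base
            if (alt ~ /,/ || length(ref) != 1 || length(alt) != 1) bad++
        }
        END { print bad + 0 }
    '
}

# Example usage for a compressed file:
#   gzip -dc output.vcf.gz | count_non_biallelic_snps
```

A result of 0 means every record is a single-base, biallelic site.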
Result pileup file
Output format: VCF file generated by cellsnp-lite.
Input: an aligned BAM file from NGS data such as bulk RNA-seq, scRNA-seq, or ATAC-seq.
Use the reference VCF to pile up the reads. Example command:
cellsnp-lite -S $BAMfile -O $OUT_DIR -R $REGION_VCF -p 20 --cellTAG None --UMItag None --gzip
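To see how many reference sites were retained in the pileup, the record counts of the two VCF files can be compared. The helper below is a minimal sketch that counts non-header lines in a plain or gzipped VCF. The function name `count_vcf_records` is hypothetical, and the output path `$OUT_DIR/cellSNP.base.vcf.gz` in the usage comment is an assumption based on the file name used in the annotation step.

```shell
# Hypothetical helper: count non-header records in a plain or gzipped VCF.
count_vcf_records() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;   # decompress gzipped/bgzipped input
        *)    cat "$1" ;;
    esac | grep -vc '^#' || true   # grep exits 1 when the count is 0; the count is still printed
}

# Example usage (assumed paths):
#   count_vcf_records "$REGION_VCF"
#   count_vcf_records "$OUT_DIR/cellSNP.base.vcf.gz"
```

A large gap between the two counts suggests that many reference sites had too little coverage to be reported.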
Gene annotation
In the gene abundance step, VireoBulk requires an additional gene annotation file to map SNPs to genes. The annotation file can be generated with ANNOVAR and the shell commands below:
## after downloading and unzipping annovar
## download and build the annovar databases
perl /annovar/annotate_variation.pl -buildver hg38 -downdb -webfrom annovar refGene humandb/
perl /annovar/annotate_variation.pl -buildver hg38 -downdb -webfrom annovar avsnp150 humandb/
## convert and annotate with annovar
perl annovar/convert2annovar.pl -format vcf4 cellSNP.base.vcf.gz > base.annovar
perl annovar/table_annovar.pl base.annovar annovar/humandb/ -buildver hg38 -out filtered_table -remove -protocol refGene,avsnp150 -operation g,f -nastring . -csvout -polish
## convert to VireoBulk input
Rscript scripts/convert.R --annofile filtered_table.hg38_multianno.csv --out scvcf_annotated_processed.csv