Abstract Summary/Description
Malaria remains a major global health challenge, with nearly 250 million cases reported in 2022 and Anopheles mosquitoes as the primary vectors. Growing insecticide resistance in these vectors undermines control efforts. Advances in next-generation sequencing have made it possible to investigate mosquito genomes, generating millions of RNA-Seq reads; however, many of these reads fail to align to reference genomes and are discarded. These "unmapped reads" often go unexplored, yet they may contain important biological information, such as sequences from bacterial symbionts, viruses, or genes missing from reference genomes. This study performs a comparative analysis of unmapped RNA-Seq reads across Anopheles species to test whether they originate from bacterial symbionts, viruses, or genes missing from reference genomes, and whether GC content affects alignment. Four published RNA-Seq datasets from Anopheles species were analyzed using a standardized bioinformatics pipeline. Read quality was assessed with FastQC, and reads were then aligned to the reference genome with HISAT2. Unmapped reads were extracted with SAMtools and characterized using both small subunit (SSU) rRNA-based and non-rRNA-based taxonomic classification tools, with controls to mitigate bias introduced by rRNA depletion during library preparation. GC content differences between mapped and unmapped reads were assessed using SeqKit and statistical testing in R. No significant difference in GC content was found between mapped and unmapped reads, suggesting that GC bias does not explain the failure to align. SSU rRNA analysis yielded no bacterial assignments, likely because of rRNA depletion, while fewer than 0.1% of unmapped reads aligned to known viruses. Ongoing work includes taxonomic classification of non-rRNA reads, de novo assembly, and functional annotation to identify novel transcripts and their potential role in insecticide resistance.
By drawing on unmapped RNA-Seq reads, this research may reveal novel genetic factors related to insecticide resistance and thereby aid the development of targeted malaria control strategies, addressing a critical public health challenge.
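
As an illustration of the GC content comparison described above, the following is a minimal Python sketch, not the pipeline used in the study (which relied on SAMtools, SeqKit, and R). It assumes alignments are available as plain-text SAM records, separates reads by the SAM "unmapped" flag bit (0x4), and computes a per-read GC fraction for each group. The function names, the toy records, and the reference name `chr2L` are hypothetical and serve only as examples; the statistical test itself is left out because the abstract does not specify which one was used.

```python
from statistics import mean


def gc_fraction(seq: str) -> float:
    """Return the G+C fraction of a read, counting only A, C, G and T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    # Reads made up entirely of N (or other symbols) get 0.0 rather than an error.
    return gc / total if total else 0.0


def split_gc_by_mapping(sam_lines):
    """Return (mapped_gc, unmapped_gc): per-read GC fractions from SAM text lines."""
    mapped, unmapped = [], []
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        seq = fields[9]
        if seq == "*":
            continue  # sequence not stored in this record
        # Skip secondary (0x100) and supplementary (0x800) records so that
        # each read contributes at most one GC value.
        if flag & 0x900:
            continue
        # Bit 0x4 of the FLAG field marks a read as unmapped.
        if flag & 0x4:
            unmapped.append(gc_fraction(seq))
        else:
            mapped.append(gc_fraction(seq))
    return mapped, unmapped


# Hypothetical toy records, for illustration only (not data from the study).
TOY_SAM = [
    "@HD\tVN:1.6",
    "r1\t0\tchr2L\t100\t60\t4M\t*\t0\t0\tATGC\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tIIII",
    "r3\t16\tchr2L\t200\t60\t4M\t*\t0\t0\tAATT\tIIII",
]

if __name__ == "__main__":
    mapped_gc, unmapped_gc = split_gc_by_mapping(TOY_SAM)
    print("mapped reads:", len(mapped_gc), "mean GC:", mean(mapped_gc))
    print("unmapped reads:", len(unmapped_gc), "mean GC:", mean(unmapped_gc))
```

Because the GC fraction of a sequence equals that of its reverse complement, the SEQ field of reverse-strand alignments can be used directly without converting it back to the original read orientation. The two resulting lists could then be exported for whatever statistical comparison is appropriate.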