Semantic Search with Vector Database: A Comprehensive Review of Models, Indexing and Applications

Tanay Chowdhury

Abstract

Semantic search backed by vector databases has become an effective paradigm for retrieving relevant information, because it matches on the contextual and conceptual meaning of a query rather than on keywords alone. This paper reviews the vector representation models, transformer-based semantic encoders, and vector database technologies that together enable efficient and accurate semantic search. Classical distributional semantics, word-level embeddings, and transformer architectures are presented as background methods for generating meaningful vector representations. The paper also examines contemporary vector databases and the indexing mechanisms that enable scalable similarity search over high-dimensional data. In addition, distance measures, hashing algorithms, and graph-based indexing strategies are evaluated for their effect on retrieval performance. Finally, the paper presents practical examples of semantic search with vector databases in text, image, audio, and conversational applications, and outlines the main challenges and research opportunities.
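As a rough illustration of the similarity search the abstract refers to, the sketch below ranks a handful of stored vectors against a query by cosine similarity and also computes a Euclidean distance. It is a minimal brute-force example, not the method of the paper: the function names, document ids, and three-dimensional toy vectors are all hypothetical, and a real vector database would typically replace the exhaustive scan with a hashing-based or graph-based approximate index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query, index, k=2):
    """Exhaustively score every stored vector and return the k most similar ids."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "index": in practice these vectors would come from an embedding model.
index = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}

results = top_k([1.0, 0.0, 0.0], index, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 4))
```

The exhaustive scan costs time proportional to the number of stored vectors, which is the scaling problem that the indexing strategies surveyed in the paper are meant to address.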

Article Details

How to Cite
Chowdhury, T. (2026). Semantic Search with Vector Database: A Comprehensive Review of Models, Indexing and Applications. The Eastasouth Journal of Information System and Computer Science, 3(03), 323–336. https://doi.org/10.58812/esiscs.v3i03.938
