DSpace Repository

Ask-UP: A Large Language Model-Powered Interactive Agent for the University of the Philippines Gazette Files using Retrieval Augmented Generation

dc.contributor.author Baclig, Isabel B.
dc.date.accessioned 2025-08-15T00:28:28Z
dc.date.available 2025-08-15T00:28:28Z
dc.date.issued 2025-07
dc.identifier.uri http://dspace.cas.upm.edu.ph:8080/xmlui/handle/123456789/3123
dc.description.abstract This study presents a retrieval-augmented question-answering (QA) system designed to extract academic policies and resolutions from the University of the Philippines (UP) Gazette. Historical printed documents were digitized using Optical Character Recognition (OCR) and preprocessed with natural language processing techniques. Dense vector representations were generated using embedding models and stored in Pinecone, a vector database used here for hybrid search that combines semantic and keyword-based retrieval. Retrieved passages were reranked in two stages: a cross-encoder for semantic matching, followed by PageRank-based graph reranking to promote contextually central chunks. A fine-tuned large language model (LLM) then generated coherent, context-aware responses from the top-ranked passages. The system was evaluated using retrieval and generation metrics, including Precision@k, Recall@k, Mean Reciprocal Rank (MRR), ROUGE-L, METEOR, and Jaccard Similarity. Results indicate that although the LLM frequently identifies the correct answers, partial outputs lower the text generation scores, pointing to generation grounding as an area for future improvement. This research demonstrates how hybrid search and graph-based reranking enhance retrieval effectiveness in open-domain QA for historical documents. en_US
dc.subject Retrieval-Augmented Generation (RAG) en_US
dc.subject University of the Philippines Gazette en_US
dc.subject Dense Passage Retrieval en_US
dc.subject Pinecone en_US
dc.subject Graph Reranking en_US
dc.subject PageRank en_US
dc.subject Cross-Encoder en_US
dc.subject Large Language Model (LLM) en_US
dc.subject Optical Character Recognition en_US
dc.subject Natural Language Processing en_US
dc.title Ask-UP: A Large Language Model-Powered Interactive Agent for the University of the Philippines Gazette Files using Retrieval Augmented Generation en_US
dc.type Thesis en_US
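
The retrieval metrics named in the abstract (Precision@k, Recall@k, and Mean Reciprocal Rank), along with Jaccard Similarity, follow standard definitions. The sketch below is a minimal illustration of those definitions and is not the thesis's evaluation code. The function names, the chunk IDs, and the whitespace tokenization used for Jaccard Similarity are assumptions made for this example.

```python
from typing import List, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved items that are relevant (divides by k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (ranked list, relevant set) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def jaccard_similarity(a: str, b: str) -> float:
    """Token-set overlap; lowercased whitespace tokens are an assumption."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


if __name__ == "__main__":
    # Hypothetical chunk IDs and relevance labels, for illustration only.
    ranked = ["c3", "c1", "c7", "c2"]
    relevant = {"c1", "c2"}
    print(precision_at_k(ranked, relevant, 2))
    print(recall_at_k(ranked, relevant, 2))
    print(mean_reciprocal_rank([(ranked, relevant), (["c5"], {"c5"})]))
```

Precision@k here divides by k even when fewer than k items are returned, which is one common convention; an evaluation that divides by the number of items actually retrieved would give different values for short result lists.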

