Please use this identifier to cite or link to this item: http://dspace.cas.upm.edu.ph:8080/xmlui/handle/123456789/3123
Full metadata record
DC Field | Value | Language
dc.contributor.author | Baclig, Isabel B. | -
dc.date.accessioned | 2025-08-15T00:28:28Z | -
dc.date.available | 2025-08-15T00:28:28Z | -
dc.date.issued | 2025-07 | -
dc.identifier.uri | http://dspace.cas.upm.edu.ph:8080/xmlui/handle/123456789/3123 | -
dc.description.abstract | This study presents a retrieval-augmented question-answering (QA) system designed to extract academic policies and resolutions from the University of the Philippines (UP) Gazette. Utilizing Optical Character Recognition (OCR), historical printed documents were digitized and preprocessed through natural language processing techniques. Dense vector representations were generated using embedding models and stored in Pinecone, a hybrid vector database enabling both semantic and keyword-based retrieval. Retrieved passages were reranked using a two-stage approach: a cross-encoder for semantic matching and PageRank-based graph reranking to promote contextually central chunks. A fine-tuned large language model (LLM) was then used to generate coherent, context-aware responses based on the top-ranked passages. The system was evaluated using retrieval and generation metrics including Precision@k, Recall@k, Mean Reciprocal Rank (MRR), ROUGE-L, METEOR, and Jaccard Similarity. Results indicate that while the LLM frequently identifies the correct answers, partial outputs affect text generation scores, suggesting future improvements in generation grounding. This research demonstrates how hybrid search and graph-based reranking enhance retrieval effectiveness in open-domain QA for historical documents. | en_US
dc.subject | Retrieval-Augmented Generation (RAG) | en_US
dc.subject | University of the Philippines Gazette | en_US
dc.subject | Dense Passage Retrieval | en_US
dc.subject | Pinecone | en_US
dc.subject | Graph Reranking | en_US
dc.subject | PageRank | en_US
dc.subject | Cross-Encoder | en_US
dc.subject | Large Language Model (LLM) | en_US
dc.subject | Optical Character Recognition | en_US
dc.subject | Natural Language Processing | en_US
dc.title | Ask-UP: A Large Language Model-Powered Interactive Agent for the University of the Philippines Gazette Files using Retrieval Augmented Generation | en_US
dc.type | Thesis | en_US
Appears in Collections: BS Computer Science SP
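
The abstract names Precision@k, Recall@k, Mean Reciprocal Rank (MRR), and Jaccard Similarity among its evaluation metrics. The following is a minimal sketch of their standard definitions, not the thesis's own implementation. The function names, the chunk identifiers, and the whitespace tokenization used for Jaccard similarity are all assumptions made for illustration.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved chunk IDs that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for chunk_id in ranked[:k] if chunk_id in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant chunk IDs that appear in the top k."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[str]], relevants: Sequence[Set[str]]
) -> float:
    """Average over queries of 1 / rank of the first relevant result.

    A query with no relevant result in its ranking contributes 0.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for ranked, relevant in zip(rankings, relevants):
        for rank, chunk_id in enumerate(ranked, start=1):
            if chunk_id in relevant:
                total += 1.0 / rank
                break
    return total / len(rankings)


def jaccard_similarity(generated: str, reference: str) -> float:
    """Token-set overlap: |A intersect B| / |A union B|.

    Assumption: lowercased whitespace tokenization; the thesis may
    tokenize differently.
    """
    tokens_a = set(generated.lower().split())
    tokens_b = set(reference.lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0
    return len(tokens_a & tokens_b) / len(union)


# Hypothetical chunk IDs, used only to exercise the functions.
ranked = ["c3", "c1", "c7", "c2"]
relevant = {"c1", "c2"}
print(precision_at_k(ranked, relevant, 2))
print(recall_at_k(ranked, relevant, 4))
print(mean_reciprocal_rank([ranked, ["c9", "c8"]], [relevant, {"c8"}]))
print(jaccard_similarity("the board approved", "the board approved the policy"))
```

Under these definitions a partial answer can score well on retrieval metrics while scoring lower on token-overlap measures such as Jaccard similarity, which is consistent with the abstract's remark that partial outputs affect text generation scores.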


