I've worked on a variety of projects, focusing on video search engines and linguistic tools built on NLP and ML. Here are some highlights I'm proud of, showcasing my process from concept to execution.
2024
Combating Clickbait: A Fine-grained Video Search Engine
• Developed a video search engine for Bilibili, the Chinese counterpart of YouTube, that bypasses titles and descriptions to assess content objectively and reduce clickbait.
• Used audio transcription and large language models (LLMs) to identify what each video actually contains, summarize it, and classify its genre.
• Enabled users to locate and jump to the exact timestamp in a video where a keyword appears, and highlighted the on-screen position of the object matching the keyword.
• Optimized performance and reduced computational overhead through scene division, random sampling, image compression, and other techniques.
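To illustrate the keyword-to-timestamp lookup described above, here is a minimal sketch. It assumes the transcription step yields time-stamped text segments; the `Segment` type, the function names, and the sample data are hypothetical and are not taken from the project itself.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed span of audio (hypothetical structure)."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # transcribed text for this span


def find_keyword_timestamps(segments, keyword):
    """Return the start time of every segment whose text contains the keyword."""
    needle = keyword.casefold()  # case-insensitive matching
    return [seg.start for seg in segments if needle in seg.text.casefold()]


def format_timestamp(seconds):
    """Render a time in seconds as HH:MM:SS for display in a player UI."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Invented sample data, for illustration only.
segments = [
    Segment(0.0, 4.2, "Welcome back to the channel"),
    Segment(4.2, 9.8, "Today we review a mechanical keyboard"),
    Segment(65.0, 70.5, "The keyboard switches feel great"),
]

hits = find_keyword_timestamps(segments, "keyboard")
print([format_timestamp(t) for t in hits])  # ['00:00:04', '00:01:05']
```

A linear scan like this is enough for a single video; searching across many videos would more plausibly rely on a full-text index, which this sketch does not cover.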
"Oldspeak", a Collocation Dictionary of the Classical Chinese Language
• Built a collocation extraction system and dictionary website for Classical Chinese, supporting language learning and linguistic research.
• Used BERT-based NLP models for tokenization, PoS tagging, and dependency parsing to extract collocations accurately.
• Used FastAPI and PostgreSQL to compute relative frequencies of collocations and support advanced queries filtered by conditions such as historical period or genre.
• Used Vue.js to develop a web interface that lets researchers and learners look up collocations easily and view detailed information and example sentences for each one.
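The relative-frequency computation mentioned above can be sketched in plain Python. This is only an illustration under stated assumptions: the `Collocation` record, its fields, the period labels, and the sample rows are hypothetical, and frequencies are normalized within the filtered subset, which may differ from how the project's PostgreSQL queries define them.

```python
from collections import Counter
from typing import NamedTuple


class Collocation(NamedTuple):
    """One extracted collocation occurrence (hypothetical record layout)."""
    head: str       # governing word from the dependency parse
    dependent: str  # dependent word
    relation: str   # dependency relation label
    period: str     # historical period of the source text


def relative_frequencies(records, period=None):
    """Share of each (head, dependent, relation) triple among the selected records.

    If `period` is given, only records from that period are counted, and the
    shares are relative to that subset (an assumption of this sketch).
    """
    selected = [r for r in records if period is None or r.period == period]
    counts = Counter((r.head, r.dependent, r.relation) for r in selected)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: count / total for key, count in counts.items()}


# Invented sample rows, for illustration only.
records = [
    Collocation("治", "國", "obj", "pre-Qin"),
    Collocation("治", "國", "obj", "pre-Qin"),
    Collocation("伐", "國", "obj", "pre-Qin"),
    Collocation("治", "國", "obj", "Han"),
]

for key, freq in relative_frequencies(records, period="pre-Qin").items():
    print(key, round(freq, 3))
```

In a database-backed service the same idea would typically be expressed as a grouped count divided by the filtered total, with the period or genre condition applied in the query's filter.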