Course Description
This project-based course covers basic techniques for processing text data. It introduces language morphology, text representation, pre-processing, and feature extraction, which are used to obtain information such as text similarity and text clusters. Topics covered include: language morphology, string representation, regular expressions (regex), tokenization, text pre-processing, Bag of Words, TF-IDF, word similarity, word clustering, and web scraping. Students will complete group projects that apply text processing theories and concepts to problems in the field of Data Science.
Program Objectives (PO)
- Explain the concept of Language Modeling in text processing
- Represent linguistic knowledge at the morphological, syntactic, and semantic levels of representation
- Extract text data from digital sources and process it using pre-processing, feature extraction, and text classification techniques
- Design solutions to text data problems using the relevant text processing techniques