This project-based course covers basic techniques for processing text data. It introduces the concepts of language morphology, text representation, pre-processing, and feature extraction for tasks such as measuring similarity and clustering text. Topics covered include: language morphology, string representation, regular expressions (regex), tokenization, text pre-processing, Bag of Words, TF-IDF, word similarity, word clustering, and web scraping. Students will complete group projects that apply text processing theories and concepts to problems in the field of Data Science.