What will students learn in your course?
In this course, students will learn the basics of text mining and build on them to perform document categorization, document grouping, and subjective analysis.
The code is implemented in Python, and Natural Language Processing (NLP) techniques are used to pre-process textual data.
We will learn about structuring textual data using different representation schemes and tuning their parameters.
Starting from a very small dummy dataset, we move on to existing datasets to build models and then validate and evaluate them.
We will learn about scraping data from the web and converting it into a dataset.
Sentiment analysis of user hotel reviews
Information extraction from raw documents
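As a rough illustration of what "structuring textual data" and "tuning parameters" can mean, here is a minimal bag-of-words sketch in plain Python. The function names (`tokenize`, `build_vocabulary`, `vectorize`), the `min_df` and `stop_words` parameters, and the three example reviews are assumptions made for this illustration and are not taken from the course material.

```python
from collections import Counter

PUNCTUATION = ".,!?;:"

def tokenize(text):
    """Lowercase the text, split on whitespace, and strip simple punctuation."""
    tokens = []
    for raw in text.lower().split():
        token = raw.strip(PUNCTUATION)
        if token:
            tokens.append(token)
    return tokens

def build_vocabulary(docs, min_df=1, stop_words=()):
    """Keep terms found in at least `min_df` documents and not in `stop_words`."""
    doc_freq = Counter()
    for doc in docs:
        # Count each term once per document (document frequency).
        doc_freq.update(set(tokenize(doc)))
    return sorted(
        term for term, freq in doc_freq.items()
        if freq >= min_df and term not in stop_words
    )

def vectorize(doc, vocabulary):
    """Turn one document into a list of term counts aligned with the vocabulary."""
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocabulary]

# Hypothetical example reviews.
docs = [
    "The hotel room was clean.",
    "The room was small but the staff was friendly.",
    "Friendly staff and a clean lobby.",
]

vocab = build_vocabulary(docs, min_df=2, stop_words={"the", "was"})
matrix = [vectorize(doc, vocab) for doc in docs]

print(vocab)   # ['clean', 'friendly', 'room', 'staff']
print(matrix)  # [[1, 0, 1, 0], [0, 1, 1, 1], [1, 1, 0, 1]]
```

Raising `min_df` or extending `stop_words` shrinks the vocabulary, which is one simple way a representation parameter changes the resulting vectors.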
Are there any course requirements or prerequisites?
Basics of programming (any language; Python is a bonus)
Basic understanding of Machine Learning
Able to code with lists, loops, and conditions, with a basic understanding of how models learn patterns from data
Who are your target students?
Beginners in Python who are curious about data science
Those who know programming in Python and the basic concepts of data science but cannot yet connect the two in practice
In this course, we study the basics of text mining.
- The basic operations for structuring unstructured data into vectors and reading different types of data from public archives are taught.
- Building on this, we use Natural Language Processing to pre-process our dataset.
- Machine Learning techniques are used for document classification and clustering, and the resulting models are evaluated.
- Information extraction is covered with the help of topic modeling.
- Sentiment analysis is covered with both a classifier-based and a dictionary-based approach.
- Almost all modules come with practice assignments.
- Two projects combine most of the topics covered separately in these modules.
- Finally, a list of possible project suggestions is given for students to choose from and build their own project.
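The dictionary-based approach to sentiment analysis mentioned in the list above can be sketched roughly as follows. This is a minimal sketch, not the course's implementation: the `POSITIVE` and `NEGATIVE` word lists, the function names, and the example reviews are all hypothetical, and the scoring deliberately ignores negation ("not clean") and intensity.

```python
# Hypothetical, hand-picked sentiment lexicons for hotel reviews.
POSITIVE = {"clean", "friendly", "great", "comfortable"}
NEGATIVE = {"dirty", "rude", "noisy", "small"}

def sentiment_score(review):
    """Return (#positive words - #negative words) for one review."""
    tokens = [tok.strip(".,!?;:") for tok in review.lower().split()]
    positives = sum(tok in POSITIVE for tok in tokens)
    negatives = sum(tok in NEGATIVE for tok in tokens)
    return positives - negatives

def sentiment_label(review):
    """Map the score to a coarse label; a score of zero is treated as neutral."""
    score = sentiment_score(review)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment_label("The room was clean and the staff was friendly."))  # positive
print(sentiment_label("Noisy room, rude staff!"))                         # negative
print(sentiment_label("We stayed two nights."))                           # neutral
```

A classifier-based approach would instead learn which words signal each class from labeled reviews, at the cost of needing training data.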
I have been a researcher and academic since 2011 and have around 3 years of professional software development experience. As an Assistant Professor in the Computer Science faculty, I have taught various courses to undergraduate and graduate students. I am particularly interested in courses related to software design and development, databases, artificial intelligence, machine learning, and data mining.
My PhD research is related to data science and computational linguistics. I have worked with large-scale textual data to build knowledge-based systems that are adaptive and evolve with growing needs without having to be explicitly trained for a specific scenario. I have published papers in internationally recognized journals and conferences in which we proposed solutions to real-world data analysis issues. I have supervised tens of projects that offered software-based solutions for social content analytics, recommendations, and tracking evolving public interests.
Theoretical Concepts of Text Representation (5:30)
Structuring One Document Corpus (3:00)
Structuring a Multiple Document Corpus (1:14)
Setting Parameters (3:56)
Using TF-IDF Representation (0:40)
Reading Data from a Labeled Dataset (3:25)
Using Textual Dataset from UCI Repository (2:28)
Machine Learning Overview (2:38)
K-Nearest Neighbors Classifier (2:35)
Naive Bayes Classifier (3:18)
Decision Tree Classifier (2:07)
Linear Classifier (2:16)
Concluding Remarks on Classifiers (2:32)
Classifiers Implementation with Default Settings (13:40)
Classifiers with Different Parameter Settings (7:36)
Classification with a UCI Repository Dataset (7:32)
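To give a feel for how the TF-IDF representation and a k-nearest neighbors classifier from the lectures above fit together, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the course code: the function names, the tiny six-sentence training set and its labels, the cosine similarity measure, and the smoothed IDF formula `log((1 + N) / (1 + df)) + 1` are all choices made for this example.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (no further pre-processing)."""
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}

def tfidf(doc, idf):
    """Sparse TF-IDF vector as a dict; terms unseen during fitting are dropped."""
    counts = Counter(tokenize(doc))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def knn_predict(query, train_docs, train_labels, k=3):
    """Majority vote among the k training documents most similar to the query."""
    idf = fit_idf(train_docs)
    query_vec = tfidf(query, idf)
    scored = [
        (cosine(query_vec, tfidf(doc, idf)), label)
        for doc, label in zip(train_docs, train_labels)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

# Hypothetical dummy dataset.
train_docs = [
    "the match ended with a late goal",
    "the team won the league title",
    "the striker scored a goal",
    "the new phone has a faster processor",
    "the laptop ships with more memory",
    "the phone camera uses a new sensor",
]
train_labels = ["sport", "sport", "sport", "tech", "tech", "tech"]

print(knn_predict("a late goal won the match", train_docs, train_labels))      # sport
print(knn_predict("new phone has a faster sensor", train_docs, train_labels))  # tech
```

Changing `k` is an example of the kind of parameter setting that can alter a classifier's predictions; on a real dataset the choice would be checked against held-out data rather than a handful of sentences.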