
STUDENTS PERFORMANCE IN EXAMS
INTRO TO PROBLEM:
As a full-time college student, I work hard to thrive in my education, and many people around me do the same. Some are successful and some are still working to get there. I wonder what the main factors behind students' exam performance are. I am curious about how well other students do in exams and what students with similar results have in common. I am going to use this data mining project to explore this question, focusing on students' exam performance and the characteristics they share.
The goal of this project is to describe the steps taken to find the dataset, create visualizations from it, and tell a story around the acquired data. The problem I am trying to understand is how students perform in exams and how that relates to their characteristics. I will answer questions such as:
Does gender play a role in students' exam performance?
What is the performance gap between those who prepared for the exam and those who didn't?
Is the parents' level of education related to students' exam performance?
INTRO TO DATA:
The dataset I am working with for this project is a CSV file from Kaggle.com, published by Jakki Seshapanpu 3 years ago. The data was collected from high school students in the United States of America. The dataset contains the following columns:
Gender
Race/ethnicity
Parental level of education
Lunch
Test preparation course
Math score
Reading score
Writing score
PRE-PROCESSING DATA
The first step in this phase was identifying the libraries I needed for this project and importing them. The next step was loading the dataset into a Jupyter notebook and displaying it. I pre-processed the data so that it contained only the information I needed and was accurate and complete. In this step, I used the Pandas and NumPy libraries to load and inspect the dataset. Looking at the table, I noticed a "lunch" column that I did not want to include, so I removed that column. I also removed the "race/ethnicity" column because its classification is unclear: it places students in group A, B, C, D, or E without explaining what those groups mean, and I did not want to work with data classified that way.
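The loading and column-removal steps above can be sketched with pandas as follows. This is a minimal sketch, not the original notebook: the column names are assumed to match the Kaggle file's header, and the few rows are hypothetical sample values read from an in-memory string so the example runs on its own. With the real file you would call `pd.read_csv` on the downloaded CSV instead (the file name `StudentsPerformance.csv` is an assumption).

```python
import io

import pandas as pd

# Hypothetical sample rows using the assumed Kaggle column names.
# With the real data: df = pd.read_csv("StudentsPerformance.csv")  (file name assumed)
SAMPLE_CSV = """gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
female,group B,bachelor's degree,standard,none,72,72,74
female,group C,some college,standard,completed,69,90,88
male,group A,associate's degree,free/reduced,none,47,57,44
male,group C,some college,standard,none,76,78,75
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Look at the raw table before changing anything.
print(df.head())

# Remove the two columns excluded from the analysis.
df = df.drop(columns=["lunch", "race/ethnicity"])
print(df.columns.tolist())
```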
Once I dropped those unwanted columns, I checked the remaining columns for missing values. There weren't any null values, so the data was ready for the next step: understanding the data. Before moving on, I noted that I would need to take the average of the math, reading, and writing scores when creating the visualizations.
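A sketch of the missing-value check and the per-student average score described above, again using hypothetical sample rows and the assumed Kaggle column names. `isnull().sum()` counts missing entries per column, and `mean(axis=1)` averages the three score columns across each row. The new column name `average score` is my own choice for illustration.

```python
import io

import pandas as pd

# Hypothetical rows after the "lunch" and "race/ethnicity" columns were dropped.
SAMPLE_CSV = """gender,parental level of education,test preparation course,math score,reading score,writing score
female,bachelor's degree,none,72,72,74
female,some college,completed,69,90,88
male,associate's degree,none,47,57,44
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Number of missing values in each column; all zeros means nothing to fill in.
missing = df.isnull().sum()
print(missing)

score_cols = ["math score", "reading score", "writing score"]
# Average of the three exam scores for each student (row-wise mean).
df["average score"] = df[score_cols].mean(axis=1)
print(df[["gender", "average score"]])
```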

DATA VISUALIZATION
Does gender play a role in students' exam performance?

Gender Vs Count

As the graph shows, there is a slight difference between the number of female and male students. The exact counts show that there are 38 more female students than male students.
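The female and male counts and their difference can be computed with `value_counts`. This sketch uses a short hypothetical `gender` column in place of the real data, so its numbers are illustrative only. On the full dataset the same two lines would produce the counts behind the graph.

```python
import pandas as pd

# Hypothetical gender values; with the real data this would be df["gender"].
gender = pd.Series(["female", "male", "female", "female", "male"], name="gender")

# Number of students of each gender.
counts = gender.value_counts()
print(counts)

# How many more female students than male students there are.
difference = counts["female"] - counts["male"]
print(f"female - male = {difference}")
```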

This graph compares how the two genders perform on the math, reading, and writing exams. The blue bars represent women and the orange bars represent men. The graph shows that women lead in reading and writing scores, whereas men perform better in math. Therefore, we cannot conclude that gender alone determines a student's performance. Let's look at other factors.
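One way to build a grouped bar chart like this one is to group by gender, take the mean of each score column, and plot the result with pandas' Matplotlib-based `plot`. This is a sketch on hypothetical sample rows, not the original plotting code. Because `groupby` sorts the groups, "female" comes first and gets Matplotlib's first default color (blue) and "male" the second (orange). The `matplotlib.use("Agg")` line only lets the example run without a display and can be left out in a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows using the assumed Kaggle column names.
df = pd.DataFrame({
    "gender": ["female", "female", "male", "male"],
    "math score": [62, 70, 74, 70],
    "reading score": [80, 88, 70, 72],
    "writing score": [82, 86, 66, 70],
})

score_cols = ["math score", "reading score", "writing score"]
# Mean score per subject for each gender: rows are genders, columns are subjects.
by_gender = df.groupby("gender")[score_cols].mean()
print(by_gender)

# Transpose so each subject is a group on the x-axis with one bar per gender.
ax = by_gender.T.plot(kind="bar", rot=0)
ax.set_ylabel("Mean score")
ax.set_title("Mean exam score by gender")
plt.tight_layout()
```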

What is the performance gap between those who prepared for the exam and those who didn't?

Exam Preparation Vs Count

More than half of the pie chart is blue, which represents the students who did not prepare for the exam. Of the 1000 students, only 358 prepared for the exam during this data collection.
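The pie-chart percentages come from the share of each value in the test preparation column. With the figures above, 358 of 1000 students is 35.8% prepared and the remaining 64.2% not prepared. This sketch shows the calculation on a hypothetical five-student column, assuming the Kaggle value labels "completed" and "none".

```python
import pandas as pd

# Hypothetical values of the "test preparation course" column.
prep = pd.Series(
    ["none", "completed", "none", "none", "completed"],
    name="test preparation course",
)

# Raw counts per group.
counts = prep.value_counts()
# Share of each group as a percentage, as shown on the pie chart.
shares = prep.value_counts(normalize=True) * 100

print(counts)
print(shares.round(1))
```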

This graph shows a consistent pattern between those who prepared for the exam and those who didn't. In all three subjects (reading, writing, and math), the prepared students have higher scores than those who did not prepare. We can conclude that preparation is one of the strongest factors in students' exam performance.
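The performance gap can be expressed as the difference between the mean scores of the two groups for each subject. This is a sketch on hypothetical sample rows, assuming the "completed"/"none" labels. A positive value means the prepared group scored higher on average in that subject.

```python
import pandas as pd

# Hypothetical sample rows using the assumed Kaggle column names.
df = pd.DataFrame({
    "test preparation course": ["completed", "none", "completed", "none"],
    "math score": [78, 66, 70, 60],
    "reading score": [82, 70, 76, 68],
    "writing score": [84, 66, 80, 64],
})

score_cols = ["math score", "reading score", "writing score"]
# Mean score per subject for the prepared and unprepared groups.
by_prep = df.groupby("test preparation course")[score_cols].mean()

# Gap per subject: prepared mean minus unprepared mean.
gap = by_prep.loc["completed"] - by_prep.loc["none"]

print(by_prep)
print(gap)
```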

Is the parents' level of education related to students' exam performance?

Parental Level of Education Vs Count

In this dataset, roughly equal numbers of students have parents whose education level is "some college" and "associate's degree". The numbers of parents with a bachelor's degree or a master's degree are comparatively low.
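A count graph is easier to read when the education levels appear in order rather than sorted by frequency. This sketch counts a hypothetical column of parental education values and uses `reindex` to put the levels in an assumed lowest-to-highest order, filling levels that never appear with 0. The level strings are assumed to match the Kaggle file.

```python
import pandas as pd

# Education levels from lowest to highest (this ordering is an assumption).
EDUCATION_ORDER = [
    "some high school",
    "high school",
    "some college",
    "associate's degree",
    "bachelor's degree",
    "master's degree",
]

# Hypothetical values of the "parental level of education" column.
parent_edu = pd.Series([
    "some college", "associate's degree", "some college", "high school",
    "associate's degree", "master's degree", "some high school",
])

# Count each level, then list the levels in education order (missing levels -> 0).
counts = parent_edu.value_counts().reindex(EDUCATION_ORDER, fill_value=0)
print(counts)
```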

From this graph, we observe that the higher the parents' education level, the higher the students' exam scores. Therefore, we can conclude that parents' education level plays a role in students' performance.
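The trend in this graph can be checked by averaging each student's three scores and then taking the mean of those averages for each parental education level, listed in education order. This sketch uses hypothetical sample rows and the same assumed level ordering as the count graph. Levels with no students in the sample show up as NaN.

```python
import pandas as pd

# Education levels from lowest to highest (this ordering is an assumption).
EDUCATION_ORDER = [
    "some high school",
    "high school",
    "some college",
    "associate's degree",
    "bachelor's degree",
    "master's degree",
]

# Hypothetical sample rows using the assumed Kaggle column names.
df = pd.DataFrame({
    "parental level of education": [
        "some high school", "high school", "high school",
        "bachelor's degree", "master's degree", "master's degree",
    ],
    "math score": [60, 62, 66, 72, 78, 74],
    "reading score": [60, 64, 68, 72, 80, 76],
    "writing score": [60, 66, 70, 72, 82, 78],
})

score_cols = ["math score", "reading score", "writing score"]
# Per-student average of the three exams.
df["average score"] = df[score_cols].mean(axis=1)

# Mean average score per education level, in education order (absent levels -> NaN).
by_edu = (
    df.groupby("parental level of education")["average score"]
    .mean()
    .reindex(EDUCATION_ORDER)
)
print(by_edu)
```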
STORYTELLING
I gained useful insight into the factors that influence students' performance on exams. I can now conclude that gender does not give either group a consistent advantage across all three exams, whereas test preparation and parents' education are both associated with higher scores. To reach this conclusion, I used Seaborn and Matplotlib to create my visualizations.
I used the Seaborn library to create:
Parental level of education vs count graph
Gender vs count graph
Test preparation course vs count graph
I used Matplotlib to create the graphs comparing exam scores by:
Parental level of education
Gender
Test preparation course