
STUDENTS PERFORMANCE IN EXAMS

INTRO TO PROBLEM:

As a full-time college student, I work hard to succeed in my education, and many students around me do the same. Some have already succeeded, and some are still working toward it. I wonder what the main factors behind students' exam performance are. I am curious about how well other students do on exams and what students with similar performance have in common. I am going to use this data mining project to explore that question, focusing on students' exam performance and the characteristics they share.

The goal of this project is to describe the steps I took to find a dataset, tell a story around that data, and create visualizations. The problem I will try to understand is how students' exam performance relates to their characteristics. I will answer questions such as:

  • Does gender play a role in students' exam performance?

  • What is the performance gap between those who prepared for the exam and those who didn’t?

  • Is the parents' level of education related to students' exam performance?

INTRO TO DATA:

The dataset for this project is a CSV file from Kaggle.com, published by Jakki Seshapanpu three years ago. The data was collected from high school students in the United States of America. The dataset contains the following fields:

  • Gender

  • Race/ethnicity

  • Parental level of education

  • Lunch

  • Test preparation course

  • Math score

  • Reading score

  • Writing score

PRE-PROCESSING DATA

The first step in this phase of the project was identifying the libraries I needed and importing them. Next, I imported the dataset into a Jupyter notebook and displayed it. I pre-processed the data so that it contained only the information I needed and was accurate and complete. In this step, I used the Pandas and NumPy libraries to open and inspect the dataset. Looking at the table, I noticed a “lunch” column that I did not want to include, so I removed it. I also removed the “race/ethnicity” column because its classification is unclear: it only labels students as group A, B, C, D, or E, and I did not want to work with data classified that way.
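Here is a minimal pandas sketch of the loading and column-dropping step. A few parts of it are assumptions rather than details from the post: the file name `StudentsPerformance.csv`, the exact column headers, and the helper name `load_and_clean` (which is hypothetical). The two sample rows are made up so that the snippet runs without the real file.

```python
import io

import pandas as pd

# Show every column when a DataFrame is displayed in a notebook.
pd.set_option("display.max_columns", None)

# Columns removed before the analysis (names assumed to match the Kaggle CSV).
DROP_COLUMNS = ["lunch", "race/ethnicity"]


def load_and_clean(source):
    """Read the students CSV and drop the columns excluded from the analysis.

    `source` may be a path such as "StudentsPerformance.csv" (assumed file
    name) or any file-like object that pd.read_csv accepts.
    """
    df = pd.read_csv(source)
    return df.drop(columns=DROP_COLUMNS)


# Two made-up rows in the assumed column layout, for illustration only.
SAMPLE_CSV = io.StringIO(
    "gender,race/ethnicity,parental level of education,lunch,"
    "test preparation course,math score,reading score,writing score\n"
    "female,group B,bachelor's degree,standard,none,70,80,75\n"
    "male,group C,some college,free/reduced,completed,60,55,50\n"
)

students = load_and_clean(SAMPLE_CSV)
print(students.columns.tolist())
```

With the real file, you would call `load_and_clean("StudentsPerformance.csv")` (or whatever path the download uses).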

After dropping those columns, I checked the columns I would be working with for missing values. There were no null values, so the data was ready for the next step: understanding the data. Before moving on, I noted that I would need to use the average math, reading, and writing scores when creating the visualizations.
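The missing-value check and the per-subject averages can be sketched as follows. This assumes the cleaned DataFrame uses the Kaggle column names; the rows are made up, and `count_missing` is a hypothetical helper.

```python
import pandas as pd

SCORE_COLUMNS = ["math score", "reading score", "writing score"]


def count_missing(df):
    """Return the number of missing (null/NaN) values in each column."""
    return df.isnull().sum()


# Made-up rows standing in for the cleaned dataset.
students = pd.DataFrame({
    "gender": ["female", "male", "female"],
    "parental level of education": ["some college", "high school", "master's degree"],
    "test preparation course": ["none", "completed", "none"],
    "math score": [70, 65, 80],
    "reading score": [75, 60, 85],
    "writing score": [78, 58, 88],
})

missing = count_missing(students)
print(missing)
print("Total missing values:", int(missing.sum()))

# Average of each subject across all students.
averages = students[SCORE_COLUMNS].mean()
print(averages)
```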

Image by Dainis Graveris

DATA VISUALIZATION

Does gender play a role in students' exam performance?

[Figure: Gender vs. Count]

[Figure: code output with the number of students of each gender]

As the graph shows, there are slightly more female students than male students. The code output shows that there are exactly 38 more females than males.
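Below is a short sketch of how the per-gender counts and their difference might be computed with pandas. It assumes a `gender` column with the values `female` and `male`. The five rows are made up, and `gender_gap` is a hypothetical helper.

```python
import pandas as pd


def gender_gap(df):
    """Count students per gender and return (counts, females minus males)."""
    counts = df["gender"].value_counts()
    gap = int(counts.get("female", 0) - counts.get("male", 0))
    return counts, gap


# Made-up rows for illustration only.
students = pd.DataFrame({"gender": ["female", "male", "female", "female", "male"]})

counts, gap = gender_gap(students)
print(counts)
print("female - male:", gap)  # 3 - 2 = 1 for these toy rows
```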

[Figure: average math, reading, and writing scores by gender]

This graph compares the average math, reading, and writing scores of female and male students. The blue bars represent female students and the orange bars represent male students. Female students have higher average scores in reading and writing, whereas male students perform better in math. Because neither gender leads in every subject, we cannot conclude that gender alone determines a student's performance. Let's look at other factors.
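The averages behind a graph like this come from a group-by-and-mean step, sketched below. The column names are assumed to follow the Kaggle CSV, the rows are made up, and `mean_scores_by` is a hypothetical helper. `idxmax` reports which group has the higher average in each subject.

```python
import pandas as pd

SCORE_COLUMNS = ["math score", "reading score", "writing score"]


def mean_scores_by(df, group_column):
    """Average each exam score within each value of `group_column`."""
    return df.groupby(group_column)[SCORE_COLUMNS].mean()


# Made-up rows for illustration only.
students = pd.DataFrame({
    "gender": ["female", "female", "male", "male"],
    "math score": [60, 70, 75, 85],
    "reading score": [80, 90, 70, 74],
    "writing score": [82, 88, 65, 71],
})

by_gender = mean_scores_by(students, "gender")
print(by_gender)

# Group with the highest average in each subject.
leaders = by_gender.idxmax()
print(leaders)
```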


What is the performance gap between those who prepared for the exam and those who didn’t?

[Figure: pie chart, Exam Preparation vs. Count]

[Figure: code output with the number of students who did and did not prepare]

More than half of the pie chart is blue, which represents the students who did not prepare for the exam. Of the 1000 students in this dataset, only 358 prepared for the exam; the remaining 642 did not.
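To check these numbers, the sketch below rebuilds the column from the counts given in the post (358 of 1000 students prepared) and computes each group's count and share. The labels `completed` and `none` are an assumption about how the CSV encodes the test preparation course.

```python
import pandas as pd

# Rebuilt from the counts reported in the post; the labels are assumed.
prep = pd.Series(
    ["completed"] * 358 + ["none"] * 642,
    name="test preparation course",
)

counts = prep.value_counts()                # number of students per group
shares = prep.value_counts(normalize=True)  # fraction of all students
print(counts)
print(shares)
```

The pie chart itself could be drawn from `counts` with any plotting library. The shares are the fractions such a chart displays.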

[Figure: average scores by test preparation]

This graph shows a consistent pattern between those who prepared for the exam and those who didn’t: in all three subjects (reading, writing, and math), the prepared students have higher average scores than those who didn't prepare. We can conclude that preparation is one of the strongest factors in students' exam performance.
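The performance gap can be expressed as the difference between the two groups' average scores in each subject. This is a sketch with made-up rows. It assumes the Kaggle column names and the `completed`/`none` labels, and `preparation_gap` is a hypothetical helper.

```python
import pandas as pd

SCORE_COLUMNS = ["math score", "reading score", "writing score"]


def preparation_gap(df):
    """Average score of prepared students minus unprepared ones, per subject."""
    means = df.groupby("test preparation course")[SCORE_COLUMNS].mean()
    return means.loc["completed"] - means.loc["none"]


# Made-up rows for illustration only.
students = pd.DataFrame({
    "test preparation course": ["completed", "completed", "none", "none"],
    "math score": [72, 78, 64, 70],
    "reading score": [80, 84, 70, 72],
    "writing score": [82, 86, 66, 70],
})

gap = preparation_gap(students)
print(gap)  # a positive value means prepared students scored higher
```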


Is the parents' level of education related to students' exam performance?

[Figure: Parental Level of Education vs. Count]

[Figure: code output with the number of students for each parental level of education]

In this dataset, roughly equal numbers of students have parents whose highest level of education is “some college” or an “associate’s degree”. The numbers of parents with a bachelor’s degree or a master’s degree are much lower.
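When counting parental education levels, it helps to list the categories in order from lowest to highest rather than by frequency. The sketch below assumes the six category labels used in the Kaggle CSV. The rows are made up, and `education_counts` is a hypothetical helper.

```python
import pandas as pd

# Assumed category labels, ordered from lowest to highest level.
EDUCATION_ORDER = [
    "some high school",
    "high school",
    "some college",
    "associate's degree",
    "bachelor's degree",
    "master's degree",
]


def education_counts(df):
    """Count students per parental education level, in EDUCATION_ORDER."""
    return (
        df["parental level of education"]
        .value_counts()
        .reindex(EDUCATION_ORDER, fill_value=0)  # missing levels count as 0
    )


# Made-up rows for illustration only.
students = pd.DataFrame({"parental level of education": [
    "some college", "associate's degree", "some college",
    "high school", "associate's degree", "master's degree",
]})

counts = education_counts(students)
print(counts)
```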

[Figure: average scores by parental level of education]

From this graph, we observe that the higher the parents’ education level, the higher the students' average exam scores. Therefore, we can conclude that parents’ education level plays a role in students’ performance.
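One way to test the "higher education, higher score" pattern is to put the group averages in education order and check whether each subject's average never decreases. This sketch uses made-up rows and assumes the Kaggle labels, and `mean_scores_by_education` is a hypothetical helper. On the real data, the check may or may not hold for every level.

```python
import pandas as pd

EDUCATION_ORDER = [
    "some high school",
    "high school",
    "some college",
    "associate's degree",
    "bachelor's degree",
    "master's degree",
]
SCORE_COLUMNS = ["math score", "reading score", "writing score"]


def mean_scores_by_education(df):
    """Average scores per parental education level, ordered low to high."""
    means = df.groupby("parental level of education")[SCORE_COLUMNS].mean()
    # groupby sorts labels alphabetically, so reorder by education level
    present = [level for level in EDUCATION_ORDER if level in means.index]
    return means.reindex(present)


# Made-up rows for illustration only.
students = pd.DataFrame({
    "parental level of education": [
        "high school", "high school", "some college",
        "bachelor's degree", "master's degree",
    ],
    "math score": [60, 64, 66, 70, 78],
    "reading score": [62, 66, 70, 75, 82],
    "writing score": [58, 62, 68, 74, 84],
})

by_education = mean_scores_by_education(students)
print(by_education)

# True for a subject if its average never decreases as the level rises.
trend = by_education.apply(lambda col: col.is_monotonic_increasing)
print(trend)
```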

Storytelling

I gained good insight into the factors that influence students’ performance on exams. I can now conclude that gender has no consistent relationship with exam scores, while preparation and parents’ education do. To reach this conclusion, I used Seaborn and Matplotlib to create my visualizations.

I used the Seaborn library to create:

  • Parental level of education vs count graph

  • Gender vs count graph

  • Test preparation course vs count graph
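A count graph like these can be drawn with Seaborn's `countplot`. The sketch below also rotates the long x-axis labels, as was done for the parental education graph. The column name and category labels are assumptions, the rows are made up, and `plot_education_counts` is a hypothetical helper. The `Agg` backend is only there so the script can run without a display.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; not needed inside Jupyter

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

EDUCATION_ORDER = [
    "some high school",
    "high school",
    "some college",
    "associate's degree",
    "bachelor's degree",
    "master's degree",
]


def plot_education_counts(df):
    """Bar chart of how many students fall in each parental education level."""
    fig, ax = plt.subplots(figsize=(8, 4))
    sns.countplot(
        data=df,
        x="parental level of education",
        order=EDUCATION_ORDER,
        ax=ax,
    )
    # Rotate the long category labels so they do not overlap.
    ax.tick_params(axis="x", labelrotation=45)
    ax.set_title("Parental Level of Education vs Count")
    fig.tight_layout()
    return fig, ax


# Made-up rows: every level once, plus two extra students.
students = pd.DataFrame(
    {"parental level of education": EDUCATION_ORDER + ["some college", "high school"]}
)

fig, ax = plot_education_counts(students)
```

The gender and test preparation count graphs would follow the same pattern with a different `x` column.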

I used Matplotlib to create the graphs of exam scores by:

  • Parental level of education

  • Gender 

  • Test preparation course
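Below is a sketch of a grouped bar chart of average scores built with plain Matplotlib, in the style of the grouped bar chart example from the Matplotlib gallery. It shows one cluster of bars per subject and one bar per group. The column names are assumptions, the rows are made up, and `plot_scores_by` is a hypothetical helper. Matplotlib's default color cycle draws the first group (alphabetically `female`) in blue and the second in orange.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; not needed inside Jupyter

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

SCORE_COLUMNS = ["math score", "reading score", "writing score"]


def plot_scores_by(df, group_column):
    """Grouped bars: one cluster per subject, one bar per value of group_column."""
    means = df.groupby(group_column)[SCORE_COLUMNS].mean()
    x = np.arange(len(SCORE_COLUMNS))  # one slot per subject
    width = 0.8 / len(means)           # split each slot among the groups
    fig, ax = plt.subplots(figsize=(8, 4))
    for i, (label, row) in enumerate(means.iterrows()):
        # Center the cluster of bars on each subject's tick.
        offset = -0.4 + width / 2 + i * width
        ax.bar(x + offset, row.values, width, label=str(label))
    ax.set_xticks(x)
    ax.set_xticklabels(SCORE_COLUMNS)
    ax.set_ylabel("Average score")
    ax.set_title(f"Scores by {group_column}")
    ax.legend()
    return fig, ax, means


# Made-up rows for illustration only.
students = pd.DataFrame({
    "gender": ["female", "female", "male", "male"],
    "math score": [60, 70, 75, 85],
    "reading score": [80, 90, 70, 74],
    "writing score": [82, 88, 65, 71],
})

fig, ax, means = plot_scores_by(students, "gender")
```

The same function works for `"test preparation course"` or `"parental level of education"`, because it only needs the name of the grouping column.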

Overall, I was able to answer my initial questions about the factors behind students’ exam performance. I now feel confident discussing the factors that affect students’ exam performance because I can back up what I say with my findings.


References

To learn how to display the full dataset in a Jupyter notebook, I referred to: https://songhuiming.github.io/pages/2017/04/02/jupyter-and-pandas-display/#:~:text=To%20show%20the%20full%20data,'%2C%20500

To learn how to rotate the axis labels for the parental level of education vs. count graph, I referred to: https://www.delftstack.com/howto/seaborn/rotate-tick-labels-seaborn/

To learn how to create visualizations with Matplotlib, I referred to: https://matplotlib.org/3.1.1/gallery/lines_bars_and_markers/barchart.html

To draw my pie chart, I referred to: https://github.com/nosaka0/vehicular-accident-analysis/blob/main/viz.ipynb


©2022 by liyugt.
