OUR TEAM

Whatever we have done in this project, we did it together. A strong team makes a successful project, after all!

Shibam Roy

Captain/Data Analyst

Hi, I am Shibam Roy, a 15-year-old school student with knowledge of C++, Python, data science, and some basic DSA.

Ankush Roy

Presenter

Hello, I am Ankush Roy, a 15-year-old student with vast knowledge of Mathematics and C++.

Swadhin Maharana

Data Extractor

Hello, I'm Swadhin Maharana, a passionate programmer, quick learner, and aspiring entrepreneur. Highly skilled in problem-solving and ready to create innovative solutions.

Debdutta Burman

Front-End Developer

I am a front-end developer and a quick learner, with a passion for problem-solving, a keen interest in ML, and an enthusiasm for data science.

OUR APPROACH TO THE GIVEN PROBLEM

We took a simple approach to the problem: think it through properly first. We then followed a number of steps and phases to complete the project.

STEP 1 | Proper Planning

We started by planning how to work on the data. This phase also included forming our team and assigning a role to each team member.

STEP 2 | Exploring the data

We explored the data to check its columns, its length, and other basic features of the dataset. In this phase we also checked whether the dataset has null values and, if so, how many. We also looked at which values occur in each column; for example, the medal_type column consists of Bronze, Silver, and Gold.
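The checks described in this step can be sketched with pandas as follows. This is an illustrative sketch rather than the exact code used in the project: the rows are made up, and only the column names medal_type, participant_title, and country_name come from the dataset.

```python
import pandas as pd

# Toy rows standing in for the real dataset (values are made up for illustration).
df = pd.DataFrame(
    {
        "country_name": ["United States of America", "Norway", "Germany", "Norway", "Kenya"],
        "medal_type": ["Gold", "Silver", "Bronze", "Bronze", "Gold"],
        "participant_title": [None, "Norway team", None, None, None],
    }
)

# Basic shape: number of rows (length) and number of columns.
n_rows, n_cols = df.shape

# Number of null values in each column.
null_counts = df.isna().sum()

# Occurrences of each value in the medal_type column.
medal_counts = df["medal_type"].value_counts()

print(f"{n_rows} rows, {n_cols} columns")
print(null_counts)
print(medal_counts)
```

On the real data, the same three calls would be run on the DataFrame loaded from the provided file instead of the toy frame.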

STEP 3 | Data cleaning

We started by handling the null data points in the dataset; most of the null values were in the participant_title column. We filled as many of them as possible from the athletes_url column, since it contained the athletes' names. We filled the remaining values from country_name, and dropped the rows where that was not possible.
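The fill order described in this step (athletes_url first, then country_name, then drop) might look like the sketch below. It assumes that the athlete's name is the last, hyphen-separated segment of the URL; that URL layout, the example.com address, the helper name_from_url, and all row values are hypothetical and only serve the illustration.

```python
import pandas as pd


def name_from_url(url):
    """Turn '.../jane-doe' into 'Jane Doe' (assumed URL layout, hypothetical helper)."""
    if not isinstance(url, str) or not url.strip("/"):
        return None
    slug = url.rstrip("/").rsplit("/", 1)[-1]
    return slug.replace("-", " ").title() or None


# Toy rows (made up): one complete row and three with a missing participant_title.
df = pd.DataFrame(
    {
        "participant_title": ["Relay Team A", None, None, None],
        "athletes_url": [None, "https://example.com/athletes/jane-doe", None, None],
        "country_name": ["Norway", "Germany", "Kenya", None],
    }
)

# 1) Try to recover the missing title from the athlete URL.
from_url = df["athletes_url"].map(name_from_url)

cleaned = df.copy()
cleaned["participant_title"] = (
    cleaned["participant_title"]
    .fillna(from_url)                 # 2) first fallback: name taken from the URL
    .fillna(cleaned["country_name"])  # 3) second fallback: the country name
)

# 4) Drop the rows that still have no title.
cleaned = cleaned.dropna(subset=["participant_title"]).reset_index(drop=True)

print(cleaned["participant_title"].tolist())
```

Chaining the two fillna calls keeps the priority explicit: a value from the URL is never overwritten by the country name.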

STEP 4 | Analyzing the data

In this phase we analyzed the data by various means, such as correlation matrices and other visualizations. Besides visualization techniques, we ran different queries on the data, which helped us find various insights. We found very helpful insights, such as the dominance of the USA and the strong correlation between a country's continent or geographic position and its medals.
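A correlation matrix of the kind mentioned in this step can be computed with pandas. The sketch below is only an illustration: the per-country table, its column names, and every number in it are made up, and Pearson correlation (the pandas default) is an assumption about the method.

```python
import pandas as pd

# Hypothetical per-country summary table (all numbers are made up).
per_country = pd.DataFrame(
    {
        "gold": [10, 6, 3, 1, 0],
        "silver": [8, 7, 2, 2, 1],
        "bronze": [9, 5, 4, 1, 1],
        "population_millions": [330.0, 83.0, 5.4, 50.0, 20.0],
    },
    index=["A", "B", "C", "D", "E"],
)

# Pairwise Pearson correlation between all numeric columns.
corr = per_country.corr()

print(corr.round(2))
```

Each cell is the correlation between two features, so the diagonal is 1 and the matrix is symmetric; a value near 1 between two medal columns would mean that countries with many medals of one type also tend to have many of the other.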

STEP 5 | Visualizations

In this phase, after the analysis, we made visualizations that depict the given dataset clearly. We plotted a few graphs that show some of our insights into the data.

STEP 6 | Training and Testing a Machine Learning Model

After all this work, we used our findings to train a machine learning model, with Random Forest Regressor as the algorithm. Our trained model isn't very accurate, but it can give a basic idea of which factors a country's success in the Olympics depends on. However, due to financial constraints, we couldn't run this model on this website (it's a static website).
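Training and testing a Random Forest Regressor can be sketched with scikit-learn as shown below. This is not the project's actual model: the data is synthetic, and the feature names population_millions and is_europe, the target formula, and the hyperparameters are all assumptions chosen only to make the example runnable.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical features, made-up relationship).
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame(
    {
        "population_millions": rng.uniform(1, 300, n),
        "is_europe": rng.integers(0, 2, n),
    }
)
# Made-up "medal count" target: depends on both features plus noise.
y = 0.05 * X["population_millions"] + 5 * X["is_europe"] + rng.normal(0, 1, n)

# Hold out a quarter of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# R^2 on the held-out rows, and how much each feature contributed.
r2 = model.score(X_test, y_test)
importances = pd.Series(model.feature_importances_, index=X.columns)

print(f"R^2 on test data: {r2:.2f}")
print(importances)
```

The feature importances are what give "a basic idea" of which factors matter: they sum to 1, and a larger value means the forest relied on that feature more when predicting.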

STEP 7 | Finished

After all this hard work, we gained a great deal of experience and learned to work together as a great team. Thanks to GeeksForGeeks, we were able to make this great project!

OUR DISCOVERIES AND FINDINGS

We were able to find various patterns in the data that can help identify how successful a country is likely to be. Here are the insights we found from the given dataset and from some external data.

Our Insights :-

  1. The USA (United States of America) was the most dominant country in the Olympic events. It won about 2616 medals in total: 978 Gold, 872 Silver, and 766 Bronze.

  2. Overall, over the past 120 years or so, European countries had the highest success rate. This may be due to their strong development over that period.

  3. African countries were comparatively more likely to lose than to win. This may be due to multiple historical reasons, such as colonial rule, or to other factors.

  4. We also found a correlation between a country's population and its number of medals. This makes sense: a larger population can produce more athletes, which in turn results in more medals.

  5. Although we didn't have any data on countries' GDPs, we estimate that a country's GDP is also related to its success in the Olympics. We observed that more developed countries, or simply countries with a higher GDP, receive more medals; for example, the USA, Germany, Norway, and the UK are all developed nations.

  6. Although this is quite obvious, we found that countries that already have one type of medal have a high probability of receiving the other types too. For example, if a country has many Gold medals, it is highly likely to also have multiple Silver and Bronze medals.

  7. On average, each player was about 16 years old. The given dataset contains data from 1896-2022 (more than 120 years).

  8. Bronze medals were the most common, and Silver medals were the rarest.
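Per-country medal tallies like the ones quoted in the first insight can be produced with a cross-tabulation. The sketch below assumes pandas and uses a handful of made-up rows; only the column names country_name and medal_type come from the dataset.

```python
import pandas as pd

# Toy medal rows (made up); one row per medal awarded.
medals = pd.DataFrame(
    {
        "country_name": ["USA", "USA", "USA", "Norway", "Norway", "Germany"],
        "medal_type": ["Gold", "Silver", "Gold", "Bronze", "Gold", "Silver"],
    }
)

# Count medals per country and medal type, then add a total and rank by it.
tally = pd.crosstab(medals["country_name"], medals["medal_type"])
tally["Total"] = tally.sum(axis=1)
tally = tally.sort_values("Total", ascending=False)

print(tally)
```

The first row of the sorted table is the most successful country, and taking its first ten rows would give the kind of top 10 ranking shown in a bar graph.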

DATA SOURCES

For this project we used the dataset provided by GeeksForGeeks, and for further exploration we scraped data from the internet. We also used world population data from Kaggle.

Data sources we have used :-

  1. The dataset provided by GeeksForGeeks. Click here to check.

  2. Data scraped from the internet using Google Cloud Platform. The data covers the birth years and ages of the athletes. (This task was mostly done by Swadhin Maharana.) Click here to check.

  3. For further exploration, we took a country population dataset from Kaggle to check how population correlates with a country's success. Click here to check.

  4. To check out our project, you can go here.

Our Visualizations

To understand and represent our data visually, we made multiple graphs. Here are these graphs (and also one correlation matrix):

Bar Graph

Top 10 countries

Shows the top 10 countries by the number of Olympic medals they received.

Pie Chart

Top 10 Sports

Shows the top 10 sports played at the Olympic events.

Correlation Matrix

Feature correlations

Shows the correlations between multiple features of our data.

Acknowledgements

We, the team of passionate data enthusiasts, are humbled and immensely grateful for the incredible opportunity to participate in the Hackathon on Data Analysis organized by GeeksForGeeks. This exhilarating event has been an unforgettable journey that allowed us to explore the world of data analysis and showcase our skills as a team.
First and foremost, we extend our heartfelt gratitude to GeeksForGeeks for organizing such a fantastic hackathon. The platform provided us with an exceptional arena to apply our data analysis expertise, learn from industry experts, and challenge ourselves to new heights.
We cannot thank the organizers, mentors, and volunteers enough for their hard work and dedication in making this hackathon a resounding success. Their support and guidance throughout the event have been invaluable, pushing us to go above and beyond in our pursuit of data-driven solutions.
A special thank you goes to the dataset providers for giving us access to the rich and vast Olympics data. This dataset fueled our curiosity and allowed us to dive deep into our analysis, uncovering meaningful insights and trends.
Our heartfelt appreciation also goes to each member of our team. It has been an incredible journey collaborating with such talented and like-minded individuals. Together, we navigated through complex data challenges, brainstormed innovative ideas, and leveraged each other's strengths to achieve our goals.
Participating in this hackathon has been a transformative experience for our team, fostering camaraderie, knowledge-sharing, and growth. We are truly grateful for the memories created and the skills honed during this remarkable event.
Once again, our sincere thanks to GeeksForGeeks for organizing this extraordinary hackathon, and to everyone involved for creating an unforgettable experience in the realm of data analysis and exploration.
With heartfelt appreciation,
Shibam Roy, Ankush Roy, Swadhin Maharana, Debdutta Burman
GeeksForGeeks Hackathon Participants

Contact Us

You can contact us via the following details.

Shibam Roy

Email- royshibam9826@gmail.com
Phone no.- 8787777952

Ankush Roy

Email- ankush3411111@gmail.com

Swadhin Maharana

Email- noreplycursorhigh@gmail.com

Debdutta Burman

Email- debdutta0401@gmail.com