Awesome sites for ML datasets

Jan 27, 2021


Tired of hunting for datasets for your ML projects? Don't worry, I've got you covered!

Here's a list of some awesome sites that I recommend for datasets!


Kaggle

Kaggle is one of the largest ML communities. It offers not only ML datasets but also a variety of interesting contests and GPU-backed Kaggle notebooks that you can code in right there. It also brings together like-minded people who are learning ML and deep learning, and its contests are a great way to grow your rep in the community.


This is one of the largest ML data archives, with a huge amount of data from 1969 to date. It has so many interesting datasets that you won't ever run out. Along with the raw data and names, these are real-world datasets that may contain missing values, which makes working with them a good challenge.
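To make the missing-data challenge concrete, here is a minimal sketch that counts empty cells in a CSV and fills one numeric column with its mean. The sample data, column names, and helper functions (`load_rows`, `count_missing`, `fill_with_mean`) are all made up for illustration, and mean imputation is only one of several possible strategies.

```python
import csv
import io
from statistics import mean

# Hypothetical sample standing in for a downloaded CSV file.
# The column names and values are invented for this example.
RAW = """name,age,income
alice,34,52000
bob,,48000
carol,29,
dave,41,61000
"""


def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or value.strip() == ""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def count_missing(rows):
    """Count missing cells per column (columns with none are omitted)."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            if is_missing(value):
                counts[column] = counts.get(column, 0) + 1
    return counts


def fill_with_mean(rows, column):
    """Return new rows where the column is a float and gaps hold the mean."""
    present = [float(r[column]) for r in rows if not is_missing(r[column])]
    average = mean(present)
    filled = []
    for r in rows:
        value = average if is_missing(r[column]) else float(r[column])
        filled.append({**r, column: value})
    return filled


rows = load_rows(RAW)
print(count_missing(rows))
print(fill_with_mean(rows, "age")[1]["age"])
```

Whether to fill, drop, or flag the gaps depends on the dataset, so it is worth counting them first before picking an approach.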

OpenML

This is a well-known ML community with several thousand datasets for your everyday projects as well as more unusual ones. Everything is well organized, and each dataset is clearly explained. It has a large community that also posts interesting challenges and resources for you to tackle and learn from.

US Govt datasets

The US government provides free access to many of its online catalogs and datasets for research and development purposes. It offers over 18k CSV datasets and many other databases.

YouTube datasets

Created by YouTube, this is the best place to get a video dataset. It consists of over 8 million video IDs with labels, so there is no shortage of video data. You can use it to build your own recommendation engines and to tackle various other problems.
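As a rough idea of how video IDs and labels can feed a recommendation engine, here is a minimal sketch that ranks videos by how much their label sets overlap (Jaccard similarity). The video IDs, labels, and the `recommend` helper are hypothetical and do not reflect the real dataset's format; an actual recommender would likely use richer signals.

```python
# Hypothetical video IDs and label sets, loosely modelled on the
# "video ID + labels" shape described above. Not real YouTube data.
VIDEO_LABELS = {
    "vid_a": {"guitar", "music", "tutorial"},
    "vid_b": {"guitar", "music", "concert"},
    "vid_c": {"cooking", "tutorial"},
    "vid_d": {"football", "sports"},
}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(video_id, catalog, top_n=2):
    """Rank the other videos by label overlap with the given video."""
    target = catalog[video_id]
    scored = [
        (other, jaccard(target, labels))
        for other, labels in catalog.items()
        if other != video_id
    ]
    # Sort by score descending, then by ID so ties are deterministic.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    # Drop candidates that share no labels at all.
    return [(other, score) for other, score in scored[:top_n] if score > 0]


print(recommend("vid_a", VIDEO_LABELS))
```

Videos that share no labels with the query are left out, so a video with unique labels simply gets an empty list.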

IMDB Datasets

Plenty of movie-related datasets come from the movie information giant IMDb (Internet Movie Database). The movie data here is strong in both quantity and quality.

Google Datasets

This has a ton of data to help you build your data science skills and train your own models. It is completely free and open for anyone to use.

Google cloud datasets

This data has been gathered from GCP, or comes from data dumps that are suitable for building ML models and training your own algorithms. It has a lot of interesting material and is sure to help you out.

AWS sourced datasets

These datasets are crowdsourced and generated by Amazon. They have been gathered from various crowd-based surveys, product and service trends, and other features of Amazon's own software, and are then put to use in ML.

UK government datasets

The United Kingdom has a lot of data you can use to train an interesting model. It has been gathered by the UK government across the many areas that yield data, and it can be put to good use in solving various problems. It's free to use and offers plenty to learn from.
