Awesome sites for ML datasets
Are you tired of hunting for datasets for your ML projects? Don't worry, I've got you covered!
Here's a list of some awesome sites that I recommend for datasets!
Kaggle is one of the largest ML communities. It offers not only ML datasets but also a variety of interesting contests, plus GPU-backed Kaggle notebooks that you can code in right there. It's an amazing community of like-minded people learning ML and deep learning, with awesome contests to grow your rep in the community.
This is one of the largest ML data archives, containing a huge amount of data from the year 1969 to date. It has so many interesting datasets that you won't ever run out of options. It contains a lot of things, including raw data and names, and these are real-world datasets that might contain missing data, which makes them a good challenge to work with.
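If you want a quick feel for how much data is missing in a real-world dataset, here's a minimal sketch. The sample CSV, the column names, and the list of "missing" markers are all made up for illustration, so adjust them to whatever file you actually download.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded CSV file.
SAMPLE_CSV = """age,income,city
34,52000,Austin
,61000,Denver
45,?,
29,48000,Boston
"""

# Assumed markers for missing values; real datasets may use others.
MISSING_MARKERS = {"", "?", "NA", "N/A", "NaN", "null"}


def count_missing(csv_text):
    """Return a dict mapping each column name to its number of missing cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row[name]
            # A short row yields None for its absent trailing columns.
            if value is None or value.strip() in MISSING_MARKERS:
                counts[name] += 1
    return counts


missing_counts = count_missing(SAMPLE_CSV)
print(missing_counts)
```

Once you know which columns have gaps, you can decide whether to drop those rows or fill the values in before training.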
This is one of the famous ML communities, with several thousand datasets for your everyday projects as well as more interesting ones. Everything is well organized and each dataset is explained really well. It has a large community, which also posts various interesting challenges and resources at the end for you to tackle and learn from.
US Govt datasets
The US Government provides free access to many of its online catalogs and datasets for research and development purposes. This includes over 18k CSV datasets and many other databases.
Created by YouTube, this is the best place to get a video dataset. It consists of over 8 million video IDs and labels, so you get an abundance of video data here. It can help you build your own recommendation engines and tackle various other problems.
A lot of movie-related datasets can be found from the movie information giant IMDb (Internet Movie Database). You get some of the best movie data in both quantity and quality.
This has a ton of data that will help you build your data science and learning skills, and you can also use it to train your own models. It is completely free and public for anyone to use.
Google cloud datasets
This data has been gathered from GCP, or comes from data dumps that are suitable for building ML models and training your own algorithms. It has a lot of interesting things and is sure to help you out.
AWS sourced datasets
These are datasets crowdsourced and generated by Amazon. They have gathered the data from various crowd-based surveys, product and service trends, and various other features of their own software, which is then used in ML.
UK government datasets
The United Kingdom has a lot of data for you to use to train an interesting model. It has been gathered by the UK government across all sorts of areas that yield data, and it can be put to good use for solving various problems. It's free to use and offers a lot of data to learn from.