Hi All, This week we will look into data extraction methods for data science projects. While honing your skills, “Learn by doing” always applies, so it is a great idea to work on personal projects. However, there is a caveat: the first phase of any data science project is data extraction (data collection). Many sites provide public data sets, but if you want to learn in depth, you can also collect data in real time. Below are some of the methods for data collection:
Using the YouTube API:
The YouTube API from Google allows developers to access video statistics and YouTube channel data via two types of calls: REST and XML-RPC (definition from Wikipedia). The current version, the YouTube Data API v3 used below, is a REST API that returns JSON.
The YouTube Data API v3 lets you add YouTube functionality to your own application. For example, it can be used to retrieve search results such as videos, channels, or playlists.
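Because version 3 of the Data API is a REST API, every call the Python client makes is ultimately an HTTPS GET request with the parameters in the query string. The minimal sketch below builds such a request URL with only the standard library. It assumes the v3 base endpoint https://www.googleapis.com/youtube/v3 and the `key` query parameter for the API key; `build_request_url` is a hypothetical helper, and the key is a placeholder.

```python
from urllib.parse import urlencode

# Base URL of the YouTube Data API v3 REST endpoints.
BASE_URL = "https://www.googleapis.com/youtube/v3"


def build_request_url(resource: str, params: dict, api_key: str) -> str:
    """Return the GET URL for a v3 resource such as 'search' or 'channels'.

    build_request_url is a hypothetical helper for illustration only.
    """
    # urlencode percent-encodes values (e.g. ',' becomes '%2C').
    query = urlencode({**params, "key": api_key})
    return f"{BASE_URL}/{resource}?{query}"


url = build_request_url(
    "search",
    {"part": "snippet", "type": "channel", "q": "YouTube"},
    api_key="YOUR_API_KEY",  # placeholder, not a real key
)
print(url)
```

Opening such a URL in a browser (with a real key) returns the same JSON that the Python client exposes as a dictionary.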
To use the API, you need a Google account and a project in the Google Cloud Console (formerly the Google Developers Console), in which you enable the YouTube Data API v3 and create an API key.
Instructions for creating a project and an API key can be found here.
Python Code To Extract Data:
Install the required library:
conda install google-api-python-client
Import packages:
from googleapiclient.discovery import build
Setting YouTube parameters:
Set the API parameters, such as your API key:
youtube_apikey = "YOUR_API_KEY"  # API key from your Google Cloud project
youtube = build('youtube', 'v3', developerKey=youtube_apikey)
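Hard-coding the key in a notebook makes it easy to leak when the project is shared. A minimal sketch, assuming you store the key in an environment variable; the name `YOUTUBE_API_KEY` and the helper `load_api_key` are hypothetical choices for this example, not something the API requires.

```python
import os


def load_api_key(env_var: str = "YOUTUBE_API_KEY") -> str:
    """Read the API key from an environment variable instead of hard-coding it.

    YOUTUBE_API_KEY is a hypothetical variable name chosen for this example.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key


# With google-api-python-client installed, the key would then be passed as:
# youtube = build('youtube', 'v3', developerKey=load_api_key())
```

Note that the keyword argument to `build` is `developerKey` (camelCase).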
The YouTube Data API provides many resources for retrieving data about particular channels, videos, or playlists.
The resources used in the following steps are:
Search: used to run a search query (here, with the channel name as the query term) and retrieve the matching channels.
Channel: used to retrieve information about YouTube channels, such as the total number of views, subscribers, and uploaded videos.
Properties:
The snippet part of a search result is used because it contains the basic information about each channel, such as its title, description, and channel ID.
snippet = youtube.search().list(part = "snippet", type = "channel", q = "YouTube").execute()
snippet['items'] #all basic information related to the channels will be displayed
snippet['items'][0]['snippet']['channelId'] #retrieves the first channelId
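A single `search().list(...)` call returns only one page of results (5 items by default; `maxResults` can be raised to 50), and further pages are reached through the `nextPageToken` field. The sketch below walks all pages and collects each channel's ID and title. It assumes a `fetch_page(token)` callable that returns one page as a dictionary; both helpers are hypothetical, and the two pages are hand-written stand-ins, not real API output. The commented-out `fetch_page` assumes the client drops a `pageToken` of `None` on the first call.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple


def iter_search_items(fetch_page: Callable[[Optional[str]], Dict]) -> Iterator[Dict]:
    """Yield every item across all pages of a search.list response.

    fetch_page(token) returns one page shaped like the API's JSON;
    token is None for the first page and nextPageToken afterwards.
    """
    token = None
    while True:
        page = fetch_page(token)
        yield from page.get("items", [])
        token = page.get("nextPageToken")
        if not token:  # the last page carries no nextPageToken
            break


def channel_ids_and_titles(items) -> List[Tuple[str, str]]:
    """Pick the channel ID and title out of each search result's snippet."""
    return [(it["snippet"]["channelId"], it["snippet"]["title"]) for it in items]


# Hand-written stand-in for two pages of API output (not real data).
fake_pages = {
    None: {"items": [{"snippet": {"channelId": "UC_A", "title": "Channel A"}}],
           "nextPageToken": "PAGE_2"},
    "PAGE_2": {"items": [{"snippet": {"channelId": "UC_B", "title": "Channel B"}}]},
}
channels = channel_ids_and_titles(iter_search_items(fake_pages.get))
print(channels)  # [('UC_A', 'Channel A'), ('UC_B', 'Channel B')]

# With the real client built earlier, fetch_page could be:
# def fetch_page(token):
#     return youtube.search().list(part="snippet", type="channel", q="YouTube",
#                                  maxResults=50, pageToken=token).execute()
```

Passing the page-fetching function in as a parameter keeps the paging logic testable without network access or an API key.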
The statistics part returns the statistical information for the channel ID you provide (videos().list accepts the same part for video IDs).
channel_id = snippet['items'][0]['snippet']['channelId']  # first channel from the search above
statistics = youtube.channels().list(part="statistics", id=channel_id).execute()
statistics['items'][0]['statistics']  # viewCount, subscriberCount, videoCount, etc.
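The `id` parameter of `channels().list` accepts a comma-separated list of channel IDs, and in the JSON response the counts arrive as strings (for example `"viewCount": "1200"`). The sketch below batches IDs into such strings and converts the counts to integers for analysis. The batch size of 50 is an assumed per-request limit that you should confirm in the API reference; `id_batches` and `parse_statistics` are hypothetical helpers, and the sample response is hand-written, not real data.

```python
from typing import Dict, Iterable, Iterator, List

# Assumed upper bound on IDs per channels.list call; confirm in the API reference.
MAX_IDS_PER_REQUEST = 50


def id_batches(channel_ids: Iterable[str], size: int = MAX_IDS_PER_REQUEST) -> Iterator[str]:
    """Yield comma-separated ID strings suitable for the id parameter."""
    ids: List[str] = list(channel_ids)
    for start in range(0, len(ids), size):
        yield ",".join(ids[start:start + size])


def parse_statistics(channels_response: Dict) -> Dict[str, Dict[str, int]]:
    """Map each channel ID to its numeric statistics.

    String counts are converted to int; non-numeric fields such as the
    boolean hiddenSubscriberCount are skipped.
    """
    parsed = {}
    for item in channels_response.get("items", []):
        stats = item.get("statistics", {})
        parsed[item["id"]] = {
            name: int(value)
            for name, value in stats.items()
            if isinstance(value, str) and value.isdigit()
        }
    return parsed


# Hand-written stand-in for a channels.list response (not real data).
sample = {"items": [{"id": "UC_A", "statistics": {
    "viewCount": "1200", "subscriberCount": "35",
    "hiddenSubscriberCount": False, "videoCount": "12"}}]}
print(parse_statistics(sample))
print(list(id_batches(["UC_A", "UC_B", "UC_C"], size=2)))  # ['UC_A,UC_B', 'UC_C']
```

Each batch string can be passed as `id=batch` to `youtube.channels().list(part="statistics", id=batch).execute()`, and each response handed to `parse_statistics`.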
These basic code snippets show how to get started with data extraction using the YouTube API.
Example Resources:
Next week, we will look into other methods for data collection. If you like this post, please like, share, and subscribe!!!