In May 2020, while researching a piece of history, I came across a YouTube video on the topic. The video was long and contained a lot of reference information that I needed. Instead of replaying the clip over and over to catch that information, I decided to find a way to obtain a transcript of the video.
Whenever I have a problem to solve, I jump straight to Python for the solution. With a little research, I found a Python API for transcribing YouTube videos called the YouTube Transcript/Subtitle API.
This is a Python API that allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles, supports translating subtitles, and does not require a headless browser, unlike Selenium-based solutions.
You can install this API by using the following command:
pip install youtube_transcript_api

This API works by passing the YouTube video ID to the YouTubeTranscriptApi.get_transcript function. The function returns a list of dictionaries, one per caption line, each holding the caption text along with its start time and duration. The API offers many other options as well.
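To make the shape of that return value concrete, here is a minimal sketch that formats caption entries with their start times. It does not call the API; the sample_transcript values are made up for illustration, and the helper names format_timestamp and format_entries are hypothetical. The only assumption carried over from the library is that each entry has 'text', 'start', and 'duration' keys.

```python
# Sample data shaped like the entries returned for a video.
# These values are invented for illustration; a real call needs network access.
sample_transcript = [
    {"text": "[Music]", "start": 0.0, "duration": 4.5},
    {"text": "cyber security is the term used to", "start": 4.5, "duration": 3.2},
    {"text": "characterize", "start": 75.25, "duration": 1.1},
]


def format_timestamp(seconds):
    """Convert a number of seconds into an MM:SS string (hypothetical helper)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def format_entries(entries):
    """Prefix each caption's text with its start time."""
    return [f"[{format_timestamp(e['start'])}] {e['text']}" for e in entries]


for line in format_entries(sample_transcript):
    print(line)
```

Keeping the start time next to each line makes it easier to jump back to the right spot in the video when a caption needs to be checked against the audio.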
The following code will allow you to grab the transcript/subtitles for a given YouTube video. An error message will be displayed if the video does not contain a transcript. The code will accept one of the following formats:
- Full YouTube Video URL (https://www.youtube.com/watch?v=BvA0J_2ZpIQ)
- Short YouTube Video URL (https://youtu.be/BvA0J_2ZpIQ)
- YouTube Video ID (BvA0J_2ZpIQ)
The code will extract the video ID from the YouTube URL if the URL is used instead of a YouTube video ID.
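The script that follows pulls the ID out with simple string splitting. As a side sketch, the same step can be done with the standard library's urllib.parse, which copes with extra query parameters such as a start time. The function name extract_video_id is hypothetical, and the sketch assumes URLs include their scheme (https://); anything it does not recognize is treated as a bare video ID.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(video):
    """Return the video ID from a full URL, a short URL, or a bare ID."""
    parsed = urlparse(video)
    host = parsed.netloc.lower()
    if host.endswith("youtu.be"):
        # Short URL: the ID is the first path segment
        return parsed.path.lstrip("/").split("/")[0]
    if host.endswith("youtube.com"):
        # Full URL: the ID is the value of the "v" query parameter
        values = parse_qs(parsed.query).get("v")
        if values:
            return values[0]
    # Assumption: anything else is already a video ID
    return video


for candidate in (
    "https://www.youtube.com/watch?v=BvA0J_2ZpIQ",
    "https://youtu.be/BvA0J_2ZpIQ",
    "BvA0J_2ZpIQ",
):
    print(extract_video_id(candidate))
```

All three formats listed above resolve to the same ID with this approach.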
#!/usr/bin/python3
from youtube_transcript_api import YouTubeTranscriptApi


def get_transcript(video):
    # Extract the video ID from the YouTube URL
    if "youtu.be" in video:
        # Short URL: the ID follows the domain; drop any query string
        video_id = video.split("/")[3].split("?")[0]
    elif "watch?v=" in video:
        # Full URL: the ID follows "v="; drop any further parameters
        video_id = video.split("v=")[1].split("&")[0]
    else:
        # Video is not a URL but the ID
        video_id = video
    try:
        # Create an empty list to store transcript lines
        transcript_lines = []
        # Iterate over the list of caption entries returned by the API
        for entry in YouTubeTranscriptApi.get_transcript(video_id):
            # Append the text value of each entry to the list
            transcript_lines.append(entry['text'])
        # Join the lines with newlines and return the result
        return '\n'.join(transcript_lines)
    except Exception:
        # The transcript could not be retrieved; return an error message
        return "Transcription is not available for this video."


if __name__ == "__main__":
    # Pass the YouTube video to the 'get_transcript' function
    print(get_transcript('https://youtu.be/iODxExWFx_0'))

Terminal Output:
[Music]
cyber security is the term used to
characterize
and collect all of the activities
policies
procedures and tools used in concert to
protect the information technology
systems and data
that is core to the functioning of the
modern world
the protections of cyber security apply
to physical systems
software systems as well as the people
who use them
there are multiple things that
organizations can do to implement cyber
security protections
these include having agreed policies and
procedures
regular staff awareness training deploy
strong password management tools
and multi-factor authentication
implement identity access management and
privileged access management encrypt
data at rest
and in transit over networks install
endpoint protection software and keep it
up to date
do frequent backups to secure locations
and keep all it systems including
network equipment up to date via
installation of the latest
security and operating systems updates
the insights external threat protection
suite enhances cyber security protection
measures
by applying threat intelligence allowing
the monitoring of multiple sources to
identify
threats to your organization on the
clear deep
and dark web that could indicate
potential cyber attack planning
or data exposure against your brand
people
or infrastructure to deliver actionable
insights that can be taken
to mitigate the attack and risk

The code above will display the video transcript in your terminal. You may wish to modify the code so that it saves the output into a file instead. This snippet of code will allow you to do just that.
#!/usr/bin/python3
from youtube_transcript_api import YouTubeTranscriptApi


def get_transcript(video, file_name):
    # Extract the video ID from the YouTube URL
    if "youtu.be" in video:
        # Short URL: the ID follows the domain; drop any query string
        video_id = video.split("/")[3].split("?")[0]
    elif "watch?v=" in video:
        # Full URL: the ID follows "v="; drop any further parameters
        video_id = video.split("v=")[1].split("&")[0]
    else:
        # Video is not a URL but the ID
        video_id = video
    try:
        # Create an empty list to store transcript lines
        transcript_lines = []
        # Iterate over the list of caption entries returned by the API
        for entry in YouTubeTranscriptApi.get_transcript(video_id):
            # Append the text value of each entry to the list
            transcript_lines.append(entry['text'])
    except Exception:
        # The transcript could not be retrieved; return an error message
        return "Transcription is not available for this video."
    # Create a file and write the transcript to it
    with open(file_name, "w", encoding="utf-8") as file:
        # Join the lines with newlines and write them to the file
        file.write('\n'.join(transcript_lines))
    # Return a success message
    return "The transcript has been created."


if __name__ == "__main__":
    # Pass the YouTube video and file name to the 'get_transcript' function
    print(get_transcript('https://youtu.be/iODxExWFx_0', "transcript.txt"))
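
The saved transcript keeps the short, caption-sized lines seen in the terminal output earlier. If you would rather read it as a paragraph, one option is to join the fragments and re-wrap them with the standard library's textwrap module. This is only a sketch: the function name reflow is hypothetical, and dropping bracketed cues such as [Music] is my own assumption about what you want to keep.

```python
import textwrap


def reflow(lines, width=80):
    """Join short caption fragments and re-wrap them to a fixed width."""
    spoken = []
    for line in lines:
        text = line.strip()
        # Assumption: skip empty fragments and bracketed sound cues like "[Music]"
        if not text or (text.startswith("[") and text.endswith("]")):
            continue
        spoken.append(text)
    return textwrap.fill(" ".join(spoken), width=width)


# A few lines taken from the terminal output shown earlier
sample_lines = [
    "[Music]",
    "cyber security is the term used to",
    "characterize",
    "and collect all of the activities",
]
print(reflow(sample_lines, width=60))
```

The function takes a list of lines, so you can pass it transcript_lines before writing to the file, or the result of splitting a saved transcript on newlines.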






















