Retrieving Channel Data from YouTube in Python

Sept, 2013

This tutorial appeared as part of work published in JASIST

Retrieving The Most Subscribed / Viewed YouTube Channels
A list of the most subscribed channels can be obtained from a channel feed in the YouTube API v2.0. Note that a couple of top channels were found to be missing from the YouTube most subscribed channel feed (bug report #3748). The Python script below retrieves and parses the 100 most subscribed channels from the YouTube API:

# Written for Python v. 2.7.1, Feedparser v. 5.1.3
import feedparser

print '\n--Retrieving Most Subscribed Channels--\n'
ftop100 = open('top100.txt', 'w')
# Request the list in two pages of 50 entries each (start-index 1 and 51).
for start in range(1, 101, 50):
  uri = 'http://gdata.youtube.com/feeds/api/channelstandardfeeds/most_subscribed?start-index=' + str(start) + '&time=all_time&max-results=50&v=2'
  feed = feedparser.parse(uri)
  for post in feed.entries:
    print post.author
    # Write every channel name, one per line, inside the loop.
    ftop100.write(post.author + '\n')
ftop100.close()

Changing the URI from “most_subscribed” to “most_viewed” retrieves the most viewed YouTube channels instead. Similarly, appending a channel type suffix to either feed name retrieves the top channels of that type (e.g., “most_subscribed_Comedians”). Nine channel types are currently permitted: Comedians, Directors, Gurus, Musicians, Non-Profit, Partners, Politicians, Reporters, and Sponsors.
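The URI variations described above can be sketched as a small helper. This is only an illustration and not part of the original script: the name build_channel_feed_uri and its parameters are hypothetical, it assumes the channel type is appended exactly as listed above, and it only assembles the URI string without contacting the API.

```python
# Hypothetical helper (not part of the original script): builds the
# standard channel feed URI for a ranking and an optional channel type.
BASE = 'http://gdata.youtube.com/feeds/api/channelstandardfeeds/'

# Channel types as listed in the text; the exact suffix spelling is assumed.
CHANNEL_TYPES = ['Comedians', 'Directors', 'Gurus', 'Musicians', 'Non-Profit',
                 'Partners', 'Politicians', 'Reporters', 'Sponsors']


def build_channel_feed_uri(ranking, channel_type=None, start=1, max_results=50):
    """Return the feed URI for 'most_subscribed' or 'most_viewed'."""
    if ranking not in ('most_subscribed', 'most_viewed'):
        raise ValueError('unknown ranking: ' + ranking)
    feed = ranking
    if channel_type is not None:
        if channel_type not in CHANNEL_TYPES:
            raise ValueError('unknown channel type: ' + channel_type)
        # e.g. most_subscribed + _ + Comedians
        feed = feed + '_' + channel_type
    return (BASE + feed + '?start-index=' + str(start)
            + '&time=all_time&max-results=' + str(max_results) + '&v=2')
```

For example, build_channel_feed_uri('most_viewed', 'Comedians', start=51) would give the URI for the second page of the most viewed comedian channels, which could then be passed to feedparser.parse as in the script above.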

Retrieving A YouTube Channel’s Videos And Their Views
A Python script using the Google Data API can retrieve information on every video in a YouTube channel. The script below retrieves the view count of each video for every channel in a list of YouTube channels (as written, up to the 500 most viewed videos per channel). It creates one text file with a line per user, listing the view counts of that user's videos in descending order, and a second text file listing each user with their subscriber count. Note that a sleep statement pausing the program between users can help prevent over-accessing Google Data services.

#Written for Python v. 2.7.1, Google Data API 2.0
import gdata.youtube
import gdata.youtube.service
import time

yt_service = gdata.youtube.service.YouTubeService()
yt_service.ssl = True

def GetAndWriteEntryStats(uri, username, fviews):
  # Fetch one page of a user's uploads and record each video's view count.
  feed = yt_service.GetYouTubeVideoFeed(uri)
  for entry in feed.entry:
    WriteEntryStats(entry, username, fviews)

def WriteEntryStats(entry, username, fviews):
  # Write the view count, or 'na' when the entry carries no statistics.
  try:
    fviews.write(entry.statistics.view_count + '\t')
  except (AttributeError, TypeError):
    fviews.write('na' + '\t')

#START MAIN HERE
print '\n--Running Youtube View Count Analyzer--\n'
# The with statement closes the file automatically.
with open('top100.txt') as fusernames:
  usernames = fusernames.read().splitlines()

fviews = open('views.txt', 'w')
fsubscribers = open('subscribers.txt', 'w')

for username in usernames:
  print '- - - - - - ' + username + ' - - - - - -'
  fviews.write(username + '\t')
  # Page through up to 500 uploads, 50 per request, most viewed first.
  for start in range(1, 501, 50):
    uri = 'http://gdata.youtube.com/feeds/api/users/' + username + '/uploads?start-index=' + str(start) + '&max-results=50&orderby=viewCount&racy=include'
    GetAndWriteEntryStats(uri, username, fviews)
  fviews.write('\n')
  # Look up the user's profile entry to read the subscriber count.
  uri = 'http://gdata.youtube.com/feeds/api/users/' + username
  user_entry = yt_service.GetYouTubeUserEntry(uri)
  fsubscribers.write(user_entry.username.text + '\t')
  fsubscribers.write(user_entry.statistics.subscriber_count + '\n')

  # Pause between users to avoid over-accessing Google Data services.
  time.sleep(6)

fviews.close()
fsubscribers.close()
print '\n'
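
The script writes views.txt with one line per user: the username followed by tab-separated view counts, with 'na' wherever a count was unavailable. A minimal sketch for reading such a line back is shown here, assuming exactly that layout. The function names parse_views_line and total_views are hypothetical and not part of the original script.

```python
# Hypothetical helpers (not part of the original script) for reading
# back one line of views.txt: "<username>\t<count>\t<count>\t...".

def parse_views_line(line):
    """Split a views.txt line into (username, list of counts).

    Counts recorded as 'na' are returned as None.
    """
    fields = line.rstrip('\n').split('\t')
    username = fields[0]
    views = []
    for field in fields[1:]:
        if field == '':
            # Each count is followed by a tab, so the last field is empty.
            continue
        views.append(None if field == 'na' else int(field))
    return username, views


def total_views(views):
    """Sum the known view counts, skipping unavailable (None) entries."""
    return sum(v for v in views if v is not None)
```

Applying parse_views_line to each line of views.txt would yield per-user lists that can be summed with total_views or passed to further analysis.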