Wednesday, August 4, 2010

Downloading and converting multiple flv files from an external website

I wanted to download all the flv files shown on a website outside my control (in my case http://www.marielilasagabaster.net) so that I could play them offline at any time.

To do this I wrote a script that scrapes every page of the site (the page URLs follow a simple format), extracts the URL of the flv file from the HTML, downloads the file and converts it to avi.
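The extraction step can be tried in isolation before touching the network. This is only a minimal sketch: the sample HTML line and its /videos/example.flv path are made up for illustration, modelled on the so.addVariable(...) call that the script searches for, and extract_flv_path is a hypothetical helper name.

```python
import re

# Matches the player setup call and captures the flv path.
# The dot is escaped and the capture is non-greedy, so the match stops
# at the first "&autostart=true".
FLV_PATTERN = re.compile(r'so\.addVariable\("file","(.*?)&autostart=true"\);')


def extract_flv_path(html):
    """Return the flv path found in the page HTML, or None if there is none."""
    m = FLV_PATTERN.search(html)
    return m.group(1) if m else None


# Hypothetical page fragment, not copied from the real site.
sample = '<script>so.addVariable("file","/videos/example.flv&autostart=true");</script>'
print(extract_flv_path(sample))
```

Pages without a video simply give None, which is why the script checks the match before downloading anything.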

This is the Python script (written for Python 2); feel free to adapt it to your needs:

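import urllib2
import os
import re

base = "http://www.marielilasagabaster.net"

# Each page sets up its player with: so.addVariable("file","<path>&autostart=true");
# The dot is escaped and the capture is non-greedy so only the flv path is matched.
flv_pattern = re.compile(r'so\.addVariable\("file","(.*?)&autostart=true"\);')

# Page ids run from 1 to 421
for page_id in range(1, 422):
    page = urllib2.urlopen(base + "/melodias.php?id=" + str(page_id))
    html = page.read()
    print "Downloaded HTML Page " + str(page_id)
    m = flv_pattern.search(html)
    if m:
        print "Found flv reference " + m.group(1)
        url = base + m.group(1)
        output = "video_" + str(page_id) + ".avi"
        # wget streams the flv to stdout and ffmpeg reads it from the pipe
        os.system("wget \"" + url + "\" -O - | ffmpeg -i pipe: " + output)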
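If the extracted URL ever contains spaces or shell metacharacters, building the command line as a string gets fragile. A possible variant, sketched here under the assumption that wget and ffmpeg are both on the PATH, connects the two programs with subprocess instead of a shell pipe. The names build_commands and download_and_convert are hypothetical helpers, not part of the script above.

```python
import subprocess


def build_commands(url, output):
    """Return the wget and ffmpeg argument lists for one video."""
    # "-q" silences wget's progress output; "-O -" writes the download to stdout.
    wget_cmd = ["wget", "-q", url, "-O", "-"]
    # "pipe:0" tells ffmpeg to read its input from stdin.
    ffmpeg_cmd = ["ffmpeg", "-i", "pipe:0", output]
    return wget_cmd, ffmpeg_cmd


def download_and_convert(url, output):
    """Stream url through wget into ffmpeg and return ffmpeg's exit code."""
    wget_cmd, ffmpeg_cmd = build_commands(url, output)
    wget = subprocess.Popen(wget_cmd, stdout=subprocess.PIPE)
    ffmpeg = subprocess.Popen(ffmpeg_cmd, stdin=wget.stdout)
    # Close our copy of the pipe so wget notices if ffmpeg exits early.
    wget.stdout.close()
    ffmpeg.wait()
    wget.wait()
    return ffmpeg.returncode
```

In the loop, the os.system line would then become download_and_convert(url, output), and a non-zero return value tells you which videos failed to convert.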