Making a list of URLs from an ftp site to download using wget

June 25, 2006
Recently, I decided to download the Debian distribution and visited the official Debian website. Debian can be downloaded in a variety of ways: via torrents, via Jigdo and, of course, as CD images (ISOs).

I prefer downloading the ISOs because it is much faster than using torrents (especially when there are not many people seeding them). Also, if you have a 256 Kbps or slower internet connection, your best choice is to download the ISOs.

So I navigated to the Debian download page and encountered a bunch of links pointing to the ISOs. The full Debian distribution takes up as many as 22 CDs.

I wanted to copy all the links into a text file so that they could easily be passed to a downloading program such as wget. In Linux this is easily achieved using a combination of lynx (a console web browser), grep, awk and head. This is how it is done:

$ lynx -dump http://cdimage.debian.org/debian-cd/4.0_r0/i386/iso-cd/ |grep 'iso$'|awk '{ print $2 }'|head -n 21 > my_url_file.txt
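To see why the awk part works, note that lynx -dump ends its output with a numbered 'References' section listing every link on the page, roughly like this (the exact filenames below are only illustrative):

References

   1. http://cdimage.debian.org/debian-cd/4.0_r0/i386/iso-cd/
   2. http://cdimage.debian.org/debian-cd/4.0_r0/i386/iso-cd/debian-40r0-i386-CD-1.iso
   3. http://cdimage.debian.org/debian-cd/4.0_r0/i386/iso-cd/debian-40r0-i386-CD-2.iso
   ...

The first field on each of these lines is the reference number and the second is the URL itself, which is why '{ print $2 }' picks out just the link.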

The above pipeline dumps the rendered text of the page (that is what the -dump option of lynx does), keeps only the lines ending in 'iso', selects just the URL from each of them and saves the result to the file 'my_url_file.txt'. A cursory glance at 'my_url_file.txt' shows that all the URLs of the CD ISOs are there, one URL per line, which is exactly what I needed. Now all I had to do was edit the file as needed and use it in conjunction with a script that downloads each of the 21 or so Debian ISOs, as follows:
FILE: debian_downloader.sh
#!/bin/bash
# Download the Debian ISOs one after the other using wget.
# The -c option resumes a partially downloaded file instead of starting over.

for url in $(cat my_url_file.txt)
do
    wget -c "$url"
done
Now make the script executable and run it to start downloading the files one by one:
$ chmod +x debian_downloader.sh
$ ./debian_downloader.sh
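As an aside, wget can also read a list of URLs straight from a file with its -i option, so if you do not want a wrapper script at all, the same download can be started with a single command against the same my_url_file.txt:

$ wget -c -i my_url_file.txt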
Note: You can also run this script as a cron job.
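For example, a crontab entry along the lines of the one below would start (or resume, thanks to -c) the downloads at 2 AM every night. The path is only a placeholder; point it at wherever you saved the script, and remember that the script looks for my_url_file.txt in the directory it is run from, hence the cd.

$ crontab -e
# m h dom mon dow command
0 2 * * * cd /home/user/downloads && ./debian_downloader.sh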
