beginner: allow one external download source



tester
10-09-2012, 03:12 PM
Hello, I just purchased a license and was wondering how to do the following.
Take this page for example: http://www.stockphotosforfree.com/photos/people-talent-released.html

I want to download all the listing pages (1 to 28), the webpage behind each thumbnail, and the photo itself (by photo I mean the full-resolution photo located behind the "DOWNLOAD" link).

Downloading a webpage isn't a problem; the problem is how to allow only one external source.
The files I want to download are hosted on an external site:
http://hss.338c.edgecastcdn.net/***.jpg (this is the external source)

So is it possible to run a job at unlimited depth at SERVER level and allow only one external source, namely http://hss.338c.edgecastcdn.net/ ?
As a bonus: is it possible to target certain extensions (.gif instead of .jpg) or image sizes (above 30 KB) on the external link, or even to apply a regex to the external link?

The "http://hss.338c.edgecastcdn.net/..." links are reachable if you go to a photo page and click the "DOWNLOAD" image link.

Thank you!

Brent
10-10-2012, 09:53 AM
Hi Tester,

I have taken a look at the stock photos site.

As you mentioned, the full-resolution photos are hosted on an external domain, so a site download job alone cannot scrape them all.

We do plan to add the ability to whitelist, or possibly blacklist, the links a site job follows, based on a regexp of the URL. Unfortunately, this has not been implemented for the upcoming version 8 of DownloadStudio.
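To illustrate what such a URL whitelist could look like in general, here is a minimal Python sketch. It is not a DownloadStudio feature or API; the names `SITE_HOST`, `ALLOWED_EXTERNAL` and `should_follow` are made up for this example, and the pattern assumes the full-resolution files all end in .jpg on the one external host mentioned above.

```python
import re
from urllib.parse import urlparse

# Assumption: the site being crawled (taken from the example page in this thread).
SITE_HOST = "www.stockphotosforfree.com"

# Assumption: the single external source to allow, restricted to .jpg files.
ALLOWED_EXTERNAL = re.compile(
    r"^http://hss\.338c\.edgecastcdn\.net/.+\.jpg$", re.IGNORECASE
)

def should_follow(url):
    """Return True for links on the site itself or links matching the whitelist."""
    host = urlparse(url).hostname or ""
    if host == SITE_HOST:
        return True
    return ALLOWED_EXTERNAL.match(url) is not None
```

A crawler would call `should_follow` on every link it finds and skip the ones that return False. Changing `\.jpg` to `\.gif` in the pattern would target a different extension.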

One feature that will let you mass download photos from the edgecastcdn.net site is the file range downloader in DownloadStudio. You can use it by selecting the action 'Download a range of files' in the add job dialog.

The file range downloader allows you to create many jobs based on patterns in the URLs. For example, if you have a URL like http://hss.338c.edgecastcdn.net/0F338C/spff/hq/40306.jpg, you can download all files from, say, 40306.jpg to 40399.jpg.
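The same range idea can be sketched outside DownloadStudio in a few lines of Python. This only shows how a numeric pattern expands into a list of URLs; the helper name `range_urls` and the `{n}` placeholder are assumptions made for this example, not DownloadStudio syntax.

```python
def range_urls(template, first, last):
    """Expand a URL template containing {n} into one URL per number, inclusive."""
    return [template.format(n=n) for n in range(first, last + 1)]

# The range from the example above: 40306.jpg through 40399.jpg.
urls = range_urls(
    "http://hss.338c.edgecastcdn.net/0F338C/spff/hq/{n}.jpg", 40306, 40399
)
```

Note that not every number in a range necessarily corresponds to an existing file, so a real download loop should expect some requests to fail.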

The edgecastcdn.net server will only allow a download if the request includes a referrer, such as 'http://www.stockphotosforfree.com/free-stock-photos/p-6236-terrain.html'. You can add this under Download Settings... in the add job dialog in DownloadStudio.
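For anyone scripting this instead, the referrer is sent as the HTTP `Referer` header (the header name is spelled with a single "r" in the middle). Below is a minimal sketch using Python's standard `urllib.request`; the function names `build_request` and `download` are made up for this example, and it assumes the server checks nothing beyond the referrer.

```python
import urllib.request

def build_request(file_url, referer):
    """Create a request that carries the photo page as its Referer header."""
    return urllib.request.Request(file_url, headers={"Referer": referer})

def download(file_url, referer, dest):
    """Fetch file_url with the given referrer and save the body to dest."""
    request = build_request(file_url, referer)
    with urllib.request.urlopen(request) as response, open(dest, "wb") as out:
        out.write(response.read())
```

Usage would look like `download(url, "http://www.stockphotosforfree.com/free-stock-photos/p-6236-terrain.html", "40306.jpg")`, with `url` being one of the edgecastcdn.net file links.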


As a bonus: Is it possible to target certain extensions (.gif instead of .jpg) or image sizes (size above 30 KB)

Yes, both of these features are in DownloadStudio. In a Site download job, click the Download Settings... button and go to the File Filter Settings section. There you will find these options (see attached image).

[Attached image 329]
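As a rough picture of what an extension and size filter does, here is a small Python sketch. It is not how DownloadStudio implements its File Filter Settings; the function name `passes_filter` and its parameters are invented for this example, and it assumes 1 KB = 1024 bytes and that the file size is already known (for instance from a Content-Length header).

```python
import os
from urllib.parse import urlparse

def passes_filter(url, size_bytes, extensions=(".gif",), min_bytes=30 * 1024):
    """Keep a file only if its extension is wanted and it is larger than min_bytes."""
    # Take the extension from the URL path so query strings do not interfere.
    ext = os.path.splitext(urlparse(url).path)[1].lower()
    return ext in extensions and size_bytes > min_bytes
```

Passing `extensions=(".jpg",)` would select the full-resolution photos discussed in this thread instead of .gif files.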

tester
10-11-2012, 07:52 PM
Thank you for the feedback!!