Spidering a website
- Enter the URL of the website or web page you want to check in the Address bar. The URL should be in the form http://www.trellian.com.
To spider a website stored on your hard drive, select Open from the File menu, browse to the HTML file and click OK. You can also enter a file name directly in the Address bar, e.g. C:\filetocheck.htm.
- Select a spider type from the spider drop-down menu.
Site Spider
The Site Spider scans a complete website and indexes all the URLs it contains. This spider does not follow links beyond the root domain. You can select a URL from the spider results and restart the spider from that point.
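The idea of a spider that stays within the root domain can be illustrated with a small breadth-first crawl. This is a minimal sketch, not SiteSpider's own code: `site_spider` and `get_links` are hypothetical names, `get_links` stands in for downloading a page and extracting its links, and "root domain" is simplified to "same host name as the start URL".

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def site_spider(start_url, get_links):
    """Breadth-first crawl that indexes every URL on the start URL's host.

    get_links(url) is a hypothetical stand-in for fetching a page and
    returning the raw href values found on it.
    """
    root_host = urlparse(start_url).netloc.lower()
    seen = {start_url}
    queue = deque([start_url])
    index = []
    while queue:
        url = queue.popleft()
        index.append(url)
        for href in get_links(url):
            # Resolve relative links and drop "#fragment" parts.
            target, _ = urldefrag(urljoin(url, href))
            # Stay on the root domain: links to other hosts are not followed.
            if urlparse(target).netloc.lower() != root_host:
                continue
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return index
```

Passing an in-memory dictionary as `get_links` (for example `lambda u: pages.get(u, [])`) lets you try the crawl order without touching the network.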
Gallery Page Spider
The gallery page spider is designed to extract content from compiled gallery pages. These pages, commonly called "Link Lists", usually contain no content of their own but include many links to pages that do.
The first page of every link that leads outside the root domain is included in the spider results. This kind of spider often yields the most results per spider run.
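As a rough illustration of the gallery page spider's scope, the sketch below keeps only the target of each link on the gallery page that leaves its host, and does not follow any further links from those targets. It is a hypothetical model under stated assumptions, not the product's implementation: `gallery_page_spider` and `get_links` are illustrative names, and "root domain" is again simplified to the page's host name.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def gallery_page_spider(gallery_url, get_links):
    """Return the first page of every link on gallery_url that leaves its host.

    Only the target page of each outbound link is recorded; links found on
    those pages are not followed. get_links(url) is a hypothetical stand-in
    for downloading a page and extracting its href values.
    """
    root_host = urlparse(gallery_url).netloc.lower()
    results = []
    seen = set()
    for href in get_links(gallery_url):
        target, _ = urldefrag(urljoin(gallery_url, href))
        if urlparse(target).netloc.lower() == root_host:
            continue  # internal link: not part of the outbound content
        if target not in seen:
            seen.add(target)
            results.append(target)
    return results
```

Because each outbound link contributes one page and nothing is crawled further, a single link list can produce many results quickly, which matches the observation that this spider often gives the biggest return per run.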
Engine Page Spider
The engine page spider is very similar to the gallery page spider: the root page is the only page included from the root domain. This spider is designed to extract content from search engine result pages.
You can then select a domain whose content matches your requirements and continue spidering it with the default spider.
Keyword Spider
The keyword spider is an automated engine page spider. Trellian SiteSpider automatically retrieves the search engine results before beginning a standard engine page spider. This spider is useful for a quick spider run, but its results are sometimes of lower quality than those of a manually started engine page spider. To use it, enter your keywords and select a search engine from the list.
- The selected spider starts spidering the website. SiteSpider's status is shown in the bottom-left corner of the window.
- To stop the spider, click the Stop button.
To resume the spider, click the Spider button, then click Yes when prompted to resume. Files that have not yet been spidered are listed in the Pending Files tab.
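Stopping and resuming works naturally when a spider keeps its unvisited URLs in a pending queue. The sketch below is a hypothetical model of that behaviour, not SiteSpider's implementation: `ResumableSpider`, `get_links` and the `max_pages` limit (used here to simulate pressing Stop) are all illustrative assumptions, and the `pending` queue plays the role of the Pending Files list.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


class ResumableSpider:
    """Same-host crawl that can be stopped and resumed from its pending queue.

    Hypothetical model only; get_links(url) stands in for fetching a page
    and returning its href values.
    """

    def __init__(self, start_url, get_links):
        self.root_host = urlparse(start_url).netloc.lower()
        self.get_links = get_links
        self.pending = deque([start_url])  # queued but not yet spidered
        self.seen = {start_url}
        self.done = []                     # already spidered, in order

    def run(self, max_pages=None):
        """Spider until the queue is empty or max_pages pages are processed.

        Stopping early leaves the remaining URLs in self.pending, so a
        later call to run() resumes where the previous one stopped.
        Returns the number of pages processed in this call.
        """
        processed = 0
        while self.pending and (max_pages is None or processed < max_pages):
            url = self.pending.popleft()
            self.done.append(url)
            processed += 1
            for href in self.get_links(url):
                target, _ = urldefrag(urljoin(url, href))
                same_host = urlparse(target).netloc.lower() == self.root_host
                if same_host and target not in self.seen:
                    self.seen.add(target)
                    self.pending.append(target)
        return processed
```

Calling `run(max_pages=2)` and then `run()` on the same object processes the remaining pending URLs without revisiting pages already spidered.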
