The Crawl Pages action loops through the pages of a web site. In effect, it crawls the web site one web page at a time. Hence, the first iteration crawls the first page, the second iteration crawls the second page, and so on.
The Crawl Pages action accepts a loaded page as part of the input, such as the start page of the web site. The output contains the next crawled web page.
Each link that the Crawl Pages action traverses is fully loaded, and any JavaScript on the page is executed. You can verify this behavior by creating a robot that crawls a few levels of a website that you know uses JavaScript:
Turn on the BROWSER TRACER tool (TOOLS > Open Browser Tracer > press the RED button to start recording traffic).
As you step through the robot, use the BROWSER TRACER tool to inspect the HTTP traffic and see which JavaScript was executed.
If you want the CRAWL PAGES action to traverse links without executing any JavaScript, you can turn off EXECUTE JAVASCRIPT at the global configuration level for the entire robot (File > Configure Robot > BASIC tab > [CONFIGURE] > JavaScript Execution).
How to Crawl an Entire Site
In this example, we wish to crawl an entire site.
- Add a step with the Load Page action that loads the main page.
- Add a new step and choose the Crawl Pages action.
- On the Rules tab, add a Crawling Rule that applies to all pages in the site, e.g. by specifying the domain that the pages belong to or by making a pattern that the URL should match. For these pages, the rule should specify "Crawl Entire Page" and "Output the Page".
- On the Rules tab, set the "For all Other Pages" property to "Do Not Crawl".
- After the step with the Crawl Pages action, add steps to handle each page, e.g. by extracting information into returned variables.
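Conceptually, the configuration above behaves like a breadth-first crawl restricted to one domain, outputting each page as it is visited. The following Python sketch illustrates that behavior under stated assumptions: it is not the product's implementation, and it uses a small in-memory site (a dict mapping URLs to HTML) in place of real page loading, so the `fetch` callable is an illustrative stand-in.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkParser(HTMLParser):
    """Collect the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl_site(start_url, fetch, domain):
    """Yield (url, html) for each page, one page per iteration.

    Mimics a Crawling Rule of 'Crawl Entire Page' + 'Output the Page'
    for pages on `domain`, and 'Do Not Crawl' for all other pages.
    """
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)        # stand-in for loading the page
        yield url, html          # "Output the Page"
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            # Rule: only follow links whose URL is on the same domain
            if urlparse(absolute).netloc == domain and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)

# Tiny in-memory "site" so the sketch is self-contained
SITE = {
    "http://example.com/":  '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/b">B</a> <a href="http://other.com/">x</a>',
    "http://example.com/b": "no links here",
}
pages = list(crawl_site("http://example.com/", SITE.__getitem__, "example.com"))
```

Note how the off-domain link to `other.com` is discovered but never followed, which corresponds to setting "For all Other Pages" to "Do Not Crawl".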
How to Crawl a Popup Menu
In this example, we wish to discover all the pages that a popup menu links directly to. We do not wish to continue crawling from these pages.
- Add a step with the Load Page action that loads the main page.
- Add a new step and choose the Crawl Pages action.
- Select the menu bar as the named tag.
- Verify that the "Automatically Handle Popup Menus" option on the Crawling tab is checked.
- On the Rules tab, add a Crawling Rule saying that for "All URLs" we "Do Not Crawl", but "Output the Page".
- After the step with the Crawl Pages action, add steps to handle each page, e.g. by extracting information into returned variables.
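The rule "Do Not Crawl" combined with "Output the Page" amounts to a depth-one crawl: every page the menu links to is loaded and output, but none of its own links are followed. The Python sketch below illustrates that idea under stated assumptions; the `fetch` callable and the in-memory site are illustrative stand-ins, not part of the product.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkParser(HTMLParser):
    """Collect the href of every <a> tag in a fragment of HTML."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl_menu(menu_html, base_url, fetch):
    """Load every page the menu links to directly, but do not crawl further.

    Mimics a Crawling Rule of 'Do Not Crawl' + 'Output the Page' for All URLs.
    """
    parser = LinkParser()
    parser.feed(menu_html)
    pages = []
    for href in parser.links:
        url = urljoin(base_url, href)
        pages.append((url, fetch(url)))  # output the page; depth one only
    return pages

# Tiny in-memory "site": note /deep is linked from /products but never fetched
SITE = {
    "http://example.com/products": '<a href="/deep">deep</a>',
    "http://example.com/about":    "about us",
}
menu = '<a href="/products">Products</a> <a href="/about">About</a>'
pages = crawl_menu(menu, "http://example.com/", SITE.__getitem__)
```

Because the crawl stops after the menu's direct targets, the link to `/deep` inside the products page is never loaded.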