KAPOW


Welcome to the Kapow forum. Here you can get help, use your skills to help others and enjoy hanging out in the company of other Kapow Robot Developers.



    CRAWL PAGES

    Shyam Kumar


    Posts : 113
    Points : 4112
    Join date : 2013-07-05
    Location : Kerala, India

    CRAWL PAGES

    Post by Shyam Kumar Tue Sep 27, 2016 12:43 pm

       

     The Crawl Pages action loops through the pages of a web site. In effect, it crawls the web site one web page at a time. Hence, the first iteration crawls the first page, the second iteration crawls the second page, and so on.


    The Crawl Pages action accepts a loaded page as part of the input, such as the start page of the web site. The output contains the next crawled web page.
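    Kapow configures this behavior through the GUI rather than code, but the looping semantics can be sketched in plain Python over a hypothetical in-memory site (the SITE graph and URLs below are invented for illustration):

```python
from collections import deque

# Hypothetical in-memory site: page URL -> links found on that page.
SITE = {
    "/":  ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": [],
    "/c": [],
}

def crawl_pages(start):
    """Yield one page per iteration, breadth-first, visiting each page
    once -- mirroring how Crawl Pages outputs the next crawled page on
    each loop iteration."""
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        yield url                      # one iteration = one crawled page
        for link in SITE.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)

# First iteration crawls the first page, the second the second, and so on.
pages = list(crawl_pages("/"))
```

    The generator form matches the action's contract: the loaded start page goes in, and each iteration hands the next crawled page to the steps that follow.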


    Each link that Crawl Pages traverses is fully loaded and any JavaScript on the page is executed. You can verify this behavior by creating a robot that crawls a few levels of a website that you know uses JavaScript.


    Turn on the BROWSER TRACER tool (TOOLS > Open Browser Tracer > push the RED button to start recording traffic).


    As you step through the robot, you can use the BROWSER TRACER tool to verify the HTTP traffic as well as what Javascript was executed.


    If you wish to have CRAWL PAGES traverse links/pages without executing any JavaScript, you can turn off EXECUTE JAVASCRIPT at the Global Configuration level for the entire robot (File > Configure Robot > BASIC tab > [CONFIGURE] > JavaScript Execution).


    How to Crawl an Entire Site


    In this example, we wish to crawl an entire site.

    1. Add a step with the Load Page action that loads the main page.
    2. Add a new step and choose the Crawl Pages action.
    3. On the Rules tab, add a Crawling Rule that applies to all pages in the site, e.g. by specifying the domain that the pages belong to or by making a pattern that the URL should match. For these pages, the rule should specify "Crawl Entire Page" and "Output the Page".
    4. On the Rules tab, set the "For all Other Pages" property to "Do Not Crawl".
    5. After the step with the Crawl Pages action, add steps to handle each page, e.g. by extracting information into returned variables.
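    The Crawling Rule in steps 3-4 is essentially a predicate on URLs. As a rough illustration (this is not Kapow's API, and the domain name is a placeholder), the same decision could be written as:

```python
from urllib.parse import urlparse

SITE_DOMAIN = "example.com"  # placeholder for the site being crawled

def crawling_rule(url):
    """Mimic the Rules tab: pages on the site's domain get
    "Crawl Entire Page" + "Output the Page"; for all other pages,
    "Do Not Crawl"."""
    host = urlparse(url).netloc
    if host == SITE_DOMAIN or host.endswith("." + SITE_DOMAIN):
        return ("crawl", True)       # crawl it and output it
    return ("do-not-crawl", False)   # skip it entirely
```

    A URL pattern (step 3's alternative) would replace the domain check with a regular-expression match; the shape of the rule stays the same.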


    How to Crawl a Popup Menu


    In this example, we wish to discover all the pages that a popup menu links directly to. We do not wish to continue crawling from these pages.

    1. Add a step with the Load Page action that loads the main page.
    2. Add a new step and choose the Crawl Pages action.
    3. Select the menu bar as the named tag.
    4. Notice that the "Automatically Handle Popup Menus" option on the Crawling tab is checked.
    5. On the Rules tab, add a Crawling Rule saying that for "All URLs" we "Do Not Crawl", but "Output the Page".
    6. After the step with the Crawl Pages action, add steps to handle each page, e.g. by extracting information into returned variables.
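    Conceptually, the rule "Do Not Crawl" + "Output the Page" turns the crawl into a depth-1 traversal: the menu's direct targets are output, but their own links are never followed. A hypothetical Python sketch (the menu and site contents are invented):

```python
# Hypothetical site: page URL -> links found on that page.
SITE = {
    "/products": ["/products/1", "/products/2"],
    "/about":    ["/team"],
    "/contact":  [],
}

# Pages the popup menu links to directly (the named tag's targets).
MENU_LINKS = ["/products", "/about", "/contact"]

def crawl_menu(menu_links):
    """For "All URLs": Output the Page, but Do Not Crawl -- links
    found on the output pages (SITE[url]) are deliberately ignored."""
    outputs = []
    for url in menu_links:
        outputs.append(url)  # Output the Page
        # Do Not Crawl: SITE[url] is never enqueued
    return outputs

pages = crawl_menu(MENU_LINKS)
```

    Note that "/team", although linked from "/about", never appears in the output, which is exactly the "do not continue crawling" behavior the example asks for.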
    avatar
    E^E_2016


    Posts : 10
    Points : 2788
    Join date : 2016-08-29

    CRAWL PAGES Empty Crawl pages

    Post by E^E_2016 Fri Oct 21, 2016 9:18 am

    Hi,

    I'm really new to this and there isn't much helpful resource material for Kapow, so I'm hoping to get some urgent help here.
    I need to crawl pages from a Google search for a given keyword.

    The problem is that the documentation mentions using the Crawl Pages action step, but I couldn't find it in the action list. I am using Kapow Design Studio 9.6.2. See the attached screenshot below.

    Also, how can I crawl the contents of all the URLs from the Google search list as plain HTML? How can I test that the data is extracted properly from every crawled URL when each one has a different structure? In other words, how can I configure the robot to crawl content with a different structure at each URL? Could you show me this with a screenshot?

    That will be very helpful. Thanks.





    [Screenshot: Screen10]
    Shyam Kumar



    Re: CRAWL PAGES

    Post by Shyam Kumar Tue Oct 25, 2016 9:33 am

    Hi,
    You can select the Crawl Pages action step using either of the following methods:

    1. Select an action, choose Loop, then select the "Crawl Pages" step.

    [Screenshot: 1110]

    2. Select an action, choose All, then select the "Crawl Pages" step.

    [Screenshot: 1210]

    Thank you,

    Regards,


    Shyam kumar P
    E^E_2016



    Re: CRAWL PAGES

    Post by E^E_2016 Wed Oct 26, 2016 7:53 am

    Hi Shyam,

    Here's my screenshot feedback. Under Loop and All, there's no "Crawl Pages" function to be seen. I am using Design Studio 9.6.2. Why don't I see it? Are there any alternatives for crawling Google search pages? Please advise on what I can do to fix this. Thanks.

    Screenshot 1 - I select an action, then look under "Loop". I do not see the "Crawl Pages" function.
    [Screenshot: Screen11]

    Screenshot 2 - I select an action, then look under "All". I do not see the "Crawl Pages" function either.

    [Screenshot: Screen14]
    Shyam Kumar



    Re: CRAWL PAGES

    Post by Shyam Kumar Wed Oct 26, 2016 4:23 pm

    Hi,

    Maybe version 9.6.2 does not include that action step.

    If you just need to extract data from a website, you don't need the Crawl Pages action step; you can choose alternative steps to grab the data.
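    For readers following along without the Crawl Pages step, the alternative amounts to looping over the extracted result links yourself and pulling fields that every HTML page shares. A rough stdlib-only Python sketch (the URLs and HTML snippets below are made up, and a real robot would load each page instead of using canned strings):

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Minimal extractor: collects the <title> text from a page's HTML."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
    def handle_data(self, data):
        if self.in_title:
            self.title += data

def extract_record(url, html):
    """Pull (url, title) from a page. Pages with different structures
    still yield a record because we key on tags common to all HTML;
    a missing field is recorded as None rather than failing."""
    parser = TitleParser()
    parser.feed(html)
    return {"url": url, "title": parser.title.strip() or None}

# Loop over result URLs (canned HTML stands in for live page loads).
pages = {
    "https://a.example/post": "<html><head><title>Post A</title></head></html>",
    "https://b.example/item": "<html><body>No title tag here</body></html>",
}
records = [extract_record(url, html) for url, html in pages.items()]
```

    The same pattern extends to the other fields the poster wants (date, body content): extract each one defensively with a fallback, so URLs with different structures still produce a row for the database.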


    Thank you

    Regards,

    Shyam kumar
    E^E_2016



    Re: CRAWL PAGES

    Post by E^E_2016 Wed Oct 26, 2016 5:13 pm

    Hi Shyam,

    So what is the alternative method if I can't use the "Crawl Pages" function? Especially when each URL has a different structure, how can I create robots to meet this requirement?

    My requirement is basically to extract all the URLs that appear in the Google search pages for a keyword, plus the content of each URL in plain HTML format, and store them in a database. (The plain-HTML content would have the title, URL, date, and body content.)

    I hope you can advise me on this. Thank you so much.
