KAPOW


Welcome to the Kapow forum. Here you can get help, use your skills to help others and enjoy hanging out in the company of other Kapow Robot Developers.


2 posters

    Crawling through PDF

    Thomaszai


    Posts : 4
    Points : 2489
    Join date : 2018-02-01


    Post by Thomaszai Thu Feb 01, 2018 12:38 pm

    Dear All,

    I am new to Kapow, and I have been trying to create a robot to extract the SKU number and the quantity from a PDF source.

    However, my For Each Tag loop keeps iterating over irrelevant paragraphs, even though I have predefined the tag pattern.

    The required fields are highlighted with red boxes in the screenshot.

    Any advice, suggestions or solutions?


    TIA

    [Attached screenshot: Captur10]
    jking


    Posts : 103
    Points : 4053
    Join date : 2014-03-01
    Location : USA


    Post by jking Fri Feb 02, 2018 1:28 am

    Welcome to the forum and to Kapow.

    Extracting from a .pdf is always challenging because the tags are not as reliable as they are in a web page.

    Looking at your screenshot, two strategies come to mind. Both involve tag finders.

    It is difficult to read the Tag Pattern you are using (perhaps you could copy the tag finder and paste it into your post), but I suspect that the tag finder needs to be further refined.
    It appears that the SKU number sits in a text string with this layout: a series of digits, a space, a run of digit and non-digit characters, a space, PRWP, a "-", and then additional text.

    Using Replace Pattern with the pattern \d{4}\s(.*)\sPRWP\-.* the input text string 0002 9F1C145Z PRWP-DEGATINGPUNCH 2916035065 results in an output of 9F1C145Z.
    The pattern reads as: ignore 4 digit characters followed by a space, preserve all following characters up to a space followed by PRWP and a "-", and then ignore all other characters.
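
    For readers who want to check the pattern outside Design Studio, here is a minimal sketch in Python. It assumes the pattern has to match the whole text string, which is why it uses fullmatch; the function name extract_sku is hypothetical and is not part of Kapow.

```python
import re

# Pattern quoted in the post above, kept exactly as written.
SKU_PATTERN = re.compile(r"\d{4}\s(.*)\sPRWP\-.*")


def extract_sku(text):
    """Return the captured SKU if the whole string matches, else None."""
    match = SKU_PATTERN.fullmatch(text)
    return match.group(1) if match else None


# Sample string taken from the post.
print(extract_sku("0002 9F1C145Z PRWP-DEGATINGPUNCH 2916035065"))  # prints 9F1C145Z
```

    The capture group (.*) is greedy, so if PRWP- appeared more than once in a string, the capture would run up to the last occurrence.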

    Once you have the pattern locked down, you could try to use that pattern as the finder in your loop step.

    An alternate strategy is to loop through the .pdf, extract the text of each tag to a variable, and apply the pattern. If the extract step executes successfully, you have found the SKU. If the step fails to execute, the text string does not contain an SKU. You will need to add error handling that goes to the Next Iteration of your loop step.
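
    The control flow of this second strategy can be sketched in Python as follows. This is only an illustration of the logic, not Kapow robot code: find_skus is a hypothetical name, the list of text blocks stands in for the tags the loop step visits, and only the middle sample string comes from the post (the other two are made up to represent irrelevant paragraphs).

```python
import re

# Pattern quoted in the post above.
SKU_PATTERN = re.compile(r"\d{4}\s(.*)\sPRWP\-.*")


def find_skus(text_blocks):
    """Apply the pattern to every block and keep only the SKUs that match."""
    skus = []
    for block in text_blocks:
        match = SKU_PATTERN.fullmatch(block.strip())
        if match is None:
            # No SKU in this block: the equivalent of "Next Iteration".
            continue
        skus.append(match.group(1))
    return skus


sample_blocks = [
    "Delivery note",                                # hypothetical irrelevant paragraph
    "0002 9F1C145Z PRWP-DEGATINGPUNCH 2916035065",  # string from the post
    "Thank you for your order",                     # hypothetical irrelevant paragraph
]
print(find_skus(sample_blocks))  # prints ['9F1C145Z']
```

    Skipping a block that does not match plays the same role as the error handling described above, so irrelevant paragraphs never reach the extraction output.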

    Note that, in an ideal situation, the quantity (1 Number) is contained in the same text string as the SKU. If it is, you can use a different pattern to extract the quantity.
    Thomaszai



    Post by Thomaszai Mon Feb 12, 2018 12:50 pm

    Thank you so much for your fast response, jking. I have adopted the alternative strategy that you mentioned.
