KAPOW



Crawling through PDF


Thomaszai

Posts : 4
Points : 146
Join date : 2018-02-01


Post by Thomaszai on Thu Feb 01, 2018 12:38 pm

Dear All,

I am new to Kapow, and I have been trying to create a robot that extracts the SKU number and the quantity from a PDF source.

However, my For Each Tag loop keeps looping through irrelevant paragraphs, even though I have defined the tag pattern.

The required fields are highlighted in the red boxes.

Any advice, suggestions or solutions?


TIA

jking

Posts : 45
Points : 1627
Join date : 2014-03-01
Location : USA

Re: Crawling through PDF

Post by jking on Fri Feb 02, 2018 1:28 am

Welcome to the forum and to Kapow.

Extracting from a .pdf is always challenging in that the tags are not as reliable as in a web page.

In looking at your screen shot two strategies come to mind.  Both involve tag finders.

It is difficult to read the tag pattern you are using (perhaps you could copy the tag finder and paste it into your post), but I suspect that the tag finder needs to be further refined.
It appears that the SKU number is contained in a text string that begins with a series of digits, followed by a space, followed by a mix of digit and non-digit characters, followed by a space, followed by PRWP, then a "-", and then additional text.

Using Replace Pattern, the input text string 0002 9F1C145Z PRWP-DEGATINGPUNCH 2916035065 produces the output 9F1C145Z with this pattern:
\d{4}\s(.*)\sPRWP\-.*
It reads as: ignore 4 digits followed by a space, keep all the characters that follow up until a space followed by PRWP and a "-", and ignore everything after that.
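For anyone who wants to try the pattern outside Kapow, here is a minimal Python sketch that mimics the Replace Pattern step with the standard `re` module, assuming the output expression keeps only the first capture group. The function name `replace_with_sku` is made up for illustration; this is not Kapow's API.

```python
import re

# Pattern from the post: 4 digits, a space, the SKU (capture group 1),
# then a space, "PRWP-" and anything after it.
SKU_PATTERN = r"\d{4}\s(.*)\sPRWP\-.*"

def replace_with_sku(text: str) -> str:
    """Hypothetical helper: replace the matched text with capture group 1 only."""
    return re.sub(SKU_PATTERN, r"\1", text)

sample = "0002 9F1C145Z PRWP-DEGATINGPUNCH 2916035065"
print(replace_with_sku(sample))  # 9F1C145Z
```

One design note: `(.*)` is greedy, so if a line ever contained " PRWP-" twice, the captured SKU would stretch to the last occurrence; `(\S+)` would restrict it to a single token. Also, `re.sub` returns non-matching text unchanged, so on its own it cannot tell you whether the line held an SKU.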

Once you have the pattern locked down, you could try using that pattern as the finder in your loop step.

An alternate strategy is to loop through the .pdf, extract the text to a variable and apply the pattern. If the extraction executes successfully, you have found an SKU. If the step fails to execute, the text string does not contain an SKU, so you will need to add error handling that goes to the Next Iteration of your loop step.
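The alternate strategy can also be sketched in Python, assuming the looped PDF text has already been collected into a list of strings. Here a failed `fullmatch` plays the role of the failing extract step, and `continue` stands in for the Next Iteration error handling. The function `extract_skus` and every sample string except the first are hypothetical.

```python
import re
from typing import Iterable, List

SKU_RE = re.compile(r"\d{4}\s(.*)\sPRWP\-.*")

def extract_skus(texts: Iterable[str]) -> List[str]:
    """Collect SKUs, skipping any text that does not match the pattern."""
    skus = []
    for text in texts:  # one iteration per looped PDF element
        match = SKU_RE.fullmatch(text.strip())
        if match is None:  # the "extract" failed: go to the next iteration
            continue
        skus.append(match.group(1))
    return skus

# Hypothetical loop input: only the first string comes from the post.
texts = [
    "0002 9F1C145Z PRWP-DEGATINGPUNCH 2916035065",
    "Delivery address",
    "Page 1 of 2",
]
print(extract_skus(texts))  # ['9F1C145Z']
```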

Ideally, the quantity (1 Number) is contained in the same text string as the SKU. If it is, you can use a different pattern to extract the quantity.
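If the quantity does sit in the same string, a second pattern can be applied to the same text. The sketch below assumes, purely hypothetically, that the quantity appears as an integer followed by the word "Number" (for example a trailing "1 Number"); the real PDF layout may differ, so treat `QTY_RE`, `extract_sku_and_qty` and the sample line as placeholders.

```python
import re
from typing import Optional, Tuple

SKU_RE = re.compile(r"\d{4}\s(.*)\sPRWP\-.*")
# HYPOTHETICAL quantity pattern: an integer followed by the word "Number".
# Adjust it to the real layout of your PDF.
QTY_RE = re.compile(r"\b(\d+)\s+Number\b")

def extract_sku_and_qty(text: str) -> Optional[Tuple[str, Optional[int]]]:
    """Return (sku, quantity); quantity is None if not found in the same string.

    Returns None when the text does not contain an SKU at all.
    """
    sku_match = SKU_RE.fullmatch(text.strip())
    if sku_match is None:
        return None
    qty_match = QTY_RE.search(text)
    quantity = int(qty_match.group(1)) if qty_match else None
    return sku_match.group(1), quantity

# Hypothetical line: the trailing "1 Number" is invented for illustration.
print(extract_sku_and_qty("0002 9F1C145Z PRWP-DEGATINGPUNCH 2916035065 1 Number"))
# ('9F1C145Z', 1)
```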

Thomaszai

Posts : 4
Points : 146
Join date : 2018-02-01

Re: Crawling through PDF

Post by Thomaszai on Mon Feb 12, 2018 12:50 pm

Thank you so much for your fast response, jking. I have adopted the alternative strategy that you mentioned.


