Crawling through PDF

by Thomaszai Thu Feb 01, 2018 12:38 pm

Dear All,

I am new to Kapow, and I have been trying to create a robot that extracts the SKU number as well as the quantity from a PDF source.

However, my For Each Tag loop keeps iterating into irrelevant paragraphs, even though I have defined the tag pattern.

The required fields are highlighted in red boxes in my screenshot.

Any advice, suggestions or solutions?

TIA

Re: Crawling through PDF

by jking Fri Feb 02, 2018 1:28 am

Welcome to the forum and to Kapow.

Extracting from a PDF is always challenging because its tags are not as reliable as those in a web page.

Looking at your screenshot, two strategies come to mind. Both involve tag finders.

It is difficult to read the Tag Pattern you are using (perhaps you could copy the tag finder and paste it into your post), but I suspect the tag finder needs to be refined further.
It appears that the SKU number is contained in a text string that begins with a series of digits, followed by a space, then the SKU itself (a mix of digit and non-digit characters), another space, then PRWP, a "-" and some additional text.

Using Replace Pattern with the pattern \d{4}\s(.*)\sPRWP\-.* , the input text string 0002 9F1C145Z PRWP-DEGATINGPUNCH 2916035065 produces the output 9F1C145Z.
The pattern reads as: skip 4 digits followed by a space, keep all the characters that follow up to a space followed by PRWP and a "-", and ignore everything after that.
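Replace Pattern is configured in the Kapow GUI, so the following is only a rough Python analogue for testing the pattern outside the robot. It assumes the step behaves like a full match of the input with the first capture group as output; the names SKU_PATTERN and extract_sku are hypothetical.

```python
import re
from typing import Optional

# Pattern from the post: 4 digits, a space, the SKU (captured), a space, "PRWP-", then anything.
SKU_PATTERN = re.compile(r"\d{4}\s(.*)\sPRWP\-.*")

def extract_sku(line: str) -> Optional[str]:
    """Return the captured SKU, or None if the line does not have the expected shape."""
    match = SKU_PATTERN.fullmatch(line)  # the whole line must fit the pattern
    return match.group(1) if match else None

print(extract_sku("0002 9F1C145Z PRWP-DEGATINGPUNCH 2916035065"))  # 9F1C145Z
print(extract_sku("Purchase Order"))  # None
```

If an SKU can never contain spaces, swapping (.*) for (\S+) would make the capture stricter; that is a suggestion, not something tested against the actual PDF.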

Once you have the pattern locked down, you could try using it as the finder in your loop step.

An alternative strategy is to loop through the PDF, extract the text into a variable and apply the pattern. If the extraction succeeds, you have found an SKU; if the step fails, that text string does not contain one. You will need error handling that sends the robot to the Next Iteration of your loop step.
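The loop-and-skip logic above can be sketched in Python as follows. This is an illustration, not Kapow code: the list of strings is a hypothetical stand-in for the text the loop extracts from each tag, and "continue" plays the role of the error handling that jumps to the Next Iteration.

```python
import re
from typing import Iterable, List

SKU_PATTERN = re.compile(r"\d{4}\s(.*)\sPRWP\-.*")

def collect_skus(texts: Iterable[str]) -> List[str]:
    """Apply the SKU pattern to each extracted text and keep only the successes."""
    skus = []
    for text in texts:  # one iteration per tag visited by the loop
        match = SKU_PATTERN.fullmatch(text.strip())
        if match is None:
            continue  # extraction "failed": skip to the next iteration
        skus.append(match.group(1))
    return skus

# Hypothetical texts standing in for what the loop might pull out of the PDF.
sample = [
    "Purchase Order",
    "0002 9F1C145Z PRWP-DEGATINGPUNCH 2916035065",
    "Please deliver to the main gate",
]
print(collect_skus(sample))  # ['9F1C145Z']
```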

Note that, ideally, the quantity (1 Number) is contained in the same text string as the SKU. If it is, you can use a different pattern to extract the quantity.
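The real position of the quantity is not visible in the thread, so the following sketch assumes, purely for illustration, a hypothetical line layout in which the quantity is one more number after the 2916035065 column, for example "0002 9F1C145Z PRWP-DEGATINGPUNCH 2916035065 5". The pattern and the function name extract_sku_and_qty are assumptions to adapt once the actual layout is known.

```python
import re
from typing import Optional, Tuple

# Hypothetical layout: <4 digits> <SKU> PRWP-<description without spaces> <number> <quantity>
SKU_QTY_PATTERN = re.compile(r"\d{4}\s(\S+)\sPRWP-\S*\s\d+\s(\d+)")

def extract_sku_and_qty(line: str) -> Optional[Tuple[str, int]]:
    """Return (SKU, quantity) if the line fits the assumed layout, else None."""
    match = SKU_QTY_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2))

print(extract_sku_and_qty("0002 9F1C145Z PRWP-DEGATINGPUNCH 2916035065 5"))  # ('9F1C145Z', 5)
print(extract_sku_and_qty("0002 9F1C145Z PRWP-DEGATINGPUNCH 2916035065"))    # None
```

Requiring both trailing numbers means a line without a quantity is rejected rather than having the 2916035065 value mistaken for the quantity.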

Re: Crawling through PDF

by Thomaszai Mon Feb 12, 2018 12:50 pm

Thank you so much for your fast response, jking. I have adopted the alternative strategy that you mentioned.
