KAPOW

Welcome to the Kapow forum. Here you can get help, use your skills to help others and enjoy hanging out in the company of other Kapow Robot Developers.


PDF Looping


Tejaswini H R

Posts : 10
Points : 223
Join date : 2018-04-26

PDF Looping

Post by Tejaswini H R on Wed Jul 18, 2018 7:57 pm

Hi,

My PDF contains tables. When the PDF is loaded into Kapow, the tables are not aligned and I am not able to loop through them.
The table contains Product, Title, Quantity and Price fields.
I have used "Loop for Each Tag" to loop through these fields and fetch the next line item.
However, Kapow is not able to identify the next line item.
Can somebody help me resolve this issue?

Thanks
Tejaswini H R


pavel.vraj

Posts : 78
Points : 822
Join date : 2016-11-04
Location : Prague, Czech Republic

Re: PDF Looping

Post by pavel.vraj on Thu Jul 19, 2018 4:24 pm

Hi,
The problem with PDF is that in most cases it doesn't have the structure we see on screen, because the content is stored in streams that can be mixed throughout the whole file. So this is something you will never be able to handle 100% with Kapow. In most cases you can prepare a robot that works, but then another PDF arrives whose streams are organized differently, and the robot no longer works.
For preparing the robot (your screenshot is not visible), I suggest trying advanced extraction.
Best regards,
Pavel Vraj

leedle

Posts : 18
Points : 132
Join date : 2018-07-31

Re: PDF Looping

Post by leedle on Wed Aug 01, 2018 6:04 am

Your best bet is to extract all of the text and use JavaScript/regex to get what you need.
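A minimal sketch of the regex idea, written in Python rather than inside Kapow. It assumes the extracted text has one line item per line, with the four fields from the original question in order (product code, title, quantity, price). The `LINE_ITEM` pattern, the `parse_line_items` function and the sample codes are hypothetical; the same regular expression could be ported to JavaScript.

```python
import re

# Hypothetical layout: "<product> <title> <quantity> <price>", e.g.
# "HS-1001 Phone X 2 499.00". Adjust the pattern to your real PDF text.
LINE_ITEM = re.compile(
    r"^(?P<product>\S+)\s+(?P<title>.+?)\s+(?P<quantity>\d+)\s+"
    r"(?P<price>\d+(?:\.\d{2})?)$"
)


def parse_line_items(text):
    """Return one dict per line of `text` that matches LINE_ITEM.

    Headings such as "Handsets" and the column header row contain no
    quantity/price numbers, so they simply do not match and are skipped.
    """
    items = []
    for line in text.splitlines():
        match = LINE_ITEM.match(line.strip())
        if match:
            items.append({
                "product": match["product"],
                "title": match["title"],  # lazy match, so digits in titles are kept
                "quantity": int(match["quantity"]),
                "price": float(match["price"]),
            })
    return items
```

Because the title group is lazy and the pattern is anchored at both ends, a title like "Phone 12 Pro" is kept whole: the quantity and price are always taken from the last two numbers on the line.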

Shyam Kumar

Posts : 108
Points : 2144
Join date : 2013-07-05
Location : Kerala, India

Re: PDF Looping

Post by Shyam Kumar on Wed Aug 01, 2018 11:10 am

Hi Tejaswini H R,

PDF extraction is a little bit difficult, but I think we can easily extract the content from the PDF mentioned above.

You can implement the logic in various ways.

What you have done is right, but it needs a few changes. In the Loop for Each Tag step mentioned above, I think you didn't use any tag pattern.
You can use a tag pattern here. I think the heading "Handsets" is common to all the tables, so we can set it in the tag finder, and if the condition is not satisfied, try the next loop iteration. You have set the span tag with First Tag Number 36, which may cause records to be skipped, because in a PDF almost everything is in a span tag.
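The tag-pattern idea can be illustrated outside Kapow with Python's standard `html.parser`: instead of starting at a fixed span index (such as First Tag Number 36), skip spans until the "Handsets" heading is found, then group the following span texts into rows of four. This is only a sketch of the reasoning, not Kapow's tag finder. `SpanCollector`, `rows_after_heading` and the sample markup are hypothetical, and the code assumes every cell is its own non-nested span.

```python
from html.parser import HTMLParser


class SpanCollector(HTMLParser):
    """Collect the text of every <span>, in document order (spans assumed not nested)."""

    def __init__(self):
        super().__init__()
        self.spans = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._inside = True
            self.spans.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.spans[-1] += data  # data may arrive in several chunks


def rows_after_heading(html, heading="Handsets",
                       column_headers=("Product", "Title", "Quantity", "Price")):
    """Group span texts into rows, starting only after the heading span is found."""
    parser = SpanCollector()
    parser.feed(html)
    parser.close()
    rows, current, started = [], [], False
    for text in (t.strip() for t in parser.spans):
        if not text:
            continue
        if text == heading:  # anchor found: a new table starts here
            started, current = True, []
            continue
        if not started or text in column_headers:
            continue  # before the first heading, or a column header cell
        current.append(text)
        if len(current) == len(column_headers):
            rows.append(current)
            current = []
    return rows
```

Anchoring on the heading text means extra spans before the table (page headers, invoice numbers) no longer shift which span is treated as the first cell.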

Another method is to extract the whole HTML into a variable and use For Each Text Part to split it into one part per table. Give a word common to all the tables as the delimiter (here you can use "Handsets") and store each output in another variable, so that this variable holds only the table content. From that output you can easily split the values, or you can load it with a Create Page action step and extract the content.
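The same split-on-a-common-word approach looks roughly like this in Python. It assumes the extracted text has been flattened to one row per line, with tab-separated cells under each "Handsets" heading and a column header row first. `split_tables`, `table_rows` and the sample data are hypothetical; Kapow's For Each Text Part step is only being imitated here, not called.

```python
def split_tables(text, delimiter="Handsets"):
    """Return one chunk per table: the text following each occurrence of `delimiter`."""
    chunks = text.split(delimiter)
    # chunks[0] is whatever precedes the first heading (page header etc.), so drop it
    return [chunk.strip() for chunk in chunks[1:] if chunk.strip()]


def table_rows(chunk, sep="\t"):
    """Turn one table chunk into dicts keyed by its column header row."""
    lines = [line for line in chunk.splitlines() if line.strip()]
    header = [cell.strip() for cell in lines[0].split(sep)]
    return [
        dict(zip(header, (cell.strip() for cell in line.split(sep))))
        for line in lines[1:]
    ]
```

Each chunk plays the role of the per-table output variable described above; splitting the values is then a matter of splitting lines and cells.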

Please try this, and if you don't get any results, please share your source PDF.


Thank you.

Regards,
Shyam kumar

Tejaswini H R

Posts : 10
Points : 223
Join date : 2018-04-26

Re: PDF Looping

Post by Tejaswini H R on Thu Aug 09, 2018 6:05 pm

Hi Shyam,

Thanks for your suggestions.
I tried the second method and it is working fine :)
But I want to explore the first method too.

We have another requirement where the entire row needs to be extracted and merged into another PDF.
We are able to extract the contents (using the second method), and we append them to an existing file using the "Write File" option.
But when we open the merged PDF, we get a "corrupted PDF (51 KB)" file.
 
Thanks!
Tejaswini H R
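One likely cause of the corrupted result is that appending one PDF's bytes to another does not produce a merged PDF. A PDF ends with a `startxref` value giving the byte offset of its cross-reference section, measured from the start of the file, and readers begin from the last one. After appending, that last offset still counts from the start of the second document, so it points to the wrong place. The toy sketch below shows the mismatch; `toy_pdf` builds a hypothetical byte layout that is NOT a valid PDF, and both helper names are made up for illustration.

```python
import re


def toy_pdf(body: bytes) -> bytes:
    """Hypothetical toy layout, NOT a valid PDF: header, body, an 'xref' line,
    a trailer, and a startxref offset measured from the start of THIS file."""
    head = b"%PDF-1.4\n" + body + b"\n"
    xref_offset = len(head)  # byte offset of the 'xref' line
    tail = b"xref\ntrailer\n<< >>\nstartxref\n" + str(xref_offset).encode() + b"\n%%EOF\n"
    return head + tail


def last_startxref_is_consistent(data: bytes) -> bool:
    """Readers start from the LAST startxref; check it points at the last 'xref' line."""
    offsets = [int(m.group(1)) for m in re.finditer(rb"startxref\s+(\d+)", data)]
    xref_lines = [m.start() for m in re.finditer(rb"(?m)^xref$", data)]
    return bool(offsets and xref_lines) and offsets[-1] == xref_lines[-1]
```

Each toy file is consistent on its own, but the concatenation is not, which is why a real merge has to rebuild the cross-reference data (for example with a PDF library or tool) rather than append bytes with "Write File".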

Last edited by Tejaswini H R on Mon Aug 13, 2018 12:05 pm; edited 1 time in total

chrismemo

Posts : 13
Points : 222
Join date : 2018-04-24
Location : Singapore

Re: PDF Looping

Post by chrismemo on Mon Aug 13, 2018 10:33 am

Hi Teja,
Another option is to use an enterprise capture solution, such as a Kofax KTA/KC + KTM combination, OpenText Captiva or ABBYY FlexiCapture; this will make life easier ^^
These can OCR the documents and extract tables, even ones that span multiple pages.
Regards,
Chris
