2 posters

How to extract image in pdf to be excel and local language

DeepMind123: Posts : 7
Points : 1759
Join date : 2019-07-22

Post n°1

How to extract image in pdf to be excel and local language

by DeepMind123 Fri Aug 02, 2019 5:26 pm

Hi, I use Design Studio 10.4.0.0.

I have a problem extracting data from a PDF file. The table in the PDF is an image, and the information is in my local language (Thai). As far as I can see, OCR extracts only numbers and English words correctly.

[Image: Proble11]

In the image I point at a Thai word in the company name, "บริษัท", but its attribute shows text=1Jtgvt.
Do you know how I can extract local-language text correctly?

I also set my language here, but it does not help the OCR in Design Studio recognize the text.

[Image: Set2110]

If there is no way to extract local-language data with OCR, is there any advice on how to assign text directly to the image area so that it becomes the final value I want?

[Image: Proble12]

[Image: Proble13]

As the pictures show, OCR could not handle my local language, so I tried "Manually enter text in field" and then assigned the step to the variable "parking.company", where parking is the output type.

I expected the variable to contain the word "MFEC" when OCR found that image area, but in the end the company variable is still the empty string "", as the picture shows.

Can anyone suggest how I can put a value into my variable for an image area where OCR cannot extract the data correctly? Thank you.
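One way to think about the fallback described above is as a small rule: if the OCR result for a region that should hold Thai text contains no Thai characters at all (like "1Jtgvt"), treat it as a failed read and use a known value instead. The sketch below is plain Python, not a Design Studio step; the function names and the choice of "MFEC" as the fallback are illustrative assumptions.

```python
# Minimal sketch, not Kapow/Design Studio code: decide whether an OCR result
# for a region expected to contain Thai text looks usable, else fall back.

def contains_thai(text: str) -> bool:
    """Return True if any character lies in the Thai Unicode block (U+0E00 to U+0E7F)."""
    return any("\u0e00" <= ch <= "\u0e7f" for ch in text)


def resolve_company(ocr_text: str, fallback: str) -> str:
    """Keep the OCR text when it contains Thai characters, otherwise use the fallback.

    `fallback` is a hypothetical known-good value supplied by the caller.
    """
    cleaned = ocr_text.strip()
    return cleaned if contains_thai(cleaned) else fallback
```

For example, `resolve_company("1Jtgvt", "MFEC")` returns the fallback, while a result that already contains Thai characters is kept unchanged. Inside a robot the same idea would be expressed with a condition on the extracted value followed by an assignment.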

Last edited by DeepMind123 on Mon Aug 05, 2019 12:04 pm; edited 1 time in total (Reason for editing: no one answered me; I tried another solution but am still stuck, so I added my new attempt to let other people know)

batman

batman: Posts : 47
Points : 1923
Join date : 2019-03-19

Post n°2

Re: How to extract image in pdf to be excel and local language

by batman Mon Aug 05, 2019 3:24 pm

Hi,

To change the default OCR language used within Desktop Automation, see the note below taken from the online help.

By default, Kapow installs the English language for OCR. When your robot performs text recognition in the Extract Text From Image Step, Kapow uses the language selected on the OCR tab of the Desktop Automation Service window.

To change the default language for OCR, perform the following steps.
1. Download the .traineddata file for the required language from https://github.com/tesseract-ocr/tessdata. For example, the file for the French language is fra.traineddata.
2. Copy the downloaded trained data file to DesktopAutomationService\lib\tessdata in the Desktop Automation service installation directory. Example:
C:\Program Files (x86)\Kapow DesktopAutomation 10.1.0 x32\DesktopAutomationService\lib\tessdata
3. Right-click the Desktop Automation icon in the notification area and select Configure.
4. Click the OCR tab and select a language in the Default OCR language list.
5. Click Save and Restart.
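Steps 1 and 2 can be scripted. The following is a minimal sketch, assuming the Thai file is named tha.traineddata and that the repository serves raw files under the URL pattern shown (both are assumptions to verify in a browser first). The function names are hypothetical, the branch of the repository that matches the Tesseract version bundled with your Kapow release may differ, and writing under Program Files normally requires administrator rights.

```python
from pathlib import Path, PureWindowsPath
from urllib.request import urlretrieve

# Assumption: raw-file URL pattern for the tessdata repository named in step 1.
TESSDATA_RAW_URL = "https://github.com/tesseract-ocr/tessdata/raw/main/{lang}.traineddata"


def traineddata_url(lang: str) -> str:
    """Build the download URL for a language code such as 'tha' or 'fra'."""
    return TESSDATA_RAW_URL.format(lang=lang)


def tessdata_target(install_dir: str, lang: str) -> PureWindowsPath:
    """Build the destination path from step 2 inside the installation directory."""
    return (
        PureWindowsPath(install_dir)
        / "DesktopAutomationService"
        / "lib"
        / "tessdata"
        / f"{lang}.traineddata"
    )


def install_language(install_dir: str, lang: str) -> Path:
    """Download the trained data file and save it to the tessdata folder.

    Intended to be run on the Windows machine that hosts the
    Desktop Automation Service.
    """
    target = Path(str(tessdata_target(install_dir, lang)))
    if not target.parent.is_dir():
        raise FileNotFoundError(f"tessdata folder not found: {target.parent}")
    urlretrieve(traineddata_url(lang), str(target))
    return target
```

A call such as `install_language(r"C:\Program Files (x86)\Kapow DesktopAutomation 10.1.0 x32", "tha")` would cover the download and copy; steps 3 to 5 still have to be done in the Desktop Automation Service window.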

You can also train the OCR engine to recognize your character set using TIF fonts or UI screenshots; however, this would typically be used to recognize non-standard fonts in an application UI rather than document content. Refer to the online help for more detail on this.

Post n°3

Re: How to extract image in pdf to be excel and local language

by batman Mon Aug 05, 2019 3:29 pm

Hi,

I'm not sure where you are trying to input data here. Are you trying to enter text against the image?

DA includes enter text steps, which are typically used to enter data into desktop applications or to simulate user keystrokes when interacting with a desktop. You should be able to enter text in any place where a user can, which in the case of a flat image is not possible.

For extraction of content, you could try the Thai OCR language to see the results. Given that this is a document, though, you should probably be pushing it through document transformation (which also supports the Thai language).
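To see what the Thai language data produces before changing the robot, the cropped region can be run through a standalone Tesseract install. This is a rough sketch, assuming the `tesseract` command-line tool is on PATH and that the region has been saved as an image file; it checks the OCR engine outside Kapow and says nothing about how Design Studio calls it. The function names and file names are hypothetical.

```python
import subprocess
from typing import List, Optional


def build_tesseract_command(
    image_path: str, lang: str = "tha", tessdata_dir: Optional[str] = None
) -> List[str]:
    """Build a tesseract CLI call that prints recognized text to stdout."""
    cmd = ["tesseract", image_path, "stdout", "-l", lang]
    if tessdata_dir:
        # Point tesseract at a specific folder holding the .traineddata files.
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


def run_ocr(image_path: str, lang: str = "tha", tessdata_dir: Optional[str] = None) -> str:
    """Run tesseract on one image and return the recognized text.

    Raises CalledProcessError if tesseract exits with an error.
    """
    result = subprocess.run(
        build_tesseract_command(image_path, lang, tessdata_dir),
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout.strip()
```

Calling `run_ocr("company_name.png")` on the cropped company-name region and comparing the output with the English default (`lang="eng"`) shows whether the Thai data makes a difference for this scan.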

Post n°4

Re: How to extract image in pdf to be excel and local language

by DeepMind123 Mon Aug 05, 2019 3:38 pm

batman wrote: Hi,

To change the default OCR language used within Desktop Automation, see the note below taken from the online help. [...]

Thank you.
I have already done these steps, as shown in my second picture. I saved and restarted last week, but it still does not work.

[Image: Set2111]

Post n°5

Re: How to extract image in pdf to be excel and local language

by DeepMind123 Mon Aug 05, 2019 3:41 pm

batman wrote: Hi,

I'm not sure where you are trying to input data here. Are you trying to enter text against the image?

DA includes enter text steps, which are typically used to enter data into desktop applications or to simulate user keystrokes when interacting with a desktop. You should be able to enter text in any place where a user can, which in the case of a flat image is not possible.

For extraction of content, you could try the Thai OCR language to see the results. Given that this is a document, though, you should probably be pushing it through document transformation (which also supports the Thai language).

Yes, I tried to enter text against the image because OCR cannot read it correctly. I wanted to tell the robot that when it finds the image area I cropped, the value should equal the word 'mfec'.
But I understand you now: enter text works only on input fields that a user can fill in, not on an image.
