KAPOW


Welcome to the Kapow forum. Here you can get help, use your skills to help others and enjoy hanging out in the company of other Kapow Robot Developers.


2 posters

    How to extract image in pdf to be excel and local language

    DeepMind123


    Posts : 7
    Points : 1759
    Join date : 2019-07-22

    How to extract image in pdf to be excel and local language

    Post by DeepMind123 Fri Aug 02, 2019 5:26 pm

    Hi, I use Design Studio 10.4.0.0.


    I have a problem extracting data from a PDF file. The table in the PDF is stored as an image, and the information is in my local language (Thai). I found that OCR extracts only numbers and English words correctly.


    [Image: Proble11]


    In the image, I am pointing at the Thai word "บริษัท" (the word for "company" in the company name), but the extracted attribute shows text=1Jtgvt.
    Do you know how I can extract the local language as the correct word?


    I have also set my language here, but it does not help the OCR in Design Studio recognize it.
    [Image: Set2110]




    And if there is no way to extract local-language data with OCR, is there any advice on entering text for the image directly, so that it becomes the final value I want?

    [Image: Proble12]
    [Image: Proble13]

    As the pictures show, OCR could not handle my local language, so I tried "Manually enter text in field" and then assigned the step to the variable "parking.company", where parking is the output type.


    I expected the variable to contain the word "MFEC" when OCR found that image area, but in the end the company variable is still the empty string "", as shown in the picture.


    Can anyone suggest how I can assign a value to my variable for an image area where OCR cannot extract the data correctly? Thank you.


    Last edited by DeepMind123 on Mon Aug 05, 2019 12:04 pm; edited 1 time in total (Reason for editing: no one answered me; I tried another solution but am still stuck, so I added my new attempt to let other people know)
    batman


    Posts : 47
    Points : 1923
    Join date : 2019-03-19

    Re: How to extract image in pdf to be excel and local language

    Post by batman Mon Aug 05, 2019 3:24 pm

    Hi,

    To change the default OCR language used within Desktop Automation, see the note below taken from the online help.

    By default, Kapow installs the English language for OCR. When your robot performs text recognition in the Extract Text From Image Step, Kapow uses the language selected on the OCR tab of the Desktop Automation Service window.

    To change the default language for OCR, perform the following steps.
    1. Download the .traineddata file for the required language from https://github.com/tesseract-ocr/tessdata. For example, the file for the French language is fra.traineddata.
    2. Copy the downloaded trained data file to DesktopAutomationService\lib\tessdata in the Desktop Automation service installation directory. Example:
    C:\Program Files (x86)\Kapow DesktopAutomation 10.1.0 x32\DesktopAutomationService\lib\tessdata
    3. Right-click the Desktop Automation icon in the notification area and select Configure.
    4. Click the OCR tab and select a language in the Default OCR language list.
    [Figure: Selecting default language for OCR operation]
    5. Click Save and Restart.

    You can also train the OCR engine to recognize your character set using TIF fonts or UI screenshots; however, this would typically be used to recognize non-standard fonts in an application UI rather than document content. Refer to the online help for more detail on this.
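
    Steps 1 and 2 above could be scripted roughly as follows. This is a minimal sketch and not part of the Kapow help: the raw-download URL pattern, the function names, and the use of `tha` (the code Tesseract uses for its Thai model) are assumptions made for illustration, and the tessdata folder must be adjusted to your own installation.

```python
import shutil
import urllib.request
from pathlib import Path

# Assumption: raw-file URL layout of the tessdata repository on GitHub.
TESSDATA_RAW_URL = "https://github.com/tesseract-ocr/tessdata/raw/main/{lang}.traineddata"


def traineddata_url(lang: str) -> str:
    """Build the download URL for a language code such as 'tha' or 'fra'."""
    return TESSDATA_RAW_URL.format(lang=lang)


def download_traineddata(lang: str, download_dir: Path) -> Path:
    """Step 1: download <lang>.traineddata into download_dir (needs network access)."""
    download_dir.mkdir(parents=True, exist_ok=True)
    target = download_dir / f"{lang}.traineddata"
    urllib.request.urlretrieve(traineddata_url(lang), str(target))
    return target


def install_traineddata(source: Path, tessdata_dir: Path) -> Path:
    """Step 2: copy the downloaded file into the service's tessdata folder."""
    if not tessdata_dir.is_dir():
        raise FileNotFoundError(f"tessdata folder not found: {tessdata_dir}")
    # shutil.copy2 returns the path of the newly created file.
    return Path(shutil.copy2(source, tessdata_dir))
```

    A call such as `install_traineddata(download_traineddata("tha", Path("downloads")), Path(r"C:\Program Files (x86)\Kapow DesktopAutomation 10.1.0 x32\DesktopAutomationService\lib\tessdata"))` would cover both steps; writing under Program Files may require an elevated prompt. Steps 3 to 5 still have to be done in the Desktop Automation Service window.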
    batman


    Posts : 47
    Points : 1923
    Join date : 2019-03-19

    Re: How to extract image in pdf to be excel and local language

    Post by batman Mon Aug 05, 2019 3:29 pm

    Hi,

    I'm not sure where you are trying to input data here. Are you trying to enter text against the image?

    DA includes Enter Text steps, which are typically used to enter data into desktop applications or to simulate user keystrokes when interacting with a desktop. You should be able to enter text anywhere a user can, which in the case of a flat image is not possible.

    For extraction of content, you could try the Thai OCR language to see the results. Given that this is a document, though, you should probably be pushing it through document transformation (which also supports the Thai language).
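
    When testing the Thai OCR language, one quick sanity check is whether the extracted text contains any Thai characters at all; a result such as "1Jtgvt" suggests the engine is still reading with the English model. The sketch below is a hypothetical post-processing helper written outside Design Studio, not a Kapow feature: the function names and the idea of substituting a known value such as "MFEC" are assumptions for illustration.

```python
def contains_thai(text: str) -> bool:
    """True if any character lies in the Unicode Thai block (U+0E00 to U+0E7F)."""
    return any("\u0e00" <= ch <= "\u0e7f" for ch in text)


def choose_value(ocr_text: str, fallback: str) -> str:
    """Keep the OCR result only if it looks like Thai; otherwise use the fallback."""
    cleaned = ocr_text.strip()
    return cleaned if contains_thai(cleaned) else fallback


# Hypothetical values taken from this thread.
print(contains_thai("บริษัท"))
print(contains_thai("1Jtgvt"))
print(choose_value("1Jtgvt", "MFEC"))
```

    The check only tells you which script came back, not whether the words are right, so it is best used as a flag for manual review rather than as proof that the extraction succeeded.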
    DeepMind123


    Posts : 7
    Points : 1759
    Join date : 2019-07-22

    Re: How to extract image in pdf to be excel and local language

    Post by DeepMind123 Mon Aug 05, 2019 3:38 pm

    batman wrote:Hi,

    To change the default OCR language used within Desktop Automation, see the note below taken from the online help.

    By default, Kapow installs the English language for OCR. When your robot performs text recognition in the Extract Text From Image Step, Kapow uses the language selected on the OCR tab of the Desktop Automation Service window.

    To change the default language for OCR, perform the following steps.
    1. Download the .traineddata file for the required language from https://github.com/tesseract-ocr/tessdata. For example, the file for the French language is fra.traineddata.
    2. Copy the downloaded trained data file to DesktopAutomationService\lib\tessdata in the Desktop Automation service installation directory. Example:
    C:\Program Files (x86)\Kapow DesktopAutomation 10.1.0 x32\DesktopAutomationService\lib\tessdata
    3. Right-click the Desktop Automation icon in the notification area and select Configure.
    4. Click the OCR tab and select a language in the Default OCR language list.
    [Figure: Selecting default language for OCR operation]
    5. Click Save and Restart.

    You can also train the OCR engine to recognize your character set using TIF fonts or UI screenshots; however, this would typically be used to recognize non-standard fonts in an application UI rather than document content. Refer to the online help for more detail on this.

    Thank you.
    I have already done this step, as shown in my second picture. I saved and restarted it when I tried last week, but it still does not work.
    [Image: Set2111]
    DeepMind123


    Posts : 7
    Points : 1759
    Join date : 2019-07-22

    Re: How to extract image in pdf to be excel and local language

    Post by DeepMind123 Mon Aug 05, 2019 3:41 pm

    batman wrote:Hi,

    I'm not sure where you are trying to input data here. Are you trying to enter text against the image?

    DA includes Enter Text steps, which are typically used to enter data into desktop applications or to simulate user keystrokes when interacting with a desktop. You should be able to enter text anywhere a user can, which in the case of a flat image is not possible.

    For extraction of content, you could try the Thai OCR language to see the results. Given that this is a document, though, you should probably be pushing it through document transformation (which also supports the Thai language).
     
    Yes, I tried to enter text against the image because OCR cannot read it correctly. I wanted to tell the robot that whenever a run finds the image area I cropped, the value should be the word 'mfec'.
    But I understand you now: Enter Text only works on input fields that a user can fill in, not on an image.
