rdes20170821001

Eurostars / Eureka partner search: R&D institution or SME experienced with Optical Character Recognition (OCR) engines, able to provide both high-level requirements and fine-grained technical details


ABSTRACT

A Spanish SME specialised in enterprise content management (ECM) and document capture is looking for SMEs or R&D institutions to join a Eurostars proposal for the next cut-off deadline (14 September 2017). The partner should be experienced with Optical Character Recognition (OCR) engines and able to provide both high-level requirements and fine-grained technical details. The expected outcome of the project is to go beyond what current OCR engines on the market offer.


FULL DESCRIPTION

The project proposes the development of a new fault-tolerant open source OCR/ICR (Intelligent Character Recognition) solution with features comparable to those of some commercial products. The aim is to provide trustworthy extraction from documents with any layout, typography and/or writing style (both typed and handwritten).

The enterprise content management (ECM) and document capture markets need innovative solutions that reduce manual work. The ECM market is expected to be worth 12 billion dollars by 2019. It is a huge market because organizations of any size require some kind of content management solution. The document capture market, closely related to ECM, grew by more than 6% last year, reaching a volume of 2 billion dollars.

Having powerful, flexible, easily adaptable and extensible OCR-ICR tools at hand has become a fundamental need for many organizations in their document management processes. Current OCR tools are mainly based on techniques developed in the pattern recognition (PR) field over the past twenty years, such as support vector machines or simple nearest-neighbour methods. In many cases these tools do not exploit contextual information to improve the OCR results. This contextual information consists mainly of linguistic resources such as vocabularies and lexicons, which usually reside in the companies that use the tools. With this project, the company intends to develop a series of open source OCR tools that exploit these linguistic resources to build language models and to deal with difficult documents.
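As an illustration of how a company lexicon could be used as contextual information, the following is a minimal sketch, not the project's method. It assumes the OCR output is already tokenized and uses simple string similarity from the Python standard library; the function name `correct_with_lexicon` and the `cutoff` value are hypothetical choices made for this example.

```python
from difflib import get_close_matches


def correct_with_lexicon(tokens, lexicon, cutoff=0.75):
    """Replace OCR tokens missing from the lexicon with the closest lexicon entry.

    tokens  -- list of words produced by an OCR engine (assumed input format)
    lexicon -- set of known-good words, e.g. a company vocabulary
    cutoff  -- minimum similarity (0..1) for accepting a replacement (assumed value)
    """
    vocabulary = sorted(lexicon)  # get_close_matches expects a sequence
    corrected = []
    for token in tokens:
        if token in lexicon:
            corrected.append(token)
            continue
        matches = get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        # Keep the original token when nothing in the lexicon is similar enough.
        corrected.append(matches[0] if matches else token)
    return corrected


if __name__ == "__main__":
    lexicon = {"invoice", "total", "amount"}
    print(correct_with_lexicon(["invo1ce", "tota1", "xyz"], lexicon))
```

A production system would more likely integrate the lexicon into the recognizer's language model than post-correct tokens one by one; the sketch only shows the kind of signal a lexicon provides.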

In recent years, deep learning has emerged strongly in many pattern recognition problems, including handwritten text recognition (HTR). These powerful techniques can be readily extended to implement OCR systems that deal with difficult documents, and some research teams intend to develop such tools in the near future. One advantage of deep learning techniques is that they can classify the sample to be recognized very quickly. Another is that many basic open source tools already exist for the core processing. With this project, the company intends to develop a series of tools based on free software, both for automatically generating training data and for training an OCR system based on deep learning techniques.

The company is targeting Eurostars cut-off 8, with a deadline of 14/09/2017. As the deadline is very close, the company is also considering preparing a proposal for the next Eurostars cut-off or for a Eureka network project (always open).
EOI deadline: 07 September 2017
Project duration: 1.5 years (approx.).

The ideal partner is an SME (or R&D institution) in the enterprise content management, document analysis or document capture market that can strengthen its own products and solutions with this new deep learning based OCR and ICR platform. It should have expertise in:
Natural Language Processing
Image Processing
OCR Technologies
Artificial Intelligence
Big Data


Partner expertise sought:

- Type of partner sought: The ideal partner is an SME (or R&D institution) in the enterprise content management, document analysis or document capture market that can strengthen its own products and solutions with this new deep learning based OCR and ICR platform. The partner should have expertise in:
Natural Language Processing
Image Processing
OCR Technologies
Artificial Intelligence
Big Data
- Specific area of activity of the partner: Part of the technical development will be contracted to a university R&D group with which the company has already built an initial prototype. The company is now seeking a technical and/or commercial partner with similar use cases, or one interested in project outcomes that go beyond what current OCR engines on the market offer. The ideal partner is an SME in the enterprise content management, document analysis or document capture market that can strengthen its own products and solutions with this new deep learning based OCR and ICR platform.

Therefore, the company is seeking an integrator partner experienced with OCR engines that can provide both high-level requirements and fine-grained technical details about the capabilities of the new engine. A concrete use case around OCR/ICR/HTR technologies is preferred, ideally one where the partner has hit a barrier in the current state of the art of these technologies.


Advantages & innovations:

The project intends to go beyond the state of the art by developing tools that recognize printed documents with handwritten text recognition (HTR) techniques, referred to from now on as OCR-HTR techniques. These techniques should be effective for printed documents on which current OCR techniques do not obtain good results. The main foundations of these tools will be: i) optical modelling based on a combination of deep neural networks and hidden Markov models (DNN-HMM); ii) n-gram models used in the recognition/decoding system; iii) word generation and index preparation to make collections of printed documents searchable.
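To make point ii) more concrete, the sketch below shows one simple way an n-gram model can influence recognition: rescoring a list of candidate transcriptions by combining an optical score with a bigram language model score. This is an illustrative simplification rather than the project's decoder; a real DNN-HMM system would apply the language model during decoding. The class `BigramModel`, the function `rescore`, the `lm_weight` parameter and the example scores are all assumptions made for this example.

```python
import math
from collections import Counter


class BigramModel:
    """Bigram language model with add-one smoothing (illustrative only)."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sentence in sentences:
            words = ["<s>"] + sentence.split()  # <s> marks the sentence start
            self.unigrams.update(words)
            self.bigrams.update(zip(words, words[1:]))
        self.vocab_size = len(self.unigrams)

    def log_prob(self, sentence):
        """Return the smoothed log-probability of a whitespace-tokenized sentence."""
        words = ["<s>"] + sentence.split()
        total = 0.0
        for prev, word in zip(words, words[1:]):
            numerator = self.bigrams[(prev, word)] + 1
            denominator = self.unigrams[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total


def rescore(candidates, model, lm_weight=1.0):
    """Pick the best candidate from (text, optical_log_score) pairs.

    The optical scores are assumed to come from a separate optical model.
    """
    best = max(candidates, key=lambda c: c[1] + lm_weight * model.log_prob(c[0]))
    return best[0]


if __name__ == "__main__":
    model = BigramModel(["total amount due", "invoice total amount"])
    candidates = [("total arnount due", -1.0), ("total amount due", -1.2)]
    print(rescore(candidates, model))
```

In this toy example the optical model slightly prefers the misreading "arnount", but the language model score outweighs that preference because "total amount" was seen in the training text.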

The main technological outcomes of the project include contributions to the open source community:

- Industry-ready OCR and ICR components under permissive open source licences (BSD, MIT or ASL 2.0) that would perform as well as far more expensive commercial products.

- An API for the OCR-HTR system tailored for use by content management systems and full-text retrieval systems, especially Apache Lucene / Solr and Elasticsearch.
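The idea of making recognized documents searchable can be sketched with a small in-memory inverted index. This is a conceptual illustration only: it does not use or imitate the Apache Lucene, Solr or Elasticsearch APIs, and the function names `build_inverted_index` and `search` are hypothetical.

```python
import re
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lower-cased word to the set of document ids containing it.

    documents -- dict of {doc_id: recognized_text} (assumed input format)
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every word of the query."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {"a": "Invoice total amount", "b": "Total due"}
    index = build_inverted_index(docs)
    print(search(index, "invoice total"))
```

A full-text retrieval system adds ranking, stemming and persistence on top of this basic structure, which is why the project targets existing search engines rather than a custom index.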


Development Stage:

Under development/lab tested


IPR:

Other

Programme - Call:

Evaluation scheme: Two-stage submission (online submission, national submission) www.eurostars-eureka.eu/eurostars-process

Acronym: Deepdocs

Website: http://www.eurostars-eureka.eu

Duration: 75 week(s)

Deadline: 14/09/2017

Coordinator required: No


KEY INFORMATION

Country of origin
SPAIN
Profile date
24/08/2017
Deadline
24/08/2018

PARTNERSHIP(S) SOUGHT

Research cooperation agreement

CLASSIFICATIONS

INDUSTRY SECTORS
Information Processing & Systems \ Software \ Multimedia, Digital Content and Gaming \ Tourism & Hospitality \ Internet and Wireless Technologies (Wireless, WiFi, Bluetooth) \ TechUK Single Mapping, August 2014
TECHNOLOGY KEYWORDS
ELECTRONICS, IT AND TELECOMMS / Information Processing & Systems, Workflow / Archivistics/Documentation/Technical Documentation / ELECTRONICS, IT AND TELECOMMS / Information Processing & Systems, Workflow / Artificial Intelligence (AI)
COMMERCIAL KEYWORDS
Data processing, hosting and related activities
MARKET KEYWORDS
COMPUTER RELATED / Scanning Related / OCR (optical character recognition) / COMPUTER RELATED / Computer Software / Artificial intelligence related software

FIND OUT MORE

Contact Enterprise Europe Network Scotland by email at info@enterprise-europe-scotland.com, quoting reference number rdes20170821001