
ODC components in parallel

Continuing the discussion on setting up multiple instances of ODC components to process scanned files faster, here is what we were able to achieve and what we were not.

As I described in my earlier post, "Managing bulk scanning and ingestion", you will first have to work out how many licenses your setup needs. The elements required are scanners, Windows machines, a shared storage area, a database and, of course, WebCenter Content.

This is a sample setup for your reference.


We started with an architecture of separate Import, Recognition and Commit servers, but within a few test runs we found that the Commit Server behaved oddly. Once a batch was imported and recognized (for its primary key elements), a Commit Server would pick it up for committing. However, while checking in the content items from the batch, it acquired and released the lock after each item was checked in, and batches started to get locked up between different Commit Servers. We couldn't solve this issue in time and had to resort to committing the batches directly from the Recognition Server itself.
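The lock behaviour described above can be pictured with a toy model. This is a simplified Python sketch and not how the ODC Commit Server is actually implemented. The `simulate` function, the server names and the batch contents are all hypothetical. Two committers take turns on a shared queue. With `lock_per_document=True` the batch lock is released after every check-in, so the other committer can take over the same batch partway through. With `lock_per_document=False` a committer keeps the lock until the whole batch is done.

```python
from collections import deque

def simulate(batches, servers, lock_per_document):
    """Toy round-robin model of Commit servers sharing one queue of batches.

    batches: dict mapping a (hypothetical) batch id to its document names.
    Returns a list of (server, batch_id, document) check-in events.
    """
    pending = deque((b, list(docs)) for b, docs in batches.items())
    held = {}  # server -> (batch_id, remaining docs) while it holds a batch lock
    log = []
    while pending or held:
        for server in servers:
            if server in held:
                batch_id, docs = held.pop(server)   # continue the batch it already locked
            elif pending:
                batch_id, docs = pending.popleft()  # acquire the lock on the next batch
            else:
                continue  # nothing to do this round
            log.append((server, batch_id, docs.pop(0)))  # check in one document
            if not docs:
                continue  # batch finished; its lock is released
            if lock_per_document:
                # Lock released after every check-in: any server may grab the rest.
                pending.appendleft((batch_id, docs))
            else:
                # Lock kept until the whole batch is checked in.
                held[server] = (batch_id, docs)
    return log

batches = {"B1": ["d1", "d2", "d3"], "B2": ["e1", "e2"]}
for per_document in (True, False):
    owners = {}
    for server, batch_id, _ in simulate(batches, ["commit-1", "commit-2"], per_document):
        owners.setdefault(batch_id, set()).add(server)
    print(per_document, {b: sorted(s) for b, s in owners.items()})
# True {'B1': ['commit-1', 'commit-2'], 'B2': ['commit-1', 'commit-2']}
# False {'B1': ['commit-1'], 'B2': ['commit-2']}
```

In the per-document mode, both batches end up split across the two committers. That roughly resembles the interleaving we observed. Holding one lock per batch keeps each batch with a single server.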

Now the setup has only Import and Recognition servers, and it works like a charm.

We still sometimes get a "file not found" or "batch could not be loaded" error, but it has no impact on the processing. Before a batch is actually locked for processing, it can be picked up by more than one Import or Recognition Server. Once one server locks it, the others throw an error, which is harmless.
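The harmless error can be pictured as a race to claim each batch. This is a minimal Python sketch. The `BatchStore` class, its `claim` method and the server names are hypothetical. It assumes an atomic check-and-set rather than ODC's actual locking mechanism, which this post does not describe. Every server polls the same list of batches, and whoever loses the race simply records an error and moves on.

```python
import threading

class BatchStore:
    """Hypothetical stand-in for the shared batch table."""

    def __init__(self, batch_ids):
        self._owner = {b: None for b in batch_ids}
        self._mutex = threading.Lock()

    def claim(self, batch_id, server):
        # Atomic check-and-set: only the first caller locks the batch.
        with self._mutex:
            if self._owner[batch_id] is not None:
                return False
            self._owner[batch_id] = server
            return True

def poll(store, server, batch_ids, processed, errors):
    # Every server sees the same unlocked batches and tries to pick them up.
    for batch_id in batch_ids:
        if store.claim(batch_id, server):
            processed.append((server, batch_id))
        else:
            # Stands in for "batch could not be loaded": another server won the race.
            errors.append((server, batch_id))

batch_ids = [f"batch-{i}" for i in range(20)]
store = BatchStore(batch_ids)
processed, errors = [], []
threads = [
    threading.Thread(target=poll, args=(store, f"recognition-{n}", batch_ids, processed, errors))
    for n in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(processed), len(errors))  # 20 40
```

Every batch is processed exactly once, and the other 40 attempts end in an error. This is why such messages can be ignored as long as the batch itself is eventually processed.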

Main tables for you to watch:

ecBatches: The core table, which holds all the “Live” batches: batches that have just been created and are yet to be processed, along with batches that have errored out. Successfully processed/committed batches are removed from this table.

ecAudit: As the name suggests, it stores the complete log of batch processing: when a batch was imported, how many pages were imported, which Recognition Server picked it up, how many content items were checked in for the batch, and how many failed.
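As a starting point for reporting on these two tables, here is a hedged Python sketch using an in-memory SQLite database. The column names (`batch_name`, `status`, `server`, `docs_committed`, `docs_failed`) and the rows are made up for illustration and are not the real ODC schema. Check the actual ecBatches and ecAudit columns in your own database before adapting the queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schemas; the real ODC tables have different columns.
conn.executescript("""
CREATE TABLE ecBatches (batch_name TEXT, status TEXT);
CREATE TABLE ecAudit (batch_name TEXT, server TEXT, docs_committed INTEGER, docs_failed INTEGER);
""")
# Made-up sample rows, only to exercise the queries.
conn.executemany("INSERT INTO ecBatches VALUES (?, ?)",
                 [("B3", "Ready"), ("B4", "Ready"), ("B5", "Error")])
conn.executemany("INSERT INTO ecAudit VALUES (?, ?, ?, ?)",
                 [("B1", "recognition-1", 40, 0), ("B2", "recognition-2", 38, 2),
                  ("B5", "recognition-1", 0, 1)])

# Live batches (still in ecBatches) grouped by status.
live = conn.execute(
    "SELECT status, COUNT(*) FROM ecBatches GROUP BY status ORDER BY status").fetchall()
# Checked-in vs failed items per Recognition Server, from the audit log.
per_server = conn.execute(
    "SELECT server, SUM(docs_committed), SUM(docs_failed) "
    "FROM ecAudit GROUP BY server ORDER BY server").fetchall()
print(live)        # [('Error', 1), ('Ready', 2)]
print(per_server)  # [('recognition-1', 40, 1), ('recognition-2', 38, 2)]
```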

In the next post, I will publish a few status details and database queries for reporting, which can help you create a simple graphical UI.

Stay tuned.

 

Posted on January 6, 2013 in ODC, UCM

 


 