
Concepts

This list describes key Exfil concepts and terminology. It's a good place to get started learning how to set up and use Exfil.

Model

A model is a specific document type. For example, an insurance project could use a remittance document type or a closing advice document type. Each project has its own configurable set of fields, so it can be tailored to any document type.

Training

Document

A document forms part of the project's training dataset. When a document is uploaded, Exfil converts it into a collection of text blocks that preserve the text's original location on each page.

Status

The status represents the state of the document. A newly added document has a status of NEW. Any document marked as DONE will be used in the dataset when a new model is trained. The REVIEW status allows you to manage the labelling process or flag documents for follow-up.
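
The status rules above can be sketched in a few lines of Python. This is an illustration only, not Exfil's API: the names `Status`, `Document`, and `training_dataset` are hypothetical, and the only behaviour taken from the description is that new documents start as NEW and that only DONE documents feed the training dataset.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """The three statuses named in this section."""
    NEW = "NEW"
    REVIEW = "REVIEW"
    DONE = "DONE"


@dataclass
class Document:
    # Hypothetical stand-in for an uploaded document.
    name: str
    status: Status = Status.NEW  # a newly added document starts as NEW


def training_dataset(documents):
    """Return only the documents marked DONE, mirroring the rule above."""
    return [doc for doc in documents if doc.status is Status.DONE]


docs = [
    Document("remittance-001.pdf", Status.DONE),
    Document("remittance-002.pdf", Status.REVIEW),
    Document("remittance-003.pdf"),  # defaults to NEW
]
print([d.name for d in training_dataset(docs)])  # ['remittance-001.pdf']
```

In this sketch, documents in REVIEW or NEW are simply left out of the dataset until they are marked DONE.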

Tag

A tag allows you to categorise documents. Tags might be used to identify a document format or flag problem documents.

Field

A field represents a specific type of data within a document type. For example, it could be an address, a total amount, or an invoice date. A field is either document level (it applies to the entire document) or a column within a table (it applies to each row of a table).
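
To make the two field levels concrete, here is a minimal sketch. Everything in it is an assumption made for illustration, not Exfil's actual schema or output format: that includes the `FieldLevel` and `Field` types, the field names, and the shape returned by `empty_result`.

```python
from dataclasses import dataclass
from enum import Enum


class FieldLevel(Enum):
    DOCUMENT = "document"          # one value for the whole document
    TABLE_COLUMN = "table_column"  # one value per table row


@dataclass(frozen=True)
class Field:
    # Hypothetical field definition: a name plus the level it applies to.
    name: str
    level: FieldLevel


def empty_result(fields, row_count):
    """Build a blank result skeleton: document-level fields appear once,
    table-column fields appear once in every row."""
    document = {f.name: None for f in fields if f.level is FieldLevel.DOCUMENT}
    columns = [f.name for f in fields if f.level is FieldLevel.TABLE_COLUMN]
    rows = [{name: None for name in columns} for _ in range(row_count)]
    return {"document": document, "rows": rows}


fields = [
    Field("invoice_date", FieldLevel.DOCUMENT),
    Field("total_amount", FieldLevel.DOCUMENT),
    Field("line_description", FieldLevel.TABLE_COLUMN),
]
result = empty_result(fields, row_count=2)
```

Here `result["document"]` holds one slot per document-level field, while each entry in `result["rows"]` holds one slot per table column.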

Label

A label is a single occurrence of a field within a document. Multiple labels can be assigned for a single field, but each text block can only be labelled once.
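
The labelling constraint described above (many labels per field, but at most one label per text block) can be sketched as follows. The `TextBlock` and `LabelledDocument` types and their attributes are hypothetical and exist only to illustrate the rule. They do not reflect how Exfil stores labels.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TextBlock:
    # Hypothetical text block: the text plus its location on a page.
    page: int
    x: float
    y: float
    text: str


@dataclass
class LabelledDocument:
    # Maps each labelled text block to the name of its field.
    labels: dict = field(default_factory=dict)

    def add_label(self, block, field_name):
        """Label a block, refusing blocks that already carry a label."""
        if block in self.labels:
            raise ValueError(f"text block {block.text!r} is already labelled")
        self.labels[block] = field_name

    def labels_for(self, field_name):
        """Return every block labelled with the given field."""
        return [b for b, name in self.labels.items() if name == field_name]


doc = LabelledDocument()
line1 = TextBlock(page=1, x=72.0, y=120.0, text="1 Example Street")
line2 = TextBlock(page=1, x=72.0, y=134.0, text="Brisbane QLD 4000")
doc.add_label(line1, "address")
doc.add_label(line2, "address")  # a field may have several labels
```

Calling `doc.add_label(line1, "total_amount")` after this would raise a `ValueError`, because `line1` already has a label.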

Train

A model applies all the learnings from the documents in the training dataset to extract data from a previously unseen document. Exfil learns over time, with more data in the training dataset leading to more accurate results in each successive model. A model can be trained after several documents have been labelled and marked as DONE within the Training section.

To help you navigate the complexities of machine learning training, Unmand manages the entire model training process for you. Depending on the variability within your dataset, roughly 25 to 50 documents are needed to train an initial model. As a rule of thumb, the more documents, the higher the accuracy.
