Concepts
This list describes key Exfil concepts and terminology. It's a good place to get started learning how to set up and use Exfil.
Model
A model corresponds to a specific document type. For example, an insurance project could cover a remittance document type or a closing advice document type. Each project has its own unique, configurable set of fields, so it can be tailored to any document type.
Training
Document
A document forms part of the project's training dataset. When a document is uploaded, Exfil converts it into a collection of text blocks that preserve the text's original location on each page.
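
To make the idea of a text block concrete, here is a minimal sketch of how a block that keeps its page position might be represented. This is not Exfil's actual data model: the `TextBlock` class, its coordinate fields, and the `reading_order` helper are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextBlock:
    """Hypothetical text block: a piece of text plus its position on a page."""
    text: str
    page: int      # 1-based page number
    x: float       # left edge of the block (assumed page units)
    y: float       # top edge of the block (assumed page units)
    width: float
    height: float


def reading_order(blocks):
    """Sort blocks by page, then top to bottom, then left to right."""
    return sorted(blocks, key=lambda b: (b.page, b.y, b.x))


blocks = [
    TextBlock("Total", page=1, x=50.0, y=700.0, width=40.0, height=12.0),
    TextBlock("Invoice", page=1, x=50.0, y=40.0, width=60.0, height=12.0),
    TextBlock("$120.00", page=1, x=400.0, y=700.0, width=55.0, height=12.0),
]

ordered = [b.text for b in reading_order(blocks)]
print(ordered)
```

Because each block carries its own coordinates, the original layout can be reconstructed or sorted in any order without going back to the source file.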
Status
The status represents the state of the document. A newly added document has a status of NEW. Any document marked as DONE will be used in the dataset when a new model is trained. The REVIEW status allows you to manage the labelling process or flag documents for follow-up.
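
The rule that only DONE documents feed into training can be sketched as a simple filter. The `Status` enum and `training_dataset` function below are hypothetical and only model the three statuses named above; Exfil may define its statuses differently.

```python
from enum import Enum


class Status(Enum):
    """The three statuses described above (hypothetical representation)."""
    NEW = "NEW"
    REVIEW = "REVIEW"
    DONE = "DONE"


def training_dataset(documents):
    """Return the names of documents whose status is DONE."""
    return [name for name, status in documents if status is Status.DONE]


# Illustrative file names, not real data.
documents = [
    ("remittance-001.pdf", Status.DONE),
    ("remittance-002.pdf", Status.NEW),
    ("remittance-003.pdf", Status.REVIEW),
    ("remittance-004.pdf", Status.DONE),
]

selected = training_dataset(documents)
print(selected)
```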
Tag
A tag allows you to categorise documents. Tags might be used to identify a document format or flag problem documents.
Field
A field relates to a certain type of data from a document type. For example, it could be an address, a total amount, or an invoice date. A field can either be document level (representative of the entire document) or a column within a table (representative of rows in a table).
Label
A label is a single occurrence of a field within a document. Multiple labels can be assigned for a single field, but each text block can only be labelled once.
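
The relationship between fields, labels, and text blocks can be illustrated with a small sketch: a field may have many labels, but a given text block accepts only one. The `LabelSet` class, its methods, and the integer block ids are assumptions made for this example, not part of Exfil.

```python
class LabelSet:
    """Hypothetical label store: many labels per field, one label per text block."""

    def __init__(self):
        self._field_by_block = {}  # text block id -> field name

    def assign(self, block_id, field):
        """Label a text block with a field; reject blocks that are already labelled."""
        if block_id in self._field_by_block:
            raise ValueError(f"text block {block_id} is already labelled")
        self._field_by_block[block_id] = field

    def labels_for(self, field):
        """Return the ids of all text blocks labelled with the given field."""
        return [b for b, f in self._field_by_block.items() if f == field]


labels = LabelSet()
labels.assign(block_id=1, field="address")
labels.assign(block_id=2, field="address")       # a second label for the same field
labels.assign(block_id=3, field="total_amount")

try:
    labels.assign(block_id=1, field="invoice_date")  # block 1 is already labelled
    rejected = False
except ValueError:
    rejected = True

print(labels.labels_for("address"), rejected)
```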
Train
A model applies all the learnings from the documents in the training dataset to extract data from a previously unseen document. Exfil learns over time, with more data in the training dataset leading to more accurate results in each successive model. A model can be trained after several documents have been labelled and marked as DONE within the Training section.
To help you negotiate the complexities of machine learning training, Unmand manages the entire model training process for you. Depending on the variability within your dataset, roughly 25-50 documents are needed to train an initial model. As a rule of thumb, the more documents, the higher the accuracy.