Securibox: ParseXtract - PX

Automated data extraction

PX works with image and PDF files and lets you extract structured data from semi-structured documents.

Securibox PX
How it works

Structured data

Simplify your data workflows and make them more productive with machine learning technology.

Diagram: User (data owner) → Your App (process or solution) → Doc (image-based or PDF) → PX (Securibox ParseXtract) → JSON (structured extracted data) → Your App

Classification

Identify and group files into homogeneous collections

Extraction

Extract, validate and format document data

In real time

Automatically extract document data through real-time processing.

Easy integration

Smoothly integrate into your process with just 3 lines of code.

Improve quality

Avoid the errors that come with manual processing.

Reduce costs

Eliminate the costs of manual data entry.

Models

Pre-trained

Invoices, bank statements and payslips have already been modeled, so PX can be integrated quickly and easily to extract data from your documents.

Accounts Payable

Extract data from invoices so that it can be imported into accounting software.

Credit Scoring

Gather data from financial documents to assess credit ratings.

Cashback

Extract data from receipts to evaluate the amounts that can be redeemed.

Output
{
    "detailedLabelId": "3f18d4a6bb6979ea3e9f7bce6ac61abc",
    "extractedData": [
        {
            "name": "Invoice.Type.Identifier",
            "value": "Invoice"
        },
        {
            "name": "Invoice.Date",
            "value": "18/09/2019"
        },
        {
            "name": "Invoice.Number.Identifier",
            "value": "2234567"
        },
        {
            "name": "Supplier.Name.Literal",
            "value": "Ma Société SARL"
        },
        {
            "name": "Supplier.National.Identifier",
            "value": "000000000000"
        },
        {
            "name": "Supplier.Siret.Identifier",
            "value": "554 874 445"
        },
        {
            "name": "Supplier.Vatnumber.Identifier",
            "value": "FR 000000000000"
        },
        {
            "name": "Invoice.Currency",
            "value": "EUR"
        },
        {
            "name": "Invoice.TotalAmount.WithoutTaxes.Amount",
            "value": "276,00"
        },
        {
            "name": "Invoice.VATTotal.Amount",
            "value": "55,20"
        },
        {
            "name": "Invoice.TotalAmount.WithTaxes.Amount",
            "value": "331,20"
        },
        {
            "name": "Customer.Contact.Name.Literal",
            "value": "Pénélope D. Seguin"
        },
        {
            "name": "Customer.VATNumber.Identifier",
            "value": ""
        },
        {
            "name": "Customer.Address.Line1",
            "value": "51 rue Nationale"
        },
        {
            "name": "Customer.Address.ZipCode",
            "value": "75003"
        },
        {
            "name": "Customer.Address.City",
            "value": "Paris"
        }
    ],
    "id": "DemoTrial_20100831_Armstrong_Neil_0014.pdf",
    "labelId": "FactureMaSociete"
}
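
An application that consumes this output will usually want the `extractedData` list as a lookup table. The sketch below shows one way to do that in Python, assuming the response is already available as a JSON string. The helper names `fields_by_name` and `parse_amount` are illustrative and not part of PX, and the decimal-comma handling is an assumption based on the sample values.

```python
import json
from decimal import Decimal

# Trimmed copy of the sample invoice output (only the fields used here).
SAMPLE = """
{
    "extractedData": [
        {"name": "Invoice.Currency", "value": "EUR"},
        {"name": "Invoice.TotalAmount.WithoutTaxes.Amount", "value": "276,00"},
        {"name": "Invoice.VATTotal.Amount", "value": "55,20"},
        {"name": "Invoice.TotalAmount.WithTaxes.Amount", "value": "331,20"}
    ],
    "id": "DemoTrial_20100831_Armstrong_Neil_0014.pdf",
    "labelId": "FactureMaSociete"
}
"""


def fields_by_name(document: dict) -> dict:
    """Turn the list of {"name", "value"} pairs into a name -> value dict."""
    return {item["name"]: item["value"] for item in document["extractedData"]}


def parse_amount(text: str) -> Decimal:
    """Parse an amount written with a decimal comma, e.g. "276,00"."""
    return Decimal(text.replace(" ", "").replace(",", "."))


document = json.loads(SAMPLE)
fields = fields_by_name(document)

net = parse_amount(fields["Invoice.TotalAmount.WithoutTaxes.Amount"])
vat = parse_amount(fields["Invoice.VATTotal.Amount"])
gross = parse_amount(fields["Invoice.TotalAmount.WithTaxes.Amount"])

# Simple consistency check on the extracted totals.
totals_match = (net + vat == gross)
print(fields["Invoice.Currency"], gross, "totals match:", totals_match)
```

Using `Decimal` rather than `float` keeps the totals check exact, which matters when the values are later imported into accounting software.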
                            
Output
{
    "detailedLabelId": "3f18d4a6bb6979ea3e9f7bce6ac61abc",
    "extractedData": [
        {
            "name": "Employee.Identifier",
            "value": "078904"
        },
        {
            "name": "Employee.Full.Name",
            "value": "Pénélope D Séguin"
        },
        {
            "name": "Employee.SocialSecurityNumber",
            "value": "2651132254647 79"
        },
        {
            "name": "Employee.Address.Line1",
            "value": "51 rue Nationale"
        },
        {
            "name": "Employee.Address.ZipCode",
            "value": "75003"
        },
        {
            "name": "Employee.Address.City.Name",
            "value": "Paris"
        },
        {
            "name": "Payslip.StartDate",
            "value": "01/08/2019"
        },
        {
            "name": "Payslip.EndDate",
            "value": "31/08/2019"
        },
        {
            "name": "Company.Name",
            "value": "Ma Soci\u00e9t\u00e9 SARL"
        },
        {
            "name": "Company.SIRET.Identifier",
            "value": "55487445"
        }
    ],
    "id": "Bulletin_de_paie.pdf",
    "labelId": "MaSociete_label"
}
                            

Tech

Approach

We do it our own way, building on our family of unsupervised classifiers, our document query language and our query generator engine.

Divide et impera

Using several uncorrelated unsupervised classifiers allows us to group similar documents together.

For instance, we are able to recognize a trademark in the document's header or a recurrent paragraph in the footer. Once the documents are grouped into the correct homogeneous collections, finding the right extraction rules is easier.
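
To make the grouping idea concrete, here is a minimal Python sketch that clusters documents by the words found in their headers. It is not one of Securibox's classifiers: it uses a single toy similarity measure (Jaccard overlap of header tokens) with a greedy grouping pass, and the function names, the threshold and the sample headers are all assumptions made for illustration.

```python
def tokens(text: str) -> frozenset:
    """Lower-cased set of whitespace-separated words."""
    return frozenset(text.lower().split())


def jaccard(a: frozenset, b: frozenset) -> float:
    """Share of words the two sets have in common (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_by_header(headers: dict, threshold: float = 0.5) -> list:
    """Greedily put each document in the first group whose representative
    header is similar enough; otherwise start a new group."""
    groups = []  # list of (representative token set, [document ids])
    for doc_id, header in headers.items():
        toks = tokens(header)
        for representative, members in groups:
            if jaccard(representative, toks) >= threshold:
                members.append(doc_id)
                break
        else:
            groups.append((toks, [doc_id]))
    return [members for _, members in groups]


# Hypothetical header text taken from three documents.
headers = {
    "a.pdf": "Ma Société SARL Facture",
    "b.pdf": "Ma Société SARL Facture Avoir",
    "c.pdf": "Banque Exemple Relevé de compte",
}
groups = group_by_header(headers)
print(groups)
```

No labels are supplied anywhere, which is what makes the approach unsupervised: the groups emerge from the similarity between documents alone.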

Whitebox

We have developed our own query language (PQL) that allows us to navigate the layout structure of the document, jump to a specific point and use regex selectors.

Machine learning techniques are used to automatically generate the extraction rules. As these queries are human-readable, we can always correct or improve them in case of overfitting or other issues.
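
The benefit of human-readable rules can be illustrated with a small Python sketch. This is not PQL, whose syntax is not shown on this page: the `Rule` structure, the `apply_rule` helper and the sample page text are hypothetical and only mimic the idea of jumping to an anchor and then applying a regex selector.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    field: str    # name of the output field
    anchor: str   # literal text to jump to in the document
    pattern: str  # regex applied to the text right after the anchor


def apply_rule(rule: Rule, text: str) -> Optional[str]:
    """Jump to the anchor, then return the first regex match after it."""
    start = text.find(rule.anchor)
    if start == -1:
        return None  # anchor not present in this document
    match = re.search(rule.pattern, text[start + len(rule.anchor):])
    return match.group(0) if match else None


# Hypothetical plain-text rendering of an invoice page.
page = "Ma Société SARL\nInvoice no. 2234567\nDate: 18/09/2019\n"

# Each rule is plain data, so a person can read it and edit it by hand.
rules = [
    Rule("Invoice.Number.Identifier", "Invoice no.", r"\d+"),
    Rule("Invoice.Date", "Date:", r"\d{2}/\d{2}/\d{4}"),
]

extracted = {rule.field: apply_rule(rule, page) for rule in rules}
print(extracted)
```

Because a rule is just an anchor and a pattern, a generated rule that is too specific can be loosened by editing one string rather than retraining a model.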


Get in touch.

Looking for more information? We’re always available*.

*This form lets you contact Securibox with any general question. You can access and obtain a copy of the data concerning you, object to the processing of that data, and have it rectified or erased or have its processing restricted.
Data sent through this form may be transferred outside Europe, in compliance with the GDPR.