Securibox: ParseXtract - PX Docs:API

Securibox PX API (0.5.2)

Securibox ParseXtract (PX) API allows you to classify and extract information from PDF documents. You can find out more about PX at https://px.securibox.eu.

Authentication

basic

Security scheme type: HTTP
HTTP authorization scheme: basic

jwt

Security scheme type: API Key
Header parameter name: Authorization

docs

Methods related to document classification and parsing

Cluster the provided documents

Creates a set of classifiers on the fly from the provided documents and uses them to cluster the batch. The output documents have Document.labelId and Document.detailedLabelId filled in according to the classifiers' output. These values can be used to split the batch.

Moreover, the client can provide a serialized instance of the classifiers (Cricket and/or Orion) which will be updated accordingly, then used to classify the documents and finally returned in the response object.

If mode='split', the endpoint accepts an array containing a single document, whose content is split by page.

The API will return an error if the number of documents is more than 100.
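
One way to use the returned label values to split a batch is to group the response documents by their (labelId, detailedLabelId) pair. The sketch below assumes the response has already been decoded into a list of dictionaries; the function name is hypothetical.

```python
from collections import defaultdict


def split_batch_by_label(docs):
    """Group clustered documents by their (labelId, detailedLabelId) pair."""
    groups = defaultdict(list)
    for doc in docs:
        # Documents without a label end up under the key (None, None).
        key = (doc.get("labelId"), doc.get("detailedLabelId"))
        groups[key].append(doc)
    return dict(groups)
```

Each value in the returned dictionary is one sub-batch of documents that the classifiers considered to be of the same kind.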

Authorizations:
Query parameters
mode: string

Special handling parameter. If provided, it must be equal to 'split'.

Request Body schema: application/json

ClusterizationUnit containing the docs to be clustered and (optionally) the content of Cricket and/or Orion classifier instances, serialized as base64 strings

docs: Array of objects (Document)
orionContent: string
cricketContent: string

Responses

200

ClusterizationUnit containing the clustered documents and the content of the Cricket and/or Orion classifier instances, serialized as base64 strings. Any error is listed in the Document.processingErrors property.

400

Bad request. Missing or invalid parameters.

POST /docs/cluster
https://parse.securibox.eu/api/v1/docs/cluster

Request samples

application/json
{
  "docs": [],
  "orionContent": "string",
  "cricketContent": "string"
}

Response samples

application/json
{
  "docs": [],
  "orionContent": "string",
  "cricketContent": "string"
}
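
The following sketch shows how a client might assemble a ClusterizationUnit and enforce the documented limits (at most 100 documents, exactly one document when mode='split') before sending anything. It assumes that Document.content, typed as string <byte>, carries the PDF as base64 text, following the usual OpenAPI meaning of that format. The function name and the document ids are hypothetical.

```python
import base64
import json
from urllib.parse import urlencode

CLUSTER_URL = "https://parse.securibox.eu/api/v1/docs/cluster"
MAX_DOCS = 100  # limit stated in the endpoint description


def build_cluster_request(pdfs, split=False, orion_content=None, cricket_content=None):
    """Return (url, json_body) for a POST to /docs/cluster.

    pdfs maps a caller-chosen document id to the raw PDF bytes.
    """
    if len(pdfs) > MAX_DOCS:
        raise ValueError("the API rejects batches of more than 100 documents")
    if split and len(pdfs) != 1:
        raise ValueError("mode='split' accepts exactly one document")

    unit = {
        "docs": [
            {"id": doc_id, "content": base64.b64encode(data).decode("ascii")}
            for doc_id, data in pdfs.items()
        ]
    }
    # Serialized classifiers are optional; include them only when provided.
    if orion_content is not None:
        unit["orionContent"] = orion_content
    if cricket_content is not None:
        unit["cricketContent"] = cricket_content

    url = CLUSTER_URL
    if split:
        url += "?" + urlencode({"mode": "split"})
    return url, json.dumps(unit)
```

The returned pair can be handed to any HTTP client together with an Authorization header and a Content-Type of application/json.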

Parse the provided documents

If Document.labelId is not provided, the document is returned with Document.extractedData parsed according to the most probable label. Otherwise, the document is parsed according to the specified Document.labelId and Document.detailedLabelId.

If mode='split', the endpoint accepts an array containing a single document, whose content is split by page.

Authorizations:
Query parameters
mode: string

Special handling parameter. If provided, it must be equal to 'split'.

Request Body schema: application/json

Documents to be (classified and) parsed

Array
id: string
content: string <byte>
labelId: string
detailedLabelId: string
customerId: string
extractedData: Array of objects (ExtractedData)
processingErrors: Array of objects (ProcessingError)

Responses

200

Classified and parsed documents. Any error is listed in the Document.processingErrors property.

400

Bad request. Missing or invalid parameters.

POST /docs/parse
https://parse.securibox.eu/api/v1/docs/parse

Request samples

application/json
[
  {}
]

Response samples

application/json
[
  {}
]
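
As an illustration of the two parsing paths, the sketch below builds Document objects with or without a label and wraps them in a urllib.request.Request without sending it. The helper names and the "invoice" label used in the usage note are hypothetical, and content is assumed to be base64 text as implied by the string <byte> type.

```python
import base64
import json
import urllib.request

PARSE_URL = "https://parse.securibox.eu/api/v1/docs/parse"


def make_document(doc_id, pdf_bytes, label_id=None, detailed_label_id=None):
    """Build a Document; omit the label fields to let the API pick a label."""
    doc = {"id": doc_id, "content": base64.b64encode(pdf_bytes).decode("ascii")}
    if label_id is not None:
        doc["labelId"] = label_id
    if detailed_label_id is not None:
        doc["detailedLabelId"] = detailed_label_id
    return doc


def build_parse_request(docs, authorization):
    """Prepare (but do not send) a POST request to /docs/parse."""
    return urllib.request.Request(
        PARSE_URL,
        data=json.dumps(docs).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": authorization,
        },
    )
```

For example, make_document("d1", pdf_bytes) leaves classification to the API, while make_document("d2", pdf_bytes, label_id="invoice") requests parsing under a caller-chosen label. The prepared request can then be passed to urllib.request.urlopen.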

Store the provided documents for manual parsing

Store documents which were wrongly classified and/or parsed for manual classification and parsing. extractedData may be missing.

Authorizations:
Request Body schema: application/json

Documents (classified and parsed)

Array
id: string
content: string <byte>
labelId: string
detailedLabelId: string
customerId: string
extractedData: Array of objects (ExtractedData)
processingErrors: Array of objects (ProcessingError)

Responses

200

Array of documents. Documents with errors are NOT stored. Any error will be listed in the Document.processingErrors property.

400

Bad request. Missing or invalid parameters.

POST /docs/feed
https://parse.securibox.eu/api/v1/docs/feed

Request samples

application/json
[
  {}
]

Response samples

application/json
[
  {}
]
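
Because documents with errors are not stored, a client will usually want to separate the response into stored and rejected documents. The sketch below does that by checking Document.processingErrors. It assumes that a missing, null, or empty processingErrors value means the document had no errors; the function name is hypothetical.

```python
def partition_by_errors(response_docs):
    """Split response documents into (stored, rejected) lists."""
    stored, rejected = [], []
    for doc in response_docs:
        # Assumption: any non-empty processingErrors list marks a rejection.
        if doc.get("processingErrors"):
            rejected.append(doc)
        else:
            stored.append(doc)
    return stored, rejected
```

The rejected list can be logged or resubmitted once the reported errors have been addressed.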

Try to parse the documents based on the little data provided

Creates a parser on the fly from the few Document.extractedData entries that must be present in the input array, and uses it to parse the entire batch.

This method can be used to speed up data entry if the application does not recognize the documents (i.e. the models were not trained on the document type you want to parse). For instance, it can help you provide new training data for the feed or train methods.

Authorizations:
Request Body schema: application/json

Documents to be (classified and) parsed

Array
id: string
content: string <byte>
labelId: string
detailedLabelId: string
customerId: string
extractedData: Array of objects (ExtractedData)
processingErrors: Array of objects (ProcessingError)

Responses

200

Parsed documents. Note that the parsing may be incomplete and need user validation. Any error will be listed in the Document.processingErrors property.

400

Bad request. Missing or invalid parameters.

POST /docs/guess
https://parse.securibox.eu/api/v1/docs/guess

Request samples

application/json
[
  {}
]

Response samples

application/json
[
  {}
]
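
Since this endpoint needs at least some documents that already carry extractedData, a client-side check before sending can catch an unusable batch early. The sketch below is one possible check; the function name is hypothetical, and the ExtractedData objects are treated as opaque because their schema is not shown in this section.

```python
def guess_seed_ids(docs):
    """Return the ids of documents that carry extractedData.

    Raises ValueError when no document in the batch can seed the parser.
    """
    seeds = [doc["id"] for doc in docs if doc.get("extractedData")]
    if not seeds:
        raise ValueError("at least one document must include extractedData")
    return seeds
```

Documents outside the returned list are the ones the on-the-fly parser is expected to fill in, subject to user validation.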

Store the provided documents for model training

Store documents which are properly classified and parsed for further training. To be used to feed new or wrongly processed data back into the application.

Authorizations:
Request Body schema: application/json

Documents (classified and parsed)

Array
id: string
content: string <byte>
labelId: string
detailedLabelId: string
customerId: string
extractedData: Array of objects (ExtractedData)
processingErrors: Array of objects (ProcessingError)

Responses

200

Array of documents. Documents with errors are NOT stored. Any error will be listed in the Document.processingErrors property.

400

Bad request. Missing or invalid parameters.

POST /docs/train
https://parse.securibox.eu/api/v1/docs/train

Request samples

application/json
[
  {}
]

Response samples

application/json
[
  {}
]
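
A client may want to filter a batch before submitting it for training. The sketch below keeps only documents that have a label, some extracted data, and no processing errors. This criterion is an assumption about what "properly classified and parsed" means in practice; it is not defined by this reference, and the function names are hypothetical.

```python
def ready_for_training(doc):
    """Heuristic check: labeled, parsed, and free of processing errors."""
    return (
        bool(doc.get("labelId"))
        and bool(doc.get("extractedData"))
        and not doc.get("processingErrors")
    )


def select_training_docs(docs):
    """Keep only the documents that pass the heuristic check."""
    return [doc for doc in docs if ready_for_training(doc)]
```

Documents that fail the check may be better suited to the feed method, which stores them for manual classification and parsing.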

cycle

Start the training cycle

Starts the training cycle if no cycle is running and at least one new document has been sent to /docs/train. The response body is always empty.

Authorizations:

Responses

201

Cycle started.

204

Cycle already running.

304

No new documents found.

POST /cycle/start
https://parse.securibox.eu/api/v1/cycle/start
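
Because the response body is always empty, the outcome of this call is carried entirely by the status code. A minimal sketch of mapping the three documented codes to outcomes follows; the outcome names and the function name are hypothetical.

```python
# Status codes documented for POST /cycle/start.
CYCLE_START_OUTCOMES = {
    201: "started",
    204: "already_running",
    304: "no_new_documents",
}


def interpret_cycle_start(status_code: int) -> str:
    """Translate a /cycle/start status code into a readable outcome."""
    try:
        return CYCLE_START_OUTCOMES[status_code]
    except KeyError:
        raise ValueError(f"unexpected status code: {status_code}") from None
```

Note that with urllib.request, statuses outside the 2xx range such as 304 surface as urllib.error.HTTPError, whose code attribute holds the status to pass to this function.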