Overview
This guide shows you how to extract data from paystubs using Butler's OCR APIs in Python. In about 15 minutes, you'll be ready to add Python paystub OCR to your product or workflow!
Before getting started, make sure to do the following:
- Sign up for a free Butler account at https://app.butlerlabs.ai
- Write down your Butler API key from the Settings menu. Follow the Getting Started guide for more details about how to do that.
Get your API ID
Sign in to the Butler product, go to the Library, and search for the Paystub model.
Click on the Paystub card, then press the Try Now button to create a new Paystub model.
Once on the model details page, go to the APIs tab.
Copy the API ID (also known as the Queue ID) and write it down. We'll use it in our code below.
Sample Python Paystub OCR Code
You can copy and paste the following Python sample code to upload a paystub and run OCR on it through the API. The sample expects a file named paystub.pdf in the working directory.
# Install the Butler SDK in your environment first: pip install butler-sdk
from butler import Client
# Get API Key from https://docs.butlerlabs.ai/reference/uploading-documents-to-the-rest-api#get-your-api-key
api_key = '<api-key>'
# Get Queue ID from https://docs.butlerlabs.ai/reference/uploading-documents-to-the-rest-api#go-to-the-model-details-page
queue_id = '<queue_id>'
# Response is a strongly typed object
response = Client(api_key).extract_document(queue_id, 'paystub.pdf')
# Convert to a dictionary for printing
print(response.to_dict())
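If you would rather not hard-code the API key and Queue ID, one option is to read them from environment variables. The following is a minimal sketch, not part of the Butler SDK: the variable names BUTLER_API_KEY and BUTLER_QUEUE_ID and the helper functions are hypothetical names chosen for this example. The only SDK calls used are the ones from the sample above.

```python
import os


def load_butler_credentials(env=os.environ):
    """Return (api_key, queue_id) read from environment variables.

    BUTLER_API_KEY and BUTLER_QUEUE_ID are hypothetical names used only
    in this sketch; the Butler SDK itself does not look for them.
    """
    try:
        return env["BUTLER_API_KEY"], env["BUTLER_QUEUE_ID"]
    except KeyError as missing:
        # Report which variable is absent instead of a bare KeyError
        raise RuntimeError(
            f"Missing environment variable: {missing.args[0]}"
        ) from None


def extract_paystub(path):
    """Upload one file using the same SDK calls as the sample above."""
    # Imported here so the helper above works without the SDK installed
    from butler import Client

    api_key, queue_id = load_butler_credentials()
    response = Client(api_key).extract_document(queue_id, path)
    return response.to_dict()
```

Calling extract_paystub('paystub.pdf') would then behave like the sample above, with the credentials supplied by the environment rather than the source file.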
In-Product Sample Code
You can also copy the sample code directly from the product. This code will have your API ID and API key pre-populated for you!
Extracted Paystub Fields
Here is an example of what a Paystub JSON response looks like:
{
"documentId": "63ac0a7e-5cd6-4cbf-8d4f-05be429fb33f",
"documentStatus": "Completed",
"fileName": "paystub-sample-2017.jpeg",
"mimeType": "image/jpeg",
"documentType": "Paystubs",
"confidenceScore": "High",
"formFields": [
{
"fieldName": "Employee Address",
"value": "123 Franklin St\nCHAPEL HILL, NC 27517",
"confidenceScore": "Low"
},
{
"fieldName": "Employer Address",
"value": "103 South Building, Campus Box 9100\nChapel Hill, NC 27599-9100",
"confidenceScore": "Low"
},
{
"fieldName": "Employer Name",
"value": "The University of North Carolina at Chapel Hill",
"confidenceScore": "Low"
},
{
"fieldName": "Start Date",
"value": "07/10/2017",
"confidenceScore": "High"
},
{
"fieldName": "End Date",
"value": "07/23/2017",
"confidenceScore": "High"
},
{
"fieldName": "Gross Earnings",
"value": "1,627.74",
"confidenceScore": "High"
},
{
"fieldName": "Gross Earnings YTD",
"value": "28,707.21",
"confidenceScore": "High"
},
{
"fieldName": "Net Pay",
"value": "1,040.23",
"confidenceScore": "High"
},
{
"fieldName": "Net Pay YTD",
"value": "18,396.25",
"confidenceScore": "High"
},
{
"fieldName": "Pay Date",
"value": "08/04/2017",
"confidenceScore": "High"
},
{
"fieldName": "Federal Allowance",
"value": "0",
"confidenceScore": "High"
},
{
"fieldName": "Federal Marital Status",
"value": "Single",
"confidenceScore": "High"
},
{
"fieldName": "State Allowance",
"value": "0",
"confidenceScore": "High"
},
{
"fieldName": "State Marital Status",
"value": "Single",
"confidenceScore": "High"
}
],
"tables": [
{
"tableName": "Deductions",
"confidenceScore": "Low",
"rows": [
{
"cells": [
{
"columnName": "Deduction Type",
"value": "TSERS - Retirement",
"confidenceScore": "High"
},
{
"columnName": "Deduction This Period",
"value": "25.00",
"confidenceScore": "High"
},
{
"columnName": "Deduction YTD",
"value": "425.00",
"confidenceScore": "High"
}
]
},
{
"cells": [
{
"columnName": "Deduction Type",
"value": "Critical Illness",
"confidenceScore": "High"
},
{
"columnName": "Deduction This Period",
"value": "32.10",
"confidenceScore": "High"
},
{
"columnName": "Deduction YTD",
"value": "32.00",
"confidenceScore": "High"
}
]
}
]
},
{
"tableName": "Direct Deposits",
"confidenceScore": "Low",
"rows": [
{
"cells": [
{
"columnName": "Amount",
"value": "1,040.23",
"confidenceScore": "High"
},
{
"columnName": "Employee Account Number",
"value": "XXXXX000000",
"confidenceScore": "High"
}
]
}
]
},
{
"tableName": "Earnings",
"confidenceScore": "Low",
"rows": [
{
"cells": [
{
"columnName": "Earning Type",
"value": "Regular",
"confidenceScore": "High"
},
{
"columnName": "Earning Rate",
"value": "20.346846",
"confidenceScore": "High"
},
{
"columnName": "Earning Hours",
"value": "74.50",
"confidenceScore": "High"
},
{
"columnName": "Earning This Period",
"value": "1,515.84",
"confidenceScore": "High"
},
{
"columnName": "Earning YTD",
"value": "17,446.65",
"confidenceScore": "High"
}
]
},
{
"cells": [
{
"columnName": "Earning Type",
"value": "Sick",
"confidenceScore": "High"
},
{
"columnName": "Earning Rate",
"value": "20.346846",
"confidenceScore": "High"
},
{
"columnName": "Earning Hours",
"value": "3.50",
"confidenceScore": "High"
},
{
"columnName": "Earning This Period",
"value": "71.21",
"confidenceScore": "High"
},
{
"columnName": "Earning YTD",
"value": "395.51",
"confidenceScore": "High"
}
]
}
]
},
{
"tableName": "Taxes",
"confidenceScore": "Low",
"rows": [
{
"cells": [
{
"columnName": "Tax Type",
"value": "Fed Withholdng",
"confidenceScore": "High"
},
{
"columnName": "Tax This Period",
"value": "182.98",
"confidenceScore": "High"
},
{
"columnName": "Tax YTD",
"value": "3,319.78",
"confidenceScore": "Low"
}
]
},
{
"cells": [
{
"columnName": "Tax Type",
"value": "Fed MED/EE",
"confidenceScore": "High"
},
{
"columnName": "Tax This Period",
"value": "22.12",
"confidenceScore": "High"
},
{
"columnName": "Tax YTD",
"value": "394.81",
"confidenceScore": "Low"
}
]
}
]
}
]
}
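The nested formFields and tables arrays in the response above are often easier to work with once flattened. The following is a minimal sketch that assumes a response with exactly the key names shown in the JSON above (fieldName, value, confidenceScore, tableName, rows, cells, columnName). The helper names are hypothetical, and the inline sample is a trimmed copy of the response above.

```python
import json
from decimal import Decimal, InvalidOperation


def form_fields_to_dict(result):
    """Map each form field name to its extracted string value."""
    return {f["fieldName"]: f["value"] for f in result.get("formFields", [])}


def table_rows(result, table_name):
    """Return one table's rows as a list of {columnName: value} dicts."""
    for table in result.get("tables", []):
        if table["tableName"] == table_name:
            return [
                {cell["columnName"]: cell["value"] for cell in row["cells"]}
                for row in table["rows"]
            ]
    return []  # no table with that name in this response


def low_confidence_fields(result):
    """List the form field names whose confidenceScore is 'Low'."""
    return [
        f["fieldName"]
        for f in result.get("formFields", [])
        if f["confidenceScore"] == "Low"
    ]


def parse_amount(text):
    """Convert a string such as '1,627.74' to Decimal, or None if not numeric."""
    try:
        return Decimal(text.replace(",", ""))
    except InvalidOperation:
        return None


# Trimmed copy of the response above, used only to exercise the helpers
sample = json.loads("""
{
  "formFields": [
    {"fieldName": "Employer Name",
     "value": "The University of North Carolina at Chapel Hill",
     "confidenceScore": "Low"},
    {"fieldName": "Net Pay", "value": "1,040.23", "confidenceScore": "High"}
  ],
  "tables": [
    {"tableName": "Deductions", "confidenceScore": "Low", "rows": [
      {"cells": [
        {"columnName": "Deduction Type", "value": "TSERS - Retirement",
         "confidenceScore": "High"},
        {"columnName": "Deduction This Period", "value": "25.00",
         "confidenceScore": "High"}
      ]}
    ]}
  ]
}
""")

fields = form_fields_to_dict(sample)
net_pay = parse_amount(fields["Net Pay"])       # Decimal('1040.23')
deductions = table_rows(sample, "Deductions")   # one dict per table row
needs_review = low_confidence_fields(sample)    # ['Employer Name']
```

Amounts arrive as formatted strings, so the sketch converts them with Decimal rather than float to avoid rounding surprises when adding currency values. Fields marked Low are collected separately so they can be routed to a manual review step.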
Full Paystub API Response
The JSON above does not include all of the values that can be extracted from a paystub. For full details, see the Pay Stub page.