Overview

This guide shows you how to extract data from W2s using Butler's OCR APIs in Python. In about 15 minutes you'll be ready to add Python W2 OCR to your product or workflow!

Before getting started, you'll want to make sure to do the following:

  1. Sign up for a free Butler account at https://app.butlerlabs.ai
  2. Write down your Butler API key from the Settings menu. Follow the Getting Started guide for more details about how to do that.

Get your API ID

Sign in to the Butler product, go to the Library, and search for the W2 model.

Click on the W2 card, then press the Try Now button to create a new W2 model.

Once on the model details page, go to the APIs tab.

Copy the API ID (also known as the Queue ID) and write it down. We'll use it in our code below.

Sample Python W2 OCR Code

You can copy and paste the following Python sample code to process documents with OCR using the API.

# Install the Butler SDK in your environment first: pip install butler-sdk
from butler import Client

# Get API Key from https://docs.butlerlabs.ai/reference/uploading-documents-to-the-rest-api#get-your-api-key
api_key = '<api-key>'
# Get Queue ID from https://docs.butlerlabs.ai/reference/uploading-documents-to-the-rest-api#go-to-the-model-details-page
queue_id = '<queue_id>'

# Response is a strongly typed object
response = Client(api_key).extract_document(queue_id, 'sample_w2.pdf')
# Convert to a dictionary for printing
print(response.to_dict())
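
If you would rather not hard-code the API key and Queue ID in your script, one option is to read them from environment variables. The following is a minimal sketch, not part of the Butler SDK: the helper name load_butler_credentials and the variable names BUTLER_API_KEY and BUTLER_QUEUE_ID are assumptions chosen for illustration.

```python
import os


def load_butler_credentials(env=os.environ):
    """Return (api_key, queue_id) read from environment variables.

    BUTLER_API_KEY and BUTLER_QUEUE_ID are hypothetical names picked for
    this example; use whatever naming fits your deployment.
    """
    api_key = env.get("BUTLER_API_KEY", "")
    queue_id = env.get("BUTLER_QUEUE_ID", "")

    # Collect the names of any variables that are unset or empty.
    required = (("BUTLER_API_KEY", api_key), ("BUTLER_QUEUE_ID", queue_id))
    missing = [name for name, value in required if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    return api_key, queue_id
```

With both variables set, you could call api_key, queue_id = load_butler_credentials() and pass the two values to Client and extract_document exactly as in the sample above.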

📘

In-Product Sample Code

You can also copy the sample code directly from the product. That code comes with your API ID and API key pre-populated!

Extracted W2 Fields

Here is an example of what a W2 JSON response looks like:

{
  "documentId": "f3305614-9352-4a65-a49c-6e11dd534b22",
  "documentStatus": "Completed",
  "fileName": "W2-sample-1.png",
  "mimeType": "image/png",
  "documentType": "W2s",
  "confidenceScore": "High",
  "formFields": [
    {
      "fieldName": "Form Year",
      "value": "2014",
      "confidenceScore": "High"
    },
    {
      "fieldName": "SSN",
      "value": "123-45-6789",
      "confidenceScore": "High"
    },
    {
      "fieldName": "EIN",
      "value": "11-2233445",
      "confidenceScore": "Low"
    },
    {
      "fieldName": "Control Number",
      "value": "A1B2",
      "confidenceScore": "High"
    },
    {
      "fieldName": "Employee Name",
      "value": "Jane A DOE",
      "confidenceScore": "Low"
    },
    {
      "fieldName": "Employee Address",
      "value": "123 Elm Street\nAnywhere Else, PA 23456",
      "confidenceScore": "Low"
    },
    {
      "fieldName": "Wages Tips and Other Compensation",
      "value": "48,500.00",
      "confidenceScore": "High"
    },
    {
      "fieldName": "Federal Income Tax Withheld",
      "value": "6,835.00",
      "confidenceScore": "High"
    },
    {
      "fieldName": "Social Security Wages",
      "value": "50,000.00",
      "confidenceScore": "High"
    },
    {
      "fieldName": "Social Security Tax Withheld",
      "value": "3,100.00",
      "confidenceScore": "High"
    },
    {
      "fieldName": "Medicare Wages And Tips",
      "value": "50,000.00",
      "confidenceScore": "High"
    },
    {
      "fieldName": "Medicare Tax Withheld",
      "value": "725.00",
      "confidenceScore": "High"
    },
    {
      "fieldName": "State Line 1",
      "value": "PA",
      "confidenceScore": "Low"
    },
    {
      "fieldName": "State Wages Tips Etc Line 1",
      "value": "50,000",
      "confidenceScore": "High"
    },
    {
      "fieldName": "State Income Tax Line 1",
      "value": "1,535",
      "confidenceScore": "Low"
    },
    {
      "fieldName": "Local Wages Tips Etc Line 1",
      "value": "1,535",
      "confidenceScore": "Low"
    },
    {
      "fieldName": "Local Income Tax Line 1",
      "value": "750",
      "confidenceScore": "Low"
    },
    {
      "fieldName": "Locality Name Line 1",
      "value": "MU",
      "confidenceScore": "High"
    }
  ],
  "tables": []
}
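
Once you have the response as a dictionary, a common next step is to look fields up by name, flag low-confidence values for manual review, and turn amounts into numbers. The sketch below works on a trimmed copy of the sample response above. The helper names are hypothetical, and it assumes that any confidenceScore other than "High" should be reviewed and that amounts use commas as thousands separators, as in the sample.

```python
from decimal import Decimal

# Trimmed copy of the sample response above (only a few fields kept).
sample_response = {
    "documentId": "f3305614-9352-4a65-a49c-6e11dd534b22",
    "documentStatus": "Completed",
    "formFields": [
        {"fieldName": "Form Year", "value": "2014", "confidenceScore": "High"},
        {"fieldName": "EIN", "value": "11-2233445", "confidenceScore": "Low"},
        {"fieldName": "Wages Tips and Other Compensation",
         "value": "48,500.00", "confidenceScore": "High"},
        {"fieldName": "State Income Tax Line 1",
         "value": "1,535", "confidenceScore": "Low"},
    ],
    "tables": [],
}


def fields_by_name(response):
    """Map each fieldName to its full formFields entry."""
    return {field["fieldName"]: field for field in response.get("formFields", [])}


def low_confidence_fields(response):
    """Return the names of fields whose confidenceScore is not "High"."""
    return [
        field["fieldName"]
        for field in response.get("formFields", [])
        if field.get("confidenceScore") != "High"
    ]


def parse_amount(value):
    """Convert a string such as "48,500.00" to a Decimal."""
    return Decimal(value.replace(",", ""))


fields = fields_by_name(sample_response)
wages = parse_amount(fields["Wages Tips and Other Compensation"]["value"])
print(wages)                                   # 48500.00
print(low_confidence_fields(sample_response))  # ['EIN', 'State Income Tax Line 1']
```

Decimal is used instead of float so that dollar amounts are not subject to binary rounding error.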

📘

W2 API Response Details

For full details about the W2 Model and its API response, see the W2 page.