Overview

This guide will help you extract data from W9s using Butler's OCR APIs in Node.js. In 15 minutes you'll be ready to add Node.js W9 OCR to your product or workflow!

Before getting started, do the following:

  1. Sign up for a free Butler account at https://app.butlerlabs.ai
  2. Write down your Butler API key from the Settings menu. Follow the Getting Started guide for more details about how to do that.

Get your API ID

Sign in to the Butler product, go to the Library, and search for the W9 model.

Click on the W9s card, then press the Try Now button to create a new W9 model.

Once on the model details page, go to the APIs tab.

Copy the API ID (also known as the Queue ID) and write it down. We'll use it in our code below.
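
If you would rather not paste the API key and API ID directly into source files, one option is to read them from environment variables. The following is a minimal sketch, not something Butler prescribes: the helper name loadButlerConfig and the variable names BUTLER_API_KEY and BUTLER_QUEUE_ID are assumptions chosen for illustration.

```javascript
// Hypothetical helper: reads Butler credentials from environment variables.
// The names BUTLER_API_KEY and BUTLER_QUEUE_ID are assumptions made for this
// sketch, not names defined by Butler.
const loadButlerConfig = (env = process.env) => {
  const apiKey = env.BUTLER_API_KEY;
  const queueId = env.BUTLER_QUEUE_ID;

  // Fail early with a clear message instead of sending a request without credentials
  const missing = [];
  if (!apiKey) missing.push('BUTLER_API_KEY');
  if (!queueId) missing.push('BUTLER_QUEUE_ID');
  if (missing.length > 0) {
    throw new Error('Missing environment variables: ' + missing.join(', '));
  }

  // Same shape as the queueId and authHeaders values used in the sample code
  return {
    queueId,
    authHeaders: { Authorization: 'Bearer ' + apiKey },
  };
};
```

The returned queueId and authHeaders could then stand in for the hard-coded values in the sample code.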

Sample Node.js W9 OCR Code

You can copy and paste the following Node.js sample code to process documents with OCR using the API.

// Import necessary libraries
const axios = require('axios');
const fs = require('fs');
const FormData = require('form-data');

// Specify variables for use in script below
const apiBaseUrl = 'https://app.butlerlabs.ai/api';

// Make sure to add the API Key you wrote down above to the auth headers
const apiKey = 'MY_API_KEY';
const authHeaders = {
  'Authorization': 'Bearer ' + apiKey
};

// Use the Queue API Id you grabbed earlier
const queueId = 'MY_QUEUE_ID';

// Specify the path to the file you would like to process
const localFilePaths = ['/path/to/file'];

// Specify the API URL
const uploadUrl = apiBaseUrl + '/queues/' + queueId + '/uploads';

// This async function uploads the files passed to it and returns the id
// needed for fetching results.
// It is used in our main execution function below
const uploadFiles = async (filePaths) => {
  // Prepare file for upload
  const formData = new FormData();
  filePaths.forEach((filePath) => {
    formData.append('files', fs.createReadStream(filePath));
  });

  // Upload files to the upload API
  console.log('Uploading files to Butler for processing');
  const uploadResponse = await axios.post(
    uploadUrl,
    formData,
    {
      headers: {
        ...authHeaders,
        ...formData.getHeaders(),
      }
    })
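    .catch((err) => {
      // Log the failure and rethrow it; swallowing the error here would leave
      // uploadResponse undefined and crash on the next line
      console.log('Upload failed: ' + err.message);
      throw err;
    });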

  // Return the Upload ID
  return uploadResponse.data.uploadId;
}

// This async function polls every 5 seconds for the extraction results using the
// upload id provided and returns the results once ready
const getExtractionResults = async (uploadId) => {
  // URL to fetch the result
  const extractionResultsUrl = apiBaseUrl + '/queues/' + queueId + '/extraction_results';
  const params = { uploadId };

  // Simple helper function for use while polling on results
  const sleep = (waitTimeInMs) => new Promise(resolve => setTimeout(resolve, waitTimeInMs));

  // Make sure to poll every few seconds for results.
  // For smaller documents this will typically take only a few seconds
  let extractionResults = null;
  while (!extractionResults) {
    console.log('Fetching extraction results');
    const resultApiResponse = await axios.get(
      extractionResultsUrl,
      { headers: { ...authHeaders }, params, }
    );

    // Treat a missing first document as not yet completed
    const firstDocument = resultApiResponse.data.items[0];
    const extractionStatus = firstDocument ? firstDocument.documentStatus : undefined;
    // If extraction has not yet completed, sleep for 5 seconds
    if (extractionStatus !== 'Completed') {
      console.log('Extraction still in progress. Sleeping for 5 seconds...');
      await sleep(5 * 1000);
    } else {
      console.log('Extraction results ready');
      return resultApiResponse.data;
    }
  }
}

// Use the main function to run our entire script
const main = async () => {
  // Upload Files
  const uploadId = await uploadFiles(localFilePaths);
  // Get the extraction results
  const extractionResults = await getExtractionResults(uploadId);

  // Print out the extraction results for each document
  extractionResults.items.forEach(documentResult => {
    const fileName = documentResult.fileName;
    console.log('Extraction results from ' + fileName);

    // Print out each field name and extracted value
    console.log('Fields');
    documentResult.formFields.forEach(field => {
      const fieldName = field.fieldName;
      const extractedValue = field.value;

      console.log(fieldName + ' : ' + extractedValue);
    });

    // Print out the results of each table
    console.log('\n\nTables');
    documentResult.tables.forEach(table => {
      console.log('Table name: ' + table.tableName);
      table.rows.forEach((row, idx) => {
        let rowResults = 'Row ' + idx + ': \n';
        row.cells.forEach(cell => {
          // Add each cell's name and extracted value to the row results
          rowResults += cell.columnName + ': ' + cell.value + ' \n';
        });

        console.log(rowResults);
      });
    });
  });
}

main();

Make sure to do the following before running the code:

  1. Replace the value of the queueId variable with your API ID
  2. Replace the value of the apiKey variable with your API Key
  3. Replace the value of the localFilePaths variable with the path to your local file
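
The polling loop in getExtractionResults keeps going until the document reports Completed, so it never gives up on its own. As a hedged sketch of one way to bound it, the helper below stops after a fixed number of attempts. The names pollUntil and firstDocumentCompleted, and the defaults of 5 seconds and 60 attempts, are assumptions for illustration and are not part of Butler's API.

```javascript
// Generic polling helper (hypothetical, not part of Butler's API).
// fetchOnce: async function that returns the latest response data
// isDone:    predicate that decides whether that data is final
const pollUntil = async (fetchOnce, isDone, options = {}) => {
  const { intervalMs = 5000, maxAttempts = 60 } = options;
  // Allow the sleep function to be swapped out, for example in tests
  const sleep = options.sleep ||
    ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await fetchOnce();
    if (isDone(result)) {
      return result;
    }
    // Wait before the next attempt, but not after the last one
    if (attempt < maxAttempts) {
      await sleep(intervalMs);
    }
  }
  throw new Error('Gave up after ' + maxAttempts + ' attempts');
};

// Same check as in getExtractionResults: the first document in the
// response must exist and report documentStatus 'Completed'
const firstDocumentCompleted = (data) =>
  Boolean(data && data.items && data.items[0] &&
    data.items[0].documentStatus === 'Completed');
```

In the sample code, fetchOnce would wrap the axios.get call and return resultApiResponse.data, with firstDocumentCompleted passed as isDone.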

In-Product Sample Code

You can also copy the sample code directly from the product. That code comes with your API ID and API Key pre-populated!

Extracted W9 Fields

Here is an example of what a W9 JSON response looks like:

{
  "documentId": "59b1ea9b-cd96-482e-8fc7-36d18c778aa6",
  "documentStatus": "Completed",
  "fileName": "w9-sample-2.png",
  "mimeType": "image/png",
  "documentType": "W9s",
  "confidenceScore": "High",
  "formFields": [
    {
      "fieldName": "Form Revision Date",
      "value": "( Rev. October 2018)",
      "confidenceScore": "High"
    },
    {
      "fieldName": "Name",
      "value": "Paul Sakhatskyi",
      "confidenceScore": "High"
    },
    {
      "fieldName": "Business Name",
      "value": "Readdle Inc",
      "confidenceScore": "High"
    },
    {
      "fieldName": "Federal Tax Classification",
      "value": "CCorporation",
      "confidenceScore": "High"
    },
    {
      "fieldName": "Federal Tax Classification Other",
      "value": "(",
      "confidenceScore": "Low"
    },
    {
      "fieldName": "Address",
      "value": "795 Folsom St",
      "confidenceScore": "Low"
    },
    {
      "fieldName": "City State Zip",
      "value": "San Francisco, CA 94107",
      "confidenceScore": "High"
    },
    {
      "fieldName": "Has Signature",
      "value": "NO",
      "confidenceScore": "Low"
    },
    {
      "fieldName": "Has Signature Date",
      "value": "NO",
      "confidenceScore": "Low"
    }
  ],
  "tables": []
}
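
Since formFields is an array of name/value pairs, it can be handy to turn it into a plain object and to flag the fields marked with a Low confidenceScore for manual review. The sketch below shows one possible approach; the helper names fieldsToObject and lowConfidenceFields are made up for this example, and the sample object is a trimmed copy of the response above.

```javascript
// Turn the formFields array into a plain object keyed by fieldName
const fieldsToObject = (documentResult) => {
  const fields = {};
  (documentResult.formFields || []).forEach((field) => {
    fields[field.fieldName] = field.value;
  });
  return fields;
};

// List the names of fields whose confidenceScore is 'Low'
const lowConfidenceFields = (documentResult) =>
  (documentResult.formFields || [])
    .filter((field) => field.confidenceScore === 'Low')
    .map((field) => field.fieldName);

// Trimmed copy of the sample W9 response
const sample = {
  formFields: [
    { fieldName: 'Name', value: 'Paul Sakhatskyi', confidenceScore: 'High' },
    { fieldName: 'Address', value: '795 Folsom St', confidenceScore: 'Low' },
    { fieldName: 'Has Signature', value: 'NO', confidenceScore: 'Low' },
  ],
};

console.log(fieldsToObject(sample).Name);  // Paul Sakhatskyi
console.log(lowConfidenceFields(sample));  // [ 'Address', 'Has Signature' ]
```

Note that if two fields ever shared the same fieldName, the later one would overwrite the earlier one in the resulting object.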

W9 API Response Details

For full details about the W9 Model and its API response, see the W9 page.