Getting Started with DocuPanda 🚀

Your first steps in DocuPanda

Updated over 6 months ago

Overview

DocuPanda is a tool that lets you process complex documents into consistent structured outputs. Even if the document has complex tables, checkmarks, and handwriting, DocuPanda will comprehend all of its content and map it to a consistent, structured output.

If you find videos more enjoyable than text, you can follow along with this video instead of reading through the guide. It includes everything covered in this guide and takes 10 minutes.

Using DocuPanda, you can define exactly what properties you want to extract from PDFs, scans, and images. Once defined, you can extract these properties even from documents with highly variable layout and structure. DocuPanda will even work across different document languages without any setup required!

The task can be as simple as extracting the monthly rent amount from a lease, or as complex as extracting itemized and classified lines from financial statements with deeply nested tables.

Example: Rental Leases

The rest of this guide will follow along one specific use case where we extract information from a rental lease.

Keep in mind that rental leases are just an example. DocuPanda can handle any document content and layout, even if the documents are completely unique to your business, such as a specific form or report type.

Step 1: Upload a Few Rental Leases

DocuPanda works with as little as one document. You usually want to build your schema with 5-10 documents, but it's actually a good idea to first experiment with 1-2 documents for quick results.

To upload a bunch of documents, all you need to do is go to your dashboard and hit the upload button at the top right, like so:

Then, an upload modal will show up. Hit Upload file, and submit:


Once the upload is complete, the documents will appear in the documents tab.

Step 2: Create a Schema

This step is crucial. Good schemas are the heart of DocuPanda. A schema is basically a template, or a plan for how we want to understand our documents. All you need to build a schema is a few example documents (1-20) and a plain-language explanation of what you want to extract from them.

Select the documents you want to use to build a schema

Go to the documents tab on your dashboard, and select your rental leases. Then hit the Create Schema Button.

Tip

The documents that you use together to build a schema need to contain the same information that you want to extract.

They do not need to have the same layout, structure, or even be written in the same language!

So for example, rental contracts from Japan and the USA can easily be understood using the same schema. A rental contract and an invoice, on the other hand, should probably be understood using different schemas.

Explain what you need from the rental leases

DocuPanda lets you explain exactly what fields you want to extract from your lease. Let's say we want the following:

  1. The monthly rental amount, broken down into a number and currency symbol like "USD" or "EUR"

  2. The move-in and move-out dates

  3. The deposit, if any, also broken into amount and currency

  4. Whether pets are allowed under the lease. If not specified, assume that pets are allowed

All you need to do is type out the list above as instructions to the schema creation process, and you're done. That's literally all it takes to communicate with DocuPanda: plain, human language that explains what you want.

Hit next, then "submit".

Once you launch the schema creation job, you will see a "loading" screen for a minute or so. You can either stay on this screen and wait for schema creation to complete (the page will indicate when the schema is done), or continue to navigate around your dashboard and accomplish other tasks.

You can go to the jobs tab to check in on your schema creation progress. Refresh it a couple of times, and it should be complete in about a minute.

Step 3: Inspect the Results

Once the job is done, you will get two things:

  1. A new schema. The schema will include items like {"petsAllowed": boolean, "moveInDate": date}. The schema is a general template for how we'll understand rental lease documents going forward.

  2. A standardization for each document that participated in the schema creation. That's the good stuff. A standardization contains the actual answers for a given document. If you're technically minded, a standardization is simply a JSON object that obeys a predefined schema.
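
For technically minded readers, the relationship between a schema and a standardization can be sketched in Python. This is an illustration only, not DocuPanda's actual schema format or API: apart from petsAllowed and moveInDate, the field names (monthlyRent, moveOutDate, deposit), the way types are written, and the conforms helper are all assumptions made for this example.

```python
from datetime import date

# Hypothetical schema: maps each field name to the type we expect.
# Only "petsAllowed" and "moveInDate" appear in the guide; the rest are assumed.
RENTAL_SCHEMA = {
    "monthlyRent": {"amount": float, "currency": str},
    "moveInDate": date,
    "moveOutDate": date,
    "deposit": {"amount": float, "currency": str},
    "petsAllowed": bool,
}


def conforms(value, expected):
    """Return True if value matches the expected type or nested field map."""
    if isinstance(expected, dict):
        return (
            isinstance(value, dict)
            and value.keys() == expected.keys()
            and all(conforms(value[key], expected[key]) for key in expected)
        )
    return isinstance(value, expected)


# A made-up standardization for one lease: the "answers" for that document.
standardization = {
    "monthlyRent": {"amount": 1850.0, "currency": "USD"},
    "moveInDate": date(2024, 3, 1),
    "moveOutDate": date(2025, 2, 28),
    "deposit": {"amount": 1850.0, "currency": "USD"},
    "petsAllowed": True,
}

is_valid = conforms(standardization, RENTAL_SCHEMA)
print(is_valid)
```

The schema describes the shape once, and every standardization fills that shape with values from one document.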

It's generally much easier to understand a schema by inspecting the standardizations it produces. So let's take a look at the standardization results. Go to the standardizations tab and click on any of the standardized results to see what they look like.

Viewing the results

Click on any row to view it:

Downloading the results

Check all the items you want to download, then download the results. We recommend JSON format because it keeps the results more organized, but if you want you can also download them as a CSV (Excel) file. Just be aware that each document corresponds to a single row, so if you extract a list of values from a single document, the CSV format quickly becomes very unwieldy.
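
To see why lists get unwieldy in a one-row-per-document layout, here is a rough Python sketch that flattens a nested result into a single CSV row. It does not reproduce DocuPanda's own CSV export; the flatten helper, the column naming scheme, and the sample field names (including fees) are assumptions chosen for illustration.

```python
import csv
import io


def flatten(value, prefix=""):
    """Flatten nested dicts and lists into a single {column_name: value} row."""
    row = {}
    if isinstance(value, dict):
        for key, item in value.items():
            row.update(flatten(item, f"{prefix}.{key}" if prefix else key))
    elif isinstance(value, list):
        # Every list element needs its own set of columns.
        for index, item in enumerate(value):
            row.update(flatten(item, f"{prefix}[{index}]"))
    else:
        row[prefix] = value
    return row


# Made-up standardization containing a list (hypothetical field names).
standardization = {
    "monthlyRent": {"amount": 1850.0, "currency": "USD"},
    "fees": [
        {"name": "parking", "amount": 75.0},
        {"name": "storage", "amount": 40.0},
    ],
}

row = flatten(standardization)

# Write the single row to an in-memory CSV so the header can be inspected.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
print(csv_text)
```

Two fees already produce four extra columns, and a document with a different number of fees would need a different header, which is why JSON is the more comfortable format for list-heavy results.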

Step 4: Apply the schema to more documents

Once we've made a schema that we're happy with, and it extracts everything we want, we can apply the schema to millions of documents, and get standardizations that conform to the same structure every time.

All we need to do to generate more standardizations is go to the documents tab, upload more documents, select them, and hit the standardize button.


Then a new window will pop up where we select which schema we want to run. We only have one schema in this example, so we can apply the rental schema to more documents.

This way we can process millions of documents using our rental schema that we generated once and for all.
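
One practical benefit of every standardization sharing the same structure is that results can be combined with very little code. The Python sketch below assumes the results have been downloaded as JSON and loaded into dictionaries; the monthlyRent field layout and the total_rent_by_currency helper are hypothetical names used only for this example.

```python
from collections import defaultdict


def total_rent_by_currency(standardizations):
    """Sum monthly rent per currency across a batch of standardizations."""
    totals = defaultdict(float)
    for item in standardizations:
        # Safe to index directly because every item follows the same schema.
        rent = item["monthlyRent"]
        totals[rent["currency"]] += rent["amount"]
    return dict(totals)


# Made-up batch of results, all produced with the same rental schema.
batch = [
    {"monthlyRent": {"amount": 1850.0, "currency": "USD"}},
    {"monthlyRent": {"amount": 1200.0, "currency": "EUR"}},
    {"monthlyRent": {"amount": 2100.0, "currency": "USD"}},
]

totals = total_rent_by_currency(batch)
print(totals)
```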

Step 5: Using the Results

Most of DocuPanda's users are businesses that need the standardization results to go someplace else. A result typically goes into a database, Google Spreadsheet, accounting software, etc. There are three tools at your disposal:

  1. Use our API and webhooks to accomplish anything you want with code

  2. Use our make.com integration to build a low-code integration with thousands of other destinations, including accounting software, CRMs, Google Spreadsheets, etc. Check out our make.com video guide.

  3. Reach out to us over customer support, and we'll work together to make sure your document insights are making an impact in hours, not months.
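
As a rough idea of what "going someplace else" can look like in code, the Python sketch below loads a few results into a SQLite database using only the standard library. It makes no calls to the DocuPanda API; the records stand in for a JSON download from the dashboard, and the table layout, the field names, and the save_standardizations helper are assumptions for illustration.

```python
import json
import sqlite3


def save_standardizations(conn, records):
    """Insert (document name, standardization dict) pairs into a table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS leases ("
        "document TEXT PRIMARY KEY, monthly_rent REAL, currency TEXT, "
        "pets_allowed INTEGER, raw_json TEXT)"
    )
    for document, data in records:
        rent = data.get("monthlyRent") or {}
        pets = data.get("petsAllowed")
        if pets is None:
            # Mirror the guide's instruction: assume pets are allowed if unspecified.
            pets = True
        conn.execute(
            "INSERT OR REPLACE INTO leases VALUES (?, ?, ?, ?, ?)",
            (document, rent.get("amount"), rent.get("currency"),
             int(pets), json.dumps(data)),
        )
    conn.commit()


# Made-up results standing in for a JSON download (hypothetical field names).
records = [
    ("lease_a.pdf", {"monthlyRent": {"amount": 1850.0, "currency": "USD"},
                     "petsAllowed": False}),
    ("lease_b.pdf", {"monthlyRent": {"amount": 1200.0, "currency": "EUR"}}),
]

conn = sqlite3.connect(":memory:")  # swap for a file path to persist the data
save_standardizations(conn, records)
rows = conn.execute(
    "SELECT document, monthly_rent, pets_allowed FROM leases ORDER BY document"
).fetchall()
print(rows)
```

Keeping the full JSON in a raw_json column alongside a few typed columns is one possible design: the typed columns are easy to query, while the raw copy preserves nested values that do not fit a flat table.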
