
Building a Google Docs Text to Speech (TTS) Agent

· 9 min read
Christopher Brox
Building AI Agents @ Google

How to turn Google Docs into speech.

Disclaimers:

  • At the time of this writing, I am employed by Google Cloud. However, the thoughts expressed here are my own and do not represent my employer.
  • The code provided here is sample code for educational purposes only. Please write your own production code.

Introduction

Marketing professionals are constantly creating content, from website copy and email campaigns to video scripts and social media posts. But what if you could easily convert that written content into audio? Imagine being able to create audio versions of your blog posts for wider accessibility, generate voiceovers for your promotional videos without hiring voice actors, or even proof-listen to your ad copy for tone and impact. This is where a Text-to-Speech (TTS) solution integrated with your content creation workflow can be a game-changer.

Today, we're going to build a Python tool that does exactly that. We'll create a script that securely connects to the Google Docs API, extracts the text from a document, and then uses Google Cloud's Text-to-Speech (TTS) API, specifically the new Chirp 3 HD voices, to generate a high-quality audio file.

What We're Building

The end goal is a Python script that you can run from your terminal, providing it with a Google Doc URL. The script will output a link to an audio file (.wav) of your document's content, ready for you to listen to.

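At a high level, the script reads service account credentials from Secret Manager, fetches the document through the Google Docs API, sends the extracted text to the Text-to-Speech API, and writes the resulting audio file to a Cloud Storage bucket.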

Prerequisites

Before we dive into the code, make sure you have the following set up. This is the most crucial part of the setup process!

Item | Description | Purpose
Google Cloud Account | An active GCP account with billing enabled. | Required for using Secret Manager, Docs API, TTS API, and Cloud Storage.
A Google Doc | A document you want to convert to speech. | The source material for our script.
Python Environment | Python 3.11+ with pip installed. | To run our script and install libraries. You can use Google Cloud Shell Editor if you do not have a local Python environment.
Google Cloud SDK | The gcloud CLI tool installed and authenticated. | To interact with GCP services from your terminal.

You'll also need to install the necessary Python libraries:

pip install google-api-python-client google-auth-httplib2 google-auth-oauthlib google-cloud-secret-manager google-cloud-texttospeech

Step 1: Creating and Securing a Service Account

To allow our script to interact with Google Cloud services on our behalf, we need to create a Service Account. This is like a robot user with specific, limited permissions.

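  1. Enable the APIs: Make sure the APIs for the services listed in the prerequisites are enabled in your Google Cloud project: the Google Docs API, the Secret Manager API, and the Cloud Text-to-Speech API. Cloud Storage must also be available for the output bucket.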

  2. Create the Service Account:

    • In the Google Cloud Console, navigate to IAM & Admin > Service Accounts.
    • Click + CREATE SERVICE ACCOUNT.
    • Give it a name (e.g., doc-to-speech-service) and a description.
    • Click Done.
  3. Create and Download a Key:

    • Find your newly created service account in the list, click the three-dot menu under "Actions", and select Manage keys.
    • Click ADD KEY > Create new key.
    • Choose JSON and click CREATE. A JSON file will be downloaded to your computer. Treat this file like a password! Do not commit it to Git.
  4. Store the Key in Secret Manager:

    • Navigate to Security > Secret Manager in the Cloud Console.
    • Click + CREATE SECRET.
    • Give the secret a name, for example, doc-to-speech-credentials. Copy this name, as we will need it for an environment variable later.
    • Under "Secret value", upload the JSON key file you just downloaded.
    • Click Create secret.
  5. Share your Google Doc: Finally, take the client_email from the downloaded JSON key file and share your Google Doc with that email address, giving it "Viewer" access.
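
For the last step, the address to share the document with can be read straight from the key file. The following is a minimal sketch; the helper name client_email_from_key is hypothetical, and the path you pass in is wherever your browser saved the download.

```python
import json
from pathlib import Path


def client_email_from_key(key_path: str) -> str:
    """Return the client_email field from a service account JSON key file."""
    key = json.loads(Path(key_path).read_text(encoding="utf-8"))
    try:
        return key["client_email"]
    except KeyError:
        # A file without this field is not a service account key.
        raise ValueError(f"{key_path} has no client_email field") from None


# Example usage (the file name is a placeholder):
# print(client_email_from_key("downloaded-key.json"))
```

Once the key is stored in Secret Manager and the document is shared, consider deleting the local copy of the key file.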

Step 2: Building the Google Docs Client (google_doc.py)

Now for the fun part: the code! We'll split our logic into two main classes: GoogleDocsService to handle authentication and API connections, and GoogleDoc to represent the document itself.

GoogleDocsService Class

This class is our gateway to Google's APIs. It fetches our secure credentials from Secret Manager and initializes the Docs API client.

import os
import json
import google.auth
from google.cloud import secretmanager
from googleapiclient.discovery import build
from google.oauth2 import service_account

class GoogleDocsService:
    """Handles authentication and service building for Google Docs API."""
    def __init__(self, **kwargs):
        self.credentials, self.project_id = google.auth.default()
        self._service_account = None
        self._docs = None
        self.scopes = ['https://www.googleapis.com/auth/documents.readonly']

    def get_secret(self, secret_id: str, version_id: str = "latest") -> dict:
        """Gets a secret from Google Cloud Secret Manager."""
        client = secretmanager.SecretManagerServiceClient()
        name = f"projects/{self.project_id}/secrets/{secret_id}/versions/{version_id}"
        response = client.access_secret_version(request={"name": name})
        data = response.payload.data.decode("UTF-8")
        try:
            return json.loads(data)
        except json.JSONDecodeError:
            return data

    @property
    def service_account_creds(self) -> dict:
        """Lazy-loads the service account credentials from Secret Manager."""
        if not self._service_account:
            secret_id = os.getenv('SERVICE_ACCOUNT_SECRET_ID')
            if not secret_id:
                raise ValueError('SERVICE_ACCOUNT_SECRET_ID env var is not set.')
            self._service_account = self.get_secret(secret_id=secret_id)
        return self._service_account

    @property
    def docs(self):
        """Builds and returns an authenticated Google Docs service resource."""
        if not self._docs:
            creds = service_account.Credentials.from_service_account_info(
                self.service_account_creds, scopes=self.scopes
            )
            self._docs = build('docs', 'v1', credentials=creds)
        return self._docs
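
One detail worth seeing in isolation is how get_secret treats the payload: it tries JSON first and falls back to the raw string. The sketch below reproduces that logic without any network call so you can try it locally. The function names parse_secret_payload and require_service_account_info are hypothetical, and the second helper is an extra guard that is not part of the class above.

```python
import json
from typing import Any


def parse_secret_payload(payload: bytes) -> Any:
    """Decode a secret payload, returning parsed JSON when possible."""
    text = payload.decode("utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Not JSON (for example a plain API token): return the raw string.
        return text


def require_service_account_info(secret: Any) -> dict:
    """Fail early if the secret does not look like a service account key."""
    if not isinstance(secret, dict) or "client_email" not in secret:
        raise TypeError("Secret is not a service account JSON key.")
    return secret


key = parse_secret_payload(b'{"client_email": "bot@example.iam.gserviceaccount.com"}')
token = parse_secret_payload(b"plain-text-token")
print(type(key).__name__, type(token).__name__)  # dict str
```

A guard like this matters because from_service_account_info expects a dict. If the secret was stored as something other than the JSON key, failing with a clear message is easier to debug than an error from deep inside the auth library.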

GoogleDoc Class

This class takes our authenticated service and a document ID. Its main job is to fetch the document's content and parse the raw JSON into clean, readable text.

class GoogleDoc:
    """Represents a single Google Document and its content."""
    def __init__(self, google_docs_service: GoogleDocsService, file_id_or_uri: str):
        self.google_docs_service = google_docs_service
        self.file_id_or_uri = file_id_or_uri
        self._document = None

    @property
    def id(self) -> str:
        """Extracts the document ID from a full URI or returns the ID itself."""
        if '/d/' in self.file_id_or_uri:
            return self.file_id_or_uri.split('/d/')[1].split('/')[0]
        return self.file_id_or_uri

    @property
    def document(self) -> dict:
        """Fetches the full document resource from the API."""
        if not self._document:
            self._document = self.google_docs_service.docs.documents().get(documentId=self.id).execute()
        return self._document

    @property
    def text(self) -> str:
        """Parses the document content to extract plain text."""
        doc_content = self.document.get('body', {}).get('content', [])
        doc_text = ""
        for value in doc_content:
            if 'paragraph' in value:
                elements = value.get('paragraph', {}).get('elements', [])
                for elem in elements:
                    if 'textRun' in elem:
                        doc_text += elem.get('textRun', {}).get('content', '')
        return doc_text
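
The id property splits on '/d/' and then on the next slash, which works for URLs that end in /edit. A URL that ends right after the ID with a query string, or input with stray whitespace, would slip through. Here is a slightly more forgiving sketch using a regular expression. The function name extract_doc_id is hypothetical, and the pattern assumes document IDs contain only letters, digits, hyphens, and underscores.

```python
import re

# Assumption: a document ID is a run of letters, digits, "-" and "_" after "/d/".
_DOC_ID_PATTERN = re.compile(r"/d/([A-Za-z0-9_-]+)")


def extract_doc_id(file_id_or_uri: str) -> str:
    """Return the document ID from a Docs URL, or the input itself if it is already an ID."""
    match = _DOC_ID_PATTERN.search(file_id_or_uri)
    if match:
        return match.group(1)
    # No "/d/" segment: treat the (trimmed) input as a bare ID.
    return file_id_or_uri.strip()


print(extract_doc_id("https://docs.google.com/document/d/1AbC_dEf-123/edit?usp=sharing"))
print(extract_doc_id("https://docs.google.com/document/d/1AbC_dEf-123?usp=sharing"))
print(extract_doc_id("  1AbC_dEf-123\n"))
```

All three calls print the same ID, 1AbC_dEf-123, which is a made-up value used only for illustration.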

The text property is where the magic happens. It navigates the nested structure of a Google Doc's JSON representation to find and concatenate all textRun content, effectively stripping out all formatting and giving us the raw text.
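
To make that traversal concrete, here is the same logic as a standalone function run against a small hand-written dictionary. The sample is a heavily trimmed stand-in for a real API response, not actual output, and the function name extract_text is hypothetical.

```python
def extract_text(document: dict) -> str:
    """Concatenate the content of every textRun in top-level paragraphs."""
    parts = []
    for block in document.get("body", {}).get("content", []):
        paragraph = block.get("paragraph")
        if not paragraph:
            # Section breaks, tables, and other non-paragraph blocks are skipped.
            continue
        for element in paragraph.get("elements", []):
            text_run = element.get("textRun")
            if text_run:
                parts.append(text_run.get("content", ""))
    return "".join(parts)


# Trimmed, hand-written example of the nested structure.
sample_document = {
    "body": {
        "content": [
            {"sectionBreak": {}},
            {"paragraph": {"elements": [
                {"textRun": {"content": "Hello, ", "textStyle": {"bold": True}}},
                {"textRun": {"content": "world.\n"}},
            ]}},
            {"table": {"rows": 1, "columns": 1}},
            {"paragraph": {"elements": [
                {"inlineObjectElement": {"inlineObjectId": "img-1"}},
                {"textRun": {"content": "Second paragraph.\n"}},
            ]}},
        ]
    }
}

print(repr(extract_text(sample_document)))  # 'Hello, world.\nSecond paragraph.\n'
```

Note that the bold styling and the inline image are dropped, and the table block is skipped entirely. Text inside tables is therefore not read aloud by this approach; if your documents rely on tables, the traversal would need to descend into them as well.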

Step 3: Adding Text-to-Speech

Now, let's add the final piece to our GoogleDoc class: a method to send the extracted text to the Google Cloud TTS API.

We'll use the SynthesizeLongAudio method, which is perfect for documents as it can handle large amounts of text and conveniently saves the output directly to a Google Cloud Storage bucket.
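
Before calling the API, it can help to check how large the text actually is, since input limits are counted in bytes rather than characters. The sketch below is a rough pre-flight check. The two limits are assumptions based on my reading of the Cloud TTS quotas at the time of writing (about 5,000 bytes for a regular request and about 1,000,000 bytes for long audio), so please verify them against the current documentation. The function name choose_synthesis_method is hypothetical.

```python
# Assumed limits; check the current Cloud TTS quota documentation.
STANDARD_LIMIT_BYTES = 5_000
LONG_AUDIO_LIMIT_BYTES = 1_000_000


def choose_synthesis_method(text: str) -> str:
    """Suggest which client method fits the text, based on its UTF-8 size."""
    size = len(text.encode("utf-8"))
    if size == 0:
        raise ValueError("The document has no text to synthesize.")
    if size <= STANDARD_LIMIT_BYTES:
        return "synthesize_speech"
    if size <= LONG_AUDIO_LIMIT_BYTES:
        return "synthesize_long_audio"
    raise ValueError(f"Text is {size} bytes; split it before synthesizing.")


print(choose_synthesis_method("A short ad tagline."))
print(choose_synthesis_method("A long blog post. " * 1000))
```

Accented and non-Latin characters take more than one byte each in UTF-8, which is why the check encodes the text instead of calling len() on the string directly.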

First, you'll need to create a GCS bucket in your project if you don't have one already.

Let's add the doc_tts method to our GoogleDoc class:

# Add these imports to the top of your file
from uuid import uuid4
from google.cloud import texttospeech

    # Add this method inside the GoogleDoc class
    def doc_tts(self, gcs_bucket_name: str, project_id: str):
        """Converts the document's text to speech and saves it to GCS."""

        # A random suffix keeps repeated runs from overwriting each other
        blob_name = f'chirp-audio/{self.id}_{uuid4().hex}.wav'
        client = texttospeech.TextToSpeechLongAudioSynthesizeClient()

        # The text we extracted earlier
        synthesis_input = texttospeech.SynthesisInput(text=self.text)

        # Voice selection: Using the new high-definition Chirp voices
        voice = texttospeech.VoiceSelectionParams(
            language_code="en-US",
            name="en-US-Chirp3-HD-Charon"  # A great, versatile Chirp voice
        )

        # Audio output configuration
        audio_config = texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.LINEAR16
        )

        # Set the GCS output path
        output_gcs_uri = f'gs://{gcs_bucket_name}/{blob_name}'

        request = texttospeech.SynthesizeLongAudioRequest(
            parent=f"projects/{project_id}/locations/us-central1",  # or your preferred location
            input=synthesis_input,
            voice=voice,
            audio_config=audio_config,
            output_gcs_uri=output_gcs_uri
        )

        print("Synthesizing audio... this may take a moment.")
        operation = client.synthesize_long_audio(request=request)
        # Block until the long-running operation finishes (timeout in seconds)
        operation.result(timeout=600)

        storage_url = f'https://storage.cloud.google.com/{gcs_bucket_name}/{blob_name}'

        print("Synthesis complete!")
        return {
            'status': 'success',
            'detail': 'Audio generated successfully and stored in Cloud Storage.',
            'url': storage_url
        }

A Note on Voices: We're using en-US-Chirp3-HD-Charon, one of Google's newest Chirp HD voices, chosen for its clarity and naturalness. You can explore all available voices, including standard and WaveNet options, in the official Cloud TTS documentation.
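
If you would rather discover voices from code than from the documentation, the TTS client exposes a list_voices call. The sketch below separates the API call from a small filter so the filter can be tried offline. Both function names are hypothetical, the filter assumes Chirp HD voice names follow the same pattern as the voice used above, and the sample names passed to it are illustrative input rather than a real API response.

```python
def chirp_hd_voice_names(voice_names, language_code="en-US"):
    """Keep only names that follow the '<language>-Chirp3-HD-<name>' pattern."""
    prefix = f"{language_code}-Chirp3-HD-"
    return sorted(name for name in voice_names if name.startswith(prefix))


def list_voice_names(language_code="en-US"):
    """Fetch voice names from the API (requires credentials and network access)."""
    # Imported here so the filter above can be used without the library installed.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    response = client.list_voices(language_code=language_code)
    return [voice.name for voice in response.voices]


# Offline demonstration with illustrative names.
sample_names = [
    "en-US-Chirp3-HD-Charon",
    "en-US-Wavenet-D",
    "en-GB-Chirp3-HD-Charon",
    "en-US-Chirp3-HD-Aoede",
]
print(chirp_hd_voice_names(sample_names))
```

With credentials configured, chirp_hd_voice_names(list_voice_names()) would give you the current candidates to try in place of the voice name hard-coded in doc_tts.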

Conclusion: Putting It All Together

We now have all the components. Let's create a main block to run our script.

# Add this to the bottom of your google_doc.py file

if __name__ == '__main__':
    # --- CONFIGURATION ---
    # Set these as environment variables for better security
    os.environ['SERVICE_ACCOUNT_SECRET_ID'] = 'doc-to-speech-credentials'  # The name of your secret
    GCS_BUCKET = 'your-gcs-bucket-name'  # Your GCS bucket name

    # Get the Google Doc ID or URL from the user
    doc_id_or_url = input("Enter the Google Doc URL or ID: ")

    try:
        # 1. Initialize the service
        docs_service = GoogleDocsService()

        # 2. Create the GoogleDoc object
        gdoc = GoogleDoc(
            google_docs_service=docs_service,
            file_id_or_uri=doc_id_or_url
        )

        print(f"Reading text from document ID: {gdoc.id}")
        # print(f"Extracted Text: {gdoc.text[:200]}...")  # Uncomment to preview text

        # 3. Generate the audio
        result = gdoc.doc_tts(
            gcs_bucket_name=GCS_BUCKET,
            project_id=docs_service.project_id
        )

        print("\n--- Success! ---")
        print(f"Listen to your document here: {result['url']}")

    except Exception as e:
        print(f"\nAn error occurred: {e}")

To run your script:

  1. Make sure your GCS bucket is public if you want to share the links, or that you're logged into a Google account with read access to the bucket.
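  2. Set the GOOGLE_APPLICATION_CREDENTIALS environment variable if you're running this outside of a Google Cloud environment. The script uses these default credentials, not the Docs service account, to read the secret and call the TTS API, so that identity needs access to both.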
  3. Run the script: python google_doc.py
  4. Paste your Google Doc URL when prompted, and watch it go!
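
The main block hard-codes its configuration even though its own comment suggests environment variables. A minimal sketch of reading both values from the environment instead is shown below. SERVICE_ACCOUNT_SECRET_ID is the variable the script already uses; treating GCS_BUCKET as an environment variable, along with the Settings and load_settings names, is my own assumption for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    secret_id: str
    gcs_bucket: str


def load_settings(environ=None) -> Settings:
    """Read required settings from the environment, failing with one clear error."""
    env = os.environ if environ is None else environ
    required = ("SERVICE_ACCOUNT_SECRET_ID", "GCS_BUCKET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        secret_id=env["SERVICE_ACCOUNT_SECRET_ID"],
        gcs_bucket=env["GCS_BUCKET"],
    )


# Demonstration with an explicit mapping instead of the real environment.
settings = load_settings({
    "SERVICE_ACCOUNT_SECRET_ID": "doc-to-speech-credentials",
    "GCS_BUCKET": "your-gcs-bucket-name",
})
print(settings)
```

Accepting an optional mapping keeps the function easy to test, and reporting every missing variable at once saves a round of trial and error when setting up a new machine.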

You've just built a pipeline that converts written documents into high-quality audio files. From here, you could expand this into a web application with Flask, create a Cloud Function that triggers on a new document, or add support for more languages and voices. Happy listening!