
Building Apps using Gemini 1.5 Pro's Massive Context Length

Using Gemini 1.5 Pro's multi-million-token context window from Node.js applications.

The Context Window Revolution

Gemini 1.5 Pro redefined what’s possible with large language models by offering a context window of up to 2 million tokens. This means you can pass entire codebases, hours of video, or thousands of pages of documents in a single request, instead of splitting the material up and stitching partial answers back together.

Understanding Gemini 1.5 Pro’s Capabilities

Feature             Capability
Context window      Up to 2M tokens (1M standard)
Input modalities    Text, image, audio, video, code
Output              Text, code, structured data
Max output tokens   8,192
Languages           100+ languages
Pricing (input)     $1.25–$10.00 per 1M tokens
Pricing (output)    $10.00–$40.00 per 1M tokens

Multimodal Input Handling

Gemini 1.5 Pro natively processes multiple modalities in a single request. You can combine text, images, audio, and video seamlessly:

import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-pro" });

// Multimodal request with text, image, and audio
const result = await model.generateContent([
  "Analyze this presentation recording and provide feedback:",
  { inlineData: { mimeType: "image/png", data: slideImageBase64 } },
  { inlineData: { mimeType: "audio/mpeg", data: narrationBase64 } },
]);
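The base64 strings in the request above have to come from somewhere. The following is a minimal sketch of one way to produce them from local files. The helper name `fileToPart` and the `MIME_TYPES` lookup table are assumptions made for illustration, not part of the SDK.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical extension-to-MIME lookup; extend it for the formats you use.
const MIME_TYPES = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".mp3": "audio/mpeg",
  ".wav": "audio/wav",
  ".mp4": "video/mp4",
  ".pdf": "application/pdf",
};

/**
 * Read a local file and wrap it as an inlineData part for generateContent.
 * Throws if the file extension is not listed in MIME_TYPES.
 */
function fileToPart(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  const mimeType = MIME_TYPES[ext];
  if (!mimeType) {
    throw new Error(`Unsupported file extension: ${ext}`);
  }
  const data = fs.readFileSync(filePath).toString("base64");
  return { inlineData: { mimeType, data } };
}

// Usage (the file names are placeholders):
// const result = await model.generateContent([
//   "Analyze this presentation recording and provide feedback:",
//   fileToPart("slide.png"),
//   fileToPart("narration.mp3"),
// ]);
```

Inline data travels inside the request body, so this approach suits small files. Long recordings are usually better uploaded separately and referenced from the request.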

Node.js SDK Setup

The official Google AI SDK for JavaScript simplifies integration:

npm install @google/generative-ai

const { GoogleGenerativeAI } = require("@google/generative-ai");

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({
  model: "gemini-1.5-pro",
  systemInstruction: "You are an expert code reviewer.",
});

async function reviewCode(code) {
  const result = await model.generateContent(code);
  return result.response.text();
}

Streaming Responses

For real-time applications, streaming returns partial output as it is generated, so users see text before the full response is complete:

// `prompt` is a placeholder for your request text.
const streamingResult = await model.generateContentStream(prompt);

for await (const chunk of streamingResult.stream) {
  const chunkText = chunk.text();
  process.stdout.write(chunkText);
}
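Applications often need the complete text as well as the live chunks, for example to store the response after displaying it. The following is a small sketch of that pattern. `collectStream` is a hypothetical helper name, and it assumes only that each chunk exposes a `text()` method, as in the loop above.

```javascript
/**
 * Consume an async iterable of chunks (such as streamingResult.stream),
 * pass each piece of text to onChunk, and return the concatenated text.
 */
async function collectStream(stream, onChunk = () => {}) {
  let fullText = "";
  for await (const chunk of stream) {
    const text = chunk.text();
    onChunk(text);
    fullText += text;
  }
  return fullText;
}

// Usage:
// const streamingResult = await model.generateContentStream(prompt);
// const full = await collectStream(streamingResult.stream, (t) =>
//   process.stdout.write(t)
// );
```

Because the helper only depends on the async-iterable shape, it can be exercised in unit tests with a fake generator instead of a live API call.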

System Instructions

System instructions set the model’s role and constraints once, so they apply to every request made through that model instance:

const model = genAI.getGenerativeModel({
  model: "gemini-1.5-pro",
  systemInstruction: `
    You are a senior TypeScript developer.
    Always provide type-safe code examples.
    Include JSDoc comments for all functions.
    Prefer functional programming patterns.
  `,
});

Function Calling

Function calling enables structured data extraction and external tool use:

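const model = genAI.getGenerativeModel({
  model: "gemini-1.5-pro",
  // `tools` is an array; each entry groups related function declarations.
  tools: [{
    functionDeclarations: [{
      name: "searchDocs",
      // The description helps the model decide when to call the function.
      description: "Search the documentation and return matching entries.",
      parameters: {
        type: "object",
        properties: {
          query: { type: "string" },
          maxResults: { type: "number" },
        },
        required: ["query"],
      },
    }],
  }],
});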
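Declaring a function only tells the model that it exists. When the model decides to use it, the response carries a structured call that your own code must execute and answer. The following sketch shows one way to handle that round trip. The `handlers` registry, the `runFunctionCalls` helper, and the stubbed `searchDocs` result are all assumptions made for illustration.

```javascript
// Hypothetical local implementations, keyed by the declared function name.
const handlers = {
  searchDocs: async ({ query, maxResults = 5 }) => {
    // Placeholder: a real handler would query a search index here.
    return { results: [`Top hit for "${query}"`].slice(0, maxResults) };
  },
};

/**
 * Run each function call the model requested and build the
 * functionResponse parts to send back in the next turn.
 * Unknown function names produce an error payload instead of throwing.
 */
async function runFunctionCalls(calls, registry = handlers) {
  const parts = [];
  for (const call of calls) {
    const handler = registry[call.name];
    const response = handler
      ? await handler(call.args ?? {})
      : { error: `Unknown function: ${call.name}` };
    parts.push({ functionResponse: { name: call.name, response } });
  }
  return parts;
}

// Usage with a chat session:
// const chat = model.startChat();
// const result = await chat.sendMessage("Find docs about streaming");
// const calls = result.response.functionCalls() ?? [];
// if (calls.length > 0) {
//   const followUp = await chat.sendMessage(await runFunctionCalls(calls));
//   console.log(followUp.response.text());
// }
```

Returning an error payload for unknown names lets the model see the failure and recover, rather than ending the conversation with an exception.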

Use Cases

The massive context window enables novel applications:

  1. Full codebase analysis: Pass entire repositories for architecture review
  2. Video content understanding: Analyze hours of recording for meeting summaries
  3. Long document Q&A: Query thousands of pages of documentation or legal contracts
  4. Multi-file code generation: Generate entire feature implementations with context
  5. Audio transcription & analysis: Combine transcription with semantic understanding in one pass
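As an illustration of the first use case, the sketch below gathers a repository's source files into a single prompt, labelling each file with its relative path so the model can refer to it. The skipped directories, the included extensions, and the helper names `listSourceFiles` and `buildRepoPrompt` are assumptions to adjust per project.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Assumptions: which directories to skip and which extensions to include.
const SKIP_DIRS = new Set(["node_modules", ".git", "dist"]);
const SOURCE_EXTS = new Set([".js", ".ts", ".json", ".md"]);

/** Recursively list source files under rootDir, sorted by path. */
function listSourceFiles(rootDir) {
  const found = [];
  for (const entry of fs.readdirSync(rootDir, { withFileTypes: true })) {
    const fullPath = path.join(rootDir, entry.name);
    if (entry.isDirectory()) {
      if (!SKIP_DIRS.has(entry.name)) found.push(...listSourceFiles(fullPath));
    } else if (SOURCE_EXTS.has(path.extname(entry.name))) {
      found.push(fullPath);
    }
  }
  return found.sort();
}

/** Concatenate files into one prompt, each prefixed by its relative path. */
function buildRepoPrompt(rootDir, question) {
  const sections = listSourceFiles(rootDir).map((file) => {
    const rel = path.relative(rootDir, file);
    return `--- FILE: ${rel} ---\n${fs.readFileSync(file, "utf8")}`;
  });
  return `${question}\n\n${sections.join("\n\n")}`;
}

// Usage (check the size before sending):
// const prompt = buildRepoPrompt("./my-project", "Review the architecture.");
// const { totalTokens } = await model.countTokens(prompt);
// if (totalTokens < 1_000_000) {
//   const result = await model.generateContent(prompt);
//   console.log(result.response.text());
// }
```

Counting tokens before sending is worthwhile here, since a repository can exceed even a very large window and the count also determines the cost of the request.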

Pricing Considerations

With large context windows, token usage adds up quickly. Strategies to manage costs:

  • Chunking: Split large inputs strategically
  • Caching: Cache responses for identical queries
  • Prompt optimization: Minimize token usage in system instructions
  • Batch processing: Aggregate similar requests
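Two of these strategies, chunking and caching, can be sketched in a few lines. The helper names `chunkText` and `withCache` are hypothetical. The chunker measures size in characters rather than tokens for simplicity, and the cache is an in-memory `Map` that does not survive a restart.

```javascript
const crypto = require("node:crypto");

/** Split text into chunks of at most maxChars, preferring paragraph breaks. */
function chunkText(text, maxChars) {
  const chunks = [];
  let current = "";
  for (const paragraph of text.split(/\n{2,}/)) {
    const candidate = current ? `${current}\n\n${paragraph}` : paragraph;
    if (candidate.length <= maxChars) {
      current = candidate;
      continue;
    }
    if (current) chunks.push(current);
    // A single oversized paragraph is hard-split at maxChars boundaries.
    let rest = paragraph;
    while (rest.length > maxChars) {
      chunks.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    current = rest;
  }
  if (current) chunks.push(current);
  return chunks;
}

/**
 * Wrap an async (prompt) => string function so that identical prompts
 * reuse the stored response instead of triggering a new request.
 */
function withCache(generateFn, cache = new Map()) {
  return async function cachedGenerate(prompt) {
    const key = crypto.createHash("sha256").update(prompt).digest("hex");
    if (!cache.has(key)) {
      cache.set(key, await generateFn(prompt));
    }
    return cache.get(key);
  };
}

// Usage:
// const generate = withCache(async (prompt) => {
//   const result = await model.generateContent(prompt);
//   return result.response.text();
// });
```

Hashing the prompt keeps cache keys short even when the prompt itself is hundreds of thousands of tokens long.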

Conclusion

Gemini 1.5 Pro’s massive context window fundamentally expands what’s possible with LLMs in applications. The ability to process multi-million token contexts with multimodal inputs enables a new class of AI-powered tools that can reason over entire codebases, libraries, or media collections in a single pass.