The Context Window Revolution
Gemini 1.5 Pro offers a context window of up to 2 million tokens. You can pass an entire codebase, hours of video, or thousands of pages of documents in a single request, which changes how applications are built around a model: less retrieval and stitching, more direct reasoning over the full input.
Understanding Gemini 1.5 Pro’s Capabilities
| Feature | Capability |
|---|---|
| Context window | Up to 2M tokens (1M standard) |
| Input modalities | Text, image, audio, video, code |
| Output | Text, code, structured data |
| Max output tokens | 8,192 |
| Languages | 100+ languages |
| Pricing (input) | $1.25–$10.00 per 1M tokens |
| Pricing (output) | $10.00–$40.00 per 1M tokens |
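Because both the context limit and the price are measured in tokens, it helps to count tokens before sending a large request. The sketch below assumes `model` is the object returned by the SDK's `getGenerativeModel()` and relies on its `countTokens()` method; `checkTokenBudget` and `CONTEXT_LIMIT` are illustrative names, and the limit is simply the upper figure from the table.

```javascript
// Sketch: check an input against a token budget before sending it.
// Assumption: `model` is the object returned by genAI.getGenerativeModel().
const CONTEXT_LIMIT = 2_000_000; // upper bound taken from the table above

async function checkTokenBudget(model, content, limit = CONTEXT_LIMIT) {
  // countTokens() resolves to an object that includes a totalTokens field.
  const { totalTokens } = await model.countTokens(content);
  return {
    totalTokens,
    fits: totalTokens <= limit,
    remaining: limit - totalTokens,
  };
}
```

A caller could run `await checkTokenBudget(model, prompt)` and fall back to chunking when `fits` is `false`.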
Multimodal Input Handling
Gemini 1.5 Pro natively processes multiple modalities in a single request. You can combine text, images, audio, and video seamlessly:
```javascript
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-pro" });

// Multimodal request with text, image, and audio.
// slideImageBase64 and narrationBase64 are base64-encoded strings
// of the file contents, prepared elsewhere.
const result = await model.generateContent([
  "Analyze this presentation recording and provide feedback:",
  { inlineData: { mimeType: "image/png", data: slideImageBase64 } },
  { inlineData: { mimeType: "audio/mpeg", data: narrationBase64 } },
]);
```
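The request above takes base64 strings as given. One way to produce them is a small helper that wraps raw bytes in the same `inlineData` shape. This is a sketch: `toInlinePart` and `MAX_INLINE_BYTES` are hypothetical names, and the 20 MB ceiling is an assumed limit for inline payloads that should be checked against the current API documentation.

```javascript
// Sketch: wrap raw bytes as an inlineData part for generateContent().
// toInlinePart and MAX_INLINE_BYTES are illustrative names, not SDK exports.
const MAX_INLINE_BYTES = 20 * 1024 * 1024; // assumed ceiling for inline payloads

function toInlinePart(bytes, mimeType, maxBytes = MAX_INLINE_BYTES) {
  const buffer = Buffer.from(bytes);
  if (buffer.length > maxBytes) {
    throw new RangeError(
      `${mimeType} payload is ${buffer.length} bytes; limit is ${maxBytes}`
    );
  }
  // Same shape as the parts used in the request above.
  return { inlineData: { mimeType, data: buffer.toString("base64") } };
}
```

With Node's `fs` module, `toInlinePart(fs.readFileSync("slide.png"), "image/png")` would yield a part ready to place in the request array (the file name here is only an example).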
Node.js SDK Setup
The official Google AI SDK for JavaScript simplifies integration:
```shell
npm install @google/generative-ai
```
```javascript
const { GoogleGenerativeAI } = require("@google/generative-ai");

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({
  model: "gemini-1.5-pro",
  systemInstruction: "You are an expert code reviewer.",
});

// Send source code to the model and return its review as plain text.
async function reviewCode(code) {
  const result = await model.generateContent(code);
  return result.response.text();
}
```
Streaming Responses
For real-time applications, streaming returns text as it is generated, so users see the first output sooner instead of waiting for the full response:
```javascript
const streamingResult = await model.generateContentStream(prompt);

// Print each chunk as soon as it arrives.
for await (const chunk of streamingResult.stream) {
  const chunkText = chunk.text();
  process.stdout.write(chunkText);
}
```
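Applications often need the complete text as well as the incremental chunks, for example to log or store the final answer. A minimal sketch, assuming only that the stream is an async iterable whose items expose `text()`, as in the loop above; `collectStream` is a hypothetical helper name.

```javascript
// Sketch: forward each streamed chunk to a callback and return the full text.
// Works with any async iterable whose items expose text(), which is the
// shape consumed by the for-await loop above.
async function collectStream(stream, onChunk = () => {}) {
  let fullText = "";
  for await (const chunk of stream) {
    const text = chunk.text();
    onChunk(text); // e.g. write to stdout or push over a WebSocket
    fullText += text;
  }
  return fullText;
}
```

For instance, `await collectStream(streamingResult.stream, (t) => process.stdout.write(t))` prints as it goes and still returns the whole response.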
System Instructions
System instructions let you define the model’s behavior globally:
```javascript
const model = genAI.getGenerativeModel({
  model: "gemini-1.5-pro",
  systemInstruction: `
    You are a senior TypeScript developer.
    Always provide type-safe code examples.
    Include JSDoc comments for all functions.
    Prefer functional programming patterns.
  `,
});
```
Function Calling
Function calling enables structured data extraction and external tool use:
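```javascript
const model = genAI.getGenerativeModel({
  model: "gemini-1.5-pro",
  // `tools` is an array of tool objects, each holding its declarations.
  tools: [
    {
      functionDeclarations: [
        {
          name: "searchDocs",
          // The description helps the model decide when to call the function.
          description: "Search the documentation and return matching entries.",
          parameters: {
            type: "object",
            properties: {
              query: { type: "string" },
              maxResults: { type: "number" },
            },
            required: ["query"],
          },
        },
      ],
    },
  ],
});
```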
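Declaring a function only tells the model what it may request; your code still has to run the call and return the result. The sketch below assumes the calls arrive as `{ name, args }` objects (the shape returned by `response.functionCalls()` in the SDK, to the best of my knowledge) and that results go back as `functionResponse` parts. The `handlers` registry, its stub `searchDocs` implementation, and `runFunctionCalls` are all hypothetical.

```javascript
// Hypothetical local implementations, keyed by declared function name.
const handlers = {
  searchDocs: async ({ query, maxResults = 5 }) => ({
    // Placeholder result; a real handler would query a search index.
    results: [`stub result for "${query}"`].slice(0, maxResults),
  }),
};

// Sketch: turn the model's function calls into functionResponse parts.
// `calls` defaults to [] because a response may contain no function calls.
async function runFunctionCalls(calls = [], registry = handlers) {
  const parts = [];
  for (const { name, args } of calls) {
    const known = Object.prototype.hasOwnProperty.call(registry, name);
    const response = known
      ? await registry[name](args ?? {})
      : { error: `No handler registered for ${name}` };
    parts.push({ functionResponse: { name, response } });
  }
  return parts;
}
```

In a chat session, the returned parts would typically be sent back with something like `chat.sendMessage(parts)` so the model can compose its final answer. Verify the exact call against the SDK version you use.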
Use Cases
The massive context window enables novel applications:
- Full codebase analysis: Pass entire repositories for architecture review
- Video content understanding: Analyze hours of recordings to produce meeting summaries
- Long document Q&A: Query thousands of pages of documentation or legal contracts
- Multi-file code generation: Generate entire feature implementations with context
- Audio transcription & analysis: Combine transcription with semantic understanding in one pass
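For the codebase-analysis case, the files have to be packed into a single prompt with enough labeling that the model can tell them apart. A minimal sketch follows. The `--- FILE: ... ---` delimiter is an arbitrary choice, `packFiles` is a hypothetical helper, and the 4-characters-per-token ratio is a rough heuristic rather than a property of the model's tokenizer.

```javascript
// Sketch: pack source files into one prompt, stopping at a size budget.
// files: array of { path, content }.
// Assumption: ~4 characters per token. Use model.countTokens() when an
// exact count matters.
function packFiles(files, maxTokens = 1_000_000, charsPerToken = 4) {
  const budget = maxTokens * charsPerToken;
  const included = [];
  const skipped = [];
  let prompt = "";
  for (const { path, content } of files) {
    const section = `--- FILE: ${path} ---\n${content}\n\n`;
    if (prompt.length + section.length > budget) {
      skipped.push(path); // report what was left out instead of truncating
      continue;
    }
    prompt += section;
    included.push(path);
  }
  return { prompt, included, skipped };
}
```

Returning the `skipped` list makes it explicit when a repository did not fit, so the caller can decide whether to filter files or split the review into several requests.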
Pricing Considerations
With large context windows, token usage adds up quickly. Strategies to manage costs:
- Chunking: Split large inputs strategically
- Caching: Cache responses for identical queries
- Prompt optimization: Minimize token usage in system instructions
- Batch processing: Aggregate similar requests
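Two of these strategies, chunking and caching, can be sketched in a few lines. The code below is illustrative only. `chunkText` splits on blank lines and measures size in characters rather than tokens, and `withCache` keeps results in an in-memory `Map` keyed by the exact prompt string. Both names are hypothetical, and `generate` stands in for any async function that calls the model.

```javascript
// Sketch: split text into chunks of at most maxChars, preferring
// paragraph boundaries (blank lines).
function chunkText(text, maxChars) {
  if (!(maxChars >= 1)) throw new RangeError("maxChars must be at least 1");
  const chunks = [];
  let current = "";
  for (const paragraph of text.split(/\n{2,}/)) {
    // Hard-split any single paragraph that exceeds the limit on its own.
    for (let i = 0; i < paragraph.length; i += maxChars) {
      const piece = paragraph.slice(i, i + maxChars);
      const candidate = current ? `${current}\n\n${piece}` : piece;
      if (candidate.length <= maxChars) {
        current = candidate;
      } else {
        chunks.push(current);
        current = piece;
      }
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Sketch: reuse the response for identical prompts.
// The promise itself is cached, so concurrent identical requests share one call.
function withCache(generate, cache = new Map()) {
  return function cachedGenerate(prompt) {
    if (!cache.has(prompt)) {
      const pending = Promise.resolve().then(() => generate(prompt));
      cache.set(prompt, pending);
      // Do not keep failed requests in the cache.
      pending.catch(() => cache.delete(prompt));
    }
    return cache.get(prompt);
  };
}
```

For example, `const cachedReview = withCache(reviewCode)` would wrap a function like the `reviewCode` helper from the SDK setup section so that repeated reviews of identical code cost nothing extra.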
Conclusion
Gemini 1.5 Pro’s context window of up to 2 million tokens, combined with native multimodal input, lets applications reason over entire codebases, document libraries, or media collections in a single request. That makes a new class of AI-powered tools practical, provided token usage and cost are managed deliberately.
