Building RAG Systems with Zapserp: Real-Time Web Data for LLMs
Retrieval-Augmented Generation (RAG) combines the generative power of Large Language Models (LLMs) with information retrieval, grounding responses in factual source material. Traditional RAG systems rely on static, pre-indexed knowledge bases; integrating Zapserp lets your LLMs pull fresh, comprehensive web data at query time.
This guide shows you how to build RAG systems that leverage Zapserp's search and content extraction capabilities to feed current, relevant information to your LLMs.
Understanding RAG with Real-Time Web Data
What is RAG?
Retrieval-Augmented Generation combines two powerful AI capabilities:
- Information Retrieval: Finding relevant documents or data
- Text Generation: Using LLMs to synthesize responses based on retrieved information (see the sketch below)
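In code, the loop is small: fetch documents relevant to the query, then prompt the model with both. A minimal sketch of the pattern, where searchWeb and completeChat are hypothetical stand-ins for your retrieval and LLM clients:
// Hypothetical stand-ins for a retrieval client and an LLM client
declare function searchWeb(query: string): Promise<Array<{ content: string; url: string }>>
declare function completeChat(messages: Array<{ role: string; content: string }>): Promise<string>

// The core RAG loop: retrieve, then generate
async function answerWithRAG(query: string): Promise<string> {
  const docs = await searchWeb(query) // 1. Information retrieval
  const context = docs.map(d => d.content).join('\n---\n')
  return completeChat([ // 2. Text generation grounded in the retrieved context
    { role: 'system', content: `Answer using only this context:\n${context}` },
    { role: 'user', content: query }
  ])
}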
Why Zapserp for RAG?
Traditional RAG systems are limited to pre-indexed knowledge bases. Zapserp enables:
- Real-time Information: Access to current web content and breaking news
- Comprehensive Coverage: Multi-engine search across the entire web
- Rich Context: Full content extraction with metadata
- Diverse Sources: News, blogs, documentation, research papers, and more
Core RAG Architecture with Zapserp
Let's build a production-ready RAG system that combines Zapserp with modern LLM APIs:
import { Zapserp, SearchEngine, Page, SearchResponse } from 'zapserp'
import OpenAI from 'openai'
interface RAGContext {
query: string
retrievedDocs: Array<{
content: string
source: string
url: string
relevanceScore: number
metadata?: any
}>
searchStrategy: 'web' | 'news' | 'academic' | 'mixed'
}
interface RAGResponse {
answer: string
sources: Array<{
title: string
url: string
snippet: string
relevanceScore: number
}>
confidence: number
searchQuery: string
timestamp: string
}
class ZapserpRAGSystem {
private zapserp: Zapserp
private openai: OpenAI
private maxContextLength: number
protected embeddingCache: Map<string, number[]> // protected so subclasses can reuse and evict it
constructor(apiKeys: {
zapserp: string
openai: string
}) {
this.zapserp = new Zapserp({ apiKey: apiKeys.zapserp })
this.openai = new OpenAI({ apiKey: apiKeys.openai })
this.maxContextLength = 6000 // Approximate token budget for retrieved context (not enforced in this example)
this.embeddingCache = new Map()
}
async generateRAGResponse(
userQuery: string,
options: {
searchStrategy?: 'web' | 'news' | 'academic' | 'mixed'
maxSources?: number
model?: string
temperature?: number
} = {}
): Promise<RAGResponse> {
const {
searchStrategy = 'mixed',
maxSources = 5,
model = 'gpt-4-turbo-preview',
temperature = 0.1
} = options
try {
// Step 1: Retrieve relevant documents
const context = await this.retrieveRelevantDocs(userQuery, {
strategy: searchStrategy,
maxSources
})
// Step 2: Generate response with retrieved context
const ragResponse = await this.generateResponse(userQuery, context, {
model,
temperature
})
return ragResponse
} catch (error) {
console.error('RAG generation failed:', error)
throw new Error('Failed to generate RAG response')
}
}
private async retrieveRelevantDocs(
query: string,
options: {
strategy: string
maxSources: number
}
): Promise<RAGContext> {
const searchQueries = this.generateSearchQueries(query, options.strategy)
const allDocs: any[] = []
// Execute multiple search strategies
for (const searchQuery of searchQueries) {
try {
const searchResults = await this.performSearch(searchQuery, options.strategy)
const extractedContent = await this.extractAndRankContent(
searchResults,
query,
options.maxSources
)
allDocs.push(...extractedContent)
} catch (error) {
console.error(`Search failed for query: ${searchQuery}`, error)
}
}
// Deduplicate and rank documents
const rankedDocs = await this.rankDocumentsByRelevance(allDocs, query)
return {
query,
retrievedDocs: rankedDocs.slice(0, options.maxSources),
searchStrategy: options.strategy as any
}
}
private generateSearchQueries(query: string, strategy: string): string[] {
const baseQuery = query.trim()
switch (strategy) {
case 'news':
return [
`${baseQuery} news latest`,
`${baseQuery} breaking news`,
`${baseQuery} today news`
]
case 'academic':
return [
`${baseQuery} research paper`,
`${baseQuery} study academic`,
`${baseQuery} scientific publication`
]
case 'web':
return [
baseQuery,
`${baseQuery} guide tutorial`,
`${baseQuery} documentation`
]
case 'mixed':
default:
return [
baseQuery,
`${baseQuery} latest`,
`${baseQuery} ${new Date().getFullYear()}`, // avoid hard-coding a year that will go stale
`${baseQuery} guide`
]
}
}
private async performSearch(query: string, strategy: string): Promise<SearchResponse> {
const searchConfig: any = {
query,
limit: 15,
language: 'en',
country: 'us'
}
// Adjust search engines based on strategy
switch (strategy) {
case 'news':
searchConfig.engines = [SearchEngine.GOOGLE, SearchEngine.BING]
break
case 'academic':
searchConfig.engines = [SearchEngine.GOOGLE]
break
default:
searchConfig.engines = [SearchEngine.GOOGLE, SearchEngine.BING]
}
return await this.zapserp.search(searchConfig)
}
private async extractAndRankContent(
searchResults: SearchResponse,
originalQuery: string,
maxResults: number
): Promise<any[]> {
// Filter for high-quality sources
const qualityUrls = this.filterQualityUrls(
searchResults.results.map(r => r.url)
)
if (qualityUrls.length === 0) return []
// Extract content from top URLs
const contentResults = await this.zapserp.readerBatch({
urls: qualityUrls.slice(0, Math.min(maxResults * 2, 10))
})
// Process and structure the extracted content
const processedDocs = contentResults.results
.filter(page => page && page.content && page.content.length > 200)
.map(page => ({
content: this.cleanAndTruncateContent(page.content, 1000),
source: page.title || 'Unknown',
url: page.url,
relevanceScore: 0, // Will be calculated later
metadata: {
author: page.metadata?.author,
publishedTime: page.metadata?.publishedTime,
description: page.metadata?.description,
contentLength: page.contentLength
}
}))
return processedDocs
}
private filterQualityUrls(urls: string[]): string[] {
// Filter for reputable domains and exclude low-quality sources
const qualityDomains = [
'wikipedia.org', 'stackoverflow.com', 'github.com', 'medium.com',
'techcrunch.com', 'wired.com', 'arstechnica.com', 'reuters.com',
'bbc.com', 'cnn.com', 'nytimes.com', 'wsj.com', 'bloomberg.com',
'nature.com', 'science.org', 'arxiv.org', 'acm.org', 'ieee.org'
]
const excludeDomains = [
'pinterest.com', 'youtube.com', 'facebook.com', 'twitter.com',
'instagram.com', 'tiktok.com'
]
return urls.filter(url => {
const domain = this.extractDomain(url)
// Include if from quality domain
if (qualityDomains.some(qd => domain.includes(qd))) {
return true
}
// Exclude if from excluded domain
if (excludeDomains.some(ed => domain.includes(ed))) {
return false
}
// Include other domains by default
return true
})
}
private async rankDocumentsByRelevance(
docs: any[],
query: string
): Promise<any[]> {
// Generate embedding for the query
const queryEmbedding = await this.getEmbedding(query)
// Calculate relevance scores using semantic similarity
const docsWithScores = await Promise.all(
docs.map(async (doc) => {
try {
const docEmbedding = await this.getEmbedding(doc.content.substring(0, 500))
const relevanceScore = this.calculateCosineSimilarity(queryEmbedding, docEmbedding)
return {
...doc,
relevanceScore
}
} catch (error) {
console.error('Error calculating relevance:', error)
return {
...doc,
relevanceScore: 0
}
}
})
)
// Sort by relevance score (descending)
return docsWithScores
.sort((a, b) => b.relevanceScore - a.relevanceScore)
.filter(doc => doc.relevanceScore > 0.1) // Filter out very low relevance
}
private async getEmbedding(text: string): Promise<number[]> {
// Cache keyed on a text prefix; fine for a demo, but prefix collisions are possible, so hash the full text in production
const cacheKey = text.substring(0, 100)
if (this.embeddingCache.has(cacheKey)) {
return this.embeddingCache.get(cacheKey)!
}
try {
const response = await this.openai.embeddings.create({
model: 'text-embedding-ada-002',
input: text.substring(0, 8000) // Limit input length
})
const embedding = response.data[0].embedding
// Cache the embedding
this.embeddingCache.set(cacheKey, embedding)
return embedding
} catch (error) {
console.error('Error generating embedding:', error)
return new Array(1536).fill(0) // Return zero vector as fallback
}
}
private calculateCosineSimilarity(vecA: number[], vecB: number[]): number {
if (vecA.length !== vecB.length) return 0
let dotProduct = 0
let normA = 0
let normB = 0
for (let i = 0; i < vecA.length; i++) {
dotProduct += vecA[i] * vecB[i]
normA += vecA[i] * vecA[i]
normB += vecB[i] * vecB[i]
}
const magnitude = Math.sqrt(normA) * Math.sqrt(normB)
return magnitude === 0 ? 0 : dotProduct / magnitude
}
protected async generateResponse( // protected: subclasses (e.g. ConversationalRAG below) call this directly
query: string,
context: RAGContext,
options: {
model: string
temperature: number
}
): Promise<RAGResponse> {
if (context.retrievedDocs.length === 0) {
throw new Error('No relevant documents found for the query')
}
// Build context for the LLM
const contextText = this.buildContextText(context.retrievedDocs)
const systemPrompt = `You are a helpful AI assistant that provides accurate, well-sourced answers based on the provided web content.
Instructions:
1. Use ONLY the information provided in the context to answer questions
2. If the context doesn't contain enough information, say so clearly
3. Cite specific sources when making claims
4. Provide a confidence score (1-10) for your answer
5. Be concise but comprehensive
6. If information conflicts between sources, acknowledge this
Context from web sources:
${contextText}`
const userPrompt = `Based on the provided web content, please answer this question: ${query}
Please structure your response as:
1. Direct answer
2. Supporting evidence from sources
3. Confidence score (1-10)
4. Any limitations or caveats`
try {
const completion = await this.openai.chat.completions.create({
model: options.model,
messages: [
{ role: 'system', content: systemPrompt },
{ role: 'user', content: userPrompt }
],
temperature: options.temperature,
max_tokens: 1500
})
const answer = completion.choices[0]?.message?.content || 'No response generated'
// Extract confidence score from response
const confidence = this.extractConfidenceScore(answer)
return {
answer: answer,
sources: context.retrievedDocs.map(doc => ({
title: doc.source,
url: doc.url,
snippet: doc.content.substring(0, 200) + '...',
relevanceScore: doc.relevanceScore
})),
confidence,
searchQuery: context.query,
timestamp: new Date().toISOString()
}
} catch (error) {
console.error('Error generating LLM response:', error)
throw new Error('Failed to generate response from LLM')
}
}
private buildContextText(docs: any[]): string {
return docs
.map((doc, index) => {
const metadata = doc.metadata || {}
const metaInfo = [
metadata.author && `Author: ${metadata.author}`,
metadata.publishedTime && `Published: ${metadata.publishedTime}`,
`Relevance: ${(doc.relevanceScore * 100).toFixed(1)}%`
].filter(Boolean).join(', ')
return `[Source ${index + 1}: ${doc.source}]
URL: ${doc.url}
${metaInfo && `Metadata: ${metaInfo}`}
Content: ${doc.content}
---`
})
.join('\n\n')
}
private extractConfidenceScore(response: string): number {
// Look for confidence score patterns in the response
const patterns = [
/confidence:?\s*(\d+)/i,
/confidence score:?\s*(\d+)/i,
/(\d+)\/10/,
/score:?\s*(\d+)/i
]
for (const pattern of patterns) {
const match = response.match(pattern)
if (match) {
const score = parseInt(match[1], 10)
return Math.min(Math.max(score, 1), 10) // Clamp between 1-10
}
}
return 7 // Default confidence score
}
private cleanAndTruncateContent(content: string, maxLength: number): string {
// Remove extra whitespace and clean up content
const cleaned = content
.replace(/\s+/g, ' ') // collapses newlines too, so no separate newline pass is needed
.trim()
if (cleaned.length <= maxLength) return cleaned
// Truncate at word boundary
const truncated = cleaned.substring(0, maxLength)
const lastSpace = truncated.lastIndexOf(' ')
return lastSpace > maxLength * 0.8
? truncated.substring(0, lastSpace) + '...'
: truncated + '...'
}
private extractDomain(url: string): string {
try {
return new URL(url).hostname.replace(/^www\./, '') // anchored: only strip a leading "www."
} catch {
return 'unknown'
}
}
}
// Usage Example
const ragSystem = new ZapserpRAGSystem({
zapserp: 'YOUR_ZAPSERP_API_KEY',
openai: 'YOUR_OPENAI_API_KEY'
})
// Example 1: General web search RAG
const webResponse = await ragSystem.generateRAGResponse(
"What are the latest developments in artificial intelligence in 2024?",
{
searchStrategy: 'mixed',
maxSources: 5,
model: 'gpt-4-turbo-preview'
}
)
console.log('Web RAG Response:', webResponse)
// Example 2: News-focused RAG
const newsResponse = await ragSystem.generateRAGResponse(
"What happened in the stock market today?",
{
searchStrategy: 'news',
maxSources: 3,
temperature: 0.1
}
)
console.log('News RAG Response:', newsResponse)
// Example 3: Academic/research RAG
const academicResponse = await ragSystem.generateRAGResponse(
"What are the recent breakthroughs in quantum computing?",
{
searchStrategy: 'academic',
maxSources: 4,
model: 'gpt-4-turbo-preview'
}
)
console.log('Academic RAG Response:', academicResponse)
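One operational note before moving on: generateRAGResponse throws when retrieval returns nothing usable (or when the LLM call fails), so production callers should catch the error and degrade gracefully rather than surface it raw:
try {
  const response = await ragSystem.generateRAGResponse('an obscure niche query')
  console.log(response.answer)
} catch (error) {
  // Either no relevant documents were found or the LLM call failed
  console.error('RAG pipeline failed, consider a plain LLM fallback:', error)
}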
Advanced RAG Patterns
Multi-Turn Conversation RAG
For chatbot applications that maintain conversation context:
interface ConversationContext {
messages: Array<{
role: 'user' | 'assistant'
content: string
timestamp: string
sources?: any[]
}>
currentQuery: string
retrievedContext: RAGContext[]
}
class ConversationalRAG extends ZapserpRAGSystem {
private conversations: Map<string, ConversationContext> = new Map()
async continueConversation(
conversationId: string,
userMessage: string,
options: {
searchStrategy?: string
refreshContext?: boolean
} = {}
): Promise<RAGResponse> {
// Get or create conversation context
let context = this.conversations.get(conversationId) || {
messages: [],
currentQuery: '',
retrievedContext: []
}
// Add user message to context
context.messages.push({
role: 'user',
content: userMessage,
timestamp: new Date().toISOString()
})
// Determine if we need fresh web data
const needsWebSearch = this.shouldPerformWebSearch(userMessage, context)
let ragResponse: RAGResponse
if (needsWebSearch || options.refreshContext) {
// Generate enhanced query from conversation context
const enhancedQuery = this.generateEnhancedQuery(userMessage, context)
// Perform RAG with web search
ragResponse = await this.generateRAGResponse(enhancedQuery, {
searchStrategy: options.searchStrategy as any,
maxSources: 5
})
// Update retrieved context
context.retrievedContext.push({
query: enhancedQuery,
retrievedDocs: ragResponse.sources.map(s => ({
content: s.snippet,
source: s.title,
url: s.url,
relevanceScore: s.relevanceScore
})),
searchStrategy: options.searchStrategy as any
})
} else {
// Use existing context for response
ragResponse = await this.generateFromExistingContext(userMessage, context)
}
// Add assistant response to context
context.messages.push({
role: 'assistant',
content: ragResponse.answer,
timestamp: ragResponse.timestamp,
sources: ragResponse.sources
})
// Update conversation
this.conversations.set(conversationId, context)
return ragResponse
}
private shouldPerformWebSearch(message: string, context: ConversationContext): boolean {
// Keywords that indicate need for fresh information
const freshInfoKeywords = [
'latest', 'recent', 'current', 'today', 'now', 'breaking',
'new', 'update', 'what happened', 'current status'
]
const messageLower = message.toLowerCase()
// Check if message contains fresh info keywords
if (freshInfoKeywords.some(keyword => messageLower.includes(keyword))) {
return true
}
// Always search when the conversation has no retrieved context yet
if (context.retrievedContext.length === 0) return true
// Re-search when the last sourced answer is stale (older than 10 minutes)
const lastSourced = [...context.messages].reverse().find(m => m.sources && m.sources.length > 0)
if (lastSourced) {
const ageMs = Date.now() - new Date(lastSourced.timestamp).getTime()
return ageMs > 10 * 60 * 1000
}
return false
}
private generateEnhancedQuery(userMessage: string, context: ConversationContext): string {
// Combine current message with recent conversation context
const recentMessages = context.messages.slice(-3) // Last 3 messages
const conversationContext = recentMessages
.map(m => m.content)
.join(' ')
// Create enhanced query that includes context
return `${conversationContext} ${userMessage}`.trim()
}
private async generateFromExistingContext(
message: string,
context: ConversationContext
): Promise<RAGResponse> {
// Use existing retrieved context to answer
const combinedContext = context.retrievedContext.flatMap(rc => rc.retrievedDocs)
// Generate response using existing context
// This would use the same LLM generation logic but with existing context
return await this.generateResponse(message, {
query: message,
retrievedDocs: combinedContext,
searchStrategy: 'mixed'
}, {
model: 'gpt-4-turbo-preview',
temperature: 0.1
})
}
}
// Usage Example
const conversationalRAG = new ConversationalRAG({
zapserp: 'YOUR_ZAPSERP_API_KEY',
openai: 'YOUR_OPENAI_API_KEY'
})
// Start conversation
const response1 = await conversationalRAG.continueConversation(
'conv-123',
"What's happening with Tesla stock?"
)
// Continue conversation - will use fresh web data
const response2 = await conversationalRAG.continueConversation(
'conv-123',
"What about their latest earnings report?"
)
// Follow-up question - might use existing context
const response3 = await conversationalRAG.continueConversation(
'conv-123',
"How does this compare to their previous quarter?"
)
Specialized RAG Applications
Real-Time Market Intelligence RAG
For financial applications requiring up-to-the-minute information:
class MarketIntelligenceRAG extends ZapserpRAGSystem {
async getMarketInsights(
query: string,
options: {
includePrice?: boolean
timeframe?: '1h' | '1d' | '1w'
sources?: 'news' | 'analysis' | 'both'
} = {}
): Promise<RAGResponse & {
marketData?: any
riskFactors?: string[]
sentiment?: 'bullish' | 'bearish' | 'neutral'
}> {
// Enhanced search for financial content
const enhancedQuery = this.buildFinancialQuery(query, options.timeframe)
const response = await this.generateRAGResponse(enhancedQuery, {
searchStrategy: 'news',
maxSources: 7,
model: 'gpt-4-turbo-preview'
})
// Extract financial insights
const sentiment = this.analyzeSentiment(response.answer)
const riskFactors = this.extractRiskFactors(response.answer)
return {
...response,
sentiment,
riskFactors
}
}
private buildFinancialQuery(query: string, timeframe?: '1h' | '1d' | '1w'): string {
const timeMap = {
'1h': 'past hour',
'1d': 'today',
'1w': 'this week'
}
const timeFilter = timeframe ? timeMap[timeframe] : 'latest'
return `${query} stock market ${timeFilter} news analysis`
}
private analyzeSentiment(content: string): 'bullish' | 'bearish' | 'neutral' {
// Naive substring counting ('up' also matches 'update'); a rough heuristic, not a real sentiment model
const bullishWords = ['up', 'rise', 'gain', 'positive', 'growth', 'bull', 'optimistic']
const bearishWords = ['down', 'fall', 'loss', 'negative', 'decline', 'bear', 'pessimistic']
const text = content.toLowerCase()
const bullishCount = bullishWords.filter(word => text.includes(word)).length
const bearishCount = bearishWords.filter(word => text.includes(word)).length
if (bullishCount > bearishCount + 1) return 'bullish'
if (bearishCount > bullishCount + 1) return 'bearish'
return 'neutral'
}
private extractRiskFactors(content: string): string[] {
const riskKeywords = [
'volatility', 'uncertainty', 'risk', 'concern', 'challenge',
'threat', 'downside', 'warning', 'caution', 'pressure'
]
const sentences = content.split('.').filter(s => s.length > 20)
const riskFactors: string[] = []
for (const sentence of sentences) {
const lowerSentence = sentence.toLowerCase()
if (riskKeywords.some(keyword => lowerSentence.includes(keyword))) {
riskFactors.push(sentence.trim())
}
}
return riskFactors.slice(0, 5) // Top 5 risk factors
}
}
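A usage sketch mirroring the earlier examples (the query and options here are illustrative):
const marketRAG = new MarketIntelligenceRAG({
  zapserp: 'YOUR_ZAPSERP_API_KEY',
  openai: 'YOUR_OPENAI_API_KEY'
})

const insights = await marketRAG.getMarketInsights(
  'NVIDIA earnings outlook',
  { timeframe: '1d', sources: 'both' }
)

console.log('Sentiment:', insights.sentiment)
console.log('Risk factors:', insights.riskFactors)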
Production Deployment Considerations
Caching and Performance Optimization
class ProductionRAGSystem extends ZapserpRAGSystem {
private responseCache: Map<string, { response: RAGResponse, timestamp: number }> = new Map()
private cacheTTL: number = 5 * 60 * 1000 // 5 minutes
constructor(apiKeys: { zapserp: string; openai: string }) {
super(apiKeys)
this.startCacheCleanup() // start background eviction; embeddingCache is inherited (protected) from the base class
}
async generateRAGResponseWithCaching(
query: string,
options: any = {}
): Promise<RAGResponse> {
// Generate cache key
const cacheKey = `${query}-${JSON.stringify(options)}`
// Check cache first
const cached = this.responseCache.get(cacheKey)
if (cached && Date.now() - cached.timestamp < this.cacheTTL) {
return {
...cached.response,
timestamp: new Date().toISOString() // Update timestamp
}
}
// Generate fresh response
const response = await this.generateRAGResponse(query, options)
// Cache the response
this.responseCache.set(cacheKey, {
response,
timestamp: Date.now()
})
return response
}
// Periodically evict expired entries from both caches
private startCacheCleanup() {
setInterval(() => {
const now = Date.now()
// Clean response cache
for (const [key, value] of this.responseCache.entries()) {
if (now - value.timestamp > this.cacheTTL) {
this.responseCache.delete(key)
}
}
// Clean embedding cache (keep more entries, longer TTL)
if (this.embeddingCache.size > 1000) {
const keys = Array.from(this.embeddingCache.keys())
const keysToDelete = keys.slice(0, 200) // Remove oldest 200
keysToDelete.forEach(key => this.embeddingCache.delete(key))
}
}, 60000) // Every minute
}
}
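Usage is the same as the base class; repeated identical queries inside the five-minute TTL are served from memory:
const productionRAG = new ProductionRAGSystem({
  zapserp: 'YOUR_ZAPSERP_API_KEY',
  openai: 'YOUR_OPENAI_API_KEY'
})

const first = await productionRAG.generateRAGResponseWithCaching('What is WebGPU?')
const second = await productionRAG.generateRAGResponseWithCaching('What is WebGPU?') // cache hit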
Key Benefits of Zapserp RAG
- Real-Time Information: Access to current web content and breaking news
- Comprehensive Sources: Multi-engine search across diverse content types
- Quality Content: Rich extraction with metadata for better context
- Scalable Architecture: Handle high-volume requests efficiently
- Flexible Integration: Works with any LLM provider (OpenAI, Anthropic, etc.)
Best Practices
- Query Enhancement: Improve search queries based on conversation context
- Source Quality: Filter for reputable domains and fresh content
- Relevance Scoring: Use embeddings for semantic similarity matching
- Caching Strategy: Cache responses and embeddings for performance
- Error Handling: Graceful fallbacks when searches fail
- Rate Limiting: Respect API limits and implement proper throttling (see the sketch below)
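Rate limits vary by plan for both Zapserp and your LLM provider, so treat the numbers below as placeholders. A minimal token-bucket throttle you can wrap around any outbound call:
class TokenBucket {
  private tokens: number
  private lastRefill = Date.now()

  constructor(private capacity: number, private refillPerSec: number) {
    this.tokens = capacity
  }

  async acquire(): Promise<void> {
    for (;;) {
      const now = Date.now()
      // Refill proportionally to elapsed time, capped at capacity
      this.tokens = Math.min(
        this.capacity,
        this.tokens + ((now - this.lastRefill) / 1000) * this.refillPerSec
      )
      this.lastRefill = now
      if (this.tokens >= 1) {
        this.tokens -= 1
        return
      }
      // Sleep roughly long enough for one token to accrue
      await new Promise(resolve => setTimeout(resolve, 1000 / this.refillPerSec))
    }
  }
}

// Placeholder limit: ~5 searches per second
const searchLimiter = new TokenBucket(5, 5)
const limitedSearch = async (query: string) => {
  await searchLimiter.acquire()
  return ragSystem.generateRAGResponse(query)
}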
Next Steps
- Implement vector databases for better embedding storage and retrieval
- Add support for multi-modal content (images, PDFs)
- Build specialized RAG systems for specific domains
- Integrate with popular LLM frameworks like LangChain or LlamaIndex
Ready to build your own RAG system? Contact our team for implementation guidance and advanced integration patterns.