Building a News Aggregator in 10 Minutes with Zapserp
Need to build a news aggregator fast? This tutorial shows you how to create one in just 10 minutes using Zapserp. By the end, you'll have a working application that pulls real-time news from multiple sources.
What We're Building
A simple but powerful news aggregator that:
- Fetches news from multiple search engines
- Categorizes news by topic
- Displays articles with summaries and sources
- Updates automatically with fresh content
- Filters for quality news sources
Setup (2 minutes)
First, let's set up our project:
mkdir news-aggregator
cd news-aggregator
npm init -y
npm install zapserp express cors dotenv
Create your .env file:
ZAPSERP_API_KEY=your_api_key_here
PORT=3000
Core News Aggregator (5 minutes)
Create news-aggregator.js:
require('dotenv').config()
const { Zapserp } = require('zapserp')

class NewsAggregator {
  constructor() {
    this.zapserp = new Zapserp({ apiKey: process.env.ZAPSERP_API_KEY })
    this.categories = {
      tech: ['technology news', 'AI news', 'startup news'],
      business: ['business news', 'finance news', 'market news'],
      world: ['world news', 'international news', 'breaking news'],
      science: ['science news', 'research news', 'space news']
    }
  }

  async getNewsByCategory(category, limit = 5) {
    const queries = this.categories[category] || [category]
    const allArticles = []

    for (const query of queries) {
      try {
        const searchResults = await this.zapserp.search({
          query: `${query} today`,
          engines: ['google', 'bing'],
          limit: 8,
          language: 'en',
          country: 'us'
        })

        // Filter for news sources
        const newsArticles = this.filterNewsUrls(searchResults.results)

        if (newsArticles.length > 0) {
          // Get article content
          const urls = newsArticles.slice(0, 3).map(article => article.url)
          const contentResults = await this.zapserp.readerBatch({ urls })

          // Process articles
          contentResults.results.forEach((content) => {
            if (content && content.content) {
              allArticles.push({
                title: content.title,
                summary: this.generateSummary(content.content),
                content: content.content,
                url: content.url,
                source: this.extractSource(content.url),
                publishedTime: content.metadata?.publishedTime,
                category: category,
                originalQuery: query
              })
            }
          })
        }
      } catch (error) {
        console.error(`Failed to fetch news for query: ${query}`, error)
      }
    }

    // Remove duplicates and trim to the requested limit
    const uniqueArticles = this.removeDuplicates(allArticles)
    return uniqueArticles.slice(0, limit)
  }

  filterNewsUrls(results) {
    const newsSources = [
      'reuters.com', 'bbc.com', 'cnn.com', 'ap.org',
      'bloomberg.com', 'wsj.com', 'nytimes.com', 'washingtonpost.com',
      'techcrunch.com', 'wired.com', 'arstechnica.com', 'theverge.com'
    ]

    return results.filter(result =>
      newsSources.some(source => result.url.includes(source))
    )
  }

  generateSummary(content) {
    // Extract first meaningful paragraph
    const paragraphs = content.split('\n').filter(p => p.trim().length > 50)
    const summary = paragraphs[0] || content

    return summary.length > 200
      ? summary.substring(0, 200) + '...'
      : summary
  }

  extractSource(url) {
    try {
      const hostname = new URL(url).hostname
      return hostname.replace('www.', '').replace('.com', '')
    } catch {
      return 'Unknown'
    }
  }

  removeDuplicates(articles) {
    const seen = new Set()
    return articles.filter(article => {
      const key = article.title.toLowerCase()
      if (seen.has(key)) return false
      seen.add(key)
      return true
    })
  }

  async getAllNews() {
    const categories = Object.keys(this.categories)
    const newsPromises = categories.map(category =>
      this.getNewsByCategory(category, 4)
    )

    const results = await Promise.all(newsPromises)

    const categorizedNews = {}
    categories.forEach((category, index) => {
      categorizedNews[category] = results[index]
    })

    return categorizedNews
  }

  async getBreakingNews() {
    return this.getNewsByCategory('breaking news', 10)
  }
}

module.exports = NewsAggregator
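Before wiring the class into an API, you can sanity-check it with a quick standalone script. This is a minimal sketch, not part of the aggregator itself; the filename test-aggregator.js is arbitrary, and it assumes the .env file from the setup step is in place:

// test-aggregator.js - quick manual check of the NewsAggregator class (illustrative only)
const NewsAggregator = require('./news-aggregator')

const aggregator = new NewsAggregator()

aggregator.getNewsByCategory('tech', 3)
  .then(articles => {
    // Print the title and source of each article that came back
    articles.forEach(article => {
      console.log(`- ${article.title} (${article.source})`)
    })
  })
  .catch(error => console.error('Test run failed:', error))

Run it with node test-aggregator.js; if a few headlines print, the Zapserp calls are working and you can move on to the server.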
Express Server (2 minutes)
Create server.js:
const express = require('express')
const cors = require('cors')
const NewsAggregator = require('./news-aggregator')

const app = express()
const newsAggregator = new NewsAggregator()

app.use(cors())
app.use(express.json())

// Get all categorized news
app.get('/api/news', async (req, res) => {
  try {
    const news = await newsAggregator.getAllNews()
    res.json({
      success: true,
      data: news,
      timestamp: new Date().toISOString()
    })
  } catch (error) {
    res.status(500).json({
      success: false,
      error: 'Failed to fetch news'
    })
  }
})
// Get news by specific category
app.get('/api/news/:category', async (req, res) => {
  // Read the category before the try block so it is still in scope in the catch below
  const { category } = req.params

  try {
    const limit = parseInt(req.query.limit) || 5
    const news = await newsAggregator.getNewsByCategory(category, limit)

    res.json({
      success: true,
      category,
      data: news,
      timestamp: new Date().toISOString()
    })
  } catch (error) {
    res.status(500).json({
      success: false,
      error: `Failed to fetch ${category} news`
    })
  }
})
// Get breaking news
app.get('/api/breaking', async (req, res) => {
  try {
    const news = await newsAggregator.getBreakingNews()
    res.json({
      success: true,
      data: news,
      timestamp: new Date().toISOString()
    })
  } catch (error) {
    res.status(500).json({
      success: false,
      error: 'Failed to fetch breaking news'
    })
  }
})

// Simple HTML interface
app.get('/', (req, res) => {
  res.send(`
    <!DOCTYPE html>
    <html>
    <head>
      <title>News Aggregator</title>
      <style>
        body { font-family: Arial, sans-serif; margin: 40px; }
        .category { margin-bottom: 30px; }
        .article { border: 1px solid #ddd; padding: 15px; margin: 10px 0; }
        .source { color: #666; font-size: 12px; }
        .summary { margin: 10px 0; }
        h1 { color: #333; }
        h2 { color: #666; }
        a { text-decoration: none; color: #0066cc; }
        a:hover { text-decoration: underline; }
      </style>
    </head>
    <body>
      <h1>📰 News Aggregator</h1>
      <div id="news-container">Loading news...</div>

      <script>
        async function loadNews() {
          try {
            const response = await fetch('/api/news')
            const result = await response.json()
            if (result.success) {
              displayNews(result.data)
            }
          } catch (error) {
            document.getElementById('news-container').innerHTML =
              '<p>Error loading news. Please try again later.</p>'
          }
        }

        function displayNews(newsData) {
          const container = document.getElementById('news-container')
          let html = ''

          Object.entries(newsData).forEach(([category, articles]) => {
            html += \`<div class="category">
              <h2>\${category.toUpperCase()}</h2>\`

            articles.forEach(article => {
              html += \`<div class="article">
                <h3><a href="\${article.url}" target="_blank">\${article.title}</a></h3>
                <div class="source">Source: \${article.source} | \${article.publishedTime || 'Recently'}</div>
                <div class="summary">\${article.summary}</div>
              </div>\`
            })

            html += '</div>'
          })

          container.innerHTML = html
        }

        // Load news on page load
        loadNews()

        // Refresh every 5 minutes
        setInterval(loadNews, 5 * 60 * 1000)
      </script>
    </body>
    </html>
  `)
})

const PORT = process.env.PORT || 3000
app.listen(PORT, () => {
  console.log(`📰 News Aggregator running on port ${PORT}`)
  console.log(`Visit http://localhost:${PORT} to view news`)
})
Run Your Aggregator (1 minute)
Start your news aggregator:
node server.js
Visit http://localhost:3000 to see your news aggregator in action!
API Endpoints
Your aggregator provides several useful endpoints:
- GET /api/news - All categorized news
- GET /api/news/tech - Technology news only
- GET /api/news/business - Business news only
- GET /api/breaking - Breaking news
- GET /api/news/:category?limit=10 - Custom limit per category
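For a quick check, you can hit these endpoints with any HTTP client. Here's a minimal sketch using Node's built-in fetch (available in Node 18+); the quick-check.js filename is just for illustration:

// quick-check.js - call the running aggregator's API (requires Node 18+ for global fetch)
const BASE_URL = 'http://localhost:3000'

async function checkApi() {
  // Ask for the tech category, limited to 3 articles
  const response = await fetch(`${BASE_URL}/api/news/tech?limit=3`)
  const result = await response.json()

  if (result.success) {
    result.data.forEach(article => {
      console.log(`${article.source}: ${article.title}`)
    })
  } else {
    console.error('Request failed:', result.error)
  }
}

checkApi()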
Quick Enhancements
Want to make it even better? Add these features:
Auto-refresh (the page above already reloads every 5 minutes; swap that interval for a shorter one rather than adding a second timer):
// In your frontend script
setInterval(() => {
  loadNews()
}, 2 * 60 * 1000) // Refresh every 2 minutes
Search functionality:
// Add to NewsAggregator class
async searchNews(query, limit = 10) {
  const results = await this.zapserp.search({
    query: `${query} news`,
    engines: ['google', 'bing'],
    limit: limit * 2
  })

  return this.filterNewsUrls(results.results).slice(0, limit)
}
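To expose this from the API, you could add a route like the one below to server.js. This is a sketch: the /api/search path and the q and limit query parameters are suggestions, not part of the original server:

// Search news by arbitrary query, e.g. GET /api/search?q=climate&limit=5
app.get('/api/search', async (req, res) => {
  const query = req.query.q
  if (!query) {
    return res.status(400).json({ success: false, error: 'Missing q query parameter' })
  }

  try {
    const limit = parseInt(req.query.limit) || 10
    const results = await newsAggregator.searchNews(query, limit)
    res.json({
      success: true,
      query,
      data: results,
      timestamp: new Date().toISOString()
    })
  } catch (error) {
    res.status(500).json({ success: false, error: 'Search failed' })
  }
})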
Caching for performance (install the dependency first with npm install node-cache; the getCachedNews method below goes inside the NewsAggregator class):
const NodeCache = require('node-cache')
const cache = new NodeCache({ stdTTL: 300 }) // 5 minutes

// Wrap your methods with caching
async getCachedNews(category) {
  const cacheKey = `news_${category}`
  let news = cache.get(cacheKey)

  if (!news) {
    news = await this.getNewsByCategory(category)
    cache.set(cacheKey, news)
  }

  return news
}
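You can then point the category route at the cached method so repeat requests within five minutes skip the Zapserp calls. This is a sketch that assumes getCachedNews was added to the class as above; note that it drops the limit query parameter, since the cached method doesn't take one:

// In server.js: replace the body of the existing /api/news/:category handler
app.get('/api/news/:category', async (req, res) => {
  const { category } = req.params
  try {
    const news = await newsAggregator.getCachedNews(category)
    res.json({
      success: true,
      category,
      data: news,
      timestamp: new Date().toISOString()
    })
  } catch (error) {
    res.status(500).json({ success: false, error: `Failed to fetch ${category} news` })
  }
})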
Deployment Tips
Deploy to Heroku:
- Add "start": "node server.js" to package.json
- Set your ZAPSERP_API_KEY in Heroku config vars
- Deploy with git push heroku main
Deploy to Vercel:
- Add vercel.json with a Node.js configuration
- Set environment variables in the Vercel dashboard
- Deploy with vercel --prod
Conclusion
In just 10 minutes, you've built a functional news aggregator that:
- Pulls real-time news from multiple sources
- Categorizes content automatically
- Provides a clean web interface
- Offers a RESTful API for integration
The aggregator is a solid starting point and can be easily extended with additional features like user preferences, email notifications, or a mobile client.
Want to add more advanced features? Check out our Advanced Data Extraction Techniques guide for more sophisticated processing patterns.