January 26, 2024
Tutorial Team

Building a News Aggregator in 10 Minutes with Zapserp

Quick tutorial to create a fully functional news aggregator using Zapserp. Get real-time news from multiple sources with just a few lines of code.

Tags: tutorial, news-aggregator, quick-build, real-time, javascript

Need to build a news aggregator fast? This tutorial will show you how to create a fully functional news aggregator in just 10 minutes using Zapserp. By the end, you'll have a working application that pulls real-time news from multiple sources.

What We're Building

A simple but powerful news aggregator that:

  • Fetches news from multiple search engines
  • Categorizes news by topic
  • Displays articles with summaries and sources
  • Updates automatically with fresh content
  • Filters for quality news sources

Setup (2 minutes)

First, let's set up our project:

mkdir news-aggregator
cd news-aggregator
npm init -y
npm install zapserp express cors dotenv

Create your .env file:

ZAPSERP_API_KEY=your_api_key_here
PORT=3000
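
If the key is missing from .env, the problem would otherwise only surface on the first API call. A minimal sketch of a fail-fast check follows; requireEnv is a hypothetical helper name introduced here for illustration, not part of dotenv or Zapserp:

```javascript
// Hypothetical helper: read a required environment variable or fail fast.
// `env` defaults to process.env but can be swapped out for testing.
function requireEnv(name, env = process.env) {
  const value = env[name]
  if (typeof value !== 'string' || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`)
  }
  return value.trim()
}

// Example usage, after require('dotenv').config() has run:
// const apiKey = requireEnv('ZAPSERP_API_KEY')
```

Calling it once at startup turns a confusing runtime failure into a clear error message.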

Core News Aggregator (5 minutes)

Create news-aggregator.js:

require('dotenv').config()
const { Zapserp } = require('zapserp')

class NewsAggregator {
  constructor() {
    this.zapserp = new Zapserp({ apiKey: process.env.ZAPSERP_API_KEY })
    this.categories = {
      tech: ['technology news', 'AI news', 'startup news'],
      business: ['business news', 'finance news', 'market news'],
      world: ['world news', 'international news', 'breaking news'],
      science: ['science news', 'research news', 'space news']
    }
  }

  async getNewsByCategory(category, limit = 5) {
    const queries = this.categories[category] || [category]
    const allArticles = []

    for (const query of queries) {
      try {
        const searchResults = await this.zapserp.search({
          query: `${query} today`,
          engines: ['google', 'bing'],
          limit: 8,
          language: 'en',
          country: 'us'
        })

        // Filter for news sources
        const newsArticles = this.filterNewsUrls(searchResults.results)
        
        if (newsArticles.length > 0) {
          // Get article content
          const urls = newsArticles.slice(0, 3).map(article => article.url)
          const contentResults = await this.zapserp.readerBatch({ urls })

          // Process articles
          contentResults.results.forEach((content, index) => {
            if (content && content.content) {
              allArticles.push({
                title: content.title,
                summary: this.generateSummary(content.content),
                content: content.content,
                url: content.url,
                source: this.extractSource(content.url),
                publishedTime: content.metadata?.publishedTime,
                category: category,
                originalQuery: query
              })
            }
          })
        }
      } catch (error) {
        console.error(`Failed to fetch news for query: ${query}`, error)
      }
    }

    // Remove duplicates (articles keep the order in which they were fetched)
    const uniqueArticles = this.removeDuplicates(allArticles)
    return uniqueArticles.slice(0, limit)
  }

  filterNewsUrls(results) {
    const newsSources = [
      'reuters.com', 'bbc.com', 'cnn.com', 'apnews.com',
      'bloomberg.com', 'wsj.com', 'nytimes.com', 'washingtonpost.com',
      'techcrunch.com', 'wired.com', 'arstechnica.com', 'theverge.com'
    ]

    return results.filter(result => 
      newsSources.some(source => result.url.includes(source))
    )
  }

  generateSummary(content) {
    // Extract first meaningful paragraph
    const paragraphs = content.split('\n').filter(p => p.trim().length > 50)
    const summary = paragraphs[0] || content
    
    return summary.length > 200 
      ? summary.substring(0, 200) + '...'
      : summary
  }

  extractSource(url) {
    try {
      const hostname = new URL(url).hostname
      // Strip only a leading "www." and a trailing ".com"
      return hostname.replace(/^www\./, '').replace(/\.com$/, '')
    } catch {
      return 'Unknown'
    }
  }

  removeDuplicates(articles) {
    const seen = new Set()
    return articles.filter(article => {
      // Fall back to the URL when the reader returned no title
      const key = (article.title || article.url || '').toLowerCase()
      if (seen.has(key)) return false
      seen.add(key)
      return true
    })
  }

  async getAllNews() {
    const categories = Object.keys(this.categories)
    const newsPromises = categories.map(category => 
      this.getNewsByCategory(category, 4)
    )

    const results = await Promise.all(newsPromises)
    const categorizedNews = {}

    categories.forEach((category, index) => {
      categorizedNews[category] = results[index]
    })

    return categorizedNews
  }

  async getBreakingNews() {
    return this.getNewsByCategory('breaking news', 10)
  }
}

module.exports = NewsAggregator
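
The filterNewsUrls method above uses a substring check, so any URL that merely contains an allowed domain (for example in a query string) passes the filter. A stricter variant could compare against the parsed hostname instead. The sketch below is one way to do that; the names isFromNewsSource and filterNewsResults and the shortened source list are assumptions for illustration:

```javascript
// Hypothetical stricter variant of filterNewsUrls: match on the parsed
// hostname instead of searching the whole URL string.
const NEWS_SOURCES = ['reuters.com', 'bbc.com', 'techcrunch.com']

function isFromNewsSource(url, sources = NEWS_SOURCES) {
  let hostname
  try {
    hostname = new URL(url).hostname.toLowerCase()
  } catch {
    return false // not a valid absolute URL
  }
  // Accept the domain itself or any subdomain of it (e.g. www.bbc.com)
  return sources.some(
    source => hostname === source || hostname.endsWith(`.${source}`)
  )
}

function filterNewsResults(results, sources = NEWS_SOURCES) {
  return results.filter(result => isFromNewsSource(result.url, sources))
}
```

With this check, https://www.bbc.com/news/1 is accepted, while https://example.com/?ref=bbc.com and https://notbbc.com/x are rejected.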

Express Server (2 minutes)

Create server.js:

const express = require('express')
const cors = require('cors')
const NewsAggregator = require('./news-aggregator')

const app = express()
const newsAggregator = new NewsAggregator()

app.use(cors())
app.use(express.json())

// Get all categorized news
app.get('/api/news', async (req, res) => {
  try {
    const news = await newsAggregator.getAllNews()
    res.json({
      success: true,
      data: news,
      timestamp: new Date().toISOString()
    })
  } catch (error) {
    res.status(500).json({
      success: false,
      error: 'Failed to fetch news'
    })
  }
})

// Get news by specific category
app.get('/api/news/:category', async (req, res) => {
  // Read the param outside the try block so the catch block can use it
  const { category } = req.params

  try {
    const limit = parseInt(req.query.limit, 10) || 5

    const news = await newsAggregator.getNewsByCategory(category, limit)
    res.json({
      success: true,
      category,
      data: news,
      timestamp: new Date().toISOString()
    })
  } catch (error) {
    res.status(500).json({
      success: false,
      error: `Failed to fetch ${category} news`
    })
  }
})

// Get breaking news
app.get('/api/breaking', async (req, res) => {
  try {
    const news = await newsAggregator.getBreakingNews()
    res.json({
      success: true,
      data: news,
      timestamp: new Date().toISOString()
    })
  } catch (error) {
    res.status(500).json({
      success: false,
      error: 'Failed to fetch breaking news'
    })
  }
})

// Simple HTML interface
app.get('/', (req, res) => {
  res.send(`
    <!DOCTYPE html>
    <html>
    <head>
      <title>News Aggregator</title>
      <style>
        body { font-family: Arial, sans-serif; margin: 40px; }
        .category { margin-bottom: 30px; }
        .article { border: 1px solid #ddd; padding: 15px; margin: 10px 0; }
        .source { color: #666; font-size: 12px; }
        .summary { margin: 10px 0; }
        h1 { color: #333; }
        h2 { color: #666; }
        a { text-decoration: none; color: #0066cc; }
        a:hover { text-decoration: underline; }
      </style>
    </head>
    <body>
      <h1>📰 News Aggregator</h1>
      <div id="news-container">Loading news...</div>
      
      <script>
        async function loadNews() {
          try {
            const response = await fetch('/api/news')
            const result = await response.json()
            
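            if (result.success) {
              displayNews(result.data)
            } else {
              // Let the catch block below show the error message
              throw new Error(result.error || 'Request failed')
            }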
          } catch (error) {
            document.getElementById('news-container').innerHTML = 
              '<p>Error loading news. Please try again later.</p>'
          }
        }
        
        function displayNews(newsData) {
          const container = document.getElementById('news-container')
          let html = ''
          
          Object.entries(newsData).forEach(([category, articles]) => {
            html += \`<div class="category">
              <h2>\${category.toUpperCase()}</h2>\`
            
            articles.forEach(article => {
              html += \`<div class="article">
                <h3><a href="\${article.url}" target="_blank">\${article.title}</a></h3>
                <div class="source">Source: \${article.source} | \${article.publishedTime || 'Recently'}</div>
                <div class="summary">\${article.summary}</div>
              </div>\`
            })
            
            html += '</div>'
          })
          
          container.innerHTML = html
        }
        
        // Load news on page load
        loadNews()
        
        // Refresh every 5 minutes
        setInterval(loadNews, 5 * 60 * 1000)
      </script>
    </body>
    </html>
  `)
})

const PORT = process.env.PORT || 3000
app.listen(PORT, () => {
  console.log(`📰 News Aggregator running on port ${PORT}`)
  console.log(`Visit http://localhost:${PORT} to view news`)
})
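
The page script above interpolates article titles, summaries, and sources directly into innerHTML, so markup inside scraped content would be rendered as HTML. One possible mitigation is to escape those values first. The helper below is a hypothetical sketch (escapeHtml is not part of Express or Zapserp) that could be placed inside the page's script tag and wrapped around each interpolated field:

```javascript
// Hypothetical helper: escape text before interpolating it into an HTML string.
// The ampersand is replaced first so later entities are not double-escaped.
function escapeHtml(value) {
  return String(value ?? '')
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
}
```

In displayNews, this would mean writing escapeHtml(article.title) instead of article.title, and likewise for the summary and source. Escaping alone does not validate the link target, so checking that article.url starts with http:// or https:// is a sensible additional step.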

Run Your Aggregator (1 minute)

Start your news aggregator:

node server.js

Visit http://localhost:3000 to see your news aggregator in action!

API Endpoints

Your aggregator provides several useful endpoints:

  • GET /api/news - All categorized news
  • GET /api/news/tech - Technology news only
  • GET /api/news/business - Business news only
  • GET /api/breaking - Breaking news
  • GET /api/news/:category?limit=10 - Custom limit per category
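
The limit parameter is parsed on the server with no upper bound, so a request such as ?limit=5000 would be accepted as-is. A small bounded parser is one way to guard against that. The name parseLimit and the maximum of 20 are assumptions chosen for illustration:

```javascript
// Hypothetical helper: parse a "limit" query value into a bounded integer.
// Falls back to the default for missing, non-numeric, or non-positive input.
function parseLimit(raw, fallback = 5, max = 20) {
  const parsed = Number.parseInt(raw, 10)
  if (Number.isNaN(parsed) || parsed < 1) return fallback
  return Math.min(parsed, max)
}
```

In the category route, this could replace the inline parseInt call, for example const limit = parseLimit(req.query.limit).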

Quick Enhancements

Want to make it even better? Add these features:

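Faster auto-refresh:

// The page already refreshes every 5 minutes. To refresh more often,
// replace the existing setInterval call in the frontend script
// instead of adding a second timer:
setInterval(loadNews, 2 * 60 * 1000) // Refresh every 2 minutes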

Search functionality:

// Add to NewsAggregator class
async searchNews(query, limit = 10) {
  const results = await this.zapserp.search({
    query: `${query} news`,
    engines: ['google', 'bing'],
    limit: limit * 2
  })
  
  return this.filterNewsUrls(results.results).slice(0, limit)
}
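
Because the search queries more than one engine, the same article may come back more than once. A minimal sketch of URL-based deduplication follows, assuming each result has a url field as in the code above; normalizeUrl and dedupeByUrl are hypothetical helper names:

```javascript
// Hypothetical helper: build a comparison key from a URL, ignoring the
// scheme, a leading "www.", trailing slashes, and the fragment.
function normalizeUrl(url) {
  try {
    const parsed = new URL(url)
    const host = parsed.hostname.toLowerCase().replace(/^www\./, '')
    const path = parsed.pathname.replace(/\/+$/, '')
    return `${host}${path}${parsed.search}`
  } catch {
    return url // leave unparseable values untouched
  }
}

// Keep only the first result for each normalized URL.
function dedupeByUrl(results) {
  const seen = new Set()
  return results.filter(result => {
    const key = normalizeUrl(result.url)
    if (seen.has(key)) return false
    seen.add(key)
    return true
  })
}
```

In searchNews, the filtered list could be passed through dedupeByUrl before the final slice.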

Caching for performance:

// Requires an extra dependency: npm install node-cache
const NodeCache = require('node-cache')
const cache = new NodeCache({ stdTTL: 300 }) // Entries expire after 5 minutes

// Add to NewsAggregator class
async getCachedNews(category) {
  const cacheKey = `news_${category}`
  let news = cache.get(cacheKey)

  // cache.get returns undefined on a miss or after expiry
  if (news === undefined) {
    news = await this.getNewsByCategory(category)
    cache.set(cacheKey, news)
  }

  return news
}
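
For readers who would rather not add a dependency, the same idea can be sketched with a plain Map. This is not how node-cache works internally; TtlCache and getOrLoad are hypothetical names, and the injectable clock exists only to make expiry easy to test:

```javascript
// Hypothetical dependency-free TTL cache; `now` is injectable for testing.
class TtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs
    this.now = now
    this.entries = new Map()
  }

  // Return the stored value, or undefined if it is missing or expired.
  get(key) {
    const entry = this.entries.get(key)
    if (!entry) return undefined
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key)
      return undefined
    }
    return entry.value
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs })
  }
}

// Return a cached value, or compute and store it on a miss.
async function getOrLoad(cache, key, loader) {
  const cached = cache.get(key)
  if (cached !== undefined) return cached
  const value = await loader()
  cache.set(key, value)
  return value
}
```

A call such as getOrLoad(cache, 'news_tech', () => aggregator.getNewsByCategory('tech')) would then hit the search API at most once per TTL window for that key, as long as requests do not overlap.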

Deployment Tips

Deploy to Heroku:

  1. Add "start": "node server.js" to package.json
  2. Set your ZAPSERP_API_KEY in Heroku config vars
  3. Deploy with git push heroku main

Deploy to Vercel:

  1. Add vercel.json with Node.js configuration
  2. Set environment variables in Vercel dashboard
  3. Deploy with vercel --prod

Conclusion

In just 10 minutes, you've built a functional news aggregator that:

  • Pulls real-time news from multiple sources
  • Categorizes content automatically
  • Provides a clean web interface
  • Offers a RESTful API for integration

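The aggregator is a working foundation that can be easily extended with additional features like user preferences, email notifications, or mobile apps. Before relying on it in production, add caching as described under Quick Enhancements and escape article fields before rendering them as HTML.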

Want to add more advanced features? Check out our Advanced Data Extraction Techniques guide for more sophisticated processing patterns.
