Want to build a feed generator, real-time analytics dashboard, or bot that responds to Bluesky events? You need the Firehose - Bluesky's real-time event stream. This guide covers everything from what the Firehose is to building applications that consume it.

What is the Bluesky Firehose?

The Firehose is a live stream of every public event that happens on the Bluesky network. Every post, like, repost, follow, block, and profile update flows through this stream in real time.

Think of it as a river of data constantly flowing from all Bluesky users. By tapping into this stream, you can:

  • Build custom feeds - Filter and curate posts for specific audiences
  • Create analytics - Track trends, popular topics, and network statistics
  • Power bots - Respond to mentions, keywords, or specific actions
  • Send notifications - Alert users when something relevant happens
  • Archive content - Store posts for research or backup

Two Ways to Access: Firehose vs Jetstream

Bluesky offers two ways to consume the real-time stream:

1. Raw Firehose (com.atproto.sync.subscribeRepos)

The original, full-fidelity stream:

  • Format: CBOR (Concise Binary Object Representation)
  • Content: Complete repository sync events with Merkle tree proofs
  • Endpoint: wss://bsky.network/xrpc/com.atproto.sync.subscribeRepos
  • Use case: When you need cryptographic verification or full sync capability

2. Jetstream (Recommended for Most Apps)

A developer-friendly alternative:

  • Format: JSON (easy to parse in any language)
  • Content: Record-level events with filtering options
  • Endpoint: wss://jetstream2.us-east.bsky.network/subscribe
  • Use case: Most applications - bots, feeds, analytics

Recommendation: Start with Jetstream. It's simpler, uses less bandwidth, and provides the same data for most use cases.

Understanding Event Types

Events in the stream correspond to AT Protocol collections. Each collection holds one type of record:

Collection                 Description
app.bsky.feed.post         Posts (including replies and quotes)
app.bsky.feed.like         Likes on posts
app.bsky.feed.repost       Reposts (shares)
app.bsky.graph.follow      Follow relationships
app.bsky.graph.block       Block relationships
app.bsky.graph.list        Lists (curation and moderation)
app.bsky.actor.profile     Profile updates
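
Since each event names its collection, one way to organize a consumer is a small dispatcher that routes events to per-collection handlers. The sketch below is only an illustration: createDispatcher is a hypothetical helper name, and it assumes the Jetstream JSON event shape (kind plus commit.collection) described later in this guide.

```javascript
// Route Jetstream commit events to a handler chosen by collection.
// Returns true when a handler ran, false when the event was ignored.
function createDispatcher(handlers) {
    return function dispatch(event) {
        // Identity and account events have no commit payload
        if (event.kind !== 'commit' || !event.commit) return false;

        const collection = event.commit.collection;
        if (!Object.hasOwn(handlers, collection)) return false;

        handlers[collection](event);
        return true;
    };
}
```

You would pass it an object keyed by collection name, for example { 'app.bsky.feed.post': handlePost, 'app.bsky.graph.follow': handleFollow }, and call the returned function from your message handler.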

Connecting to Jetstream

Here's how to connect to Jetstream and start receiving events:

Basic Connection (JavaScript/Node.js)

const WebSocket = require('ws');

const JETSTREAM_URL = 'wss://jetstream2.us-east.bsky.network/subscribe';

// Connect to all events
const ws = new WebSocket(JETSTREAM_URL);

ws.on('open', () => {
    console.log('Connected to Jetstream');
});

ws.on('message', (data) => {
    const event = JSON.parse(data.toString());
    console.log('Received event:', event);
});

ws.on('close', () => {
    console.log('Disconnected from Jetstream');
    // Implement reconnection logic
});

ws.on('error', (error) => {
    console.error('WebSocket error:', error);
});
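
The close handler above leaves reconnection as a TODO. One common approach is exponential backoff with an upper bound. The sketch below is an assumption about how you might structure it, not something Jetstream prescribes: backoffDelay and connectWithRetry are hypothetical names, the 1s/30s limits are arbitrary, and the socket is created by a factory you pass in so the retry logic does not depend on a specific URL.

```javascript
// Delay before reconnect attempt N (0-based): 1s, 2s, 4s, ... capped at 30s.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
    return Math.min(baseMs * 2 ** attempt, maxMs);
}

// createSocket is any function returning a ws-style WebSocket,
// e.g. () => new WebSocket(JETSTREAM_URL).
function connectWithRetry(createSocket, onEvent, attempt = 0) {
    const ws = createSocket();

    ws.on('open', () => {
        attempt = 0; // a successful connection resets the backoff
    });

    ws.on('message', (data) => onEvent(JSON.parse(data.toString())));

    ws.on('close', () => {
        const delay = backoffDelay(attempt);
        console.log(`Disconnected, retrying in ${delay} ms`);
        setTimeout(() => connectWithRetry(createSocket, onEvent, attempt + 1), delay);
    });

    // Log only; this sketch assumes a 'close' event follows and triggers the retry
    ws.on('error', (error) => console.error('WebSocket error:', error.message));

    return ws;
}
```

Growing the delay avoids hammering a server that is already struggling, while resetting it on open keeps recovery fast after a brief network blip.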

Filtering by Collection

Reduce bandwidth by requesting only the collections you need:

// Only receive posts
const postsUrl = 'wss://jetstream2.us-east.bsky.network/subscribe?wantedCollections=app.bsky.feed.post';

// Multiple collections: repeat the wantedCollections parameter
const postsAndLikesUrl = 'wss://jetstream2.us-east.bsky.network/subscribe?' +
    'wantedCollections=app.bsky.feed.post&' +
    'wantedCollections=app.bsky.feed.like';

const ws = new WebSocket(postsAndLikesUrl);
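
Concatenating query strings by hand gets error-prone as the list of collections grows. Here is a minimal sketch that builds the same URLs with Node's built-in URLSearchParams. The helper name buildJetstreamUrl and its options object are my own invention; only the wantedCollections and cursor parameters come from this guide.

```javascript
// Build a Jetstream subscribe URL. Each collection becomes its own
// wantedCollections parameter; cursor is optional.
function buildJetstreamUrl(base, { collections = [], cursor } = {}) {
    const params = new URLSearchParams();

    for (const collection of collections) {
        params.append('wantedCollections', collection);
    }
    if (cursor !== undefined && cursor !== null) {
        params.set('cursor', String(cursor));
    }

    const query = params.toString();
    return query ? `${base}?${query}` : base;
}
```

Calling it with collections: ['app.bsky.feed.post', 'app.bsky.feed.like'] produces the same two-collection URL shown above.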

Event Structure

Each Jetstream event looks like this:

{
    "did": "did:plc:xyz...",           // User who created the event
    "time_us": 1702829400000000,        // Microsecond timestamp
    "kind": "commit",                   // Event kind: commit, identity, or account
    "commit": {
        "rev": "3abc...",               // Repository revision
        "operation": "create",          // create, update, or delete
        "collection": "app.bsky.feed.post",
        "rkey": "3abc...",              // Record key
        "cid": "bafyrei...",            // Content ID of the record (absent on delete)
        "record": {                     // The actual content
            "$type": "app.bsky.feed.post",
            "text": "Hello, Bluesky!",
            "createdAt": "2025-12-17T12:00:00Z"
        }
    }
}
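
To make the structure concrete, here is a small sketch that flattens a commit event into the fields most consumers care about. summarizeCommit is a hypothetical helper, not part of any Bluesky library; it assumes the event shape shown above and that delete operations carry no record.

```javascript
// Turn a Jetstream commit event into a flat summary.
// Returns null for non-commit events (identity, account).
function summarizeCommit(event) {
    if (event.kind !== 'commit' || !event.commit) return null;

    const { operation, collection, rkey, record } = event.commit;
    return {
        // AT URI that identifies the record: at://<did>/<collection>/<rkey>
        uri: `at://${event.did}/${collection}/${rkey}`,
        operation,
        // time_us is in microseconds; Date expects milliseconds
        time: new Date(event.time_us / 1000),
        // Only some records (such as posts) have text; deletes have no record
        text: record && typeof record.text === 'string' ? record.text : null,
    };
}
```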

Building a Keyword Monitor

Here's a complete example that monitors posts for specific keywords:

const WebSocket = require('ws');

class KeywordMonitor {
    constructor(keywords) {
        this.keywords = keywords.map(k => k.toLowerCase());
        this.ws = null;
    }

    connect() {
        const url = 'wss://jetstream2.us-east.bsky.network/subscribe?' +
            'wantedCollections=app.bsky.feed.post';

        this.ws = new WebSocket(url);

        this.ws.on('open', () => {
            console.log('Monitoring for keywords:', this.keywords);
        });

        this.ws.on('message', (data) => {
            this.handleMessage(JSON.parse(data.toString()));
        });

        this.ws.on('close', () => {
            console.log('Connection closed, reconnecting in 5s...');
            setTimeout(() => this.connect(), 5000);
        });

        this.ws.on('error', (error) => {
            console.error('WebSocket error:', error);
        });
    }

    handleMessage(event) {
        if (event.kind !== 'commit') return;
        if (event.commit.operation !== 'create') return;

        const record = event.commit.record;
        if (!record || !record.text) return;

        const text = record.text.toLowerCase();

        for (const keyword of this.keywords) {
            if (text.includes(keyword)) {
                this.onMatch(event, keyword);
                break;
            }
        }
    }

    onMatch(event, keyword) {
        console.log(`\n--- Match found for "${keyword}" ---`);
        console.log(`User: ${event.did}`);
        console.log(`Text: ${event.commit.record.text}`);
        console.log(`Time: ${new Date(event.time_us / 1000).toISOString()}`);

        // Build the post's AT URI
        const postUri = `at://${event.did}/app.bsky.feed.post/${event.commit.rkey}`;
        console.log(`URI: ${postUri}`);
    }
}

// Usage
const monitor = new KeywordMonitor(['bluesky', 'atprotocol', 'skyscraper']);
monitor.connect();

Building a Feed Generator

Feed generators use the Firehose to collect posts and serve custom feeds. Here's the architecture:

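// 1. Consume the firehose (Jetstream events)
const collectPosts = (event) => {
    // Identity/account events have no commit, and deletes have no record
    if (event.kind !== 'commit') return;
    if (event.commit.collection !== 'app.bsky.feed.post') return;
    if (event.commit.operation !== 'create') return;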

    const post = {
        uri: `at://${event.did}/app.bsky.feed.post/${event.commit.rkey}`,
        cid: event.commit.cid,
        author: event.did,
        text: event.commit.record.text,
        createdAt: event.commit.record.createdAt,
        indexedAt: new Date().toISOString()
    };

    // 2. Apply your feed logic
    if (matchesFeedCriteria(post)) {
        saveToDatabase(post);
    }
};

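// 3. Serve the feed via API (app is assumed to be an Express application)
app.get('/xrpc/app.bsky.feed.getFeedSkeleton', (req, res) => {
    const { feed, cursor } = req.query;
    // Query parameters arrive as strings, so parse the limit and cap it at 100
    const limit = Math.min(parseInt(req.query.limit, 10) || 50, 100);

    const posts = getPostsFromDatabase(feed, cursor, limit);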

    res.json({
        cursor: posts.length ? posts[posts.length - 1].indexedAt : undefined,
        feed: posts.map(p => ({ post: p.uri }))
    });
});
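
The architecture above leaves the storage helpers undefined. To show how the cursor is meant to work, here is a rough in-memory stand-in. Everything in it (storedPosts, savePost, getPage) is hypothetical and exists only for illustration; a real feed generator would use a database query with the same "older than cursor, newest first" logic.

```javascript
// Hypothetical in-memory stand-in for the feed's database.
const storedPosts = []; // items look like { uri, indexedAt }

function savePost(post) {
    storedPosts.push(post);
}

// Return up to `limit` posts strictly older than `cursor`, newest first.
// indexedAt values are ISO 8601 strings from toISOString(), which sort
// correctly as plain strings because they are fixed-width UTC timestamps.
function getPage(cursor, limit = 50) {
    const newestFirst = [...storedPosts].sort((a, b) => {
        if (a.indexedAt < b.indexedAt) return 1;
        if (a.indexedAt > b.indexedAt) return -1;
        return 0;
    });
    const older = cursor
        ? newestFirst.filter((p) => p.indexedAt < cursor)
        : newestFirst;
    const page = older.slice(0, limit);

    return {
        cursor: page.length ? page[page.length - 1].indexedAt : undefined,
        feed: page.map((p) => ({ post: p.uri })),
    };
}
```

One caveat of using a bare timestamp as the cursor: two posts indexed in the same millisecond can straddle a page boundary, and the second one would be skipped. Appending a tiebreaker such as the post's CID to the cursor is a common way around that.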

Using the Raw Firehose

If you need the raw Firehose for verification or sync capabilities:

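const WebSocket = require('ws');
const cbor = require('cbor');
const { CarReader } = require('@ipld/car');
const { CID } = require('multiformats/cid');

const FIREHOSE_URL = 'wss://bsky.network/xrpc/com.atproto.sync.subscribeRepos';

// DAG-CBOR encodes CIDs as tag 42 wrapping a byte string with a leading 0x00 byte.
// Converting them to CID objects lets us look blocks up in the CAR file.
const decodeOptions = {
    tags: { 42: (bytes) => CID.decode(bytes.subarray(1)) }
};

const ws = new WebSocket(FIREHOSE_URL);

ws.on('message', async (data) => {
    // Each frame is two concatenated CBOR objects: a header, then a body
    const [header, body] = cbor.decodeAllSync(data, decodeOptions);

    if (header.op === 1 && header.t === '#commit') {
        // body.blocks contains a CAR file with the records
        const car = await CarReader.fromBytes(body.blocks);

        for (const op of body.ops) {
            console.log('Operation:', op.action, op.path);

            // op.cid is null for deletes
            if (op.cid) {
                const block = await car.get(op.cid);
                if (!block) continue; // block not included in this frame

                const record = cbor.decodeFirstSync(block.bytes, decodeOptions);
                console.log('Record:', record);
            }
        }
    }
});

ws.on('error', (error) => {
    console.error('WebSocket error:', error);
});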

Cursor-Based Recovery

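Both the Firehose and Jetstream support cursors for recovering missed events. Jetstream's cursor is the time_us value of the last event you processed, while the raw Firehose uses a sequence number instead. The example below uses Jetstream: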

const WebSocket = require('ws');

class ResilientConsumer {
    constructor() {
        this.lastCursor = null;
    }

    connect() {
        let url = 'wss://jetstream2.us-east.bsky.network/subscribe';

        // Resume from last position if available
        if (this.lastCursor) {
            url += `?cursor=${this.lastCursor}`;
        }

        this.ws = new WebSocket(url);

        this.ws.on('message', (data) => {
            const event = JSON.parse(data.toString());

            // Save cursor for recovery
            this.lastCursor = event.time_us;

            this.processEvent(event);
        });

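        this.ws.on('close', () => {
            // Reconnect with cursor to resume where we left off
            setTimeout(() => this.connect(), 5000);
        });

        // Without an 'error' listener, a failed connection would crash the process
        this.ws.on('error', (error) => {
            console.error('WebSocket error:', error);
        });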
    }

    processEvent(event) {
        // Your processing logic
    }
}
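
The consumer above keeps its cursor in memory, so a process restart loses it. Here is a minimal sketch of saving the cursor to disk. The function names and the one-number-per-file format are assumptions made for illustration; a database row would work just as well.

```javascript
const fs = require('node:fs');

// Load the last saved cursor (a time_us value), or null if none exists yet.
function loadCursor(path) {
    try {
        const value = Number(fs.readFileSync(path, 'utf8').trim());
        return Number.isSafeInteger(value) && value > 0 ? value : null;
    } catch (err) {
        if (err.code === 'ENOENT') return null; // first run: no file yet
        throw err;
    }
}

// Write to a temp file, then rename, so a crash mid-write
// cannot leave a half-written cursor behind.
function saveCursor(path, timeUs) {
    const tmp = `${path}.tmp`;
    fs.writeFileSync(tmp, String(timeUs));
    fs.renameSync(tmp, path);
}
```

In practice you would call saveCursor on a timer (say, every few seconds) rather than on every event, and set lastCursor from loadCursor at startup. Since that means a restart can replay a few seconds of events, it helps to make your processing idempotent.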

Jetstream Instances

Bluesky provides multiple Jetstream instances for redundancy:

  • wss://jetstream1.us-east.bsky.network/subscribe
  • wss://jetstream2.us-east.bsky.network/subscribe
  • wss://jetstream1.us-west.bsky.network/subscribe
  • wss://jetstream2.us-west.bsky.network/subscribe

Choose based on your geographic location, or implement failover between instances.
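
As a rough sketch of failover, you could rotate through the instances listed above each time a connection drops. createInstanceRotation is a hypothetical helper; the simple round-robin order is my assumption, and you may prefer to try the geographically closest instances first.

```javascript
const JETSTREAM_INSTANCES = [
    'wss://jetstream1.us-east.bsky.network/subscribe',
    'wss://jetstream2.us-east.bsky.network/subscribe',
    'wss://jetstream1.us-west.bsky.network/subscribe',
    'wss://jetstream2.us-west.bsky.network/subscribe',
];

// Returns a function that yields the next instance URL on each call,
// wrapping around to the first after the last.
function createInstanceRotation(instances = JETSTREAM_INSTANCES, startIndex = 0) {
    let index = startIndex % instances.length;

    return function nextInstance() {
        const url = instances[index];
        index = (index + 1) % instances.length;
        return url;
    };
}
```

Call nextInstance() inside your close handler to pick the URL for the next connection attempt. Because Jetstream cursors are timestamps rather than per-server offsets, resuming on a different instance should work in principle, though rewinding the cursor by a few seconds is a sensible precaution.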

Best Practices

Performance

  • Filter at the source - Use wantedCollections to reduce bandwidth
  • Buffer events - Don't process synchronously; queue for async handling
  • Batch database writes - Insert in batches rather than per-event
  • Monitor memory - High-volume streams can consume significant RAM
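
The buffering and batching advice can be sketched as a small helper that collects items and flushes them by size or by time, whichever comes first. BatchBuffer and its default limits are hypothetical; flushFn stands in for whatever bulk insert your database offers.

```javascript
// Collects items and hands them to flushFn in batches.
// A batch is flushed when it reaches maxSize, or maxWaitMs after
// its first item arrived, whichever happens first.
class BatchBuffer {
    constructor(flushFn, { maxSize = 500, maxWaitMs = 1000 } = {}) {
        this.flushFn = flushFn;
        this.maxSize = maxSize;
        this.maxWaitMs = maxWaitMs;
        this.items = [];
        this.timer = null;
    }

    add(item) {
        this.items.push(item);
        if (this.items.length >= this.maxSize) {
            this.flush();
        } else if (!this.timer) {
            // Make sure a partial batch is still written during quiet periods
            this.timer = setTimeout(() => this.flush(), this.maxWaitMs);
        }
    }

    flush() {
        if (this.timer) {
            clearTimeout(this.timer);
            this.timer = null;
        }
        if (this.items.length === 0) return;

        const batch = this.items;
        this.items = [];
        this.flushFn(batch);
    }
}
```

In a message handler you would call buffer.add(post) instead of writing to the database directly, and call buffer.flush() once more on shutdown so the last partial batch is not lost.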

Reliability

  • Implement reconnection - Connections will drop; auto-reconnect is essential
  • Persist cursors - Save cursor to disk/database for crash recovery
  • Handle backpressure - If you can't keep up, events will be dropped
  • Use multiple instances - Failover to another Jetstream if one is down

Rate Considerations

  • Peak volume - Thousands of events per second during busy periods
  • Bandwidth - Raw Firehose uses 4-8 GB/hour; Jetstream is more efficient
  • Storage growth - Plan for significant data if archiving all events

Use Cases at Skyscraper

Here's how we use the Firehose for Skyscraper's features:

Trending Hashtags

We consume all posts, extract hashtags, and calculate trending scores based on volume and velocity.
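
To give a feel for the idea, here is a simplified sketch of hashtag extraction and scoring. This is not Skyscraper's production algorithm: the regex-based extraction and the scoring formula (volume multiplied by growth over the previous time window) are assumptions chosen for illustration. Post records also list hashtags in their rich-text facets, which is more reliable than scanning the text.

```javascript
// Extract unique lowercase hashtags from post text.
// Matches '#' followed by Unicode letters, digits, or underscores.
function extractHashtags(text) {
    const matches = text.match(/#[\p{L}\p{N}_]+/gu) || [];
    return [...new Set(matches.map((tag) => tag.slice(1).toLowerCase()))];
}

// Score each tag from two Maps of tag -> count, one per time window.
// volume   = count in the current window
// velocity = growth relative to the previous window
function trendingScores(currentCounts, previousCounts) {
    const scores = [];
    for (const [tag, current] of currentCounts) {
        const previous = previousCounts.get(tag) || 0;
        const velocity = current / (previous + 1); // +1 avoids division by zero
        scores.push({ tag, score: current * velocity });
    }
    // Highest score first
    return scores.sort((a, b) => b.score - a.score);
}
```

With this formula a tag that jumps from 0 to 4 mentions outranks one that holds steady at around 10, which is the "velocity" effect: sudden growth matters more than raw volume.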

Keyword Alerts

When users configure alerts, we filter the stream for matching keywords and send push notifications.

Analytics

Aggregate statistics about posting patterns, popular topics, and network growth.

Frequently Asked Questions

What is the Bluesky Firehose?

A real-time stream of all public events on Bluesky - posts, likes, follows, and more - delivered via WebSocket.

What is Jetstream?

Bluesky's developer-friendly API for consuming the Firehose. It provides JSON format and filtering capabilities.

Should I use Firehose or Jetstream?

Use Jetstream for most applications. Use the raw Firehose only if you need cryptographic verification or full repository sync.

How much data does the Firehose produce?

The raw Firehose produces 4-8 GB per hour. Jetstream with filtering uses significantly less bandwidth.

Resources