Data Catalog
58 analyst-ready ClickHouse tables for crypto data analysis
58 tables across 11 categories, available via a readonly ClickHouse account.
The Surf Data Catalog documents all tables available to agents and analysts. Tables are organized into categories and exposed as ClickHouse views in the agent and curated databases.
Connection
| Setting | Value |
|---|---|
| Protocol | ClickHouse HTTP interface (port 8443, TLS) |
| User | agent (readonly) |
| Databases | agent (enriched tables), curated (analytics views) |
| Max execution time | 120s |
| Max memory | 16GB |
| Max result rows | 1M |
Quick Start
-- Top DEX protocols by volume yesterday
SELECT project, sum(amount_usd) AS volume_usd
FROM agent.ethereum_dex_trades
WHERE block_date = today() - 1
GROUP BY project
ORDER BY volume_usd DESC
LIMIT 10💡 Tip
Always filter on block_date first — it is the partition key for most tables and enables ClickHouse to skip entire partitions.
Categories
| Category | Tables | Description |
|---|---|---|
| DEX Trades | 6 | Enriched swap events across 6 chains with token symbols, USD amounts, and protocol attribution. |
| Prices | 3 | Daily token prices combining CoinGecko data and DEX VWAP calculations. |
| Token Metadata | 1 | ERC-20 contract registry mapping addresses to symbols, names, and decimals. |
| Transfers | 5 | ERC-20 and native token transfers enriched with symbols and USD pricing. |
| TVL / Fees / Yields | 6 | Protocol-level daily aggregates for total value locked, fee revenue, and yield rates. |
| Lending & Staking | 2 | Protocol-level daily activity aggregates for lending and staking protocols. |
| Bridges | 1 | Cross-chain bridge volume aggregated daily. |
| Prediction Markets | 17 | Polymarket, Kalshi, and cross-platform prediction market data — trades, market details, prices, volume, and open interest. |
| Curated Analytics | 11 | Dashboard-ready pre-computed analytics — whale trades, hot markets, leaderboards, and daily reports. |
| Hyperliquid | 3 | Perpetual futures market data, funding rates, and contract metadata from Hyperliquid L1. |
| Chain Metrics | 3 | Daily chain-level aggregates including transaction counts, active addresses, gas usage, and contract deployments. |
Query Best Practices
- Always filter on block_date — partition pruning is the #1 performance lever
- Use LIMIT liberally during exploration; remove for final aggregation
- Pre-filter in CTEs before JOINing — ClickHouse puts RIGHT side into hash table
- Use IN (SELECT ...) instead of JOIN when you only need existence checks
- Avoid SELECT * on billion-row tables — project only needed columns
- For ReplacingMergeTree tables, always use FINAL to get deduplicated rows
- Date range first, then entity filters — this matches partition layout
- Use toStartOfWeek/toStartOfMonth for time-series rollups, not GROUP BY toString(block_date)
- When counting unique addresses, uniq() is faster than count(DISTINCT) and equally accurate
- For cross-table analysis, filter each table independently in CTEs before joining
- Handle NULLs explicitly — use ifNull(col, default) or coalesce(). ClickHouse 3-valued logic silently drops NULL rows in WHERE filters.
- FINAL syntax: FROM table_name alias FINAL (not FROM table_name FINAL alias). Place FINAL after the alias.
Entity Linking
Common join patterns across tables:
token_address -> symbol: JOIN agent.ethereum_erc20 ON contract_addresscondition_id -> market_question: JOIN agent.polymarket_market_details ON condition_idmarket_ticker -> category: Use curated.kalshi_daily_categorized (pre-joined)coin -> max_leverage: JOIN agent.hyperliquid_perp_meta ON coinproject + version -> TVL: JOIN agent.ethereum_tvl_daily ON (project, version, block_date)
Data Freshness
| Pipeline | Schedule | Typical Lag |
|---|---|---|
| Ethereum raw (BQ) | 12:00 UTC | ~1 day |
| Base raw (cryo) | Sensor (6h) | ~6 hours |
| Arbitrum raw (BQ) | 12:30 UTC | ~1 day |
| Polygon raw (BQ) | Every 30 min | ~1 hour |
| TRON raw (BQ) | 13:30 UTC | ~1 day |
| Bitcoin raw (BQ) | 14:00 UTC | ~1 day |
| HyperEVM (S3) | 13:00 UTC | ~1 day |
| CoinGecko prices | Hourly | ~1 hour |
| RPC snapshots | Daily | ~1 day |
| Kalshi API | Every 6h | ~6 hours |
| Polymarket (BQ) | 03:00 UTC | ~1 day |
| dbt models | Post-ingestion | +30 min |
Check freshness for any table:
SELECT max(block_date) FROM agent.<table_name>