Data Catalog

58 analyst-ready ClickHouse tables for crypto data analysis

58 tables across 11 categories, available via a readonly ClickHouse account.

The Surf Data Catalog documents all tables available to agents and analysts. Tables are organized into categories and exposed as ClickHouse views in the agent and curated databases.

Connection

SettingValue
ProtocolClickHouse HTTP interface (port 8443, TLS)
Useragent (readonly)
Databasesagent (enriched tables), curated (analytics views)
Max execution time120s
Max memory16GB
Max result rows1M

Quick Start

-- Top DEX protocols by volume yesterday
SELECT project, sum(amount_usd) AS volume_usd
FROM agent.ethereum_dex_trades
WHERE block_date = today() - 1
GROUP BY project
ORDER BY volume_usd DESC
LIMIT 10
💡 Tip

Always filter on block_date first — it is the partition key for most tables and enables ClickHouse to skip entire partitions.

Categories

CategoryTablesDescription
DEX Trades6Enriched swap events across 6 chains with token symbols, USD amounts, and protocol attribution.
Prices3Daily token prices combining CoinGecko data and DEX VWAP calculations.
Token Metadata1ERC-20 contract registry mapping addresses to symbols, names, and decimals.
Transfers5ERC-20 and native token transfers enriched with symbols and USD pricing.
TVL / Fees / Yields6Protocol-level daily aggregates for total value locked, fee revenue, and yield rates.
Lending & Staking2Protocol-level daily activity aggregates for lending and staking protocols.
Bridges1Cross-chain bridge volume aggregated daily.
Prediction Markets17Polymarket, Kalshi, and cross-platform prediction market data — trades, market details, prices, volume, and open interest.
Curated Analytics11Dashboard-ready pre-computed analytics — whale trades, hot markets, leaderboards, and daily reports.
Hyperliquid3Perpetual futures market data, funding rates, and contract metadata from Hyperliquid L1.
Chain Metrics3Daily chain-level aggregates including transaction counts, active addresses, gas usage, and contract deployments.

Query Best Practices

  1. Always filter on block_date — partition pruning is the #1 performance lever
  2. Use LIMIT liberally during exploration; remove for final aggregation
  3. Pre-filter in CTEs before JOINing — ClickHouse puts RIGHT side into hash table
  4. Use IN (SELECT ...) instead of JOIN when you only need existence checks
  5. Avoid SELECT * on billion-row tables — project only needed columns
  6. For ReplacingMergeTree tables, always use FINAL to get deduplicated rows
  7. Date range first, then entity filters — this matches partition layout
  8. Use toStartOfWeek/toStartOfMonth for time-series rollups, not GROUP BY toString(block_date)
  9. When counting unique addresses, uniq() is faster than count(DISTINCT) and equally accurate
  10. For cross-table analysis, filter each table independently in CTEs before joining
  11. Handle NULLs explicitly — use ifNull(col, default) or coalesce(). ClickHouse 3-valued logic silently drops NULL rows in WHERE filters.
  12. FINAL syntax: FROM table_name alias FINAL (not FROM table_name FINAL alias). Place FINAL after the alias.

Entity Linking

Common join patterns across tables:

  • token_address -> symbol: JOIN agent.ethereum_erc20 ON contract_address
  • condition_id -> market_question: JOIN agent.polymarket_market_details ON condition_id
  • market_ticker -> category: Use curated.kalshi_daily_categorized (pre-joined)
  • coin -> max_leverage: JOIN agent.hyperliquid_perp_meta ON coin
  • project + version -> TVL: JOIN agent.ethereum_tvl_daily ON (project, version, block_date)

Data Freshness

PipelineScheduleTypical Lag
Ethereum raw (BQ)12:00 UTC~1 day
Base raw (cryo)Sensor (6h)~6 hours
Arbitrum raw (BQ)12:30 UTC~1 day
Polygon raw (BQ)Every 30 min~1 hour
TRON raw (BQ)13:30 UTC~1 day
Bitcoin raw (BQ)14:00 UTC~1 day
HyperEVM (S3)13:00 UTC~1 day
CoinGecko pricesHourly~1 hour
RPC snapshotsDaily~1 day
Kalshi APIEvery 6h~6 hours
Polymarket (BQ)03:00 UTC~1 day
dbt modelsPost-ingestion+30 min

Check freshness for any table:

SELECT max(block_date) FROM agent.<table_name>