An RIA list built from scratch is one of those projects that looks straightforward and turns out to be a six-week engineering effort if you do it well. We've built and rebuilt this pipeline many times. Here's the playbook with the gotchas at every step.
Step 1: Decide What "RIA" Means for Your List
Sounds obvious. It's not. "RIA" can mean:
- SEC-registered investment advisers only (firms over $100M AUM)
- SEC + state-registered (every regulated investment adviser)
- Pure-play fee-only RIAs (excluding hybrids with BD affiliation)
- Hybrids (firms with both RIA and BD affiliation)
- Specific advisor reps (IARs), not the firms themselves
Each definition produces a different list. The SEC-only universe is about 15,000 firms. SEC + state is closer to 35,000 firms. IARs (individual reps) number in the hundreds of thousands. We cover the channel distinctions in detail here.
Most outreach use cases want SEC + selected state-registered firms over a minimum AUM threshold. Pick before you start.
Step 2: Pull the Universe from IAPD
IAPD (Investment Adviser Public Disclosure) is the SEC's free public portal for investment adviser data. The full dataset is downloadable as Form ADV bulk files. You want:
- The Investment Adviser Firm Summary file (firm-level data)
- The Investment Adviser Representative file (IAR data)
- The Form ADV Part 1A and Schedule A/B files (filing details)
These are large XML or CSV files. Plan for a few gigabytes uncompressed. Parse with care. The schema changes occasionally and historical filings have different formats than current ones.
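For files this size, a streaming parse avoids loading everything into memory. The following is a minimal sketch using only the Python standard library. The element name `Firm` and the attributes `crd`, `name`, and `aum` are placeholders we made up for illustration, not the real bulk-file schema, so map them to whatever the file you download actually contains.

```python
import io
import xml.etree.ElementTree as ET

def iter_firms(xml_source, firm_tag="Firm"):
    """Yield one dict per firm element without holding the whole file in memory.

    firm_tag and the attribute names are hypothetical; adjust to the real schema.
    """
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != firm_tag:
            continue
        raw_aum = elem.get("aum")
        yield {
            "crd": elem.get("crd"),
            "name": elem.get("name"),
            # Keep a missing AUM as None instead of silently treating it as 0.
            "aum": float(raw_aum) if raw_aum else None,
        }
        elem.clear()  # release the element we just processed

# Tiny in-memory stand-in for a multi-gigabyte file.
SAMPLE = b"""<Firms>
  <Firm crd="1001" name="Alpha Advisors" aum="350000000"/>
  <Firm crd="1002" name="Beta Wealth"/>
</Firms>"""

firms = list(iter_firms(io.BytesIO(SAMPLE)))
```

Keeping missing values as `None` matters later: a firm with no reported AUM should fall out of an AUM filter explicitly, not look like a $0 firm.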
The IAPD bulk download covers SEC-registered firms thoroughly. State-only firms file with their state regulator and not always with the SEC. If you need state-registered firms, you have to pull from each state separately, which is its own project.
Step 3: Cross-Reference FINRA BrokerCheck
If your list includes broker-dealer activity or hybrid registration data, FINRA BrokerCheck is the parallel source. BrokerCheck doesn't have a clean bulk download (their API is restrictive and rate-limited), but you can pull individual CRD records and aggregate. For hybrid firm identification, you need both ADV and BD registration data linked by CRD number.
FINRA also publishes the BD Database (Form BD filings), which is the broker-dealer counterpart to Form ADV. If you need any BD data, this is your source.
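The CRD link described above can be sketched as a simple set lookup. This assumes you already have ADV firm records and a list of CRD numbers with an active BD registration; the record shape and the `normalize_crd` helper are illustrative assumptions. It only catches firms dually registered under one CRD number. Hybrids structured as separate affiliated entities would need the Item 7 affiliation data as well.

```python
def normalize_crd(value):
    """Normalize a CRD to a string with no padding (hypothetical helper)."""
    return str(value).strip().lstrip("0")

def classify_by_crd(adv_firms, bd_crds):
    """Label each ADV firm 'hybrid' if its CRD also appears in the BD data."""
    bd_set = {normalize_crd(c) for c in bd_crds}
    labels = {}
    for firm in adv_firms:
        crd = normalize_crd(firm["crd"])
        labels[crd] = "hybrid" if crd in bd_set else "ria_only"
    return labels

# CRDs often arrive as zero-padded strings in one source and ints in another.
labels = classify_by_crd(
    adv_firms=[{"crd": "001001"}, {"crd": 1002}],
    bd_crds=[1001],
)
```

Normalizing before the join is the part that tends to bite: the same CRD can arrive as an integer in one source and a zero-padded string in another.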
Step 4: Pull Form ADV Detail
Once you have the firm universe, pull the rich Form ADV detail for each firm. The key items:
- Item 1 (identifying info, address, contact email)
- Item 5 (employees, AUM, client types, services)
- Item 7 (affiliations)
- Item 9 (custody)
- Item 11 (disciplinary)
- Schedule A (direct owners and officers)
- Schedule B (indirect owners)
Schedule A is the goldmine. It names the principal officers, their ownership range (reported as a coded band, not an exact percentage), their CRD number, and the date they acquired their title or status. This is how you identify the controlling principal at every firm.
Step 5: Apply Your Criteria
With the full firm-and-officer dataset loaded, apply your filters:
- AUM band (Item 5.F)
- Client type composition (Item 5.D)
- Affiliation status (Item 7)
- Custody model (Item 9)
- Disciplinary history (Item 11)
- Geographic location (Item 1)
- Firm age and growth trajectory (year-over-year AUM)
A typical campaign filter (SEC-registered RIAs, $250M to $1B AUM, no BD affiliation, principal office in target states, no significant disciplinary disclosures) cuts the 15,000-firm SEC universe to 1,500 to 2,500 firms.
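Here is one way the typical campaign filter above might look in code. The `Firm` fields are our own illustrative names for values derived from the ADV items listed in this step, and "no significant disciplinary disclosures" is simplified to "zero disclosures", which is stricter than most teams would actually want.

```python
from dataclasses import dataclass

@dataclass
class Firm:
    crd: str
    sec_registered: bool
    aum: float           # regulatory AUM in dollars (Item 5.F)
    bd_affiliated: bool  # derived from Item 7
    state: str           # principal office state (Item 1)
    disclosures: int     # count of Item 11 disclosures

def matches_campaign(firm, target_states, min_aum=250e6, max_aum=1e9):
    """Return True if the firm passes every criterion of the example filter."""
    return (
        firm.sec_registered
        and min_aum <= firm.aum <= max_aum
        and not firm.bd_affiliated
        and firm.state in target_states
        and firm.disclosures == 0  # simplification of "no significant disclosures"
    )

firms = [
    Firm("1001", True, 400e6, False, "TX", 0),  # passes
    Firm("1002", True, 400e6, True, "TX", 0),   # BD-affiliated
    Firm("1003", True, 2e9, False, "FL", 0),    # above the AUM band
    Firm("1004", True, 300e6, False, "NY", 0),  # outside target states
]
shortlist = [f.crd for f in firms if matches_campaign(f, {"TX", "FL"})]
```

Keeping each criterion as a separate, named condition makes it easy to log which rule removed a firm, which is useful when the filtered count looks wrong.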
Step 6: Identify the Right Person
Schedule A gives you the principal officers. The right outreach target depends on your product:
- For tech and platform decisions: the COO, Director of Operations, or CTO. At small firms this is the controlling principal.
- For investment-related decisions: the CIO or head of investments.
- For compliance products: the CCO.
- For M&A or financing: the controlling principal or CEO.
Form ADV lists officers but doesn't classify their functional role beyond title. You often need to enrich with LinkedIn or firm-website data to identify the right functional contact.
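A first pass at functional classification can be done on the title string alone, before any LinkedIn or website enrichment. The sketch below uses keyword patterns we chose for illustration; the role names and patterns are assumptions and will miss unusual titles, so treat "unknown" as a signal to enrich rather than a final answer.

```python
import re

# (role, pattern) pairs; patterns are illustrative, not exhaustive.
ROLE_PATTERNS = [
    ("compliance", r"\b(cco|chief compliance)\b"),
    ("investments", r"\b(cio|chief investment|head of investments)\b"),
    ("operations_tech",
     r"\b(coo|cto|chief operating|chief technology|director of operations)\b"),
    ("principal",
     r"\b(ceo|chief executive|managing (member|partner)|founder|owner)\b"),
]

def classify_title(title):
    """Return every functional role the title suggests, or ['unknown']."""
    lowered = title.lower()
    roles = [role for role, pattern in ROLE_PATTERNS
             if re.search(pattern, lowered)]
    return roles or ["unknown"]

examples = {
    title: classify_title(title)
    for title in ["CEO/CCO", "Chief Investment Officer", "Office Manager"]
}
```

Returning a list rather than a single role reflects the point made above: at small firms one person often holds several functions.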
Step 7: Verified Contact Enrichment
This is the step most teams underestimate. The ADV gives you the firm address and the compliance email. It doesn't give you the principal's direct email or mobile. Enrichment options:
- Generic B2B databases (ZoomInfo, Apollo, Cognism) for email + phone, with deliverability checking
- LinkedIn-derived data via dedicated tools (RocketReach, ContactOut)
- Manual research on firm websites to infer email patterns from named officers, followed by validation
- Specialist advisor-data vendors (us, AdvizorPro) that bundle enrichment
Whatever your method, run email validation before sending. We use a multi-step verification including MX lookup, SMTP check, and recent-activity validation. Expect 80% to 90% deliverability for well-enriched principal-level contacts. Anything lower and your bounces will start to damage your sending domain.
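A multi-step verification can be structured as an ordered cascade that stops at the first failing check. This sketch is not our production verifier: only the syntax check is real, and the MX step is a stub backed by a hard-coded domain set so the example runs offline. In practice you would swap in a DNS lookup and an SMTP probe, or a validation vendor.

```python
import re

EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$")

def syntax_ok(email):
    """Cheap structural check; run first so later checks can assume an '@'."""
    return bool(EMAIL_RE.match(email))

# Stand-in for a real MX lookup (assumption: replace with DNS in practice).
DOMAINS_WITH_MX = {"example.com"}

def stub_mx_ok(email):
    return email.split("@")[1].lower() in DOMAINS_WITH_MX

def validate(email, checks):
    """Run checks in order; return the name of the first failure, or None."""
    for name, check in checks:
        if not check(email):
            return name
    return None

def deliverable_share(emails, checks):
    """Fraction of emails that pass every check."""
    if not emails:
        return 0.0
    passed = sum(1 for e in emails if validate(e, checks) is None)
    return passed / len(emails)

CHECKS = [("syntax", syntax_ok), ("mx", stub_mx_ok)]
batch = ["jane@example.com", "not-an-email", "bob@nowhere.invalid"]
share = deliverable_share(batch, CHECKS)
```

Ordering the checks from cheapest to most expensive keeps network calls off obviously malformed addresses, and recording which check failed tells you whether the problem is enrichment quality or stale domains.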
Step 8: Validate and Sample-Test
Before launching a full campaign, manually check 25 to 50 records from your list. Confirm:
- The firm is who you think it is (correct channel, AUM band, services)
- The named person is currently with the firm (LinkedIn check)
- The email format matches the firm's other public email patterns
- The phone, if direct-dial, is reachable
If your manual-check error rate is over 10%, fix the pipeline before going broader. This step is non-negotiable.
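The sample test above reduces to a small amount of arithmetic. The sketch below draws a reproducible random sample and applies the 10% threshold; the function names and the fixed seed are our own choices for illustration.

```python
import random

def draw_sample(records, size=50, seed=0):
    """Pick a reproducible random sample for manual review."""
    rng = random.Random(seed)
    return rng.sample(records, min(size, len(records)))

def error_rate(review_results):
    """review_results: list of bools, True where the reviewer found a problem."""
    if not review_results:
        raise ValueError("no reviewed records")
    return sum(review_results) / len(review_results)

def ready_to_launch(review_results, max_error_rate=0.10):
    """Pass only if the observed error rate is not over the threshold."""
    return error_rate(review_results) <= max_error_rate

sample = draw_sample(list(range(1000)), size=50)
five_errors = [True] * 5 + [False] * 45  # 5 of 50 records had a problem
six_errors = [True] * 6 + [False] * 44   # 6 of 50 records had a problem
```

Sampling randomly rather than checking the first 50 rows matters, because sorted exports tend to put the largest and cleanest firms at the top.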
Step 9: Refresh Cadence
RIA data decays. Principals leave, firms merge, AUM changes, registrations lapse. Annual ADV amendments rebuild much of the universe every spring. Plan a refresh cadence:
- Quarterly: rebuild AUM and firm-status data
- Monthly: re-check email deliverability on stale records
- Per-campaign: validate the actual list you're about to email
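The cadence above can be enforced per record by storing a last-refreshed date for each task. This is a sketch under assumptions: the field names (`last_firm_data`, `last_email_check`) are hypothetical, and "quarterly" and "monthly" are approximated as 90 and 30 days.

```python
from datetime import date, timedelta

# Approximate intervals for the quarterly and monthly refreshes.
REFRESH_INTERVALS = {
    "firm_data": timedelta(days=90),
    "email_check": timedelta(days=30),
}

def due_tasks(record, today):
    """Return the refresh tasks that are due for one record."""
    due = []
    for task, interval in REFRESH_INTERVALS.items():
        last = record.get(f"last_{task}")
        # A record that was never refreshed is always due.
        if last is None or today - last >= interval:
            due.append(task)
    return due

record = {
    "last_firm_data": date(2024, 1, 1),
    "last_email_check": date(2024, 3, 1),
}
mid_march = due_tasks(record, date(2024, 3, 15))
early_april = due_tasks(record, date(2024, 4, 5))
```

The per-campaign validation is deliberately left out of the schedule: it runs against the specific list you are about to send, regardless of how recently the records were refreshed.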
When to Outsource
If you're a wealthtech or fintech company running quarterly campaigns and don't have dedicated data engineering, outsource. The pipeline above takes 4 to 8 engineering weeks to build, plus ongoing maintenance. We've already built it and can deliver a list in 3 to 5 business days.
If you have data engineering and run continuous campaigns, build it in-house. The economics flip above about 200,000 records per year of usage.
Pipeline Architecture Patterns
Three patterns we've seen work for sustained RIA-list pipelines.
The batch refresh pattern. Pull bulk IAPD and ADV data monthly. Normalize and de-duplicate. Apply current filter criteria. Run enrichment against a contact-data vendor. Output a fresh CSV monthly. Simple, reliable, and good enough for most quarterly outreach motions.
The streaming pattern. Subscribe to ADV amendment feeds (if your vendor exposes them) or poll the IAPD search API. Detect changes (new registrations, amendment filings, AUM changes) and trigger enrichment workflows for changed firms only. Higher complexity but better data freshness. Used by teams that monitor watchlists.
The hybrid pattern. Maintain a base list refreshed quarterly. Run streaming detection on a watchlist subset (top 500 to 2,000 target accounts). Trigger high-value alerts on watchlist firms while keeping the broader universe fresh enough for cold outreach. This is what most enterprise wealthtech teams settle on.
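The change detection at the heart of the streaming and hybrid patterns can be sketched as a diff between two snapshots keyed by CRD. The snapshot shape, the event names, and the 10% AUM-change threshold are all assumptions for illustration; a real pipeline would track more fields.

```python
def diff_snapshots(previous, current, aum_change_threshold=0.10):
    """Compare two {crd: {"aum": float}} snapshots and list change events."""
    events = []
    for crd, firm in current.items():
        old = previous.get(crd)
        if old is None:
            events.append((crd, "new_registration"))
            continue
        old_aum, new_aum = old["aum"], firm["aum"]
        # Skip the ratio when the old AUM is missing or zero.
        if old_aum and abs(new_aum - old_aum) / old_aum >= aum_change_threshold:
            events.append((crd, "aum_change"))
    # Firms present before but absent now (sorted for stable output).
    for crd in sorted(previous.keys() - current.keys()):
        events.append((crd, "dropped"))
    return events

def watchlist_alerts(events, watchlist):
    """Keep only events for firms on the watchlist (hybrid pattern)."""
    return [event for event in events if event[0] in watchlist]

previous = {"1": {"aum": 100.0}, "2": {"aum": 200.0}, "3": {"aum": 50.0}}
current = {"1": {"aum": 125.0}, "2": {"aum": 205.0}, "4": {"aum": 10.0}}
events = diff_snapshots(previous, current)
alerts = watchlist_alerts(events, watchlist={"1", "3"})
```

In the hybrid pattern, the full event list would feed the quarterly base refresh, while only the watchlist subset triggers enrichment and alerts right away.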
The Cost of Bad Data
If you skip any of the steps above, the cost shows up downstream. Specific failure modes we've seen:
- Sending to outdated emails: a 25% bounce rate damages your sender reputation and depresses inbox placement for 4 to 6 weeks afterward.
- Targeting wrong-channel firms: pitching RIA software to bank-trust-department contacts wastes AE cycles and frustrates good prospects who get the wrong pitch.
- Mis-classified AUM: pitching enterprise pricing to a $200M emerging RIA kills the deal in the first call.
- Wrong functional contact: hitting the office manager instead of the CIO loses an entire outreach cycle and resets the firm's interest level.
The pipeline above seems heavy because it is. Each step exists because we've watched teams skip it and lose campaigns. The good news is that once the pipeline is built, the marginal cost per list is low.
Tools That Help
If you're building in-house, a few tools are worth knowing.
For ADV bulk parsing: Python with lxml handles the XML well, and Polars handles the CSV files. The SEC's documentation is sparse but the schema is documented in the Form ADV instructions.
For FINRA CRD lookups: there's no public API suited to bulk pulls. Most teams scrape (carefully, with rate limiting) or buy access through a data vendor.
For contact enrichment: Apollo, ZoomInfo, and Cognism for breadth. NeverBounce, ZeroBounce, or Kickbox for email validation. We use a multi-vendor cascade with our own quality scoring on top.
For LinkedIn-derived data: RocketReach, ContactOut, and Hunter cover most needs. Direct LinkedIn scraping violates ToS and is risky.
For ADV-amendment monitoring: the SEC's EDGAR full-text RSS feed plus a parser. Or use a vendor that's already done this.
If you'd rather not build the pipeline, the standard build above is roughly what we run for every list. Send us the criteria and the list shows up in 3 to 5 business days.
Sanity-Checking Your Pipeline
A few sanity checks to run on any RIA list, whether you built it or bought it.
Total count. SEC-registered RIAs number around 15,000. SEC plus state-registered is closer to 35,000. If your full-universe count is wildly different, you have a problem.
AUM distribution. The distribution is heavily right-skewed. Most firms are sub-$500M. A handful of firms are over $50B. Median is well below mean. If your distribution looks symmetric or bell-shaped, something's wrong with your filter or your data.
Geographic distribution. California, New York, Texas, Florida, and Illinois dominate. If your list over-indexes another state by an unexpected margin, check for filter errors.
Channel mix. Pure RIAs are roughly 30% of advisors but a much higher share of total AUM. Hybrid firms are growing. If your channel mix doesn't reflect your filter intent, recheck.
These sanity checks catch most pipeline bugs before they hit a campaign. They take 10 minutes to run and save weeks of recovery from a bad campaign.
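Two of these checks, total count and AUM skew, are easy to automate. In the sketch below, the expected count comes from the figures quoted above, while the 25% tolerance and the median-below-mean test are our own simplifications of "wildly different" and "right-skewed".

```python
from statistics import mean, median

def sanity_report(aums, expected_count, tolerance=0.25):
    """Flag an implausible universe size or a distribution that is not right-skewed."""
    if not aums:
        raise ValueError("empty AUM list")
    count_ok = abs(len(aums) - expected_count) <= tolerance * expected_count
    # In a right-skewed distribution, a few huge firms pull the mean above the median.
    right_skewed = median(aums) < mean(aums)
    return {"count_ok": count_ok, "right_skewed": right_skewed}

# Toy universe: mostly small firms plus one very large one.
aums = [100e6] * 8 + [500e6, 60e9]
report = sanity_report(aums, expected_count=10)
flat_report = sanity_report([100e6] * 10, expected_count=100)
```

For a full SEC-registered universe you would call this with `expected_count=15000`. Geographic and channel-mix checks follow the same shape: compute a share and compare it with a range you expect.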