Amassed Insights #3: LegAI & Compliance
The Compliance Playbook for AI-Era Data Deals (Plus, a Legal Thriller You Won’t Believe)

New Alternative Data Compliance Concerns
The fast-moving alternative data industry is always at the forefront of new technologies and business models, creating significant risks for the end users. The legal & compliance departments of data buyers, particularly those in highly regulated industries like asset management, have an obligation to stay abreast of how these new technologies are being leveraged by their data suppliers.
Modernizing your Compliance Procedures in the Age of AI
All of the new artificial intelligence (AI) tools & data processes, including but not limited to the Large Language Models (LLMs) that power Generative AI (GenAI), used by data providers and data consumers alike, have created a host of novel legal & compliance concerns. If they haven't already, data buyers need to add an AI section to their compliance Due Diligence Questionnaires (DDQs), and data providers need to integrate AI into their workflows thoughtfully.
With these concerns top of mind for many industry participants, we've started to see standards being published by certain industry experts. The Alternative Data Council of FISD published a widely used (but typically heavily modified by savvy legal teams) standard alt data DDQ a few years ago. Recently, they published a major update to this industry-standard DDQ, adding several questions geared towards AI and an overview of compliance considerations if you're using GenAI. Additionally, the alternative data legal experts at the Lowenstein Sandler law firm published a summary of their concerns and recommendations in the Hedge Fund Law Report. Below is my abridged summary of these materials:
- Diligence your internal AI systems similarly to externally sourced ones.
- You'll need to decide what your overall risk tolerance is, as the laws on fair use and human authorship of GenAI output are still in flux.
- For example, what training data is being used by the model, and have the appropriate permissions been secured to train on that data?
- Additional AI questions in an alt data DDQ should cover:
- The AI tools used and the nature of how/why they are used, such as whether you're using the public version of the tool or you've internally adapted the tool.
- The sources of Personally Identifiable Information (PII), Material Non-Public Information (MNPI), confidential information or intellectual property (IP) that are being disclosed to the AI tools.
- How the output of the AI systems is being reviewed, if at all.
- License agreements for AI systems should cover:
- IP rights in the software underlying the AI system itself.
- Confidentiality and use protections for the data buyers' confidential information, which may include restrictions on the use of such information to train the AI system.
- Enhanced indemnification obligations for IP infringement.
How are you adapting your AI compliance frameworks?
Let's Discuss This over Breakfast Tomorrow, April 3!
Speaking of the brilliant legal minds at Lowenstein Sandler, I'm co-hosting the inaugural edition of the Alternative Data Breakfast Series with them tomorrow, April 3 at 8:30 AM in NYC. Boris Liberman from Lowenstein Sandler, Jason Koulouras from Bridgewater Associates, Michael Recce from AlphaROC and I will explore strategies for asset managers to accelerate alpha generation using alternative data, and for data providers to enhance data monetization within the asset management sector, including a brief discussion of the above alternative data AI concerns. To be added to invitations like these, share your info below.
Law-Related AI News
- In the previous edition of this newsletter, I referenced OpenAI whistleblower Suchir Balaji's detailed exploration of where the line should be drawn on fair use for training GenAI models. Shockingly, three days later, Suchir was found dead in his apartment, and the authorities quickly closed the case and ruled it a suicide. Since then, his mother has been on a media tour trying to bring attention to the suspicious circumstances of her son's death. This story seems far from over and it's very hard to know who or what to trust here, but the most recent, unverified viral thread about the case points to a possible drugging, a second bullet and a botched autopsy.
"🚨 New forensic findings have just been released in the death of Suchir Balaji — a whistleblower against OpenAI. Police ruled it a suicide. But the evidence just uncovered tells a very different story: drugging, a possible second bullet, and a botched autopsy. 🧵1/ pic.twitter.com/lywn24trtL"
— James Li (@5149jamesli) March 26, 2025
- Cloudflare is luring web-scraping bots into an ‘AI Labyrinth’ by Wes Davis
- Cloudflare, one of the biggest internet infrastructure companies in the world, has announced AI Labyrinth, a new tool to fight web-crawling bots that scrape sites for AI training data without permission. The company says in a blog post that when it detects “inappropriate bot behavior,” the free, opt-in tool lures crawlers down a path of links to AI-generated decoy pages that “slow down, confuse, and waste the resources” of those acting in bad faith.
- U.S. Copyright Office says AI generated content can be copyrighted — if a human contributes to or edits it by Carl Franzen
"The report reaffirms the longstanding principle that copyright applies only to human creativity. While AI can serve as a tool in the creative process, its outputs are not copyrightable unless a human author has exercised sufficient creative control.
The Copyright Office outlines three key scenarios where AI-generated material can apply for, and receive, an official certificate of copyright from the office:
- When human-authored content is incorporated into the AI output.
- When a human significantly modifies or arranges the AI-generated material.
- When the human contribution is sufficiently expressive and creative."
Be Very Careful when Monetizing Location Data
- US company's geolocation data transaction draws intense scrutiny in Germany by The Record
- A data marketplace, Datarade, connected a US-based data broker, Datastream Group, to a German journalist, and the broker provided 3.6 billion individual geolocation data points derived from millions of Germans' smartphone apps as a "free sample which was intended to serve as a preview for a monthly subscription." My one thought here: maybe offering a wide-ranging free trial to someone when you don't know their intentions isn't such a great idea?
- Anyone Can Buy Data Tracking US Soldiers and Spies to Nuclear Vaults and Brothels in Germany by Dhruv Mehrotra, Dell Cameron
"'The unregulated data broker industry poses a clear threat to national security,' says Ron Wyden, a US senator from Oregon with more than 20 years overseeing intelligence work. 'It is outrageous that American data brokers are selling location data collected from thousands of brave members of the armed forces who serve in harms’ way around the world.'"
Additional Data-Related Compliance News
- No, hashing still doesn't make your data anonymous by Federal Trade Commission
"companies often claim that hashing allows them to preserve user privacy...This logic is as old as it is flawed – hashes aren’t 'anonymous' and can still be used to identify users, and their misuse can lead to harm. Companies should not act or claim as if hashing personal information renders it anonymized. FTC staff will remain vigilant to ensure companies are following the law and take action when the privacy claims they make are deceptive."
- Fintech Giant Finastra Investigating Data Breach by Krebs on Security
- FCC’s Net Neutrality Rules Struck Down by Federal Appeals Court by The New York Times
- The trade-secrets fight between 2 of alternative data's biggest names is getting nasty by Bradley Saacks of Business Insider
- The Jefferies-owned M Science and the Carlyle-backed Yipit are suing each other, putting the alternative-data industry on high alert. I posted about this before.
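
To make the FTC's point about hashing (in the item above) a bit more concrete, here is a minimal Python sketch of why an unsalted hash of a phone number may not be anonymous. Everything specific in it is my own illustrative assumption rather than anything taken from the FTC post: the fictional phone number, the fixed prefix, the choice of SHA-256, and the deliberately tiny 10,000-candidate search space. The idea is simply that anyone who knows the format of the identifier can hash every plausible value and look the "anonymized" record up.

```python
import hashlib


def sha256_hex(value: str) -> str:
    """Return the hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def build_lookup(candidates):
    """Precompute a hash -> original value table for every candidate identifier."""
    return {sha256_hex(c): c for c in candidates}


# Hypothetical scenario: a dataset ships "anonymized" phone numbers as unsalted
# SHA-256 hashes. The number below is fictional and purely illustrative.
hashed_record = sha256_hex("+12125550142")

# Someone who knows the number format can enumerate the plausible values.
# Only the last four digits are enumerated here to keep the demo fast.
candidates = (f"+1212555{n:04d}" for n in range(10_000))
lookup = build_lookup(candidates)

# The hash is matched back to the original identifier.
recovered = lookup.get(hashed_record)
print(recovered)
```

In this toy setup the original number is recovered by a plain dictionary lookup. A real identifier space is larger, but the same enumeration approach is plausible for any identifier with a limited or guessable set of values, which is worth keeping in mind when a data provider's DDQ response describes hashed PII as "anonymized."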

Data Providers & Products
If any of the following data providers piques your interest for any reason, respond and I'll share additional materials & directly introduce you, if necessary.
New Data Providers
- Quant Data Bureau
- Main Data Category: Quantitative Investment Trading Algorithms
- Brief: Provides curated datasets for quantitative investment research, including risk modeling, company classification, and market volatility insights.
- Covariance.ai
- Main Data Category: Alternative Data-Driven Investment Research
- Brief: Uses advanced machine learning to transform external data into actionable KPIs and insights for enterprises and financial institutions. Their platform provides accurate data analytics and forecasts, helping investors make confident decisions.
- Rogo
- Main Data Category: Financial Data-Driven Investment Research
- Brief: Platform for financial institutions that integrates public and private financial data sources to automate research, analysis, and workflow tasks, with access to sources including SEC filings and market research.
- HarmoniQ Insights
- Main Data Category: Quantitative Investment Trading Algorithms
- Brief: Offers quantitative analytical tools and processes for fundamental investment analysis, focusing on integrating quantitative methods into investment strategies.
- Zeki Research
- Main Data Category: Hiring & Employment
- Brief: Collects and analyzes talent data across deep-tech sectors, including AI, quantum computing, and life sciences, focusing on over 10M scientists and engineers. The platform evaluates expertise within 40,000+ organizations, providing insights to guide research and decisions.
New or Updated Data Products
- Archedata Joins Forces with Eagle Alpha and 3D Innovations to Deliver a Secure, Curated Services Library for Kairos, its Procurement and Vendor Management Solution by Archedata
- Neudata releases GPT-powered AI tool, enabling faster data scouting by Neudata
- Datos, A Semrush Company, Expands E-commerce Data Access with Grips Intelligence by Datos Team
- FactSet and MT Newswires Partner to Power Financial News and AI Capabilities by Michael Mayhew
- FactSet and MT Newswires recently announced an expanded partnership to enhance FactSet’s AI-driven financial news capabilities. In addition, MT Newswires will adopt FactSet’s workstations and content into its news production facilities. The collaboration between these two firms will integrate MT Newswires' content into FactSet's AI delivery channels.
- Similarweb & S&P Global Partner on Credit Risk by Or Offer
- Similarweb & S&P Global Market Intelligence partner to integrate digital footprint data into credit risk predictions...some measure of alternative data maturity is finally coming to the risk side of investing!
- Introducing The TRUF.Network - Redefining Economic Data by Truflation
- The TRUF.Network is a revolutionary decentralized platform that provides real-time economic data, enabling new levels of transparency and accuracy. TRUF.Network forms part of the broader Truflation ecosystem, aiming to redefine how economic data is sourced, managed, utilized, and validated.
M&A + Funding
- Carbon Arc Emerges from Stealth Mode and Raises $56 mln in Seed Capital by Michael Mayhew
- A huge round for an alt data veteran with a unique business model.
- Rogo raises $18M Series A from Khosla Ventures to Build Wall Street’s First AI Analyst by Rogo
- Which of the many AI analyst platforms will win?
- Forian Acquires Kyber Data Science to Enhance Data Analytics Capabilities by Forian Inc.
- Castine's Acquisition of ResearchPool - A Conversation with the Decision Makers by Castine Conversations
- The combined company can now offer research discovery and due diligence, research management, budgeting, and payments end to end, helping existing CSA users, and those returning to CSAs in the UK and the EU, benefit from a one-stop approach.
- ExtractAlpha Acquires ESG Analytics, Appoints Qayyum Rajan as Head of ExtractAlpha Labs by Julie Craig
- ExtractAlpha, a leading provider of alternative data and analytics solutions, announced the acquisition of ESG Analytics, an innovative platform offering real-time, actionable insights into environmental, social, and governance (ESG) metrics.
13 Data Providers Featured
I was recently digging into a data request from a client related to some niche commodities, and it led me down a rabbit hole of the Fastmarkets "data rollup", as Matt Ober would call it. In June 2023, Fastmarkets demerged from another rollup, Delinian (fka Euromoney), and is now backed by Astorg, a private equity firm. Fastmarkets has numerous subsidiaries, most of which are discrete data providers/products, each typically focused on a different niche commodity. In my proprietary data provider profile taxonomy, commodity market data is categorized as Financial Market Data. I've profiled the following data providers owned by Fastmarkets:
- Fastmarkets
- Brief: Provides live prices, news, market data feeds, and research and analysis reports on base and precious metal markets.
- Fastmarkets RISI
- Brief: Provides forest product price reporting and market analytical insights.
- The Jacobsen Publishing
- Brief: Specializes in commodity price reporting and forecasting, offering data on animal fats & oils, biodiesel, grain & feed ingredients, hemp, hides & leather, organic & non-GMO products, sausage casings, and vegetable oils.
- Census Commodity Data
- Brief: Provides independent price assessments and market analysis for global biofuel and related certificate markets.
- Foex Indexes
- Brief: Provides audited price indices for pulp, paper, recovered paper, wood-based bioenergy, and timber.
- HMR (Hardwood Market Report)
- Brief: Provides pricing and market commentary on hardwood lumber and the lumber products industry.
- Metal Bulletin
- Brief: Provides leading pricing intelligence, including independent industry benchmarks, for the metal and mining industry.
- RISI-UMPAPER.com
- Brief: The leading local pulp and paper market information provider.
Recent News, Blogs & Podcasts
106 Featured Articles, including:
- Lessons from Cybersyn by Alex Izydorczyk
- My main question from my Cyber Sins post concerning the shutdown of Cybersyn has been answered:
"Cybersyn had the opportunity to return significant capital to our investors based on our financial position and by selling the assets of the public domain business to Snowflake.
These realizations led us to shut down for the purpose of returning maximum capital. In another version of this story, a more reckless (or courageous) founder might have pushed ahead, spending the remaining capital on acquiring the necessary data, even with uncertain prospects for fast revenue growth."
- Data-Hungry Investors Dive Deep for Economic Clues by Owen Tucker-Smith
- A Comprehensive Guide to Navigating Alternative Data by Stacie Rabinowitz
- The Invisible Curve of Power by The Terminalist
Alternative Data Market Sizing Research
- How big is the alternative data market for investment managers? by Neudata
- Neudata estimates the alternative data market could be as large as $40bn by 2030.
- Hedge funds are planning a 'budget boom' in 2025 for the datasets that cost them millions a year by Bradley Saacks
- The above Neudata report finds that 95% of asset managers are either increasing their budgets next year or keeping the same data spend.
- The report notes the average manager only uses 20 datasets out of the more than 3,500 providers in the industry.
- Alternative data spending has become crucial for funds to keep pace with rivals.
- Alternative Data Poised for More Growth in the Age of AI: The 2024 Lowenstein Sandler Alternative Data Report by Lowenstein Sandler
- Eagle Alpha’s 4th Annual Alternative Data Report 2024 by Sarah Morrissey
- This annual alternative data report includes granular industry trends, data sourcing, leading case studies, and technical product updates.
- Alternative Data Research Report 2025 by Exabel
- Alternative Data Market - Industry Analysis and Forecast 2030 by Maximize Market Research
- Web Traffic and Clickstream Data Market by Mikheil Shengelia
- It is essential for businesses to understand customer journeys at various touchpoints, and clickstream data analytics has become an important tool for evaluating online behavior, with the market size expected to reach 1.3 billion USD by 2030.
Podcasts
- The Battlefin Episode by The Alternative Data Podcast
- BattleFin, one of the three main alternative data event series, acquired Exabel, a plug-and-play alternative data backtesting & analytics platform. To me, this merger seems like a natural combination of organizations in need of each other, and it will be interesting to track the pace of data acquisition in Exabel's platform post-acquisition.
- Episode 303: AggKnowledge’s Dan Entrup by WatersTechnology
- My favorite parts and why:
- 25:56 Why Dan doesn’t like the term “alternative data”
- I also don't like the term, but it's such a well-known umbrella term nowadays that it feels like a necessary evil.
- 27:53 Data is still sold, not bought
- Preach!
- The Norges Bank (NBIM) Episode by The Alternative Data Podcast
- Mark Thompson works in Primary Research for NBIM, the Norwegian sovereign wealth fund with $1.7tn (!) under management. The Marks discuss many of the unique challenges and opportunities that come with managing such a large amount of money, and the ways that the relatively lean team at NBIM ingests and uses alternative data.
13 Upcoming Events Featured, including:
- Amass Insights & Lowenstein Sandler's Alternative Data Breakfast Series #1: Accelerating Alpha Generation & Data Monetization starts on Apr 3, 2025 in New York. RSVP here to attend.
- R Finance's Open Source Quantitative Finance 2025 starts on Apr 11, 2025 in Chicago.
- Neudata's Neudata New York Summer Data Summit 2025 starts on May 8, 2025 in New York.
- BattleFin's BattleFin Alternative Data Experience 2025 starts on May 14, 2025 in New York.
- The University of Chicago's Market Microstructure, Quantitative Trading, High Frequency and Large Data 2025 starts on May 15, 2025 in Chicago.
- Talking Hedge's Talking Hedge Austin 2025 starts on May 20, 2025 in Austin.
- Forecasting Financial Markets Association's Forecasting Financial Markets 2025 starts on May 21, 2025 in Venice.
My Alternative Data Events
As evidenced by the first Alternative Data Breakfast I announced above, I'm starting to expand beyond just monthly informal happy hours!


Bonus: Great Book about Boxing History
I just finished reading my father-in-law's awesome book, Sparring with Smokin' Joe, about the life of Joe Frazier and his phenom son, Marvis. It covers several months in the gym, on the road, and in verbal tussles with Joe Frazier in 1980. Along the way, it takes a personal look at his many epic fights, his legendary battles with Muhammad Ali, and the impact of racial and cultural upheavals on the legacies of both fighters.
The book was just released in paperback in February after a highly praised 2021 hardcover release. Shameless plug: use the promo code 'RLFANDF30' for 30% off when you buy from the publisher at the link above.