Amassed Insights #3: LegAI & Compliance

The Compliance Playbook for AI-Era Data Deals (Plus, a Legal Thriller You Won’t Believe)

Lowenstein Sandler & Amass Insights are starting an Alternative Data Breakfast Series

New Alternative Data Compliance Concerns

The fast-moving alternative data industry is always at the forefront of new technologies and business models, which can create significant risks for end users. The legal & compliance departments of data buyers, particularly those in highly regulated industries like asset management, have an obligation to stay abreast of how their data suppliers are leveraging these new technologies.

Modernizing your Compliance Procedures in the Age of AI

The new artificial intelligence (AI) tools & data processes used by data providers and data consumers alike, including but not limited to the Large Language Models (LLMs) that power Generative AI (GenAI), have created a host of novel legal & compliance concerns. If they haven't already, data buyers need to add an AI section to their compliance Due Diligence Questionnaires (DDQs), and data providers need to integrate AI into their workflows thoughtfully.

With these concerns top of mind for many industry participants, we've started to see standards published by industry experts. A few years ago, the Alternative Data Council of FISD published a standard alt data DDQ that is widely used (but typically heavily modified by savvy legal teams). Recently, the Council published a major update to this industry-standard DDQ, adding several AI-focused questions and an overview of compliance considerations for anyone using GenAI. Additionally, the alternative data legal experts at the law firm Lowenstein Sandler published a summary of their concerns and recommendations in the Hedge Fund Law Report. Below is my abridged summary of these materials:

  • Diligence your internal AI systems similarly to externally sourced ones.
    • You'll need to decide on your overall risk tolerance, as the law on fair use and human authorship of GenAI output is still in flux.
    • For example, what training data does the model use, and has its developer secured the appropriate permissions to train on that data?
  • Additional AI questions in an alt data DDQ should cover:
    • The AI tools used and how and why they are used, such as whether you're using the public version of a tool or a version you've adapted internally.
    • The sources of Personally Identifiable Information (PII), Material Non-Public Information (MNPI), confidential information or intellectual property (IP) that are being disclosed to the AI tools.
    • How the output of the AI systems is being reviewed, if at all.
  • License agreements for AI systems should cover:
    • IP rights in the software underlying the AI system itself.
    • Confidentiality and use protections for the data buyers' confidential information, which may include restrictions on the use of such information to train the AI system.
    • Enhanced indemnification obligations for IP infringement.

How are you adapting your AI compliance frameworks?

Let's Discuss This over Breakfast Tomorrow, April 3!

Speaking of the brilliant legal minds at Lowenstein Sandler, I'm co-hosting the inaugural edition of the Alternative Data Breakfast Series with them tomorrow, April 3, at 8:30 AM in NYC. Boris Liberman from Lowenstein Sandler, Jason Koulouras from Bridgewater Associates, Michael Recce from AlphaROC, and I will explore strategies for asset managers to accelerate alpha generation using alternative data, and for data providers to enhance data monetization within the asset management sector, including a brief discussion of the alternative data AI concerns above. To be added to invitations like these, share your info below.

  • In the previous edition of this newsletter, I referenced OpenAI whistleblower Suchir Balaji's detailed exploration of where the line should be drawn on fair use for training GenAI models. Shockingly, three days later, Suchir was found dead in his apartment, and the authorities quickly closed the case and ruled it a suicide. Since then, his mother has been on a media tour trying to draw attention to the suspicious circumstances of her son's death. This story seems far from over, and it's very hard to know who or what to trust here, but the most recent unverified viral thread about the case points to a possible drugging, a second bullet, and a botched autopsy.

"The report reaffirms the longstanding principle that copyright applies only to human creativity. While AI can serve as a tool in the creative process, its outputs are not copyrightable unless a human author has exercised sufficient creative control.
The Copyright Office outlines three key scenarios where AI-generated material can apply for, and receive, an official certificate of copyright from the office:

  • When human-authored content is incorporated into the AI output.
  • When a human significantly modifies or arranges the AI-generated material.
  • When the human contribution is sufficiently expressive and creative."

Be Very Careful when Monetizing Location Data

"'The unregulated data broker industry poses a clear threat to national security,' says Ron Wyden, a US senator from Oregon with more than 20 years overseeing intelligence work. 'It is outrageous that American data brokers are selling location data collected from thousands of brave members of the armed forces who serve in harms’ way around the world.'"

"companies often claim that hashing allows them to preserve user privacy...This logic is as old as it is flawed – hashes aren’t 'anonymous' and can still be used to identify users, and their misuse can lead to harm. Companies should not act or claim as if hashing personal information renders it anonymized. FTC staff will remain vigilant to ensure companies are following the law and take action when the privacy claims they make are deceptive."
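To see why the FTC calls this logic flawed, here is a minimal Python sketch of a simple dictionary attack. It assumes a pipeline that SHA-256 hashes a lowercased email address (a common pattern, but an assumption here, not something the FTC quote specifies); the `sha256_hex` helper and all email addresses are hypothetical. Because the same input always produces the same hash, anyone holding a list of candidate identifiers can hash them the same way and match them back to the "anonymized" record.

```python
import hashlib


def sha256_hex(value: str) -> str:
    """Hypothetical helper: normalize an identifier, then SHA-256 hash it."""
    return hashlib.sha256(value.strip().lower().encode("utf-8")).hexdigest()


# Hypothetical record shared by a data seller, with the email "anonymized" by hashing.
shared_record = {"user": sha256_hex("jane.doe@example.com"), "visits": 12}

# A buyer (or attacker) with a list of candidate emails, e.g. a marketing list,
# hashes each candidate the same way and builds a reverse lookup table.
candidate_emails = ["john@example.com", "Jane.Doe@example.com", "alice@example.com"]
lookup = {sha256_hex(email): email for email in candidate_emails}

# The deterministic hash acts as a join key, so the "anonymous" user is re-identified.
reidentified = lookup.get(shared_record["user"])
print(reidentified)  # Jane.Doe@example.com
```

The same determinism also lets a hashed identifier serve as a persistent key for linking records across datasets, even when no one ever reverses it.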

Data Providers & Products

If any of the following data providers piques your interest for any reason, respond and I'll share additional materials & directly introduce you, if necessary.

New Data Providers

  • Quant Data Bureau
    • Main Data Category: Quantitative Investment Trading Algorithms
    • Brief: Provides curated datasets for quantitative investment research, including risk modeling, company classification, and market volatility insights.
  • Covariance.ai
    • Main Data Category: Alternative Data-Driven Investment Research
    • Brief: Uses advanced machine learning to transform external data into actionable KPIs and insights for enterprises and financial institutions. Their platform provides accurate data analytics and forecasts, helping investors make confident decisions.
  • Rogo
    • Main Data Category: Financial Data-Driven Investment Research
    • Brief: Platform for financial institutions that integrates public and private financial data sources to automate research, analysis, and workflow tasks, with access to sources including SEC filings and market research.
  • HarmoniQ Insights
    • Main Data Category: Quantitative Investment Trading Algorithms
    • Brief: Offers quantitative analytical tools and processes for fundamental investment analysis, focusing on integrating quantitative methods into investment strategies.
  • Zeki Research
    • Main Data Category: Hiring & Employment
    • Brief: Collects and analyzes talent data across deep-tech sectors, including AI, quantum computing, and life sciences, focusing on over 10M scientists and engineers. The platform evaluates expertise within 40,000+ organizations, providing insights to guide research and decisions.

New or Updated Data Products

M&A + Funding

I was recently digging into a client's data request related to some niche commodities, and it led me down a rabbit hole into the Fastmarkets "data rollup", as Matt Ober would call it. In June 2023, Fastmarkets demerged from another rollup, Delinian (fka Euromoney), and is now backed by the private equity firm Astorg. Fastmarkets has numerous subsidiaries, most of which are discrete data providers/products, each typically focused on a different niche commodity. In my proprietary data provider profile taxonomy, commodity market data is categorized as Financial Market Data. I've profiled the following data providers owned by Fastmarkets:

  • Fastmarkets
    • Brief: Provides live prices, news, market data feeds, and research and analysis reports on base and precious metal markets.
  • Fastmarkets RISI
    • Brief: Provides forest product price reporting and market analytical insights.
  • The Jacobsen Publishing
    • Brief: Specializes in commodity price reporting and forecasting, offering data on animal fats & oils, biodiesel, grain & feed ingredients, hemp, hides & leather, organic & non-GMO products, sausage casings, and vegetable oils.
  • Census Commodity Data
    • Brief: Provides independent price assessments and market analysis for global biofuel and related certificate markets.
  • Foex Indexes
    • Brief: Provides audited price indices for pulp, paper, recovered paper, wood-based bioenergy, and timber.
  • HMR (Harwood Market Report)
    • Brief: Provides pricing and market commentary on hardwood lumber and the lumber products industry.
  • Metal Bulletin
    • Brief: Provides leading pricing intelligence, including independent industry benchmarks, for the metal and mining industry.
  • RISI-UMPAPER.com
    • Brief: The leading local pulp and paper market information provider.

Recent News, Blogs & Podcasts

"Cybersyn had the opportunity to return significant capital to our investors based on our financial position and by selling the assets of the public domain business to Snowflake.

These realizations led us to shut down for the purpose of returning maximum capital. In another version of this story, a more reckless (or courageous) founder might have pushed ahead, spending the remaining capital on acquiring the necessary data, even with uncertain prospects for fast revenue growth."

Alternative Data Market Sizing Research

Podcasts

  • The Battlefin Episode by The Alternative Data Podcast
    • BattleFin, one of the three main alternative data event series, acquired Exabel, a plug-and-play alternative data backtesting & analytics platform. To me, this merger seems like a natural combination of two organizations that need each other, and it will be interesting to track how quickly new data is added to Exabel's platform post-acquisition.
  • Episode 303: AggKnowledge’s Dan Entrup by WatersTechnology
    • My favorite parts and why:
      • 25:56 Why Dan doesn’t like the term “alternative data”
        • I also don't like the term, but it's such a well-known umbrella term nowadays that it feels like a necessary evil.
      • 27:53 Data is still sold, not bought
        • Preach!
  • The Norges Bank (NBIM) Episode by The Alternative Data Podcast
    • Mark Thompson works in Primary Research for NBIM, the Norwegian sovereign wealth fund with $1.7tn (!) under management. The Marks discuss many of the unique challenges and opportunities that come with managing such a large amount of money, and the ways that NBIM's relatively lean team ingests and uses alternative data.

My Alternative Data Events

As evidenced by the first Alternative Data Breakfast I announced above, I'm starting to expand beyond just monthly informal happy hours!

Alt Data Happy Hauer
Credit to Dan Entrup for his creativity with my name
A happy hour with alternative data industry participants.

Bonus: Great Book about Boxing History

I just finished reading my father-in-law's awesome book, Sparring with Smokin' Joe, about the life of Joe Frazier and his phenom son, Marvis. It covers several months in the gym, on the road, and in verbal tussles with Joe Frazier in 1980. Along the way, it takes a personal look at his many epic fights, his legendary battles with Muhammad Ali, and the impact of racial and cultural upheavals on the legacies of both fighters.

The book was just released in paperback in February after a highly-praised 2021 hardcover release. Shameless plug: use the promo code 'RLFANDF30' for 30% off when you buy from the publisher at the link above.