The Changing Landscape of Data in the AI Era

Data has always been the lifeblood of business. In the era of artificial intelligence, it is all the more important. The quality and quantity of data directly determines the performance of AI models and the insights they generate. As AI becomes more integrated across industries, data pipelines and architectures are rapidly evolving to meet these new challenges. In this article, we’ll delve into how data companies are adapting and the opportunities that lie ahead.

The Primacy of Data in AI

In the past, there’s been a great focus on the quality and complexity of the application itself. Today, the spotlight has shifted to the AI model and its underlying data. With techniques like fine-tuning, the same augmented dataset can be used to power multiple applications. This change has highlighted the vital importance of data in an AI-driven landscape.

The stakes are also higher now. In domains like healthcare and legal, where AI is being applied to critical decisions, high-quality data is non-negotiable. The human experts who provide feedback and label data need to be highly skilled. This underscores a shift in focus from the quantity, to the quality of data.

Why Data Quality Matters

So why exactly is high-quality data so crucial for AI?

Fine-tuning models entirely depends on the data used. Higher quality data also enables the use of smaller, more efficient models when 1) there are fewer errors in the data, 2) fewer features are needed to explain underlying patterns, and 3) overfitting to noise is less likely. This leads to improved performance, faster training times, and significant savings in compute costs. 

As a result, data scientists and engineers are becoming the backbone of any AI-powered organization. Their skills in collecting, preparing, and managing high-quality data are indispensable. Additionally, the work in data vectorization is transforming how we interact with unstructured data. By converting PDFs, images, audio, or video into embeddings, we can ask more nuanced questions and find relevant information faster. Vector databases, while still evolving, will play a central role in this new paradigm.

The Evolution of Data Pipelines

So how are data workflows and pipelines changing in response to these new realities? Here are some of the key trends we’re seeing:

1. New Data Types & Modalities: Unstructured data like text, images, audio, and video are gaining prominence. New modalities are emerging, powered by techniques like embeddings and vector search.

2. Automation & Augmentation: Data prep and analysis are becoming increasingly automated, with tools like co-pilot assistants and auto-generated code. This is freeing up data scientists to focus on higher-value tasks. An example of a company in this domain is Keebo, which automatically optimizes Snowflake parameters and queries to save engineers time and money. Another, Typedef, backed by Pear, is working to monitor DAGs within a multi-step datapipeline to develop auto-tuning and optimizations that enhance the efficiency of pipeline execution.

3. Scalable Data Infrastructure: Data infrastructure is becoming faster and more efficient to handle the demands of AI. Vector databases are enabling fast retrieval and inference on massive datasets. “RAG-as-a-Service” is emerging to connect an organization’s proprietary data with large language models. For example, EdgeDB is an open source database that enhances PostgreSQL with hierarchical queries that are more efficient in handling AI applications with tree-like structures. 

4. Collaborative Data Science: “Notebooks 2.0” — in addition to advanced platforms such as Jupyter Notebook, Databricks, and Tableau — are enabling more collaborative features and tools for accessible data science. Techniques like text-to-SQL and semantic analytics are democratizing data exploration.

5. Data Quality & Labeling: With the primacy of data quality, companies are investing heavily in data labeling and annotation. A whole ecosystem of services is emerging to provide high-quality, human-in-the-loop data labeling at scale. Synthetic data generation using AI is also being used to augment datasets. An notable example is Osmos, backed by Pear, automatically catching errors in data and removing the need for manual data cleanup.

6. Feature Stores & ML Ops: Dedicated feature stores are becoming critical to serve up-to-date features across AI models. ML Ops platforms are being adopted to manage the full lifecycle from data prep to model deployment. Versioning, metadata management, and reproducibility are key focus areas. Examples from the many existing ML Ops platforms include Vertex AI, DataRobot, and Valohai.

7. Real-time & Streaming Data: As AI is applied to real-time use cases like fraud detection and recommendations, streaming data pipelines using tools like Kafka and Flink are gaining adoption. Online machine learning is enabling models to continuously learn from new data. A prominent example is Aklivity, a company that makes a business’s real-time data streams connected to Kafka available through APIs.

8. Governance & Privacy: With AI models becoming more complex and opaque, there is a heightened focus on responsible AI governance. Tools for data lineage, bias detection, and explainable AI are being developed. Techniques like federated learning and encrypted computation are enabling privacy-preserving AI.

The Opportunity Ahead

The rapid evolution of data handling in the AI era presents a massive opportunity for data and analytics companies. Organizations will need expert guidance and cutting-edge tools to harness the full power of their data assets for AI.

From automated data prep and quality assurance, to scalable vector databases and real-time feature stores, to secure collaboration and governance frameworks — there are opportunities at every layer of the modern AI data stack. By combining deep domain expertise with AI-native architectures, data companies can position themselves for outsized impact and growth in the years ahead.

As AI continues to advance and permeate every industry, the companies that can enable high-quality, responsible, and scalable data pipelines will be the picks and shovels of this new gold rush. The future belongs to those who can tame the data beast and unleash the full potential of AI. Are you building in this space? Let’s talk.

Acknowledgements

I’d like to thank Avika Patel and Libby Meshorer for their contributions to this post. Visit our AI page to read more about the 16 spaces we’re excited about.

Navigating Security: Opportunities and Challenges in the AI Era

The new generation of AI poses both huge opportunities and risks. While AI can open up a world of new capabilities, it also presents new security concerns, that require our focus at three levels:

  1. LLMs Reliability
  2. Security risks posed by GenAI
  3. AI-powered security solutions

In this article, we will explore the new reality with AI in each of these areas.

LLMs Reliability

LLMs have demonstrated remarkable capabilities in natural language processing, but their reliability remains a concern. In 2023, researchers from Stanford University discovered that GPT-4 could generate highly persuasive disinformation articles that were difficult to distinguish from real news, highlighting ongoing reliability challenges with state-of-the-art language models. We see a growing number of companies addressing issues like biased outputs, hallucinations, and the potential for generating harmful content, through improved AI infrastructure like RAG and mechanisms to test and validate LLMs.

There are a few different methods enabling to test and promise the reliability of the LLMs in our usage, among them:

1. Red teaming: Actively trying to find ways to make the model produce undesirable outputs, to identify weaknesses. Companies like Anthropic, Halcyon, and Adept AI are using red teaming in their AI development processes. Startups like Haize Labs, Robust Intelligence, and Scale AI have products helping provide solutions to handle Red Teaming.

2. Oversight sampling: Regularly sampling outputs and having them reviewed by human raters for quality and safety issues. Startups like Fiddler AI provide solutions with humans in the loop to check for quality issues

3. Runtime monitoring: Analyzing model inputs and outputs in real-time to detect potential reliability issues. Guardrails AI, Galileo and TrueEra are building infrastructure for runtime monitoring of LLMs in production.

    Security risks posed by GenAI

    Generative AI introduces new security challenges. For example, deepfakes can produce highly realistic fake content, potentially leading to misinformation and fraud, and cybercriminals are leveraging tools like Midjourney and Stable Diffusion to generate synthetic media for social engineering attacks. Additionally, GenAI systems are especially vulnerable to unique threats:

    • Prompt injection attacks attempt to craft inputs that cause the model to ignore instructions and do something else, like disclosing sensitive data. In 2023, prompt injection was used to get GPT-4 to reveal training data.
    • Jailbreaking aiming to bypassing safeguards and performing unintended actions, like creating harmful outputs or giving illegal instructions.
    • Model integrity erosion happening when an AI system’s performance deteriorates over time due to adversarial or unforeseen inputs, corrupting the effectiveness of of AI driven security measurements.

    Companies like Flow Security (now CrowdStrike), Sentra, Protect.ai and HiddenLayers are developing solutions to protect data and models from unauthorized access and malicious activity. Cohere, Anthropic, OpenAI, Adept and others are exploring new AI architectures that are more resistant to prompt attacks and jailbreaking attempts.

    AI powered security solutions

    Alongside these risks, AI offers an outstanding opportunity to address security challenges like never before. AI-driven tools can enable high-quality observability, accurate detection, clear prioritization, and accelerated mitigation. Overall, AI can transform the way we handle and mitigate security risks today. Here are a few areas with significant potential for improvement in the new era of AI:

    1. Anomaly and Threat Detection: LLMs are designed to analyze large amounts of data and identify anomalies more efficiently than humans. This enables the creation of better alert systems that detect fraud and security threats effectively and in real-time. For example, Noname uses AI to identify data leakage, suspicious behavior, and API security attacks, as they happen. Redcoat AI and Abnormal Security identify phishing attempts and malicious email activity.

    2. Penetration Testing: AI-powered tools can be used not only to test the reliability of LLMs, as demonstrated by companies like Adept and Haize Labs, but also to perform intensive and sophisticated penetration testing on systems to identify vulnerabilities, as offered by XBOW. AI-driven simulations of cyber-attacks on networks and systems can test their resilience and train cybersecurity professionals in incident handling, regularly improving security layers.

    3. Code as language: While GenAI-generated code can raise concerns among tech leaders due to potential vulnerabilities and logical flaws, LLMs can read code as if it were natural language, enabling the identification of problematic code blocks and configurations that may lead to security breaches. AI-powered tools and security-oriented LLMs like Snyk DeepCode and Codacy embody the ‘shift left’ philosophy, focusing on identifying and resolving security issues early in the development lifecycle rather than addressing them post-deployment.

    4. Vulnerability Management and prioritization: AI can be highly effective in assisting engineers with intelligent security vulnerability management and prioritization. By creating a unified source of truth for existing security vulnerabilities and analyzing factors such as severity and potential impact, platforms like Wiz and Balbix offer advanced vulnerability management and prioritization, resulting in decreased engineers confusion and response time.

    5. Incident Response and auto mitigation: AI can significantly enhance incident response and automated mitigation, like applying security patches and updates to vulnerable software components in real-time, reducing the time required to contain and resolve security breaches. Solutions like Palo Alto’s Cortex XSOAR, also leverage AI to speed up incident investigation, automate and expedite tedious, manual SOC work, towards the vision of mitigating risks with minimal human intervention.

      While the breakthroughs in AI present exciting opportunities, it is crucial to address the risks related to AI models and security. By focusing on the reliability of LLMs, understanding the new threats posed by GenAI, and leveraging AI to enhance security measures, we can navigate this new era of technology safely. Are you building in this space? Let’s talk.

      Acknowledgements: I would like to thank Pear AI Fellow Libby Meshorer for significant contributions to this post, as well as Avika Patel and Pear team members Lucy Lee Duckworth, Arash Afrakhteh, and Jill Puente for contributing.

      Human Simulating AI Agents are Closest Approach to AGI, Unlocking Value in our Everyday Lives

      AI Agents: Turning Imagination into Reality

      After sharing our GenAI thesis and the 16 fields we are particularly excited about in AI, we’re delving into one of the most interesting trends in the era of capabilities unlocked by GenAI: AI agents, and, specifically, human simulating agents.

      AI agents are software entities that can perceive their environment, make decisions, and take actions independently without human supervision. They are the closest approximation we have today to the vision of Artificial General Intelligence (AGI), replicating a broad range of human cognitive abilities, including perception, reasoning, planning, learning, and adapting to new situations without dedicated preparation.

      AI Agents Add Value Across the Board

      The AI agents space can be divided into several key subspaces and categories, some of which we have already started investing in. One agent can have multiple overlapping functionalities and several interfaces simultaneously:

      AI Agents with Specific Functionalities

      • Human-simulating agents: These agents simulate human behavior and thoughts based on a given profile or need. These transformative approaches can be applied to various use cases, from companions that fight loneliness, to agents with demographic traits, beliefs, and preferences that help predict trends like election results or consumer adoption more quickly, cheaply, and accurately. See more in our section on Human Simulating Agents below.
      • Assistant agents: These agents can handle a wide range of tasks, from running an online search, or playing a song upon voice request (e.g. Siri and Alexa), to scheduling a doctor’s appointment and maintaining a detailed to-do list, as Ohai.ai and Martin do.
      • Automation agents: These agents connect two machines, identify gaps and repetitive processes, and create automated workflows to enhance them. For example, Orby AI, backed by Pear, automates workflows for enterprises in minutes, improving team efficiency by over 60%.
      • General purpose agents: While very broad and challenging to build effectively, these agents can complete multiple tasks across different verticals and complexity levels. Key players in this space include BabyAGI, AgentGPT and Personal AI, which offer solutions for various problems through their agents.
      • Vertical agents: Highly skilled in specific fields such as healthcare, marketing, gaming, legal, and more, these agents excel in their respective areas. A promising code generation agent and engineering co-pilot is Devin, built by Cognition AI, which enables engineers to plan and execute complex engineering tasks.
      • Embodied agents: Embodied agents operate on edge devices such as IoT devices, robots, and drones, as navan.ai does. They enable smarter and more independent decision-making and actions, aiming to improve smart home systems, agricultural practices, defense, and more.
      • Collaborative agents: These agents can interact effectively with other agents, learn from each other, and cooperate to improve their performance and execution over time. Relevance AI is building such solutions to increase team productivity.

      Agents with Defined Interfaces

      • Human Facing Agents: interact with humans through text, audio, video, etc., just like Google Astra does to help people navigate their surroundings.
      • System Facing Agents: Interact with machines and systems through APIs, scripting, and data.
      • Physical world facing agents: interact with the physical world, learning and interacting with it through robots, drones, and automotive, as Tesla autopilot and Weymo.

      Agents Infrastructure

      Agents infrastructure will include ops layers that encompass memory, compute and data infrastructure. Additionally, new agent management platforms will emerge, enabling building, orchestration, observability, and monitoring. Companies like Dust and AgentOps aim to offer these solutions to help agents and their operators achieve their full potential.

      Human Stimulating Agents: Revolutionizing Customer Support, Trend Predictions, and More.

      Human simulating agents allow us to leverage AI to learn from and predict human behavior in different scenarios. There are two broad categories of Human Simulating Agents:

      Support agents

      • Assistants: As mentioned above, AI assistants can simulate human behavior while supporting us. For example, they can order and adjust our grocery lists based on our historical needs and preferences, declutter our email inboxes, and respond in our unique writing styles. They will interact with us in natural language, as if we were asking for help from a close friend, helping us manage our busy lifestyles.
      • Customer support: AI agents in the customer support space will become more sophisticated, providing relevant and satisfying support that saves man-hours, reduces customer frustration, and cuts costs for customer-facing companies. Sierra, Crescendo and yellow.ai are already on a mission to transform the customer support experience using AI agents.

      Human persona agents

      • Trend predictions: Agents will simulate people with specific demographic contexts, traits, and perspectives to predict the adoption of new consumer products, as Keplar and subconscious.ai do, create the most effective personalized marketing content, predict election results, and more. 
      • Companions: AI agents can become companions with personality and relationship history, remembering key moments in our lives, fighting loneliness, and offering initial mental health support if needed. For example, Replika and Kindroid provide engaging relationships with customizable AI companions that users can interact with through text and voice.

      Building in this space?

      If you are building in the agents space, reach out to us (Arash or Arpan). We would love to discuss your vision and explore how we can support your journey.

      Acknowledgements: I’d like to thank Pear AI Fellow Libby Meshorer for significant contributions to this post as well as Pear team members Arpan Shah, Jill Puente, and Lucy Lee Duckworth for contributing.

      Orby AI closes $30M Series A to continue building AI Agents for the Enterprise

      Orby AI recently announced its $30 million Series A round, co-led by NEA, Wing Venture Capital, and WndrCo, with participation from Pear VC. We are proud to be Orby’s earliest investors (when a LinkedIn message from their CEO first connected us) and we are thrilled to continue our support now.

      Orby’s Enterprise AI Automation tool automates complex workflows by observing users at work, identifying repetitive tasks, and writing the code to automate those tasks. Within minutes, a custom automation is ready to be implemented with user approval. 

      This is game changing.

      Orby AI is Changing the Game by Disrupting Process Automation Market

      Co-founder and CEO Bella Liu was heading AI Product at UI Path, a leading business automation software company, when she was first inspired with the idea behind Orby. At the time, the RPA (robotics process automation) software being used relied on human users to input specific “if this, then that” rules, which turned out to be rather fragile. For example, a user who frequently opens invoices and transfers numbers to a spreadsheet must specify exactly which buttons to click and where on the screen those buttons will be— a system that is prone to error, slow, and hard to scale. 

      Orby AI’s Model Learns and Implements Without User Input

      Orby’s approach to business automation is a huge leap forward. Unlike traditional RPA models, Orby’s LAM (Large Action Model) approach means their product doesn’t need to be told which tasks to automate, or how. Orby simply observes a user at work, learns what could be automated, and creates the actions to implement it. The user just approves the process and can correct the model at any time, thus continuously helping Orby improve.

      Why We Chose Orby AI

      We’re very excited about Orby’s team. Co-founders Bella Liu and Will Lu bring deep experience and expertise in the AI and automation technology space. Bella (CEO) was previously the AI product leader at UiPath, from early-stage to post-IPO. Will (CTO) was previously the data platform leader at Google Cloud AI and was involved in three AI products with real world deployments within Google. Orby’s team was a great founder-market match for Pear’s thesis on AI automation for human to machine and machine to machine automation. We are pleased to have backed Orby early on, and remain certain they are the right group to work on this problem.

      What Orby AI Does

      Before partnering with Orby, the Pear team was already deeply interested in AI automation for enterprise applications, aiming to solve specific problems within distinct industries one vertical at a time. They believed that the kinds of AI tools which could understand specific use cases, gather necessary datasets, and execute targeted solutions were the future. Additionally, they had a thesis that semantic understanding of workflows, enabled by backend interaction data, could enhance the generalizability of RPA.

      Orby’s team embraced a similar approach but expanded it to build a horizontal enterprise AI automation platform applicable across many verticals. Initially focusing on widely used workflows like invoice processing and expense auditing, they aimed to enhance their action-based foundational models. This led to the creation of a platform that delivers immediate value in various enterprise scenarios while achieving general-purpose AI automation.

      Orby is pioneering a Generative Process Automation (GPA) platform, leveraging the industry’s first Large Action Model (LAM) for enterprise use. This platform enhances efficiency by enabling teams to automate complex tasks independently. Orby’s multimodal large action model, combined with an AI agent capable of symbolic reasoning and neural network analysis, seamlessly handles intricate automation requests.

      When tasked with an assignment, Orby’s AI autonomously generates workflows, integrating with specialized AI agents for sub-tasks such as data analysis or customer interaction. By learning and automating workflows contextually and semantically, Orby surpasses traditional RPA systems. The LAMs empower Orby’s AI to understand and automate repetitive processes across unstructured datasets, emulating human capabilities.

      This neuro-symbolic programming captures standard process flows and ensures robust exception handling, making AI-driven automation accessible and efficient for enterprises. Orby’s patented technology, which combines LAMs with advanced programming techniques, empowers workers to automate tasks without needing technical assistance. The system continuously learns and adapts, improving productivity and efficiency over time.

      Market Opportunity

      The market potential for automation in enterprises has been evidenced by the success of Robotic Process Automation (RPA). However, AI Process Automation, like that offered by Orby, goes beyond traditional RPA by making previously uneconomical use cases viable. The return on investment (ROI) for RPA is often hindered by high implementation and maintenance costs, limiting its applicability.

      Orby’s innovative approach addresses two critical challenges of RPA:

      1. Semantic understanding of automatable workflows versus fragile rule-based systems.

      2. Hands-off, continuous online learning and improvement of both workflow discovery and implementation.

      By discovering automatable repetitive workflows and generating maintenance-free AI automations, Orby significantly reduces implementation and maintenance costs. This makes a much larger share of repeatable workflows candidates for automation, substantially improving ROI and expanding an already large market.

      This advancement is not merely an efficiency gain for high-volume, repeatable workflows. Imagine an AI capable of automating any workflow, regardless of volume, simply by demonstrating the process. This capability would enable enterprises to innovate their workflows at an accelerated pace, shifting focus to strategic improvements. In this competitive landscape, no enterprise can afford to ignore such technology, as those who adopt it will innovate faster.

      To provide a baseline, a 2017 McKinsey Future of Work report estimates that 60% of jobs involve at least 30% repetitive tasks that can be automated. Orby has already demonstrated massive productivity gains in several Fortune 500 companies through successful use cases. This is just the beginning; the market opportunity is far greater.

      How we partnered together

      After funding Orby’s seed round in July 2022, in addition to our close partnership on product and vision, we leveraged the full Pear team to partner with them in the following two years. Ana Leyva and Pepe Agell worked with the Orby team on their product-market fit and GTM strategy, and Jill Puente helped them in marketing and PR, including landing the Business Insider piece announcing the initial seed round. Nate Hirsch from Pear’s talent team helped Orby hire eight out of their first ten team members (eight engineers, two designers, and a recruiter). When it was time for Orby’s Series A raise, the company went through Pear’s fundraising bootcamp with Mar’s full support behind them. As we say to founders at Pear: if you get one of us, you get all of us as partners.

      We’re excited to have been supporting Orby AI since day one and look forward to their promising journey ahead!

      Arash and Bella at our AGM summit in March 2024.