At Think this week, IBM is radically simplifying the data-for-AI stack.
IBM is previewing a major evolution of watsonx.data that can help organizations make their data AI-ready by providing an open, hybrid data foundation and enterprise-ready management of structured and unstructured data.
The result? Forty percent more accurate AI than conventional RAG, according to testing with IBM watsonx.data.[1]
Products and features expected to debut in June include:
- Watsonx.data integration, software to orchestrate data access and engineering across diverse integration styles and formats in a single interface, with flexibility and scale at its core
- Watsonx.data intelligence, software to transform how organizations curate, manage, and utilize meaningful data, leveraging the power of AI to simplify data governance
- The addition of watsonx as an API provider within Meta’s Llama Stack, enhancing enterprises’ ability to deploy generative AI at scale and with openness at the core
Watsonx.data integration and watsonx.data intelligence will be available as standalone products, and select capabilities will also be available through watsonx.data—maximizing client choice and modularity.
To complement these products, IBM recently announced its intent to acquire DataStax, which excels at harnessing unstructured data for generative AI. With DataStax, clients can access additional vector search capabilities.
[1] Based on internal testing comparing the answer correctness of AI model outputs using the watsonx.data Premium Edition retrieval layer to vector-only RAG on three common use cases with IBM proprietary datasets, using the same set of selected open source commodity inferencing, judging and embedding models and additional variables. Results can vary.
The context for this major evolution
Enterprises are facing a major barrier to accurate and performant generative AI, especially agentic AI. But the barrier is not what most business leaders think.
The problem is not inference costs or the elusive “perfect” model. The problem is data.
Organizations need trusted, company-specific data for agentic AI to truly create value—the unstructured data inside emails, documents, presentations, and videos. It is estimated that in 2022, 90% of data generated by enterprises was unstructured, but IBM projects only 1% is accounted for in LLMs.
Unstructured data can be immensely difficult to harness. It is highly distributed and dynamic, locked inside diverse formats, lacks neat labels, and often needs additional context to fully interpret. Conventional Retrieval-Augmented Generation (RAG) is ineffective at extracting its value and cannot properly combine unstructured and structured data.
Meanwhile, a range of disconnected tools can make the data-for-AI stack complex and cumbersome. Enterprises juggle data warehouses, data lakes, and data governance and data integration tools. The data stack can feel as disorienting as the unstructured data it is supposed to manage.
Many organizations are not addressing the root problem. They are focused solely on the generative AI application layer, rather than the essential data layer underneath. Until organizations fix their data foundation, AI agents and other generative AI initiatives will fail to deliver their full potential.
Helping organizations to make their data AI-ready
IBM’s new capabilities will enable organizations to ingest, govern and retrieve unstructured (and structured) data and, from there, scale accurate, performant generative AI.
IBM Blog. E. C.