Graph-based methods play an increasingly vital role in data retrieval and machine learning, especially in nearest neighbor (NN) search. NN search identifies the data points closest to a given query and is essential for high-dimensional data such as text, images, or audio. Because exact search is inefficient in high-dimensional spaces, approximate nearest neighbor (ANN) methods have become crucial, particularly graph-based approaches that balance response time and accuracy. These methods are widely used in recommendation engines, e-commerce platforms, and AI-based search systems.
One of the main challenges in NN search is combining vector-based search with constraints on numeric attributes. For example, a user on an e-commerce platform may want to find products similar to a specific item within a certain price range. Traditional ANN methods either filter out non-matching data before the search (pre-filtering) or search without considering the constraint and filter the results afterward (post-filtering). Both approaches have performance problems: pre-filtering can be inefficient on large datasets, and post-filtering may spend its effort on candidates that fail the constraint, leaving too few valid results.
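The two strategies can be contrasted with a toy example. This is an illustrative sketch, not any particular system's implementation: an exact linear scan stands in for the ANN index, and the function names and the `k_prime` over-fetch parameter are hypothetical.

```python
import heapq
import math


def pre_filter_search(vectors, prices, query, lo, hi, k):
    """Pre-filtering: keep items with lo <= price <= hi, then scan them exactly.
    The cost grows with the number of items that pass the filter."""
    survivors = [i for i, p in enumerate(prices) if lo <= p <= hi]
    return heapq.nsmallest(k, survivors, key=lambda i: math.dist(vectors[i], query))


def post_filter_search(vectors, prices, query, lo, hi, k, k_prime):
    """Post-filtering: fetch the k_prime nearest items ignoring price (an exact
    scan stands in for an ANN index here), then drop the out-of-range ones."""
    nearest = heapq.nsmallest(
        k_prime, range(len(vectors)), key=lambda i: math.dist(vectors[i], query)
    )
    return [i for i in nearest if lo <= prices[i] <= hi][:k]


vectors = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
prices = [50, 60, 200, 210, 220]
print(pre_filter_search(vectors, prices, (0, 0), 150, 250, k=2))  # [2, 3]
print(post_filter_search(vectors, prices, (0, 0), 150, 250, k=2, k_prime=2))  # []
```

With `k_prime=2`, both of the nearest items are outside the price range, so post-filtering returns nothing. Raising `k_prime` recovers the answer at the cost of more work, which is the trade-off a post-filtering system has to tune.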
Efficient search techniques that combine vector similarity with numeric constraints have therefore become increasingly important across industries that deal with massive amounts of data.
Existing approaches to range-filtering approximate nearest neighbor (RFANN) queries include pre-filtering and post-filtering, where numeric constraints are applied before or after an ANN search. A third approach, in-filtering, integrates the numeric constraints into the search itself, but these methods struggle to perform well across different query scenarios.
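In-filtering can be pictured as a best-first graph search that refuses to step onto out-of-range nodes. The sketch below is a generic illustration with hypothetical names (`in_filter_search`, a toy adjacency dict, beam width `ef`); it is not a specific published method. Its usage shows one reason such methods can be inconsistent: if the filter removes the nodes that connect the in-range region, the search stalls early.

```python
import heapq
import math


def in_filter_search(graph, vectors, attrs, query, lo, hi, entry, k, ef=8):
    """Best-first search over a proximity graph that only visits nodes whose
    attribute lies in [lo, hi]. `graph` maps each node to its neighbor list."""

    def d(i):
        return math.dist(vectors[i], query)

    if not lo <= attrs[entry] <= hi:
        return []  # this toy version assumes an in-range entry point
    visited = {entry}
    candidates = [(d(entry), entry)]  # min-heap of nodes still to expand
    results = [(-d(entry), entry)]  # max-heap (negated) of the best ef nodes
    while candidates:
        dist_c, c = heapq.heappop(candidates)
        if len(results) >= ef and dist_c > -results[0][0]:
            break  # nearest unexpanded node is farther than every kept result
        for n in graph[c]:
            if n in visited or not lo <= attrs[n] <= hi:
                continue  # in-filtering: never step onto out-of-range nodes
            visited.add(n)
            dn = d(n)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, n))
                heapq.heappush(results, (-dn, n))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the farthest kept node
    return [i for _, i in sorted((-nd, i) for nd, i in results)][:k]


vectors = [(i, 0) for i in range(6)]
attrs = [1, 9, 2, 3, 9, 4]  # nodes 1 and 4 fall outside the range [0, 5]
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(in_filter_search(chain, vectors, attrs, (5, 0), 0, 5, entry=0, k=2))  # [0]
skips = {0: [1, 2], 1: [0, 2], 2: [1, 3, 0], 3: [2, 4, 5], 4: [3, 5], 5: [4, 3]}
print(in_filter_search(skips, vectors, attrs, (5, 0), 0, 5, entry=0, k=2))  # [5, 3]
```

In the chain graph, the out-of-range nodes 1 and 4 cut the in-range nodes apart, so the search never leaves the entry point. The extra edges in the second graph keep the in-range region connected, and the true nearest neighbors are found.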
Researchers from Nanyang Technological University and Aalborg University have introduced a new method called iRangeGraph to address these limitations. Instead of precomputing a graph for every possible numeric range, the technique materializes elemental graphs for only a small number of ranges. This reduces memory consumption while maintaining high query performance, making the method attractive for companies with large datasets, such as Apple and Alibaba, which use similar methods in their large-scale search systems.
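The article does not say exactly which ranges iRangeGraph materializes. One standard way to pick a small set of ranges, assumed here purely for illustration, is a segment tree over the objects sorted by their attribute: it has 2n - 1 ranges instead of the n(n+1)/2 contiguous ranges a fully precomputed scheme would need, and any query range can be covered by O(log n) of them. The helper names are hypothetical, and the elemental graphs themselves are omitted.

```python
import bisect


def build_segments(lo, hi):
    """List the ranges [lo, hi) of a segment tree over sorted positions.
    In this sketch, an elemental graph would be built for each range (not shown)."""
    segments = [(lo, hi)]
    if hi - lo > 1:
        mid = (lo + hi) // 2
        segments += build_segments(lo, mid)
        segments += build_segments(mid, hi)
    return segments


def decompose(lo, hi, qlo, qhi):
    """Cover query positions [qlo, qhi) with segment-tree ranges inside [lo, hi)."""
    if qhi <= lo or hi <= qlo:
        return []  # no overlap with the query
    if qlo <= lo and hi <= qhi:
        return [(lo, hi)]  # segment lies entirely inside the query range
    mid = (lo + hi) // 2
    return decompose(lo, mid, qlo, qhi) + decompose(mid, hi, qlo, qhi)


prices = [5, 12, 20, 33, 41, 47, 58, 60]  # objects already sorted by price
n = len(prices)
print(len(build_segments(0, n)), n * (n + 1) // 2)  # 15 36
qlo = bisect.bisect_left(prices, 15)  # first position with price >= 15
qhi = bisect.bisect_right(prices, 50)  # one past the last position with price <= 50
print(decompose(0, n, qlo, qhi))  # [(2, 4), (4, 6)]
```

The price range [15, 50] maps to sorted positions 2 through 5, which two precomputed segments cover exactly, so no graph dedicated to that specific range has to exist.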
iRangeGraph constructs graph-based indexes dynamically during query processing, which conserves memory while keeping query response times low. This is particularly useful for multi-attribute RFANN queries, which involve more than one numeric constraint, such as a price range combined with a date range.
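Dynamic construction can be pictured as assembling each visited node's neighbor list at query time from precomputed elemental graphs. The sketch below assumes that nodes are numbered by their position in attribute-sorted order, that each elemental graph is stored per segment as a plain adjacency dict, and that a second constraint is passed as a predicate. `improvised_neighbors`, `max_degree`, and the smallest-segment-first order are hypothetical choices for illustration, not iRangeGraph's published procedure.

```python
def improvised_neighbors(u, elemental, qlo, qhi, extra_ok=lambda v: True, max_degree=4):
    """Assemble node u's neighbor list for query positions [qlo, qhi) on the fly.

    `elemental` maps a segment (lo, hi) to its elemental graph, a dict from
    node to neighbor list. Nodes are numbered by position in attribute-sorted
    order, so the primary range check is qlo <= v < qhi. `extra_ok` applies any
    further constraint, such as a date range on a second attribute.
    """
    merged, seen = [], set()
    containing = [seg for seg in elemental if seg[0] <= u < seg[1]]
    for seg in sorted(containing, key=lambda s: s[1] - s[0]):  # smallest first
        for v in elemental[seg].get(u, []):
            if v not in seen and qlo <= v < qhi and extra_ok(v):
                seen.add(v)
                merged.append(v)
                if len(merged) == max_degree:
                    return merged  # stop once the degree cap is reached
    return merged


elemental = {
    (0, 4): {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]},
    (4, 8): {4: [5, 6], 5: [4, 7], 6: [4, 7], 7: [5, 6]},
    (0, 8): {0: [4], 1: [5], 2: [6], 3: [5, 0], 4: [0], 5: [3], 6: [2], 7: [3]},
}
dates = [2018, 2019, 2021, 2017, 2022, 2016, 2023, 2020]  # second attribute
print(improvised_neighbors(3, elemental, 2, 6))  # [2, 5]
print(improvised_neighbors(3, elemental, 2, 6, extra_ok=lambda v: dates[v] >= 2020))  # [2]
```

No graph for the query range [2, 6) is ever stored; node 3's edges are drawn from the two stored segments that contain it, and the extra predicate prunes node 5 when a date constraint is added.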
In performance tests, iRangeGraph significantly outperformed existing methods, achieving 2x to 5x higher throughput in queries per second (QPS) than its competitors at 0.9 recall while consuming less memory. The advantage was especially noticeable on real-world datasets, including WIT-Image, TripClick, Redcaps, and YouTube, which contain high-dimensional vectors and numeric attributes such as image size or publication date.
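The headline metric pairs throughput with accuracy. Below is a minimal sketch of how recall@k and QPS are commonly computed. The function names are illustrative, and the article does not state the value of k or the timing setup used in the evaluation.

```python
import time


def recall_at_k(retrieved, ground_truth):
    """Fraction of the true k nearest neighbors found among the first k results."""
    k = len(ground_truth)
    return len(set(retrieved[:k]) & set(ground_truth)) / k


def mean_recall(all_retrieved, all_truth):
    """Average recall@k over a batch of queries."""
    scores = [recall_at_k(r, t) for r, t in zip(all_retrieved, all_truth)]
    return sum(scores) / len(scores)


def measure_qps(search_fn, queries):
    """Queries per second: number of queries divided by elapsed wall-clock time."""
    start = time.perf_counter()
    results = [search_fn(q) for q in queries]
    elapsed = time.perf_counter() - start
    return len(queries) / max(elapsed, 1e-9), results  # guard against a zero timer delta


truth = [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]
retrieved = [[1, 2, 3, 4, 5, 6, 7, 8, 9, 42]]
print(mean_recall(retrieved, truth))  # 0.9
```

Comparisons such as "2x to 5x QPS at 0.9 recall" are typically made by sweeping a search parameter, such as the beam width, and reading off each method's throughput at the point where its recall reaches the target.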
iRangeGraph addresses the shortcomings of existing RFANN techniques by dynamically constructing graph indexes from elemental graphs during query execution. Its reduced memory requirements make it a strong choice for large-scale data systems and highlight its potential for managing high-dimensional data with numeric constraints.