Viant Engineering

CacheStore


What is CacheStore?

CacheStore is a hybrid key-value storage system that sits between an in-memory cache and disk. It offers high performance, horizontal scalability, and high availability. It was developed to support real-time auction and high-performance ad serving systems.

Motivation

Latest Version

The latest stable version is CacheStore 1.1.0.

Head over to the Downloads page to grab the latest version.

Features

  1. High performance.
    • Random access in a few milliseconds per record
  2. Three types of deployment.
    • Local (embedded in Java)
    • Remote
    • Cluster (distributed)
  3. Rich API.
    • Single operations (get, put, insert and remove).
    • Bulk operations (multiGet, multiPut and multiRemove).
    • Range scans for sorted stores.
    • Server-side cursor support for range scans, key iterators and key-value pairs.
    • Custom stored procedures in both Java and Groovy (for dynamic deployment).
    • Custom before and after triggers for get, put and remove, in Java and Groovy.
    • Query support for select and update statements, with index search optimization.
  4. Supports both sorted-map and hash-map stores; the store type is set in the configuration and takes effect at system bootstrap.
  5. Groovy command line shell for query statement and API interaction.
  6. Plug-in replication listener for persistence to heterogeneous storage such as an RDBMS, with support for computing the delta between versions.
  7. Server-side serialization
    • Get, put, and scan operations support query parameters to reduce the payload between client and server.
  8. Currently supports Java and Groovy.
    • PHP support in beta and other language support in development.
  9. Map Reduce *NEW*
    • Server-side Map Reduce for both single-instance and cluster setups.
    • Speeds up operations as the number of threads increases.
  10. New features currently in development.
    • Option to use a schema in bytes as an alternative to serialization.
    • Support for gRPC, a modern RPC framework that can run anywhere.
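The single, bulk, and range-scan operations listed above can be pictured with a small in-memory model. The sketch below is hypothetical: the class name, its method signatures, and the assumption that insert succeeds only when the key is absent (while put overwrites) are illustrative and are not taken from CacheStore's client API. A TreeMap stands in for a sorted store, which is what makes the range scan possible.

```java
import java.util.*;

// Hypothetical sketch: operation names mirror the feature list, but this class is
// illustrative only and is NOT CacheStore's actual API.
class SortedStoreSketch<K extends Comparable<K>, V> {
    private final NavigableMap<K, V> map = new TreeMap<>();  // stands in for a sorted store

    // Single operations
    public V get(K key) { return map.get(key); }
    public void put(K key, V value) { map.put(key, value); }  // insert or overwrite
    public boolean insert(K key, V value) {                   // assumption: insert only if absent
        return map.putIfAbsent(key, value) == null;
    }
    public V remove(K key) { return map.remove(key); }

    // Bulk operations
    public Map<K, V> multiGet(Collection<K> keys) {
        Map<K, V> result = new LinkedHashMap<>();
        for (K k : keys) {
            V v = map.get(k);
            if (v != null) result.put(k, v);  // missing keys are simply left out
        }
        return result;
    }
    public void multiPut(Map<K, V> entries) { map.putAll(entries); }
    public void multiRemove(Collection<K> keys) {
        for (K k : keys) map.remove(k);
    }

    // Range scan over keys in [from, to); only possible because the map is sorted
    public SortedMap<K, V> scan(K from, K to) { return map.subMap(from, to); }

    public static void main(String[] args) {
        SortedStoreSketch<Integer, String> store = new SortedStoreSketch<>();
        store.multiPut(Map.of(1, "a", 2, "b", 3, "c", 4, "d"));
        store.insert(2, "ignored");  // no effect: key 2 already exists
        store.remove(4);
        System.out.println(store.multiGet(List.of(1, 2, 9)));  // {1=a, 2=b}
        System.out.println(store.scan(2, 4));                  // {2=b, 3=c}
    }
}
```

A hash-map store would swap the TreeMap for a HashMap and lose the scan method, which matches the note that range scans apply to sorted stores.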

Architecture

CacheStore is a key-value system consisting of two components: the in-memory map (sorted or hash) and the persistent storage system. During the bootstrap process, it loads all keys and metadata into the in-memory map; values are lazy-loaded from disk when first requested. It is implemented as a write-through cache, so writes are durable. Values can be swapped between storage and memory. Data is replicated across designated nodes to ensure reliability, and clustering allows for load balancing. Nodes within a cluster are homogeneous and interchangeable. Bulk operations, range scans, cursors and the query engine are built on top of this core system.
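The bootstrap and lazy-load behaviour described above can be sketched as follows. This is an illustration of the idea, not CacheStore's code: the persistent storage is modeled as a plain Map so the example runs anywhere, and a null value in the in-memory map is an assumed marker for "value not loaded yet".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative sketch of bootstrap + lazy loading, NOT CacheStore's implementation.
// Assumption: the persistent storage is modeled as a Map so the example runs anywhere.
class LazyValueStore<K extends Comparable<K>, V> {
    private final Map<K, V> disk;                               // stand-in for persistent storage
    private final NavigableMap<K, V> memory = new TreeMap<>();  // in-memory map (sorted variant)
    private int diskReads = 0;

    LazyValueStore(Map<K, V> disk) {
        this.disk = disk;
        // Bootstrap: every key is loaded into memory; a null value means "not loaded yet".
        for (K key : disk.keySet()) memory.put(key, null);
    }

    V get(K key) {
        if (!memory.containsKey(key)) return null;  // unknown key: answered without touching disk
        V value = memory.get(key);
        if (value == null) {                        // lazy load on first request
            value = disk.get(key);
            diskReads++;
            memory.put(key, value);
        }
        return value;
    }

    // Swap a value out of memory; the key stays resident, so the next get reloads it.
    void evict(K key) {
        if (memory.containsKey(key)) memory.put(key, null);
    }

    int diskReads() { return diskReads; }

    public static void main(String[] args) {
        Map<String, String> disk = new HashMap<>(Map.of("k1", "v1", "k2", "v2"));
        LazyValueStore<String, String> store = new LazyValueStore<>(disk);
        store.get("k1");                        // disk read
        store.get("k1");                        // served from memory
        store.get("missing");                   // no disk read: key is not in the key map
        System.out.println(store.diskReads());  // 1
    }
}
```

Keeping every key resident means a lookup for a nonexistent key never touches disk, while memory use for values grows only with what is actually requested.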

Performance

CacheStore is a high-performance distributed storage system that replicates data across multiple nodes and serves keys from memory. As a result, its performance depends on factors including, but not limited to, memory cache size, the amount of data on each node, and the replication factor.

CacheStore averages 1 millisecond per record for random reads and writes on SSD and 4-5 milliseconds per record on HDD. Some workloads run considerably faster. For example, uploading 1 million keys in batches of 100,000 records with a server-side aggregation stored procedure completes in 3-4 seconds on average, or 3-4 microseconds per record.
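The per-record figure for the bulk upload follows directly from the totals in the text. The small calculator below only reproduces that arithmetic; the helper name is made up for the example.

```java
// Worked arithmetic for the bulk-upload example; the input figures come from the text,
// and microsPerRecord is a hypothetical helper, not part of CacheStore.
class PerRecordLatency {
    // Average time per record in microseconds, given total seconds and record count.
    static double microsPerRecord(double totalSeconds, long records) {
        return totalSeconds * 1_000_000.0 / records;
    }

    public static void main(String[] args) {
        long records = 1_000_000L;
        long batchSize = 100_000L;
        System.out.println("batches: " + records / batchSize);            // batches: 10
        System.out.println(microsPerRecord(3.0, records) + " us/record");  // 3.0 us/record
        System.out.println(microsPerRecord(4.0, records) + " us/record");  // 4.0 us/record
        // Against the ~1 ms (1000 us) random-access average on SSD:
        System.out.println(1000.0 / microsPerRecord(4.0, records));        // 250.0
    }
}
```

So the batched path is roughly 250 to 333 times cheaper per record than the 1 ms random-access average, which is why batching matters for bulk loads.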

Memory

CacheStore is unique because it is a hybrid storage system between memory cache and disk, which lets it serve data cached in memory with very high performance. Before keys can be served from memory, however, they must first be bootstrapped from disk. Once the keys are in memory, they can be accessed in nanoseconds. CacheStore uses a write-through cache, writing synchronously to both the cache and the backing store.
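A write-through cache can be sketched in a few lines. BackingStore below is a hypothetical interface standing in for CacheStore's persistence layer, not the project's real API; the point is the ordering: persist first, then update memory, and return only when both are done.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal write-through sketch. BackingStore is a hypothetical interface standing in for
// CacheStore's persistence layer; it is not the project's real API.
class WriteThroughCache<K, V> {
    interface BackingStore<K, V> {
        void write(K key, V value);  // assumed durable once it returns
        V read(K key);
    }

    private final Map<K, V> cache = new HashMap<>();
    private final BackingStore<K, V> store;

    WriteThroughCache(BackingStore<K, V> store) { this.store = store; }

    // Synchronous: the call returns only after both the store and the cache are updated.
    synchronized void put(K key, V value) {
        store.write(key, value);  // persist first; if this throws, the cache is left unchanged
        cache.put(key, value);    // then update memory
    }

    synchronized V get(K key) {
        return cache.computeIfAbsent(key, store::read);  // miss: load from the store and cache it
    }

    public static void main(String[] args) {
        Map<String, String> disk = new HashMap<>();
        WriteThroughCache<String, String> cache = new WriteThroughCache<>(
            new BackingStore<String, String>() {
                public void write(String k, String v) { disk.put(k, v); }
                public String read(String k) { return disk.get(k); }
            });
        cache.put("user:1", "alice");
        System.out.println(disk.get("user:1"));  // alice (already persisted when put returned)
    }
}
```

Writing the backing store before the cache means a failed disk write never leaves memory holding a value that was not persisted.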

Storage

Key-value data is stored on the server side: all keys are written to the server's disk, in files belonging to each store. CacheStore can store multiple Java key types, including Java primitives and objects.

Data for each store is saved in three files, with the extensions .key, .ndx, and .data, that work together. The .key file stores information about the keys. The .ndx file links the keys to the values, which are stored in the .data file. As data is written, erased, and rewritten in the .data file, the .ndx file is updated to point to the correct blocks in the .data file. Periodic data purging and compaction, triggered via JMX, are recommended as part of routine maintenance.
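The interplay between the index and the data file, and why compaction is needed, can be modeled in memory. The actual on-disk formats of .key, .ndx and .data are not documented here, so this is a hypothetical layout: an append-only data region, an index mapping each key to an (offset, length) block, and a compaction step that copies only live blocks.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified model of the .key/.ndx/.data roles. The real file formats are not documented
// here; everything is kept in memory so the sketch runs anywhere.
class IndexedDataFile {
    record Block(int offset, int length) {}                   // what an index entry points at

    private final Set<String> keys = new TreeSet<>();          // role of the .key file
    private final Map<String, Block> index = new HashMap<>();  // role of the .ndx file
    private ByteArrayOutputStream data = new ByteArrayOutputStream();  // role of the .data file

    void put(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        int offset = data.size();
        data.write(bytes, 0, bytes.length);               // append; any older block becomes dead space
        index.put(key, new Block(offset, bytes.length));  // repoint the index at the new block
        keys.add(key);
    }

    String get(String key) {
        Block b = index.get(key);
        if (b == null) return null;
        return new String(data.toByteArray(), b.offset(), b.length(), StandardCharsets.UTF_8);
    }

    void remove(String key) {  // the data block stays behind until compaction
        keys.remove(key);
        index.remove(key);
    }

    // Compaction: copy only live blocks into a fresh data region and repoint the index.
    void compact() {
        byte[] old = data.toByteArray();
        ByteArrayOutputStream fresh = new ByteArrayOutputStream();
        for (Map.Entry<String, Block> e : index.entrySet()) {
            Block b = e.getValue();
            e.setValue(new Block(fresh.size(), b.length()));
            fresh.write(old, b.offset(), b.length());
        }
        data = fresh;
    }

    int dataSize() { return data.size(); }

    public static void main(String[] args) {
        IndexedDataFile f = new IndexedDataFile();
        f.put("a", "hello");                // 5 bytes
        f.put("a", "world!");               // 6 more bytes; "hello" is now dead space
        f.put("b", "x");                    // 1 byte
        System.out.println(f.dataSize());   // 12
        f.compact();
        System.out.println(f.dataSize());   // 7
        System.out.println(f.get("a"));     // world!
    }
}
```

Overwrites and removals leave dead blocks behind, so the data region only shrinks when compaction runs, which is the reason for the maintenance recommendation above.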

Cluster

Clustering can be used for stores to increase performance and reliability. A multinode cluster can support data replication and load balancing to prevent "hot spots" from occurring.
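A cluster has to decide which nodes hold each key. The text does not say which scheme CacheStore uses, so the sketch below swaps in simple hash partitioning as a hypothetical illustration: a key's hash picks a primary node, and the next nodes in the list hold the replicas, spreading keys (and the load they bring) across the cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical placement scheme (hash partitioning with successor replicas). It only
// illustrates spreading keys and copies across nodes; it is NOT CacheStore's algorithm.
class ReplicaPlacement {
    static List<String> replicasFor(Object key, List<String> nodes, int replicationFactor) {
        if (replicationFactor > nodes.size())
            throw new IllegalArgumentException("replication factor exceeds node count");
        int primary = Math.floorMod(key.hashCode(), nodes.size());  // floorMod avoids negative indexes
        List<String> replicas = new ArrayList<>();
        for (int i = 0; i < replicationFactor; i++) {
            replicas.add(nodes.get((primary + i) % nodes.size()));  // primary, then its successors
        }
        return replicas;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node-a", "node-b", "node-c", "node-d");
        Map<String, Integer> load = new TreeMap<>();
        for (int k = 0; k < 10_000; k++) {
            String primary = replicasFor("key-" + k, nodes, 2).get(0);
            load.merge(primary, 1, Integer::sum);
        }
        System.out.println(load);  // how many keys each node holds as primary
    }
}
```

Because every node can be chosen as a primary and each key also lives on a replica, both storage and read traffic are shared instead of concentrating on one node.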

Replication

Replication is supported by remote and clustered servers and eliminates downtime during failover.
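With replicas in place, a read can skip a failed node instead of failing. In this sketch the node names, the per-node maps and the health check are all assumptions made for illustration; CacheStore's client failover logic is not described in the text.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

// Illustrative failover read. The node names, per-node maps and health check are
// assumptions for this sketch, not CacheStore's client API.
class FailoverReader {
    private final Map<String, Map<String, String>> nodeData;  // node -> its replica of the data
    private final Predicate<String> isUp;                     // hypothetical health check

    FailoverReader(Map<String, Map<String, String>> nodeData, Predicate<String> isUp) {
        this.nodeData = nodeData;
        this.isUp = isUp;
    }

    // Read from the first live replica; a dead node is skipped instead of failing the request.
    Optional<String> read(String key, List<String> replicas) {
        for (String node : replicas) {
            if (isUp.test(node)) {
                return Optional.ofNullable(nodeData.get(node).get(key));
            }
        }
        return Optional.empty();  // every replica is down
    }

    public static void main(String[] args) {
        Map<String, String> copy = Map.of("user:1", "alice");
        Map<String, Map<String, String>> nodes = Map.of("node-a", copy, "node-b", copy);
        Set<String> down = Set.of("node-a");  // simulate a failed primary
        FailoverReader reader = new FailoverReader(nodes, n -> !down.contains(n));
        System.out.println(reader.read("user:1", List.of("node-a", "node-b")));  // Optional[alice]
    }
}
```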

Query

Full object queries, excluding aggregation and joins, are supported on both primitives and objects. More query features are under development.
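The semantics of a select or update over stored objects (without aggregation or joins) can be modeled with plain Java. This is not CacheStore's query language: the Campaign type and both methods are hypothetical, and the sketch does a full scan, whereas the feature list notes that the real engine applies index search optimization.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Conceptual model of select/update semantics over stored objects (no aggregation, no joins).
// This is NOT CacheStore's query language; Campaign and both methods are hypothetical.
class ObjectQuerySketch {
    static final class Campaign {
        final int bid;
        String status;
        Campaign(int bid, String status) { this.bid = bid; this.status = status; }
    }

    // Like "select keys where bid > minBid": a full-scan filter over every stored object.
    static List<String> selectKeys(Map<String, Campaign> store, int minBid) {
        return store.entrySet().stream()
                .filter(e -> e.getValue().bid > minBid)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    // Like "update set status = ... where bid > minBid": returns the number of objects changed.
    static int updateStatus(Map<String, Campaign> store, int minBid, String newStatus) {
        int changed = 0;
        for (Campaign c : store.values()) {
            if (c.bid > minBid) {
                c.status = newStatus;
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, Campaign> store = new LinkedHashMap<>();
        store.put("c1", new Campaign(5, "active"));
        store.put("c2", new Campaign(12, "active"));
        store.put("c3", new Campaign(20, "active"));
        System.out.println(selectKeys(store, 10));              // [c2, c3]
        System.out.println(updateStatus(store, 10, "paused"));  // 2
    }
}
```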

Metadata

As mentioned earlier, data is written to the store synchronously as it is written to the cache. Data is written to the server-side stores defined in config/stores.xml, and each store is represented by three files named after the store, with the extensions .data, .key, and .ndx. Metadata for objects is represented as Java classes, which must be included to correctly read and write the corresponding objects.
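Why the Java classes must be present can be seen in a serialization round trip: bytes can only be turned back into an object if the JVM can load the same class. Plain Java serialization is used here as an assumption; the text does not specify CacheStore's wire format, and AdRecord is a made-up example type.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Sketch of why a value's Java class (its "metadata") must be available. Plain Java
// serialization is an assumption; CacheStore's actual format is not described here.
class MetadataRoundTrip {
    static final class AdRecord implements Serializable {
        private static final long serialVersionUID = 1L;  // keep stable across class versions
        final String id;
        final double price;
        AdRecord(String id, double price) { this.id = id; this.price = price; }
    }

    static byte[] write(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    static Object read(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();  // ClassNotFoundException if AdRecord is not on the classpath
        }
    }

    public static void main(String[] args) throws Exception {
        AdRecord back = (AdRecord) read(write(new AdRecord("ad-7", 1.25)));
        System.out.println(back.id + " " + back.price);  // ad-7 1.25
    }
}
```

Keeping serialVersionUID fixed lets older stored bytes still deserialize after compatible changes to the class.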

CacheStore Advantages

CacheStore Disadvantages

Comparison with Project Voldemort

Feature            CacheStore             Voldemort
Replication        Yes                    Yes
Storage            Built-in               Plugin
Fail Over          Yes                    Yes
Load Balancing     Yes                    Yes
Operations         Simple & Bulk          Simple
Query              Yes                    No
Scan               Yes                    No
Cursor             Yes                    No
Stored Procedure   Yes                    No
Trigger            Yes                    No
Table Ordering     Ordered & Unordered    Unordered