What is CacheStore?
CacheStore is a hybrid key-value storage system that sits between an in-memory cache and disk. It offers high performance, horizontal scalability, and high availability. It was developed to support real-time auction and high-performance ad serving systems.
Motivation
- Originally it was contributed as a storage plug-in for Project Voldemort, but we decided to develop it as a standalone distributed system
- Better "Fail-Safe" mechanism in a distributed system without compromising performance
- Utilize Memory and Disk
- Server side replication
- Ability to store data as Objects or Primitives
Latest Version
The latest stable version is CacheStore 1.1.0.
Head over to the Downloads page to grab the latest version.
Features
- High performance.
  - Random access in a few milliseconds per record.
- Three types of deployment.
  - Local (embedded in Java)
  - Remote
  - Cluster (distributed)
- Rich API.
  - Single operations (get, put, insert and remove).
  - Bulk operations (multiGet, multiPut and multiRemove).
  - Range scans for sorted stores.
  - Server-side cursor support for range scans, key iterators and key-value pairs.
  - Custom stored procedures in both Java and Groovy (for dynamic deployment).
  - Custom before and after triggers for get, put and remove in Java and Groovy.
  - Query support for select and update statements with index search optimization.
  - Supports sorted-map and hash-map stores with a configuration change during system bootstrap.
  - Groovy command line shell for query statements and API interaction.
  - Plug-in replication listener for persisting to heterogeneous storage such as an RDBMS, with support for computing the delta between versions.
- Server-side serialization.
  - Get, put, and scan operations support query parameters to reduce payload between client and server.
  - Currently supports Java and Groovy.
  - PHP support in beta and other language support in development.
- Map Reduce *NEW*
  - Server-side Map Reduce for both single instance and cluster setups.
  - Speeds up operations by running them in parallel across multiple threads.
- New features currently in development.
  - Option to use a schema in bytes as an alternative to serialization.
  - Support for gRPC, a modern RPC framework that can run anywhere.
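To illustrate the multi-threaded Map Reduce idea from the feature list, here is a minimal sketch in plain Java. It is not the CacheStore API: the class `MapReduceSketch` and the method `totalLength` are hypothetical names, and the example only shows the general pattern of splitting values across worker threads, mapping and partially reducing each slice, and then combining the partial results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of a threaded map-reduce; not part of CacheStore. */
class MapReduceSketch {

    /** Sums the lengths of all values, splitting the work across up to `threads` workers. */
    static long totalLength(List<String> values, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Ceiling division so every value lands in exactly one slice.
            int chunk = (values.size() + threads - 1) / threads;
            List<Future<Long>> partials = new ArrayList<>();
            for (int start = 0; start < values.size(); start += chunk) {
                final List<String> slice =
                        values.subList(start, Math.min(start + chunk, values.size()));
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (String v : slice) {
                        sum += v.length(); // map: value -> length, reduced into a partial sum
                    }
                    return sum;
                };
                partials.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get(); // final reduce: combine the partial sums
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while reducing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(totalLength(Arrays.asList("a", "bb", "ccc"), 2));
    }
}
```

Running the reduce next to the data, as a server-side implementation would, avoids shipping every value to the client; only the combined result crosses the network.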
Architecture
CacheStore is a key-value system made up of two components: the in-memory map (sorted or hash) and the persistent storage system. During the bootstrap process, it loads all keys and metadata into the memory map. Values are lazily loaded from disk when a request first needs them. The store is implemented as a write-through cache, so writes are durable, and any value can be swapped between storage and memory. Data is replicated across designated nodes for reliability, and clustering allows for load balancing. Nodes within a cluster are homogeneous and interchangeable. Bulk operations, range scans, cursors and the query engine are built on top of the core system.
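The interplay of bootstrap, lazy loading and write-through can be sketched as follows. This is an illustrative model only, not CacheStore code: `HybridStoreSketch` and all of its methods are hypothetical, and a plain `HashMap` stands in for the disk-backed persistence layer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model: keys always resident in memory, values loaded from "disk" on demand. */
class HybridStoreSketch {
    /** Stand-in for the persistence layer; a real store would read and write files. */
    private final Map<String, String> disk = new HashMap<>();
    /** In-memory sorted map: every key is present, the value stays null until first read. */
    private final Map<String, String> memory = new TreeMap<>();
    private int diskReads = 0;

    /** Simulates data that was persisted before the process started. */
    void seedDisk(String key, String value) {
        disk.put(key, value);
    }

    /** Bootstrap: load every key, but no values, into the memory map. */
    void bootstrap() {
        for (String key : disk.keySet()) {
            memory.put(key, null);
        }
    }

    /** Write-through: the value reaches the backing store and memory in the same call. */
    void put(String key, String value) {
        disk.put(key, value);
        memory.put(key, value);
    }

    /** Lazy load: touch the backing store only when the value is not in memory. */
    String get(String key) {
        if (!memory.containsKey(key)) {
            return null; // all keys are in memory, so a miss needs no disk access
        }
        String value = memory.get(key);
        if (value == null) {
            value = disk.get(key);
            diskReads++;
            memory.put(key, value);
        }
        return value;
    }

    /** Swap a value out of memory while keeping its key resident. */
    void evictValue(String key) {
        if (memory.containsKey(key)) {
            memory.put(key, null);
        }
    }

    int diskReads() {
        return diskReads;
    }

    public static void main(String[] args) {
        HybridStoreSketch store = new HybridStoreSketch();
        store.seedDisk("user:1", "alice");
        store.bootstrap();
        System.out.println(store.get("user:1") + " after " + store.diskReads() + " disk read(s)");
    }
}
```

Because every key lives in memory, a lookup for a missing key never touches the disk, which is one reason the design trades memory for speed.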
Performance
CacheStore is a high-performance distributed storage system that replicates data across multiple nodes and serves keys from memory. As a result, performance depends on factors that include, but are not limited to, memory cache size, data size per node, and replication factor.
CacheStore averages 1 millisecond per record for random reads and writes on SSD and 4-5 milliseconds per record on HDD. In special cases it can be much faster. For example, uploading 1 million keys in batches of 100,000 records, combined with a server-side aggregation stored procedure, takes 3-4 seconds on average, or 3-4 microseconds per record.
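The per-record figure in the bulk example follows directly from the totals. The short sketch below reproduces the arithmetic; the class and method names are hypothetical and exist only for illustration.

```java
/** Hypothetical helper that reproduces the arithmetic of the bulk upload example. */
class ThroughputMath {

    /** Average time per record, in microseconds, for a load that took totalSeconds. */
    static double microsPerRecord(double totalSeconds, long records) {
        return totalSeconds * 1_000_000.0 / records;
    }

    /** Number of batch calls needed to send `records` in batches of `batchSize`. */
    static long batches(long records, long batchSize) {
        return (records + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        long records = 1_000_000L;
        System.out.println("batches: " + batches(records, 100_000L));
        System.out.println("fast case, microseconds per record: " + microsPerRecord(3.0, records));
        System.out.println("slow case, microseconds per record: " + microsPerRecord(4.0, records));
    }
}
```

One million records in batches of 100,000 means only ten round trips, which is where most of the gain over one-record-at-a-time access comes from.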
Memory
CacheStore is unique in that it is a hybrid storage system between memory cache and disk, which lets it serve data cached in memory with extremely high performance. Keys must first be bootstrapped from disk into memory; once cached, they can be accessed in nanoseconds. CacheStore uses a write-through cache, writing synchronously to both the cache and the backing store.
Storage
Key-value storage is handled on the server side: all keys are written to the server's disk along with the data for each store. CacheStore can store multiple Java key types, including primitives and objects.
Data for each store is saved in three files, with the extensions .key, .ndx, and .data, that work together. The .key file stores information about the keys. The .ndx file links the keys to the values, which are stored in the .data file. As data is written, erased, and rewritten in the .data file, the .ndx file is updated to point to the correct blocks. Occasional data purging and compaction via JMX are recommended as routine maintenance.
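The relationship between an index and a data file, and the reason compaction helps, can be shown with a small in-memory model. This is a sketch under stated assumptions, not the actual on-disk format: `IndexedDataSketch` is hypothetical, a byte buffer stands in for the .data file, and a single map stands in for the combined role of the .key and .ndx files.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of an index that points at blocks inside an append-only data area. */
class IndexedDataSketch {

    /** Index entry: where a value's block starts in the data area and how long it is. */
    private static final class Block {
        final int offset;
        final int length;

        Block(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    private ByteArrayOutputStream data = new ByteArrayOutputStream(); // stand-in for .data
    private final Map<String, Block> index = new LinkedHashMap<>();   // stand-in for .key and .ndx

    /** Appends the value and repoints the index; an older block for the key becomes garbage. */
    void put(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        index.put(key, new Block(data.size(), bytes.length));
        data.write(bytes, 0, bytes.length);
    }

    String get(String key) {
        Block block = index.get(key);
        if (block == null) {
            return null;
        }
        return new String(data.toByteArray(), block.offset, block.length, StandardCharsets.UTF_8);
    }

    /** Drops the index entry only; the block stays in the data area until compaction. */
    void remove(String key) {
        index.remove(key);
    }

    int dataSize() {
        return data.size();
    }

    /** Rewrites only the live blocks and updates each index entry to its new offset. */
    void compact() {
        byte[] old = data.toByteArray();
        ByteArrayOutputStream fresh = new ByteArrayOutputStream();
        for (Map.Entry<String, Block> entry : index.entrySet()) {
            Block block = entry.getValue();
            entry.setValue(new Block(fresh.size(), block.length));
            fresh.write(old, block.offset, block.length);
        }
        data = fresh;
    }

    public static void main(String[] args) {
        IndexedDataSketch store = new IndexedDataSketch();
        store.put("a", "one");
        store.put("a", "three"); // rewrite leaves the old block behind
        System.out.println("bytes before compaction: " + store.dataSize());
        store.compact();
        System.out.println("bytes after compaction: " + store.dataSize());
    }
}
```

In this model, rewrites and removals never reclaim space on their own, so the data area grows until a compaction pass copies the live blocks forward.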
Cluster
Clustering can be used for stores to increase performance and reliability. A multi-node cluster supports data replication and load balancing, which prevents "hot spots" from occurring.
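How a cluster might spread keys and replicas across nodes can be sketched with simple hash-based placement. The scheme below is an assumption made for illustration; this document does not describe CacheStore's actual partitioning strategy, and `PlacementSketch` and `nodesFor` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical key placement: hash to a primary node, then replicate to the next nodes. */
class PlacementSketch {

    /** Returns the node ids (0-based) holding the key: the primary first, then its replicas. */
    static List<Integer> nodesFor(String key, int nodeCount, int replicationFactor) {
        if (nodeCount <= 0 || replicationFactor <= 0 || replicationFactor > nodeCount) {
            throw new IllegalArgumentException("need 1 <= replicationFactor <= nodeCount");
        }
        // floorMod keeps the result non-negative even when hashCode() is negative.
        int primary = Math.floorMod(key.hashCode(), nodeCount);
        List<Integer> nodes = new ArrayList<>();
        for (int i = 0; i < replicationFactor; i++) {
            nodes.add((primary + i) % nodeCount);
        }
        return nodes;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"user:1", "user:2", "user:3"}) {
            System.out.println(key + " -> nodes " + nodesFor(key, 4, 2));
        }
    }
}
```

Hashing spreads keys across nodes regardless of their natural order, which is what keeps a run of adjacent keys from landing on one node. If the node holding the primary copy fails, any replica in the returned list can serve the key.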
Replication
Replication is supported by remote and clustered servers and eliminates downtime during failover.
Query
Full object queries, excluding aggregation and joins, are supported on both primitives and objects. More query features are in development.
Meta Data
As mentioned earlier, data is written to the store synchronously with the write to the cache. Stores are defined on the server in config/stores.xml, and each store's data is kept in three files named after the store with the extensions .data, .key, and .ndx. Metadata for objects is represented as Java classes, which must be included to read and write the corresponding objects correctly.
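The requirement that object classes be available on both the writing and the reading side can be illustrated with standard Java serialization. This is only an analogy under a stated assumption: the document does not say which serialization mechanism CacheStore uses, and `MetadataSketch` and `Bid` are hypothetical classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical example: a stored object can only be read back if its class is available. */
class MetadataSketch {

    /** Example value class; it acts as the metadata describing the stored bytes. */
    static final class Bid implements Serializable {
        private static final long serialVersionUID = 1L;
        final String campaign;
        final long priceMicros;

        Bid(String campaign, long priceMicros) {
            this.campaign = campaign;
            this.priceMicros = priceMicros;
        }
    }

    static byte[] toBytes(Serializable value) {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (ClassNotFoundException e) {
            // This is the failure a reader sees when the object's class was not included.
            throw new IllegalStateException("class for stored object is not on the classpath", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Bid restored = (Bid) fromBytes(toBytes(new Bid("spring-sale", 2_500_000L)));
        System.out.println(restored.campaign + " " + restored.priceMicros);
    }
}
```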
CacheStore Advantages
- Hybrid of cache and disk
  - Access speeds of cache storage systems
  - Reliability, storage, and features of disk storage systems
- Server-side replication
  - No failover downtime
  - Continues to replicate and synchronize if there's a failure
- Load balancing
  - Keys are evenly distributed among nodes
  - Prevents hot spots from occurring
- Access speeds in nanoseconds
  - Built-in caching system allows extremely high performance and access speeds
- Special functions
  - Full object queries
  - Scans
  - Cursors
  - Stored procedures
  - Triggers
  - Bulk operations
  - Map Reduce
CacheStore Disadvantages
- Constrained by memory as a trade-off for high performance
  - All keys must be loaded at bootstrap
  - Number of keys limited by memory size
Comparison with Project Voldemort
| Feature | CacheStore | Voldemort |
|---|---|---|
| Replication | Yes | Yes |
| Storage | Built In | Plugin |
| Fail Over | Yes | Yes |
| Load Balancing | Yes | Yes |
| Operations | Simple & Bulk | Simple |
| Query | Yes | No |
| Scan | Yes | No |
| Cursor | Yes | No |
| Stored Procedure | Yes | No |
| Trigger | Yes | No |
| Ordered Tables | Ordered & Unordered | Unordered |