Viant Engineering

CacheStore Usage Guide

A Practical Guide and Code Usage


Table of Contents

  1. Introduction
  2. Server - Remotes, Clusters, and Stores
    1. Start Remote
    2. Start Cluster
    3. Triggers
    4. Stored Procedures
    5. Queries
    6. Map Reduce
  3. Client - Connection and Functions
    1. Creating a Remote Client
    2. Creating a Scan Client
    3. Creating a Cluster Client
    4. Client Functions
      1. Get/Put
      2. Remove
      3. Bulk Operations
      4. Object Query
        1. Object File
      5. Scan
      6. Cursor
      7. Stored Procedures
    5. Key Sequencing
    6. Serialization
      1. Hessian Serialization
    7. Map Reduce
  4. Store and Cluster Config Files
    1. Store Config
      1. The stores.xml File
    2. Cluster Config
      1. The clusters.xml File
      2. The stores.xml File
      3. The node.properties File
  5. JMX Monitoring
  6. Maven
    1. Maven Dependencies

1. Introduction

CacheStore provides a robust set of features, implemented in Java, for quickly accessing, manipulating, and modifying billions of records held in "stores". CacheStore reads its settings mainly from XML config files, which determine how store data is configured, read, and written.

This manual covers CacheStore's Java API, XML config files, and general usage. Note that the Java API can be called either from Java applications or through the CacheStore Groovy Shell.

If you have questions about Object Query syntax, take a look at the Object Query Guide for reference.

Also, in our examples, we will assume a few things:

2. Server - Remotes, Clusters, and Stores

CacheStore can be deployed in three ways: locally, remotely, or on a cluster. Data within CacheStore is written to and saved in what we call "stores". CacheStore servers can be started from the provided startup scripts, from custom startup scripts, or directly through the JVM. Stores must be configured before servers are started; for more information on configuration, see Section 4, Store and Cluster Config Files.

When deciding whether to deploy CacheStore as a remote or a cluster server, consider the size of your data: the larger your data set, the more likely a cluster deployment is a better fit than a remote deployment.

2.1. Start Remote

To start a Remote Store, make a call to RemoteScan4ReplicaServer.java's main method: RemoteScan4ReplicaServer.main(args)

Take note that the argument options are {"-configPath", "-start"}, with defaults {"./config", "true"}.

In addition, you must set the CLASSPATH and Java memory settings. For this reason, it is HIGHLY recommended to start a server through the startup scripts provided in the Remote Deployment version of CacheStore, found here.

2.2. Start Cluster

To start a Cluster, make a call to ClusterServer.java's main method: ClusterServer.main(args)

Take note that the argument options are {"-host", "-configPath", "-dataPath", "-port", "-replicaPort", "-start", "-freq"}, with defaults {"", "", ".", "0", "0", "true", "0"}.

In addition, you must set the CLASSPATH and Java memory settings. For this reason, it is HIGHLY recommended to start a server through the startup scripts provided in the Cluster Deployment version of CacheStore, found here.

Each node in the cluster must individually start CacheStore and have the correct config files set up accordingly. Please refer to the Cluster Config Section below and to the Cluster User Guide for more information regarding node setup.
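The option lists above can be pictured as "-flag value" pairs merged over a table of defaults. The sketch below is a hypothetical, JDK-only illustration, not CacheStore's own argument parser: it assumes arguments are passed as alternating flag/value pairs (for example "-port 6172"), which this guide does not state explicitly, and the class and method names are invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: merges "-flag value" pairs over a table of defaults.
// Not CacheStore's parser; the pair format is an assumption.
class ServerArgsSketch {

    /** Returns a copy of the defaults with each "-flag value" pair from args applied. */
    static Map<String, String> parse(String[] args, Map<String, String> defaults) {
        if (args.length % 2 != 0) {
            throw new IllegalArgumentException("Options must come in \"-flag value\" pairs");
        }
        Map<String, String> opts = new LinkedHashMap<>(defaults);
        for (int i = 0; i < args.length; i += 2) {
            if (!opts.containsKey(args[i])) {
                throw new IllegalArgumentException("Unknown option: " + args[i]);
            }
            opts.put(args[i], args[i + 1]);
        }
        return opts;
    }

    /** RemoteScan4ReplicaServer defaults as listed in this guide. */
    static Map<String, String> remoteDefaults() {
        Map<String, String> d = new LinkedHashMap<>();
        d.put("-configPath", "./config");
        d.put("-start", "true");
        return d;
    }

    /** ClusterServer defaults as listed in this guide. */
    static Map<String, String> clusterDefaults() {
        Map<String, String> d = new LinkedHashMap<>();
        d.put("-host", "");
        d.put("-configPath", "");
        d.put("-dataPath", ".");
        d.put("-port", "0");
        d.put("-replicaPort", "0");
        d.put("-start", "true");
        d.put("-freq", "0");
        return d;
    }

    public static void main(String[] args) {
        // Placeholder host and port, for illustration only.
        String[] clusterArgs = {"-host", "node1.example.com", "-configPath", "./config", "-port", "6172"};
        System.out.println(parse(clusterArgs, clusterDefaults()));
    }
}
```

Under that assumption, an array like clusterArgs is the kind of input you would pass to ClusterServer.main(args); any option you leave out keeps its default.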

2.3. Triggers

Get, Put, and Delete Triggers are implemented on CacheStore servers. A Get Trigger fires on a "get" operation, a Put Trigger on a "put" operation, and a Delete Trigger on a "remove" operation. Get and Put Triggers each have a before-operation and an after-operation method, while Delete Triggers have only a before-operation method.

Triggers are implemented through the interfaces GetTrigger.java, PutTrigger.java, and DeleteTrigger.java, and their methods must be overridden.

Get Trigger:

public class GetTrigger implements com.sm.store.GetTrigger {
    @Override
    public boolean beforeGet(Key key, CacheStore store) {
        System.out.println("Before get");
        return false;
    }
    @Override
    public Value afterGet(Key key, Value value, CacheStore store) {
        System.out.println("After get");
        return value;
    }
}

Put Trigger:

public class PutTrigger implements com.sm.store.PutTrigger {
    @Override
    public boolean beforePut(Key key, Value value, CacheStore store) {
        System.out.println("Before put");
        return false;
    }
    @Override
    public Value afterPut(Key key, Value value, CacheStore store) {
        System.out.println("After put");
        return value;
    }
}

Delete Trigger:

public class DeleteTrigger implements com.sm.store.DeleteTrigger {
    @Override
    public void beforeDelete(Key key, CacheStore store) {
        System.out.println("Before Delete");
    }
}

Once a trigger is created, it must be on the server's classpath at startup (for example, by placing its jar in the lib or dist folder). Also, because triggers are store-specific, each store that uses a trigger must declare it in stores.xml.

XML tags in stores.xml:

<getTrigger>com.mycompany.app.GetTrigger</getTrigger>
<putTrigger>com.mycompany.app.PutTrigger</putTrigger>
<deleteTrigger>com.mycompany.app.DeleteTrigger</deleteTrigger>

2.4. Stored Procedures

Stored procedures are subroutines stored on servers that can be called by clients. For more information regarding stored procedures in CacheStore, refer to the Stored Procedures section under the Client section below.

2.5. Queries

Queries are filtered searches that, like stored procedures, are initiated on the client side, but handled on the servers. More information can be found in the Object Query section under the Client section below.

2.6. Map Reduce

Every CacheStore server can use the built-in Map Reduce, which is implemented through multithreading. A single server distributes its work among the number of threads you specify. A cluster first splits the work among its servers, and each server then splits its share among its own threads. After the Map phase finishes, the Reduce phase runs and the results can be combined.
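The thread-splitting strategy described above can be illustrated with plain java.util.concurrent. The sketch below is not CacheStore's Map Reduce engine; it uses a hypothetical "count records above a threshold" job to show the shape: split the records into threadCount slices, map each slice on its own thread to a partial result, then reduce the partials once every map task has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative JDK-only sketch (not CacheStore's ExecMapReduce): split records among
// threadCount workers, map each slice to a partial count, then reduce the partials.
class ThreadedCountSketch {

    /** Counts records greater than threshold using threadCount worker threads. */
    static long countMatching(List<Integer> records, int threadCount, int threshold) {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        try {
            int chunk = (records.size() + threadCount - 1) / threadCount; // ceiling division
            List<Future<Long>> partials = new ArrayList<>();
            for (int t = 0; t < threadCount; t++) {
                int from = Math.min(t * chunk, records.size());
                int to = Math.min(from + chunk, records.size());
                List<Integer> slice = records.subList(from, to);
                // Map phase: each task counts matches in its own slice only.
                partials.add(pool.submit(() -> slice.stream().filter(r -> r > threshold).count()));
            }
            // Reduce phase: combine the partial counts after all map tasks complete.
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting for map tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a map task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

In a cluster, the same idea would apply one level up: each server handles its own share of the data, and its per-thread partials feed the final reduce.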

The execution of the server-side Map Reduce is similar to how Stored Procedures are created and executed. Please refer to the Client-side Map Reduce section for more information.

To run Map Reduce, the Map Reduce procedure and model object must be present on both the client and the server. In addition, your config path must contain a scan.xml file that defines the stores accessible to Map Reduce. scan.xml uses the same format as stores.xml, and its stores should be a subset of those in stores.xml. Map Reduce WILL NOT WORK without scan.xml.

3. Client - Connection and Functions

To interact with CacheStore servers, you need a client connected to them. Once connected, the client can perform operations against the server. Remember to close your client after use by calling its close() method.

3.1. Creating a Remote Client

The remote client is the basic client used to connect to a remote store. It is a lightweight client that has a simple implementation that can be easily customized and adjusted for other programming languages.

RemoteClientImpl client = new NTRemoteClientImpl(HOST_CONNECT_URL, null, STORENAME);

3.2. Creating a Scan Client

The scan client extends the remote client, so it is a superset of it. It depends on more libraries, which makes it harder to customize and port to other languages, but it has more features and is more powerful than the basic remote client.

ScanClientImpl client = new GZScanClientImpl(HOST_CONNECT_URL, null, STORENAME);

3.3. Creating a Cluster Client

The cluster client connects to a store within a cluster. Because the nodes of a cluster work together and access the same stores, it doesn't matter which node you connect to.

ccf = ClusterClientFactory.connect(HOST_CONNECT_URL, STORENAME);
client = ccf.getDefaultStore();

3.4. Client Functions

All CacheStore clients, remote and cluster, share the same basic functions such as "get" and "put", but some of the more advanced functions are available only in, or specifically tailored to, certain client types. Because the scan client is a superset of the remote client, it has all of the remote client's functions and more.

3.4.1. Get/Put

The most basic of operations in a key-value storage system are the get and put commands. CacheStore is capable of storing object version and node information alongside object data information.

A simple get/put example:

// Create a key and a raw byte-array value
Key<Integer> k = Key.createKey(1);
ByteArray fooBar = new ByteArray("FooBar".getBytes());
// Write the pair, then read it back; get returns it wrapped in a Value
client.put(k, fooBar);
Value<ByteArray> value = client.get(k);

The client places the Key and ByteArray into the store with "put", and they are then retrieved as a Value with "get".

3.4.2. Remove

Remove is used to delete an existing key-value pair by key:

client.remove(k);

3.4.3. Bulk Operations

CacheStore supports bulk operations (also known as multiOperations) that let users get, put, or remove many keys and values at once. Keep in mind that multiOperations take a query String as one of their parameters, but this parameter can be null. Query String syntax can be found in the Object Query Guide.

The following is an example of multiGets, multiPuts, and multiRemoves:

List<KeyValue> mGet = client.multiGets(keyList);  // fetch the pairs for every key in keyList
client.multiRemoves(keyList);                     // delete those keys
client.multiPuts(mGet);                           // write the fetched pairs back
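To make the data flow of this example explicit, here is a hypothetical in-memory stand-in that mirrors the three calls. It is not the CacheStore client: KV stands in for KeyValue, a TreeMap stands in for the store, and skipping missing keys in multiGets is an assumption. What it shows is that multiGets returns pairs that multiPuts accepts directly, so the example removes the keys and then restores them unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in (not the CacheStore client) mirroring the
// multiGets -> multiRemoves -> multiPuts round trip.
class BulkRoundTripSketch {

    /** Minimal key/value pair standing in for CacheStore's KeyValue. */
    static final class KV {
        final int key;
        final String value;

        KV(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    final Map<Integer, String> store = new TreeMap<>();

    /** Returns a pair for each requested key that exists; missing keys are skipped (an assumption). */
    List<KV> multiGets(List<Integer> keys) {
        List<KV> out = new ArrayList<>();
        for (int k : keys) {
            String v = store.get(k);
            if (v != null) {
                out.add(new KV(k, v));
            }
        }
        return out;
    }

    void multiRemoves(List<Integer> keys) {
        for (int k : keys) {
            store.remove(k);
        }
    }

    void multiPuts(List<KV> pairs) {
        for (KV p : pairs) {
            store.put(p.key, p.value);
        }
    }

    public static void main(String[] args) {
        BulkRoundTripSketch client = new BulkRoundTripSketch();
        client.store.put(1, "one");
        client.store.put(2, "two");
        List<Integer> keyList = Arrays.asList(1, 2);
        List<KV> mGet = client.multiGets(keyList); // snapshot the pairs
        client.multiRemoves(keyList);              // the store is now empty
        client.multiPuts(mGet);                    // the pairs are restored
        System.out.println(client.store);
    }
}
```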

3.4.4. Object Query

Although queries are handled by the CacheStore server, they are initiated by CacheStore clients. CacheStore supports full Object Query syntax except for aggregation and joins. Please refer to the Object Query Guide for an overview of object queries and query statement syntax.

There are two main query functions, query and query4Json. query returns a List of key-value pairs, while query4Json returns results that can be converted into JSON objects: a String on remote and scan clients, and a List of Strings on cluster clients.

Remote/Scan:

List<KeyValue> queryResults = client.query("select name, age, height from Person where key# = 100");
String query4JResults = client.query4Json("select name, age from Student");

Cluster:

List<KeyValue> queryResults = client.query("select grade from Class where key# = 100");
List<String> query4JResults = client.query4Json("select name, age, salary from Employee");

3.4.4.1. Object File

Since Object Query runs queries on objects, we need a class that describes those objects. In Java, these object files take the form of ordinary classes.

Here is a simple implementation of the Student class referred to in the examples above:

import java.io.Serializable;

public class Student implements Serializable {
    String name;
    int age;

    public Student(String name, int age) {
        this.name = name;
        this.age = age;
    }
    public String getName() {
        return name;
    }
    public int getAge() {
        return age;
    }
    @Override
    public String toString() {
        return "Student{" +
                "name='" + name + '\'' +
                ", age=" + age +
                '}';
    }
}

CacheStore is not limited to this Student class; other classes can be used as well. More information can be found in the Object Query Guide.

3.4.5. Scan

A range scan is a mechanism to search through a range of keys by reading index entries sequentially. These types of scans can be done in CacheStore and can be optionally filtered using a query statement. Scans in CacheStore can only be done on sorted stores.

List<KeyValue> scanResult1 = client.scan(key1, key2, null);
List<KeyValue> scanResult2 = client.scan(key3, key4, queryString);
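Scans require sorted stores because a range scan walks keys in order between two bounds. The JDK-only sketch below models that with a TreeMap; it is not CacheStore's implementation. The inclusive bounds and the Predicate standing in for the optional query string are assumptions of the sketch, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.function.Predicate;

// JDK-only sketch of a filtered range scan over a sorted map (not CacheStore's scan).
// Inclusive bounds and the Predicate-as-query are assumptions.
class RangeScanSketch {

    /** Returns entries with from <= key <= to, in key order, keeping those that pass filter (null keeps all). */
    static <V> List<Map.Entry<Integer, V>> scan(NavigableMap<Integer, V> sorted, int from, int to,
                                                Predicate<V> filter) {
        List<Map.Entry<Integer, V>> out = new ArrayList<>();
        // subMap visits only the keys inside the range, already in sorted order.
        for (Map.Entry<Integer, V> e : sorted.subMap(from, true, to, true).entrySet()) {
            if (filter == null || filter.test(e.getValue())) {
                out.add(e);
            }
        }
        return out;
    }
}
```

A hash-ordered store has no key order to walk, so the range lookup above would degrade to checking every key, which is consistent with the restriction to sorted stores.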

3.4.6. Cursor

A cursor is a control structure that allows traversal over records. Cursors in CacheStore are implemented for range scans, key iterators, and key-value pairs.

A simple use of cursors with the scan client:

CursorPara keyCursor = client.openKeyCursor((short) 1000);
System.out.println("KeyCursor: "+keyCursor.getKeyValueList());
keyCursor = client.nextCursor(keyCursor);
System.out.println("KeyCursor next: "+keyCursor.getKeyValueList());
client.closeCursor(keyCursor);

CursorPara keyValueCursor = client.openKeyValueCursor((short) 900);
System.out.println("KeyValueCursor: "+keyValueCursor.getKeyValueList());
keyValueCursor = client.nextCursor(keyValueCursor);
System.out.println("KeyValueCursor next: "+keyValueCursor.getKeyValueList());
client.closeCursor(keyValueCursor);

CursorPara scanCursor = client.openScanCursor((short) 1000, k1, k4);
System.out.println("ScanCursor: "+scanCursor.getKeyValueList());
scanCursor = client.nextCursor(scanCursor);
System.out.println("ScanCursor next: "+scanCursor.getKeyValueList());
client.closeCursor(scanCursor);

A simple use of cluster cursors:

ClusterClient.ClusterCursor keyCursor = client.openKeyCursor((short) 1000);
System.out.println("KeyCursor: "+keyCursor.getResult());
keyCursor = client.nextCursor(keyCursor);
System.out.println("KeyCursor: "+keyCursor.getResult());
keyCursor.close();

ClusterClient.ClusterCursor keyValueCursor = client.openKeyValueCursor((short) 40);
System.out.println("KeyValueCursor: "+keyValueCursor.getResult());
keyValueCursor = client.nextCursor(keyValueCursor);
System.out.println("KeyValueCursor: "+keyValueCursor.getResult());
keyValueCursor.close();

ClusterClient.ClusterCursor scanCursor = client.openScanCursor((short) 750, k1, k4);
System.out.println("ScanCursor: "+scanCursor.getResult());
scanCursor = client.nextCursor(scanCursor);
System.out.println("ScanCursor: "+scanCursor.getResult());
scanCursor.close();

In the examples above, each cursor-opening call takes a short as its batch size; openScanCursor also takes two keys that define the key range to scan over. Scan-client cursors are released with client.closeCursor(cursor), while cluster cursors are closed with cursor.close().
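The batching behavior described above can be modeled with a small iterator wrapper. This is a hypothetical sketch, not CacheStore's CursorPara or ClusterCursor: it keeps its position in a sorted key range and hands out at most batchSize keys per next() call, with an empty batch signaling the end. Whether CacheStore signals the end the same way is an assumption.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical batch-cursor sketch (not CacheStore's CursorPara or ClusterCursor):
// each next() returns at most batchSize keys, resuming where the previous batch stopped.
class KeyCursorSketch implements AutoCloseable {
    private final Iterator<Integer> it;
    private final int batchSize;
    private boolean closed;

    KeyCursorSketch(NavigableSet<Integer> keys, int from, int to, short batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.it = keys.subSet(from, true, to, true).iterator();
        this.batchSize = batchSize;
    }

    /** Returns the next batch; an empty list means the range is exhausted. */
    List<Integer> next() {
        if (closed) {
            throw new IllegalStateException("cursor is closed");
        }
        List<Integer> batch = new ArrayList<>(batchSize);
        while (batch.size() < batchSize && it.hasNext()) {
            batch.add(it.next());
        }
        return batch;
    }

    @Override
    public void close() {
        closed = true; // a server-backed cursor would presumably also release server-side state
    }
}
```

A typical loop opens the cursor in a try-with-resources statement and calls next() until it returns an empty list, which mirrors the open, nextCursor, and close sequence in the examples.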

3.4.7. Stored Procedures

Stored procedures are defined in stored procedure files and called through the Invoker class and the client's invoke function. First, the stored procedure itself must be defined. In this example we use a StoreProc.groovy file (Groovy stored procedures are loaded from config/script by default) to define our stored procedure:

class StoreProc implements StoreMap {
    String city = "irvine"
    ConcurrentMap<String, RemoteStore> storeMaps

    def sayHello(String name) {
        println( "hello new -- $name")
        "hello new -- "+name
    }

    @Override
    void setStoreMap(ConcurrentMap<String, RemoteStore> storeMaps) {
        this.storeMaps = storeMaps
    }
}

Now that the stored procedure is defined in a .groovy class file, create an Invoker with the file name, the method to call, and the method's arguments:

Invoker invoker = new Invoker("StoreProc.groovy", "sayHello", new Object[] {"test" } );

The last step is to actually call, or invoke, this Invoker object:

String result = (String) client.invoke(invoker);
System.out.println(result);

Using stored procedures written in Java is very similar. First, create the stored procedure in Java and package it into a .jar file. The .jar file must be on the server's classpath at startup (usually by placing it in the lib or dist folder).

import java.util.concurrent.ConcurrentMap;

public class JavaStoreProc implements StoreMap {
    ConcurrentMap<String, RemoteStore> storeMaps;

    public void helloWorld(String name) {
        System.out.println("Hello, " + name);
    }

    @Override
    public void setStoreMap(ConcurrentMap<String, RemoteStore> storeMaps) {
        this.storeMaps = storeMaps;
    }
}

Creating and calling the invoker is the same except the fully qualified class name must be specified in place of the Groovy script.

Invoker invoker = new Invoker("com.mycompany.app.JavaStoreProc", "helloWorld", new Object[] {"test" } );

The invoker instantiates the class using its default (no-argument) constructor, so the class must provide one. Arguments are passed only to the method being called.
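The no-argument constructor requirement follows from how a class named only by a string gets instantiated. The sketch below shows the general reflection steps with the JDK; it is an assumption about what an invoker like this does, not CacheStore's Invoker source, and GreetingProc is a hypothetical stand-in procedure. It matches methods by name and argument count only, which is a simplification.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical sketch of invoking a stored procedure by class name (not CacheStore's Invoker):
// load the class, create it with its no-arg constructor, find the method, and call it.
class ReflectiveInvokeSketch {

    static Object invoke(String className, String methodName, Object... params) {
        try {
            Class<?> cls = Class.forName(className);
            Object target = cls.getDeclaredConstructor().newInstance(); // fails without a no-arg constructor
            for (Method m : cls.getMethods()) {
                // Simplification: match by name and arity only, not by parameter types.
                if (m.getName().equals(methodName) && m.getParameterCount() == params.length) {
                    return m.invoke(target, params);
                }
            }
        } catch (InvocationTargetException e) {
            throw new IllegalStateException("procedure threw an exception", e.getCause());
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot invoke " + className + "." + methodName, e);
        }
        throw new IllegalArgumentException("no public method " + methodName + " taking " + params.length + " argument(s)");
    }

    /** Stand-in stored procedure used to exercise the sketch (hypothetical). */
    public static class GreetingProc {
        public String sayHello(String name) {
            return "hello -- " + name;
        }
    }
}
```

For example, invoke(GreetingProc.class.getName(), "sayHello", "test") plays the role of new Invoker(className, "sayHello", new Object[] {"test"}) in this simplified model.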

3.5. Key Sequencing

CacheStore supports sequencing, which is recommended for keeping keys organized and avoiding key collisions. To get started, first create a store whose keys each hold integer data initialized to 0. Whenever you access this store, use the client function getSeqNoInt(Key key), which automatically increments the stored value by 1. Beyond sequencing, the developer can use this function for any other purpose, such as a general-purpose counter.
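The counter behavior described above can be modeled in memory with ConcurrentHashMap.merge, which updates a key's value atomically. This is a hypothetical sketch, not CacheStore's getSeqNoInt: it treats an absent key as holding 0, and it assumes the call returns the value after the increment, which this guide does not specify.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory model of a sequence store (not CacheStore's getSeqNoInt):
// each key holds an int that starts at 0 and is atomically incremented by 1 per call.
// Returning the post-increment value is an assumption of this sketch.
class SequenceSketch {
    private final ConcurrentMap<String, Integer> seqStore = new ConcurrentHashMap<>();

    /** Atomically increments the counter for key and returns the new value. */
    int nextSeq(String key) {
        // merge() performs the read-increment-write atomically per key; an absent key
        // behaves like one initialized to 0, so the first call returns 1.
        return seqStore.merge(key, 1, Integer::sum);
    }
}
```

Because each increment is atomic, concurrent callers never receive the same number for the same key, which is what makes a sequence store useful for avoiding key collisions.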

3.6. Serialization

CacheStore supports multiple types of serialization for storing and retrieving objects. It uses Hessian by default, but custom serialization can be plugged in. To specify which serializer to use, change the <className> field in the stores.xml file. Serialization happens on both the server and the client, so the serializer must be present on both ends.
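The requirement that the serializer exist on both ends is easier to see with a concrete round trip. The sketch below uses plain java.io serialization in place of Hessian, behind a hypothetical BytesSerializer interface; it is not CacheStore's Serializer contract, whose methods are not shown in this guide.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

/** Hypothetical pluggable-serializer contract (not CacheStore's Serializer interface). */
interface BytesSerializer {
    byte[] toBytes(Object obj);
    Object toObject(byte[] bytes);
}

/** Example implementation using java.io serialization instead of Hessian. */
class JdkBytesSerializer implements BytesSerializer {
    @Override
    public byte[] toBytes(Object obj) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(obj); // throws NotSerializableException for non-Serializable objects
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray(); // read after close, so all buffered bytes are flushed
    }

    @Override
    public Object toObject(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            // The reading side needs the object's class on its own classpath.
            throw new IllegalStateException(e);
        }
    }
}
```

The ClassNotFoundException branch is the failure you would hit if the model class existed only on the writing side, which is the same reason model classes must be deployed to both client and server.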

3.6.1. Hessian Serialization

The default Hessian serialization is included in the CacheStore package. More information on Hessian serialization can be found here.

3.7. Map Reduce

Executing Map Reduce from the client side is the same as executing a Stored Procedure. The only difference is the Stored Procedure's implementation, which in this case implements MapReduce in addition to StoreMap.

The following is an implementation of the Map Reduce Stored Procedure, an example model (Value) it uses, and an Invoker call to execute it.


import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentMap;

public class AggregateMR implements StoreMap, MapReduce<Value> {
    Map<String, RemoteStore> storeMaps;
    Serializer serializer = new HessianSerializer();

    // Entry point called through the Invoker: runs the Map Reduce over one store
    public Value aggregate(String store, int threadCount) {
        ExecMapReduce execMapReduce = new ExecMapReduce(storeMaps, Value.class, this, serializer);
        return (Value) execMapReduce.execute(store, threadCount);
    }

    @Override
    public void beforeMapStart(Value record, int taskNo, Map<String, Object> context) {
        // Do something before the map starts
    }

    @Override
    public void map(Pair<Key, Object> pair, Value record, Map<String, Object> map) {
        record.count(); // count each record this task sees
    }

    @Override
    public void afterMapComplete(Value record, int taskNo, Map<String, Object> context) {
        // Do something after the map completes
    }

    @Override
    public Value reduce(List<Value> list) {
        System.out.println("reduce");
        Value total = new Value();
        for (Value each : list) {
            total.addCount(each.getCount());
        }
        total.setTotal(total.getCount());
        return total;
    }

    @Override
    public void setStoreMap(ConcurrentMap<String, RemoteStore> storeMaps) {
        this.storeMaps = storeMaps;
    }
}

The Map Reduce is executed by the ExecMapReduce instance created in aggregate. The callbacks do what their names say: beforeMapStart runs before the map starts, map is the map function, afterMapComplete runs after the map completes, and reduce is the reduce function, which runs after all the map functions finish.


import java.io.Serializable;

public class Value implements Serializable {
    int count;
    int total;

    public Value() { }

    public Value(int count, int total) {
        this.count = count;
        this.total = total;
    }

    // Called once per mapped record
    public void count() {
        count++;
    }

    // Adds a partial count (used by reduce)
    public void addCount(int count) {
        this.count += count;
    }

    public int getCount() {
        return count;
    }

    public void setCount(int count) {
        this.count = count;
    }

    public int getTotal() {
        return total;
    }

    public void setTotal(int total) {
        this.total = total;
    }

    @Override
    public String toString() {
        return "Value{" +
                "count=" + count +
                ", total=" + total +
                '}';
    }
}

This example uses Value as a simple aggregate counter, but the Map Reduce and its model can be implemented however your task requires.


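// Calls AggregateMR.aggregate("workers", 2): run over store "workers" with 2 threads
Invoker invoker = new Invoker("com.cachestore.example.AggregateMR", "aggregate", new Object[] {"workers", 2} );
Object object = client.invoke(invoker); // the Value returned by reduce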

The Map Reduce is called client-side the same way Stored Procedures are. The Map Reduce and model object(s) must be on both the client and server when executing.
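The callback order described for AggregateMR can be summarized with a small single-threaded driver. This is a hypothetical sketch, not CacheStore's ExecMapReduce: MiniMapReduce is a simplified stand-in for the MapReduce interface, tasks run sequentially instead of on separate threads, and it assumes each task gets its own accumulator, which is how the record parameter appears to be used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical driver mirroring the callback order: per task, beforeMapStart,
// then map once per record, then afterMapComplete; finally reduce once.
// MiniMapReduce is a simplified stand-in, not CacheStore's MapReduce interface.
class LifecycleSketch {

    interface MiniMapReduce<R> {
        void beforeMapStart(R acc, int taskNo);
        void map(Object record, R acc);
        void afterMapComplete(R acc, int taskNo);
        R reduce(List<R> partials);
    }

    static <R> R run(List<List<Object>> tasks, Supplier<R> newAccumulator, MiniMapReduce<R> mr) {
        List<R> partials = new ArrayList<>();
        for (int taskNo = 0; taskNo < tasks.size(); taskNo++) {
            R acc = newAccumulator.get(); // one accumulator per task (assumed)
            mr.beforeMapStart(acc, taskNo);
            for (Object record : tasks.get(taskNo)) {
                mr.map(record, acc);
            }
            mr.afterMapComplete(acc, taskNo);
            partials.add(acc);
        }
        return mr.reduce(partials); // runs once, after every map task has completed
    }
}
```

With Value as the accumulator, map corresponds to record.count() and reduce to summing each partial's getCount(), as in AggregateMR.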

4. Store and Cluster Config Files

The config files let users set up and customize CacheStore servers, clusters, and stores. By default, CacheStore reads config files from the config folder; you may need to create this folder and place stores.xml in it.

4.1. Store Config

The stores.xml file is used by every instance of CacheStore. Store configuration lets users set up how their data is stored and accessed.

4.1.1. The stores.xml File

Store metadata is defined in the stores.xml file in the config folder.

A stores.xml template with every tag filled in with its data type:

<clusters>
  <port>int</port>
  <className>String</className>
  <maxQueue>int</maxQueue>
  <maxThread>int</maxThread>
  <replicaPort>int</replicaPort>
  <freq>int</freq>
  <useNio>boolean</useNio>
  <store>
    <name>String</name>
    <path>String</path>
    <delay>boolean</delay>
    <mode>int</mode>
    <freq>int</freq>
    <batchSize>int</batchSize>
    <logPath>String</logPath>
    <replicaUrl>List</replicaUrl>
    <replicaTimeout>long</replicaTimeout>
    <sorted>boolean</sorted>
    <getTrigger>String</getTrigger>
    <putTrigger>String</putTrigger>
    <deleteTrigger>String</deleteTrigger>
    <useMaxCache>boolean</useMaxCache>
    <maxCacheMemory>int</maxCacheMemory>
    <useLRU>boolean</useLRU>
    <pstReplicaUrl>List</pstReplicaUrl>
  </store>
</clusters>

And the following is a sample stores.xml file:

<clusters>
  <port>4200</port>
  <replicaPort>4204</replicaPort>
  <store>
    <name>test</name>
    <path>data</path>
    <delay>false</delay>
    <mode>0</mode>
    <freq>10</freq>
    <batchSize>1</batchSize>
    <sorted>true</sorted>
    <getTrigger>com.mycompany.app.GetTrigger</getTrigger>
    <putTrigger>com.mycompany.app.PutTrigger</putTrigger>
    <deleteTrigger>com.mycompany.app.DeleteTrigger</deleteTrigger>
  </store>
  <store>
    <name>store</name>
    <path>data</path>
    <delay>false</delay>
    <mode>0</mode>
    <freq>10</freq>
    <batchSize>1</batchSize>
    <sorted>true</sorted>
  </store>
</clusters>

Each tag contains information on store metadata.

4.2. Cluster Config

Unlike remote and local stores, clusters are configured through three files: clusters.xml, stores.xml, and node.properties. clusters.xml contains metadata for the cluster as a whole, stores.xml contains metadata for the stores within the cluster, and node.properties contains metadata for the current node only. All nodes within the same cluster should share the same clusters.xml and stores.xml but have their own node.properties. These config files should all be in the same config path, which is defined when the cluster is initialized.

4.2.1. The clusters.xml File

Cluster metadata is defined in the clusters.xml file in the config folder. The following is a sample clusters.xml file:

<clusters>
  <cluster>
      <no>1</no>
      <servers>ash2-voxd014.sm-us.sm.local:6172, ash2-voxd004.sm-us.sm.local:6182</servers>
      <partitions>0,1</partitions>
  </cluster>
</clusters>

Each tag contains information on cluster metadata.

4.2.2. The stores.xml File

Cluster setups also require a stores.xml file on each node. Please refer to "Store Config - The stores.xml File" for information regarding stores.xml configuration.

4.2.3. The node.properties File

The node.properties file identifies which node of the cluster the current machine is. The following is a sample node.properties file (its host and port match the second server listed in the sample clusters.xml):

host=ash2-voxd004.sm-us.sm.local
port=6182

Notice how unlike the other two config files, node.properties does not use XML. It is a .properties file that just defines the host and the port number that the current node is using.
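One way to sanity-check a node's setup is to read node.properties with java.util.Properties and confirm that its host:port appears in the cluster's <servers> list. The sketch below is a hypothetical JDK-only helper, not part of CacheStore, and the cross-check is our own suggestion based on the sample files above rather than a documented CacheStore requirement.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Properties;

// Hypothetical helper (not part of CacheStore): parse node.properties and check
// that this node's host:port is listed in clusters.xml's <servers> value.
class NodeConfigSketch {

    /** Parses node.properties content; real code would read config/node.properties from disk. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** True if "host:port" from node.properties is one of the comma-separated servers. */
    static boolean isListed(Properties node, String serversCsv) {
        String self = node.getProperty("host") + ":" + node.getProperty("port");
        return Arrays.stream(serversCsv.split(","))
                .map(String::trim)
                .anyMatch(self::equals);
    }
}
```

To read the real file, pass the contents of config/node.properties (for example via Files.newBufferedReader) to Properties.load instead of a string.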

5. JMX Monitoring

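CacheStore supports JMX monitoring. To enable and access it, first set up your jmxremote.password and jmxremote.access files. The password file must be readable only by its owner (for example, chmod 600 jmxremote.password); otherwise the JVM refuses to start the JMX agent.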

Once those are set up, place the following code snippet into your startup script (we recommend using StartCachestore.sh):

if [ -z "$OPTS" ]; then
    OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8999 \
        -Dcom.sun.management.jmxremote.rmi.port=8998 \
        -Dcom.sun.management.jmxremote.ssl=false \
        -Dcom.sun.management.jmxremote.authenticate=true \
        -Dcom.sun.management.jmxremote.password.file=./bin/jmxremote.password \
        -Dcom.sun.management.jmxremote.access.file=./bin/jmxremote.access"
fi

Once this is set up, start CacheStore. Then open a new terminal and run jconsole to connect.

More information on JMX monitoring can be found here.
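As an alternative to jconsole, a monitoring tool can connect through the standard javax.management.remote API. The sketch below assumes port 8999 from the snippet above and the conventional RMI connector URL for the JVM's built-in JMX agent; the user name and password are placeholders that must match your jmxremote.password and jmxremote.access files. Which MBeans CacheStore itself registers is not covered here.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.util.HashMap;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Sketch of a programmatic JMX client for a JVM started with the options above.
// Host, user name, and password are placeholders.
class JmxConnectSketch {

    /** Standard RMI connector URL for the built-in agent on the given port. */
    static JMXServiceURL serviceUrl(String host, int port) {
        try {
            return new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("bad host or port", e);
        }
    }

    /** Credentials must match an entry in jmxremote.password and jmxremote.access. */
    static Map<String, Object> credentials(String user, String password) {
        Map<String, Object> env = new HashMap<>();
        env.put(JMXConnector.CREDENTIALS, new String[] {user, password});
        return env;
    }

    public static void main(String[] args) {
        try (JMXConnector connector = JMXConnectorFactory.connect(
                serviceUrl("localhost", 8999), credentials("monitorRole", "changeit"))) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            System.out.println("Connected; MBean count = " + mbs.getMBeanCount());
        } catch (IOException e) {
            System.err.println("Could not connect: " + e.getMessage());
        }
    }
}
```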

6. Maven

CacheStore libraries have been uploaded to the Maven Central Repository.

More information about Maven can be found here.

6.1. Maven Dependencies

CacheStore can be used as a standalone system or as a plugin. To use CacheStore in your own project, add the following dependencies to the <dependencies> section of your pom.xml file:

    <dependency>
        <groupId>com.viantinc.cachestore</groupId>
        <artifactId>cachestore-client</artifactId>
        <version>1.5.6</version>
    </dependency>
    <dependency>
        <groupId>com.viantinc.cachestore</groupId>
        <artifactId>cachestore-core</artifactId>
        <version>1.2.5</version>
    </dependency>
    <dependency>
        <groupId>com.viantinc.cachestore</groupId>
        <artifactId>cachestore-server</artifactId>
        <version>1.6.5</version>
    </dependency>
    <dependency>
        <groupId>com.viantinc.cachestore</groupId>
        <artifactId>objectquery</artifactId>
        <version>1.5.8</version>
    </dependency>
    <dependency>
        <groupId>com.viantinc.cachestore</groupId>
        <artifactId>replica</artifactId>
        <version>2.2.3</version>
    </dependency>
    <dependency>
        <groupId>com.viantinc.cachestore</groupId>
        <artifactId>transport</artifactId>
        <version>2.2.3</version>
    </dependency>
    <dependency>
        <groupId>com.viantinc.voldemort</groupId>
        <artifactId>cachestore-storage</artifactId>
        <version>4.0.4</version>
    </dependency>
    <dependency>
        <groupId>com.viantinc</groupId>
        <artifactId>hessian-sm</artifactId>
        <version>4.1.0</version>
    </dependency>
