To build an Apache Cassandra [1] cluster exclusively for availability and replication testing, here is a simple solution based on a single Linux instance, with no virtualization at all.
The idea is to initialize every node, run a test client, and then manually kill some node processes in order to check continuous service availability and data replication under several replication factors.
Cassandra's latest stable version at the time of this post was 0.6.5, and that is the release considered in this study.
Considering a cluster of 3 nodes, we'll create the following file system structure on Linux:
/opt/apache-cassandra-0.6.5/nodes
|-- node1
|   |-- bin
|   |-- conf
|   |-- data
|   |-- log
|   `-- txs
|-- node2
|   |-- bin
|   |-- conf
|   |-- data
|   |-- log
|   `-- txs
`-- node3
    |-- bin
    |-- conf
    |-- data
    |-- log
    `-- txs
That is, each node has its own set of directories, namely:
- bin: binaries (mostly Shell Scripts) tuned for the instance
- conf: configuration files for the node
- data: database binary files
- log: system log in text format
- txs: transaction logs (i.e. commit logs)
In the following sections the steps will be described in detail.
Cassandra installation
First of all, get the required package for Cassandra from its download site. Extract the file contents into the /opt/apache-cassandra-0.6.5/ directory (you'll need administrator privileges to do so). Make sure your regular user owns this entire directory tree (use chown if needed).
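If you prefer doing this from the command line, a sketch like the following should work, assuming the downloaded file is named apache-cassandra-0.6.5-bin.tar.gz and sits in the current directory:

$ sudo tar xzf apache-cassandra-0.6.5-bin.tar.gz -C /opt/
$ sudo chown -R $USER /opt/apache-cassandra-0.6.5    # hand the tree over to your regular user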
Make sure you already have a Java Virtual Machine (JVM) installed on the system, at least version 1.6. Cassandra is implemented in Java, so it needs a JVM to run.
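A quick way to confirm the JVM is present and recent enough:

$ java -version    # the reported version should be 1.6.0 or newer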
Supplying additional network interfaces
As we are building a pseudo-cluster, we have no network interfaces beyond the existing ones. Since Cassandra is a distributed system, each participating node must have its own exclusive IP address. Fortunately, we can simulate this on Linux by creating aliases of the existing loopback interface (i.e., "lo").
Therefore, as the root user, we'll issue the commands below to create the aliases lo:2 and lo:3, pointing to 127.0.0.2 and 127.0.0.3 respectively:
# ifconfig lo:2 127.0.0.2 up
# ifconfig lo:3 127.0.0.3 up
You can check the outcome of these commands by invoking ifconfig with no arguments:
$ ifconfig
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:30848 errors:0 dropped:0 overruns:0 frame:0
          TX packets:30848 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2946793 (2.9 MB)  TX bytes:2946793 (2.9 MB)

lo:2      Link encap:Local Loopback
          inet addr:127.0.0.2  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1

lo:3      Link encap:Local Loopback
          inet addr:127.0.0.3  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
That way, node1's IP is 127.0.0.1, node2's 127.0.0.2, and so forth.
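Note that aliases created with ifconfig do not survive a reboot. If you plan to keep this test cluster around, one simple (distro-dependent) option is to re-create them at boot time, for instance from /etc/rc.local:

# /etc/rc.local (excerpt) - re-create the loopback aliases at boot
ifconfig lo:2 127.0.0.2 up
ifconfig lo:3 127.0.0.3 up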
Registering node hostnames locally
Instead of using IP addresses in the subsequent steps, we'll assign hostnames to them and use those names when referencing a cluster node.
As we have no DNS in this configuration, we need to edit the /etc/hosts file and add those IP mappings there. Change the file according to the example below:
/etc/hosts:

127.0.0.1    localhost node1
127.0.0.2    node2
127.0.0.3    node3
From now on, we'll refer to the nodes simply as node1, node2, and node3. To test the hostnames, try pinging the nodes as exemplified:
$ ping node2
PING node2 (127.0.0.2) 56(84) bytes of data.
64 bytes from node2 (127.0.0.2): icmp_seq=1 ttl=64 time=0.018 ms
64 bytes from node2 (127.0.0.2): icmp_seq=2 ttl=64 time=0.015 ms
^C
--- node2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.015/0.016/0.018/0.004 ms
Creating the first node structure
We'll first create the directory structure for node1 and then replicate it to the remaining nodes. Switch to Cassandra's root directory, create the subdirectories, and copy the files by issuing the following shell commands as your regular Linux user:
$ cd /opt/apache-cassandra-0.6.5/
$ mkdir -p nodes/node1
$ cp -R bin/ conf/ nodes/node1/
$ cd nodes/node1
The next step is to modify some default settings in the configuration files. Here I opted to use relative paths instead of absolute paths for simplification purposes.
Edit a single line in the conf/log4j.properties file:
# Edit the next line to point to your logs directory
log4j.appender.R.File=./log/system.log
Now in conf/storage-conf.xml several tag values must be changed. Be sure to modify the correct tags, as indicated below:
<Keyspaces>
  <Keyspace Name="Keyspace1">
    ...
    <ReplicationFactor>2</ReplicationFactor>
    ...
  </Keyspace>
</Keyspaces>
...
<CommitLogDirectory>./txs</CommitLogDirectory>
<DataFileDirectories>
    <DataFileDirectory>./data</DataFileDirectory>
</DataFileDirectories>
...
<ListenAddress>node1</ListenAddress>
<StoragePort>7000</StoragePort>
...
<ThriftAddress></ThriftAddress>
<ThriftPort>9160</ThriftPort>
<ThriftFramedTransport>false</ThriftFramedTransport>
...
As we are building a special directory arrangement, we'll need to adjust some paths in the scripts. Since each node lives two levels below the installation root, the classpath must point back to the shared lib directory there. Thus, edit the file bin/cassandra.in.sh so the lines below read as follows:
for jar in $cassandra_home/../../lib/*.jar; do
    CLASSPATH=$CLASSPATH:$jar
done
The first node is finally done. So, let's turn our attention to the other two nodes.
Creating the remaining nodes
We'll take advantage of the first node's directory structure to create the two remaining nodes. Issue the commands below to perform the cloning:
$ cd /opt/apache-cassandra-0.6.5/nodes/
$ mkdir node2 node3
$ cp -R node1/* node2
$ cp -R node1/* node3
When you're done, run the tree command (if available) as illustrated below to check the whole structure:
$ tree -L 2
.
|-- node1
|   |-- bin
|   `-- conf
|-- node2
|   |-- bin
|   `-- conf
`-- node3
    |-- bin
    `-- conf
If you compare this structure with the one at the beginning of this article, you'll notice that some subdirectories are missing (i.e., log, data, txs). That's because they are created automatically when the server starts up for the first time.
Final settings on nodes
The first node is deemed a contact point, i.e., a seed. The other nodes start the Gossip-based protocol by connecting to it.
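The default storage-conf.xml shipped with 0.6.x should already list 127.0.0.1 (our node1) as the only seed, so no change is expected here; still, a quick grep confirms it:

$ cd /opt/apache-cassandra-0.6.5/nodes/
$ grep '<Seed>' node*/conf/storage-conf.xml
# each node should list 127.0.0.1 (i.e. node1) as its seed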
Besides that, we must apply some node-specific settings. First, each node must listen on its own hostname (i.e., node1, node2, node3). Thus, edit nodeX/conf/storage-conf.xml for each node, replacing the line below accordingly (this example applies to node2):
<ListenAddress>node2</ListenAddress>
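If you prefer to apply this change from the command line instead of an editor, something along these lines should do it, assuming the cloned files still contain node1 as the listen address:

$ cd /opt/apache-cassandra-0.6.5/nodes/
$ sed -i 's|<ListenAddress>node1</ListenAddress>|<ListenAddress>node2</ListenAddress>|' node2/conf/storage-conf.xml
$ sed -i 's|<ListenAddress>node1</ListenAddress>|<ListenAddress>node3</ListenAddress>|' node3/conf/storage-conf.xml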
Monitoring in Cassandra is provided through JMX, which by default listens on TCP port 8080 on every host. As we are building the cluster on a single real instance, that is no longer possible: the JMX interfaces will all be bound to the same host, so we must change the port on each node. Therefore, node1 will listen on 8081, node2 on 8082, and node3 on 8083.
This parameter is configured in nodeX/bin/cassandra.in.sh, so we only need to change the port, as exemplified below (applying to node2):
# Arguments to pass to the JVM
JVM_OPTS=" \
        -ea \
        -Xms1G \
        -Xmx1G \
        -XX:+UseParNewGC \
        -XX:+UseConcMarkSweepGC \
        -XX:+CMSParallelRemarkEnabled \
        -XX:SurvivorRatio=8 \
        -XX:MaxTenuringThreshold=1 \
        -XX:+HeapDumpOnOutOfMemoryError \
        -Dcom.sun.management.jmxremote.port=8082 \
        -Dcom.sun.management.jmxremote.ssl=false \
        -Dcom.sun.management.jmxremote.authenticate=false"
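Again, this edit can be scripted. A sketch, assuming every clone still carries the default port 8080 and you haven't edited the files by hand yet:

$ cd /opt/apache-cassandra-0.6.5/nodes/
$ sed -i 's/jmxremote.port=8080/jmxremote.port=8081/' node1/bin/cassandra.in.sh
$ sed -i 's/jmxremote.port=8080/jmxremote.port=8082/' node2/bin/cassandra.in.sh
$ sed -i 's/jmxremote.port=8080/jmxremote.port=8083/' node3/bin/cassandra.in.sh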
Starting up every server
For this step it is very convenient to open a new Linux terminal for each node. Then issue the commands below, one per terminal:
$ node1/bin/cassandra -f
$ node2/bin/cassandra -f
$ node3/bin/cassandra -f
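One detail worth stressing: since storage-conf.xml uses relative paths (./data, ./txs, ./log), those directories are resolved against the shell's current working directory when the server starts. To be sure each node's files end up inside its own tree, one safe habit is to cd into the node directory first, for example:

$ cd /opt/apache-cassandra-0.6.5/nodes/node1
$ bin/cassandra -f    # repeat the same pattern for node2 and node3 in their own terminals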
Pay attention to the console output of each node; any misconfiguration will be reported there. Also note the relationship between the nodes: they should discover one another automatically within seconds.
Checking service availability
Now that every node in the cluster is up and running, their respective TCP ports should show up in the output of a netstat command.
To check Cassandra's listening TCP ports, look for 9160 (Thrift service), 7000 (internal storage), and 808X (JMX interface). Take a look at the example below of a successful cluster startup.
$ netstat -lptn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:631           0.0.0.0:*               LISTEN      -
tcp6       0      0 127.0.0.3:9160          :::*                    LISTEN      8520/java
tcp6       0      0 127.0.0.2:9160          :::*                    LISTEN      8424/java
tcp6       0      0 127.0.0.1:9160          :::*                    LISTEN      8336/java
tcp6       0      0 :::46954                :::*                    LISTEN      8424/java
tcp6       0      0 :::53418                :::*                    LISTEN      8336/java
tcp6       0      0 :::49035                :::*                    LISTEN      8520/java
tcp6       0      0 :::80                   :::*                    LISTEN      -
tcp6       0      0 :::42737                :::*                    LISTEN      8520/java
tcp6       0      0 :::8081                 :::*                    LISTEN      8336/java
tcp6       0      0 :::8082                 :::*                    LISTEN      8424/java
tcp6       0      0 :::8083                 :::*                    LISTEN      8520/java
tcp6       0      0 :::60310                :::*                    LISTEN      8424/java
tcp6       0      0 :::46167                :::*                    LISTEN      8336/java
tcp6       0      0 ::1:631                 :::*                    LISTEN      -
tcp6       0      0 127.0.0.3:7000          :::*                    LISTEN      8520/java
tcp6       0      0 127.0.0.2:7000          :::*                    LISTEN      8424/java
tcp6       0      0 127.0.0.1:7000          :::*                    LISTEN      8336/java
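On a busy machine the full listing can be noisy; if you only care about Cassandra's ports, a filtered call such as the following may help:

$ netstat -lptn | egrep ':(7000|9160|808[1-3]) '    # Thrift, storage and JMX ports only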
The last test was from a network perspective. You might also want to ask Cassandra itself whether these nodes are talking to each other and participating in a healthy partitioning scheme. For that, another check is to invoke nodetool's ring command.
$ cd /opt/apache-cassandra-0.6.5/
$ ./bin/nodetool -h localhost -p 8081 ring
Address       Status     Load          Range                                       Ring
                                       142865723918937898194528652808268231850
127.0.0.1     Up         3,1 KB        39461784941927371686416024510057184051     |<--|
127.0.0.3     Up         3,1 KB        54264004217607518447601711663387808864     |   |
127.0.0.2     Up         2,68 KB       142865723918937898194528652808268231850    |-->|
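The same ring view can be requested from any of the other nodes, just by pointing nodetool at their JMX ports; on a healthy cluster the answers should agree:

$ ./bin/nodetool -h localhost -p 8082 ring    # ask node2
$ ./bin/nodetool -h localhost -p 8083 ring    # ask node3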
Here it is, the local cluster is up and running with three nodes! :D
You can also check Cassandra's availability by using its simple client application:
$ ./bin/cassandra-cli --host node1 --port 9160
Connected to: "Test Cluster" on node1/9160
Welcome to cassandra CLI.

cassandra> set Keyspace1.Standard1['rowkey']['column'] = 'value'
Value inserted.
cassandra> get Keyspace1.Standard1['rowkey']['column']
=> (column=636f6c756d6e, value=value, timestamp=1285273581745000)
cassandra> get Keyspace1.Standard1['rowkey']
=> (column=636f6c756d6e, value=value, timestamp=1285273581745000)
Returned 1 results.
cassandra> del Keyspace1.Standard1['rowkey']
row removed.
As we set the replication factor to 2 for the keyspace Keyspace1, each value saved in the Cassandra cluster is expected to be replicated to two nodes.
Do you really believe it is happening? If not, save another value (using set) and then kill one of the three node processes. Do not kill more than one! Then, retrieve that key again (using get). The system should be capable of automatically recovering from this intentional disaster.
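As a rough sketch of such a test (the row key and value below are just illustrative), it could go like this:

$ ./bin/cassandra-cli --host node1 --port 9160
cassandra> set Keyspace1.Standard1['failover']['col'] = 'still-here'

# in another terminal: find and kill exactly one node, e.g. node3
$ ps aux | grep cassandra      # or reuse the PIDs from the netstat output above
$ kill <pid-of-node3>          # <pid-of-node3> is whatever PID you identified

# back in the CLI, the value should still be readable from a surviving replica
cassandra> get Keyspace1.Standard1['failover']['col']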
You can easily make a node participate in the cluster again just by executing its respective bin/cassandra command. Take a look at all the console outputs after every step; they are very interesting!
References
[1] Apache Cassandra, http://cassandra.apache.org/