Thursday, September 2, 2010

Simplifying Cassandra interactions with Helena



Introduction


In the previous article, "Talking to Cassandra using Java and DAO pattern" [1], we introduced NoSQL databases and, in particular, one called Apache Cassandra [2]. After a quick explanation of its data model, we analyzed what it takes to write Java code that reads and modifies key-value pairs stored in a Cassandra keyspace. At that time we used a low-level API named Thrift.

You might have noticed that a large amount of code was needed to build a complete yet simple persistence class, namely a structure based on the Data Access Object (DAO) design pattern. Compared to the most widely used persistence technologies and frameworks in Java today (e.g., JPA and Hibernate), that amount of code would probably drive us crazy!

Is this the price we pay for high availability and scalability when moving from a traditional relational database model to a new key-value data store? Poor developers... Fortunately, that's not the end of the story! Several high-level clients for Cassandra are already available in multiple programming languages (see [3]).

One of these clients, HelenaORM [4] (created by Marcus Thiesen), is the subject of this article. Since this library targets Java, our aim is to implement persistence for a simple object using Helena.


Creating the entity


As in the previous article, we'll use the "group" entity in our examples. This time, in addition to the usual plain old Java class elements, the Group class needs the @HelenaBean and @KeyProperty annotations.

The @HelenaBean annotation specifies the keyspace and column family for the class, whereas @KeyProperty indicates which property should be used as the row key in Cassandra. Currently, @KeyProperty only works on getter or setter methods, not on the field itself. The two annotations are analogous to JPA's @Entity and @Id annotations.
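Annotation-driven mapping of this kind is typically read back at runtime through Java reflection. As a minimal sketch that assumes nothing about HelenaORM's internals, the class below declares two hypothetical look-alike annotations, Bean and Key (not Helena's real types), and shows how a library could discover the keyspace, the column family, and the key getter of an annotated class:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

public class AnnotationMappingSketch {

    // Hypothetical stand-in for @HelenaBean, retained at runtime so reflection can see it.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface Bean {
        String keyspace();
        String columnFamily();
    }

    // Hypothetical stand-in for @KeyProperty; @Target(METHOD) makes the compiler reject it on fields.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface Key {
    }

    @Bean(keyspace = "ContactList", columnFamily = "Groups")
    public static class Group {
        private Integer id;
        private String name;

        @Key
        public Integer getId() { return id; }
        public String getName() { return name; }
        public void setId(Integer id) { this.id = id; }
        public void setName(String name) { this.name = name; }
    }

    // Reads "keyspace.columnFamily" from the class-level annotation.
    public static String describeTarget(Class<?> type) {
        Bean bean = type.getAnnotation(Bean.class);
        if (bean == null) {
            throw new IllegalArgumentException(type.getName() + " has no @Bean annotation");
        }
        return bean.keyspace() + "." + bean.columnFamily();
    }

    // Returns the first public method carrying @Key.
    public static Method findKeyMethod(Class<?> type) {
        for (Method m : type.getMethods()) {
            if (m.isAnnotationPresent(Key.class)) {
                return m;
            }
        }
        throw new IllegalArgumentException(type.getName() + " has no @Key method");
    }

    public static void main(String[] args) {
        System.out.println(describeTarget(Group.class));          // ContactList.Groups
        System.out.println(findKeyMethod(Group.class).getName()); // getId
    }
}
```

Restricting the key annotation with @Target(ElementType.METHOD) is one way a library could enforce the getter/setter-only rule at compile time.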

Take a look at the Group entity class, ready to be used by HelenaORM:


// Maps this class to the "Groups" column family of the "ContactList" keyspace
@HelenaBean(keyspace="ContactList", columnFamily="Groups")
public class Group {

    private Integer id;
    private String name;

    public Group() {
    }

    public Group(Integer id, String name) {
        this.id = id;
        this.name = name;
    }

    // The row key annotation goes on the getter, not on the field
    @KeyProperty
    public Integer getId() {
        return id;
    }

    public String getName() {
        return name;
    }

    public void setId(Integer id) {
        this.id = id;
    }

    public void setName(String name) {
        this.name = name;
    }

    @Override
    public String toString() {
        return "Group [id=" + id + ", name=" + name + "]";
    }

}
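The CLI session in "Checking the results" shows a stored group with a single column, name, under row key '12305', which suggests that the key property becomes the row key and the remaining properties become columns. The sketch below illustrates that mapping with the standard java.beans.Introspector. The helper names rowKey() and toColumns() and the UTF-8 string encoding of values are assumptions for illustration, not HelenaORM's actual serialization code:

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

public class RowMappingSketch {

    // Same shape as the article's Group (annotations left out to stay self-contained).
    public static class Group {
        private final Integer id;
        private final String name;

        public Group(Integer id, String name) {
            this.id = id;
            this.name = name;
        }

        public Integer getId() { return id; }
        public String getName() { return name; }
    }

    // JavaBean properties of the bean's class, stopping at Object so getClass() is skipped.
    private static PropertyDescriptor[] properties(Object bean) {
        try {
            return Introspector.getBeanInfo(bean.getClass(), Object.class).getPropertyDescriptors();
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Invokes a property's getter, wrapping reflection failures in an unchecked exception.
    private static Object read(PropertyDescriptor pd, Object bean) {
        try {
            return pd.getReadMethod().invoke(bean);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: the row key is the key property's value as a string.
    public static String rowKey(Object bean, String keyProperty) {
        for (PropertyDescriptor pd : properties(bean)) {
            if (pd.getName().equals(keyProperty) && pd.getReadMethod() != null) {
                return String.valueOf(read(pd, bean));
            }
        }
        throw new IllegalArgumentException("No readable property: " + keyProperty);
    }

    // Hypothetical helper: every other readable, non-null property becomes a UTF-8 column.
    public static Map<String, byte[]> toColumns(Object bean, String keyProperty) {
        Map<String, byte[]> columns = new TreeMap<>();
        for (PropertyDescriptor pd : properties(bean)) {
            if (pd.getReadMethod() == null || pd.getName().equals(keyProperty)) {
                continue;
            }
            Object value = read(pd, bean);
            if (value != null) {
                columns.put(pd.getName(), value.toString().getBytes(StandardCharsets.UTF_8));
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        Group group = new Group(12305, "Test Group 5");
        System.out.println("row key: " + rowKey(group, "id")); // row key: 12305
        for (Map.Entry<String, byte[]> column : toColumns(group, "id").entrySet()) {
            // prints: name = Test Group 5
            System.out.println(column.getKey() + " = "
                    + new String(column.getValue(), StandardCharsets.UTF_8));
        }
    }
}
```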



Creating the tests


Now comes the most interesting part: with Helena, there is no need to write the traditional DAO interface and class!

Helena provides a factory, HelenaORMDAOFactory, that creates ready-to-use, type-safe DAO objects for a given entity class. The objects it produces, of type HelenaDAO, provide the methods insert(), delete(), and get(): out-of-the-box implementations for, respectively, inserting (or updating), removing, and retrieving entire object instances in Cassandra. The tests below also use getRange() to fetch several objects at once.
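The idea behind makeDaoForClass() is a factory method that takes a Class<T> token and returns a DAO typed to T, so client code never needs casts. The following in-memory sketch shows that pattern only. The Dao interface, InMemoryDaoFactory, and the explicit key-extractor function are hypothetical (Helena derives the key from @KeyProperty instead), and nothing here talks to Cassandra:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class DaoFactorySketch {

    // Hypothetical DAO contract with the three operations the article lists.
    public interface Dao<T> {
        void insert(T object);  // inserts, or overwrites an object with the same key
        void delete(T object);
        T get(String key);      // returns null when no object has this key
    }

    // Hypothetical factory keeping one in-memory "column family" per entity class.
    public static class InMemoryDaoFactory {
        private final Map<Class<?>, Map<String, Object>> stores = new HashMap<>();

        public <T> Dao<T> makeDaoForClass(Class<T> type, Function<T, String> keyOf) {
            Map<String, Object> store = stores.computeIfAbsent(type, t -> new HashMap<>());
            return new Dao<T>() {
                @Override
                public void insert(T object) {
                    store.put(keyOf.apply(object), object);
                }

                @Override
                public void delete(T object) {
                    store.remove(keyOf.apply(object));
                }

                @Override
                public T get(String key) {
                    // Class.cast keeps the result typed as T without an unchecked cast.
                    return type.cast(store.get(key));
                }
            };
        }
    }

    public static class Group {
        public final Integer id;
        public final String name;

        public Group(Integer id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    public static void main(String[] args) {
        InMemoryDaoFactory factory = new InMemoryDaoFactory();
        Dao<Group> dao = factory.makeDaoForClass(Group.class, g -> g.id.toString());

        dao.insert(new Group(123, "Test Group"));
        System.out.println(dao.get("123").name); // Test Group

        dao.delete(new Group(123, "Test Group"));
        System.out.println(dao.get("123"));      // null
    }
}
```

Because the Class<T> token fixes T at the call site, a Dao<Group> can only accept and return Group objects, which is the "type-safe" property the factory offers.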

So here is the corresponding JUnit test class, which inserts, deletes, and retrieves Java object instances in Cassandra with the support of Helena:


import java.util.List;

import org.junit.After;
import org.junit.AfterClass;
import org.junit.Assert;
import org.junit.Before;
import org.junit.BeforeClass;
import org.junit.Test;

// Plus imports for HelenaORMDAOFactory, HelenaDAO and SerializeUnknownClasses from the HelenaORM jar.

public class GroupTest {

    static HelenaORMDAOFactory factory;
    private HelenaDAO<Group> dao;

    private static final Integer GROUP_ID = 123;
    private static final String GROUP_NAME = "Test Group";

    @BeforeClass
    public static void setUpBeforeClass() throws Exception {
        // Connect to a local Cassandra node on the default Thrift port (9160)
        factory = HelenaORMDAOFactory.withConfig(
                "localhost", 9160, SerializeUnknownClasses.YES);
    }

    @AfterClass
    public static void tearDownAfterClass() throws Exception {
        factory = null;
    }

    @Before
    public void setUp() throws Exception {
        // A type-safe DAO for Group, generated by the factory
        dao = factory.makeDaoForClass(Group.class);
    }

    @After
    public void tearDown() throws Exception {
        dao = null;
    }

    @Test
    public void testSave() {
        System.out.println("GroupTest.testSave()");

        Group group = new Group();
        group.setId(GROUP_ID);
        group.setName(GROUP_NAME);

        System.out.println("Saving group: " + group);
        dao.insert(group);

        // Rows are looked up by the string form of the key
        Group retrieved = dao.get(GROUP_ID.toString());
        System.out.println("Retrieved group: " + retrieved);
        Assert.assertNotNull(retrieved);
        Assert.assertEquals(GROUP_ID, retrieved.getId());
        Assert.assertEquals(GROUP_NAME, retrieved.getName());
    }

    @Test
    public void testRetrieve() {
        System.out.println("GroupTest.testRetrieve()");

        System.out.println("Saving groups");
        for (int i = 1; i <= 10; i++) {
            Group group = new Group();
            group.setId(GROUP_ID * 100 + i); // keys 12301 to 12310
            group.setName(GROUP_NAME + " " + i);
            dao.insert(group);
        }

        // Empty start/end keys presumably mean an unbounded range; at most 10 rows
        System.out.println("Retrieving groups");
        List<Group> list = dao.getRange("", "", 10);
        Assert.assertNotNull(list);
        Assert.assertFalse(list.isEmpty());
        Assert.assertTrue(list.size() >= 10);

        System.out.println("Retrieved list:");
        for (Group group : list) {
            System.out.println("- " + group);
        }
    }

    @Test
    public void testRemove() {
        System.out.println("GroupTest.testRemove()");

        Group group = new Group();
        group.setId(GROUP_ID);
        group.setName(GROUP_NAME);

        System.out.println("Saving group: " + group);
        dao.insert(group);

        System.out.println("Removing group: " + group);
        dao.delete(group);

        Group retrieved = dao.get(GROUP_ID.toString());
        System.out.println("Retrieved group: " + retrieved);
        Assert.assertNull(retrieved);
    }

}
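The call getRange("", "", 10) in testRetrieve() presumably means "from the first key to the last key, at most 10 rows". The sketch below models those semantics over a sorted map, assuming inclusive bounds and an order-preserving key layout. With Cassandra's RandomPartitioner, rows come back in token (hash) order instead, so which 10 groups you get is not predictable. RangeSketch.getRange is a hypothetical helper, not Helena's implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class RangeSketch {

    // Returns up to `count` values whose keys fall in [start, end], in key order.
    // An empty start means "from the first key"; an empty end means "up to the last key".
    // Assumes start <= end when both are given.
    public static <V> List<V> getRange(NavigableMap<String, V> rows, String start, String end, int count) {
        NavigableMap<String, V> view = rows;
        if (!start.isEmpty()) {
            view = view.tailMap(start, true);
        }
        if (!end.isEmpty()) {
            view = view.headMap(end, true);
        }
        List<V> result = new ArrayList<>();
        for (V value : view.values()) {
            if (result.size() == count) {
                break;
            }
            result.add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        // Row keys as testSave() and testRetrieve() would write them (as strings).
        TreeMap<String, String> rows = new TreeMap<>();
        rows.put("123", "Test Group");
        for (int i = 1; i <= 10; i++) {
            rows.put(String.valueOf(123 * 100 + i), "Test Group " + i);
        }

        // Lexicographic order puts "123" first and "12310" after "12309",
        // so the 10-row limit returns "Test Group" followed by groups 1 to 9.
        System.out.println(getRange(rows, "", "", 10));
        System.out.println(getRange(rows, "12303", "12305", 10)); // [Test Group 3, Test Group 4, Test Group 5]
    }
}
```

Even under key ordering, the 10-row window can mix the group saved by testSave() with those from testRetrieve(), so asserting on the list size rather than on specific ids keeps the test robust.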



Checking the results


After running the tests, check the keyspace contents in Cassandra. You can do that with Cassandra's command-line client by issuing the commands below (row key '12305' is the fifth group saved by testRetrieve(), since GROUP_ID * 100 + 5 = 12305):


cassandra> count ContactList.Groups['12305']
1 column

cassandra> get ContactList.Groups['12305']
=> (column=name, value=Test Group 5, timestamp=1283287882927)
Returned 1 result.

cassandra> get ContactList.Groups['12305']['name']
=> (column=name, value=Test Group 5, timestamp=1283287882927)
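Each column carries the timestamp supplied by the client on write, and Cassandra uses it to decide which value wins when the same column is written more than once. The 13-digit value above looks like milliseconds since the Unix epoch. Assuming that unit (the client chooses it; Cassandra does not enforce one), a minimal sketch decodes it with java.time; TimestampSketch.toInstant is a hypothetical helper:

```java
import java.time.Instant;

public class TimestampSketch {

    // Assumes the column timestamp is in milliseconds since the Unix epoch.
    public static Instant toInstant(long timestampMillis) {
        return Instant.ofEpochMilli(timestampMillis);
    }

    public static void main(String[] args) {
        System.out.println(toInstant(1283287882927L)); // 2010-08-31T20:51:22.927Z
    }
}
```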



Conclusions


Traditional relational databases took more than 20 years of evolution to reach their current state. Meanwhile, object-oriented programming languages and techniques won over companies and developers, which forced the creation of persistence frameworks to bridge both worlds. As Internet usage grew, RDBMSs could not scale efficiently, so a new paradigm was conceived (or reborn): the distributed key-value database model!

From a developer's perspective, Cassandra, one of those distributed databases, is not trivial to talk to. As an alternative to the low-level Thrift API, many high-level clients were created by individual developers (thanks to open source initiatives!), and HelenaORM is one of them.

In this article we saw how HelenaORM simplifies the Java code that handles persistence in Cassandra. That saved us a lot of work, didn't it? :D


References

[1] Talking to Cassandra using Java and DAO pattern
[2] Apache Cassandra
[3] Cassandra high level clients
[4] HelenaORM