2013-11-07

Evil Cassandra Unavailable Exception and a few Solutions

This is a nasty situation when learning Cassandra: you have a Cassandra cluster up, you run your clients (server or command line), and nothing seems to work. An example from cassandra-cli:

[default@blub] list user;
Using default limit of 100
Using default column limit of 100
null
UnavailableException()
   at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassa....

Basically, the reason is that the Cassandra cluster cannot satisfy your query's consistency level: too few replicas of the requested data are alive. Here are a couple of fixes, depending on your situation:

  1. If your configured replication factor is 1 or your cluster has too few nodes: Use a lower consistency level. To test in cassandra-cli:
    [default@blub] consistencylevel as one;
    Consistency level is set to 'ONE'.

  2. If you started with a single node and recently added another, or you had to replace a node, or you changed the replication factor: Run repair on the node(s):
    apache-cassandra-1.2.5$ ./bin/nodetool -h localhost repair -l
    [2013-11-07 11:09:04,369] Starting repair command #1, repairing 1 ranges for keyspace blub
    [2013-11-07 11:09:04,376] Repair command #1 finished
    [2013-11-07 11:09:04,387] Nothing to repair for keyspace 'system'
    [2013-11-07 11:09:04,393] Starting repair command #2, repairing 1 ranges for keyspace system_auth
    [2013-11-07 11:09:04,394] Repair command #2 finished
    [2013-11-07 11:09:04,402] Nothing to repair for keyspace 'system_traces'

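  3. If you run with org.apache.cassandra.locator.NetworkTopologyStrategy and multiple data centers: Be careful with consistency level QUORUM. In my experience it behaves as if it required quorums in all data centers, so with a data center whose replication is set to 1 I always got UnavailableException. (By definition, QUORUM needs a majority of all replicas summed over the data centers, and EACH_QUORUM is the level that demands a quorum in every data center.) The keyspace in question: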
    CREATE KEYSPACE vcodeks with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
        and strategy_options = {dc1:2, dc2:1};

    In this case it is better to use consistency level TWO. If you only care about one data center, you might be happy with LOCAL_QUORUM, provided you configure the local DC on all your clients. But be aware that in the example above, LOCAL_QUORUM clients with their DC set to dc1 become unavailable as soon as a single node in dc1 fails, even though you have 3 replicas overall.
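
To sanity-check the replica arithmetic behind these consistency levels, here is a rough Python sketch. It assumes the definitions from the Cassandra documentation: a quorum is floor(RF / 2) + 1, QUORUM counts replicas across all data centers, LOCAL_QUORUM counts only the client's data center, and EACH_QUORUM needs a quorum in every data center. It is a simplified model that only counts live replicas for one row, not what the server actually executes, and the names quorum and is_available are made up for illustration.

```python
# Simplified model: counts live replicas for a single row and ignores
# token ranges, hinted handoff and timeouts. All names are hypothetical.

def quorum(replication_factor):
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def is_available(level, rf_by_dc, live_by_dc, local_dc=None):
    """Return True if `level` can be satisfied by the live replicas."""
    total_rf = sum(rf_by_dc.values())
    total_live = sum(live_by_dc.values())
    if level == "ONE":
        return total_live >= 1
    if level == "TWO":
        return total_live >= 2
    if level == "QUORUM":
        # Majority of all replicas, regardless of data center.
        return total_live >= quorum(total_rf)
    if level == "LOCAL_QUORUM":
        # Majority of the replicas in the client's own data center.
        return live_by_dc[local_dc] >= quorum(rf_by_dc[local_dc])
    if level == "EACH_QUORUM":
        # Majority of the replicas in every data center.
        return all(live_by_dc[dc] >= quorum(rf) for dc, rf in rf_by_dc.items())
    if level == "ALL":
        return total_live >= total_rf
    raise ValueError("unknown consistency level: " + level)


rf = {"dc1": 2, "dc2": 1}            # strategy_options of the example keyspace
one_dc1_down = {"dc1": 1, "dc2": 1}  # assumed scenario: one dc1 replica failed

for level in ("ONE", "TWO", "QUORUM", "LOCAL_QUORUM", "EACH_QUORUM", "ALL"):
    print(level, is_available(level, rf, one_dc1_down, local_dc="dc1"))
```

With one replica in dc1 down, this model says ONE and TWO can still be served, while LOCAL_QUORUM from dc1 cannot, because a quorum of 2 replicas is 2. It also reports plain QUORUM as satisfiable (2 of 3 replicas), so an UnavailableException with QUORUM in such a setup is not explained by the arithmetic alone.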