Monday, 18 August 2014

Comparison between caching systems for Java


Servers are getting more and more powerful, with a lot of RAM (from hundreds up to thousands of gigabytes). However, it is still not possible to use most of that capacity directly in Java applications because of inherent limitations of the JVM's garbage collector (GC), which may pause the application for a long time (even up to several minutes) while moving objects between generations.
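To make the off-heap idea concrete, here is a minimal sketch using only the JDK. The class name OffHeapStore and its append-only layout are assumptions made for illustration and are not taken from any of the libraries discussed in this post. Values are copied into a direct ByteBuffer, which is allocated outside the Java heap, so only the small index of offsets remains visible to the garbage collector.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical append-only store: values live off-heap, only offsets stay on the heap. */
class OffHeapStore {
    // Direct buffers are allocated outside the Java heap.
    private final ByteBuffer buffer;
    // On-heap index: key -> {offset, length} of the value inside the buffer.
    private final Map<String, int[]> index = new HashMap<>();

    OffHeapStore(int capacityBytes) {
        this.buffer = ByteBuffer.allocateDirect(capacityBytes);
    }

    void put(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > buffer.remaining()) {
            throw new IllegalStateException("off-heap buffer is full");
        }
        int offset = buffer.position();
        buffer.put(bytes); // append the value at the current write position
        // Overwriting a key leaves the old bytes unused: the sketch never reclaims space.
        index.put(key, new int[] {offset, bytes.length});
    }

    String get(String key) {
        int[] location = index.get(key);
        if (location == null) {
            return null;
        }
        byte[] bytes = new byte[location[1]];
        // duplicate() shares the content but has its own position, so reads
        // do not disturb the write position of the main buffer.
        ByteBuffer view = buffer.duplicate();
        view.position(location[0]);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

A real off-heap cache would also need to reclaim space, handle concurrency and serialize arbitrary objects, which is a large part of what the projects listed below take care of.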

The following is a description and comparison of some solutions, also called data grids, that can be used to address this problem, such as the Infinispan project from JBoss (formerly JBoss Cache), DirectMemory (an Apache proposal), EhCache (from Terracotta), etc.

Caches

1. Infinispan (JBoss Data Grid Platform)
  • Does not provide support for expiration events, as discussed in the forum.
  • SingleFileCacheStore: a cache loader backed by a single file store that manages data activation (loading from the store into the cache) and passivation (saving data to the store).
  • List of possible attributes in the XML configuration for Infinispan 4.0 and Infinispan 6.0.
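
The activation and passivation cycle mentioned above can be sketched in plain Java. This is not Infinispan's API: the class PassivatingCache, its method names and the use of a HashMap as a stand-in for the file store are all assumptions made for illustration. Entries that no longer fit in memory are moved to the store (passivation) and moved back on the next read (activation).

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical cache that passivates least-recently-used entries to a backing store. */
class PassivatingCache<K, V> {
    private final int maxInMemory;
    // Stand-in for a file store; a real implementation would write to disk.
    private final Map<K, V> store = new HashMap<>();
    // accessOrder = true keeps the least recently used entry first.
    private final LinkedHashMap<K, V> memory = new LinkedHashMap<>(16, 0.75f, true);

    PassivatingCache(int maxInMemory) {
        this.maxInMemory = maxInMemory;
    }

    void put(K key, V value) {
        store.remove(key); // the in-memory copy becomes the only copy
        memory.put(key, value);
        passivateIfNeeded();
    }

    V get(K key) {
        V value = memory.get(key);
        if (value == null && store.containsKey(key)) {
            value = store.remove(key); // activation: store -> memory
            memory.put(key, value);
            passivateIfNeeded();
        }
        return value;
    }

    /** Passivation: move the least recently used entries from memory to the store. */
    private void passivateIfNeeded() {
        Iterator<Map.Entry<K, V>> eldest = memory.entrySet().iterator();
        while (memory.size() > maxInMemory && eldest.hasNext()) {
            Map.Entry<K, V> entry = eldest.next();
            store.put(entry.getKey(), entry.getValue());
            eldest.remove();
        }
    }

    boolean isInMemory(K key) { return memory.containsKey(key); }

    boolean isPassivated(K key) { return store.containsKey(key); }
}
```

With a limit of two in-memory entries, inserting a third key passivates the least recently used one, and reading it back activates it again at the expense of another entry.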

2. MapDB
  • Available only in embedded mode.
  • Enables the creation of on-heap and off-heap collections (map, queue), as well as file-backed collections.
  • Listeners registered for cache events are notified in the main thread (i.e. asynchronous notification has to be implemented by the listener itself).
  • Can be used for lazy loading (e.g. Lazily_Loaded_Records.java).
  • Provides a means of pumping the entire in-memory data set to disk (e.g. Pump_InMemory_Import_Then_Save_To_Disk.java).
  • The transaction isolation level is Serializable, which is the highest level and means a new transaction can be initiated only once the previous one has been committed.
  • Transactions use a global lock, which considerably reduces cache performance.
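
Since listeners are called in the main thread, one way to keep slow callbacks from blocking writers is to hand them over to an executor. The sketch below uses only the JDK. The class AsyncListenerDispatcher and the BiConsumer-based listener shape are assumptions for illustration and do not reflect MapDB's actual listener interface.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

/** Hypothetical helper that moves listener callbacks off the writer thread. */
class AsyncListenerDispatcher<K, V> {
    private final List<BiConsumer<K, V>> listeners = new CopyOnWriteArrayList<>();
    // A single worker thread preserves the order of events.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    void addListener(BiConsumer<K, V> listener) {
        listeners.add(listener);
    }

    /** Called from the writer thread; only queues the callbacks and returns. */
    void onUpdate(K key, V value) {
        for (BiConsumer<K, V> listener : listeners) {
            executor.execute(() -> listener.accept(key, value));
        }
    }

    /** Stops accepting events and waits for the queued callbacks to finish. */
    boolean shutdown(long timeoutMillis) {
        executor.shutdown();
        try {
            return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The synchronous listener registered with the cache would then simply call onUpdate, so the writer thread pays only for queueing the event.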

3. Akiban's Persistit - GitHub
4. JCS (Java Caching System)
5. Hazelcast
6. GridGain



7. Others: LArray; Cache2K; DirectMemory, an off-heap memory store (initial project on GitHub, Apache proposal for incubation); MVStore, the storage subsystem of the H2 database; Spring cache; HugeCollections.

Resources
  • A good explanation of the use of ByteBuffer to build non-heap memory caches by Keith Gregory: blog post, JUG presentation, another one.
  • An article on InfoQ about a HashMap implementation for off-heap maps.
  • An IBM Redbook on capacity for big data and off-heap memory.
  • Examples related to the use of EhCache from a Devoxx 2014 presentation.
Benchmarks
  • Cache2K vs Infinispan/EhCache/JCS - bench
  • RadarGun, a framework for benchmarking data grids
Memory storage

In-memory databases (a detailed description can be found at Information Week):
  • NoSQL approaches (covers the class of nonrelational and horizontally scalable databases) like Aerospike.
  • NewSQL approaches (emerging databases offering NoSQL scalability but with familiar SQL query capabilities, i.e. SQL-compliant) like VoltDB, Oracle TimesTen, IBM solidDB, MemSQL.
Companies like Microsoft, Oracle and IBM chose to add in-memory support to their traditional databases (e.g. moving tables to memory), whereas SAP adopted another approach with its HANA platform, which aims to put everything in memory.


Some traditional RDBMSs, such as SQLite and MySQL, can be configured to store their data in memory instead of on disk.