Storage Balancing in P2P Based Distributed RDF Data Stores


Centralized RDF repositories have been designed to support RDF data storage and retrieval. However, they suffer from the traditional limitations of centralized approaches: limited scalability and fault tolerance. Peer-to-peer (P2P) networks can provide the scalability, fault tolerance, and robustness that existing Semantic Web applications need and that current local RDF storage solutions lack. A common strategy in state-of-the-art P2P RDF data stores is to store each triple at three locations, so that it can be found by a look-up on its subject, predicate, or object identifier. One major issue with this strategy is the lack of load balancing, since the terms occurring in triples are not uniformly distributed. Consequently, both the query-processing load and the storage load are unevenly distributed across the network. To solve this load-imbalance problem, we propose a new scheme that splits the data held by overloaded nodes: the excess data is distributed evenly across neighboring nodes, and a Prefix Hash Table provides fast access to it. We provide an empirical evaluation of our approach and compare it with other state-of-the-art storage-balancing systems, showing the feasibility of our approach.
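
The three-location indexing strategy and the skew it produces can be illustrated with a minimal sketch. It is not the scheme proposed in this paper: the node count `NUM_NODES`, the helper names `node_for`, `place_triple`, and `storage_load`, and the use of SHA-1 modulo the node count in place of a real DHT look-up are all assumptions made for illustration.

```python
import hashlib
from collections import Counter

NUM_NODES = 8  # hypothetical network size, chosen only for illustration


def node_for(term: str, num_nodes: int = NUM_NODES) -> int:
    """Map a term to a node id by hashing it (a stand-in for a DHT look-up)."""
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def place_triple(triple, num_nodes: int = NUM_NODES):
    """Return the node ids storing the triple: one per subject, predicate, object."""
    return [node_for(term, num_nodes) for term in triple]


def storage_load(triples, num_nodes: int = NUM_NODES) -> Counter:
    """Count how many triple copies each node ends up storing."""
    load = Counter()
    for triple in triples:
        for node in place_triple(triple, num_nodes):
            load[node] += 1
    return load


# 100 triples that share the same predicate and object, as is common
# for type assertions in RDF data sets.
triples = [(f"ex:person{i}", "rdf:type", "foaf:Person") for i in range(100)]
load = storage_load(triples)
```

In this sketch, the node responsible for `rdf:type` receives a copy of every triple, while the subject copies are spread over the nodes by the hash. This is the kind of storage imbalance that frequent terms cause.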

Workshop on Decentralizing the Semantic Web 2017, co-located with the 16th International Semantic Web Conference