Complex Queries Optimization And Evaluation Over Relational And NoSQL Data Stores In Cloud Environments

The production of huge amounts of data and the emergence of cloud computing have introduced new requirements for data management. Many applications need to interact with several heterogeneous data stores, depending on the type of data they have to manage: relational and NoSQL (i.e., document, graph, key-value, and column) data stores. Interacting with heterogeneous data models through different APIs and query languages imposes challenging tasks on the developers of multi-data-store applications. Indeed, the execution of complex queries over heterogeneous data models cannot currently be achieved in a declarative way, as is possible with single-data-store applications, and therefore requires extra implementation effort. In this paper, we propose a mediation-based component, referred to as a virtual data store, to optimally execute complex queries over multiple integrated data stores in cloud environments. The key ingredients of our solution are (1) a catalog containing information about the integrated data stores and their capabilities, (2) a cost model to evaluate the cost of operations, (3) parallelism across sources (inter-source parallelism) to execute operations, and (4) a dynamic-programming-based approach to generate an optimal execution plan. Quantitative and qualitative experiments are conducted to validate our approach.
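
To make the role of the catalog, the cost model, and the dynamic programming step more concrete, the following minimal Python sketch enumerates join orders over three data stores and keeps the cheapest plan for every subset of stores. It is an illustration under stated assumptions, not the optimizer described in this paper: the store names, row counts, selectivities, and the cost function (the sum of estimated intermediate result sizes) are all hypothetical, and inter-source parallelism and store capabilities are not modeled.

```python
from itertools import combinations

# Hypothetical catalog: store name -> estimated row count (illustrative values).
CATALOG = {"orders_sql": 1000, "customers_doc": 100, "reviews_kv": 10}

# Hypothetical join selectivities between pairs of stores.
SELECTIVITY = {
    frozenset({"orders_sql", "customers_doc"}): 0.01,
    frozenset({"orders_sql", "reviews_kv"}): 0.1,
    frozenset({"customers_doc", "reviews_kv"}): 0.5,
}


def cardinality(stores):
    """Estimate the row count of joining a set of stores (assumed cost model)."""
    rows = 1.0
    for store in stores:
        rows *= CATALOG[store]
    for a, b in combinations(sorted(stores), 2):
        rows *= SELECTIVITY.get(frozenset({a, b}), 1.0)
    return rows


def best_plan(stores):
    """Dynamic programming over subsets of stores.

    Returns (cost, plan), where plan is a store name or a nested pair of plans
    and cost is the sum of the estimated sizes of all intermediate results.
    """
    stores = sorted(stores)
    # Base case: reading a single store has no join cost in this toy model.
    best = {frozenset({s}): (0.0, s) for s in stores}
    for size in range(2, len(stores) + 1):
        for subset in combinations(stores, size):
            key = frozenset(subset)
            candidates = []
            # Try every split of the subset into two non-empty parts.
            for k in range(1, size // 2 + 1):
                for left in combinations(subset, k):
                    left_key = frozenset(left)
                    right_key = key - left_key
                    cost = (best[left_key][0] + best[right_key][0]
                            + cardinality(key))
                    candidates.append(
                        (cost, (best[left_key][1], best[right_key][1])))
            best[key] = min(candidates, key=lambda c: c[0])
    return best[frozenset(stores)]


if __name__ == "__main__":
    cost, plan = best_plan(CATALOG)
    print(cost, plan)
```

With these made-up statistics, the sketch prefers to join the two small stores first and to bring in the large relational store last, because that order minimizes the estimated intermediate result sizes. A realistic cost model would also have to account for data transfer between stores and for the operations each store can execute natively.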