PhD thesis
Order number: 05ISAL0055
Year: 2005

Thesis

Design and Implementation of Secure Mechanisms for Sharing Confidential Data; Application to the Management of Biomedical Data in a Grid Computing Environment

(Original French title: Conception et mise en oeuvre de mécanismes sécurisés d’échange de données confidentielles ; application à la gestion de données biomédicales dans le cadre d’architectures de grilles de calcul/données)

Submitted to the National Institute of Applied Sciences of Lyon (Institut National des Sciences Appliquées de Lyon) in fulfillment of the requirements for a doctoral degree

Doctoral school of Computer Science and Informatics (Informatique et Information pour la Société, EDIIS-EDA 335)
Affiliated area: Computer Science
Specialty: Multimedia Documents, Images and Communicating Information Systems (DISIC)

Prepared by Ludwig SEITZ

Defended on 11 July 2005 before the examination committee

Committee members
BERTINO Elisa, Professor, Reviewer
PUCHERAL Philippe, Professor, Reviewer
BRUNIE Lionel, Professor, Supervisor
PIERSON Jean-Marc, Associate Professor, Co-supervisor
MULMO Olle, Researcher, Examiner
ROCH Jean-Louis, Associate Professor, Examiner

Résumé

Grid computing has become one of the architectures of choice for applications that consume large volumes of data and require a great deal of computing power.
Grids make it possible to share multiple heterogeneous resources, such as computing power, storage space and data, through an architecture that lets these resources interoperate transparently for the user. Health-care networks are a recent application of Grids. The goal of such a network is, on the one hand, to let physicians use the computing power of Grids for their medical image analysis algorithms and, on the other hand, to allow transparent, multi-institutional sharing of distributed patient data. Unlike the first Grid applications (for example particle physics or terrestrial observation), medical applications depend heavily on security. Patient data must be protected against illicit access while remaining accessible to authorized persons. Conventional data protection mechanisms are of only limited use for this task, because of the new challenges posed by the Grid. The biggest problem for data security on a Grid is that data may be replicated outside the domain of their owner in order to bring them closer to a computing unit that is supposed to process them. For this reason an access control system must be decentralized, and the owner of a piece of data must control who has access to it. Since it may be necessary to access data quickly, the access control system must allow immediate delegation of rights. Finally, because the data in question may be highly confidential, one cannot rely on an access control mechanism alone, since an attacker with access to the physical storage hardware can bypass this control. In this thesis we propose an architecture for the protection of confidential data on a Grid.
This architecture comprises an access control system and an encrypted storage system.

The proposed access control mechanism, Sygn, allows decentralized storage and management of permissions. All permissions in Sygn are encoded in certificates, which are stored by their owners and used when needed. Permissions can be created at any time by the owners of resources or by administrators to whom this right has been delegated. Creating permissions requires no interaction with a centralized permission storage system. Multi-step delegation is realized in Sygn through certificate chains. For these reasons, Sygn supports the management of dynamically changing resources and permissions. The Sygn access control servers store a minimum of security-critical information. They are deployed close to the resources whose access they control, in order to minimize the impact of a successful attack. Sygn avoids the use of centralized services and minimizes trusted third parties. Sygn has been successfully integrated into a minimal Grid architecture.

The proposed encrypted storage system, CryptStore, is designed to let a dynamic community of users store data in encrypted form, access these data and modify them. To this end, CryptStore uses distributed key servers that give authorized users access to decryption keys. To minimize the impact of a successful attack on a key server, no server stores a complete key. Keys are split using classical secret sharing algorithms, and the shares are distributed over several key servers.
To avoid adding an extra and potentially inconsistent layer of access control, access to key shares is controlled through the Grid's data access control mechanism. To this end the key servers have a generic interface that can be adapted to any access control mechanism on the Grid. An adaptation of this interface that allows Sygn to be used for controlling access to keys has been implemented for CryptStore.

Abstract

Grid computing has become the architecture of choice for applications that process large amounts of data and require a lot of computing power. Grids allow users to share multiple heterogeneous resources, such as computing power, storage capacity and data, and provide an architecture in which these resources interoperate transparently from the user's point of view. An upcoming application for Grids is health care, with the goal of giving medical doctors the computing power of Grids to speed up and improve their diagnosis software, or transparent, cross-institutional access to distributed patient records. More than for the first applications of Grids (e.g. particle physics, terrestrial observation), security is a major issue for medical applications. Conventional data protection mechanisms are only of limited use, due to the novel security challenges posed by Grids. The most important challenge is that, at the request of the middleware, data on a Grid may be copied outside the home domain of their owner in order to be stored close to some distant computing resource. To respond to these challenges we propose an access control system that is decentralized and in which the owners of data control the permissions concerning their data. Furthermore, since data may be needed at very short notice, the access control system must support delegation of rights that is effective immediately.
Grid users also need delegation mechanisms to give rights to processes that act on their behalf. As these processes may spawn subprocesses, multi-step delegation must be possible. In addition to these usability requirements, the transparent storage and replication mechanisms of Grids make it necessary to implement additional protection mechanisms for confidential data. Access control can be circumvented by attackers who have access to the physical storage medium. We therefore need encrypted storage mechanisms to enhance the protection of data stored on a Grid.

In this thesis we propose a comprehensive architecture for the protection of confidential data on Grids. This architecture includes an access control system and an encrypted storage scheme.

The proposed access control mechanism, Sygn, provides a decentralized permission storage and management system. All permissions in Sygn are encoded in certificates, which are stored by their owners and used when required. Permissions can be created on demand, by the owners of the resources or by administrators to whom this responsibility has been delegated, without the need to contact a central permission storage system. Multi-step delegation of permissions is realized in Sygn through the use of certificate chains. Thus, Sygn allows efficient decentralized administration of dynamically changing resources and permissions. The format of Sygn permissions allows fine-grained specification of the resources to be protected, so that each resource owner can express specific authorizations on his or her resource. The Sygn access control servers are deployed close to the resources they control and store only minimal security-critical information, in order to minimize the impact of a successful attack. Sygn thus avoids the use of centralized services and minimizes the use of trusted third parties in order to enhance security and extensibility.
The proposed encrypted storage scheme, CryptStore, is designed to allow dynamically changing user communities to manage dynamically changing data sets. To achieve this goal, CryptStore relies on distributed key servers that allow dynamic sharing of decryption keys based on file access permissions. In order to minimize the impact of a successful attack on a key server, no single key server stores full encryption keys. Instead, keys are split using a classical secret sharing algorithm and distributed among several key servers. To avoid adding a duplicate and possibly incoherent layer of access control, access to key shares stored on CryptStore key servers is granted according to the file access permissions of the Grid access control system. The key servers therefore have a generic interface that can be adapted to interact with any Grid access control system. Sygn has been successfully integrated into a lightweight Grid middleware for access control to files, and an instantiation of the generic CryptStore access control interface has been implemented that allows Sygn to be used for key access control.

Acknowledgments

First and foremost I would like to thank my supervisors Lionel Brunie and Jean-Marc Pierson. Thanks to them I have discovered Grid computing as an interesting field of research for security applications. Through all the years their advice and expertise have been very valuable to me and have helped me to succeed in this work. I would also like to thank Elisa Bertino and Philippe Pucheral, who accepted the hard task of reviewing my work and being members of my examination committee. My thanks also go to Olle Mulmo and Jean-Louis Roch for agreeing to be members of my examination committee. I am deeply grateful to my girlfriend Elin A. Topp, who has supported me even though thousands of kilometers separate us and who has raised my morale when I was frustrated.
Many thanks also go to the students Dan Hididis and Didier Oriol, who have directly contributed to my work in their projects and who both did an excellent job. Johan Montagnat has also made valuable contributions through advice and questions and by providing his excellent software libraries. Furthermore, Olle Mulmo and Thomas Sandholm from KTH as well as Babak Sadighi Firozabadi and Erik Rissanen from SICS in Sweden have all helped me a lot through comments, pointers to literature and suggestions. Pierre Maret and Jacques Calmet have made this work possible by establishing contact with INSA Lyon. And finally I wish to thank my colleagues from the LISI/LIRIS laboratory with whom I have discussed some of the gruesome details and implications of my thesis: Solomon Atnafu, David Coquil, Girma Berhe, Sonia Ben Mokhtar, Amine Demidem, Hector Duque, Rami Rifaieh, Yonny Cardenas, Ny-Haingo Andrianarisoa, Dejene Ejigu, Marian Scuturici, Rachid Saadi and Julien Gossa.

Contents

1 French Summary (Résumé Français)
  1.1 Introduction
  1.2 Motivation
  1.3 State of the art
    1.3.1 Access control models
    1.3.2 Message sequences for access control
    1.3.3 Access control policy expression languages
    1.3.4 Certificates
    1.3.5 Access control systems
    1.3.6 Why encrypted storage?
    1.3.7 Encrypted storage systems
  1.4 The Sygn access control system
    1.4.1 Overview of Sygn
    1.4.2 The Sygn language
    1.4.3 Sygn meta-data
    1.4.4 The Sygn decision algorithm
    1.4.5 Discussion
  1.5 Encrypted storage with CryptStore
    1.5.1 Basic concepts of CryptStore
    1.5.2 Architecture of CryptStore
    1.5.3 CryptStore meta-data
    1.5.4 CryptStore algorithms
    1.5.5 Discussion
  1.6 Sygn and CryptStore integrated in a Grid
    1.6.1 µgrid
    1.6.2 The OGSA and WSRF standards
    1.6.3 Integrating Sygn in a Grid
    1.6.4 Integrating CryptStore in a Grid
  1.7 Conclusion
2 Introduction
  2.1 Security aspects of resource sharing on a Grid
  2.2 Why Grids pose novel security challenges
  2.3 Outline
3 Motivation
  3.1 Use-Cases
  3.2 General principles of good security
  3.3 Constraints of the Grid environment
  3.4 Constraints of the application
  3.5 Legal issues dealing with medical data
    3.5.1 European laws concerning privacy protection
    3.5.2 French Law concerning privacy protection
4 Related Work in Access Control
  4.1 Terminology
  4.2 Access Control Models
    4.2.1 Discretionary Access Control
    4.2.2 Mandatory Access Control
    4.2.3 Role Based Access Control
    4.2.4 Current directions in access control
  4.3 Authorization Frameworks
  4.4 Authorization Expression Languages
    4.4.1 KeyNote
    4.4.2 XACML
    4.4.3 XrML
    4.4.4 General remarks
  4.5 Standards for authorization assertion
    4.5.1 SAML
    4.5.2 X.509 Attribute Certificates
    4.5.3 SPKI
  4.6 Access Control Systems
    4.6.1 Shibboleth
    4.6.2 Akenti
    4.6.3 PERMIS
    4.6.4 CAS
    4.6.5 VOMS
    4.6.6 Cardea
    4.6.7 PRIMA
    4.6.8 Summary
5 Related Work in Storage Security
  5.1 Overview of encryption algorithms for storage
  5.2 Standardization
  5.3 Encrypted storage systems
    5.3.1 CFS
    5.3.2 TCFS
    5.3.3 CryptFS
    5.3.4 P. Gutmann’s SFS
    5.3.5 WinEFS
    5.3.6 SNAD
    5.3.7 Cepheus
    5.3.8 J.P. Hughes’ SFS
    5.3.9 C-SDA
    5.3.10 Summary
6 Sygn access control
  6.1 Sygn overview
  6.2 Syntax and semantics of the Sygn Language
    6.2.1 Subjects
    6.2.2 Objects
    6.2.3 Actions
    6.2.4 Capabilities
    6.2.5 Authorization Certificates
    6.2.6 Certificate Paths
    6.2.7 User requests
    6.2.8 Sygn-PDP responses
    6.2.9 Extensibility
    6.2.10 Formal representation
  6.3 PDP meta-data
  6.4 PDP algorithm
  6.5 Sygn performance
  6.6 Discussion
7 CryptStore encrypted storage
  7.1 Basic concepts of CryptStore
  7.2 Architecture and use of CryptStore
  7.3 CryptStore meta-data
  7.4 CryptStore algorithms
    7.4.1 Cryptographic algorithms
    7.4.2 Request handling
  7.5 Security Analysis
  7.6 Discussion
8 Sygn and CryptStore in a Grid
  8.1 µgrid
  8.2 OGSA/WSRF standardized Grids
  8.3 Integrating Sygn in a Grid
  8.4 Setting up CryptStore as a Grid service
  8.5 Using Sygn for CryptStore access control
  8.6 Summary
9 Conclusions and Future Works
A XML Schema for the Sygn language
B XML Schema for CryptStore
C Sygn permission creation GUI

List of Figures

1.1 Sygn algorithm, automaton representation, French version
4.1 Authorization Message Sequences
4.2 Example of an XACML Policy
5.1 Cipher block chaining mode
5.2 Ciphertext stealing in CBC mode
5.3 Cipher-feedback mode
5.4 The lockbox concept
6.1 Sygn deployment and interactions
6.2 Example of Sygn user identifiers
6.3 Example of Sygn role identifier
6.4 Example of Sygn file-set identifier
6.5 Example of Sygn action
6.6 Example of Sygn capability
6.7 Example of Sygn add to set capability
6.8 Example of Sygn authorization certificate
6.9 Example of Sygn certificate path
6.10 Example of Sygn request
6.11 Example of a Sygn-PDP response
6.12 Example of Sygn administrative command
6.13 Complex certificate path example
6.14 Sygn algorithm, automaton representation
6.15 Sygn-PDP performance
7.1 Simple authorization example
7.2 Authorization concerning sets of files
7.3 Authorization concerning groups of users
7.4 Authorization concerning sets of files and groups of users
7.5 CryptStore usage
7.6 Example of a CryptStore meta-data header
7.7 The concept of secret sharing
7.8 Examples of CryptStore file owner requests
7.9 Examples of CryptStore file user requests
8.1 Web service invocation
8.2 Relationship between OGSA, WSRF and Web services
C.1 Sygn Certificate Creation Tool

List of Tables

4.1 Summary of how different architectures respond to requirements of a medical application
4.2 Summary of how different architectures follow principles of good security
4.3 Summary of how different architectures respond to requirements of a Grid environment
5.1 Summary of block cipher modes
5.2 Summary of encrypted storage systems
Chapter 1

French Summary (Résumé Français)

This summary is intended for French-speaking readers. Its purpose is to give them a precise idea of the contents of this thesis. Because of its brevity, the details and precise explanations of many points cannot be covered here. We ask the interested reader to consult the English part of this document.

1.1 Introduction

Resource sharing has enjoyed growing popularity since the creation of the Internet. The use of hardware resources, and of computing power above all, is often characterized by long periods of inactivity interrupted by short intervals of intensive activity. By pooling such resources in order to share them, each user can have very substantial hardware capacity available at the moment it is needed, provided that the users do not all need the resources at the same time. Applications that consume and produce large quantities of data can likewise benefit from shared storage space, especially if this sharing is combined with the sharing of computing resources. In particular, shared storage space makes it possible to keep a copy of the data close to the application that needs it. Data sharing is also of great interest, whether for informational purposes or to support distributed projects or collaborations.

A problem that often arises when sharing resources across a network is the heterogeneity of the systems used, which makes them unable to interoperate. Even with identical operating systems, configuration details can still cause the use of remote resources to fail. Solving this problem often requires both complicated manual configuration work and excellent knowledge of the specifics of the remote resource.
Computing Grids offer a new approach to facilitating resource sharing (computing power, storage space, data or sensors) and to overcoming the difficulties of interoperability. A Grid architecture implements a common platform for sharing resources. The allocation and use of resources are thus managed transparently for the user.

In this thesis we address the security of confidential data shared across a computing Grid. The first Grid applications addressed subjects that need great computing power, such as particle physics and terrestrial observation. Security issues, in particular those related to data protection, matter less in these fields. More recently, several projects have been carried out to deploy biomedical Grids dedicated to applications that handle biological and medical data (notably health-care networks [70]). This type of application handles considerable volumes of distributed data (see footnote 1) held by hospitals, care centers, attending physicians, etc., and generates processing that is very costly in computing power (for example 2D and 3D imaging, or epidemiological studies on very large cohorts). Computing Grids are a very promising architectural solution (hardware and software) for this type of application, provided that they guarantee a high level of confidentiality.

This thesis proposes an architecture for controlling access to the resources of a Grid. The protection of stored data has also led us to consider encrypted storage.
1.2 Motivation

To be able to assess the state of the art and to decide which improvements are needed, we first listed the conditions and constraints related to access control that a Grid used for medical applications should satisfy. The results of this reflection fall into three areas: the principles of good security in general, the specifics of computing Grids, and the conditions and constraints tied to medical applications.

Footnote 1: A medium-sized university hospital generates on the order of 1 to 10 TB of digital data each year (medical images, patient records, biological analyses, etc.).

Concerning security in general, we reached the following conclusions:
– It is preferable to avoid centralizing services that manage security-related functions or data. Not only do such services scale poorly, they are also an ideal target for attacks.
– The number of trusted third parties must be minimized, to reduce the opportunities for attack.
– Regarding access control more specifically, we consider it important to use minimal permissions for the execution of any operation. This reduces the damage that can be done by malicious processes acting in the name and with the permissions of an honest third-party user.
– Separation of duties, and of the permissions attached to them, is another important principle to which we adhere. It eases permission management and prevents abuse resulting from unexpected combinations of permissions.
– We consider it important to preserve the consistency of permissions concerning identical objects. When multiple copies of a piece of data can exist, it must be possible to apply the same permissions to each of these copies.
This is all the more important in a Grid, where replication mechanisms can automatically generate copies of data to bring them closer to a processing node.
– We consider it essential to secure permissions, especially for long-term storage. Should a permission be intercepted, this protection must prevent an attacker from using it for himself.

The constraints introduced by Grid environments form the second thematic set of constraints that we examined:
– Grids bring together user communities that evolve dynamically. Services that require pre-configuration with user identities are therefore not viable. Rights delegation mechanisms can help an access control system react flexibly to these dynamic changes.
– The resources of a Grid (computing capacity, storage space and data) are subject to dynamic availability. The security system must take into account that resources can suddenly become unavailable.
– Grids bring together heterogeneous computer systems and provide transparent access to the resources of these systems. It is therefore essential that security services on a Grid be generic and not depend on a specific hardware or software architecture (for example at the operating system level).
– A Grid allows resources to be shared across institutional boundaries. A security system therefore cannot be imposed at the Grid level; it must remain under the individual control of the institutions participating in the Grid (principle of subsidiarity). Access control must make it possible to apply both the security policy of the user's organization and that of the owner of the resource concerned.
– Permissions concerning a piece of data must be independent of where it is stored (cf. migration or duplication of the data).
– Finally, Grids involve a large number of resources and users. Solutions that work well at small scale can prove deficient at large scale. It is therefore important that any system used on a Grid scale well (scalability property).

A final group of constraints concerns our target domain: biomedical applications. It is worth noting that the constraints below could be derived from many other applications that handle confidential data.
– Users of medical applications have structured tasks that require specific permissions. These permissions frequently have a hierarchical structure in which users at a higher hierarchical level inherit the permissions of the lower levels (for example: Head of clinic > Chief physician of a department > Nurse). Role-based access control (RBAC) is a very effective approach for managing such a permission structure.
– Very strict protection requirements are imposed by law on the processing of personal data, and of medical data in particular. A sine qua non is the non-repudiable traceability of every access to these data.
– Given the heavy responsibilities borne by the owners of medical data, it is not conceivable that anyone other than the owners themselves be the source of authority. Every authorization granting access to a piece of data must have a source that can be traced back to the owner. Access control systems that provide delegation mechanisms can help satisfy this condition.
– Since storage resources on a grid can also be accessed locally, measures must be taken to prevent access to the confidential data stored on these resources that would bypass the access control system.

1.3 State of the art

In this section we present a short state of the art on access control and then on encrypted storage, while motivating the link between these two areas of security (footnote 2).

1.3.1 Access control models

In the field of access control, three models are generally recognized:
– Discretionary access control (DAC).
– Mandatory access control (MAC).
– Role-based access control (RBAC).

In the DAC model, permissions are represented by a matrix in which each row corresponds to a user and each column to a resource. Each element of this matrix defines the access rights of the row's user on the column's resource.

The MAC model assigns a security level to every user and every resource. Access is granted to a user only if his security level is greater than or equal to the level of the resource he wants to access. In addition, to prevent information from leaking to less secure levels, a user making use of his access level is forbidden to write data at a lower level. This concept can be enriched by using a classification (for example army, navy, air force) of data and users in addition to their security level. Users are then entitled only to the resources of the classes to which they themselves belong.

Finally, the RBAC model aims to ease the management of permissions associated with tasks. Permissions are grouped by task and assigned to a role, which is given to the users who must perform that task.
Changes of permissions when a user is assigned to a new task thus become easier to manage, since it suffices to change the role assignments. Likewise, if the permissions tied to a task change, it suffices to add these permissions to the role or remove them from it. The RBAC model introduces two other important concepts: hierarchical roles (a role can inherit the full set of permissions of a hierarchically lower role) and separation of duties (the simultaneous use of two roles can be forbidden to prevent abuse resulting from the combination of the associated rights).

The flexibility of the DAC model for ad hoc permissions and the efficient rights management of the RBAC model are two desirable attributes for our application.

Footnote 2: We refer the reader to the body of the manuscript for a more detailed study of the state of the art.

1.3.2 Message sequences for access control

The IETF and the ISO have defined architectures ("frameworks") for access control. Conceptually similar, they differ mainly in their choice of vocabulary. RFC 2904 [96] proposes three message exchange sequences between users, resources and authorization servers: the agent sequence, the pull sequence and the push sequence.

The agent sequence involves the authorization server as an agent between the user and the resource. The user therefore interacts only with the authorization server. The server forwards the user's requests to the resource after verifying the rights, and takes care of returning the resource's response to the user.

The pull sequence makes the resource responsible for all interaction with the authorization server. The user submits his request to the resource, which in turn asks the authorization server whether the request is authorized. If the answer is positive, the resource grants access to the user.
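The agent and pull sequences described above differ only in who talks to whom. The following minimal sketch makes that difference explicit. All class names, method names and sample identifiers are hypothetical and chosen for illustration; they are not taken from RFC 2904 or from any implementation.

```python
class AuthorizationServer:
    """Hypothetical authorization server holding (user, resource, action) grants."""

    def __init__(self, grants):
        self.grants = set(grants)

    def is_authorized(self, user, resource_name, action):
        return (user, resource_name, action) in self.grants

    def handle_request(self, user, resource, action):
        # Agent sequence: the user contacts only the authorization server,
        # which checks the rights, forwards the request and relays the answer.
        if not self.is_authorized(user, resource.name, action):
            return "denied"
        return resource.execute(action)


class Resource:
    """Hypothetical resource that knows its authorization server."""

    def __init__(self, name, server):
        self.name = name
        self.server = server

    def execute(self, action):
        return f"{action} on {self.name}: ok"

    def handle_request(self, user, action):
        # Pull sequence: the user contacts the resource, which asks the
        # authorization server whether the request is allowed.
        if not self.server.is_authorized(user, self.name, action):
            return "denied"
        return self.execute(action)


server = AuthorizationServer({("alice", "scan42", "read")})
scan = Resource("scan42", server)
print(server.handle_request("alice", scan, "read"))  # agent sequence
print(scan.handle_request("bob", "read"))            # pull sequence
```

In both cases the authorization server is contacted at request time, which is the coupling that the third sequence removes.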
The push sequence decouples the resource from the authorization server. To execute a request, a user first asks the authorization server to certify that he is entitled to this request, and then presents this certification to the resource along with the request. These three message sequences are illustrated in figure 4.1, page 55.

The push sequence has the advantage of decoupling in time the certification of a permission from its use. The load on authorization servers (relative to the agent sequence) and on resources (relative to the pull sequence) is reduced, which makes the system more scalable. Moreover, the push sequence makes it possible to apply the principle of minimal permissions (least privilege) safely, since the user keeps control over the permissions he provides to the access control service. If the user needs several authorizations from different sources, this model lets him easily retrieve them separately and requires no cooperation protocol between authorization servers. The drawback of the push sequence is that it requires a permission revocation mechanism, because once a permission has been issued to a user it cannot otherwise be withdrawn before its expiry date.

1.3.3 Access control policy languages

To express access authorizations and define the general principles of an access control policy, a formal language is needed. Several proposals for such languages exist. We examined KeyNote [14], XACML [52] and XrML [28] to determine their possible contributions to our work.

The language defined by KeyNote [14] binds authorizations to public keys, like the SPKI approach (see the next section). It allows delegation through certificates.
The drawback of the KeyNote language is that it provides no support for the RBAC model, since it is specifically oriented towards the DAC model.

The XACML language [52] is a proposed standard from the OASIS consortium for a generic language for defining access control policies. Based on XML, it offers a wide variety of data types and of functions for combining or comparing data. However, the genericity of XACML means that even the simplest policies are very long and hard to read, understand and modify. Moreover, delegation is not provided for in the current version of XACML.

The XrML language [28] is also an XML-format language used to describe access control policies. Its approach is fundamentally the same as that of XACML. However, it is less generic because it is oriented towards digital restrictions management (DRM). Moreover, XrML has no specific support for the RBAC model.

1.3.4 Certificates

Some access control architectures implement permission storage mechanisms that do not protect permissions against fraudulent modification. In our opinion this is not satisfactory for an application requiring strong security. In such a system, as much security-related data as possible, and especially access control data, should be encoded in certificates bearing digital signatures. Ordered sequences of authorization certificates form certification paths that can be used for the delegation of permissions. Certificates thus allow flexible and secure management of dynamic access rights.

We examined three approaches to encoding authorization data as certificates: SAML [68], X.509 AC [45] and SPKI [39]. SAML is a proposed standard from the OASIS consortium, like XACML.
SAML is based on XML and defines formats for requesting and providing certificates that confirm a user's authentication, attributes or permissions. Like XACML, SAML is very generic and provides a large number of data types and comparison functions. This makes the SAML language almost as hard to read as XACML. In addition, the SAML specification does not address delegation. Several recent proposals ([77, 97]) extend the SAML standard to remedy this shortcoming.

Attribute certificates (AC) are an extension of the X.509 certificate format used for authentication. The purpose of this extension is to allow authorization-related information to be encoded in X.509 certificates. The proposed format is very limited, because it includes the recommendation not to support delegation through certificate chains, which is judged too complex. Moreover, for each set of attributes that X.509 ACs can certify, there must be one and only one authority issuing certificates. Such a restriction is highly detrimental to decentralized access control management and imposes severe limits on scalability.

RFCs 2692 and 2693 [38, 39] propose SPKI, a simple infrastructure based on public keys for trust management (authentication and authorization). In SPKI a list of sources of authority is associated with each resource, specifying the entities that may issue permission certificates concerning the resource. SPKI defines a delegation system using certificate chains. Every entity is identified by its public key, which makes digital signatures easier to verify and avoids the confusion that homonyms can cause.
The disadvantage of this system (compared with binding permissions to user names as in X.509) is that if a key is revoked, all permissions issued for that key must be revoked as well. If, on the other hand, a user name is used, the user can be given a new key without having to change the name. In that case the permissions associated with the name need not be revoked. In our opinion the advantages of binding permissions to a key largely outweigh these drawbacks. Work on the standardization of SPKI stopped in 2001, so some important questions, such as RBAC support in SPKI, have not been addressed.

1.3.5 Access control systems

We studied a large number of distributed access control systems, some of them dedicated to grids: Akenti [91, 92] from the Distributed Systems Department of the Lawrence Berkeley laboratories in the United States; PERMIS [23] from the Information Systems Security Research Group of the University of Salford in the United Kingdom; CAS [79, 78], a grid-specific system developed by the Globus Alliance; VOMS [2], also grid-specific, developed during the European DataGrid project; Cardea [65] from the NASA Advanced Supercomputing (NAS) Division at the NASA Ames Research Center in the United States; and PRIMA [67, 66] from the Department of Computer Science at Virginia Polytechnic Institute and State University in the United States. Measured against the requirements and constraints we established earlier, none of these systems proved satisfactory. The results of this analysis are presented in tables 4.1–4.3, pages 67–69.

1.3.6 Why encrypted storage?

A question that must be asked when implementing access control mechanisms is how to prevent these mechanisms from being bypassed.
For access control to data, this is particularly problematic when an adversary can gain access to the physical storage hardware. In that case it is always simple to disable the access control mechanisms. We therefore concluded that a satisfactory solution for data security has to prevent such direct access to the raw data. For this reason we decided to couple our access control system with mechanisms for encrypting the data.

The problem that must be solved for an encrypted storage system to be usable on a grid is that of sharing encrypted files. The groups of users who have access to such files are dynamic, as are the sets of shared encrypted files. This fluctuation of members and elements must be handled in a way that does not significantly slow down access to files, which rules out manual distribution of encryption keys.

1.3.7 Encrypted storage systems

We examined encrypted storage systems mainly with respect to their key sharing mechanisms. CFS [12], developed by Matt Blaze in 1993 at AT&T laboratories, is one of the oldest available encrypted storage systems. CFS has no key sharing mechanism. TCFS [21, 22], developed at the University of Salerno in Italy in 1997, improves on CFS but still has no key sharing mechanism. CryptFS [103] is an architecture proposed to improve the functions of CFS by making them more efficient and more resistant to attacks by people with precise knowledge of the system. Like CFS and TCFS, CryptFS has no key sharing mechanism.

SFS by Peter Gutmann [56], created in 1995, is another encrypted storage system.
Like the systems presented above, it offers no key sharing mechanism, but it does have an interesting feature: so that a key can be recovered if it is lost, SFS allows it to be split into pieces using Shamir's secret sharing algorithm [90] and these pieces to be distributed to trusted third parties. In case of loss, the key can be reconstructed by assembling a certain number of pieces returned by the trusted third parties. The number of pieces generated and the number of pieces needed for reconstruction can be freely chosen by the user. Which pieces are used for the reconstruction does not matter, as long as the required number is gathered. However, a single trusted third party cannot deduce any information about the complete key from the share entrusted to it. We have adapted this idea for our approach (cf. section 1.5).

WinEFS (Windows Encrypting File System) [73] is an encrypted storage system that makes it possible to share keys. For each user who has access to a file, the encryption key is placed in the file header, encrypted with that user's public key. This piece of information (called a lockbox) allows that user, and only that user, to obtain the decryption key. Such a system clearly scales poorly when faced with frequent permission updates and a large, dynamic user community.

SNAD [74], developed at the University of California in 2002, is another encrypted storage system that uses the lockbox concept. It therefore has the same limitations as WinEFS with respect to our application.

The Cepheus system [50] was developed at MIT in the United States between 1998 and 1999 by Kevin E. Fu.
It is based on David Mazières's SFS [71] and proposes a group server to manage encrypted files shared by groups of users. A file shared by a group is encrypted with a group key. For each member of the group, a copy of this key is stored in a lockbox on the group server. This requires the administrator of the encrypted file to know all the members of the group and to update the group server manually at every change. Such an approach is clearly inefficient in a distributed environment with groups whose membership changes dynamically.

Hughes et al. have proposed another system called SFS, which improves the group server concept. For individual users, SFS stores the encryption key in the file header by means of a lockbox. For group sharing, the file header contains permissions signed by the owner of the file, which specify the groups that have access to the file, together with a lockbox that can be decrypted only by the group server. A group member who wants the decryption key must send the permissions and the lockbox to the group server. The server verifies the permissions and, if they are correct, decrypts the lockbox and sends the key to the requesting user. The problem with SFS is that the group server is a trusted third party and therefore a prime target for attackers.

The problem common to all encrypted storage systems that allow file sharing is that they introduce a new access control layer into the system. A user can thus end up in an inconsistent situation in which the access control system grants him access to a file but the key sharing system refuses him access to the decryption key.

Bouganim et al. have proposed a system, C-SDA, which implements encrypted storage with the help of a smart card. The keys to which the user has access are stored on the card. The same card also handles the decryption of the data, so the keys never leave the card. The data that a user can access through C-SDA may be generated dynamically (a view on a database, for example). As a result, the encryption keys may not correspond to the access permissions. The card therefore also manages the user's permissions and the dynamic data created from the decrypted raw data that the user wants to access. The problem we see with C-SDA is that the card is considered tamper-proof, even against its user. Yet since P. Kocher's invention of side-channel attacks [62, 63], the cryptographic community has kept developing attacks on smart cards based on this principle (for example [81] or [69]). The protection mechanisms of the algorithms used on a smart card therefore have to be updated frequently, which represents considerable effort.

1.4 The Sygn access control system

To meet the requirements and constraints we have established for access control on a grid for medical applications, we designed the Sygn access control system (footnote 3). We begin with an overview of the system, then present the language in which authorizations are expressed in Sygn, then the metadata used by Sygn and finally the decision algorithm. We conclude with a discussion of the principles of Sygn with regard to our problem and the state of the art.

1.4.1 Overview of Sygn

Since our main goals are to support the management of ad hoc access control decisions and to provide a decentralized permission delegation system, we decided to use chains of authorization certificates to grant permissions. We chose a push message sequence for the reasons presented in section 1.3.2. Sygn users therefore obtain and store the certificates issued to them themselves. These certificates are protected against illicit manipulation by their digital signature.

The process of granting permissions and accessing a resource, illustrated in figure 6.1, page 87, runs as follows: the administrator of a resource makes it available on the grid and registers as source of authority (SOA) for this resource in the metadata database of the local Sygn server (step 1). He then creates one or more authorization certificates that grant the right to access the resource and transfers them to the users concerned (step 2). These users store the certificates and use them, when they want to access the resource, to prove their access right to the Sygn server local to the resource (step 3).

Footnote 3: In Norse mythology, Sygn is a goddess of truth, but also of doors and locks. She guards the entrance to the palace Wingolf and lets in only honest people.

1.4.2 The Sygn language

The Sygn language introduces several elements for defining permissions and requests. Users are identified by a uid. A uid is a public key (Sygn follows the approach of SPKI and KeyNote in binding permissions to public keys). Subject identifiers (SID) refer to a user (uid) or to a role (rid). SIDs are used to identify the owner(s) of an authorization certificate as well as the source(s) of authority of a resource. Object identifiers (OID) refer to the different types of resources that can be subject to permissions: files (fid), file collections (fsid), hardware resources (resid) and roles when they are the object of a permission (roid). Sygn defines actions (action) that a user can perform on a resource.

Using these elements, a capability (CAPABILITY) can be defined, consisting of an object (OID) and an action on that object. These capabilities are granted to users in an authorization certificate (AC). In addition to the capability, it contains the uid of the certificate's creator (CREATOR), the SID of the certificate's owner (OWNER), validity limit dates (NOT_BEFORE, NOT_AFTER), a limit on the delegation depth (DELEGATION), restrictions (NOT_WITH) listing the roles that cannot be used in the same request as this AC, and a digital signature by the creator of the certificate.

A chain of ACs (AC_CHAIN) aims to authorize a targeted capability (TARGET) for a certain user. This is called an authorization path (PATH). A user (ISSUER) can submit to a Sygn server a request (SURF) asking for the validation of several targeted capabilities by means of several authorization paths contained in the request.

The Sygn language (in a slightly simplified version) is described by the following grammar, with the terminal symbols uid, rid, fid, fsid, resid, roid and action as defined above, together with timestamp, which represents a date and time, integer_value, which represents an integer greater than or equal to zero, and signature, which represents a digital signature.

    SID -> uid | rid
    OID -> fid | fsid | resid | roid
    CAPABILITY -> OID, action
    CREATOR -> uid
    OWNER -> SID
    NOT_BEFORE -> timestamp
    NOT_AFTER -> timestamp
    DELEGATION -> integer_value
    NOT_WITH -> rid | NOT_WITH, rid
    AC -> CREATOR, OWNER, CAPABILITY, NOT_BEFORE, NOT_AFTER, NOT_WITH, DELEGATION, signature
        | CREATOR, OWNER, CAPABILITY, NOT_BEFORE, NOT_AFTER, DELEGATION, signature
    TARGET -> CAPABILITY
    AC_CHAIN -> AC | AC, AC_CHAIN
    PATH -> TARGET, AC_CHAIN
    ISSUER -> uid
    PATHES -> PATH | PATH, PATHES
    SURF -> ISSUER, PATHES

1.4.3 Sygn metadata

A Sygn server needs a certain amount of metadata to operate. This metadata is stored in a database located close to the server. Since the information in the database is critical for the security of Sygn, access to this database must be well protected. Sygn allows some of this metadata to be administered remotely. To do so, a user submits an administrative command, together with a certificate path authorizing its execution, to the Sygn server, which verifies the permissions and then updates its database.

The most important metadata stored with Sygn servers are the sources of authority (SOA) of the resources (hardware and files) under the server's control. SOAs are the roots of all delegation of rights on their resources. The identifiers of the SOAs are used by the decision engine to bootstrap the processing of a request.

For hardware resources (computing power and storage space), the Sygn server also stores the usage made by the various authorized entities, in order to enforce usage quotas. Measuring this resource usage data and transmitting it to the Sygn server is the responsibility of mechanisms external to Sygn.

Sygn also maintains a list of users banned from all access to a site (blacklist).
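The metadata described so far (sources of authority, usage records kept for quota enforcement, and the blacklist) could be held in a structure along the following lines. This is only a sketch: the class and method names, the sample identifiers, and the choice to store quotas next to the usage figures are assumptions made for illustration, not details of the Sygn implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SygnMetadata:
    """Hypothetical in-memory stand-in for the metadata database of a Sygn server."""
    soa: dict = field(default_factory=dict)      # resource id -> set of SOA subject ids
    quota: dict = field(default_factory=dict)    # (resource id, entity id) -> allowed units (assumed)
    usage: dict = field(default_factory=dict)    # (resource id, entity id) -> consumed units
    blacklist: set = field(default_factory=set)  # user ids banned from the site

    def register_soa(self, resource_id, subject_id):
        self.soa.setdefault(resource_id, set()).add(subject_id)

    def is_soa(self, resource_id, subject_id):
        return subject_id in self.soa.get(resource_id, set())

    def report_usage(self, resource_id, entity_id, amount):
        # Usage figures are measured and sent by mechanisms external to Sygn.
        key = (resource_id, entity_id)
        self.usage[key] = self.usage.get(key, 0) + amount

    def within_quota(self, resource_id, entity_id):
        # Assumption: an entity without a recorded quota has no allowance.
        key = (resource_id, entity_id)
        return self.usage.get(key, 0) <= self.quota.get(key, 0)

    def ban(self, user_id):
        self.blacklist.add(user_id)

    def is_banned(self, user_id):
        return user_id in self.blacklist


meta = SygnMetadata()
meta.register_soa("disk-lyon", "key-of-admin")
meta.quota[("disk-lyon", "key-of-alice")] = 100
meta.report_usage("disk-lyon", "key-of-alice", 60)
meta.report_usage("disk-lyon", "key-of-alice", 60)
meta.ban("key-of-mallory")
```

Identifiers are plain strings here; in Sygn a user identifier is a public key.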
The local administrator can thus exclude anyone who deliberately disrupts the operation of the system. Authorization certificates created by a person banned from a site are not recognized on that site.

To support the revocation of authorization certificates, Sygn also maintains a list of the identifiers of invalidated certificates. A certificate can be invalidated either by its creator or by the SOA of the capability it delegates.

If tracing is enabled, the Sygn server also saves every request submitted to it.

1.4.4 The Sygn decision algorithm

The Sygn decision algorithm processes certificate paths and decides whether they grant the targeted capability. It is the core of the Sygn architecture. Broadly based on the principle of complete induction, it uses a global memory. We describe this algorithm informally by means of an automaton.

The initial parameters of the algorithm are a capability, cible (the target), composed of an action action_cible and an object objet_cible, and a request issuer émetteur_requête for whom the path must authorize the capability cible. The algorithm uses the variable cible_actuelle (the current target), which can differ from cible if objet_cible is added to a collection. cible_actuelle consists of an action action_actuelle, which is always equal to action_cible, and an object objet_actuel, which is updated when objet_actuel is added to a collection. The initial value of cible_actuelle is cible.

The automaton representing the algorithm is illustrated in figure 1.1. It has three basic states as well as four intermediate states that handle the delegation of a role and the declaration of a role hierarchy. Transitions between states depend on the next certificate in the path, whose function is described in the destination state.
Transitions may be tied to additional conditions, which are shown separately in the figure next to the transitions. The basic states are:
– the granting of the permission to use the capability cible_actuelle (the current target), and the associated state that the automaton enters if this permission is granted to a role.
– the addition of objet_actuel (the current object) to a collection, and the associated state that the automaton enters if the source of authority (SOA) of this collection is a role. The collection to which objet_actuel is added becomes the new objet_actuel. Collection hierarchies can be declared implicitly from this state of the automaton if objet_actuel is already a collection.
– the granting of the permission to add objet_actuel to a collection, and the associated state that the automaton enters if this permission is granted to a role.

[Figure 1.1: Informal representation of the Sygn decision algorithm as an automaton. A transition to a state is triggered by a certificate. The text of a state indicates the nature of the certificate that triggered the transition; the texts next to the transitions are either explanations or additional conditions. Three typefaces in the figure distinguish variables, the conditions and actions of states, and comments.]

The algorithm has three sets of conditions: start conditions, induction conditions and end conditions. For a certificate path to be valid, its first certificate must satisfy the start conditions, each consecutive pair of certificates must satisfy the induction conditions, and the last certificate must satisfy the end conditions. The algorithm checks four cases at each step of the delegation chain:
– Simple delegation of the targeted capability.
– Activation of roles granting the permission to delegate the targeted capability (this may include the activation of a role hierarchy).
– Delegation of the permission to add the object of the targeted capability to a collection.
– Addition of the target object to a collection.
The collection then becomes the new object of the targeted capability. The collection can itself subsequently be added to other collections, thereby creating collection hierarchies.

1.4.5 Discussion

Sygn uses a push sequence to transmit permissions to the access control server. Since Sygn ACs are not designed solely as short-lived permissions but can also be used to store permanent permissions, a revocation system is needed that can invalidate a permission before the AC through which it was granted expires. This drawback is inherent to push sequences and must be weighed against their advantages. A user is able to submit exactly the ACs he needs to authorize a request, which lets him follow the principle of minimal permissions. Moreover, the user can choose exactly which permissions are exposed to the different services of the grid.

Binding permissions to public keys has one drawback compared with approaches using user names: if the private key corresponding to the public key is stolen, all permissions bound to the public key must be revoked. On the other hand, this scheme makes signature verification easier and avoids the transfer of chains of authentication certificates to bind a user name to a public key. It also avoids the problems that can arise with homonyms (identical user names).

A central property of Sygn is its support for the decentralized creation of permissions. Different SOAs can administer permissions at a very fine-grained level, without the intervention of a third party. However, this property implies that it is impossible to be sure of the complete set of permissions granted to a user or a role.
The results of permission review functions (mandatory in the RBAC standard) are not necessarily complete. To guarantee complete results, all permissions would have to be validated by a central service, which would undermine the advantages of decentralized, ad hoc granting of permissions. For this reason, permission review functions depend on the goodwill of permission creators, who are expected to register all the permissions they create for a role in a review system.

Another central property of Sygn is its delegation mechanism. Following the SPKI approach, we examined the following three options for controlling delegation:
1. No control. Any user can delegate all the permissions granted to him.
2. Boolean control. Each permission specifies whether or not the user has the right to delegate it.
3. Delegation depth control. Each permission specifies over how many levels it can be delegated.

The argument for the first option is that if delegation is restricted, users will share their authentication credentials to achieve delegation while bypassing the restrictions. According to this argument, imposing constraints on the delegation of permissions therefore harms the security of the authentication mechanism. We chose not to follow this argument, because in our opinion educating users about security should prevent such aberrations. If the users of a system have such bad security habits, no system will manage to protect the resources of a grid against illicit access.

The argument for the other two options is that it may be necessary to restrict the delegation of permissions for reasons of (legal) liability of the SOA.
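The three delegation control options listed above can be compared on a chain of certificates with a small sketch. The function name, the policy labels and the exact depth semantics (a certificate whose delegation field is d allows at most d further delegation steps, and a delegate can never receive more than the remaining depth) are assumptions made for illustration, not the rules of Sygn or SPKI.

```python
def chain_allowed(delegation_values, policy):
    """Check the delegation fields along a certificate chain.

    delegation_values lists the delegation field of each certificate,
    starting with the one issued by the SOA. Fields are integers; under
    the boolean policy any non-zero value means "may delegate".
    """
    if not delegation_values:
        raise ValueError("the chain must contain at least one certificate")
    if policy == "none":
        # No control: every chain is acceptable.
        return True
    if policy == "boolean":
        # Every certificate except the last must carry the right to delegate.
        return all(value > 0 for value in delegation_values[:-1])
    if policy == "depth":
        remaining = delegation_values[0]
        for value in delegation_values[1:]:
            if remaining < 1:
                return False  # depth exhausted, no further delegation
            # A delegate cannot be given more depth than what is left.
            remaining = min(remaining - 1, value)
        return True
    raise ValueError(f"unknown policy: {policy}")


# One delegation step, for example from a user to a temporary credential.
print(chain_allowed([1, 0], "depth"))
# A second step is refused under depth control, accepted under boolean control.
print(chain_allowed([1, 1, 0], "depth"), chain_allowed([1, 1, 0], "boolean"))
```

Under boolean control a holder with the delegation right can pass that right on unchanged, so the chain length is unbounded; depth control caps it.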
If a delegated permission is abused, the SOA can be held partially responsible, since it is at the top of the delegation chain. It can therefore be important to distinguish whether or not a permission may be delegated. The creators of SPKI argue that delegation depth control gives no real control over the proliferation of a delegated permission, because only the depth of delegation can be controlled, not the number of delegations at the same level. They therefore opt for Boolean control. We concede that this argument is valid, but we nevertheless consider delegation depth control preferable to Boolean control. Depth control limits the depth of the delegation tree, which makes it easier to identify those responsible when a permission is abused. Furthermore, Grids often use proxy mechanisms to create temporary credentials from long-term credentials (see [93] for details). With Boolean delegation control, a user would therefore have to be given the delegation right on all of his long-lived permissions, which would make delegation control almost useless. With depth control, delegation can be restricted to one level, allowing users to delegate their permissions to their proxies. For these reasons, we chose delegation depth control for Sygn.

Sygn supports RBAC for permission management, but also DAC-style permissions. This duality makes it possible to match the type of permission to the situations for which it is best suited. If a complex permission structure is in place, or if authorizations are task-based, Sygn allows the use of RBAC.
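As an illustration of the delegation depth control chosen for Sygn, the following minimal Python sketch checks a chain of delegated permissions. It is not Sygn's actual implementation: the `AuthzCert` class, its field names and the `chain_is_valid` function are hypothetical, keys are represented by plain strings, and signature verification is omitted. It only shows how a depth counter that shrinks at each step bounds the height of the delegation tree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthzCert:
    """Hypothetical, simplified authorization certificate (no signature)."""
    issuer: str            # identifier of the key that grants the permission
    subject: str           # identifier of the key that receives it
    permission: str        # e.g. "read:/grid/scan42"
    delegation_depth: int  # number of further delegation steps allowed


def chain_is_valid(chain, soa, requester):
    """Check a delegation chain that must start at the SOA and end at the requester."""
    if not chain or chain[0].issuer != soa:
        return False
    if chain[-1].subject != requester:
        return False
    for parent, child in zip(chain, chain[1:]):
        if child.issuer != parent.subject:
            return False  # the chain is not linked
        if child.permission != parent.permission:
            return False  # this sketch only allows identical permissions
        if parent.delegation_depth < 1:
            return False  # the parent was not allowed to delegate
        if child.delegation_depth > parent.delegation_depth - 1:
            return False  # the depth must shrink at every step
    return True


# A depth of 1 lets a user delegate to a proxy, but no further.
to_alice = AuthzCert("soa", "alice", "read:/grid/scan42", 1)
to_proxy = AuthzCert("alice", "alice-proxy", "read:/grid/scan42", 0)
to_other = AuthzCert("alice-proxy", "mallory", "read:/grid/scan42", 0)

print(chain_is_valid([to_alice, to_proxy], "soa", "alice-proxy"))
print(chain_is_valid([to_alice, to_proxy, to_other], "soa", "mallory"))
```

With a depth of 1, the user can hand the permission to a proxy, while the proxy itself cannot delegate it again. Boolean control could not express this without granting an unbounded delegation right.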
For creating ad hoc permissions, or in similar situations where RBAC is too heavyweight, Sygn handles DAC permissions, which are easier to create and use.

Finally, the structure of Sygn supports scenarios in which multiple permissions are required simultaneously. A simple example is the replication of a file on the Grid. Such an operation requires read permission on the file as well as permission to use a certain amount of storage space at the replication site. By using Sygn requests with several certificate paths, the authorizations for such operations can conveniently be grouped into a single access control request.

1.5 Encrypted Storage with CryptStore

To address the problem of protecting stored data and sharing the keys of encrypted files, we designed the CryptStore system. We first present the basic concepts of CryptStore. We then describe the architecture of CryptStore and its use. The following section covers the CryptStore metadata. Finally, we present the CryptStore algorithms and conclude with a discussion of the design choices made in CryptStore.

1.5.1 Basic Concepts of CryptStore

CryptStore allows a user who controls a file to encrypt it before storing it on a Grid. The metadata needed to process such an encrypted file is generated automatically by a tool that is part of CryptStore and that can either be integrated into a general-purpose Grid file management interface or be used separately. To enable the sharing of encrypted files, CryptStore requires several key servers to be deployed. Users who want to access an encrypted file can submit a request to the key servers to obtain the decryption key.
To prevent the key servers from becoming attractive targets for attacks themselves, a key destined for storage on the servers is split into several shares using Shamir's secret sharing algorithm [90]. The CryptStore file administration tool handles the tasks related to encrypting the file, generating the key shares, and storing the shares and the associated metadata on the key servers. To access an encrypted file, CryptStore provides a tool that finds the key servers in the metadata of an encrypted file, contacts the servers to retrieve the key shares, reconstructs the key from the shares and finally decrypts the file.

Access to key shares is controlled using the file access permissions. The key servers therefore have a generic interface that allows them to be integrated with the access control system of the Grid. If the access control system works in a decentralized manner, an instance of the access control system can be co-located with the key server.

1.5.2 Architecture of CryptStore

CryptStore requires three components to be deployed in order to be functional on a Grid: the file administration tool, the encrypted file access tool and the key servers. The administration tool handles the following functions:

– Encryption of the file on the user's machine.
– Optionally, creation of a message authentication code from the encryption key to protect the integrity of the file.
– Generation of the key shares and their storage on key servers together with the associated metadata.
– Storage, in the file header, of metadata that makes it possible to locate the key servers and to configure the decryption algorithm.
– Update of the key shares and the other metadata on the key servers when the file is re-encrypted.

The encrypted file access tool handles the following functions:

– Extraction, from the header of the encrypted file, of the addresses of the key servers that store shares of the decryption key.
– Submission of requests to the key servers to retrieve the key shares. The user must take part in authentication and must also supply the permissions if the access control system uses a push message sequence.
– Reconstruction of the key from the shares.
– Decryption of the file, including extraction of the configuration parameters of the decryption algorithm.

The key servers provide the following services:

– Storage and update of key shares and of the identifiers of the files to which these shares correspond.
– An access point for the file access tool, through which requests for key shares can be submitted.
– A generic interface to the access control system of the Grid, used to determine which key shares a user may access.

The use of CryptStore is illustrated in Figure 7.5 on page 120. It proceeds in seven steps. The first step is performed by the administration tool and consists of encrypting the file and generating shares of the encryption key. In the second step, the administration tool contacts several key servers to store the key shares and the associated file identifier. Then, in the third step, the administration tool generates the metadata used to locate the key servers and stores the encrypted file, with this information in its header, on a storage server of the Grid.
The fourth step takes place outside the functions of CryptStore and consists of the file administrator granting a user access permissions to the file. This user retrieves the encrypted file using his access rights and uses the CryptStore access tool to find the addresses of the key servers in the file header (step five). In the sixth step, the user contacts the key servers and retrieves as many key shares as are needed to reconstruct the key. The seventh and final step consists of reconstructing the decryption key with the access tool and decrypting the file.

1.5.3 CryptStore Metadata

CryptStore requires three categories of metadata: metadata on the parameters of the encryption function (excluding the key), metadata that makes it possible to locate the key servers for an encrypted file, and key server metadata, which allows the servers to associate a key share with a file.

The first two types of metadata must be stored with the encrypted file and can optionally include the message authentication code used to verify the integrity of the message. The current design of CryptStore stores this information in the header of the encrypted file. The size of this data is relatively small and therefore does not increase the file size by much. The encrypted file can then be handled like a standard file by the storage systems of the Grid. We are nevertheless aware of applications in which this scheme would not be usable: if the encrypted data is stored in database tables with a fixed size, adding the header information can increase the size of the data beyond the fixed limit.
In such a case, CryptStore would have to be modified slightly to allow this metadata to be stored outside the file. We return to this problem in the discussion.

If a decentralized access control mechanism is co-located with the key servers, it may also be necessary to store the SOAs of the files for which the server stores key shares. However, this metadata is managed by the access control system and not by CryptStore.

1.5.4 The CryptStore Algorithms

To encrypt files, one must first choose between a block cipher and a stream cipher. In general, stream ciphers are faster than block ciphers. However, it is not safe to reuse keys with a stream cipher. Since renewing a key is fairly costly in CryptStore (all the key servers concerned must be contacted), it can be useful to be able to re-encrypt data with the same key. We therefore chose AES, which is a block cipher. AES is the American standard and is also used by the majority of non-American cryptographic products. We use cipher block chaining (CBC) mode, which hides repetitions across the blocks of the encrypted file and allows random access to the blocks of the file. To keep the file size constant in view of possible storage restrictions, we also use the ciphertext stealing (CTS) technique to encrypt the last block of the file (cf. Figure 5.2 on page 75). To protect the integrity of the file, we chose message authentication codes (MAC, not to be confused with mandatory access control) rather than digital signatures.
Unlike digital signatures, a MAC uses a secret key to generate the message authentication code (tag). Using a MAC is more practical for protecting the integrity of a file that may be modified by several users: a digital signature would require the public key of the signer to be supplied in order to verify it, whereas a MAC lets us use the same key that was used for encryption to generate the tag of the file. The MAC algorithm we use is HMAC, because it is standardized and used by many systems.

The key shares stored on the key servers are created with Shamir's secret sharing algorithm [90]. The algorithm lets a user choose two parameters: the number n of shares to be generated, and the number m (n ≥ m) of shares needed to reconstruct the secret. Any set of m shares generated in this way allows the shared secret to be reconstructed. No set of fewer than m shares yields any information that would reduce the complexity of an exhaustive search for the secret. This principle is illustrated in Figure 7.7 on page 124. For more details on the cryptographic algorithms, we refer the reader to [87].

1.5.5 Discussion

In this section we discuss the algorithmic and architectural choices made for CryptStore. Our application concerns the processing of medical data, which can include very large radiological, tomographic, MRI and other images. We must therefore take into account that the files handled can be very large. From this perspective, stream ciphers outperform block ciphers. Conversely, block ciphers allow data to be re-encrypted with the same key without compromising security.
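The risk of reusing a key with a stream cipher can be illustrated with a small Python sketch. The keystream generator below is a toy construction built from SHA-256 purely for illustration (it is an assumption of this example, not a cipher considered in the thesis and not a vetted design), and the function names and sample plaintexts are hypothetical. Because both messages are XORed with the same keystream, XORing the two ciphertexts cancels the keystream and reveals the XOR of the plaintexts.

```python
import hashlib


def toy_keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key || counter. For illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two byte strings (truncated to the shorter one)."""
    return bytes(x ^ y for x, y in zip(a, b))


def stream_encrypt(key: bytes, data: bytes) -> bytes:
    """Encrypt (or decrypt) by XORing the data with the keystream."""
    return xor(data, toy_keystream(key, len(data)))


key = b"shared file key"
p1 = b"patient: A, result: negative"
p2 = b"patient: A, result: positive"  # updated file, same key reused

c1 = stream_encrypt(key, p1)
c2 = stream_encrypt(key, p2)

# The keystream cancels out: an observer of both ciphertexts learns
# p1 XOR p2 without knowing the key. Zero bytes mark unchanged positions.
leak = xor(c1, c2)
print(leak == xor(p1, p2))
```

Real stream ciphers are normally combined with a fresh per-message nonce to avoid this situation. The sketch deliberately omits one to show what goes wrong when the same keystream is applied to two versions of a file.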
Another important criterion is the ability of an encryption algorithm to leave the size of the encrypted data unchanged. This property is inherent to stream ciphers, since they process streams of bits (or of bytes, in the case of the RC4 algorithm) one at a time. For block ciphers, the property can be obtained by using the ciphertext stealing (CTS) encryption mode for the last block of the data. To allow frequent updates of the encrypted data by several different users without having to change the key each time, we chose a block cipher in CBC mode. CBC also permits the use of CTS, which makes it possible to encrypt data without increasing its size, should this prove necessary.

The decision to store the encryption and key server metadata in the header of encrypted files was taken so that the storage servers of the Grid can treat encrypted files like any other files. We are aware that this changes the size of the data, which can be a problem if the data is stored in a database. Extending the current design of CryptStore to allow this metadata to be stored externally would not pose any major problem, since the Grid architecture has to store metadata about the files on the Grid in any case. These metadata storage mechanisms could be used to store the CryptStore metadata.

The decision to use a secret sharing algorithm for storing keys on the key servers is motivated by the general paradigm of this thesis, which is to avoid, where possible, third parties that can be a single point of failure.
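The threshold property of Shamir's secret sharing, as used for the key shares, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not CryptStore's code: the field prime `PRIME`, the function names `split_secret` and `recover_secret`, and the choice of n = 5 and m = 3 are all hypothetical. The secret is the constant term of a random polynomial of degree m - 1 over a prime field; each share is a point on that polynomial, and any m points determine the constant term by Lagrange interpolation.

```python
import secrets

# Assumed field modulus: the Mersenne prime 2**521 - 1, which is larger
# than any 256-bit key. Any prime larger than the secret would do.
PRIME = 2**521 - 1


def split_secret(secret: int, n: int, m: int):
    """Split `secret` into n shares such that any m of them recover it."""
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    # Random polynomial of degree m - 1 with the secret as constant term.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(m - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


key = secrets.randbits(256)              # stand-in for a file encryption key
shares = split_secret(key, n=5, m=3)     # one share per key server
print(recover_secret(shares[:3]) == key)
print(recover_secret(shares[2:]) == key)
```

In this setting each of the five shares would be sent to a different key server, so two servers can fail (or one share can be withheld) and the key can still be rebuilt from the remaining three.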
It seems impossible to avoid a third party if we want to support the sharing of collections of encrypted files by dynamic groups of users. To limit the impact of an attack on one of the key servers, we chose not to entrust them with entire keys. The properties of secret sharing algorithms bring further advantages: CryptStore tolerates the failure of a certain number of key servers, provided that more key shares are created than are needed to reconstruct the key. In addition, the stored key shares can serve as a backup of the key in case the administrator of the encrypted file loses it.

To use CryptStore, it is important to decide on a re-encryption policy. If the permissions of a user who had access to the decryption key are revoked, we cannot be sure that he has not kept a copy of the decryption key. There are three ways of handling this case: the first is to do nothing and hope that the access control system will prevent access to the file; the second is to re-encrypt the file with a new key as soon as it is updated (lazy re-encryption); and the third is to re-encrypt the file immediately with a new key. Since we cannot prevent users who have had access to the file from making copies and distributing them to unauthorized persons, we recommend lazy re-encryption, which prevents a user whose rights have been revoked from learning of updates to a file.

This re-encryption problem could be avoided if users never had access to the decryption key. This would require setting up a decryption service.
However, such a service would be a trusted third party and a single point of failure. We therefore decided to leave decryption to the machine of the end user of the data, where it is handled by the CryptStore access tool.

The most important concept of CryptStore is the generic interface to the access control service. The motivation for this approach is to keep the access permissions on files consistent with the access permissions on the keys that decrypt those files. We therefore chose not to add a second access control layer. Interfacing with the dedicated access control service allows us to make access decisions that are consistent with the access decisions for files on the Grid. This approach requires file owners to also be the source of authority (SOA) for every access control decision concerning their files. If, as in the case of VOMS, the administrators of the storage sites are the SOAs for file access, our approach does not improve security. Since owner-controlled access is one of the requirements we argue for in this thesis, this constraint is compatible with our general approach.

1.6 Sygn and CryptStore Integrated into a Grid

In this section we present our work on integrating our Sygn and CryptStore systems into a real Grid computing environment. As examples we chose two Grid architectures: µgrid, a minimal Grid architecture developed by Johan Montagnat and Diane Lingrant [88], and Globus Toolkit version 4, which offers standardized OGSA and WSRF services.

1.6.1 µgrid

The µgrid architecture was created as a minimal Grid architecture for testing scientific applications on a Grid.
The basic idea of µgrid is to be simple to install, configure and administer, which is not the case for the Grid architectures used at large scale for production applications. The µgrid architecture consists of three components. The user client allows users to access the Grid. The farm manager is the entry point to the Grid; it groups resources and manages job scheduling, resource allocation and data distribution. The computers that provide resources to the Grid are controlled by the third component, the host manager, which handles the execution of computations and the storage of data. All communication between the components takes place over sockets, using a client/server architecture. µgrid thus enables transparent resource sharing while remaining simple to use.

As far as files are concerned, µgrid can copy files from a local disk to the Grid and vice versa, replicate a file on the Grid and delete a file on the Grid. A C++ API makes these file manipulation commands available to software running on the Grid.

Authentication is implemented using OpenSSL and a public key infrastructure (PKI). Every user, every farm and every host has its own certificate, which allows mutual authentication. The current version of µgrid assumes that there is a single certification authority for the entire Grid.

In its current version, µgrid does not scale well: the farm manager is quickly overloaded if it is given too many resources to manage. For this reason, the authors of µgrid plan to add a layer of servers above the farm managers, which will also serve as new access points to the Grid.
1.6.2 The OGSA and WSRF Standards

The Open Grid Services Architecture (OGSA) is a standard developed by the Global Grid Forum (GGF). OGSA is intended as a common architecture for applications running on a Grid. OGSA requires a distributed computing service architecture in order to be implemented. This architecture is realized with web services. Web services use the WSDL language to describe and publish the interfaces they expose on a network. The communication protocol generally used is SOAP. The W3C has defined web services as stateless. Pure web services therefore do not satisfy the specifications of the OGSA standard, which calls for services that can manage state information. For this reason, the OASIS consortium developed the Web Services Resource Framework (WSRF). WSRF specifies how web services can be augmented with state information.

The OGSA standard addresses access control only very briefly (see [76]), stating that each domain will generally have its own authorization service and that the authorization model of a Grid will therefore have to be based on emerging standards such as XACML, SAML and WS-Authorization in order to guarantee interoperability.

1.6.3 Integrating Sygn into a Grid

To make Sygn independent of the Grid architecture on which it is used, we chose the following approach: an integration module is responsible for enforcing Sygn's decisions on users. This module acts as an agent between the user and the resource. For each request on the Grid, the user generates a Sygn request that authorizes the Grid request. The integration module receives the requests, separates them and submits the Sygn request to Sygn's decision algorithm.
If the answer is positive, the integration module verifies that the Sygn request really comes from the same user who submitted the Grid request. To do so, the integration module has to interact with the authentication mechanisms of the Grid to obtain the public key of the user. The integration module also verifies that the resources concerned and the actions requested on them match between the Grid request and the Sygn request. If these checks succeed, the integration module forwards the request to the resource for processing.

Such an integration module for Sygn and the µgrid architecture was implemented by Didier Oriol during his final-year project at INSA Lyon. This module allows Sygn to be used for file access control in µgrid, with the functions described above.

To integrate Sygn into a standardized OGSA Grid, one must ask whether Sygn needs a web service interface. Since Sygn is designed to be co-located with the resources it controls, it seems possible to have the resources interact locally with the integration module instead of implementing a web service interface. Should such an interface prove necessary, however, implementing it would pose no problem. The Sygn decision engine is stateless and could therefore be implemented as a simple web service, without having to take WSRF extensions into account. Since all communication with Sygn is already encoded in XML, it would only be necessary to define a WSDL description of the interfaces and to generate the code for communication over SOAP. Many tools exist for generating this code from a WSDL description, for example the gSOAP web service generation tool (see footnote 4).
We plan to carry out such an integration in future work.

1.6.4 Integrating CryptStore into a Grid

To make CryptStore available on a standardized OGSA Grid, the key servers must be given a Grid service interface. Since the key servers are stateless, they can be implemented as simple web services. Requests to and responses from the key server are already encoded in XML, so it would only be necessary to write a WSDL description of the interfaces and generate the code for communication over SOAP from it. This can be done as described in the previous section.

In addition, the generic interface of the key servers to the access control service must be instantiated. We created such an instantiation in order to use Sygn together with CryptStore. In this approach, a Sygn server can be co-located with the key server. The Sygn server stores the sources of authority for the encrypted files for which the key server stores key shares. Using this information, the Sygn server can make access control decisions for key shares without having to consult an external service. The file administration and access tools encapsulate a Sygn request in every CryptStore request, according to the actions the user wants to initiate. Here, CryptStore's interface to the access control service performs the functions of the integration module discussed in Section 1.6.3.

A working version of CryptStore has been implemented with an interface to Sygn access control. This software can be downloaded from http://liris.cnrs.fr/ lseitz.

1.7 Conclusion

In this thesis, we examined the use of Grids for medical applications from the perspective of data security.
We showed that classical solutions are not all directly usable because of the specific characteristics of Grids. Since the central problem of medical applications is data confidentiality, we chose to examine access control.

4 Available at http://www.cs.fsu.edu/~engelen/soap.html

Drawing on a set of use cases related to the deployment of medical applications on Grids, we presented a list of requirements and constraints based on sound security principles, on the nature of the Grid architecture and on the specifics of medical applications. The most important points are decentralized administration, traceability and encrypted data storage. Encrypted storage is needed to complement access control because, without data encryption, access control can be circumvented by users with physical access to the storage hardware. Conversely, encrypted storage forces access to the file to go through access control.

In light of these considerations, we examined the state of the art in distributed access control and access control in Grids. We found that none of the systems meets all of our requirements, even if the need for encrypted storage is set aside.

We then examined the state of the art in encrypted storage. Our main interest was to analyze how key sharing is handled when the keys encrypt data that is updated often and shared by groups of users whose membership changes frequently. We found that the encrypted storage systems that support key sharing do not provide satisfactory support for dynamic groups.
Moreover, most of these systems implement a specific mechanism for controlling access to keys, thereby creating a redundant layer of file access control that can lead to inconsistencies.

Our first contribution, the Sygn access control system, is designed for the decentralized management of permissions. To this end, Sygn implements a decentralized concept of roles and file collections, based solely on authorization certificates. Decentralized permission management is also supported by delegation mechanisms based on authorization certificate paths. This decentralization also led us to minimize the access control information that has to be present at the decision points. Most of the information needed for access control decisions is supplied by the users requesting access, who present certificate paths that authorize it. The decision points only need to know the sources of authority for each resource they control. This decentralization helps the system scale, but above all it reduces the impact of a successful attack on an access control server, since only the local resources are exposed.

Sygn also offers built-in logging functions and can be configured to guarantee non-repudiation of requests, which can then be used as evidence in an audit. By integrating traceability into access control, Sygn makes it easy to implement both functions. Access control is an ideal point for collecting tracing information, since all requests in a system have to pass through it.

Our second contribution, CryptStore, complements access control by protecting data against circumvention of the access control system.
CryptStore allows users to store their data in encrypted form and to share the decryption keys with authorized users. Since the decryption key is needed to access encrypted files, a user who accesses the storage medium directly cannot learn the contents of the files. To keep the permissions on files consistent with those on the keys used to decrypt them, CryptStore uses the access control mechanisms of the Grid to decide which users get access to a key. For this purpose CryptStore has a generic interface that can be adapted to the access control system present on the Grid. Since the keys themselves are valuable data, no key server stores a complete copy of a key. Keys are split into shares generated by a secret sharing algorithm and distributed over several key servers. Because redundant key shares can be created, CryptStore is robust against the failure of one or more key servers.

As future work on Sygn, we plan to integrate access control mechanisms for databases. This would make it possible to expose a database on the Grid while controlling access to the data it contains, independently of the architecture of the database used. Along the same lines, we also plan an extension of Sygn for access control on the elements of an XML document.

Another interesting question we will examine is the legal implications of Grids for the processing of personal data (cf. Chapter 3.5). For this question, we will cooperate with legal experts to determine the legal conditions of use and to validate that our technical solutions meet these conditions.
Chapter 2 Introduction

Resource sharing has always been a central issue in computer science. At first, crude time-sharing protocols were used: a user had to reserve computation time on a central machine and submit his program, written on punch cards, to the operators of the computer. Then the Internet was created as an architecture to share data resources, which is still its main use today. However, the use of the Internet is not limited to sharing data. Other resources such as computing power and storage space can also be shared using Internet technologies.

A problem that is often encountered when trying to share resources over the Internet is that the heterogeneous systems used to exploit the resources are unable to interoperate. Even if the same operating system and application software are installed, minor differences in the configuration can cause major problems when one tries to use distant resources. Often painstaking manual configuration and detailed knowledge of the specifics of the distant resources are required to make them work.

Grid computing [48] offers a framework that facilitates the sharing of resources and that aims to overcome interoperability problems. Grids provide a common resource sharing platform that handles the discovery, allocation and use of resources for the user in a transparent way. Grids support the sharing of resources such as sensors, computing power, storage capacity and data.

The first Grid applications were compute-intensive, such as particle physics and terrestrial observation. In these applications computing power and storage capacity are of paramount importance. Security aspects, especially those related to data security, are of lesser concern. As Grids move on to biomedical research projects, such as comparative genetics, data security aspects have gained more importance. Recently Grids have been identified as a possible architecture to support health-care networks [70].
In health-care applications, in addition to its role as a provider of raw processing power, the Grid allows data resources of various formats to be shared across organizational boundaries.

As more and more resources are exposed to the Internet, a major problem is to protect these resources against unauthorized use, while allowing authorized users to access them even when connecting from a distant site. The most important resources that need to be protected are clearly data. Data are the main asset of most applications and their misuse has the most dire consequences. This is especially true for personal medical data. Uncontrolled disclosure of medical data can make it impossible for the person concerned to find employment or even to obtain medical insurance.

The medical community has seen the advantages but also the risks of information technology for health care [82]. The use of Grids in health care could improve treatments in several ways, for example through a significant speed-up in complex image analysis, or through the possibility of making all medical data available, even when they are stored at geographically distant sites on heterogeneous systems. However, we also have to guarantee to users that adequate privacy protection is ensured, or this new technology will never be accepted.

This thesis provides an architecture that controls access to resources in a Grid environment, with a special focus on the protection of data resources. The framework for this research is provided by the application of Grids to the creation of health-care networks and the specific data protection requirements that arise from these applications.

2.1 Security aspects of resource sharing on a Grid

The full spectrum of security-related issues is applicable to Grids in general and to the use of Grids as a framework for health-care networks. The issues that need to be addressed include:

• Authentication of entities using the Grid.
• Authorization of actions performed on the Grid, especially access to resources on the Grid.
• Confidentiality of communications within the Grid.
• Confidentiality of data stored on the Grid.
• Integrity of data stored on the Grid.
• Auditing, accounting and non-repudiability.
• Intrusion detection and other passive defense methods against attackers.
• Robustness against errors, break-downs and malicious actions, such as denial-of-service attacks.

Authentication deals with the ways of proving, possibly mutually, the identity of communicating entities. Authentication raises interesting technological problems in Grids related to the requirement of single-sign-on, which states that a user should only have to authenticate once when using a Grid. Most implementations of Grid architectures favor certificate-based authentication using a public key infrastructure (PKI).

Authorization deals with deciding who is allowed to use which resource and in what way. The term access control refers to all methods that enforce authorization decisions.

Confidentiality is the protection of data on an insecure medium against accidental or malicious disclosure to third parties. We have to differentiate between confidentiality of communications (i.e. confidentiality of data being transmitted over a network) and confidentiality of storage (i.e. confidentiality of data on a storage medium). The main difference is the lifetime of the data, which is very short in communication compared to storage.

Integrity of data refers to the protection of data against unauthorized modification. As full integrity protection of data on re-writable media is not possible, most algorithms aim at making violations of integrity detectable to users. Integrity checking is a secondary topic of this work and is therefore only given limited consideration.
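The idea of making integrity violations detectable, rather than preventing them, can be sketched with a cryptographic digest. This is only an illustration, not the mechanism used later in the thesis: it assumes that the reference digest is kept somewhere an attacker cannot modify (for example with the data owner), and the function names `digest` and `is_unmodified` are hypothetical.

```python
import hashlib
import hmac


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(data: bytes, expected_digest: str) -> bool:
    """Check the data against a reference digest recorded earlier.

    Assumption: `expected_digest` comes from a trusted location; if the
    attacker can rewrite it as well, the check detects nothing.
    """
    return hmac.compare_digest(digest(data), expected_digest)


if __name__ == "__main__":
    original = b"radiological image bytes"
    reference = digest(original)                    # recorded when the file is stored
    print(is_unmodified(original, reference))       # unchanged replica
    print(is_unmodified(original + b"x", reference))  # modified replica
```

A plain digest only detects modification when the reference value itself is protected; binding it to the owner with a digital signature would be the usual next step.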
Auditing systems collect data that make it possible to review the actions of all entities concerning the Grid's resources. Auditing can be valuable for post-mortem analysis in a case of suspected or actual misuse of resources. Auditing is closely related to accounting, which uses auditing systems for measuring and possibly invoicing the use of resources. The third element of this group, non-repudiation, is a more restricted form of auditing, in which the audit data prevent an entity from denying actions it has undertaken. Non-repudiation uses techniques such as digital signatures in order to bind actions to the users who initiate them.

The term intrusion detection refers to all activities aimed at the early discovery of unauthorized access to resources. The goal of intrusion detection is to limit the damage done by a successful attack by reducing the time an attacker remains undetected. Among other things, intrusion detection uses integrity protection techniques in order to achieve its goals.

Robustness of systems, especially against malicious actions, has become a general requirement of networked services. Measures for ensuring robustness include redundancy of critical systems and data, and cross-checking of critical information. It is therefore an aspect that has to be considered in all the other topics above.

In this thesis we mainly address authorization and access control to resources, as we believe that these are scientifically the most challenging parts of Grid security. We also show how confidentiality of storage is closely linked to data access control, as a way to protect data against the circumvention of the access control mechanisms. As it is practical and very easy to integrate with confidentiality of storage, data integrity protection is also given limited consideration in this context. Auditing and non-repudiation are also well suited to be integrated with authorization services.
This is due to the fact that all actions must pass through the authorization system, and it is therefore useful to co-locate authorization and auditing.

Authentication is not an issue studied in this thesis. We assume that a working PKI is available that allows easy and secure authentication of the Grid users. These authentication mechanisms are a necessary prerequisite for authorization mechanisms.

The confidentiality of communications and intrusion detection are not issues for which Grids pose novel security challenges. As Grids use normal network communication mechanisms, existing protocols such as TLS/SSL or IPSec can be used to ensure the confidentiality of communications. Intrusion detection is a task related to a closed system and is therefore not applicable to Grids as a whole. In the closed components of a Grid, conventional intrusion detection measures can be used.

Robustness is not specifically addressed in this work, but rather considered as a requirement for every aspect of the other approaches.

2.2 Why Grids pose novel security challenges

A large amount of previous work exists in the field of computer security. Some robust and widely tested standards have been created, for which numerous support tools exist. We therefore have to consider how many of these tools we can re-use, which ones we can adapt to make them usable on Grids, and where we have to develop new ones. In order to do this we have to examine the specifics of Grid computing architectures and applications.

Grids are generally used by large, dynamic cross-organizational communities. In contrast to classical user communities, these are not centrally administered and therefore the use of centralized authentication mechanisms is not possible. Currently public key infrastructures (PKI) are being investigated as a method to provide cross-organizational authentication. However, many organizational and technical problems still remain [55].
Resources offered on a Grid are subject to dynamic changes that cannot be centrally predicted. For example, clusters assigned to a Grid may be taken offline for maintenance or simply in order to use them for non-Grid activities. These resources are geographically and organizationally distributed and consist of heterogeneous hardware, software and data formats. Therefore it is infeasible to centrally handle the authorizations that allow access to these resources. Furthermore, a common format for communicating authorization and authentication information is needed that works for all of these resources and that can be adapted to follow the dynamic changes of availability.

Several important aspects of Grid security come from the deployment of applications over a decentralized architecture. First, there is no central point of access to a Grid, which requires a decentralized mechanism for authentication and authorization. The entry point of a user rarely corresponds to the point where access to the used resources is controlled. Therefore mechanisms have to be established that allow users to transfer authorizations from their Grid entry point to the resource that is to be used.

Grid resources are subject to different, sometimes overlapping security policies that need to be combined to make them work together. Therefore the security architecture must provide mechanisms to resolve conflicts between security policies and must allow local administrators to apply their policy to their resources.

A special problem is posed by data: because storage is transparent, files may be stored outside the owner's home domain. Nevertheless, data owners want to be able to control access to their data. This requires fine-grained access control that allows users who are not previously known on a system to specify permissions on files that have been stored on this system.

Using Grids for biomedical applications poses further application-specific security problems.
These problems will be discussed in chapter 3.

2.3 Outline

The remaining part of this thesis is organized as follows. Chapter 3 presents use-cases and uses them to motivate the design goals of our approaches. In chapter 4 we present related work in the domain of access control. Chapter 5 explores the related work in the domain of encrypted storage. We present our access control architecture Sygn in chapter 6. In chapter 7 we present our encrypted storage architecture CryptStore. Chapter 8 analyzes how both architectures can be integrated in Grids, and finally chapter 9 draws conclusions and presents future work.

Chapter 3 Motivation

In this chapter we examine the resource sharing scenario presented in the introduction in relation to security aspects. We present some realistic use-cases and threats within this scenario and use them to derive constraints and requirements for an effective and secure approach to access control for confidential data in a Grid environment. These requirements are divided into three thematic groups: general principles of good security, specifics of the Grid environment and specifics of applications dealing with confidential data.

3.1 Use-Cases

1. A medical doctor treats several patients and therefore has access to their medical files, including radiological images stored at a distant clinical database. The doctor wants to use Grid resources to perform a computation-intensive image analysis on a specific radiological picture of one of those patients. To do this, the doctor uses a Grid interface to launch an application that performs the necessary operations on his behalf, using his access rights to download the picture from the clinical database.

In order to speed up the processing, the picture is replicated on Grid storage resources, near the processing units. The computation process accesses this replica using the doctor's permissions.
As the doctor is legally liable for the confidentiality of the data he uses, he wants to have to trust as few entities as possible in the data processing.

During the processing of the picture, a Grid storage resource on which a replica of the picture is stored breaks down. It is disconnected from the Grid for maintenance and brought online again several weeks later. At this point in time, the patient whose picture is still on this storage resource has changed his doctor, and the former doctor should no longer have access to the picture replica.

Since the patient suspects a misuse of his medical files by this former doctor, he wants detailed information on how this doctor used his files.

2. A clinic employs several medical doctors, who are each responsible for several patients. These doctors also work at the clinic's research laboratory, which cooperates with several other research centers on studies of genetically induced diseases. The clinic wants to enforce different permissions for the same doctor depending on the task he or she is currently executing, in order to prevent accidental disclosure of confidential patient data within the research department.

In order to effectively share resources within cooperative research projects, the research centers have installed a Grid architecture and formed virtual organizations (VO) to manage the Grid resources allocated to each project. Each VO distributes permissions related to its resources using a server hosted by one of the participating centers.

The clinic employs a significant number of trainees and assigns some of them to a workpackage of one of the common projects. These trainees change frequently and require a standard set of access rights to the resources of the VO. The tasks assigned to the trainees are not subject to major changes and therefore the permissions related to those tasks remain relatively similar.
The clinic wants to administer these permissions in an effective and simple way.

The centers providing computing and storage resources to the VO have agreed to provide those resources via the Grid, on condition that they have the final administrative power to decide who may access them, and especially that they are able to deny access to troublemakers.

As the project evolves during the lifetime of the VO, more personnel become involved and the resources administered by the VO are considerably expanded.

3. A tourist falls ill during his holidays. He goes to a local health-care center and wants to give some local doctors access to a part of his medical files so they can gather the necessary background information for effective treatment.

The local health-care center has a medical information system that is incompatible with the systems on which the tourist's data are stored. However, it is connected to an international health-care network based on a Grid architecture and shares medical data using this network. This architecture should allow the tourist to give the doctors at the center authorization to access his files through the Grid. This authorization should be effective immediately.

The center employs a small software company as a subcontractor to maintain its medical application software. The employees of this company have administrator access to the center's hard disks used as Grid storage resources. However, the health-care center does not want them to be able to read the patient data on those disks.

3.2 General principles of good security

This section presents the requirements we found for our application with respect to some general principles of good security. We motivate each requirement in the context of the use-cases we have presented before.
• S1 Least privilege: Since a malicious or faulty application that executes actions on behalf of a user could misuse his permissions, it should be possible to control the extent of the permissions that are used for a specific action. In the first use-case, if the process that performs the analysis on the radiological picture had the medical doctor's full permissions, it could access other patients' data and breach their confidentiality. Grid users will most likely not have the technical knowledge or the time to check whether the software applications they use have hidden malicious functionality. Therefore this requirement can help to reduce the damage done by such an application.

• S2 Permission consistency: A permission on a data resource should be applicable to any replica of these data. While identical copies of the same data may carry different permissions, depending on the context in which they are used, replicas are created automatically by the Grid middleware in order to speed up access for applications using these data. For such replicas the access control system should not produce inconsistent access control decisions, where the same permissions give access to one replica of a file and do not give access to another replica of the same file. In the first use-case (second paragraph), the new replicas of the picture should be accessible to the process that is to perform the computation on them. The process should not need any new permissions to access this replica, since those would require external intervention. Furthermore, the replicas of the picture should not be accessible to any user who does not have the permission to access the original picture.

• S3 Minimize the use of (trusted) third parties: In any distributed computing scenario, there are typically two actors: the one who requests an operation and the one who executes it.
For every additional actor introduced in between those two, the risk of something going wrong increases. Creating a trustworthy third party in a security-critical protocol requires a lot of effort and may often not be worthwhile. It is therefore best to avoid the necessity for trusted third parties whenever possible in the design of a security protocol. In our first use-case (third paragraph), the doctor will be more reluctant to use the Grid if he or she has to trust several third parties (for example a centralized permission authority).

• S4 Separation of duties: Users tend to have multiple permissions that are not necessarily all related to the same task. Unexpected combinations of permissions may enable users to commit fraudulent or erroneous actions that they should not be able to perform. However, each of those permissions by itself may be necessary for some task assigned to the user. Therefore an access control system should make it possible to restrict which permissions may be used simultaneously. In our second use-case (first paragraph), a doctor may accidentally copy confidential patient data to a publicly accessible server at the research laboratory. Separation of duties could help to avoid this, for example by preventing doctors from accessing confidential patient data while using publicly available resources of the research laboratory.

• S5 Secure permission storage: Permissions need to be highly accessible. The more permissions are stored together, the higher the value of such storage sites as targets for attacks. If an access control system has no way to verify the integrity of the permissions it uses, it becomes vulnerable to undetected modifications that attackers may have made. In our second use-case (second paragraph), the cooperating centers forming the VO for the common project need a way to protect the permissions related to resources allocated to the VO; otherwise the permission distribution server becomes a considerable security risk.

• S6 Avoid centralized security services: Centralized services scale badly,
• S6 Avoid centralized security services: Centralized services scale badly, 3.3. CONSTRAINTS OF THE GRID ENVIRONMENT 41 because they often become a bottleneck and a single point of failure, when the workload increases. Moreover they represent a security vulnerability, since attackers can disrupt the system by targeting those centralized services with denial of service attacks. Also centralized security services often hinder decentralized resource sharing. In our third use-case (first paragraph), the tourist might have trouble granting the foreign doctor access to his medical files if this would involve a centralized access control service, since the service may be down or have a long response time. 3.3 Constraints of the Grid environment This section presents the requirements for our system with regard to the specifics of the Grid environment. As before we motivate each requirement in the context of the use-cases we have presented in section 3.1. • G1 Handle ad-hoc permissions within dynamic user communities: In Grid environments resources are shared by cross-organizational communities that form virtual organizations (VO). These VO’s can have a high fluctuation in members and short term cooperations can require spontaneous access to some resources. The Grid access control system must be able to handle these situations for example by supporting flexible delegation mechanisms. In our second use-case (third paragraph) the clinic’s trainees work on short term projects related to the global goal of the VO. Frequently these trainees require an access to the Grid resources on short notice, in order to perform tests or calculations. This will require permissions to be created or modified. Such permission management should not be hindered by complex processes for the creation of new users or the need for intensive administrator intervention. The access control service should be able to handle single users and resources as well as groups of users and sets of resources. 
• G2 Manage dynamic resource availability: In a Grid environment, resources are subject to dynamic availability. Services may break down at any time, or the connection to those services may be interrupted. The Grid access control system must handle such outages gracefully, even if a component of the access control system itself becomes unavailable. In our first use-case (paragraph four), this means that the data access control should not rely solely on locally stored permissions, since those may become invalid during an offline period.

• G3 Integrate heterogeneous environments: A Grid is intended to bring together resources originating from computers using different operating systems and application software configurations. These resources are transparently available through the Grid regardless of the underlying system. Therefore the Grid access control should require a minimum of specific application software to be deployed on the Grid elements and should provide a maximum of openness. In our third use-case (second paragraph), the health-care center should still be able to gain access to the patients' files, even though it operates a medical information system that is not compatible with the one where the patients' data are stored.

• G4 Enable local control of hardware resources: Most system administrators would not agree to provide local hardware resources on a Grid if that meant giving up their administrative power over those devices. Therefore a Grid access control system must enable resource providers to control the access to their resources (i.e. when the resource may be used, in which way and by whom). This need is illustrated in use-case two (paragraph four).

• G5 Transparency of the data resource location: Since the storage location of files is transparent for the Grid user, data access control permissions should not depend on the storage location of the data either.
This requirement is somewhat parallel to requirement S2 and therefore concerns the same use-case; however, whereas in S2 the emphasis is on replicas, here it is on the storage location of the data.

• G6 Scalability of the access control system: A Grid is designed to provide a huge amount of resources to a vast community of users. Solutions that work well on a small scale may fail when used at large scale. Therefore the Grid access control system needs to be applicable even in intensive usage scenarios involving large numbers of resources and users. Generally this implies that it should not rely on centralized services (see requirement S6) and that the system can be expanded easily by adding decentralized components. Use-case two (last paragraph) illustrates that the level of required scalability cannot always be estimated correctly in advance and that the system must provide enough flexibility to deal with such situations.

3.4 Constraints of the application

Since the predominant application examples in this thesis are Grid-based health-care networks and Grid-based medical research, we examine the specific requirements of these applications in the present section. Several of our conclusions apply more generally to applications that require sharing of confidential data on a Grid or on another distributed architecture. Again we take up the use-cases from section 3.1 to illustrate our points.

• A1 Role based access control: A user working on a medical application has structured tasks and requires permissions related specifically to those tasks. Furthermore, permissions may have a hierarchic structure, where users higher up in the hierarchy inherit all permissions of those that are lower (e.g. clinic's leading medical doctor > station's medical doctor > nurse). In addition, we have shown the necessity of separation of duties in requirement S4. Role based access control is ideal to respond to these requirements.
It is presented in detail in section 4.2. In our second use-case (paragraph one), the permissions for the clinic's staff could be managed very effectively using roles.

• A2 Traceability: Medical applications are legally required to provide logs that make it possible to trace every use of confidential patient data. Therefore mechanisms to ensure non-repudiable tracing of all access attempts have to be provided. Since all access passes through the access control systems, they are ideally suited to host such a tracing service. In our first use-case (last paragraph), such logs could be important evidence in case of a trial.

• A3 Data access control by owners and delegation: When Grids are used to share large amounts of data, access permissions concerning specific data need to be updated frequently (see requirement G1). An access control system that requires administrator intervention for each permission change would be a major hindrance to effective data sharing. Furthermore, owners of sensitive data would be reluctant to have the data access rights administered by somebody else. Therefore the access control system needs to support fine-grained access decisions, and the owners of the data need to be the source of authority for access control decisions. Decentralized delegation mechanisms must also be available to enable the owners of data to administer the access permissions. In our third use-case (first paragraph), the tourist should be able to grant access to his medical data without external administrative intervention.

• A4 Protection against circumventing access control: Data stored on physical devices are at risk of being disclosed by persons having administrator access to these devices. Furthermore, an attacker can easily gain administrator access to a device if he has physical access to it. Therefore measures must be taken to protect confidential data stored on the Grid.
In our third use-case (paragraph three), the employees of the subcontractor should be able to perform their maintenance, and therefore need administrator access to the storage media. However, they should not have access to the confidential patient data stored on those media.

3.5 Legal issues dealing with medical data

In this section we give a short overview of the laws governing private data in general and, more specifically, private health data. Readers should be aware that this overview is necessarily incomplete from a legal point of view. We have concentrated on laws of the European Union, using as our principal source of information the web pages of the European Union itself [41].

3.5.1 European laws concerning privacy protection

Within Europe, the root of all legal actions for the protection of personal privacy is Article 8 of the European Convention for the Protection of Human Rights and Fundamental Freedoms, signed in Rome on November 4th 1950 [29]. This article lays down the principles of personal privacy protection and the need for a lawful basis for any interference with this personal privacy.

The Treaty on the European Union declares in Article F of the Common Provisions that these rights shall be respected within Community law [44]. Furthermore, the rights of privacy protection and of the protection of personal data have been laid down in the Charter of Fundamental Rights of the European Union on December 7th 2000 [43].

In order to formalize these goals and to achieve legal harmonization in all EU member states, the European Union issued the directive 95/46 EC (directive on the protection of individuals with regard to the processing of personal data and on the free movement of such data) on the 24th of October 1995 [42].

The directive begins with various definitions, of which the most important are personal data, controller and processor in Article 1:

(a) 'personal data' shall mean any information relating to an identified or identifiable natural person ...
...
(d) 'controller' shall mean the natural or legal person, ... which alone or jointly with others determines the purposes and means of the processing of personal data; ...
(e) 'processor' shall mean a natural or legal person, ... which processes personal data on behalf of the controller.

The definition of personal data implies that data that have been superficially anonymized are to be considered as personal data if it is possible to infer the identity of the person they concern using secondary sources. In Grid environments the controllers of some piece of data should be well defined as the owners of the data (see also requirement A3 of the previous section). However, it is the definition of the processors that is difficult to handle, since in a Grid these processors would be entities providing Grid resources such as storage space and/or processing power. Transparent resource sharing as used within Grids makes the identification of processors for a specific Grid job difficult, adding requirements to auditing and accounting systems.

The directive explicitly mentions health data in Article 8:

1. Member States shall prohibit the processing of personal data ... concerning health ...
2. Paragraph 1 shall not apply where: (a) the data subject has given his explicit consent to the processing of those data, ...
3. Paragraph 1 shall not apply where processing of the data is required for the purposes of preventive medicine, medical diagnosis, the provision of care or treatment or the management of health-care services, ...

Readers should note that this does not explicitly include medical research; therefore the patient's explicit consent has to be obtained in order to use medical data for research purposes.
The directive gives the data subject several rights in articles 10–15, including the right of access to his data and the right of rectification, blocking or erasure of incomplete or inaccurate data. The most important part for Grids when dealing with personal information is Article 17, dealing with the security of processing:

1. Member States shall provide that the controller must implement appropriate technical and organizational measures to protect personal data against loss, alteration, unauthorized disclosure or access . . . Having regard to the state of the art and the cost of their implementation, such measures shall ensure a level of security appropriate to the risks represented by the processing and the nature of the data to be protected.
2. The Member States shall provide that the controller, where processing is carried out on its behalf, chooses a processor providing sufficient guarantees in respect of the technical security measures ... and must ensure compliance with those measures.
3. The carrying out of a processing by way of a processor must be governed by a contract or legal act binding the processor to the controller . . .

These regulations have several implications for the processing of medical data using a Grid. The most constraining is surely the obligation to have a contract between processor and controller. To make a Grid workable under such constraints, we see two possibilities:

• Resource providers wanting to make their resources accessible to medical applications make prior legal contracts, which regulate their role as processor. To allow more flexibility these contracts could be made with a medical resource broker, who can then make contracts with the home organizations of users treating medical data on a Grid.
• The other possibility would be ad-hoc contracts concluded over the Internet, similar to many e-commerce applications.
The legal implications as well as the security requirements for such a contracting system are out of the scope of this thesis.

The other obligations stated in this article also deserve some more thought. What are the risks represented by processing medical data and what is the nature of the data to be protected? We believe that the risks when dealing with medical data in general are very high. Disclosure or falsification of such data could make it difficult for the person concerned to obtain employment or even medical insurance. These problems are intensified by the use of a Grid. The transparent resource sharing and the fact that large communities of users have access to Grid infrastructures make the task of privacy protection very difficult. Therefore the best possible technical measures have to be taken to protect the data.

For more detailed considerations on the need for confidentiality in healthcare when using information technology, see [82]. For an in-depth analysis of the requirements of directive 95/46 EC with respect to a medical Grid, see [58].

3.5.2 French Law concerning privacy protection

All directives of the European Union have to be implemented in national law within a certain period of time. For directive 95/46 EC this period was set to three years from the date of its adoption (Article 32). Therefore the member states have implemented new laws or modified existing ones to respond to the requirements of the directive. As an example we briefly examine the French implementation of directive 95/46 EC. It is codified in the law n◦ 2004-801 of the 6th of August 2004, which modifies the law n◦ 78-17 of the 6th of January 1978 [53]. We refer to the articles within the modified version of law n◦ 78-17 in the rest of this section.

The law starts by taking over the definitions of Article 2 of the EU directive.
However, it specifically states that this law is not applicable to personal data that are temporarily stored to speed up access, for example in a cache (Art. 4). The law also specifies that personal data may be re-used beyond its original purpose for scientific research, if certain provisions concerning the rights of the data subject are respected (Art. 6 n◦ 2). This clearly goes further than the EU directive, which does not specifically mention the use of personal data for scientific research.

Chapter III (Articles 11-21) of this law establishes the National Commission for Liberties and Informatics (CNIL). The goals of the CNIL are to inform the public about the rights and obligations with respect to the processing of private data and to supervise that entities processing private data respect the provisions of this law.

The law also considers the implications of anonymization of personal data in Article 8, paragraph III. It requires anonymization procedures to be authorized by the CNIL as conforming with this law. The CNIL decides on a case-by-case basis whether the anonymized data can be used for applications such as medical research.

Specific attention is also given to the transfer of personal data to States outside the European Union (Chapter XII). The bottom line of these regulations is that the controller of the data has to make sure that a sufficient level of privacy protection exists in the target state. This would allow international health-Grid applications, provided that all states in which Grid resources are located have sufficient privacy protection laws.

In conclusion one can say that the legal implications of using Grids for the processing of medical data are not yet fully explored. Many pitfalls exist, and even though laws are harmonized throughout the European Union, differences in legal details can greatly influence the feasibility of medical Grids from state to state.
However, most states have displayed a high level of interest in the use of Grids for medical applications. Therefore it can be expected that the governments will not let medical Grids fail on the basis of legal details. Nevertheless, the strict regulations concerning privacy protection require that all security aspects of medical Grids are treated with the greatest care from a technical point of view.

This thesis contributes to the effort of making Grids secure for medical use and aims to satisfy legal requirements about technical measures to secure private data. Since the legal requirements are quite broad and do not address specific technical details of the security measures for the protection of private data, no direct impact on the contributions of this thesis could be derived. We do however note that our proposal does not contradict the limitations and requirements of the law dealing with privacy protection.

Chapter 4

Related Work in Access Control

In this chapter we address the connections between our work and the state of the art. We discuss general access control models, frameworks and standards, and actual implementations of access control systems.

4.1 Terminology

In this section we introduce some vocabulary that we will use to describe the different concepts in this thesis.

Definition 1 The natural persons acting on a Grid architecture are referred to as users. To specify a user, a user group or a process acting on behalf of a user we use the term entity.

Definition 2 Data, storage space and computing power shared within a Grid architecture are referred to as resources. Storage space and computing power are also called hardware resources. Resources are provided on a Grid architecture by resource servers.

Definition 3 To refer to an entity that is given an authorization, we use the term authorization subject or subject for short. To specify in authorizations what subjects can do on resources (e.g. read and write for data resources) we use the term actions.
Resources within an authorization are referred to as authorization objects or objects for short.

Definition 4 For each resource, an entity is identified as the resource's source of authority (SOA). The SOA has the authority to issue and delegate authorizations that allow specific actions on the resource.

4.2 Access Control Models

In access control three general models are recognized:

• Discretionary Access Control (DAC)
• Mandatory Access Control (MAC)
• Role Based Access Control (RBAC)

In the following we give a short description of these access control models and evaluate their advantages and disadvantages with respect to our objectives. See [86] or [84] for a more detailed presentation of access control models. We then briefly present current directions in access control and examine their relevance for our objectives.

4.2.1 Discretionary Access Control

In DAC all permissions can be represented by an access matrix, where each row of the matrix corresponds to a user and each column to a resource. The content of each cell of the matrix is the set of actions that the user specified by the row is allowed to perform on the resource specified by the column. This concept was first proposed in 1974 by Lampson [64], then refined by Graham and Denning in [54] and formalized by Harrison, Ruzzo and Ullman in [57].

Since the complete representation of the matrix is not feasible in systems handling large numbers of users and resources, several ways of representing the non-empty cells of the matrix have been proposed:

• Access Control Lists (ACL) correspond to storing the matrix by column. Each resource is associated with a list containing the actions the various users may exercise on the resource.
• Capabilities correspond to storing the matrix by row. Each user is associated with a list containing the actions he or she may perform on the various resources.
• Finally Access Control Relations store the non-empty cells of the matrix as three-tuples (user, resource, action) in a table.

The advantage of DAC with respect to our objectives is that it permits fine-grained access control and easy ad-hoc permission granting when combined with Authorization Certificates encoding tuples of Access Control Relations. An Authorization Certificate allowing ad-hoc access to a resource can be created and issued on demand by the resource's SOA and transferred directly to the user specified in the relation. The user can present this certificate to a resource site as proof of his permission and thus gain access to the resource. Fine-grained access control permissions can be specified this way, provided that fine-grained resource identifiers exist on the Grid architecture.

The disadvantage of DAC is that it can be cumbersome to manage when permissions are assigned to users based on their tasks. This problem becomes even more obvious when users are reassigned to new tasks, since in a DAC model this would mean revoking every single permission related to the old tasks and reassigning every single permission related to the new tasks.

4.2.2 Mandatory Access Control

MAC typically deals with data resources. All resources are assigned a label specifying a classification (typically security levels such as top secret, secret, confidential, unclassified), which is stored as meta-data for the resource. Users are assigned clearances within this classification. Based on their clearance users are allowed to:

• read all the data resources that are of the same or lower level.
• write to all the data resources that are of the same or higher level.

The first of those two rules is quite clear. The second one has the goal of preventing information from leaking to a lower security level (e.g. by preventing a doctor who has access to confidential patient information from writing this data to a publicly accessible medical database).
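The two MAC rules above can be illustrated with a minimal Python sketch. The level names follow the classification given in the text; the function names `may_read` and `may_write` and the example clearances are hypothetical and only serve the illustration.

```python
from enum import IntEnum


class Level(IntEnum):
    """Security levels, ordered from lowest to highest."""
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


def may_read(clearance: Level, label: Level) -> bool:
    """Read rule: the resource must be of the same or a lower level."""
    return label <= clearance


def may_write(clearance: Level, label: Level) -> bool:
    """Write rule: the resource must be of the same or a higher level."""
    return label >= clearance


# Hypothetical example: a doctor cleared for confidential data.
doctor = Level.CONFIDENTIAL
patient_record = Level.CONFIDENTIAL
public_database = Level.UNCLASSIFIED

print(may_read(doctor, patient_record))    # True
print(may_write(doctor, public_database))  # False: would leak downwards
```

The write rule is what blocks the doctor in the example from copying confidential data into the public database.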
If the user wants to write to data resources on a lower level he can log on to the system using a lower clearance than his maximal allowed one. These two principles were first formulated by Bell and LaPadula in 1973 [5] and then revised in [4] for the protection of the confidentiality of information. Based on the principles of Bell and LaPadula, Biba [11] proposed a MAC model for the protection of the integrity of information.

The concept of MAC can be augmented by adding categories, where data and users are additionally assigned to one or more categories (e.g. radiological, psychological, pharmacological). With this addition, access can also be restricted on a need-to-know basis. Users will only have access to data that belong to one of their categories.

Clearances and categories can either be stored in secure permission repositories that are queried by the access control systems to retrieve information on a specific user, or be distributed to the users in the form of certificates.

The problem with MAC is that, even when using categories, it lacks granularity and flexibility of access permissions, since single objects cannot be specifically addressed and the set of actions on the objects is restricted.

4.2.3 Role Based Access Control

Compared to DAC and MAC, Role Based Access Control (RBAC) is a relatively new paradigm. It was first introduced in 1992 by D.F. Ferraiolo and D.R. Kuhn at the 15th National Computer Security Conference [46]. A framework for RBAC was proposed by R. Sandhu et al. in [85]. In 2004 the American National Standards Institute (ANSI) adopted a consensus model for RBAC based upon [47].

RBAC is an effort to overcome the cumbersome administration of permissions inherent to DAC. The basic concept of RBAC is the role. A role is a named collection of permissions, and possibly other roles, that are needed to perform a specific task. Users are assigned roles according to the tasks they have to perform.
Therefore the management of permissions, especially when a user is (re-)assigned to new tasks, becomes much easier, since only the user-to-role relations have to be changed. Furthermore the permissions related to a task can be changed globally without having to modify them for every user who is assigned to that task.

The RBAC community differentiates the concept of a role from the concept of a group in order to avoid confusion with the well-established meaning of groups in operating systems. A group is a named collection of users and possibly other groups and is therefore relatively similar to the concept of a role. Both can be used in access control to assign the same permissions to a group of users. However, roles require some additional functionality that is not necessarily provided by groups.

The core concept of the RBAC standard requires that each user and each permission can be assigned to multiple roles and each role can be assigned to multiple users and permissions. Furthermore review functions must be available that allow one to see the roles assigned to a user and the users assigned to a role, as well as the permissions assigned to a role and the roles assigned to a permission. Core RBAC also defines the concept of user sessions, which allow users to selectively activate and deactivate roles in order to use the least privilege necessary. Finally users must be able to simultaneously exercise the permissions of multiple roles.

The core concepts of RBAC are extended by hierarchical RBAC. In hierarchical RBAC, a partial order is defined between some roles, defining hierarchically superior and inferior roles. Superior roles inherit the permissions of inferior ones, and users assigned to superior roles are automatically assigned to inferior roles too. Another extension of the RBAC concepts, constrained RBAC, introduces the static and dynamic separation of duties (SSD and DSD).
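The core and hierarchical RBAC concepts described above can be sketched as follows. This is a simplified illustration, not the ANSI reference model: the class `Rbac`, its method names and the example roles are all hypothetical, and review functions and separation of duties are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Rbac:
    # user -> roles directly assigned to the user
    user_roles: dict[str, set[str]] = field(default_factory=dict)
    # role -> (action, resource) permissions assigned to the role
    role_perms: dict[str, set[tuple[str, str]]] = field(default_factory=dict)
    # superior role -> directly inferior roles (the role hierarchy)
    inferiors: dict[str, set[str]] = field(default_factory=dict)

    def expand(self, role: str) -> set[str]:
        """Return the role together with all roles below it."""
        seen, stack = set(), [role]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.inferiors.get(current, ()))
        return seen

    def open_session(self, user: str, wanted: set[str]) -> set[str]:
        """Activate only the requested roles (least privilege)."""
        assigned = self.user_roles.get(user, set())
        allowed = set().union(*(self.expand(r) for r in assigned))
        if not wanted <= allowed:
            raise PermissionError("role not assigned to user")
        return set(wanted)

    def check(self, active: set[str], action: str, resource: str) -> bool:
        """A superior role inherits the permissions of inferior ones."""
        return any((action, resource) in self.role_perms.get(r, ())
                   for role in active for r in self.expand(role))


# Hypothetical example data.
rbac = Rbac(
    user_roles={"alice": {"chief_physician"}},
    role_perms={"physician": {("read", "record42")},
                "chief_physician": {("write", "record42")}},
    inferiors={"chief_physician": {"physician"}},
)
session = rbac.open_session("alice", {"physician"})
print(rbac.check(session, "read", "record42"))   # True
print(rbac.check(session, "write", "record42"))  # False
```

Reassigning a user to a new task only touches `user_roles`, which is the administrative advantage over DAC named in the text.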
Under separation of duties, the combinations of roles that can be assigned to a user (SSD) or simultaneously activated by a user (DSD) are subject to restrictions.

Core RBAC achieves the objective of flexible and easy administration of permissions related to tasks. RBAC also makes it possible to follow the paradigm of least privilege and to enforce an effective separation of duties. The limitations of RBAC lie in the necessity to assign all permissions through roles. Therefore ad-hoc permission granting would require the creation of a new role assigned to only one user, which is somewhat impractical.

4.2.4 Current directions in access control

We have examined the following current directions in access control:

• Policy composition frameworks
• Attribute based access control
• Trust negotiation

Research on policy composition frameworks [16, 101] investigates how to integrate different, independent access control policies from multiple entities on a distributed system. A very important issue in policy composition frameworks is the possible inconsistencies that can arise when trying to combine heterogeneous policies. Another interesting issue for policy composition frameworks, one that applies directly to Grid computing, is mobile policies [95]. A mobile policy is associated with an access control object and follows the object if it is replicated or moved. Such mobile policies could be attached to data resources in order to regulate the fine-grained access control to data on a Grid.

Attribute based access control (ABAC) is an approach where authorization decisions are not based on the identity of the request issuers, but on a set of attributes that requesting users have to provide (usually proved through the possession of attribute certificates, see section 4.5). Attribute based access control can be seen as a generalization and extension of RBAC, where attributes (instead of roles) are assigned to users, and permissions depend on the attributes a user has.
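A minimal sketch of an attribute based decision is given below. It assumes that each access control object lists the attributes it requires and that the user's attributes have already been verified (for example from attribute certificates); the function `abac_permits` and the attribute names are hypothetical.

```python
def abac_permits(required: dict[str, str], presented: dict[str, str]) -> bool:
    """Grant access only if every required attribute is presented
    with the expected value; the user's identity plays no role."""
    return all(presented.get(name) == value
               for name, value in required.items())


# Hypothetical requirement attached to a medical image.
required_by_object = {"role": "physician", "department": "radiology"}

radiologist = {"role": "physician", "department": "radiology"}
cardiologist = {"role": "physician", "department": "cardiology"}

print(abac_permits(required_by_object, radiologist))   # True
print(abac_permits(required_by_object, cardiologist))  # False
```

Note that the decision combines two attributes for a single permission, something a single role assignment does not express directly.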
RBAC and ABAC differ in that, in attribute based access control, a user may need multiple attributes in order to use a single permission. Such a construction is not supported by all RBAC systems. Furthermore in ABAC permissions are not assigned to attributes; instead the access control objects specify the attributes that are required in order to access them. A framework for attribute-based access control specification and enforcement is presented in [15]. In [95] a good overview of current issues concerning policy composition frameworks and attribute based access control is given.

Trust negotiation is another recent direction of research in access control. Trust negotiation systems handle the task of establishing mutual trust between two entities that have no previous relationship. This is achieved by providing credentials from a trusted third party, known to both entities (possibly through intermediaries). Based on the level of mutual trust that is established through the trust negotiation system, one entity may give the other certain access rights to its resources. An example of a trust negotiation system is presented in [83]. Trust negotiation is not limited to mapping one access control policy to another, as the negotiation process can lead to different results depending on the negotiation strategies adopted by the participants. In [8] a set of requirements for trust negotiation systems is proposed, and a good overview of existing trust negotiation systems is given. The article underlines the importance of credential chains for delegation, which supports our similar argument presented in chapter 3.

Summarizing the current directions of access control, one can say that the central goals of all new approaches are decentralization and cooperation between cross-organizational security systems.
These general goals are reflected in the requirements of our application, and therefore the approaches presented in this thesis are oriented in the same direction.

4.3 Authorization Frameworks

The Request For Comments 2904 (RFC 2904, August 2000) [96] and the International Organization for Standardization recommendation ISO/IEC 10181-3 [61] both define frameworks for authorization systems. They are conceptually similar, but use distinctly different terminology. As this can lead (and often has led) to confusion, we exclusively use the terms defined at the beginning of this chapter.

The RFC framework proposes three message sequences that define how users, resources and servers handling authentication, authorization and auditing (AAA servers) interact: Agent, Pull, and Push. Figure 4.1 illustrates these message sequences.

[Figure 4.1: Authorization message sequences for an agent, pull and push authorization structure. Panels: Agent sequence, Pull sequence, Push sequence; each shows the numbered messages 1 to 4 exchanged between User, AAA Server and Resource.]

In the Agent message sequence, the user interacts only with the AAA server for authorization. The AAA server relays the user's requests to the resource server and notifies him once the service is ready. In a first step the user contacts the AAA server, which then retrieves the user's permissions and checks if they allow the requested action. If the AAA server reaches a positive decision, it transmits the user's request to the resource server in a second step. The resource server makes the requested service available for execution and returns an acknowledgement to the AAA server in a third step. The AAA server informs the user that the request is ready at the resource server in a final step.

In the Pull message sequence, the user only interacts with the resource server. The resource server handles all interactions with the AAA server. In a first step the user submits a request to the resource server.
The server contacts the corresponding AAA server in a second step and asks for a decision whether the requested action should be allowed. The AAA server retrieves the user's permissions to make its decision and communicates the result to the resource server in a third step. If the decision is positive, the resource server executes the user's request and returns the result to the user in a final step.

The Push message sequence puts all the burden of interaction on the user and thus separates the AAA server from the resource server. In a first step the user contacts the AAA server to retrieve assertions on his permissions. The AAA server retrieves the requested permissions and returns them to the user in a second step. Then, in a third step, the user submits his request and the required permissions to the resource server. The server checks the submitted permissions and, if they allow the requested action, proceeds to execute the request. In a final step, the resource server returns the results of the request to the user.

Following the classification of RFC 2904 [96], the ISO framework corresponds to either the agent or the pull model, depending on where the function that enforces access control decisions is implemented (AAA server or resource).

RFC 2904 [96] also defines a set of architecture components that include the Policy Decision Point (PDP), where access control decisions are made based on the access control information provided in the message sequences presented before. The job of the Policy Enforcement Point (PEP) is to enforce the access decisions of the PDP with regard to the resources.

Let us now consider a scenario with distributed resources, administered by multiple different authorization authorities. In the case of data resources on a Grid, the SOA for the data may not be directly related to the storage resource on which the data are located.
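As a rough illustration of the four steps of the push sequence described in this section, the following Python sketch lets a user fetch a signed assertion from an AAA server and present it to a resource server. Several points are assumptions made only for this sketch: the class and method names are hypothetical, the assertion is protected with an HMAC under a key shared by both servers (a real deployment would rather use public-key signatures), and user authentication is taken for granted.

```python
import hashlib
import hmac
import json

SHARED_KEY = b"demo-key"  # assumption: key shared by AAA and resource server


def _sign(body: str) -> str:
    return hmac.new(SHARED_KEY, body.encode(), hashlib.sha256).hexdigest()


class AAAServer:
    def __init__(self, permissions):
        self.permissions = permissions  # user -> set of (action, resource)

    def issue_assertion(self, user, requested):
        """Steps 1-2: assert only the requested subset of permissions."""
        granted = sorted(set(requested) & self.permissions.get(user, set()))
        body = json.dumps({"user": user, "permissions": granted})
        return body, _sign(body)


class ResourceServer:
    def handle(self, user, action, resource, assertion):
        """Steps 3-4: verify the assertion, then execute or refuse."""
        body, tag = assertion
        if not hmac.compare_digest(tag, _sign(body)):
            return "denied: invalid assertion"
        claims = json.loads(body)
        if claims["user"] != user or [action, resource] not in claims["permissions"]:
            return "denied: not authorized"
        return f"{action} on {resource} executed"


aaa = AAAServer({"alice": {("read", "scan17"), ("write", "scan17")}})
assertion = aaa.issue_assertion("alice", [("read", "scan17")])
storage = ResourceServer()
print(storage.handle("alice", "read", "scan17", assertion))
print(storage.handle("alice", "write", "scan17", assertion))
```

Because the user asks only for the read permission, the resource server never sees the write permission, which is one way to keep to the least privilege necessary.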
Using the pull sequence, a storage resource has the duty of contacting the different AAA servers for the data resources it stores. A malicious resource could violate the paradigm of least privilege by requesting more privileges than actually required from the AAA servers.

The agent sequence imposes on the AAA servers the additional duty of communicating with the resources. If access to a resource involves authorizations from multiple AAA servers, a coordination mechanism between them is required.

The push sequence allows a temporal decoupling of the authorization assertion from the actual request. The duty of querying the AAA servers and the resource is put on the user, reducing the load on AAA servers and resources. Furthermore the paradigm of least privilege can be securely enforced, provided that the AAA server allows the user to request assertions of a subset of his authorizations. Finally, if the user's request requires authorizations from multiple AAA servers, the user can easily query them sequentially and combine their assertions to support his request. The disadvantage of the push sequence is that it requires an authorization revocation mechanism, since AAA servers no longer have access to authorizations once they are issued to their holders.

Considering the drawbacks and advantages cited above, we believe the push sequence to be the best choice for this scenario.

The RFC framework also discusses the use of attribute certificates (AC) to store authorization data. Its proposal is based upon the work on X.509 Attribute Certificates by the Public Key Infrastructure (PKIX) Working Group of the IETF and is discussed in section 4.5.2. The RFC framework explicitly states the necessity to ensure that the AC owner is also the request issuer in this context.

4.4 Authorization Expression Languages

To express authorizations given to a user and general policies governing access control, a well defined language is needed. Several approaches have been proposed for such a language. In the following, we briefly examine the impact of KeyNote [14], XACML [52], and XrML [28] on our work.

4.4.1 KeyNote

KeyNote [14] is a Trust-Management System and as such it combines authentication and access control in a unified framework for evaluating authorization requests. KeyNote defines an assertion language that makes it possible to bind authorizations to entities. These entities may be represented by public keys, as in the SPKI approach (see section 4.5.3). If this is the case, those entities can delegate their authorizations by issuing digitally signed assertions.

Currently KeyNote does not support the revocation of assertions. Furthermore the KeyNote language has no support for RBAC, since it is oriented towards DAC. As our application requires support for RBAC, KeyNote does not suit our requirements.

4.4.2 XACML

The eXtensible Access Control Markup Language (XACML) [52] is a standard proposal by the OASIS consortium (see footnote 1). It defines a general purpose language for specifying access control policies. XACML is highly expressive and offers a large variety of datatypes and functions to combine or compare them. XACML manages policy sets, each of which consists of one or more rules. Each rule defines the actions a subject may perform on a resource.

The XACML policy language is entirely written in XML [20] and is therefore more human readable than binary encodings such as ASN.1 [94]. This advantage is somewhat negated by the fact that XACML is very verbose and requires an enormous overhead for even the simplest policies. This makes policies difficult to write, understand and manage. Therefore special policy creation tools are needed to help untrained users to understand and create XACML policies.
The PRIMA architecture presented in section 4.6.7 proposes such a tool. Figure 4.2 shows an example of a simple policy with a rule giving read access to a file.

Footnote 1: OASIS is a non-profit, global consortium that drives the development, convergence and adoption of e-business standards. Its foundational sponsors are Innodata Isogen, SAP and Sun Microsystems, Inc.

    <Policy PolicyId="FileAccessPolicy"
        RuleCombiningAlgId="urn:oasis:names:tc:xacml:1.0:rule-combining-algorithm:permit-overrides">
      <Target>
        <Subjects> <AnySubject/> </Subjects>
        <Resources> <AnyResource/> </Resources>
        <Actions> <AnyAction/> </Actions>
      </Target>
      <Rule RuleId="FileAccessRule" Effect="Permit">
        <Target>
          <Subjects> <Subject>
            <SubjectMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal">
              <AttributeValue DataType="http://www.w3.org/2001/XMLSchema#string">/O=Grid/O=SomeVO/OU=liris.cnrs.fr/CN=LudwigSeitz</AttributeValue>
            </SubjectMatch>
          </Subject> </Subjects>
          <Resources> <Resource>
            <ResourceMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal">
              <AttributeValue DataType="http://www.w3.org/2001/XMLSchema#string">SomeGridfileId</AttributeValue>
            </ResourceMatch>
          </Resource> </Resources>
          <Actions> <Action>
            <ActionMatch MatchId="urn:oasis:names:tc:xacml:1.0:function:string-equal">
              <AttributeValue DataType="http://www.w3.org/2001/XMLSchema#string">read</AttributeValue>
            </ActionMatch>
          </Action> </Actions>
        </Target>
      </Rule>
    </Policy>

Figure 4.2: An example of an XACML policy granting read access to a file.

Another drawback of XACML is that it has no explicit support for delegation. The PRIMA architecture has provided a workaround by adding new actions that give granting rights on existing actions.
However, this solution requires modification of the standard XACML PDP to take into account existing delegations when evaluating a subject's access rights. Furthermore this solution mixes actions and delegations, which are conceptually different parts of access control.

4.4.3 XrML

The eXtensible rights Markup Language (XrML) [28] is a general purpose language in XML used to describe the rights and conditions for using digital resources. It has the same underlying goal as XACML, since it was designed to answer the same question: “Is such-and-such a Principal authorized to exercise such-and-such a Right against such-and-such a Resource?” (see XrML Core Schema, p. 43, available from [28]). However, XrML is focused on digital rights management (DRM). The core specification of XrML has no explicit support for RBAC and the XrML language has less expressive power than XACML. Contrary to XACML, however, XrML provides mechanisms for delegation of rights. XrML also supports binding authorizations to public keys, similar to the SPKI approach (see section 4.5.3). The lack of support for RBAC and the strong focus on DRM make XrML poorly suited for expressing complex access control scenarios.

4.4.4 General remarks

We do believe that XML encoding is a reasonable approach to formulate access control decisions. First, it is a human readable format, and second, many XML support tools exist for processing XML encoded data. A standardized access control language is also desirable to achieve interoperability. It is obvious that an acceptable approach needs to be very generic. This somewhat explains the verboseness of XACML and other related standards. However, in research, implementing such standards puts a huge workload on the researcher and does not necessarily lead to scientifically significant results.
We have therefore chosen not to implement standards in our approach, since we only target a proof of concept, and standard conformance would have made the system very cumbersome to use and the access control data very difficult to understand. However, our system can be easily adapted to support any XML based permission specification standard.

4.5 Standards for authorization assertion

Several access control architectures store permissions unprotected (see section 4.6). We deem such an approach inappropriate for our application, since such permissions would be prime targets for hackers and a successful attack would give access to a wide range of resources. We believe that as much security critical information as possible should be stored securely encoded in digitally signed certificates. Ordered sequences of authorization certificates can be used to form certificate paths that allow delegation of authorizations. This allows for flexible and secure management of dynamic access rights.

4.5.1 SAML

The Security Assertion Markup Language (SAML) [68] from the OASIS consortium defines an XML based syntax and protocols for requesting and providing authentication, attribute and authorization assertions. Authentication assertions contain descriptions of how subjects have been authenticated, attribute assertions bind certain attributes (e.g. roles) to subjects, and authorization assertions convey an authorization decision for a specific request. If assertions need to be secured, SAML uses the XML digital signature recommendation of the W3C [33]. These signed assertions are equivalent to certificates. Due to their high level of expressiveness, SAML assertions are quite verbose and not easy to read and understand for an average user.

The core specification of SAML does not address delegation of authorizations in any way. Recent proposals ([77, 97]) address this drawback by proposing extensions to the SAML specification.
Both approaches extend some element of a SAML assertion in order to allow the expression of a multi-step delegation in a single SAML assertion. In each delegation step, the access rights can be restricted by adding conditions or constraints. The approaches differ only in the choice of the SAML element to be extended for adding delegation and in the exact syntax of their delegation statement.

SAML and XACML have some overlap. However, while the focus of SAML is on conveying information such as user attributes, authorization decisions and authentication methods, XACML is centered on policies governing the requested object. One could therefore say that SAML is subject-centered while XACML is object-centered. This means that SAML can be used to provide PDPs using XACML with authorization information.

4.5.2 X.509 Attribute Certificates

Request For Comments 3281 [45] defines a profile for the use of X.509 Attribute Certificates (ACs) for authorization. ACs bind attributes to a user identity, which is to be authenticated by using an X.509 public key certificate. Since issuers of ACs can define their own attribute types, any kind of authorization information can be encoded within an AC. The current profile is very limited, since it recommends not supporting delegation because the administration and processing of AC paths is deemed to be too complex. Furthermore, for each particular set of attributes only one source of authority may exist that functions as AC issuer. For example, this means that role memberships can only be issued by one single authority. Such a limitation would make scalable, decentralized authorization impossible and is therefore not suitable in a Grid architecture. The PERMIS and PRIMA access control architectures make use of X.509 ACs (see section 4.6). The authors of PRIMA have extended the X.509 AC specification in order to support certificate paths.
4.5.3 SPKI

Requests For Comments 2692 and 2693 [38, 39] define a Simple Public Key Infrastructure (SPKI) for trust management. SPKI introduces a simple format for authorization certificates. An access control list (ACL) that is co-located with each resource specifies the public keys of the resource administrators. These administrators may issue permissions on the resource and authorize other users to delegate them by issuing digitally signed certificates. SPKI uses public keys to identify entities and to create unique namespace identifiers. Furthermore, it specifies a delegation mechanism through chains of certificates and details tuple reduction rules to produce an authorization decision out of such a certificate chain. Work on SPKI standardization ceased in 2001, and thus important questions such as the implementation of standard RBAC using SPKI have not been addressed.

Binding permissions to public keys has several advantages compared with binding permissions to user identities as in X.509 ACs. First, it solves the problem of finding globally unique names for users, and second, it simplifies the integrity checking of SPKI certificates, since the creator's public key is included in the certificate. The drawback of binding permissions to public keys is related to the revocation of a user's private key. In such a case all authorizations related to this key have to be revoked too. When permissions are bound to user identities this is not necessary, since the user identity does not change if the underlying authentication key is replaced.

4.6 Access Control Systems

We now present existing access control architectures that are specifically designed for Grids or distributed resources and discuss how they relate to our requirements.
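As an illustration of the SPKI delegation chains discussed in section 4.5.3, the following minimal sketch reduces a chain of authorization tuples to a single decision. It is a simplified reading of the tuple reduction idea, not the RFC 2693 algorithm in full: the class `AuthTuple`, the functions `reduce_pair` and `reduce_chain`, the use of plain strings for public keys, and the modelling of rights as a set of action names are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class AuthTuple:
    """Simplified authorization tuple (hypothetical structure)."""
    issuer: str                 # key that grants the permission ("SELF" = resource ACL)
    subject: str                # key that receives the permission
    may_delegate: bool          # may the subject pass the permission on?
    rights: FrozenSet[str]      # simplified: a set of permitted actions
    validity: Tuple[int, int]   # (not_before, not_after), e.g. timestamps


def reduce_pair(a: AuthTuple, b: AuthTuple) -> Optional[AuthTuple]:
    """Combine two adjacent tuples, or return None if the chain breaks."""
    # b must be issued by the subject of a, and a must allow delegation.
    if a.subject != b.issuer or not a.may_delegate:
        return None
    rights = a.rights & b.rights                   # rights can only shrink
    start = max(a.validity[0], b.validity[0])      # validity can only shrink
    end = min(a.validity[1], b.validity[1])
    if not rights or start > end:
        return None
    return AuthTuple(a.issuer, b.subject, b.may_delegate, rights, (start, end))


def reduce_chain(chain: List[AuthTuple]) -> Optional[AuthTuple]:
    """Reduce an ordered chain, starting with the ACL entry, to one tuple."""
    if not chain:
        return None
    result: Optional[AuthTuple] = chain[0]
    for nxt in chain[1:]:
        result = reduce_pair(result, nxt)
        if result is None:
            return None
    return result


# The ACL trusts an administrator key, who grants read access to Alice.
acl_entry = AuthTuple("SELF", "K_admin", True, frozenset({"read", "write"}), (0, 100))
cert = AuthTuple("K_admin", "K_alice", False, frozenset({"read"}), (10, 50))
decision = reduce_chain([acl_entry, cert])

# Alice was not allowed to delegate, so a certificate she issues is rejected.
cert_bob = AuthTuple("K_alice", "K_bob", False, frozenset({"read"}), (10, 50))
rejected = reduce_chain([acl_entry, cert, cert_bob])
```

In this sketch each delegation step can only narrow the rights and the validity period, which mirrors the restriction of access rights along a certificate path described above. Signature verification is deliberately left out.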
4.6.1 Shibboleth

Shibboleth [40] is an access control architecture developed since 1999 by the Middleware Architecture Committee for Education (MACE) of the Internet2 consortium and supported by IBM. The distinctive features of Shibboleth are its mechanisms for user privacy and information release control. Shibboleth is specifically designed to control access to web-based services. With respect to RFC 2904 (see section 4.3), Shibboleth uses the pull sequence, where the service provider is contacted by the user and then pulls the user's attributes from the Attribute Authority (AA) of the user's home organization. Using local access control lists, the resource provider then decides which rights to grant to the user based on these attributes. Storage of the attributes is left to the discretion of the AAs; attribute assertions are passed between the AA and the service providers using SAML (see section 4.5).

The problem in the current design of Shibboleth is that the AA is considered to be located at the user's home organization. This is not necessarily the case in Grid environments with multiple distributed sources of authority. This problem, the limitations of the current SAML specification concerning delegation, and the drawbacks of the pull sequence make Shibboleth unsuited for our requirements. In a recent spin-off project from Shibboleth, GridShib, the developers plan to integrate Shibboleth into the Globus Toolkit [99]. As the project is still in an early stage of development, nothing more precise can be said about its outcome.

4.6.2 Akenti

Akenti [91, 92] is an access control system developed at the Distributed Systems Department of the Lawrence Berkeley Laboratory in the USA since 1998. Akenti uses signed certificates to store access control policies, resource use-conditions and attribute assignments. This protects them against unauthorized modification and makes it possible to store them on less secured sites.
However, the policy certificates are self-signed and must therefore be considered as trusted information. They are co-located with the resources to which they apply and specify the sources of authority (SOA) for these resources. Akenti can use both the authorization push and pull sequences. In both cases the server is contacted when an access control decision is needed. The server then uses the relevant certificates (either submitted by the user or gathered by the server from the locations specified in the policy certificates) and makes its decision based on those. Currently Akenti uses a proprietary XML-based policy and assertion language; however, the Akenti development team is considering the use of SAML as assertion language and XACML as policy language. Akenti does not support delegation of rights through paths of certificates. Instead, resource owners who want to give administrative power to other users need to specify those users in the policy certificates. Akenti therefore fails to meet some of our requirements.

4.6.3 PERMIS

PERMIS [23] is also a certificate-based access control system. It has been developed by the Information Systems Security Research Group of the University of Salford in the United Kingdom since 2001. It uses the authorization pull sequence and stores all relevant certificates in LDAP directories. Only the SOAs that may issue valid policies need to be stored locally with the PDP. PERMIS relies on X.509 attribute certificates (ACs) [45] to securely store role assignments and policies. This implies that PERMIS inherits some of the limitations of the X.509 AC specification as described in section 4.5, namely the limitation to one source of authority per set of attributes. Currently PERMIS uses a proprietary policy language, but the PERMIS developers are considering the use of the XACML policy language. PERMIS allows static delegation of roles from the SOA to a subordinate AA.
This means that a central authority has to be contacted, and has to register subordinate AAs in its policy, before they are entitled to assign privileges. This delegation can be restricted by specifying a delegation depth limit in the role assignment. Once authorized to do so, AAs can assign roles by creating new role attribution ACs. PERMIS has no explicit support for ad-hoc permission granting: all permission assignments have to be done through the assignment of a user to a role and of permissions to the role. This, the limitations of X.509 ACs and the reliance on the pull sequence make PERMIS unsuited for our application.

4.6.4 CAS

The Community Authorization Service (CAS) [79, 78], developed since 2002 by the Globus Alliance, is an access control service for Grids. It builds on the concept of Virtual Communities (also called Virtual Organizations in other projects), which are defined as cross-organizational communities of users that share resources and cooperate on a common project. Each virtual community is granted bulk rights and runs a CAS that stores the information on how these bulk rights are restricted for the individual members of its community. CAS uses an authorization push model, where users retrieve permissions from the CAS server acting as AAA server.

The fact that CAS centralizes access control information makes it a potential bottleneck and a trusted third party. A CAS can therefore become a single point of failure: if an attacker compromises a CAS server, he has access to all resources granted to the community that this CAS manages. Furthermore, the fact that each CAS is centrally managed and that resources grant bulk rights to the communities makes fine-grained data access control and ad-hoc granting of rights extremely difficult to manage.

4.6.5 VOMS

The Virtual Organization Membership Service (VOMS) [2] was developed from 2001 to 2004 within the DataGrid project (IST-2000-25182).
Its development continues within the EU project Enabling Grids for E-sciencE (EGEE, IST-2003-508833). VOMS is an access control service that is conceptually similar to CAS. A VOMS server stores group memberships for the members of a Virtual Organization (VO). The resource sites store the rights assigned to the various user groups. VOMS can be used in both authorization push and pull mode, with the VOMS server acting as attribute authority. However, VOMS does not provide the service on the resource side that interprets the attribute statements it issues. It is therefore incomplete as an access control service. Concerns similar to those for CAS apply, due to the centralization of access control information.

4.6.6 Cardea

Cardea [65] is an access control solution for distributed systems. It has been developed since 2003 at the NASA Advanced Supercomputing (NAS) Division of the NASA Ames Research Center in the USA. Cardea uses XACML as policy language and SAML to certify authorization information. Since XACML is based on the pull model, the same concerns as for Akenti and PERMIS apply. This and the current limitations of XACML and SAML with regard to delegation mechanisms, as described in sections 4.4 and 4.5, make this approach unsuited for our application.

4.6.7 PRIMA

PRIMA [67, 66] is a Grid access control system that has been developed since 2003 at the Department of Computer Science, Virginia Polytechnic Institute and State University, USA. PRIMA is a hybrid push/pull architecture, where user attributes are pushed to the PDP and global policies are pulled by the PDP. PRIMA specifically supports ad-hoc permission granting. It uses XACML as policy language and X.509 ACs for authorization. In contrast to the standard X.509 AC profile, however, the designers of PRIMA have implemented support for delegation through paths of certificates. PRIMA maps the data access permissions of a user to local POSIX.1e file system access control lists [27] or Grid Access Control Lists (GACL) [72].
This approach makes it more difficult to realize the Grid paradigm of integrating heterogeneous systems, since it requires one of those specific systems to be deployed on all machines participating in the Grid architecture.

4.6.8 Summary

In this section we discuss the access control architectures presented so far with regard to the constraints and requirements we have established in chapter 3. The results are presented in three tables. Table 4.1 summarizes the aspects related to the constraints of the medical application, table 4.2 shows the results relating to general principles of good security and table 4.3 shows the results with regard to the constraints of the Grid environment. Question marks in the tables indicate that the available documentation does not make it clear whether the architecture fulfills the specific requirement. We now outline the reasons for the negative entries in the tables.

• S1: PERMIS and Cardea are designed to use the authorization pull message sequence. Moreover, PERMIS does not seem to support the RBAC concept of activating and deactivating roles. We must therefore assume that all roles are active at any time. Akenti can use either the authorization pull or the push message sequence.

• S2 and G5: In Shibboleth and VOMS, the local sites determine the access control policies related to all local resources (based on externally provided attributes). Therefore there may be different access rights to replicas of data stored at different sites, and if a local storage site goes offline it may miss an update of the permissions concerning the stored data.

• S3, S5 and S6: CAS and VOMS use a central server for the Community/VO. It stores authorization information for all members of the Community/VO in unprotected form and is therefore a trusted third party. While Shibboleth does not have such a centralized service, it assumes that the attribute authority for each user is his home organization.
This hinders decentralized authorization systems, where attribute assertions may originate from distributed sources of authority.

• G1: All the negatively rated systems require a source of authority to contact a permission storage in order to submit new permissions. Only then can the PDP or the user retrieve this authorization information in the process of an authorization decision. This procedure encumbers ad-hoc granting.

• G2: CAS and VOMS require a centralized system in order to get access to authorization assertions. If the CAS or VOMS server breaks down, no user of the community/VO will be able to access any authorization assertions.

• G3: The PRIMA system requires some specific software (POSIX.1e or GACL) to be deployed at system level on all machines providing Grid resources.

• A1: The CAS documentation² indicates that CAS supports user and object groups. However, none of the requirements of RBAC (see section 4.2) are specifically addressed.

• A3: In VOMS, local sites have complete control over the permissions related to the data resources they store. Therefore the owner of a file who stores it on a Grid cannot directly control its access permissions.

• A4: It is impossible for an access control system alone to prevent circumvention of data access control by persons having access to the hardware. This requires additional measures that are discussed in chapter 5.

² Available from http://www-unix.globus.org/toolkit/docs/development/4.0-drafts/security/cas

Constraints of the application
A1: RBAC; A2: Traceability; A3: Owner managed data access control; A4: Circumvention protection

System       A1    A2    A3    A4
Shibboleth   ?     ?     ?     no
Akenti       yes   yes   yes   no
PERMIS       yes   ?     yes   no
CAS          no    ?     yes   no
VOMS         yes   ?     no    no
Cardea       yes   yes   ?     no
PRIMA        yes   yes   yes   no

Table 4.1: Summary of how different architectures respond to requirements of a medical application.
System Shibboleth Akenti PERMIS CAS VOMS Cardea PRIMA S1 Least privilege yes yes/no no yes yes no yes General principles of good security S2 S3 S4 S5 Permission Minimal use Separation Secure consistency of trusted of permission third parties duties storage yes ? ? yes ? yes yes yes yes no ? no no ? no yes ? ? yes yes ? no yes yes yes no yes yes S6 No centralized services no yes yes no no yes yes

Table 4.2: Summary of how different architectures follow principles of good security.

Constraints of the Grid environment
G1: Ad hoc permission granting; G2: Dynamic availability of resources; G3: Integration of heterogeneous systems; G4: Local hardware control; G5: Transparency of data storage locations; G6: Scalability

System       G1       G2    G3    G4    G5    G6
Shibboleth   no       yes   yes   yes   no    yes
Akenti       yes/no   yes   yes   yes   yes   yes
PERMIS       no       yes   yes   yes   yes   yes
CAS          no       no    yes   yes   yes   yes
VOMS         no       no    yes   yes   no    yes
Cardea       ?        ?     yes   ?     ?     yes
PRIMA        yes      yes   no    yes   yes   yes

Table 4.3: Summary of how different architectures respond to requirements of a Grid environment.

Chapter 5

Related Work in Storage Security

When dealing with confidential data, the transparent and distributed nature of Grid storage can become a problem. As described in chapter 3, an attacker who has physical or administrator access to the device providing the storage space is able to access the data using the local operating system. Such access bypasses the Grid access control mechanism. Integrating the Grid access control mechanism into the local file system would contradict the Grid paradigm of interoperating autonomous and heterogeneous resources without requiring fundamental changes in their operating systems. Furthermore, such a mechanism would not prevent data disclosure to attackers having physical access to the device, since the access control could be deactivated by mounting the disk under another operating system.
Therefore additional protection is definitely needed for confidential data that are to be shared across a Grid.

Some users believe that for our example application of medical data, anonymization and pseudonymization are sufficient measures of protection. To consider these arguments one has to differentiate between privacy protection and confidentiality. While a true anonymization would solve the problem of privacy protection, there may be cases where confidentiality is nevertheless required, for example to protect some business information. Even if privacy protection is sufficient, one has to consider that true anonymization is hard to obtain, since confidential data can often be derived through secondary sources which look innocuous at first. Total anonymization of medical data is often impossible without losing its usefulness. Good anonymization requires painstaking case-by-case examination of the files and is therefore not feasible at the moment (efforts to automate this process are described in [25]).

Encryption is therefore the best solution for storage security. However, only very limited algorithms exist to perform computations on encrypted data (see [1], [32]). Therefore data will actually have to be decrypted before being used. Having decided to use encryption for secure storage, we have to deal with the following side-conditions:

• The scientific issues related to the actual process of encryption and decryption of files for storage are out of the scope of this thesis. It is however important that encryption is carried out before copying a file containing confidential information to the Grid and that decryption happens after retrieving an encrypted file from the Grid. The files' meta-data should contain all the necessary information about which encryption algorithm was used and all its parameters except the secret key.
For ease of handling, it is preferable that these meta-data are contained in the header of the file.

• Owners of encrypted files should be able to share them with user groups that are dynamically changing. This means that the users authorized to access the file are not known at the time of encryption and may change during the lifetime of the encrypted file. This requires a mechanism that allows authorized users to access decryption keys when they need them.

• Access to the decryption keys should be controlled via the normal fine-grained file access control mechanisms of the Grid. This avoids inconsistent situations where a user is given access to the encrypted file by the Grid access control but is denied access to the decryption key.

• As the loss of a decryption key also means losing the encrypted data, the storage of such keys needs to be fault tolerant and thus redundant.

• Measures are to be taken to avoid collusion between the authorities that manage key storage and an attacker who has access to an encrypted file.

• When an access permission to an encrypted file is revoked, one has to decide how to deal with the encryption keys concerned by the revoked permission. Three options with increasing levels of security are available. The first option is to do nothing and rely on the access control mechanism to prevent access to the encrypted file. The second option is lazy re-encryption: the file is re-encrypted with a new key as soon as its content changes. The third option is an immediate re-encryption with a new key. When choosing between those options one should consider that in a Grid environment no measure can protect against a malicious disclosure by an authorized user. Such a user can create unprotected copies of the file on the Grid.

5.1 Overview of encryption algorithms for storage

In order to encrypt files for storage, an encryption algorithm has to be chosen.
In this section we discuss some features of the available encryption algorithms that are relevant to our application. Given that asymmetric encryption algorithms are orders of magnitude slower than symmetric encryption of the same strength, we concentrate on symmetric algorithms for bulk file encryption.

There are two types of symmetric encryption algorithms: block ciphers and stream ciphers. A block cipher applies a fixed, key-dependent function to blocks of data (the size of these blocks is typically 64 or 128 bits, although algorithms with variable block sizes exist). A stream cipher, on the other hand, uses the key to generate a pseudo-random stream of bits that is XORed with the plaintext bits.

The advantage of a block cipher is that it allows random access to blocks of the encrypted data, whereas with a stream cipher the entire preceding key stream has to be calculated in order to access some specific piece of data. Furthermore, when using a block cipher one can securely re-encrypt modified data with the same key, whereas this would be a major security risk with a stream cipher: an adversary would be able to XOR the ciphertexts of the original and the modified data together, thereby eliminating the key stream and obtaining the two plaintexts XORed with each other. Such a combination of two cleartexts is cryptographically easy to decipher.

The advantage of stream ciphers is that they are generally faster than equivalent block ciphers. In a test we ran with the Crypto++ library (version 5.1) on a 1.9 GHz Pentium 4, the stream cipher ARC4 encrypted at a rate of 24 MB/s and the stream cipher SEAL at 60 MB/s, while the block cipher AES using a 128-bit key encrypted at a rate of 10 MB/s.

Block ciphers can be operated in different modes that have several interesting characteristics for our application. We have examined the electronic codebook mode (ECB), the cipher block chaining mode (CBC) and the cipher feedback mode (CFB).
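The risk of re-encrypting modified data under the same stream cipher key, as described above, can be shown in a few lines. The sketch below is only an illustration: the keystream generator (SHA-256 applied to the key and a counter) is a stand-in chosen for this example and is not ARC4 or SEAL, and the names `keystream`, `xor_bytes` and `stream_encrypt` as well as the sample plaintexts are hypothetical.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key and counter (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equally long byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def stream_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Stream cipher encryption: plaintext XOR keystream."""
    return xor_bytes(plaintext, keystream(key, len(plaintext)))


key = b"secret key"
original = b"diagnosis: benign "
modified = b"diagnosis: serious"   # same length, re-encrypted with the same key

c1 = stream_encrypt(key, original)
c2 = stream_encrypt(key, modified)

# The adversary only sees c1 and c2, yet XORing them removes the keystream.
leak = xor_bytes(c1, c2)
assert leak == xor_bytes(original, modified)
# Unchanged regions of the file show up as zero bytes in the leak.
assert leak[:11] == bytes(11)
```

The adversary learns which parts of the file changed and obtains the XOR of the old and new contents without ever touching the key.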
The ECB mode just encrypts the data block by block with no further modifications. The advantages of ECB are that both encryption and decryption are parallelizable, that random access to blocks of an encrypted file is possible and that re-encryption of modified blocks with the same key is possible. The drawbacks of ECB are that it is relatively easy to make undetected manipulations of encrypted blocks of data and that this encryption mode does not conceal identical patterns between blocks of cleartext. This would allow an attacker to gain information about the content of the encrypted file without having to decrypt it.

Another drawback is that the data must have a size that is a multiple of the cipher's block size. Therefore the last block of data may be too short. The most common solution to this problem is to pad it with meaningless bits in order to make it fit. This means that the encrypted data will be larger than the cleartext. Although the increase in size is very small (at most the block size of the cipher algorithm), it may sometimes still lead to problems, for example when the ciphertext is to replace plaintext stored in a database table cell having a fixed size. A method known as ciphertext stealing makes it possible to keep ciphertext and plaintext the same size. It is presented for CBC mode in figure 5.2. For a description of ciphertext stealing in ECB mode, please refer to chapter 9 of [87].

The Cipher Block Chaining (CBC) mode, illustrated in figure 5.1, makes all ciphertext blocks dependent on the previous ciphertext blocks. The goal is to make manipulations of the plaintext detectable and to conceal identical patterns between blocks of plaintext in the ciphertext. The cost of this is that encryption is no longer parallelizable; however, decryption still is, and random access to blocks of an encrypted file is still possible.
Re-encryption under the same key is possible; however, this requires the re-encryption of all following blocks.

The Cipher Feedback (CFB) mode turns a block cipher into a stream cipher by using the block cipher's output as key stream. Block ciphers in CFB mode can operate on pieces of data smaller than the block size. This could be used for bit-by-bit encryption; however, such a mode of operation would be very inefficient. The CFB mode is illustrated in figure 5.3. As with CBC, it has the effect of concealing patterns in blocks of plaintext and making manipulations of the ciphertext detectable. As with CBC, encryption is no longer parallelizable, but decryption still is. Random access to blocks of an encrypted file is possible. Re-encryption requires using a completely new initialization vector, since otherwise the same security issues as with normal stream ciphers would apply.

The advantages and drawbacks of each encryption mode are summarized in table 5.1. For more details please refer to chapter 9 of [87].

Figure 5.1: The cipher block chaining mode (CBC encryption and CBC decryption). The P_i are the plaintext blocks, the C_i the ciphertext blocks and IV is a randomly generated initialization vector.

Figure 5.2: Ciphertext stealing in CBC mode (encryption and decryption). The P_i are the plaintext blocks, the C_i the ciphertext blocks. C' is a temporary value that is not stored with the ciphertext.

Figure 5.3: An n-bit cipher-feedback mode (CFB encryption and CFB decryption).
At the start of this operation, the shift register is filled with an initialization vector IV. The P_i and C_i are n bits long. The total size of the shift register is the block size of the encryption algorithm.

ECB:
- Plaintext patterns are not concealed.
- Easy to manipulate encrypted blocks of data.
+ Parallelizable de/encryption.
+ Random access possible.
+ Re-encryption of modified blocks possible.

CBC:
+ Plaintext patterns are concealed.
+/- Plaintext somewhat difficult to manipulate.
+/- Encryption is not parallelizable, decryption is.
+ Random access possible.
+/- Re-encryption of all subsequent blocks after modification.

CFB:
+ Plaintext patterns are concealed.
+/- Plaintext somewhat difficult to manipulate.
+/- Encryption is not parallelizable, decryption is.
+ Random access possible.
- Re-encryption of the whole file after modification.

Table 5.1: Summary of the advantages and drawbacks of block cipher modes with regard to encrypted storage.

5.2 Standardization

The IEEE Computer Society has taken an interest in storage security and has sponsored the Security in Storage Working Group (SISWG)¹ as a body to work on the definition of standards for cryptographic algorithms and methods for encrypting data for storage. The group started its work in early 2004 and has produced three draft documents so far: a proposal for a key backup format [35] and two proposals of block-cipher modes for the AES algorithm that are specifically suited for storage security [36], [34]. The publications presenting these new block-cipher modes suggest that they have the same positive properties as the CBC encryption mode and additionally keep both encryption and decryption parallelizable. However, they double the number of necessary encryption operations; it therefore remains to be seen whether the advantages of these proposals outweigh the loss of performance.
We therefore prefer to let the cryptographic community study these proposals for some time before taking the decision to use them. The global goal of SISWG's efforts is to facilitate the interoperability of encrypted data storage mechanisms and to reduce the risk of data loss through incompatibilities that may occur if encrypted data are accessed after a long period of encrypted storage. Since the work of the group is still relatively recent and currently mainly deals with cryptographic algorithms, it has no major impact on the proposals of this thesis. However, one can expect that future standards issued by this working group will be more relevant to this approach.

¹ http://www.siswg.org

5.3 Encrypted storage systems

5.3.1 CFS

The first system supporting encrypted storage was the Cryptographic File System (CFS) [12]. It was developed by Matt Blaze of AT&T Bell Laboratories in 1993. CFS provides encryption and decryption functionality at the local file system level. It uses DES in a combination of the two encryption modes ECB and OFB (output feedback mode). The idea behind this encryption scheme is to allow modifications in encrypted file blocks without having to re-encrypt the rest of the file, as would be necessary with the CBC encryption mode. The granularity of protection in CFS is the directory; therefore fine-grained file encryption is not possible. No functionality for sharing encrypted data and related decryption keys in distributed environments is provided. Later, a key escrow system was introduced in [13]. It uses smartcards to build a bilaterally auditable escrow system, where the key holder can verify that the escrow agent has not used the key and the escrow agent can verify that he holds a valid key without using it. Its purpose is to recover the key, should its main copy become unavailable.
5.3.2 TCFS

The Transparent Cryptographic File System (TCFS), developed at the University of Salerno in Italy during 1997 [21, 22], works fundamentally the same way as CFS. TCFS was originally designed to use the DES cipher in CBC mode. However, the newest version of TCFS [22] provides dynamic encryption modules that allow the user to choose the encryption algorithm. The latest version of TCFS also proposes a group sharing protocol. It allows a group of users to access a file if a certain threshold number of group members participate in the access attempt. This mechanism does not work for distributed group access, since all group members must log into the same workstation.

5.3.3 CryptFS

CryptFS [103] is a stackable vnode-level encryption file system. It was developed around 1998 at the Computer Science Department of Columbia University, USA. CryptFS extends the functionality of CFS to make it more efficient and more resilient against insider attacks by integrating it into the kernel of the operating system. CryptFS uses the Blowfish [87] encryption algorithm in CBC mode on file data blocks of 4 or 8 KB. This means that each block is independent of the others and can be modified and re-encrypted separately. Like CFS, CryptFS does not provide any file- and key-sharing mechanisms.

5.3.4 P. Gutmann's SFS

Peter Gutmann's Secure FileSystem (SFS) [56] implements a cryptographic storage file system for MS-DOS. It was developed until 1995 while Gutmann was a graduate student at the University of Auckland, New Zealand. SFS uses the MDC/SHS encryption algorithm designed by Gutmann himself, which turns a one-way hash function into a block cipher that runs in CFB mode. Schneier raises some concerns about this construction on p. 353 of [87], as hash functions are generally not designed to be used in that way and therefore their security for encryption use is not well researched.
Although SFS has no support for file sharing, it has an interesting feature that makes it worth mentioning: an emergency key access mechanism using Shamir's secret sharing scheme [90], where the key is split into n shares which are distributed to trusted key escrow agents. The key can be recovered from any subset of m key shares. Any smaller subset reveals nothing about the key. Therefore at least m escrow agents must collude to gain access to the key. We have adapted this idea in our approach, as described in section 7.1.

5.3.5 WinEFS

The Windows Encrypting File System (WinEFS) [73] is delivered with Microsoft Windows NT 5.0/2000/XP and Windows Server 2003. WinEFS uses the DESX encryption algorithm with a 128-bit key or TDES with a 168-bit key. In export versions only 40 bits of the actual key are used. The documentation of WinEFS does not make clear which block-cipher mode is used.

WinEFS uses the lockbox concept to make decryption keys available to authorized users. The idea of a lockbox is to store the decryption key for a file, encrypted with the public key(s) of the user(s) authorized to access the file. Figure 5.4 illustrates this concept. WinEFS stores these lockboxes in the file's header. File sharing is done by adding a lockbox encrypted with the newly authorized user's public key to the file header. This concept is impractical for managing groups with dynamically changing membership, since it would require constant updates to the headers of the encrypted files.

5.3.6 SNAD

The Secure Network Attached Disks (SNAD) [74] system was developed around 2002 at the University of California, USA. SNAD uses the RC5 encryption algorithm with a key length of 128 bits in CBC mode. It handles file sharing in a way similar to WinEFS. The decryption keys are stored in lockboxes and associated with the files as meta-data.
In contrast to WinEFS, a lockbox can be associated with multiple files (which means all those files have been encrypted with the same key). This design has the same limitations as WinEFS with regard to our requirements.

[Figure 5.4 diagram: (a) symmetric encryption of the confidential file with the symmetric key k; (b) asymmetric encryption of k with the public key pu, producing the lockbox; (c) asymmetric decryption of the lockbox with the private key pr, yielding k; (d) symmetric decryption of the encrypted file with k, yielding the confidential file.]

Figure 5.4: The lockbox concept. In step (a) a confidential file is encrypted with a symmetric key k. In step (b) this symmetric key k is encrypted with the public key of a user authorized to access the encrypted file. In step (c) this user retrieves the lockbox and opens it with his private key. Finally, in step (d) this user decrypts the file with the symmetric key found in the lockbox.

5.3.7 Cepheus

Cepheus [50] is a cryptographic storage system supporting group sharing and random access. It was developed from 1998 to 1999 at the Massachusetts Institute of Technology (MIT) by Kevin E. Fu. Cepheus is based on parts of D. Mazières' Self-certifying File System (also developed at MIT) [71]. Cepheus uses the RC5 encryption algorithm in CBC mode. As in TCFS, each 8 KB file data block is encrypted separately in this mode, each with a different initialization vector, thus allowing random read and write access.

In Cepheus, a file shared between a group of users is encrypted with a symmetric group key. A group database server maintains up-to-date group membership information for users. It stores the group key in lockboxes encrypted with the public keys of the group members. The group database server responds to requests from user agents and delivers group key lockboxes to group members.
The file server communicates with the group database server to determine if a user has access to a specific file based on a group membership. This configuration requires the owner of the encrypted file to know all public keys of the group members with whom he wants to share the decryption key. If new users are added to the group, the owner of the file is required to update the group database by adding new key lockboxes. Such an approach is clearly too costly in distributed environments with dynamically changing user groups.

5.3.8 J.P. Hughes' SFS

The Secure File System (SFS; see footnote 2) [60, 59] is a joint project between the University of Minnesota, USA and StorageTek Corp. which started in 1999. It aims at providing an easy-to-use cryptographic file system. The publications concerning SFS do not indicate which cryptographic algorithms are used within SFS.

SFS proposes a group sharing mechanism that is also based on group servers. Each group server can manage key access for several subgroups. The header of an encrypted file contains an access control list (ACL) signed by the source of authority for that file. The ACL specifies the groups and individual users that are allowed to access the file. For each individual entity that is allowed to access an encrypted file, a lockbox encrypted with that entity's public key is provided in the ACL. The ACL may also contain shared authorizations that require different entities to cooperate in order to gain access to the key. In this setting, the lockboxes contain shares of the file decryption key. For group access to encrypted files, the header contains a lockbox encrypted with the public key of the group server. A user who wants to access an encrypted file through a group membership must recover the ACL and send it to the corresponding group server. The group server uses the ACL to determine if the user is allowed to access the file.
If access is granted, the server decrypts the lockbox and returns the file decryption key to the user.

SFS also provides a smartcard interface for the secure storage of a user's private key. All private key operations are performed on the smartcard. Therefore the private key never leaves the smartcard and is better protected than in classical password based protection schemes.

The drawback of SFS is that the group server is a single trusted entity. It can decrypt any lockbox related to the groups it manages and is therefore itself a valuable target for attacks.

Footnote 2: SFS by P. Gutmann, SFS by D. Mazières and SFS by J.P. Hughes et al. are completely unrelated.

5.3.9 C-SDA

Chip-Secured Data Access (C-SDA) [18, 17] is a recent encrypted storage approach developed since 2002 at the PRISM laboratory of the University of Versailles in France. It proposes sharing of encrypted files through the use of smartcards as tamper-resistant devices for storing access rights and decryption keys.

In the C-SDA approach, every user must have a smartcard that stores the access rights and the decryption keys for the data he may access. If the user requests access to some encrypted data, the smartcard verifies whether that user has the right to access that data and performs the decryption of the data for the user. The decryption keys never leave the smartcard, and therefore there is no need to update the encryption keys when access rights are revoked. The access rights and keys on the smartcard are updated from external servers at connection time (i.e. when the user inserts the smartcard into a reader).

The protection granularity of C-SDA is the database view. As views are generated dynamically, no bijection between encryption and access rights exists. Therefore encryption must remain orthogonal to access rights in this approach. Our concern with this approach is that the smartcards are considered as tamper-proof devices.
Since the discovery of side-channel attacks [62, 63], new methods of attacking smartcards and similar devices based on side-channel information are found at an alarming rate (e.g. [81], [69]), and therefore the protection mechanisms have to be updated frequently.

5.3.10 Summary

In this section we have presented a summary of the main secure storage systems. We have concentrated on three characteristics: the granularity of encryption, the key sharing mechanisms and the special features (when present) that make the system noteworthy. The results of this summary are presented in table 5.2. Our conclusion is that the protection granularity of an encrypted storage system should allow individual files to be protected, and that current key sharing mechanisms are not suited to dynamically changing permissions.

Encrypted storage system   Granularity of encryption   Key sharing              Special features
------------------------   -------------------------   ----------------------   --------------------------------------------------
CFS                        Directory                   No                       Smartcard based escrow
TCFS                       User account                No                       Threshold sharing
CryptFS                    User account                No                       -
Gutmann's SFS              Partition                   No                       Key sharing escrow
WinEFS                     File                        Lockbox                  -
SNAD                       File                        Lockbox                  -
Cepheus                    File                        Group server             -
Hughes' SFS                File                        Group server & Lockbox   Smartcard support
C-SDA                      Database table views        Based on access rights   Smartcard based management of keys and permissions

Table 5.2: Summary of encrypted storage systems.

Chapter 6
Sygn access control

In this chapter we describe the design and implementation of our access control architecture Sygn (see footnote 1). We first give an overview of the components and their deployment in section 6.1; then the syntax and semantics of the Sygn access control language are presented in section 6.2. Section 6.3 describes the meta-data used by the Sygn access control. A detailed description of the Sygn decision algorithm is given in section 6.4. The chapter closes with a discussion of Sygn in section 6.6.
6.1 Sygn overview

As our main goal is to support ad-hoc access control decisions and decentralized delegation, we have decided to use authorization certificate paths for permission granting. We have opted for a permission push model for the reasons outlined in section 4.3, and users store all their permissions themselves. The permissions are protected against tampering by a digital signature. We distinguish between two kinds of users in the Sygn architecture:

• Users owning a resource and acting as its source of authority: These users grant permissions that allow the use of their resource. For this purpose they use a Sygn owner client (SOC).

• Users acting as resource consumers: These users want to access resources with their permissions. A Sygn user client (SUC) allows them to store owned permissions and retrieve resources according to these permissions.

Footnote 1: Sygn is a name from Nordic mythology. It designates a goddess of truthfulness, but also of doors and locks. She guards the entrance of the Wingolf palace and admits only the honest.

As a particular user may act both as owner and as consumer, we have created a dual-use client that functions either as SOC or as SUC, depending on the user's actions.

In figure 6.1, a resource owner installs a hardware resource or stores a data resource on a (possibly remote) resource server. The SOC contacts the resource server and registers the user as source of authority (SOA) for this resource in the local Sygn server's meta-data base (step 1). To grant access to this resource to a resource consumer (step 2), the resource owner issues authorization certificates that allow access to the resource. Note that this process involves only the owner and the consumer(s), and can be done offline. The SUC allows a user to store and retrieve authorization certificates as needed.
To access the resource, the user contacts the corresponding resource server (we assume that the localization of the correct server is realized by the Grid middleware) and submits the request along with the needed certificates, as shown in step 3.

The resource server needs two different Sygn modules: the Sygn PDP, which produces an access control decision based on the Sygn request provided by the user, and a Sygn PEP. The PEP has two primary functions. First, it has to make sure that the Sygn request corresponds to the Grid request the user submitted (otherwise a user could submit valid permissions for one resource together with a Grid request concerning another one). Second, it uses the Grid authentication mechanism to verify that the request issuer really owns the permissions he submits. The Sygn PDP decides if the request is correct and authorized.

For traceability, the Sygn server can be configured to log all requests. Non-repudiation of those logs can be achieved by another server configuration option that makes the timestamping and digital signature of requests by the issuer mandatory.

As the Sygn PDP is completely separated from the Grid middleware, only the PEP has to be re-implemented for different Grid middlewares, depending on the requests this middleware allows and the authentication procedures used. Thus we have created a Sygn integration module for the µ-Grid middleware (see footnote 2) that supports basic file access requests like get-file, delete-file and put-file.

Footnote 2: Available from http://www.i3s.unice.fr/~johan/ugrid/ugrid.html

[Figure 6.1 diagram: the resource owner (Sygn owner client with SOC meta-data), the resource server (Sygn PEP, Sygn PDP and PDP meta-data) and the resource consumer (Sygn user client with SUC meta-data). Step 1: the owner stores the resource and the Sygn SOA meta-data on the resource server. Step 2: the owner issues AC(s) for the resource to the consumer. Step 3: the consumer uses the AC(s) to access the resource.]

Figure 6.1: Deployment of, and interaction between, Sygn modules on a Grid.
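The division of labour between PEP and PDP described in this section can be sketched as follows. This is a minimal illustration in Python, not the Sygn implementation: the names GridRequest, SygnPath, SygnRequest, enforce and the pdp_decide callback are hypothetical, and the mapping from a Grid operation to a Sygn action (get-file to read, put-file and delete-file to write) is an assumption made only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Assumed mapping from Grid operations to Sygn actions (not taken from the thesis).
GRID_TO_SYGN_ACTION = {"get-file": "read", "put-file": "write", "delete-file": "write"}


@dataclass(frozen=True)
class GridRequest:
    authenticated_uid: str  # identity established by the Grid authentication mechanism
    operation: str          # e.g. "get-file"
    target: str             # logical name of the requested file


@dataclass(frozen=True)
class SygnPath:
    target_object: str      # object of the path's target capability
    target_action: str      # action of the path's target capability


@dataclass(frozen=True)
class SygnRequest:
    issuer_uid: str
    paths: Tuple[SygnPath, ...]


def enforce(grid_req: GridRequest,
            sygn_req: SygnRequest,
            pdp_decide: Callable[[SygnRequest], bool]) -> bool:
    """Hypothetical PEP entry point: two local checks, then the PDP decision."""
    # The request issuer must be the user authenticated by the Grid middleware.
    if sygn_req.issuer_uid != grid_req.authenticated_uid:
        return False
    # The Sygn request must concern the same object and action as the Grid request.
    needed_action = GRID_TO_SYGN_ACTION.get(grid_req.operation)
    if needed_action is None:
        return False
    covered = any(p.target_object == grid_req.target and p.target_action == needed_action
                  for p in sygn_req.paths)
    if not covered:
        return False
    # Only now is the (middleware-independent) PDP asked to validate the paths.
    return pdp_decide(sygn_req)
```

In this sketch the PDP is passed in as a callback, which mirrors the point made above: the decision logic stays unchanged and only the enforcement wrapper depends on the middleware.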
6.2 Syntax and semantics of the Sygn Language

As explained in the previous section, the Sygn access control language is designed to support decentralized management of permissions through the use of certificates. The Sygn access control model supports both role based access control, for flexible management of permissions, and discretionary access control, for fine-grained, ad-hoc granting of permissions in inter-institutional resource sharing scenarios. For the reasons specified in section 4.4.4 we have chosen XML [20] to represent the Sygn access control language.

Basically, the role of every access control language is to express which actions an access control subject may perform on an access control object. The Sygn access control language is used to express both requests and permissions. An XML schema definition of the elements presented in this chapter can be found in appendix A.

6.2.1 Subjects

Sygn recognizes two types of access control subjects: individual entities and roles. Subjects can be granted permissions and can be sources of authority (SOA) for some access control object.

Following the concept of SPKI (see section 4.5.3), Sygn identifies individual entities using their public key. Such an identifier is encoded as a User identifier (UID), as shown in figure 6.2. The corresponding private key is used for authentication of the entity and for signing authorization certificates if the entity acts as SOA for an access control object. Note that the public key is considered sufficient as a unique identifier. Sygn does not associate a distinguished name with public keys for user identification, as would be the case in X.509 [80] certificates. Given the size of the namespace (valid public/private-key pairs), it is extremely improbable that two users will accidentally be assigned the same UID.
    <USER_ID>
    MIGdMA0GCSqGSIb3DQEBAQUAA4GLADCBhwKBgQDqmTTMboHuJ7
    LuhajR/tdhu/WhdKLPca4b4LYFiOzkkB0aCa1KUBhoZAz0VU+R
    xTvSx9cORUl3+t8rHwPPusq39RK+Sr3pPho+KL+IfzlhqRRx9O
    TSiPgSvTEGXllVd2VYnjV8ssoguzsCsMZRKcQXXmreDHbWF9sK
    KYT76aUraQIBEQ==
    </USER_ID>

Figure 6.2: An example of a user identifier (UID) in Sygn using a 1024 bit RSA public key.

A role is identified by its name and the UID of its SOA. The UID of the SOA forms a namespace prefix under which the role's name must be unique (i.e. two roles are equal if they have the same name and the same SOA). The SOA can grant the role to other subjects, including other roles. Granting activation of a role A to a role B makes role A hierarchically inferior to role B; thus B inherits all permissions of role A. The following example illustrates these facts. Consider the following permissions:

• P1 permits read access on file I to role A
• P2 permits write access on file I to role B
• P3 permits activation of role A to role B

Then role B is said to be hierarchically superior to role A, since users who can only activate role A can only read file I, while users who can activate role B can also activate role A and thus read and write file I.

If the SOA of a role could be another role, two paradoxical situations would become possible. First, a role A could be SOA of role B and vice versa, which would make the hierarchy graph cyclic. Second, an infinite recursion might be created, where for all i ∈ N role A_i has role A_(i+1) as SOA. While the first paradox simply makes no sense, the second one could be used to create unnaturally big certificates in order to crash the system. The benefit of allowing a first role to be SOA of a second role would be that every member of the first role would be SOA of the second. This can also be achieved more cleanly by giving the first role the right to delegate activation of the second role.
We have therefore decided that only user identifiers (UID) may be used as SOAs for a role.

Sygn encodes a role in a Role identifier (RID), as shown in figure 6.3. The field containing the review-repository indicates a location where copies of all permissions concerning this role should be stored for review. Such review functions are required in standard RBAC (see section 4.2.3). Note that within the architecture of Sygn it is impossible to enforce that every SOA who grants a permission to a role also sends a copy to the role's review-repository. However, such functionality could be integrated in a standard permission-granting interface that is provided to the Grid users.

    <ROLE_ID>
      <ROLE_SOA>
        <USER_ID>
          MIGdMA0GCS...0/OfMwIBEQ==
        </USER_ID>
      </ROLE_SOA>
      <ROLE_NAME>
        nurse/station1B/SomeHospital
      </ROLE_NAME>
      <REVIEW_REPOSITORY>
        http://repository.SomeHospital.fr:4711
      </REVIEW_REPOSITORY>
    </ROLE_ID>

Figure 6.3: An example of a role identifier (RID) in Sygn.

Sygn supports a special subject identifier, the <ANY_SID/> tag. This identifier matches any other subject identifier. It can therefore be used to grant permissions to everyone.

6.2.2 Objects

Sygn currently supports four types of access control objects; the architecture makes it easy to add more. The current object types identify files, file-sets, role-objects (roles used as objects) and hardware resources. Every object is identified by a name and a SOA. The SOA can be any Sygn subject, with the exception of role-objects, where the SOA is necessarily a UID. As with role subjects, the identifier of the SOA forms a namespace prefix under which an object's name must be unique. Semantically, an object's SOA is considered to have all permissions on that object, and the SOA is therefore never required to grant himself any such permissions explicitly.
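The identifier semantics described so far (a name that is unique only under the namespace of its SOA, the wildcard subject identifier, and the implicit permissions of a SOA on its objects) can be illustrated with a short sketch. The Python class and function names below are hypothetical and are not taken from the Sygn implementation. Excluding the review repository from role equality is an assumption based on the statement that two roles are equal if they have the same name and the same SOA.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass(frozen=True)
class UID:
    public_key: str  # the public key itself is the identifier


@dataclass(frozen=True)
class RID:
    soa: UID   # only a UID may be the SOA of a role
    name: str  # unique only within the namespace of the SOA
    # Assumption: the review repository does not take part in role equality.
    review_repository: str = field(default="", compare=False)


class AnySID:
    """Stand-in for the wildcard subject identifier."""


ANY_SID = AnySID()
Subject = Union[UID, RID, AnySID]


@dataclass(frozen=True)
class FID:
    soa: Subject  # any Sygn subject may be the SOA of a file
    logical_filename: str


def subject_matches(granted_to: Subject, presented: Subject) -> bool:
    """True if a permission granted to `granted_to` applies to `presented`."""
    if isinstance(granted_to, AnySID):
        return True  # the wildcard matches every subject
    return granted_to == presented


def holds_implicitly(subject: Subject, obj: FID) -> bool:
    """A SOA has all permissions on its object without any explicit grant."""
    return obj.soa == subject
```

With these definitions, two roles named "nurse" that belong to different SOAs compare as different roles, while the same role recorded with two different review repositories still compares as equal.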
File identifiers (FID) are used to identify single files for fine-grained access control decisions. The logical filenames that are part of a FID can be generated by Sygn as the result of a cryptographic hash applied to the file's content. However, this is not mandatory and can be replaced by any other mechanism for naming files. Sygn access control assumes that when files are replicated on Grid storage, the replicas get the same file identifier as the original. This allows Sygn permissions to be applied consistently to any replica of a file.

File-set identifiers (FSID) address a set of files. This makes it possible to group files together into a set and issue global permissions on that set. Single files identified by FIDs can be added to a set, thus making all permissions concerning the set valid for that specific file. Furthermore, a file-set can be added to another file-set, thus making the first a subset of the second (i.e. all permissions concerning the second file-set will also apply to the first).

Resource identifiers (RESID) are used to identify hardware resources such as storage space on a disk or computing power provided by CPUs.

Role object identifiers (ROID) are identical to roles; the different name is just used to distinguish between roles used as subjects and roles used as objects of Sygn permissions.

Figure 6.4 shows a file-set identifier that has a role as source of authority.

    <FILE_SET_ID>
      <SET_SOA>
        <ROLE_ID>
          ...
        </ROLE_ID>
      </SET_SOA>
      <SET_NAME>
        research_project_42B3701_files
      </SET_NAME>
    </FILE_SET_ID>

Figure 6.4: An example of a file-set identifier (FSID) in Sygn.

6.2.3 Actions

Sygn supports a set of basic actions, which can easily be extended if the need arises. The different actions are specific to the object types, and newly introduced actions should be assigned to one or more object types too. Currently, file-based objects (i.e.
files and file-sets) support the actions read and write, which allow that file-based object to be read or written. For a file-set, this means that the action is applicable to any file in the set. The action add_to_set makes it possible to add a file-based object to a file-set or to grant the permission to do so. Correspondingly, the action remove_from_set gives the right to revoke certificates that add file-based objects to a file-set.

Roles currently only have the activate action, which allows a subject to activate the role and use its permissions.

Hardware resources have the grant and the use actions. The grant action is used to grant a certain amount of hardware use (measured by an external metric) to a user. It therefore has an additional attribute specifying the numerical value that is granted to the user. The use action serves for the actual requests to use the granted resource. The use action therefore only appears in requests and not in permissions. It also has a numerical value attribute, specifying the requested amount of the granted hardware resource.

Figure 6.5 shows the encoding of a grant action with its additional attribute.

    <ACTION SIZE="1000">
      grant
    </ACTION>

Figure 6.5: An example of a grant action of size 1000 in Sygn.

6.2.4 Capabilities

Sygn defines a capability as a legal combination of an object and an action. Capabilities are used to grant or request specific actions on specific objects. To allow future versions of Sygn to ask users if they can produce certain capabilities without disclosing their content, each capability has a unique identifier generated by calculating a cryptographic hash over the XML-encoded data of the capability. Figure 6.6 shows the encoding of a capability giving read access to a file.

    <CAPABILITY>
      <CAPABILITY_ID>
        hEqrpFH6tN1w0FRNjSI0EWIPRi4=
      </CAPABILITY_ID>
      <OBJECT> <UNIQUE_FILE_ID>
        <FILE_SOA> <USER_ID>MIG...BEQ==</USER_ID></FILE_SOA>
        <LOGICAL_FILENAME>+/AbBuY...xe88=</LOGICAL_FILENAME>
      </UNIQUE_FILE_ID></OBJECT>
      <ACTION> read </ACTION>
    </CAPABILITY>

Figure 6.6: An example of a capability allowing to read a file.

A special type of capability expresses that an object is added to a specific file-set. This capability has a second object, which is the target file-set. Figure 6.7 shows the encoding of such a capability.

    <CAPABILITY>
      <CAPABILITY_ID>
        hEqrpFH6tN1w0FRNjSI0EWIPRi4=
      </CAPABILITY_ID>
      <OBJECT> <UNIQUE_FILE_ID>
        ...
      </UNIQUE_FILE_ID></OBJECT>
      <ACTION> add_to_set </ACTION>
      <SECOND_OBJECT> <FILE_SET_ID>
        ...
      </FILE_SET_ID></SECOND_OBJECT>
    </CAPABILITY>

Figure 6.7: An example of a capability that adds a file to a file-set.

6.2.5 Authorization Certificates

The basic building block of a Sygn permission is the Authorization Certificate (AC). It binds a capability to a subject and specifies various conditions related to the use of the capability. Basically, a Sygn AC consists of:

• An identifier generated by calculating a cryptographic hash of the other AC data.
• A creator, who is identified by a UID, and whose private key is used to sign the AC.
• An owner, also called the subject of the permission.
• A capability that is given to the owner by the creator; it contains the object and the action of the permission.
• Validity limits (NOT_BEFORE and NOT_AFTER), represented by two timestamps.
• Restrictions on the use of the permission (NOT_WITH).
• A delegation depth limit.
• A digital signature that makes unauthorized modifications of the permission detectable (see footnote 3).

Figure 6.8 shows an example Sygn permission certificate encoded in XML.
    00 <AUTHORIZATION_CERTIFICATE>
    01   <ID>bA1lxGTYDd3eHT/gr/6B1N4dCWU=</ID>
    02   <CREATOR><USER_ID>MIG...UraQIBEQ==</USER_ID></CREATOR>
    03   <OWNER><USER_ID>MIG...wdUQIBEQ==</USER_ID></OWNER>
    04   <CAPABILITY>
    05     <CAPABILITIY_ID>8k9AiT...cMGbr8U=</CAPABILITIY_ID>
    06     <OBJECT><UNIQUE_FILE_ID>
    07       <FILE_SOA>
    08         <USER_ID>MIG...j4jQIBEQ==</USER_ID>
    09       </FILE_SOA>
    10       <LOGICAL_FILENAME>+/Ab...e88=</LOGICAL_FILENAME>
    11     </UNIQUE_FILE_ID></OBJECT>
    12     <ACTION>read</ACTION>
    13   </CAPABILITY>
    14   <NOT_BEFORE>2003-10-01T10:23:02Z</NOT_BEFORE>
    15   <NOT_AFTER>2004-10-01T12:22:03Z</NOT_AFTER>
    16   <NOT_WITH>
    17     <ROLE_ID> ... </ROLE_ID> ...
    18   </NOT_WITH>
    19   <DELEGATIONS>1</DELEGATIONS>
    20   <SIGNATURE>qcIiRa...mbCZHCH6zBeOtc1OV5Byw=</SIGNATURE>
    21 </AUTHORIZATION_CERTIFICATE>

Figure 6.8: An example of an AC where the creator (line 02) grants read access (line 12) on a file (lines 06-11) to the owner of the AC (line 03), with the right to delegate this capability one step (line 19).

Please note that the file's SOA in this example (line 07) is not identical to the AC creator (line 02). Therefore this permission can only be validated by other ACs that give the creator of this AC the right to delegate read access on the file.

The AC's identifier (line 01) is used for revocation of authorization certificates. An AC may be revoked by its creator or by the SOA of its object.

The AC's owner (line 03) is the subject of the permission. This may either be an individual user represented by a UID or a role. If the AC's owner is a role, any user who can activate that role may use the capability of this AC. If the certificate's capability adds an object to a file-set, the owner of the AC must be the source of authority of this file-set (or the <ANY_SID/> tag).

Sygn ACs support restrictions (line 16) in the form of a sequence of roles, identified by their RID, that may not be used in requests together with this AC. This makes it possible to enforce dynamic separation of duties (DSD) as described in section 4.2.3. These restricted roles are enclosed by <NOT_WITH> tags in the XML representation of an AC.

The delegation depth limit (line 19) is an integer that specifies how many steps the AC's capability may be delegated. A limit of zero means the capability may not be delegated at all. Any limit greater than zero allows the AC owner to delegate the AC's capability in a certificate path, by creating a new AC with a delegation depth limit reduced by at least one. It is enclosed by a <DELEGATIONS> tag in the XML representation of an AC.

Footnote 3: We currently use the RSA signature algorithm with SHA-1 hashing and PKCS padding. However, due to recent advances in cryptographic attacks on SHA-1, this may change in future versions of Sygn.

6.2.6 Certificate Paths

In order to permit multi-step delegation and the use of permissions assigned to roles (which require previous activation of the role), Sygn defines certificate paths that target a certain capability for a certain user. If a path is valid for a certain user, this means that he may use the path's target capability. Each path has a target capability and an ordered set of ACs. In order to prevent denial of service attacks that make use of abnormally long certificate chains (and thus fill the available memory), a maximum length for a path can be configured. Figure 6.9 shows the XML encoding of such a certificate path with only two certificates. The target of this path is a capability that allows to read a file.
If the path is to be valid, the two certificates should grant this capability to a certain user.

    00 <PATH>
    01   <TARGET><CAPABILITY>
    02     <CAPABILITIY_ID>8k9AiT...cMGbr8U=</CAPABILITIY_ID>
    03     <OBJECT><UNIQUE_FILE_ID>...</UNIQUE_FILE_ID></OBJECT>
    04     <ACTION>read</ACTION>
    05   </CAPABILITY></TARGET>
    06   <AUTHORIZATION_CERTIFICATE>
    07     ...
    08   </AUTHORIZATION_CERTIFICATE>
    09   <AUTHORIZATION_CERTIFICATE>
    10     ...
    11   </AUTHORIZATION_CERTIFICATE>
    12 </PATH>

Figure 6.9: An example of a certificate path containing two certificates, that grants read access (line 04) to a file (line 03).

The certificate path verification algorithm is presented in section 6.4.

6.2.7 User requests

In order to gain access to a resource, the user must submit a request to the Sygn Policy Decision Point (PDP). This request follows the Sygn standard user request format (SURF). A request contains the UID of the request issuer and one or more request paths. As each path targets one capability, it is possible to gain access to resources that need multiple capabilities simultaneously (e.g. two roles activated in parallel). Optionally, the Sygn-PDP can be configured to require the request issuer to sign his request. To avoid replays of old requests, the request is assigned a timestamp. Figure 6.10 shows such a signed user request.

    <SURF>
      <REQ_ISSUER>
        <USER_ID> ... </USER_ID>
      </REQ_ISSUER>
      <ISSUE_TIME> 2005-01-18T17:23:00Z </ISSUE_TIME>
      <ISSUERS_SIGNATURE>
        HXAI6CPU....UebjcLBGIdIFjNGs/ikB2pOOK2w=
      </ISSUERS_SIGNATURE>
      <REQ_PATH>
        <PATH NUMBER="0"> ... </PATH>
        <PATH NUMBER="1"> ... </PATH>
        <PATH NUMBER="2"> ... </PATH>
      </REQ_PATH>
    </SURF>

Figure 6.10: An example of a signed request (SURF) containing three certificate paths.

6.2.8 Sygn-PDP responses

The Sygn-PDP responses are also structured XML documents. They consist of three blocks:

• The request status, which is either granted, denied or failed.
A status of failed indicates that a component of the PDP has malfunctioned, causing the verification of the request to fail (e.g. the database storing the meta-data is unavailable).
• A global description of errors (if any) that caused the request to fail or to be denied.
• A detailed description of errors for every path (if any) that caused this path to fail or to be denied.

Figure 6.11 shows such a response.

    <SYGN_RESPONSE>
      <REQUEST_STATUS> denied </REQUEST_STATUS>
      <GLOBAL_ERROR> Path error </GLOBAL_ERROR>
      <PATH NR="0">
        <STATUS> denied </STATUS>
        <ERROR>
          Certificate 3 is invalid: certificate has expired
        </ERROR>
      </PATH>
      <PATH NR="1">
        <STATUS> granted </STATUS>
        <ERROR> none </ERROR>
      </PATH>
    </SYGN_RESPONSE>

Figure 6.11: An example of a Sygn-PDP response to a specific request.

6.2.9 Extensibility

Sygn is designed to be easily extensible. The extension points are the subjects, objects and actions that are supported by the Sygn language.

New Sygn subjects can be added by deriving a new class from the abstract class subject identifier. New subjects need to provide methods for exporting and importing the subject to and from an XML document. Those subjects can subsequently be used as SOA of any Sygn object, with the exception of ROIDs, where the SOA must be a UID.

New Sygn objects can be added in the same way, by deriving a new class from the abstract class object identifier. New objects can have parameters that differ from existing ones, provided they have an object-name and a SOA. As with Sygn subjects, an export and an import method to and from XML must be implemented for new object types. Furthermore, the object must implement a function that returns its SOA.

Sygn actions can also be extended, by adding new action names to the list of Sygn action names. The semantics of these actions must be interpreted by the Sygn-PEP in order to enforce access decisions.
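The extension mechanism described above can be sketched as follows. This is only an illustration under stated assumptions: the thesis does not give the actual class or method names, so SubjectIdentifier, ObjectIdentifier, to_xml, from_xml and soa are hypothetical, and the TableId object type with its TABLE_ID, TABLE_SOA and TABLE_NAME tags is an invented example of a new object type, not part of Sygn.

```python
from abc import ABC, abstractmethod
import xml.etree.ElementTree as ET


class SubjectIdentifier(ABC):
    """Hypothetical base class: every subject can be exported to and imported from XML."""

    @abstractmethod
    def to_xml(self) -> ET.Element: ...

    @classmethod
    @abstractmethod
    def from_xml(cls, element: ET.Element) -> "SubjectIdentifier": ...


class ObjectIdentifier(ABC):
    """Hypothetical base class: objects need XML export/import and must return their SOA."""

    @abstractmethod
    def to_xml(self) -> ET.Element: ...

    @classmethod
    @abstractmethod
    def from_xml(cls, element: ET.Element) -> "ObjectIdentifier": ...

    @abstractmethod
    def soa(self) -> SubjectIdentifier: ...


class UserId(SubjectIdentifier):
    def __init__(self, public_key: str) -> None:
        self.public_key = public_key

    def to_xml(self) -> ET.Element:
        element = ET.Element("USER_ID")
        element.text = self.public_key
        return element

    @classmethod
    def from_xml(cls, element: ET.Element) -> "UserId":
        return cls((element.text or "").strip())


class TableId(ObjectIdentifier):
    """Invented object type, used only to show what an extension has to provide."""

    def __init__(self, soa: UserId, name: str) -> None:
        self._soa = soa
        self.name = name

    def to_xml(self) -> ET.Element:
        root = ET.Element("TABLE_ID")
        ET.SubElement(root, "TABLE_SOA").append(self._soa.to_xml())
        ET.SubElement(root, "TABLE_NAME").text = self.name
        return root

    @classmethod
    def from_xml(cls, element: ET.Element) -> "TableId":
        soa = UserId.from_xml(element.find("TABLE_SOA/USER_ID"))
        return cls(soa, (element.findtext("TABLE_NAME") or "").strip())

    def soa(self) -> SubjectIdentifier:
        return self._soa
```

A round trip such as TableId.from_xml(ET.fromstring(ET.tostring(obj.to_xml()))) should then yield an object with the same name and SOA, which is the property a new identifier type has to guarantee for certificates to be exchanged between clients and servers.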
6.2.10 Formal representation

The Sygn language can be represented as a grammar by the following set of productions. The terminal symbols are: any_sid, public_key, string, url, integer_value, timestamp, signature.

UID -> public_key
ROLE_SOA -> UID
ROLE_NAME -> string
REVIEW_REPOSITORY -> url
RID -> ROLE_SOA, ROLE_NAME, REVIEW_REPOSITORY
SID -> any_sid | UID | RID
FILE_SOA -> SID
LOGICAL_FILENAME -> string
FID -> FILE_SOA, LOGICAL_FILENAME
SET_SOA -> SID
SET_NAME -> string
FSID -> SET_SOA, SET_NAME
RESOURCE_SOA -> SID
RESOURCE_NAME -> string
RESID -> RESOURCE_SOA, RESOURCE_NAME
ROID -> RID
OID -> FID | FSID | RESID | ROID
ACTION -> string, integer_value | string
OBJECT -> OID
SECOND_OBJECT -> FSID
CAPABILITY -> OBJECT, ACTION |
              OBJECT, ACTION, SECOND_OBJECT
CREATOR -> UID
OWNER -> SID
NOT_BEFORE -> timestamp
NOT_AFTER -> timestamp
DELEGATIONS -> integer_value
NOT_WITH -> RID | NOT_WITH, RID
AC -> CREATOR, OWNER, CAPABILITY, NOT_BEFORE, NOT_AFTER, NOT_WITH, DELEGATIONS, signature |
      CREATOR, OWNER, CAPABILITY, NOT_BEFORE, NOT_AFTER, DELEGATIONS, signature
COMMAND -> name | name, UID | name, FID | name, AC | name, RESID |
           name, UID, RESID, integer_value
AC_CHAIN -> AC | AC_CHAIN, AC
PATH -> CAPABILITY | COMMAND |
        CAPABILITY, AC_CHAIN |
        CAPABILITY, COMMAND |
        CAPABILITY, AC_CHAIN, COMMAND
ISSUER -> UID
SURFSIG -> timestamp, signature
PATHES -> PATH | PATHES, PATH
SURF -> ISSUER, PATHES | ISSUER, SURFSIG, PATHES

For the sake of brevity and easy comprehension, this grammar does not address the size limitation of paths and the maximum number of paths in a request. The grammar is context-free and could be made regular by transforming some of the productions; however, this would make it longer and less understandable, since some of the semantic information would be destroyed.

6.3 PDP meta-data

Every Sygn-PDP needs a certain amount of meta-data to operate. This meta-data is stored locally in a relational database.
Since the security of the local resources depends on this meta-data, it is important to protect it adequately. Therefore no user but the local administrator should be able to access this database directly. Updates of the meta-data by remote users are submitted to the Sygn-PDP using an extension of the Sygn access control language that allows the creation of certificate paths that support an administrative command. After validation of such a path, the Sygn-PDP executes the command. Figure 6.12 shows the XML encoding of a certificate path that contains an administrative command. The following paragraphs explain the different meta-data and the conditions under which they can be updated remotely.

<PATH>
  <SYGN_COMMAND>
    <COMMAND_NAME>
      add_file_soa
    </COMMAND_NAME>
    <PARAMETER NR="1">
      <UNIQUE_FILE_ID> ... </UNIQUE_FILE_ID>
    </PARAMETER>
  </SYGN_COMMAND>
</PATH>

Figure 6.12: An example of a path containing an administrative command adding a source of authority for a file to the local meta-data base. Such a path could be included in a request.

The most important meta-data are the sources of authority for all hardware resources and files controlled by the local Sygn-PDP. These form the root of trust for access decisions and are therefore essential to the functionality of the access control system. Sources of authority for hardware resources need to be entered manually into the database, whereas for files they are stored automatically once a new file is copied onto a local storage resource. Note that any user having access to a file can make a copy of this file on the Grid with a new file identifier and thus declare himself source of authority for this copy. There is no reasonable way of preventing this in a distributed resource sharing environment. Therefore traceability must act as a deterrent against such fraudulent behavior. The Sygn-PDP verifies that the logical filename of a new file does not duplicate one of a FID that is already registered in the local meta-data base. Only the source of authority for a file or hardware resource may remotely delete his registration. This is usually done after the file has been deleted or the hardware resource has been taken offline, as subsequently no Grid access to the resource is possible anymore.

Accounting for the use of granted hardware resources is also registered in the Sygn meta-data base. This is done by the Sygn-PEP itself, so remote administration of this meta-data is not possible. Sygn relies on some external measurement to provide the numerical value for the amount of use. This measurement could possibly be integrated in the Sygn-PEP. This means that resource administrators have to define a metric that measures the use of their hardware resource. For storage this can simply be the amount of storage space used, or a formula based on the time during which a certain amount of storage space is used. For CPU power the calculation of a usage metric becomes more difficult. One could imagine a formula based on processor cycles, priority and memory usage. These mechanisms need to be flexible, since the exact execution time of a program often cannot be known prior to its execution.

An important piece of meta-data maintained by Sygn is a blacklist. Requests and permissions issued by any UID on the blacklist are rejected by the local Sygn-PDP. This mechanism allows the local administrators to deny access to specific users and locally invalidate the permissions they have issued. It also makes it possible to rapidly invalidate all permissions issued by a user whose private key has been stolen. The blacklist is maintained by the local Sygn administrators; it therefore cannot be administered remotely.

In order to allow more fine-grained revocation of permissions, the Sygn meta-data also includes a certificate revocation table.
For remote revocation of a certificate, a user must submit the valid certificate, of which he must be the issuer. Alternatively, the source of authority for the capability of a valid certificate is also allowed to revoke it remotely. After verifying that the certificate is valid, the Sygn-PDP enters the certificate’s identifier into the revocation table, thereby invalidating it locally. This table should be updated on a regular basis using Grid-wide Certificate Revocation Lists (CRL) in order to maintain a coherent status of revoked permissions. Following the proposal within SPKI, this CRL should be issued at fixed points in time, previously known to the local resource providers. An attempt to delay a revocation by a denial-of-service attack against the CRL distribution site can therefore be noticed, and appropriate measures can be taken. The revocation table also stores the expiration date of revoked certificates, and Sygn provides mechanisms for the administrator to erase revoked certificates that have expired.

If the tracing mechanisms are turned on, the Sygn meta-data base also stores a log-table that records all requests that pass through the Sygn-PDP. For each request, a locally unique request number and the exact time when the request occurred are stored. Furthermore the contents of the request and the PDP’s response are also recorded. Remote updates of this log-table are not supported. Only the local administrators can read the log-table.

6.4 PDP algorithm

The policy decision algorithm decides if a certificate path grants the permission to use a target capability to a specific user. It is the core of the Sygn architecture and it is based loosely on the principle of complete induction, but includes a global memory.
We start by giving an example path which we use to illustrate how the algorithm works, then we describe a simplified presentation of the algorithm as an automaton and finally we give the full, formal definition of the algorithm.

The example path, which is illustrated in figure 6.13, is composed of four certificates. It gives Edgar the right to read the file document.txt. In the first certificate AC 1, Alice, who is the SOA for document.txt, grants Bob the capability to add document.txt to any file-set. AC 2, which is issued by Bob, now adds document.txt to the file-set Set A. The SOA for Set A is Carol. In AC 3 Carol grants read rights on Set A to a role Role B. The SOA of Role B is Dave. Finally in AC 4, the last certificate of this path, Dave grants activation of Role B to Edgar.

Path target = read on ’document.txt’
Path target-object = ’document.txt’
Source of authority for ’document.txt’: Alice

AC 1: Alice -> Bob: can add target-object to a Set
AC 2: Bob -> Carol: add target-object to Set A (SOA: Carol)
AC 3: Carol -> Role B (SOA: Dave): read Set A
AC 4: Dave -> Edgar: activate Role B

Figure 6.13: A complete, correctly linked path of certificates.

Edgar can present the chain of these four certificates AC 1 to AC 4 to a Sygn-PDP in order to gain read access to the file document.txt.

The input parameters of the algorithm are the target capability, consisting of a target_action and a target_object, and a request_issuer, who is the specific user to whom the path should grant the permission to use the target capability. The variables used are the current_target of the path, which has the initial value target but which may vary from target if the target object is assigned to a set. current_target consists of a current_action, which is always equal to target_action, and a current_object, which is updated when the current object is added to a set.

The automaton representation of the algorithm is illustrated in figure 6.14.
The automaton has three basic states and four intermediate states which deal with the delegation of roles and the declaration of role hierarchies. The transitions between states are triggered by the current certificate of the path. The nature of the certificate that triggers a change to a specific state is indicated within the state. Transitions can be subject to further conditions, not directly related to the current certificate. These are indicated next to the transition arrows. The basic states are:

• Granting the permission to use the current target capability, and the associated state that can be entered if this permission is granted to a role.
• Addition of the current object to a set, and the associated state that can be entered if the set’s SOA is a role. The set to which current_object is added becomes the new current_object. In this state, set hierarchies are declared implicitly if current_object itself is a set.
• Granting the permission to add current_object to a set, and the associated state that can be entered if this permission is granted to a role.

In our example, the value of target and the initial value of current_target would be (document.txt, read). The request_issuer would be Edgar. The first certificate AC 1 leads the automaton to State I, since it grants the permission to add current_object to a set. AC 2 triggers a transition from State I to State II, since it adds current_object to a set. current_object changes from document.txt to Set A. With AC 3 the automaton passes to State III, since the permission to use the capability current_target is granted. However, since AC 3 is granted to a role, the automaton immediately passes on to State IIIa. Finally AC 4 grants the activation of Role B to Edgar; since Edgar is the request_issuer, we pass to the terminal state of the automaton.

The formal implementation of the algorithm has three condition sets: the start-conditions, the induction-conditions and the end-conditions.
In order to be valid, the first certificate of the path must fulfill the start-conditions, every pair of consecutive certificates must fulfill the induction-conditions and the last certificate must fulfill the end-conditions. The algorithm’s global memory stores the current target capability of the path, which varies through the verification procedure if the object of the target capability is assigned to a set. If a permission in the certificate path is granted to a role, the global memory stores this role as current_role for checking subsequent delegations of this role. The memory also includes a has_target flag that indicates if the owner of the current certificate has been granted the permission to use the target capability, a can_add_to_set flag that indicates if the owner of the current certificate is empowered to add the current object to a set, and a role_valid flag that indicates if current_role has been set and may be delegated in the next certificate of the path.

[Figure 6.14 (automaton diagram, not reproduced). States: State 0; State I: grant add_to_set on current_object; State II: add current_object to a set, current_object := set; State III: grant current_target; States Ia, IIa and IIIa: grant role activation, delegation or role hierarchy declaration. Transition conditions shown in the diagram include: grant to role, SOA of current_object is a role, set SOA is a role, set hierarchy declaration, delegation, grant to request_issuer.]

Figure 6.14: The informal representation of the Sygn algorithm as an automaton. A state change is initiated by a certificate. The text of the states indicates which type of certificate will initiate a transition to this state. Further conditions on state changes, not related to the current certificate, are indicated next to the transition.

INPUT:
  request_issuer: The user identifier of the request issuer.
  A target capability with target := (target_object, target_action).
  An ordered set path_cert_chain := {ac_1, ..., ac_n} of n authorization certificates: ac_i := (creator_i, owner_i, cap_i).

OUTPUT:
  true if the path grants the right to use the target capability to the request issuer, false otherwise.

VARIABLES:
  current_target := (current_object, current_action): The current target capability.
  current_role: The current role that may be delegated (if any).
  creator_i: The creator of ac_i.
  owner_i: The owner of ac_i.
  cap_i: The capability of ac_i.
  set_i: If ac_i adds an object to a set then set_i is this set, undefined otherwise.
  can_add_to_set: Boolean variable, indicates that the owner of this certificate can add current_object to a set.
  has_target: Boolean variable, indicates that the owner of this certificate was granted the right to use the target capability.
  role_valid: Boolean variable, indicates if the current role can be delegated.

FUNCTIONS:
  soa: {object identifiers ∪ subject identifiers} → subject identifier. If the input is an object identifier, returns the SOA for that object; if the input is a role, returns that role’s SOA; if the input is a user identifier, returns that identifier unchanged.
  grants_add_to_set: capabilities → {true, false}. Returns true if the capability grants the permission to add the current object to a set, and false otherwise.
  adds_to_set: capabilities → {true, false}. Returns true if the capability adds the current object to a set, and false otherwise.
  grants_role: capabilities × roles → {true, false}. Returns true if the capability grants the permission to activate the role, and false otherwise.
  is_role: subject identifiers → {true, false}. Returns true if the subject is a role, and false otherwise.

// Start conditions verification
current_target := target
// The initial current_target is the path’s target.
if creator_1 ≠ soa(soa(current_object)) then
    // The first certificate must be created by the SOA of current_object.
    // The double use of the soa() function is due to the fact that
    // this SOA may be a role. In that case we want creator_1 to be
    // the role’s SOA.
    return false
end if
if cap_1 = current_target and grants_add_to_set(cap_1) = false then
    // This certificate grants current_target to someone, and not
    // the permission to add current_object to a set.
    has_target := true
    can_add_to_set := false
    if is_role(owner_1) then
        role_valid := true
        current_role := owner_1
    else
        role_valid := false
    end if
else if is_role(soa(current_object)) and grants_role(cap_1, soa(current_object)) then
    // Here the SOA of current_object is a role and the permission
    // to activate this role is granted in this certificate.
    has_target := true
    can_add_to_set := true
    current_role := soa(current_object)
    role_valid := true
else if adds_to_set(cap_1) = true then
    if owner_1 ≠ soa(soa(set_1)) then
        // The algorithm requires the owner of a certificate
        // that adds an object to a set to be the set’s SOA.
        return false
    end if
    // If cap_1 adds current_object to a set, this set becomes the
    // new current_object for the path. The creator of the next
    // certificate is the SOA of this set.
    current_object := set_1
    has_target := true
    can_add_to_set := true
    if is_role(owner_1) then
        role_valid := true
        current_role := owner_1
    else
        role_valid := false
    end if
else if grants_add_to_set(cap_1) = true then
    // If cap_1 grants the permission to add current_object to a set,
    // this means that the owner of ac_1 has not been granted
    // the permission to use the current_target capability. Therefore
    // the path should not end here.
    has_target := false
    can_add_to_set := true
    if is_role(owner_1) then
        role_valid := true
        current_role := owner_1
    else
        role_valid := false
    end if
else
    // Neither has target been granted to owner_1, nor has a role
    // that is SOA of target been granted, nor has the
    // current_object been added to a set, nor has owner_1 been
    // granted the right to add current_object to a set. Therefore
    // this path starts incorrectly.
    return false
end if

// Induction conditions verification, go through the certificate path
for i := 1 to n − 1 do
    if creator_{i+1} ≠ soa(owner_i) then
        // creator_{i+1} must be owner_i or, if owner_i is a role,
        // then that role’s SOA must be creator_{i+1}.
        return false
    end if
    if has_target = true and cap_{i+1} = current_target then
        // Here the permission to use current_target is granted and not
        // the permission to add the current_object to a set.
        can_add_to_set := false
        if is_role(owner_{i+1}) then
            role_valid := true
            current_role := owner_{i+1}
        else
            role_valid := false
        end if
    else if role_valid = true and grants_role(cap_{i+1}, current_role) = true then
        // Here the current role is delegated. The internal memory
        // does not change.
        NOP
    else if is_role(owner_i) = true and grants_role(cap_{i+1}, owner_i) = true then
        // The owner of certificate ac_i was a role. Therefore this
        // role can be delegated in certificate i + 1.
        current_role := owner_i
        role_valid := true
    else if can_add_to_set = true then
        if adds_to_set(cap_{i+1}) = true then
            if owner_{i+1} ≠ soa(soa(set_{i+1})) then
                // The algorithm requires the owner of a certificate
                // that adds an object to a set to be the set’s SOA.
                return false
            end if
            // Here the current_object is added to a set. Therefore
            // the set becomes the new current_object and, as
            // owner_{i+1} is the set’s SOA, he also has the
            // permission to use the target capability.
            current_object := set_{i+1}
            has_target := true
            if is_role(owner_{i+1}) then
                role_valid := true
                current_role := owner_{i+1}
            else
                role_valid := false
            end if
        else if grants_add_to_set(cap_{i+1}) = true then
            // Here the permission to add current_object to a set is
            // granted. This means that owner_{i+1} has not been granted
            // the permission to use the target capability. Therefore
            // the path should not end here.
            has_target := false
            if is_role(owner_{i+1}) then
                role_valid := true
                current_role := owner_{i+1}
            else
                role_valid := false
            end if
        else
            // The capability should either have added current_object to
            // a set or should have granted the right to add it to a set.
            return false
        end if
    else
        // Neither has target been granted, nor has the current_object
        // been added to a set, nor has owner_{i+1} been granted the
        // right to add current_object to a set. This path is false.
        return false
    end if
end for

// End conditions verification
if has_target = false then
    // The path may not end if the target capability is not
    // granted to owner_n.
    return false
else if soa(owner_n) ≠ request_issuer then
    // The permission to use the target capability must be granted
    // to the request_issuer or to a role for which
    // the request_issuer is SOA.
    return false
else
    return true
end if

If we apply this algorithm to the example, the initial value of current_target is equal to target = (document.txt, read). The path complies with the start conditions since the creator of AC 1 is Alice, who is the SOA of the current object.
Since the permission to add current_object to a set is granted, the condition grants_add_to_set(cap_1) evaluates to true. Therefore the has_target flag is set to false and the can_add_to_set flag is set to true. Since the owner of AC 1 is not a role, the role_valid flag is set to false.

We now enter the verification of the induction conditions, where AC 2, which is issued by Bob, adds document.txt to the set Set A. AC 2 complies with the induction conditions, since Bob, the owner of AC 1, is the creator of AC 2. The flag can_add_to_set is equal to true and the condition adds_to_set(cap_{i+1}) evaluates to true. Furthermore Carol is the owner of AC 2. Therefore the current_object is set to Set A. The can_add_to_set and the has_target flags are set to true, since Carol as SOA of Set A has the permission to use the target capability and can grant the permission to add Set A to other sets.

In AC 3 Carol grants read rights on Set A to a role Role B. The SOA of Role B is Dave. AC 3 also complies with the induction conditions, since Carol is its creator. The conditions has_target = true and cap_{i+1} = current_target both evaluate to true. Therefore the has_target flag stays true and the can_add_to_set flag is set to false. Since the owner of AC 3 is a role, the role_valid flag is set to true and current_role is set to the value Role B.

Finally in AC 4, the last certificate of this path, Dave grants activation of Role B to Edgar. This certificate complies with the path’s induction conditions, since the conditions role_valid = true and grants_role(cap_{i+1}, current_role) evaluate to true. The internal memory does not change and the end conditions are checked. Since has_target is true and Edgar is the owner of AC 4, the end conditions are also fulfilled and the algorithm returns true.

6.5 Sygn performance

In this section we give the results of some performance tests that we made with the Sygn-PDP.
We measured the execution time of the PDP’s decision algorithm for the processing of a request, without counting the time for network connection setup, authentication and request transmission. The tests were run on a 1.9 GHz Pentium 4 under SuSE Linux 9.0. For cryptographic primitives we used the Crypto++ library (version 5.1) by Wei Dai (available from http://cryptopp.com). All the program code was written in C++.

The algorithm includes the verification of the validity of all certificates by checking the expiration date, the signature and the local certificate revocation list (the latter requires a MySQL database query). Each parameter combination was run through the PDP 5000 times and the total execution time was averaged. Except for the path complexity checks, all paths were direct delegations that did not involve roles or file sets. Except for the multi-path checks, all requests included only one certificate path of variable length. The signatures of the certificates were created and verified with the RSA digital signature algorithm, using PKCS padding and the SHA1 hash-function.

We have examined the following factors:

• Impact of the length of the path on the execution time (i.e. number of certificates in the path). We measured paths ranging from one to seven certificates.
• Impact of the length of the signature keys. We used 1024, 2048 and 4096-bit RSA keys.
• Impact of the database queries on execution time.
• Complexity of the path. We used paths that delegated permissions through file sets and roles.
• Number of paths in the request (while keeping the number of certificates constant) in order to verify the overhead of setting up a path verification.

The complexity of the path had no significant impact on the duration of the PDP’s decision procedure. The number of paths in the request has a very low impact on the execution time (about 1 ms).
The two main factors that influence the execution time are the length of the signature keys and the number of certificates in a path. Therefore the signature verification is a key factor in the execution time of the algorithm. The measurements suggest that the decision time increases linearly with the number of certificates in the request. A third factor for the execution time is the database queries; however, their impact is comparatively small (about 2 ms for a path containing seven certificates). Figure 6.15 shows the results of measurements with different key sizes and path lengths.

[Figure 6.15 (plot, not reproduced): decision time in ms (vertical axis, 0 to 40) against the number of certificates in the request (horizontal axis, 0 to 8), with one curve per certificate signature key size (1024, 2048 and 4096 bits).]

Figure 6.15: The performance of the Sygn-PDP for different certificate signature key lengths and different numbers of certificates in the path.

As these results indicate, the time needed for the decision finding process in the Sygn-PDP is extremely short, compared to the time that is taken by the general overhead of setting up a connection, transferring data or executing jobs.

6.6 Discussion

Sygn proposes to use a permission push model as described in section 4.3. Since Sygn ACs are not solely intended as short-lived permission certificates, but also for long-term permission storage, it is necessary to provide a revocation mechanism to be able to invalidate a permission before the AC that granted it expires. This drawback, which is inherent to the push model, has to be weighed against the advantages of the push model. The user can choose and submit exactly the ACs needed for the requested actions, which makes it possible to follow the least privilege principle. Furthermore the user can choose exactly which permissions are disclosed to the different Grid services.
The approach of binding permissions to public keys has a drawback compared to binding permissions to distinguished names: when the corresponding private key is stolen, all permissions bound to the public key must be revoked. It is not sufficient to rely on the revocation of the authentication certificate containing this public key, since the Sygn-PEP only verifies that the request issuer is correctly authenticated. Therefore a correctly authenticated request issuer may use delegated permissions created with a stolen private key. On the other hand, the direct binding of permissions to public keys makes the verification of permission integrity easier and requires a much smaller authorization data transfer volume. If a permission is bound to a distinguished name in a certificate, the entire certificate chain for the public key of the creator of this certificate is necessary in order to verify his digital signature. Such a certificate chain would have to be submitted for every authorization certificate in a path.

A central feature of Sygn is the support for decentralized permission granting. Different SOAs can administer access control to fine-grained resources without intervention of a third party. However, this feature makes it impossible to know for sure the entire set of permissions given to a specific user or role. Therefore the results of any permission review functions (which are required in standard RBAC, see section 4.2.3) are not necessarily complete. Enforcing such completeness would require a centralized validation of all permissions and would negate the advantage of decentralized, ad-hoc permission granting. Therefore review functions have to rely on the goodwill of the respective SOAs. It is the SOA’s responsibility to store duplicates of all permissions they issue in corresponding review repositories. Another effect of this situation with regard to RBAC is that it becomes impossible to enforce a static separation of duties (i.e.
tuples of permissions that cannot be granted to the same entity). However, since a dynamic separation of duties can be enforced through the use of restrictions, this second drawback is only of lesser concern.

Another central feature of Sygn is its set of delegation mechanisms. Following the approach of SPKI [39] (see also section 4.5.3) we have examined three choices for delegation control:

1. No control. All users can delegate any of their permissions.
2. Boolean control. A flag specifies if delegation is allowed or not.
3. Delegation depth control. A non-negative integer specifies how many levels of delegation are allowed.

What speaks for the first option is that there is no way to prevent users from sharing the private key they use for authentication with others. Alternatively, users could also set up a service that signs any challenge, allowing others to impersonate them without having actual access to the private key. Therefore attempts to restrict delegation would be ineffective and would possibly weaken the protection of private key material. We have not chosen this option since we believe that security education of users should prevent such situations from happening. If users have such bad security practices as giving away their private keys, no system will be able to really protect any resources on a Grid from unauthorized access. A way to enforce secure handling of private key material by untrained users could be to deploy hardware tokens that perform private key operations but keep the key material locked on the device.

The argument in favor of both other options is that it can be necessary to specify if a permission can be delegated or not. If entities were entitled to delegate any permissions they hold, this would increase the risk of misuse.
Since the SOA of a resource can be held partially responsible for a misuse, even if a delegated permission was used, it is important that a differentiation can be made between entities that are given a permission and entities that are allowed to delegate the same permission to others.

The creators of the SPKI Certificate Theory argue that depth control does not give real control over the proliferation of a delegated permission, since only the depth and not the width of the delegation tree can be controlled. Even though this argument is valid, we still believe that depth control is to be preferred over boolean control, since it makes it possible to enforce a flat delegation tree and therefore makes it easier to track down the responsible secondary SOA if a misuse of permissions is detected. Another point in favor of depth control is that Grid architectures commonly use proxying mechanisms to create temporary credentials out of long-term credentials (see [100, 93] for details). These proxy credentials are then delegated the permissions required to execute a given task. In a boolean delegation control architecture that would mean that every user needs delegation power over all of his long-term permissions, thus making the delegation control almost ineffective. With depth control, this situation can be resolved without negating the benefit of delegation control, by allowing one level of delegation for permissions that need to be given to a proxy. We have therefore chosen to implement delegation depth control in Sygn.

Sygn offers support for RBAC, and it can also be used in parallel to create and handle discretionary access control (DAC) permissions. This makes it possible to adapt the type of permissions to the situation in which they are used. If a complex (possibly hierarchical) permission structure with authorizations based on tasks is present, RBAC can be used.
For ad-hoc permission granting or in similar situations where RBAC is too cumbersome to use, Sygn can handle DAC permissions that are easier to create and use.

In [9] Bertino et al. suggest that access control for hardware resources should already be considered in the resource allocation process, to avoid allocating resources to which the request issuer does not have access. These considerations have to be taken into account when using Sygn for hardware resource access control. Clearly the resource broker needs to be able to verify whether an allocated resource is really accessible for a certain user request. A possible approach would be to have the resource broker submit authorization requests to the local Sygn-PDPs of the resources, on behalf of the request issuer.

Finally the structure of Sygn requests makes it possible to support scenarios where multiple permissions are needed simultaneously. A simple example of this would be the replication of a Grid file. Such an operation requires read permission on the file in question and access to a certain amount of disk space at the replication site. By using a Sygn request with multiple paths, the authorizations for such operations can be grouped together in a convenient way.

Chapter 7

CryptStore encrypted storage

This chapter describes the design and implementation of our encrypted storage architecture CryptStore. Section 7.1 presents CryptStore and motivates the key-servers as the central idea of the CryptStore system. We then present the architecture of CryptStore and how CryptStore can be used on a Grid in section 7.2. Section 7.3 presents the syntax and the semantics of the CryptStore meta-data. In section 7.4 we present the algorithms of the CryptStore architecture. We analyze the risk of attacks on CryptStore and how to mitigate these in section 7.5 and conclude with a discussion of the different CryptStore design choices in section 7.6.
7.1 Basic concepts of CryptStore

The CryptStore architecture gives an entity that is the SOA of a file the option to encrypt it before storing it on the Grid. CryptStore provides a file administration client that can be integrated in a Grid interface or used separately. This tool performs the encryption of the file and generates the necessary meta-data. In order to give authorized users access to decryption keys, key-servers are deployed on various Grid sites as part of the CryptStore architecture. They function as repositories for decryption keys and can be queried by users who wish to decrypt data they are authorized to access. To avoid making key-servers valuable targets for attacks we have chosen to distribute shares of the various encryption keys on multiple key-servers using Shamir’s secret sharing algorithm [90]. The characteristics of this algorithm are described in section 7.4.

The file administration client handles the tasks related to the encryption of a file, the generation of key-shares and the connection to the key-servers in order to store the key-shares and related meta-data. To access an encrypted file, a file user client is provided by CryptStore. It handles the recovery of key-shares from the key-servers, the reconstruction of the key from the shares and the decryption of the file.

The key-shares are subject to access control based on the requesting user’s file access permissions (i.e. if the access control grants a user access to a file, this user also has access to the decryption key of that file). The key-servers therefore have a generic access control interface that can be instantiated to make them interact with any access control system present on the Grid. If the access control system works in a decentralized way, an instance of it can be co-located with the key-server.
If the access control mechanism uses a permission push architecture, the user has to provide the necessary credentials to the file user client to enable it to recover the key on his behalf. If the file clients are not part of a Grid interface they can access key-servers using a simple client-server model, where the key-servers are queried by the clients. The connection is authenticated and secured by SSL and requires the key-server to open a port on the host machine.

CryptStore deals with four distinct scenarios concerning access control permissions of encrypted files:

• The users that are subject of the permission and the files that are object of the permission are individually known when the permission is created and do not change. Such an authorization structure is illustrated by figure 7.1. In this situation the decryption key could be transmitted directly to all authorized users via a secure channel (e.g. direct connection secured with TLS/SSL or IPsec, encrypted mail).

• The users that are subject of the permission are individually known as above. The files that are object of the permission are identified by a file-set which may change dynamically as files are added to or removed from the set. Figure 7.2 shows such an authorization structure. In such a scenario the lockbox concept presented in chapter 5 could be used to provide authorized users with decryption keys.

• The files that are object of the permission are known and do not change. The users that are subject of the permission are identified by a group or role, and membership may change dynamically. This authorization structure is represented in figure 7.3. In such a scenario a group key could be used that gives all members of the group access to lockboxes containing the decryption keys. However every time a member is removed from the group, the group key and all the file keys have to be updated, all the files have to be re-encrypted and the lockboxes have to be re-created. Since the files are fixed and known this may still be feasible.

• The users that are subject of the permission and the files that are object of the permission are both identified by a group (or role, or set) and membership of both changes dynamically. This case is shown in figure 7.4. For this setting we need key-servers that store key material and decide dynamically who may access a key. Authorized users can contact the key-servers and recover the necessary key-shares in order to decrypt a file.

Figure 7.1: A simple scenario of authorizations. (Diagram: the data SOA gives a permission to a user or static user group, who uses it to gain access; the permission allows access to a file or static set of files.)

Figure 7.2: An authorization scenario, where the permission object is a set of files. (Diagram: as in figure 7.1, but the data SOA adds or removes objects to/from a dynamic set of files.)

Figure 7.3: An authorization scenario where the permission subject is a group (or a role) consisting of multiple users. (Diagram: a group SOA adds or removes users to/from a dynamically changing user group; the data SOA gives the permission to this group; the permission allows access to a file or static set of files.)

Figure 7.4: An authorization scenario where the permission subject is a group (or a role) consisting of multiple users and the permission object is a set of files. (Diagram: data SOAs add or remove files to/from a dynamic set of files; a group SOA adds or removes users to/from a dynamically changing user group; the set SOA gives the permission to the group.)

7.2 Architecture and use of CryptStore

The CryptStore system consists of three components that are deployed on the Grid: the administrator-client, the user-client and the key-server system.
The administrator-client allows file owners to perform the following actions:

• Encryption of a file.
• Creation of a message authentication code (MAC) with the same key used to encrypt the file in order to ensure file integrity.
• Creation of key-shares.
• Storage of key-shares on a key-server.
• Storage of encryption parameters and key-server location information in the file meta-data.
• Update of key-server data.

The administrator-client can be deployed as part of the user Grid interface or as a stand-alone Grid tool.

The user-client allows Grid users that want to access an encrypted file to perform the following actions:

• Extraction of key-server locations from the meta-data of an encrypted file.
• Access to a key-server in order to retrieve a key-share (the user must provide authentication and possibly authorization tokens).
• Reconstruction of a key from retrieved key-shares.
• Decryption of an encrypted file, including extraction of the encryption parameters from the meta-data of the encrypted file.

Like the administrator-client, the user-client can be deployed as part of the user Grid interface or as a stand-alone Grid tool.

The key-server system is set up at different Grid resource sites and provides the following services:

• Storage of key-shares and the associated file-id.
• User-client interface that gives access to key-shares.
• Generic interface to access control services to determine who may access a key-share based on file access permissions.
• Deletion of key-shares by the owner of an encrypted file.

Figure 7.5: The use of CryptStore for encrypted file storage and access. (Diagram: 1.) the data owner, using the administrator-client, encrypts data and creates key shares; 2.) stores key shares on different key servers; 3.) stores data and addresses of the key servers on the storage server; 4.) gives permissions to a user or user group; 5.) the data consumer, using the user-client, retrieves data and addresses of the key servers; 6.) gets key shares from the key servers; 7.) reconstructs the key from the key shares and decrypts the data. The legend distinguishes transfers for which a secure channel is optional from those for which it is mandatory.)

The update of a key-share is transparent to a key-server, since the administrator-client handles this as the deletion of the old share and the storage of the new share separately. The key-server uses some standard C++ libraries and a C++ interface library for the MySQL database system. These libraries have to be deployed on the Grid resource site in order to run the key-server.

Figure 7.5 illustrates how these components work together. In a first step, the owner of a file encrypts the data and creates the key-shares using his administrator-client. In a second step he stores the key-shares on different randomly selected key-servers. He can then generate the meta-data header of the encrypted file that contains encryption parameters and the locations of the key-servers. In a third step he stores the encrypted data (including the meta-data header) on a Grid storage server. The fourth step, which can be temporally disconnected from the first three, is to use the Grid access control mechanism to give some user or user group access to the encrypted file. This step is performed outside the CryptStore architecture, using available Grid access control tools such as Sygn.

The new actor now is the user that wants to access the encrypted file. With his access permissions he retrieves the encrypted file from the storage server using normal Grid file access mechanisms in a fifth step. The user can now read the associated meta-data from the file header using the CryptStore user-client. With this information the user-client can query the corresponding key-servers in a sixth step in order to retrieve the key-shares. In the final step the user-client reconstructs the key from the key-shares and uses it to decrypt the data.
7.3 CryptStore meta-data

CryptStore requires a certain amount of meta-data to function. The current design of CryptStore adds these meta-data in unencrypted form to the headers of the encrypted files, as they are required to locate key-servers and to configure the decryption algorithm. The rationale behind this is that the size of the meta-data will generally be small compared to the size of the file and therefore a small increase in file size will not be relevant. We are however aware of situations where this assumption would not hold true. For example if we want to encrypt database entries, where the table columns are of fixed size, an increase of the size of the column data may not be possible. In such a case the design of CryptStore would have to be marginally changed in order to allow the storage of the meta-data externally to the encrypted file. We discuss the reasons why we have chosen the first option in section 7.6.

The meta-data required by CryptStore are the following:

• The encryption algorithm and the encryption mode if several modes are possible (e.g. ECB, CBC, CFB for block ciphers).
• The initialization vector (sometimes also called nonce or tweak depending on the encryption mode) used, if any.
• The encryption key size in bytes.
• Optionally: The algorithm used for the generation of the message authentication code.
• Optionally: The message authentication code.
• The threshold value of the secret sharing algorithm (i.e. the number of key-shares required to recover the key).
• Information about the key-servers that store shares of the file decryption key. If a simple client/server model is used these are the URLs and port numbers under which the key-servers are accessible. Chapter 8 discusses other possible forms of deploying CryptStore, which may make changes in this format necessary.

CryptStore itself requires no further meta-data.
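As a minimal sketch of how a client might read the meta-data items listed above, the following Python fragment parses a header in the XML layout of figure 7.6. It is an illustration under stated assumptions and not the CryptStore implementation: the wrapping `HEADER` root element, the `CryptStoreHeader` container and the function names are hypothetical, and the initialization vector and digest are assumed to be base64-encoded, as their appearance in the figure suggests.

```python
import base64
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

# Header text copied from figure 7.6.
EXAMPLE_HEADER = """
<ENCRYPTION_PARAMETERS>
  <ALGORITHM> AES_CFB </ALGORITHM>
  <KEYSIZE> 16 </KEYSIZE>
  <IV> 7msW8+augercHfk0oE4zjA== </IV>
</ENCRYPTION_PARAMETERS>
<MAC>
  <ALGORITHM> HMAC_SHA1 </ALGORITHM>
  <DIGEST_VALUE> qVqCQDDlwODNtcGqFKZ47olQ524= </DIGEST_VALUE>
</MAC>
<KEYSHARING_INFORMATION>
  <THRESHOLD> 2 </THRESHOLD>
  <KEYSHARE_SERVER> if.insa-lyon.fr:4711 </KEYSHARE_SERVER>
  <KEYSHARE_SERVER> nn.cern.ch:1234 </KEYSHARE_SERVER>
  <KEYSHARE_SERVER> liris.cnrs.fr:1764 </KEYSHARE_SERVER>
</KEYSHARING_INFORMATION>
"""


@dataclass
class CryptStoreHeader:  # hypothetical container, not a CryptStore type
    algorithm: str
    key_size: int                 # key size in bytes
    iv: Optional[bytes]           # absent if the mode needs no IV
    mac_algorithm: Optional[str]  # the MAC is optional
    mac_value: Optional[bytes]
    threshold: int                # shares needed to recover the key
    key_servers: List[str]        # "host:port" entries


def _text(root: ET.Element, path: str, required: bool = True) -> Optional[str]:
    """Return the stripped text of the element at path, or None if optional."""
    node = root.find(path)
    if node is None or node.text is None or not node.text.strip():
        if required:
            raise ValueError(f"missing header field: {path}")
        return None
    return node.text.strip()


def parse_header(header_text: str) -> CryptStoreHeader:
    # The header has several top-level elements, so it is wrapped in an
    # artificial root element before parsing (an assumption of this sketch).
    root = ET.fromstring("<HEADER>" + header_text + "</HEADER>")
    iv = _text(root, "ENCRYPTION_PARAMETERS/IV", required=False)
    mac_alg = _text(root, "MAC/ALGORITHM", required=False)
    mac_val = _text(root, "MAC/DIGEST_VALUE", required=False)
    servers = [
        node.text.strip()
        for node in root.findall("KEYSHARING_INFORMATION/KEYSHARE_SERVER")
        if node.text and node.text.strip()
    ]
    return CryptStoreHeader(
        algorithm=_text(root, "ENCRYPTION_PARAMETERS/ALGORITHM"),
        key_size=int(_text(root, "ENCRYPTION_PARAMETERS/KEYSIZE")),
        iv=base64.b64decode(iv) if iv else None,
        mac_algorithm=mac_alg,
        mac_value=base64.b64decode(mac_val) if mac_val else None,
        threshold=int(_text(root, "KEYSHARING_INFORMATION/THRESHOLD")),
        key_servers=servers,
    )
```

Calling `parse_header(EXAMPLE_HEADER)` yields the algorithm name, a 16-byte key size, a threshold of two and the three key-server addresses. A real client would additionally check that the number of listed key-servers is at least the threshold.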
However if a decentralized access control is deployed on the Grid, an access control server may be co-located with the key-server. This server checks which users are authorized to obtain which keys based on their normal file authorizations. This access control server needs a meta-database containing the SOAs of the files for which key-shares are stored, in order to have a root of trust for its access decisions. These meta-data are collected at the moment a file SOA stores a new key-share in a key-server’s database. Furthermore the SOA of a file can also choose to store a message authentication code in the file header. This makes it possible to check the integrity of the encrypted file.

The meta-data stored in the file header are encoded in XML, as this is a widespread structured format that is still human readable. Figure 7.6 shows an example of such a file header. It specifies that the following file was encrypted with the AES algorithm, using the cipher-feedback mode, with a key size of 16 bytes, using the given initialization vector. It features a message authentication code generated with HMAC using the SHA-1 hash and gives the value for message authentication. Furthermore it specifies that at least two key-shares are needed to recover the key. Then it gives the addresses, including the port numbers, of three key-servers that each hold one key-share. This last information implies that three key-shares were initially created. Appendix B gives an XML Schema definition of CryptStore meta-data headers.

7.4 CryptStore algorithms

CryptStore uses different cryptographic algorithms and protocols to perform its functions. In this section we present those algorithms and motivate their choice.

7.4.1 Cryptographic algorithms

CryptStore uses three different cryptographic algorithms: a file encryption algorithm, a message authentication code (MAC) for file integrity and Shamir’s secret sharing scheme.
For file encryption, we have chosen the AES algorithm, since it is standardized, widespread and very efficient. The AES is a block cipher algorithm with variable key length and a block size of 128 bits. We have chosen a block cipher rather than a stream cipher because stream ciphers do not allow random access to parts of encrypted files and re-encryption of a modified file with the same key is not secure when using a stream cipher. Since medical data files may become very large, and rapid access to some part of the file may be necessary at some time, it is important to support random access to parts of an encrypted file.

    <ENCRYPTION_PARAMETERS>
      <ALGORITHM> AES_CFB </ALGORITHM>
      <KEYSIZE> 16 </KEYSIZE>
      <IV> 7msW8+augercHfk0oE4zjA== </IV>
    </ENCRYPTION_PARAMETERS>
    <MAC>
      <ALGORITHM> HMAC_SHA1 </ALGORITHM>
      <DIGEST_VALUE>
        qVqCQDDlwODNtcGqFKZ47olQ524=
      </DIGEST_VALUE>
    </MAC>
    <KEYSHARING_INFORMATION>
      <THRESHOLD> 2 </THRESHOLD>
      <KEYSHARE_SERVER> if.insa-lyon.fr:4711 </KEYSHARE_SERVER>
      <KEYSHARE_SERVER> nn.cern.ch:1234 </KEYSHARE_SERVER>
      <KEYSHARE_SERVER> liris.cnrs.fr:1764 </KEYSHARE_SERVER>
    </KEYSHARING_INFORMATION>

Figure 7.6: An example of a CryptStore meta-data header specifying how the file was encrypted, a message authentication code for verifying the file’s integrity, and information on where key-shares can be retrieved.

As mode of operation for the AES we have chosen the cipher block chaining mode (CBC) with ciphertext stealing (CTS) as presented in section 5.1. This mode of operation ensures that a manipulation of the encrypted data in order to change the plaintext becomes difficult and that patterns in the plaintext blocks are hidden in the encrypted blocks. Furthermore CTS enables us to keep the size of the ciphertext equal to the size of the plaintext.
In order to ensure the integrity of the data, we have chosen to use message authentication codes rather than a public key digital signature scheme. The reason for this is that multiple users may be updating the same file. Therefore if a signature scheme was used, these users would either have to share a key-pair for signing this file or would have to provide their public key for signature verification to every potential reader of the file. This would make cumbersome public key distribution mechanisms necessary and could easily lead to confusion about which public key was used for the actual signature of the file.

A message authentication code uses a secret key to create a code that allows the verification of the file integrity by other users having access to this secret key. In CryptStore we use the same key for the file encryption and the generation of the code. This means that a user who has only read rights can update the code for an encrypted file. We assume such manipulations are prevented by Grid access control mechanisms that should not allow such a user to write back a modified version of the file. We have chosen the HMAC algorithm [6] with the SHA-1 hash function to generate the message authentication codes. The reason for this is that HMAC is standardized and widely used (e.g. in the IPsec protocol).

Figure 7.7: The concept of Shamir’s secret sharing algorithm. The number of shares is n, the threshold to reconstruct the secret k is m. The algorithm produces a set of n shares s_i; {t_i} denotes a set of different indices 0 ≤ t_i < n. (Diagram: the secret k is passed with the parameters m, n to "Share secret", which outputs the shares s_0, ..., s_{n−1}; any m shares s_{t_0}, ..., s_{t_{m−1}} are passed to "Reconstruct secret", which outputs k.)

The algorithm used to split the file encryption key into shares is Shamir’s secret sharing scheme [90].
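To make the threshold property of such a scheme concrete, the following Python sketch shares an integer secret over a prime field and reconstructs it by Lagrange interpolation. It is a minimal illustration and does not reproduce the CryptStore implementation: the field modulus, the function names and the encoding of the key as an integer are assumptions chosen for this example.

```python
import secrets
from typing import List, Tuple

# Assumed field modulus: the Mersenne prime 2**521 - 1, large enough for
# a 128-bit or 256-bit key interpreted as an integer.
PRIME = 2**521 - 1

Share = Tuple[int, int]  # (x, f(x)) with x != 0


def split_secret(secret: int, n: int, m: int) -> List[Share]:
    """Create n shares of secret; any m of them allow reconstruction."""
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range for the chosen field")
    # Random polynomial f of degree m - 1 with f(0) = secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(m - 1)]
    shares = []
    # Evaluate at x = 1..n; x = 0 is excluded because f(0) is the secret.
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct_secret(shares: List[Share]) -> int:
    """Recover f(0) by Lagrange interpolation over the given shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % PRIME
                den = (den * (xi - xj)) % PRIME
        # Modular inverse via Fermat's little theorem (PRIME is prime).
        inv_den = pow(den, PRIME - 2, PRIME)
        secret = (secret + yi * num * inv_den) % PRIME
    return secret
```

For instance, a 16-byte key can be converted with `int.from_bytes(key, "big")`, split with `split_secret(value, 3, 2)` and recovered from any two of the three shares. Note that this sketch cannot detect that too few or falsified shares were supplied; in that case it silently returns a wrong value.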
The basic functionality of this algorithm is the following: For a given secret that a user wants to share between n participants, the user chooses a threshold m with n ≥ m that indicates how many shares are needed to reconstruct the secret. The algorithm takes the secret, n and m as input and produces a set S of n shares so that any subset of S containing at least m different shares can be used to reconstruct the secret. Furthermore the algorithm has the property that no subset of S containing fewer than m different shares makes it possible to deduce the secret or to reduce the complexity of an exhaustive search for the secret. Figure 7.7 illustrates this concept for a secret k and parameters n and m. It shows the generation of a set of shares s_i with i = 0, ..., n − 1.

7.4.2 Request handling

CryptStore handles two types of requests: one between the owner of a file and a key-server and a second between a user wishing to retrieve a key-share and the key-server. Communication between users and file owners is not part of the CryptStore design.

CryptStore assumes that any user who has access to the Grid can store key-shares at a key-server, provided that he does not overwrite existing key-shares belonging to other users. Therefore only an authentication is necessary to submit a pair of key-share and file identifier to a key-server. The possible requests are:

• Store a key-share belonging to a file, identified by its logical filename (lfn). Such requests come from file owners or users who have write permissions on a file.
• Delete a key-share belonging to a file, identified by its lfn. Such requests come from file owners or users who have write permissions on a file.
• Retrieve a key-share belonging to a file, identified by its lfn. Such requests come from file owners or users who have read or write permissions on a file.

As already mentioned, updates of key-shares are treated as separate delete and store requests by the key-servers.
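The three request types and the treatment of updates can be sketched as follows. This is a hypothetical in-memory model in Python: the class `KeyServer`, the `is_authorized` callback standing in for the generic access control interface, and the way ownership is recorded are assumptions made for illustration, whereas the actual key-server is written in C++ and stores its shares in a MySQL database.

```python
from typing import Callable, Dict, Tuple

# Stand-in for the generic access control interface (an assumption):
# (user, lfn, action) -> True if the Grid access control grants the action.
AccessCheck = Callable[[str, str, str], bool]


class KeyServer:
    """Toy key-server holding one key-share per logical filename (lfn)."""

    def __init__(self, is_authorized: AccessCheck) -> None:
        self._is_authorized = is_authorized
        # lfn -> (user who stored the share, key-share)
        self._shares: Dict[str, Tuple[str, bytes]] = {}

    def store(self, user: str, lfn: str, share: bytes) -> None:
        # Any authenticated user may store a share, but an existing
        # share belonging to another user must not be overwritten.
        if lfn in self._shares and self._shares[lfn][0] != user:
            raise PermissionError("key-share belongs to another user")
        self._shares[lfn] = (user, share)

    def delete(self, user: str, lfn: str) -> None:
        owner, _ = self._shares[lfn]  # KeyError if no share is stored
        if user != owner and not self._is_authorized(user, lfn, "write"):
            raise PermissionError("delete requires ownership or write access")
        del self._shares[lfn]

    def retrieve(self, user: str, lfn: str) -> bytes:
        owner, share = self._shares[lfn]  # KeyError if no share is stored
        allowed = (
            user == owner
            or self._is_authorized(user, lfn, "read")
            or self._is_authorized(user, lfn, "write")
        )
        if not allowed:
            raise PermissionError("retrieve requires read or write access")
        return share


def update_share(server: KeyServer, user: str, lfn: str, share: bytes) -> None:
    # The client performs an update as a delete followed by a store,
    # so the key-server needs no dedicated update request.
    server.delete(user, lfn)
    server.store(user, lfn, share)
```

In this model the key-server never evaluates permissions itself; it only forwards the question to the callback, which mirrors the design goal of keeping key access consistent with file access. Authentication of the requesting user is assumed to have happened before these methods are called.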
Figure 7.8 shows examples for each possible type of request from a file owner to a key-server: the first stores a key-share, and the second deletes a key-share related to a specific file.

The CryptStore key-share retrieval requires the requesting user to be authenticated. If the authentication is successful the user’s request is considered. The form of the request depends on the Grid access control system to which CryptStore has been linked. If it uses a pull message sequence, then the request consists only of the file identifier for which the user wishes to retrieve a key-share. If the access control uses a push message sequence, then the request must contain authorization assertions in addition to the file identifier. The key-server contacts a Grid access control server and requests an authorization decision on whether the user is allowed to access the encrypted file. If the response is positive the key-server returns the corresponding key-share. Figure 7.9 shows an example request from a user to a key-server, requesting the release of a key-share. Appendix B gives an XML Schema definition for CryptStore requests.

    <cryptstore_request>
      <request_type> store_keyshare </request_type>
      <lfn> +/AbBuY...xe88= </lfn>
      <keyshare>AAAAABWlEpxrg...j7x3yk= </keyshare>
    </cryptstore_request>

    <cryptstore_request>
      <request_type> delete_keyshare </request_type>
      <lfn> +/AbBuY...xe88= </lfn>
    </cryptstore_request>

Figure 7.8: Examples of file owner requests to a CryptStore key-server. The first request is to store a key-share belonging to a file identified by a lfn, and the second is to delete the same key-share.

    <cryptstore_request>
      <request_type> retrieve_keyshare </request_type>
      <lfn> +/AbBuY...xe88= </lfn>
    </cryptstore_request>

Figure 7.9: An example of a file user request to a CryptStore key-server.
The request asks for a key-share belonging to a file identified by a logical file name (lfn).

7.5 Security Analysis

In this section we analyze possible attacks on CryptStore and discuss how to mitigate these threats. We do not consider attacks based on social engineering and on malicious hardware, since those are out of the scope of this thesis and can be applied against any cryptographic storage system. We also do not consider attacks against the Grid’s file storage mechanisms since CryptStore does not interact directly with those. The following attacks could be tried against CryptStore:

1. Attacks on the encryption algorithm, with the goal of disclosing the content of a file stored encrypted with CryptStore.
2. Attacks on the message authentication scheme with the goal of hiding unauthorized modifications of a file stored with CryptStore and protected by a message authentication code.
3. Attacks on the key-share transfer with the goal of disclosing or falsifying the transferred key-share.
4. Impersonation of authorized users in order to gain access to key-shares.
5. Byzantine attacks by malicious key-servers.
6. Attacks on the key-servers, especially on the MySQL database used for the storage of key-shares, with the goal of either disrupting availability of the services or disclosing key-shares.
7. Malicious modifications of the CryptStore software.
8. Attacks on the CryptStore user client, especially on the client database.

As CryptStore uses the AES encryption algorithm and currently no feasible attacks on the AES are known, the first attack poses no real threat at the time this thesis is written. Cryptographic breakthroughs leading to new attacks on the AES may make it necessary to change the algorithm used by CryptStore and to re-encrypt all files.
However users should be aware that in such a case data may still be compromised, since attackers may have made personal copies of the old encrypted data, which are not under the control of the original data owner.

Concerning the second type of attacks, CryptStore uses the HMAC message authentication algorithm with the SHA-1 hashing algorithm. Recent cryptographic attacks on SHA-1 [98] may make it necessary to replace it by a more secure hashing algorithm. This also means that all existing message authentication codes will have to be recalculated.

Since all communications between user clients and key-servers are protected with SSL (using the OpenSSL library), the third type of attack requires attacking either the SSL protocol, an algorithm used within this protocol or the implementation of the OpenSSL library.

The fourth type of attacks are not really attacks on CryptStore. They are directed at the access control system (or the authentication system used by the access control system) that governs access to key-shares. If Sygn is used for CryptStore key-share access control, the information that has to be protected in order to prevent impersonation is the users’ private keys.

In the fifth type of attacks, attackers set up key-servers themselves and try to trick users into storing key-shares on these servers. If an attacker is able to set up enough Byzantine key-servers, he may get access to a sufficient number of key-shares to reconstruct the decryption key for an encrypted file. Furthermore the Byzantine key-server could provide false key-shares to requesting users, in order to deny access to decryption keys. In order to prevent such attacks, measures must be taken to ensure that entities providing key-servers are trustworthy. Such measures can include the registration of the key-server and of the entity that runs it in a list of trustworthy services.
The sixth type of attacks on CryptStore is the most dangerous, since the key-servers store a great amount of security critical information. In the current version of CryptStore, the key-server communicates with a MySQL database using an unprotected channel. Therefore it is important to keep the database on the same machine as the key-server. We plan to modify the database interface in a future version in order to allow the use of the SSL protocol to protect communications between the key-server and the database. As the key-server needs to know the database password in order to access its records, the account on which the key-server is installed needs to be protected against unauthorized access. If an attack is successful, all key-shares stored on this server need to be updated. Furthermore all other key-servers need to update the key-shares belonging to the same key as the compromised key-shares. The secret sharing mechanism used by CryptStore provides some protection if a CryptStore key-server is compromised, since the attackers need to break into several key-servers in order to gain access to enough key-shares to reconstruct decryption keys. Finally the key-server handles key-shares in cleartext during its operations. This means that a key-share may be swapped onto disk due to the internal memory management or that it may be written in a core dump if CryptStore crashes. These problems can be prevented by turning off swap or using an encrypted swap area and setting the maximal core dump size to zero.

The seventh type of attacks consists of providing users or key-servers with maliciously modified versions of the CryptStore software that will leave a backdoor or leak information to attackers. Such attacks can be prevented if the CryptStore code is signed and no unverified versions of the code are installed.

The last type of attacks are those on the CryptStore user client.
They are similar to the attacks on the CryptStore key-server, since the user client also uses a MySQL database to store copies of all encryption keys for the user’s encrypted files and handles keys in memory. Therefore the same protection measures have to be taken to reduce the risks of attacks on the user client.

7.6 Discussion

In this section we discuss the algorithmic and architectural choices within CryptStore.

Since we are interested in medical data, which can include radiological, sonographical or computer tomography pictures in addition to simple text, we had to take into account the possibility that the files in question can become quite large. We therefore had to weigh the encryption speed of stream ciphers against the random access capabilities and the possibility to re-encrypt under the same key offered by block ciphers. Another important point was the possibility of an encryption mode that does not change the size of the data. This final property is inherent to stream ciphers since they process streams of bits (or bytes in the case of the RC4 algorithm) one by one. For block ciphers the property of keeping the plaintext size can be obtained by using the ciphertext stealing scheme (and omitting base 64 encoding of the ciphertext, which is quite common otherwise). Furthermore we chose against using the CFB block cipher mode, which effectively turns the block cipher into a stream cipher, for the obvious reason that if we wanted to use a stream cipher we might as well take an efficient one (which is not the case for block ciphers used in CFB mode for byte-by-byte encryption). The final point that made us choose a block cipher in CBC mode rather than a stream cipher was the capability to re-encrypt an updated file using the same key. This way a user having write access to the file can update its contents, re-encrypt it and write it back to the Grid storage without having to change the key-shares on the key-servers.
The decision to store the meta-data in the file header was made in order to make it possible to handle the access to encrypted files in the same way as normal files from the point of view of a Grid storage resource. We are aware that this means that we do change the size of the data, which may be a problem if the original data was stored in a database table with a fixed size of table cells and the encrypted data needs to be written back to the same database table. Extending the current CryptStore design to allow an external management of the encryption meta-data would not be a major problem, since most Grid architectures can keep meta-data about the Grid files anyway, which could additionally be used to store our file encryption meta-data.

The decision to use Shamir’s secret sharing scheme for the key storage on the key-servers was made in the general spirit of this work to avoid single points of attack and trusted third parties. As we have pointed out in the presentation of the basic concepts of CryptStore, a partially trusted third party is inevitable if we want to efficiently manage key accesses for dynamic groups of users and sets of data resources. In order to reduce the impact of a successful attack on one key-server we have therefore chosen not to entrust them with the entire key information. Due to the nature of the secret sharing scheme we gain additional benefits: CryptStore becomes robust against breakdowns of single key-servers if the user created a redundant number of shares. Furthermore the storage of the key-shares on the key-servers provides a backup that can be used for emergency access if the user loses his decryption key.

A topic that has to be considered was raised in chapter 5: the handling of re-encryption when permissions of users that had access to decryption keys are revoked.
Since we cannot control the environment on the user machines, we can never prevent a user who had access to a file from making copies of it and spreading them to unauthorized users. Therefore we advocate the use of the lazy re-encryption scheme, in which, after a permission change, a file is only re-encrypted with a different key once its content has changed.

Another choice that had to be made in the design of CryptStore was whom to put in charge of the decryption of an encrypted file. Setting up a decryption service that performs decryption for the user would have solved the revocation problem, since users would not have access to the decryption keys. The drawback of such a solution would have been the introduction of a single point of attack and of a trusted third party into the system. Therefore we decided to leave the responsibility for decryption to the machine of the end user of the data, where it can be handled by the CryptStore user-client.

The most important concept of CryptStore is the interface to an access control mechanism. The rationale behind this is to keep access permissions to files consistent with access permissions to the keys that decrypt these files. We have therefore chosen to avoid a duplicate access control layer and to make it possible to use the full power of the access control service that is available on the Grid architecture. This approach only works if the file owners are the actual sources of authority for access control decisions concerning their files, and not the local storage sites (as is the case with the VOMS access control architecture, for example). Since empowering the owner with access control over his files is one of the requirements that we defend in this thesis, we believe that this fits together smoothly.
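The threshold secret sharing used for key storage, discussed earlier in this section, can be sketched in a few lines of Python. This is a minimal illustration of Shamir's scheme under our own assumptions, not the CryptStore code: the prime modulus, the function names and the 3-out-of-5 parameters in the usage note are all hypothetical choices made for the example.

```python
import secrets

# Prime modulus of the finite field (assumption: the Mersenne prime 2^521 - 1,
# large enough to hold typical symmetric keys as a single field element).
PRIME = 2**521 - 1


def split_secret(secret: int, threshold: int, num_shares: int):
    """Split `secret` into `num_shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be a field element")
    if not 1 <= threshold <= num_shares:
        raise ValueError("need 1 <= threshold <= num_shares")
    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def poly(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, num_shares + 1)]


def recover_secret(shares) -> int:
    """Lagrange interpolation at x = 0 over the given (x, y) shares."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = (num * xm) % PRIME
                den = (den * (xm - xj)) % PRIME
        secret = (secret + yj * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With, say, `split_secret(key, 3, 5)`, each of five key servers would hold one share, any three of them suffice to rebuild the key, and the two extra shares provide the redundancy against key server breakdowns mentioned above.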
As the granularity of protection for CryptStore is files (and not dynamically generated views on data), the concerns raised in [18] about the absence of a bijection between encryption and access rights do not apply to our approach.

Chapter 8 Sygn and CryptStore in a Grid

In this chapter we discuss the aspects concerning the integration of our proposals in a Grid architecture. We consider two Grid architectures for our specific examples: µgrid, a minimal Grid architecture¹ [88], and the OGSA/WSRF standardized Grid architecture provided by the Globus Toolkit version 4.

¹ µgrid was developed by Johan Montagnat from the CREATIS laboratory at INSA Lyon (now I3S at the University of Nice) and Diane Lingrant from the I3S laboratory at the University of Nice. It was created in the context of the French ministry of research project MEDIGRID.

8.1 µgrid

The µgrid architecture was designed and implemented as a minimal Grid architecture in order to test and implement scientific Grid based applications without having to install, configure and administrate a production Grid architecture. It is therefore small, easy to install and to run, and it depends on very few software libraries. µgrid consists of three software components. The first is the client software that allows users to access the Grid. The second is the farm manager, currently the Grid entry point of µgrid, which groups resources together and manages the scheduling of jobs, the resource assignment and the data management. Finally, the computers providing resources run the third component, the host manager, which manages the computing jobs and the storage of data. All communication is done through simple sockets, using a client/server architecture. With this architecture, a transparent sharing of resources is possible. Due to its simple design and interfaces, µgrid is easy to install, configure and administrate. The possible file operations are to copy a file from a local disk to the Grid, to replicate a file on the Grid, to copy a file from the Grid to a local disk and to delete a file on the Grid. A C++ API allows these file manipulation commands to be used within jobs processed on the Grid.

Authentication is implemented using OpenSSL and a PKI. Each user, farm and host has its own certificate, allowing mutual authentication. In its current version, µgrid assumes a single root certificate authority.

The current design of µgrid has limited scalability, since the farm manager quickly becomes a bottleneck when it is assigned too many resources. It is therefore planned to extend µgrid by adding a layer of servers above the farm manager, which will also be the new Grid entry points.

8.2 OGSA/WSRF standardized Grids

The Open Grid Services Architecture (OGSA) [49] is a standard developed by the Global Grid Forum (GGF)². OGSA aims at defining a common, standard, open architecture for grid-based applications. OGSA is service oriented and requires a distributed computing middleware that supports stateful services (i.e. services that can store information about previous sessions from one invocation to another). Web services³ have been chosen as the architecture for implementing these Grid services following the requirements of OGSA. The use of Web services requires three major components: a discovery service that is used to locate existing services; the Web Service Description Language (WSDL) [24], an XML based language used to describe the interfaces of Web services in a standardized way; and a protocol to exchange Web service requests and responses. The most frequently used protocol for Web service communication is SOAP [19], which allows XML encoded messages to be exchanged over the HTTP communication protocol.
Figure 8.1 shows a typical Web service invocation, using the Web Service Description Language (WSDL) to define and publish the Web service interfaces and the SOAP protocol to exchange messages.

[Figure 8.1: A typical Web service invocation between a client, a discovery service and a Web service (here a CryptStore key server):
 1. Where can I find a CryptStore key server?
 2. At this address: URL
 3. How can I invoke the key server?
 4. I have the following interfaces: WSDL
 5. SOAP service request: store key share
 6. SOAP service response: storage successful]

Web services as defined by the W3C are stateless, and thus pure Web services are not sufficient for the requirements of the OGSA specification. Therefore the Web Services Resource Framework (WSRF) was developed by the OASIS consortium. WSRF specifies how Web services can be made stateful. Figure 8.2⁴ illustrates the relationships between OGSA, WSRF and Web services.

[Figure 8.2: The relationship between OGSA, WSRF and Web services. OGSA requires stateful Web services, WSRF specifies them, and stateful Web services extend Web services.]

² www.ggf.org
³ http://www.w3.org/2002/ws
⁴ Figure inspired by http://gdp.globus.org/gt4-tutorial

In [76] a strategy for addressing security within the Open Grid Services Architecture (OGSA) is proposed. It describes a set of security components that need to be realized in the OGSA security architecture and presents a set of use cases that show the interactions of these components in a secure Grid environment. This strategy defines three challenges that have to be addressed in the realization of a Grid security architecture:

• The integration of heterogeneous, local security solutions. As it will be impossible to enforce the use of a single security solution, the Grid security architecture needs to be generic and extensible, so that it can be instantiated with any existing security mechanisms.
  As Sygn is designed to be deployed locally with the resources it controls, it allows the use of different local security solutions at other Grid resource sites. CryptStore is designed in the same spirit and allows different access control mechanisms to be used to control access to decryption keys.

• The interactions between those local security solutions. As services may span multiple domains using different security technologies, the Grid security architecture needs to provide solutions that allow these security technologies to interact. Therefore a common message exchange protocol is needed (SOAP over HTTP is proposed as an example). Furthermore, a common method for communicating and negotiating security policies, and finally a common way of mapping a user identity from one domain to another, have to be specified.

  To allow interaction of Sygn with other security solutions, the Sygn permission language would have to be adapted to conform to a standard such as SAML. This would require an extension of SAML in order to support Sygn's delegation mechanisms. Furthermore, the Sygn and CryptStore interfaces would have to be adapted to enable them to interpret SAML assertions.

• Trust relationship management. The main problem in trust relationship management is that end users will use the Grid to perform request-specific tasks, possibly executing their own code on some distant Grid machines. Classical security questions such as authentication and authorization need to be answered in the new context of processes executing such user-created code. The necessity of delegating rights to allow such processes to execute tasks on a user's behalf is also specifically mentioned.

  Sygn's delegation mechanisms make it possible to provide user-created code with the permissions it needs for its execution.
  To achieve a smooth integration it would be useful to integrate Sygn access control in a Grid resource access API, as was done for file access in µgrid.

Access control is addressed very briefly in the OGSA security architecture (using the term Authorization Enforcement). The authors conjecture that every domain will typically have its own authorization service, using different access control models such as DAC and RBAC. Therefore the Grid authorization model needs to be based on upcoming standards such as XACML, SAML and WS-Authorization to allow interaction and mapping between different access control services (see section 4.4.2 for more information on XACML and section 4.5.1 for more information on SAML). WS-Authorization is currently not even available as a draft. However, the Web Services security roadmap [31] specifies that this standard will define how access policies for a Web service are specified and managed, especially how permissions may be expressed in certificates and how they are to be interpreted at the service end-points. Proposals dealing with access control models for Web services such as [10] suggest that SAML and XACML will be the basis for implementing WS-Authorization.

The OGSA security architecture is mainly focused on standardization and interoperation of heterogeneous security services. As this is out of the scope of the work presented within this thesis, the impact on our work is relatively low. However, the few points of the OGSA security architecture regarding the requirements for authorization are worth keeping in mind: the requirement to be able to map different access control policies to each other, and the requirement to support delegation of rights in order to give permissions to a process acting on behalf of a user.

First of all, in Sygn, a mapping of different access control policies can be realized by using the concept of hierarchical roles.
A role A from one domain can be mapped onto a role B of another domain by making role B hierarchically inferior to role A (reminder: this means that any entity which can activate role A can also activate role B). An equivalence between both roles can also be defined by making them mutually hierarchically inferior to one another (i.e. every entity that can activate role B can also activate role A and vice-versa). Second, Sygn supports delegation mechanisms that make it possible to give authorizations to a process acting on a user's behalf. As Sygn permissions are bound to public keys, the problem of authenticating such a process can be solved using classical public key authentication mechanisms.

8.3 Integrating Sygn in a Grid

In order to keep the architecture of Sygn independent of the underlying Grid architecture, thus making Sygn more portable, we have chosen the following approach: the policy enforcement point (PEP) acts as an agent between the user client and the resource. The Grid user client has to be modified in order to attach the Sygn request to the Grid request the user issues. When a request arrives at the PEP, the PEP strips the Sygn request off and passes it to the PDP for checking. If the PDP's response is positive, the PEP has to make sure that the Sygn request was issued by the same entity that issued the Grid request. For this, the PEP has to interact with the Grid authentication mechanism in order to get the authenticated user's public key. If this check is positive, the PEP has to verify that the Grid request matches the Sygn request. This requires that the resources that are the objects of both requests are the same, as well as the actions requested on these objects. If this check succeeds, the PEP passes the Grid request to the Grid infrastructure controlling the local resource, which then returns its answer to the user. This scheme makes it possible to keep the Grid middleware that handles the resources unchanged.
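The sequence of checks performed by the PEP can be summarized by the following Python sketch. It only mirrors the order of the steps described above; the class names, the string-based keys and identifiers, and the `pdp_decide` and `forward_to_grid` callables are hypothetical placeholders and do not correspond to the actual Sygn or µgrid interfaces.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SygnRequest:
    issuer_key: str   # public key the Sygn request is bound to (placeholder type)
    object_id: str    # Sygn object identifier, e.g. a logical filename
    action: str       # requested Sygn action, e.g. "read" or "write"


@dataclass(frozen=True)
class GridRequest:
    resource: str     # resource named in the Grid request
    action: str       # action requested on that resource


def pep_handle(
    grid_req: GridRequest,
    sygn_req: SygnRequest,
    authenticated_key: str,
    pdp_decide: Callable[[SygnRequest], bool],
    forward_to_grid: Callable[[GridRequest], str],
) -> str:
    # 1. The Sygn request, already stripped from the Grid request, goes to the PDP.
    if not pdp_decide(sygn_req):
        return "denied: PDP rejected the Sygn request"
    # 2. The Sygn issuer must be the entity authenticated by the Grid.
    if sygn_req.issuer_key != authenticated_key:
        return "denied: Sygn issuer is not the authenticated Grid user"
    # 3. Both requests must name the same object and the same action.
    if (sygn_req.object_id, sygn_req.action) != (grid_req.resource, grid_req.action):
        return "denied: Grid request does not match the Sygn request"
    # 4. Only now is the unchanged Grid request passed on to the middleware.
    return forward_to_grid(grid_req)
```

Because the Grid request is forwarded unmodified in the last step, the middleware behind the PEP does not need to know that Sygn exists.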
Only the request handling protocol needs to be changed, so that the PEP is integrated as an agent between the users and the resources when requests arrive.

A PEP that integrates Sygn in the µgrid architecture has been implemented by Didier Oriol, in the course of his end-of-studies project for a Master's degree at INSA-Lyon. It allows file access control using Sygn within µgrid. In order to match the Grid request issuer with the Sygn request issuer, this PEP interacts with the OpenSSL authentication mechanism used in µgrid and extracts the public key from the certificate that was used for authentication. This has proven to be somewhat difficult, since the function that returns the public key contained in an OpenSSL X.509 certificate is not documented in the official OpenSSL manuals⁵. The matching between the Grid request object and the Sygn request object is realized as a simple string equality check. To this end, Sygn permissions use µgrid's logical filenames as part of the Sygn file identifiers.

⁵ The reference to this undocumented function was found on a Spanish discussion forum dealing with OpenSSL.

The actions required to execute a specific Grid request are the following:

• To copy a Grid file to his local disk, the user needs the read action.

• To write a file to the Grid or to delete it, the user needs the write action. If a new file is copied to the Grid in this way, Sygn automatically registers the user who submitted this request as the SOA for this file. Therefore this user automatically has the write permission that allows him to proceed with his request. Sygn prevents users from overwriting Grid files with new files having the same logical filename, unless the user has the write action for the overwritten Grid file.

• µgrid allows users to manually replicate files between different Grid storage elements, with the possibility of changing their logical file names.
  For this operation Sygn requires the user to have the read action for the source file and the write action for the target file, if that file already exists (i.e. if it is overwritten by this operation).

The Sygn-PEP is integrated within the µgrid file manipulation API. File manipulation by user-created code is therefore handled in the same way as manual file access through the terminal interface of µgrid. Users are responsible for providing their own code with the necessary permissions for any file access that it will need to make.

In order to implement Sygn on an OGSA standardized Grid, one has to consider whether Sygn needs to be deployed as a Grid service. As Sygn is designed to run co-located with the Grid resources, it remains doubtful whether an implementation as a Grid service is necessary or whether the resource can communicate with the Sygn-PEP locally. However, we foresee no major problems in adding a Web service interface to Sygn. The Sygn-PDP is stateless and could therefore be implemented as a simple Web service without the need for WSRF's extensions for stateful Web services. As all communications are already encoded in XML, we would only have to define the service description using WSDL and generate the SOAP communication protocol code. Various tools exist for generating the latter from a WSDL description, for example the gSOAP Web services development toolkit⁶.

The use of Sygn requires some public key based authentication service. Therefore a password based authentication such as Kerberos service tickets would not be usable with Sygn. We do not think that this is a major limitation, as the Globus toolkit's security infrastructure GSI [37] provides public key authentication based on OpenSSL, and this seems to be the approach other Grid infrastructures are taking too. Therefore key extraction mechanisms similar to the one used in the integration with µgrid could be used.
Problems could arise if the user's authentication certificate is not directly available to the Sygn-PEP. In such cases it would be necessary to provide proof of authentication in another way (e.g. through SAML authentication assertions) that allows the Sygn-PEP to acquire the authenticated user's public key. Depending on the nature of the Grid identification mechanisms for files and hardware resources, a mapping between Sygn's object identifiers and these Grid resource identifiers may be required in order to realize the matching between a Grid request object and the Sygn permission object. Finally, the Grid requests have to be mapped onto the available Sygn actions. As we have mentioned in section 6.2.9, the Sygn actions are easily extensible, and therefore action identifiers that fit the categories of the Grid request actions can easily be added to the Sygn language.

⁶ Available from http://www.cs.fsu.edu/~engelen/soap.html

8.4 Setting up CryptStore as a Grid service

In order to use CryptStore in an OGSA/WSRF-standardized Grid, the key-server needs to be implemented as a Grid service. As the key-server is stateless, it can be implemented like a normal Web service. Requests and responses are already encoded in XML, therefore only the Web service description has to be written in WSDL and the SOAP protocol code has to be generated, as discussed in the previous section. As Grid security standards are still evolving relatively fast (for example the upcoming WS-Authorization standard), we have not yet added such a Web service interface to CryptStore.

8.5 Using Sygn for CryptStore access control

We have implemented an interface that allows Sygn to be used as the access control architecture for CryptStore key shares. In this design, an instance of the Sygn-PDP is co-located with the CryptStore key server. The local Sygn meta-data database stores the sources of authority for the files for which the key server stores key shares.
This means that if a file owner stores a key share on the key server, he is also registered as SOA of that file. Using this information, the Sygn-PDP can make access control decisions concerning the locally stored key shares.

The CryptStore administrator and user interfaces encapsulate Sygn requests in CryptStore requests according to the actions the user wishes to initiate. At the key server, the CryptStore access control integration module is effectively a Sygn-PEP, which performs the functions described in section 8.3. When an administrator submits a new key share for storage, his CryptStore interface automatically generates an administrative Sygn command (see section 6.3) that tries to register him as source of authority for the file to which this share belongs. In this case the PEP only checks whether the user is correctly authenticated, and the Sygn-PDP checks that the file is not already registered for a different SOA. In order to retrieve a key share, the user must submit Sygn permissions that allow him to read or write the corresponding file (which implies that write permissions also include read permissions). To be able to update the key shares belonging to a file, the user must have write permissions on that file. This allows users who modified the file's contents to re-encrypt it, if a lazy re-encryption scheme as described in chapter 5 is used.

In this setup, CryptStore is completely independent of the storage sites where the file and its replicas are actually stored. It can rely on a local access control service in order to control the access to key shares. This makes the combination of CryptStore and Sygn highly scalable and tolerant of breakdowns, and avoids adding an unnecessary layer of access control that manages access to key shares.

8.6 Summary

In this chapter we have presented our integration of Sygn access control in a working Grid architecture.
We have successfully tested Sygn access control within µgrid, covering a representative set of allowed and denied requests. We have presented OGSA/WSRF standardized Grids and discussed how to integrate Sygn and CryptStore in such a Grid. Finally, we have presented how to use Sygn for CryptStore key share access control. This has also been implemented successfully and tested with a representative set of allowed and denied requests.

Chapter 9 Conclusions and Future Works

In the present thesis we have studied the use of Grid computing architectures for health-care with a focus on data security. We have shown that classical security solutions are not all directly applicable, due to the specifics of Grid computing. As the central problem for health-care applications in Grid computing is data security, we have chosen to examine the specifics of access control. Based on a set of use-cases, we have presented a list of requirements and constraints that are related to principles of good security, the nature of a Grid architecture and the specifics of the health-care application. The most important points we have found are the need for a decentralized administration of the access rights, for traceability and for encrypted storage mechanisms. The need for encrypted storage in conjunction with access control stems from the fact that encrypted storage can enforce the use of the access control mechanism, which could otherwise be circumvented by persons having physical access to the storage medium.

Based on these conclusions we have examined the current state of the art regarding distributed and Grid access control and found that none of the systems currently proposed fulfills all of our requirements, even when disregarding the requirement of encrypted storage. We have then examined the state of the art in encrypted storage systems.
Our special focus here was on group sharing mechanisms for encrypted files, considering a dynamic evolution of group membership and file contents. Our results indicate that none of the current encrypted storage systems that support group sharing of encryption keys handles dynamic groups well. Another important point is that all of these systems have their own access control mechanism, thus creating a duplicate, possibly inconsistent layer of access control.

Our first contribution, the access control system Sygn, is based on a decentralized permission administration. To this end it implements a concept of decentralized roles and file sets that are based solely on certificates. Sygn permission management supports decentralized authorization management by allowing the delegation of permissions through certificate chains. Finally, the goal of decentralization is also supported by minimizing the access control information that needs to be stored at the decision points. Most of the access control information is provided by the users requesting an access, in the form of authorization certificate chains. The decision points only need to know the source of authority for each of the local resources which they control. Not only does such decentralization enhance the scalability of the access control system, it also minimizes the impact of a successful attack on an access control decision point, since only the local resources will be exposed.

Sygn also has integrated functions to allow traceability and can be configured to require non-repudiable requests, which can be used as audit evidence. By integrating traceability in the access control mechanism, Sygn allows for an easy deployment of both functionalities. Access control is a convenient point to collect audit information, as all requests to a system must pass through the access control system.
Our second contribution, CryptStore, complements the access control by protecting the data resources against the circumvention of the access control system. CryptStore allows users to store their data in encrypted form and to share the decryption keys with authorized users. Since the decryption key is needed in order to use the data, accessing the encrypted files on the storage medium does not help an attacker to gain knowledge about the data contained in those files. In order to have consistent permissions both on the decryption keys and on the files they decrypt, CryptStore uses the Grid's file access control mechanisms to determine whether a user has access to an encrypted file's decryption key. This is achieved through a generic access control interface that can be adapted to any access control system present on the Grid infrastructure. As keys themselves are valuable data, no key-server is given an entire copy of a key. Instead, keys are split into shares using classical secret sharing algorithms, and the key-shares are spread among the key-servers. Due to the possibility to create redundant key-shares, CryptStore is also robust against temporary inaccessibility or a loss of keys through failures of a key-server.

As future work on Sygn we plan to integrate mechanisms for fine grained database access control with Sygn. This would allow a controlled exposure of databases on a Grid infrastructure, regardless of the underlying database management system. Sygn actions would need to be extended to include typical database actions such as SELECT, INSERT, UPDATE and DELETE, based on the corresponding SQL queries. The database object identifiers that would be needed in Sygn will include information about the database to which the permission applies, the table, the columns and possibly specific table cells, specified by regular expressions applied to their contents.
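As this extension is future work, no such identifier exists in Sygn yet. Purely to make the idea concrete, the following Python sketch shows one possible shape for a database object identifier and a simple coverage test restricted to column subsets. Every name in it is hypothetical, and the regular expressions on cell contents mentioned above are deliberately left out.

```python
from dataclasses import dataclass

# Database actions named in the text; how they would be encoded in the Sygn
# language is an open design question.
DB_ACTIONS = {"SELECT", "INSERT", "UPDATE", "DELETE"}


@dataclass(frozen=True)
class DbObject:
    database: str         # database the permission or request applies to
    table: str            # table inside that database
    columns: frozenset    # set of column names concerned


def request_is_covered(perm_obj: DbObject, perm_action: str,
                       req_obj: DbObject, req_action: str) -> bool:
    """True if the request targets a subset of what the permission grants."""
    return (
        perm_action in DB_ACTIONS
        and perm_action == req_action
        and perm_obj.database == req_obj.database
        and perm_obj.table == req_obj.table
        and req_obj.columns <= perm_obj.columns   # column subset test
    )
```

For instance, a permission to SELECT the columns `name` and `birth_date` of a table would cover a request to SELECT only `name`, but not a request that also asks for a `diagnosis` column.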
The matching algorithm that verifies whether a request object applies to a permission object becomes more complex in such a case, since a permission object may have many different subsets (e.g. restricted selections of table columns) that need to produce a positive match against the permission object if submitted as request object. Furthermore, the implementation of database access control would open opportunities to extend Sygn's delegation mechanisms, allowing constrained delegation as presented in [3]. This would allow a user who has been granted some permissions on a database to delegate only a subset of these permissions to another user.

Following the same track, we also plan an extension of Sygn to allow fine grained access control on elements of XML documents, drawing on previous proposals on that topic such as [7], [30], [51] or [75]. We could build on the experience acquired during a cooperation with the Swedish research laboratories PDC and SICS, where we implemented a system for update access control on XACML policies [89]. For generic XML access control the required Sygn action set would be read, insert and update, where read means that the XML element may be read, insert specifies that new child elements may be added, and update means that the contents of the element and of all its child elements may be written (this includes the permission to delete all or parts of them). In this case a new type of Sygn object would be the elements of an XML document, which can easily be identified by an XPath [26] expression. Updates of XML documents could be controlled at a very fine grained level of detail, using XML-document change detection algorithms such as [102] and matching their results against the Sygn permissions. XML document access control would also provide the opportunity to integrate constrained delegation, where a user delegates a subset of his permissions to another.
A challenging question that has to be answered in that context is how to determine whether an XPath expression is a restriction of another one.

Finally, another interesting question we plan to investigate is that of the legal implications of Grid computing for the processing of personal data, as discussed in section 3.5. The problem that would have to be addressed in this context is how to create the contractual bindings between an entity processing personal data on the Grid and the Grid resource providers. An interesting approach could be the implementation of automatic, ad-hoc contract negotiation mechanisms, based on pre-defined security requirements. In such a system a trusted third party could be used to certify that a resource provider complies with certain security requirements. Software agents would match the requirements of the user against the services provided at the available resources, choose the appropriate resource providers and conclude the processing contract on the user's behalf. As we have already mentioned, it remains to be seen whether such automatically concluded contracts can be legally binding. On this question we can cooperate with legal experts to identify the legal requirements and to validate that our proposed technical solutions comply with these requirements.
Appendix A XML Schema for the Sygn language <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <!-- Definition of the subject identifiers --> <xs:element name="SID" abstract="true" /> <!-- Definition of the ANYSID identifier --> <xs:element name="ANY_SID" substitutionGroup="SID"/> <!-- Definition of the user identifiers UID --> <xs:complexType name="uid"> <xs:simpleContent> <xs:extension base="xs:string"/> </xs:simpleContent> </xs:complexType> <xs:element name="USER_ID" type="uid" substitutionGroup="SID"/> <!--Definistion of the role SOA type --> <xs:complexType name="rsoa"> <xs:sequence> <xs:element ref="USER_ID"/> </xs:sequence> </xs:complexType> 145 146 APPENDIX A. XML SCHEMA FOR THE SYGN LANGUAGE <!-- Definition of the role (object) identifiers RID --> <xs:complexType name="rid"> <xs:sequence> <xs:element name="ROLE_SOA" type="rsoa"/> <xs:element name="ROLE_NAME" type="xs:string"/> <xs:element name="REVIEW_REPOSITORY" type="xs:string"/> </xs:sequence> </xs:complexType> <xs:element name="ROLE_ID" type="rid" substitutionGroup="SID"/> <!-- Definition of preOID (help construct, OID without RID) --> <xs:element name="preOID" abstract="true" /> <!-- Definintion of Capability objects (help construct --> <!-- to simulate multiple inheritance) --> <xs:complexType name="OID"> <xs:choice> <xs:element ref="preOID"/> <xs:element ref="ROLE_ID"/> </xs:choice> </xs:complexType> <!--Definistion of the object SOA type --> <xs:complexType name="osoa"> <xs:sequence> <xs:element ref="SID"/> </xs:sequence> </xs:complexType> <!-- Definition of the file identificers FID --> <xs:complexType name="fid"> <xs:sequence> <xs:element name="FILE_SOA" type="osoa"/> <xs:element name="LOGICAL_FILENAME" type="xs:string"/> </xs:sequence> </xs:complexType> <xs:element name="UNIQUE_FILE_ID" type="fid" substitutionGroup="preOID"/> 147 <!-- Definition of the file set identificers FSID --> <xs:complexType name="fsid"> <xs:sequence> <xs:element name="SET_SOA" type="osoa"/> <xs:element 
        name="SET_NAME" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="FILE_SET_ID" type="fsid" substitutionGroup="preOID"/>

  <!-- Definition of the Resource Identifiers RESID -->
  <xs:complexType name="resid">
    <xs:sequence>
      <xs:element name="RESOURCE_SOA" type="osoa"/>
      <xs:element name="RESOURCE_NAME" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="RESOURCE_ID" type="resid" substitutionGroup="preOID"/>

  <!-- Definition of action names -->
  <xs:simpleType name="actionType">
    <xs:restriction base="xs:string">
      <xs:whiteSpace value="collapse"/>
      <xs:enumeration value="read"/>
      <xs:enumeration value="write"/>
      <xs:enumeration value="activate"/>
      <xs:enumeration value="add_to_set"/>
      <xs:enumeration value="remove_from_set"/>
      <xs:enumeration value="grant"/>
      <xs:enumeration value="use"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Definition of the Actions -->
  <xs:complexType name="action">
    <xs:simpleContent>
      <xs:extension base="actionType">
        <xs:attribute name="SIZE" type="xs:positiveInteger"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
  <xs:element name="ACTION" type="action"/>

  <!-- Definition of Capability set objects (help construct) -->
  <xs:complexType name="capset">
    <xs:sequence>
      <xs:element ref="FILE_SET_ID"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Definition of the Capabilities -->
  <xs:complexType name="cap">
    <xs:sequence>
      <xs:element name="CAPABILITY_ID" type="xs:string"/>
      <xs:element name="OBJECT" type="OID"/>
      <xs:element ref="ACTION"/>
      <xs:element name="SECOND_OBJECT" type="capset" maxOccurs="1" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="CAPABILITY" type="cap"/>

  <!-- Definition of AC Creator (help construct) -->
  <xs:complexType name="accreator">
    <xs:sequence>
      <xs:element ref="USER_ID"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Definition of AC Owner (help construct) -->
  <xs:complexType name="acowner">
    <xs:sequence>
      <xs:element ref="SID"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Definition of AC restrictions (help construct) -->
  <xs:complexType name="restrictions">
    <xs:sequence>
      <xs:element ref="ROLE_ID" maxOccurs="5" minOccurs="1"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Definition of the Authorization Certificates AC -->
  <xs:complexType name="ac">
    <xs:sequence>
      <xs:element name="ID" type="xs:string"/>
      <xs:element name="CREATOR" type="accreator"/>
      <xs:element name="OWNER" type="acowner"/>
      <xs:element ref="CAPABILITY"/>
      <xs:element name="NOT_BEFORE" type="xs:string"/>
      <xs:element name="NOT_AFTER" type="xs:string"/>
      <xs:element name="NOT_WITH" type="restrictions" maxOccurs="1" minOccurs="0"/>
      <xs:element name="DELEGATIONS" type="xs:nonNegativeInteger"/>
      <xs:element name="SIGNATURE" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="AUTHORIZATION_CERTIFICATE" type="ac"/>

  <!-- Definition of the Sygn command names (help construct) -->
  <xs:simpleType name="commandType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="add_file_soa"/>
      <xs:enumeration
        value="delete_file_soa"/>
      <xs:enumeration value="revoke_certificate"/>
      <xs:enumeration value="clean_revoked_table"/>
      <xs:enumeration value="blacklist"/>
      <xs:enumeration value="unblacklist"/>
      <xs:enumeration value="register_resource"/>
      <xs:enumeration value="unregister_resource"/>
      <xs:enumeration value="log_resource_use"/>
      <xs:enumeration value="get_metadata"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Definition of Sygn command parameters (help construct) -->
  <xs:complexType name="parameter">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="NR" type="xs:positiveInteger"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

  <!-- Definition of the Sygn Commands -->
  <xs:complexType name="command">
    <xs:sequence>
      <xs:element name="COMMAND_NAME" type="commandType"/>
      <xs:element name="PARAMETER" type="parameter" maxOccurs="3" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="SYGN_COMMAND" type="command"/>

  <!-- Definition of the path targets (help construct) -->
  <xs:complexType name="pathtarget">
    <xs:sequence>
      <xs:element ref="CAPABILITY"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Definition of the Certificate Paths -->
  <xs:complexType name="path">
    <xs:sequence>
      <xs:element name="TARGET" type="pathtarget" maxOccurs="1" minOccurs="0"/>
      <xs:element ref="AUTHORIZATION_CERTIFICATE" maxOccurs="10" minOccurs="0"/>
      <xs:element ref="SYGN_COMMAND" maxOccurs="1" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="NUMBER" type="xs:nonNegativeInteger"/>
  </xs:complexType>
  <xs:element name="PATH" type="path"/>

  <!-- Definition of the request issuer (help construct) -->
  <xs:complexType name="reqIssuer">
    <xs:sequence>
      <xs:element ref="USER_ID"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Definition of the request signature (help construct) -->
  <xs:group name="surfsignature">
    <xs:sequence>
      <xs:element name="ISSUE_TIME" type="xs:string"/>
      <xs:element name="ISSUERS_SIGNATURE" type="xs:string"/>
    </xs:sequence>
  </xs:group>

  <!-- Definition of paths for request (help construct) -->
  <xs:complexType name="pathes">
    <xs:sequence>
      <xs:element ref="PATH" maxOccurs="5" minOccurs="1"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Definition of the standard user request format SURF -->
  <xs:complexType name="surf">
    <xs:sequence>
      <xs:element name="REQ_ISSUER" type="reqIssuer"/>
      <xs:group ref="surfsignature" maxOccurs="1" minOccurs="0"/>
      <xs:element name="REQ_PATH" type="pathes"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="SURF" type="surf"/>

  <!-- Definition of response values (help construct) -->
  <xs:simpleType name="gdf">
    <xs:restriction base="xs:string">
      <xs:enumeration value="granted"/>
      <xs:enumeration value="denied"/>
      <xs:enumeration value="failed"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Definition of path responses (help construct) -->
  <xs:complexType name="PathResponse">
    <xs:sequence>
      <xs:element name="STATUS" type="gdf"/>
      <xs:element name="ERROR" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="NR" type="xs:integer" use="required"/>
  </xs:complexType>

  <!-- Definition of request responses -->
  <xs:complexType name="AdfResponse">
    <xs:sequence>
      <xs:element name="REQUEST_STATUS" type="gdf"/>
      <xs:element name="GLOBAL_ERROR" type="xs:string"/>
      <xs:element name="PATH" type="PathResponse" maxOccurs="5" minOccurs="1"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="SYGN_RESPONSE" type="AdfResponse"/>

</xs:schema>

Appendix B XML Schema for CryptStore

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Definition of the requests -->
  <xs:simpleType name="req_type">
    <xs:restriction base="xs:string">
      <xs:whiteSpace value="collapse"/>
      <xs:enumeration value="store_keyshare"/>
      <xs:enumeration value="delete_keyshare"/>
      <xs:enumeration value="retrieve_keyshare"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:complexType name="cs_req">
    <xs:sequence>
      <xs:element name="request_type" type="req_type"/>
      <xs:element name="lfn"
        type="xs:string"/>
      <xs:element name="keyshare" type="xs:string" minOccurs="0" maxOccurs="1"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="cryptstore_request" type="cs_req"/>

  <!-- Definition of the encryption parameters -->
  <xs:complexType name="crypt_par">
    <xs:sequence>
      <xs:element name="ALGORITHM" type="xs:string"/>
      <xs:element name="KEYSIZE" type="xs:positiveInteger"/>
      <xs:element name="IV" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="ENCRYPTION_PARAMETERS" type="crypt_par"/>

  <!-- Definition of file digest -->
  <xs:complexType name="digest_info">
    <xs:sequence>
      <xs:element name="ALGORITHM" type="xs:string"/>
      <xs:element name="DIGEST_VALUE" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="MAC" type="digest_info"/>

  <!-- Definition of the key recovery information -->
  <xs:complexType name="keyshare_info">
    <xs:sequence>
      <xs:element name="THRESHOLD" type="xs:positiveInteger"/>
      <xs:element name="KEYSHARE_SERVER" type="xs:string" minOccurs="1" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="KEYSHARING_INFORMATION" type="keyshare_info"/>

  <!-- Definition of the file header -->
  <xs:complexType name="CS_file_header">
    <xs:sequence>
      <xs:element ref="ENCRYPTION_PARAMETERS"/>
      <xs:element ref="MAC_DIGEST" minOccurs="0" maxOccurs="1"/>
      <xs:element ref="KEYSHARING_INFORMATION"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="CRYPTSTORE_METADATA" type="CS_file_header"/>

</xs:schema>

Appendix C Sygn permission creation GUI

To help users create Sygn permissions, a graphical user interface for handling permission creation and storage was designed and implemented under my supervision by Dan Hididis, as the third-year project of his Master's studies at INSA-Lyon. The user interface supports the following operations on Sygn certificate components:
• Creation of RSA key-pairs for use as user identifiers (UID).
• Creation of role identifiers (RID) and role object identifiers (ROID).
• Creation of file identifiers, including the optional generation of the logical filename by applying a SHA-1 hash to the file content.
• Creation of file set identifiers (FSID).
• Creation of capabilities.
• Creation of authorization certificates (AC).

The Sygn actions supported by this interface are configurable through a parameter file, and thus easily extensible. All certificate components can be saved in a MySQL database, which can be queried using the interface. To help users reference user identifiers and authorization certificates, aliases can be assigned to both.

As the interface is designed to run on lightweight computing devices, all cryptographic primitives are executed remotely on a trusted machine that has the necessary libraries installed. The interface connects to the remote machine through an SSL-secured Web service interface, using the SOAP protocol. In order to be portable, the interface itself is implemented entirely in Java, whereas the Web service is written in C++ and its Web service protocol code was generated by the gSOAP Web services development toolkit (available from http://www.cs.fsu.edu/~engelen/soap.html). Figure C.1 shows the user interface during the creation of an authorization certificate.

Figure C.1: A graphical user interface for the creation of Sygn Authorization Certificates.

Glossary

AA: See Attribute authority.
AAA: Authentication, Authorization, and Accounting.
Access Control: The process of verifying and enforcing authorizations.
Accounting: Extension of Auditing. Gathering measurements on resource use, possibly for billing.
AC: Authorization Certificate or Attribute Certificate.
ACL: Access Control List. A representation of permissions under the Discretionary Access Control model.
Activation: Used in the context of role based access control, for example in the expression "activation of a role". Unlike groups, roles are not constantly active; in order to use the permissions of a role, the user has to activate it. This makes it possible to separate duties and to work with the least privileges.
Ad-hoc: On demand, spontaneously. Used in conjunction with the granting of permissions to refer to permissions granted at short notice, typically with a short lifetime.
AES: Advanced Encryption Standard. Block cipher algorithm chosen by NIST as US standard in 2000.
Agent message sequence: Message sequence in AAA systems in which the AAA service acts as an agent between the users and the resource.
Akenti: Access control system developed at the Distributed Systems Department of the Lawrence Berkeley Laboratory in the USA.
Anonymization: Removal from a piece of personal data of all information that allows the identification of the person concerned by this data.
ARC4: Alleged RC4 algorithm. An unofficially published version of the RC4 stream cipher.
ASN.1: Abstract Syntax Notation One. Binary data encoding scheme created by the International Telecommunication Union.
Assertion: Equivalent to certification. Declaration of (security relevant) facts about a subject, issued by a specific entity.
Asymmetric cryptography: Also known as public-key cryptography. In asymmetric cryptography the en-/decryption algorithm uses different keys for encryption and decryption. The decryption key cannot (feasibly) be derived from the encryption key. The best-known asymmetric cryptosystem is RSA.
Attribute: A property assigned to a subject, for example a group membership or the permission to activate a role.
Attribute authority: An entity that is trusted to issue certain attributes to subjects.
Attribute certificate: A certificate in which an attribute is assigned to a certain subject.
Auditing: The process of analyzing the events that occurred on a certain system by reviewing log data.
Authentication: The procedure by which a subject can prove a claimed identity.
Authorization: All activities that deal with the question of who may access which resource in what way.
Authorization action: The operation that is requested or granted on a resource in an authorization procedure.
Authorization object: The resource targeted by an authorization.
Authorization subject: The subject (i.e. person or process) to which an authorization applies.
Block ciphers: Class of en-/decryption algorithms that transform fixed-size blocks of data using a secret key.
Block cipher modes: Execution mode of a block cipher used to sequentially en-/decrypt multiple blocks of data with the same key. Examples are the Electronic Code Book mode (ECB), the Cipher Block Chaining mode (CBC) and the Cipher Feedback mode (CFB).
Blowfish: A block cipher created by Bruce Schneier.
Capability: Combination of an authorization object and an authorization action. First defined by discretionary access control models (DAC).
Cardea: Access control system developed at the NASA Advanced Supercomputing (NAS) Division of the NASA Ames Research Center, USA.
CAP: Abbreviation for capability in Sygn.
CAS: Community Authorization Server. Authorization server developed by the Globus Alliance for use with the Globus Toolkit Grid infrastructure.
CBC: Cipher Block Chaining. Block cipher mode of operation that aims to hide patterns in different blocks of ciphertext.
Cepheus: Encrypted storage system developed at MIT.
Certificate: Digital document that specifies certain properties of a subject (the owner). Examples of such properties are a public key to be used for authentication, attributes assigned to the owner, or authorizations given to the owner. Certificates have a creator, who signs them using a digital signature, and usually specify a validity period.
Certificate Chain: Equivalent to certificate path.
An ordered set of certificates through which an authorization or authentication is validated.
Certificate Path: See Certificate Chain.
CFB: Cipher Feedback mode. Block cipher mode of operation that allows a block cipher to be used as a (not very efficient) stream cipher.
CFS: Cryptographic File System. A secure storage system developed at AT&T Bell Laboratories.
CNIL: French National Commission on Informatics and Liberties. Appointed by law to deal with privacy issues in computerized data processing.
Community: Term used as a synonym of Virtual Organization (VO), or of a part of a VO, in the CAS system. Designates a (possibly) cross-organizational interest group sharing resources on a Grid.
Confidentiality: Prevention of the disclosure of sensitive data. Often relies on encryption methods to achieve its goals.
Creator: Signer of an authorization certificate in Sygn.
CRL: Certificate revocation list. Lists certificates that have been invalidated before their expiration date for security reasons. If a revocation scheme exists for a class of certificates, such lists need to be consulted to determine the validity of a certificate.
CryptFS: Encrypted storage system created at the Computer Science Department of Columbia University, USA.
Crypto++: Object-oriented C++ library of cryptographic functions. Available from http://www.cryptopp.com.
Cryptographic hash: A cryptographic function that maps a variable-size digital document to a fixed-size hash value. Often used for integrity protection in digital signatures and message authentication codes.
Cryptographical tokens: Computer hardware implementing some cryptographic function, for example smartcards that handle public/private key cryptosystems.
CryptStore: Cryptographic storage system proposed in this thesis.
C-SDA: Chip-Secured Data Access. Cryptographic storage system developed at the PRISM laboratory of the University of Versailles in France.
CTS: Ciphertext stealing, a variant of the ECB and CBC block cipher modes.
Normally the last block of a plaintext that is encrypted using a block cipher is extended to match the cipher's block size. In some cases this may be undesirable. CTS keeps the encrypted text the same size as the plaintext.
DAC: See Discretionary Access Control.
DataGrid: European Grid research project. IST-2000-25182. Started in 2001 and ended in 2004.
Delegation: The act of transferring authorizations, or the power to issue authorizations, from one entity to another.
Denial-of-service: A type of attack against a computer system that aims to disrupt the availability of the system.
DES: Data Encryption Standard, a block cipher algorithm. No longer considered secure due to the short length of the key (56 bits). Has been progressively replaced by AES since 2000.
DESX: Variant of the DES block cipher algorithm that increases the length of the key.
Digital signature: Method to make changes in a digital document detectable. Uses public key cryptography.
Directive 95/46/EC: European Union directive on the protection of individuals with regard to the processing of personal data.
Discretionary Access Control: Access control model that uses different representations of an access control matrix to represent access permissions. The matrix is composed of a row per user and a column per resource; a matrix cell contains the rights of the corresponding user on the corresponding resource.
DRM: Digital Rights Management. Branch of access control dealing with the usage of digital media (audio, video) with respect to copyright protection.
DSD: Dynamic Separation of Duties. RBAC-related concept. Equivalent to a Separation of Duties enforced at runtime.
ECB: Electronic Code Book. Block cipher mode that encrypts each cleartext block independently, one after the other, with the same key.
EGEE: Enabling Grids for E-science in Europe. European Grid research project. Follow-up project to DataGrid. IST-2003-508833. Started in 2004. Project homepage: http://public.eu-egee.org.
Entity: A user, a process acting on a user's behalf, or an automated process acting on the Grid.
Escrow: In cryptography, the act of depositing a copy of an encryption key with a trusted third party.
FID: File identifier in Sygn.
FSID: File set identifier in Sygn.
GACL: Grid Access Control List. ACL implementation for Grids described in [72].
GGF: Global Grid Forum. Grid standardization body. http://www.ggf.org.
Grid computing: An approach to distributed computing that allows transparent sharing of heterogeneous resources. See also old wine in new bottles.
GridShib: A recent project to adapt the Shibboleth access control architecture to a Grid computing environment.
Group server: In encrypted storage, a server that handles group sharing of encrypted files.
Health-care networks: A network created by the interconnection of health-care institutions (clinics, hospitals, individual doctors) with the goal of improving health-care by making medical data more available.
HMAC: A mechanism for message authentication using cryptographic hash functions. HMAC is a FIPS standard (FIPS PUB 198).
IEC: International Electrotechnical Commission. The IEC is a standards organization for all areas of electrotechnology.
IEEE: Institute of Electrical and Electronics Engineers. Non-profit, technical professional association.
IETF: Internet Engineering Task Force. Open international community of network designers, operators, vendors and researchers concerned with the evolution of the Internet architecture and the smooth operation of the Internet. Publishes RFCs.
Integrity protection: In cryptography, protection of a digital document against unauthorized modifications.
Intrusion detection: Detection of successful hacker attacks against a computer system.
IPSec: Internet Protocol Security, RFC 2401. A set of protocols for the secure exchange of packets at the IP layer.
ISO: International Organization for Standardization. ISO is a network of the national standards institutes.
KeyNote: A trust management system dealing with authentication and authorization. Also defines a policy specification language.
Key server: In cryptographic storage, a server that stores decryption key material.
LDAP: Lightweight Directory Access Protocol, RFC 2251. Protocol that allows querying and modifying data stored in a hierarchical distributed database on the network.
Least privilege: RBAC-related concept of always using the smallest set of privileges needed to perform an action. Limits the damage that can be done by faulty or malicious processes acting on a user's behalf.
LFN: See Logical file name.
Lockbox: Concept related to cryptographic storage. Refers to storing a symmetric key encrypted with the public key of some user, in order to make the symmetric key accessible to that user.
Logical file name: Unique name assigned to a file and used to address it.
MAC: See Mandatory Access Control or Message Authentication Code.
Mandatory Access Control: Access control model that assigns a security level to each resource and each user. Users may read all resources that have a level equal to or lower than their own, and write to all resources that have a level equal to or higher than their own.
MDC/SHS: Encryption algorithm created by Peter Gutmann for SFS. Turns a cryptographic hash function into a block cipher running in CFB mode.
Message authentication code: Key-dependent one-way hash function. Method of protecting the integrity of a file for users sharing a common secret key. Cannot be used for non-repudiation, since everyone holding the secret key can produce a valid code.
Meta-data: Data associated with some other data, describing some quality of it.
MySQL: A database management system. http://www.mysql.org.
Nonce: A term sometimes used to designate the initialization vector of certain block cipher modes.
Non-repudiation: Method by which the sender of some data is unable to deny the sending of that data.
In cryptography, digital signatures can be used to implement non-repudiation.
OASIS: Organization for the Advancement of Structured Information Standards. Non-profit, international consortium that works on the development, convergence, and adoption of e-business standards.
OFB: Output Feedback Mode. Block cipher mode similar to CFB. Turns the block cipher into a stream cipher.
OGSA: Open Grid Services Architecture. A standard that aims at defining a common, open architecture for grid-based applications.
OID: Access control object identifier in Sygn.
OpenSSL: A library implementing the SSL protocol suite, used for secure communication over sockets.
Owner: The holder of a certificate in Sygn.
Path: An ordered set of certificates whose goal is to prove a delegation that originates from a source of authority and leads to a certain entity.
PDP: Policy decision point. The part of an access control system that makes access control decisions. Term defined in RFC 2904 [96].
PEP: Policy enforcement point. The part of an access control system that enforces access control decisions. Term defined in RFC 2904 [96].
PERMIS: A distributed access control system with a focus on RBAC, developed by the Information Systems Security Research Group of the University of Salford, UK.
PKCS padding: A scheme for extending a short block of cleartext to the block size of the RSA algorithm.
PKI: Public Key Infrastructure. A system in which every entity can authenticate itself by using a digital certificate (and a corresponding private key) created by a certification authority.
Policy: In access control, a set of rules governing the access to resources.
POSIX.1E: A standards paper describing security extensions to the Portable Operating System Interface (POSIX) standardization effort.
PRIMA: An access control system with a focus on ad-hoc authorization, created at the Department of Computer Science of the Virginia Polytechnic Institute and State University.
Proxy Certificate: In Grid computing, a term referring to a short-lived proxy authentication credential created from a long-term authentication credential. Used by processes acting on behalf of the owner of the long-term credential.
Pull message sequence: Message sequence in AAA systems in which the resource, after receiving a request, contacts the AAA service in order to get an authorization decision.
Public key cryptography: Synonym for Asymmetric cryptography.
Push message sequence: Message sequence in AAA systems in which the user contacts the AAA service in order to get an authorization decision before submitting a request to a resource.
RBAC: See Role Based Access Control.
RC5: A stream cipher algorithm created by RSA Security Inc. Used in SSL up to version 3.
RESID: Hardware resource identifier in Sygn.
Resources: In Grids, any hardware facility (e.g. CPUs providing computing power, hard disks providing storage space) and shared data.
Restriction: In Sygn, a limitation on how a permission may be used. Implements the RBAC concept of DSD.
Review-repository: In Sygn, a storage space where permissions assigned to a role are duplicated.
Revocation: The process of invalidating a certificate before its expiration date.
RFC: Abbreviation for Request For Comments. Format of IETF standard proposals.
RID: Role identifier in Sygn.
ROID: Role object identifier in Sygn. Used for roles as access control objects.
Role: RBAC-related concept. A named collection of permissions, and possibly other roles, that are needed to perform a specific task.
Role Based Access Control: Access control model that groups all permissions required to perform a specific task into a role and assigns roles to users based on the tasks they have to perform.
RSA: Name of the most famous asymmetric cryptography algorithm. Named after its inventors Ron Rivest, Adi Shamir and Leonard Adleman.
SAML: Security Assertion Markup Language. XML-based language for communicating security relevant information.
Created by the OASIS consortium.
SEAL: Stream cipher algorithm. Designed at IBM by Phil Rogaway and Don Coppersmith.
Secret sharing: In cryptography, algorithms that distribute a secret among different parties, so that no party has access to the entire secret and several (or all) parties must collaborate to reconstruct it.
Separation of duties: RBAC concept of denying the simultaneous use of certain permissions in order to prevent a user from accumulating critical functions in a specific process.
SFS: Over-used acronym for cryptographic storage systems. There is an SFS (Secure FileSystem) by P. Gutmann [56], an SFS (Self-certifying File System) by D. Mazières [71] and an SFS (Secure File System) by J. P. Hughes et al. [60, 59].
SHA-1: Secure Hash Algorithm. Cryptographic hash algorithm designed to be used with the Digital Signature Standard (DSS).
Shibboleth: Access control system with a focus on user privacy protection, designed by the Middleware Architecture Committee for Education (MACE) of the Internet2 consortium and supported by IBM.
SID: Subject identifier in Sygn.
SISWG: Security in Storage Working Group. Group sponsored by IEEE with the goal of defining standards for cryptographic algorithms and methods for encrypting data before storage.
Smartcard: A plastic card with an embedded chip that features a microprocessor and a non-volatile memory. When used for asymmetric cryptography, smartcards have the advantage of providing protection for the private key.
SNAD: Secure Network Attached Disks. Cryptographic storage system developed at the University of California, USA.
SOA: See Source of authority.
SOAP: Simple Object Access Protocol [19]. XML-based protocol for information exchange using the HTTP protocol. Often misused for achieving firewall traversal.
SOC: Sygn Owner Client. Tool that is part of Sygn and allows a resource owner to create authorization certificates.
Source of authority: The initial person who has the authority to issue permissions on a specific resource.
SPKI: Simple Public Key Infrastructure. RFC 2692, 2693 [38, 39]. An architecture proposal for managing authorization through certificates.
SSD: Static Separation of Duties. RBAC-related concept. Equivalent to Separation of duties enforced at permission creation.
SSL: Secure Sockets Layer (OSI level 4). Protocol suite for ensuring communications security. Provides functionality such as mutual authentication, encryption and integrity protection.
Stream cipher: Class of en-/decryption algorithms that transform a stream of bits (or bytes) using a secret key.
SUC: Sygn User Client. Tool that is part of Sygn and allows a user to store his authorization certificates and retrieve them when needed.
SURF: Standard User Request Format in Sygn. The format in which requests are submitted to the Sygn-PDP.
Sygn: Access control system presented in this thesis.
Symmetric encryption: Class of cryptographic algorithms using a single secret key for encryption and decryption.
Target: Used in Sygn to designate the capability that a Path is intended to authorize for a user.
TCFS: Transparent Cryptographic File System. Cryptographic storage system developed at the University of Salerno in Italy.
TDES: Triple DES. Encryption algorithm based on the DES algorithm. Uses multiple applications of the DES algorithm with different keys in order to increase the overall key length.
Threshold: Term used in secret sharing. Determines the minimum number of shares needed to reconstruct the secret.
Timestamp: Time and date in a fixed format. Used in certificates and requests.
TLS: Transport Layer Security (OSI level 4). RFC 2246. Protocol standard for transport security created on the basis of SSL version 3.0.
Traceability: The possibility of verifying user actions (such as access to a resource) in a system through the use of some log data.
Trusted third party: An actor in a security relevant protocol that holds security critical information.
Tweak: Name of the initialization vector in some block cipher modes.
Tweakable block cipher: Block cipher mode optimized for storage security.
UID: User identifier in Sygn. A public key for which the user holds the corresponding private key.
URL: Uniform Resource Locator. Global address of resources on the World Wide Web.
VO: Virtual organization. Cooperation of several entities (possibly cross-organizational) that provide and share a set of resources on a Grid.
VOMS: Virtual Organization Membership Service. An authorization server created in the framework of the European DataGrid project, for which development continues in the EGEE project.
W3C: World Wide Web Consortium. Develops Web standards and guidelines.
Web services: Distributed computing technology based on the World Wide Web. Developed by the W3C.
WinEFS: Windows Encrypting File System. Cryptographic storage system incorporated in some editions of the Windows operating system.
WSDL: Web Services Description Language [24]. XML-based language designed for the specification of the interfaces exposed by a web service.
WSRF: Web Services Resource Framework. Specification developed by the OASIS consortium in order to add state information to web services. The Globus Toolkit version 4 implements a Grid middleware that is compliant with WSRF.
X.509: A public-key infrastructure standard defined by the ITU-T, with an Internet profile specified by the IETF. A part of X.509 is the definition of a certificate format. Used in SSL/TLS.
XACML: eXtensible Access Control Markup Language. Standard proposal by the OASIS consortium that defines a general purpose, XML-based language for specifying access control policies.
XML: eXtensible Markup Language. A simple, flexible text format for the exchange of data. Developed by the W3C.
XMLSchema: A language for defining the structure, content and semantics of XML documents. Developed by the W3C.
XrML: eXtensible rights Markup Language. Policy language based on XML, used to describe rights and conditions for using digital resources. Developed by the ContentGuard company.
µgrid: A minimal Grid architecture developed by J. Montagnat and D. Lingrand at the CREATIS laboratory of INSA Lyon in France.

Bibliography

[1] N. AHITUV, Y. LAPID, and S. NEUMANN. Processing Encrypted Data. Communications of the ACM, 30(9):777–780, September 1987.
[2] R. ALFIERI, R. CECCHINI, V. CIASCHINI, et al. VOMS, an Authorization System for Virtual Organizations. In Proceedings of the 1st European Across Grids Conference, Santiago de Compostela, Spain, February 2003.
[3] O. BANDMANN, M. DAM, and B. SADIGHI FIROZABADI. Constrained Delegation. In Proceedings of 2002 IEEE Symposium on Security and Privacy, Oakland, CA, USA, May 2002.
[4] D. E. BELL. A refinement of the mathematical model. Technical Report ESD-TR-278 vol. 3, The Mitre Corp., Bedford, MA, 1973.
[5] D. E. BELL and L. J. LAPADULA. Secure computer systems: Mathematical foundations. Technical Report ESD-TR-278 vol. 1, The Mitre Corp., Bedford, MA, 1973.
[6] M. BELLARE, R. CANETTI, and H. KRAWCZYK. Keying Hash Functions for Message Authentication. In Advances in Cryptology - Crypto 96, Proceedings of the 16th Annual International Cryptology Conference, volume LNCS 1109, pages 1–15. Springer-Verlag, August 1996.
[7] E. BERTINO and E. FERRARI. Secure and Selective Dissemination of XML Documents. In ACM Transactions on Information and System Security (TISSEC), volume 5, pages 290–331. 2002.
[8] E. BERTINO, E. FERRARI, and A. SQUICCIARINI. Trust Negotiations: Concepts, Systems, and Languages. CERIAS Tech Report 2004-68, Center for Education and Research in Information Assurance and Security, Purdue University, West Lafayette, IN 47907-2086, July/August 2004.
[9] E. BERTINO, P. MAZZOLENI, B. CRISPO, et al. Towards Supporting Fine-Grained Access Control for Grid Resources.
In Proceedings of the 10th International Workshop on Future Trends in Distributed Computing Systems (FTDCS), pages 59–65, Suzhou, China, May 2004.
[10] E. BERTINO and A. C. SQUICCIARINI. A Flexible Access Control Model for Web Services. In Proceedings of the 6th International Conference On Flexible Query Answering Systems, pages 13–16, Lyon, France, June 2004.
[11] K. J. BIBA. Integrity considerations for secure computer systems. Technical Report TR-3153, The Mitre Corp., Bedford, MA, April 1976.
[12] M. BLAZE. A Cryptographic File System for UNIX. In ACM Conference on Computer and Communications Security, pages 9–16, Fairfax, VA, November 1993. Association for Computing Machinery (ACM).
[13] M. BLAZE. Key Management in an Encrypting File System. In Proceedings of USENIX Summer 1994 Technical Conference, Boston, MA, USA, June 1994.
[14] M. BLAZE, J. FEIGENBAUM, J. IOANNIDIS, et al. The KeyNote Trust-Management System Version 2. Request For Comments (RFC) 2704, Internet Engineering Task Force (IETF), September 1999. http://www.ietf.org/rfc/rfc2704.txt (Webpage visited on 12/04/05).
[15] P. BONATTI and P. SAMARATI. A Unified Framework for Regulating Access and Information Release on the Web. Journal of Computer Security, 10(3):241–272, September 2002.
[16] P. BONATTI, S. DE CAPITANI DI VIMERCATI, and P. SAMARATI. An Algebra for Composing Access Control Policies. ACM Transactions on Information and System Security (TISSEC), 5(1):1–35, February 2002.
[17] L. BOUGANIM, F. D. NGOC, P. PUCHERAL, et al. Chip-secured data access: Reconciling access rights with data encryption. In Proceedings of the 29th conference on Very Large Data Bases (VLDB), pages 1133–1136, Berlin, Germany, September 2003.
[18] L. BOUGANIM and P. PUCHERAL. Chip-Secured Data Access: Confidential Data on Untrusted Servers. In Proceedings of the 28th conference on Very Large Data Bases (VLDB), pages 131–142, Hong Kong, China, August 2002.
[19] D. BOX, D. EHNEBUSKE, G.
KAKIVAYA, et al. Simple Object Access Protocol (SOAP) 1.1. W3C note, World Wide Web Consortium, May 2000. http://www.w3.org/TR/soap (Webpage visited on 16/05/05).
[20] T. BRAY, J. PAOLI, C. M. SPERBERG-MCQUEEN, et al. eXtensible Markup Language (XML) 1.0. W3C recommendation, World Wide Web Consortium, 1998. http://www.w3.org/TR/REC-xml (Webpage visited on 12/04/05).
[21] G. CATTANEO, G. PERSIANO, A. DEL SORBO, et al. Design and Implementation of a Transparent Cryptographic File System for UNIX. Technical report, University of Salerno, Italy, July 1997.
[22] G. CATTANEO, G. PERSIANO, A. DEL SORBO, et al. The Design and Implementation of a Transparent Cryptographic File System for UNIX. In Proceedings of the UNIX Annual Technical Conference 2001, Freenix Track, Boston, MA, USA, June 2001.
[23] D. CHADWICK and A. OTENKO. The PERMIS X.509 Role Based Privilege Management Infrastructure. In Proceedings of the 7th ACM Symposium on Access Control Models and Technologies, pages 135–140, Monterey, CA, USA, June 2002.
[24] E. CHRISTENSEN, F. CURBERA, G. MEREDITH, et al. Web Services Description Language (WSDL) 1.1. W3C note, World Wide Web Consortium, March 2001. http://www.w3.org/TR/wsdl (Webpage visited on 16/05/05).
[25] B. CLAERHOUT and G. J. E. DE MOOR. Privacy protection for healthgrid applications. In Proceedings of the second European HealthGrid conference, Clermont-Ferrand, France, January 2004.
[26] J. CLARK and S. DEROSE. XML Path Language (XPath). W3C recommendation, World Wide Web Consortium, November 1999. http://www.w3.org/TR/xpath (Webpage visited on 12/04/05).
[27] PORTABLE APPLICATIONS STANDARDS COMMITTEE. Portable Operating System Interface (POSIX) - Part 1: System Application Program Interface (API) - Amendment #: Protection, Audit and Control Interfaces [C Language]. Withdrawn draft, IEEE Computer Society, October 1997. http://wt.xpilot.org/publications/posix.1e/ (Webpage visited on 12/04/05).
[28] CONTENTGUARD.
eXtensible rights Markup Language XrML 2.0 Specification. Whitepaper, ContentGuard Inc., November 2001. http://www.xrml.org/ (Webpage visited on 12/04/05).
[29] COUNCIL OF EUROPE. Convention for the protection of human rights and fundamental freedoms. http://www.echr.coe.int, 4 November 1950. (Webpage visited on 12/04/05).
[30] E. DAMIANI, S. DE CAPITANI DI VIMERCATI, S. PARABOSCHI, et al. A Fine-Grained Access Control System. In Transactions on Information and System Security (TISSEC), volume 5, pages 169–202. ACM, 2002.
[31] G. DELLA-LIBERA, B. DIXON, J. FARRELL, et al. Security in a Web Services World: A Proposed Architecture and Roadmap. Whitepaper, IBM Corporation and Microsoft Corporation, April 2002. Available from http://www128.ibm.com/developerworks/webservices/library/ws-secmap (Webpage visited on 16/05/05).
[32] J. DOMINGO-FERRER. A new privacy homomorphism and applications. Information Processing Letters, 60(5):277–282, December 1996. ISSN 0020-0190.
[33] D. EASTLAKE, J. REAGLE, and D. SOLO. XML-Signature Syntax and Processing. W3C recommendation, World Wide Web Consortium, February 2002. http://www.w3.org/TR/xmldsig-core (Webpage visited on 12/04/05).
[34] C. KENT Ed. Draft Proposal for Tweakable Narrow-block Encryption. Draft, IEEE Computer Society, August 2004. http://www.siswg.org/docs/index.html (Webpage visited on 12/04/05).
[35] D. NAOR Ed. Draft Proposal for Key Backup Format for Wide-block Encryption. Draft, IEEE Computer Society, September 2004. http://www.siswg.org/docs/index.html (Webpage visited on 12/04/05).
[36] S. HALEVI Ed. Draft Proposal for Tweakable Wide-block Encryption. Draft, IEEE Computer Society, March 2003. http://www.siswg.org/docs/index.html (Webpage visited on 12/04/05).
[37] V. WELCH Ed. Globus Toolkit Version 4 Grid Security Infrastructure: A Standards Perspective. Technical report, Globus Security Team, Globus Alliance, December 2004.
Available from http://www.globus.org/toolkit/docs/4.0/security (Webpage visited on 16/05/05).
[38] C. ELLISON. SPKI Requirements. Request For Comments (RFC) 2692, Internet Engineering Task Force (IETF), September 1999. http://www.ietf.org/rfc/rfc2692.txt (Webpage visited on 12/04/05).
[39] C. ELLISON, B. FRANTZ, B. LAMPSON, et al. SPKI Certificate Theory. Request For Comments (RFC) 2693, Internet Engineering Task Force (IETF), September 1999. http://www.ietf.org/rfc/rfc2693.txt (Webpage visited on 12/04/05).
[40] M. ERDOS and S. CANTOR. Shibboleth-Architecture Draft v05. Technical report, Internet2, 2002. http://middleware.internet2.edu/shibboleth (Webpage visited on 12/04/05).
[41] EUROPEAN UNION. EUROPA - internal market - data protection - legislative documents. http://europa.eu.int/comm/internal_market/privacy/law_en.htm. (Webpage visited on 12/04/05).
[42] EUROPEAN UNION. Directive 95/46/EC of the European Parliament and of the Council. Official Journal of the European Communities, L 281:31–50, 24 October 1995.
[43] EUROPEAN UNION. Charter of Fundamental Rights of the European Union. Official Journal of the European Communities, C 364:1–22, 7 December 2000.
[44] EUROPEAN UNION. Consolidated Version of the Treaty on European Union. Official Journal of the European Communities, C 325:5–181, 24 December 2002.
[45] S. FARRELL and R. HOUSLEY. An Internet Attribute Certificate Profile for Authorization. Request For Comments (RFC) 3281, Internet Engineering Task Force (IETF), April 2002. http://www.ietf.org/rfc/rfc3281.txt (Webpage visited on 12/04/05).
[46] D. FERRAIOLO and D. R. KUHN. Role Based Access Control. In Proceedings of the 15th NIST-NCSC National Computer Security Conference, pages 554–563, October 1992.
[47] D. FERRAIOLO, R. SANDHU, S. GAVRILA, et al. A Proposed Standard for Role Based Access Control. ACM Transactions on Information and System Security, 4(3), 2001.
[48] I. FOSTER and C. KESSELMAN, editors.
The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann Publishers, Inc., San Francisco, 1999.
[49] I. FOSTER, H. KISHIMOTO, and A. SAVVA Eds. The Open Grid Services Architecture. Draft, Open Grid Services Architecture Working Group, January 2005. Available from http://forge.gridforum.org/projects/ogsa-wg (Webpage visited on 16/05/05).
[50] K. FU. Group Sharing and Random Access in Cryptographic Storage File Systems. Master’s thesis, Massachusetts Institute of Technology, June 1999.
[51] A. GABILLON and E. BRUNO. Regulating Access to XML documents. In Proceedings of the fifteenth annual working conference on Database and application security, Niagara on the Lake, Ontario, Canada, July 2001.
[52] S. GODIK and T. MOSES Eds. eXtensible Access Control Markup Language (XACML). Standard, Organization for the Advancement of Structured Information Standards (OASIS), February 2003. http://www.oasis-open.org/ (Webpage visited on 12/04/05).
[53] FRENCH GOVERNMENT. Loi n° 2004-801 du 6 août 2004 relative à la protection des personnes physiques à l’égard des traitements de données à caractère personnel et modifiant la loi n° 78-17 du 6 janvier 1978 relative à l’informatique, aux fichiers et aux libertés. Journal Officiel de la République Française, JUSX0100026L, 6 August 2004.
[54] G. S. GRAHAM and P. J. DENNING. Protection principles and practice. In Proceedings of the American Federation of Information Processing Societies (AFIPS) Conference, volume 40, pages 417–429, Montvale, N.J., USA, May 1972. AFIPS Press.
[55] P. GUTMANN. PKI: It’s Not Dead, Just Resting. IEEE Computer, 35(8):41–49, August 2002.
[56] P. GUTMANN. Secure filesystem. http://www.cs.auckland.ac.nz/~pgut001/sfs (Webpage visited on 12/04/05), September 1996.
[57] M. A. HARRISON, W. L. RUZZO, and J. D. ULLMAN. Protection in operating systems. Communications of the ACM, 19(8):461–471, 1976.
[58] J. A. M. HERVEG, F. CRAZZOLARA, S. E. MIDDLETON, et al.
GEMSS: Privacy and security for a Medical Grid. In Proceedings of the second HealthGRID conference, Clermont-Ferrand, France, January 2004.
[59] J. HUGHES and C. FEIST. Architecture of the Secure File System. In Proceedings of the 18th IEEE Symposium on Mass Storage Systems, pages 277–290, San Diego, CA, USA, April 2001.
[60] J. HUGHES, C. FEIST, S. HAWKINSON, et al. A Universal Access, Smart-Card-Based, Secure File System. In Proceedings of the 3rd annual Atlanta Linux Showcase, Atlanta, Georgia, USA, October 1999.
[61] ISO/IEC. Information technology – Open Systems Interconnection – Security frameworks for open systems: Access control framework. ISO Standard ISO/IEC 10181-3, International Organization for Standardization (ISO), 1995.
[62] P. KOCHER. Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems. In Advances in Cryptology: Proceedings of the CRYPTO’96 conference, pages 104–113, Santa Barbara, California, USA, August 1996. Springer-Verlag.
[63] P. KOCHER, J. JAFFE, and B. JUN. Differential Power Analysis: Leaking Secrets. In Advances in Cryptology: Proceedings of the CRYPTO’99 conference, vol. 1666, pages 388–397, Santa Barbara, California, USA, August 1999. Springer-Verlag.
[64] B. LAMPSON. Protection. In Proceedings of the 5th Princeton Conference on Information Sciences and Systems, Princeton, 1971. Reprinted in ACM Operating Systems Rev., volume 8, 1, pages 18–24, 1974.
[65] R. LEPRO. Cardea: Dynamic Access Control in Distributed Systems. Technical Report NAS-03-020, NASA Advanced Supercomputing (NAS) Division, November 2003.
[66] M. LORCH, D. ADAMS, D. KAFURA, et al. The PRIMA System for Privilege Management, Authorization and Enforcement. In Proceedings of the 4th International Workshop on Grid Computing, Phoenix, AZ, USA, November 2003.
[67] M. LORCH and D. KAFURA. Supporting Secure Ad-hoc User Collaboration in Grid Environments.
In Proceedings of the 3rd International Workshop on Grid Computing, Baltimore, MD, USA, November 2002.
[68] E. MALER, P. MISHRA, and R. PHILPOTT Eds. The OASIS Security Assertion Markup Language (SAML) v1.1. Standard, Organization for the Advancement of Structured Information Standards (OASIS), September 2003. http://www.oasis-open.org (Webpage visited on 12/04/05).
[69] S. MANGARD. A Simple Power-Analysis (SPA) Attack on Implementations of the AES Key Expansion. In Lecture Notes in Computer Science Volume 2587: Proceedings of the 5th International Conference on Information Security and Cryptology (ICISC), pages 343–358, Seoul, Korea, November 2002.
[70] F. MARTIN-SANCHEZ, A. BABIC, R. BAUD, et al. Synergy between medical informatics and bioinformatics: facilitating genomic medicine for future health care. Journal of Biomedical Informatics, 37(1):30–42, 2004.
[71] D. MAZIÈRES. Security and Decentralized Control in the SFS Global File System. Master’s thesis, Massachusetts Institute of Technology, August 1998.
[72] A. MCNAB and S. KAUSHAL. Gridsite: Grid access control language. http://www.gridsite.org/1.0.x/gacl.html, December 2003. (Webpage visited on 12/04/05).
[73] MICROSOFT. Encrypting File System for Windows 2000. Whitepaper 6715, Microsoft Corporation, 1998.
[74] E. MILLER, D. LONG, W. FREEMAN, et al. Strong Security for Network-Attached Storage. In Proceedings of the 1st Annual Conference on File and Storage Technologies (FAST), Monterey, CA, USA, January 2002.
[75] M. MURATA, A. TOZAWA, and M. KUDO. XML Access Control Using Static Analysis. In Proceedings of the 10th ACM conference on computer and communication security, Washington, DC, USA, October 2003.
[76] N. NAGARATNAM, P. JANSON, J. DAYKA, A. NADALIN, F. SIEBENLIST, V. WELCH, S. TUECKE, and I. FOSTER. Security Architecture for Open Grid Services. Technical report, GGF OGSA Security Workgroup, July 2002.
Revised 6/5/2003, available from http://www.ggf.org/ogsa-secwg (Webpage visited on 26/06/05).
[77] G. NAVARRO, B. SADIGHI FIROZABADI, E. RISSANEN, et al. Constrained delegation in XML-based Access Control and Digital Rights Management Standards. In Proceedings of the IASTED International Conference on Communication, Network, and Information Security, New York, USA, December 2003.
[78] L. PEARLMAN, C. KESSELMAN, V. WELCH, et al. The Community Authorization Service: Status and Future. In Proceedings of the 2003 Conference for Computing in High Energy and Nuclear Physics (CHEP), La Jolla, California, March 2003.
[79] L. PEARLMAN, V. WELCH, I. FOSTER, et al. A Community Authorization Service for Group Collaboration. In Proceedings of the 2002 IEEE Workshop on Policies for Distributed Systems and Networks, Monterey, California, USA, June 2002.
[80] PKIX WORKING GROUP. Public Key Infrastructure (X.509). Technical report, Internet Engineering Task Force (IETF), 2002. http://www.ietf.org/html.charters/pkix-charter.html (Webpage visited on 12/04/05).
[81] J. RAO, P. ROHATGI, H. SCHERZER, et al. Partitioning Attacks: Or How to Rapidly Clone Some GSM Cards. In Proceedings of the 2002 IEEE Symposium on Security and Privacy, pages 31–44, Oakland, California, USA, May 2002.
[82] T. RINDFLEISCH. Privacy, information technology, and health care. Communications of the ACM, 40(8):92–100, 1997.
[83] R. SAADI, J. M. PIERSON, and L. BRUNIE. APC: Access Pass Certificate. Distrust Certification Model for Large Access in Pervasive Environment. To appear in the proceedings of the IEEE International Conference on Pervasive Services, Santorini, Greece, July 2005.
[84] P. SAMARATI and S. DE CAPITANI DI VIMERCATI. Access Control: Policies, Models, and Mechanisms. In Proceedings of the first International School On Foundations Of Security Analysis And Design (FOSAD), volume LNCS 2171, pages 137–196. Springer, 2001.
[85] R. SANDHU, E. J. COYNE, H. L. FEINSTEIN, et al.
Role-Based Access Control Models. IEEE Computer, 29(2):38–47, 1996.
[86] R. SANDHU and P. SAMARATI. Access Control: Principles and Practice. IEEE Communications Magazine, 32(9):40–48, 1994.
[87] B. SCHNEIER. Applied Cryptography: Protocols, Algorithms, and Source Code in C. John Wiley & Sons, New York, second edition, 1995.
[88] L. SEITZ, J. MONTAGNAT, J. M. PIERSON, et al. Authentication and Authorization Prototype on the µgrid for Medical Data Management. In From Grid to Healthgrid, Proceedings of Healthgrid 2005, pages 222–233, Oxford, UK, April 2005. IOS Press.
[89] L. SEITZ, E. RISSANEN, T. SANDHOLM, et al. Policy Administration Control and Delegation using XACML and Delegent. Technical Report RR-2005-010, LIRIS, INSA-Lyon, France, 2005.
[90] A. SHAMIR. How to Share a Secret. In Communications of the ACM, volume 22, pages 612–613, 1979.
[91] M. THOMPSON, W. JOHNSTON, S. MUDUMBAI, et al. Certificate-based Access Control for Widely Distributed Resources. In Proceedings of the 8th USENIX Security Symposium, Washington, D.C., USA, August 1999.
[92] M. THOMPSON, S. MUDUMBAI, A. ESSIARI, et al. Authorization Policy in a PKI Environment. In Proceedings of the 1st Annual NIST workshop on PKI, Gaithersburg, Maryland, USA, April 2002.
[93] S. TUECKE, V. WELCH, D. ENGERT, et al. Internet X.509 Public Key Infrastructure (PKI) Proxy Certificate Profile. Request For Comments (RFC) 3820, Internet Engineering Task Force (IETF), June 2004. http://www.ietf.org/rfc/rfc3820.txt (Webpage visited on 12/04/05).
[94] INTERNATIONAL TELECOMMUNICATION UNION. Abstract Syntax Notation One (ASN.1). ITU-T Recommendation X.680 / ISO/IEC Standard 8824-1:2002, International Telecommunication Union, July 2002.
[95] S. DE CAPITANI DI VIMERCATI and P. SAMARATI. New Directions in Access Control. In: Cyberspace Security and Defense: Research Issues, Kluwer Academic Publisher (to appear).
Available from http://seclab.dti.unimi.it/Papers/nato.pdf (Webpage visited on 12/04/05).
[96] J. VOLLBRECHT, P. CALHOUN, S. FARRELL, et al. AAA Authorization Framework. Request For Comments (RFC) 2904, Internet Engineering Task Force (IETF), August 2000. http://www.ietf.org/rfc/rfc2904.txt (Webpage visited on 12/04/05).
[97] J. WANG, D. DEL VECCHIO, and M. HUMPHREY. Extending the Security Assertion Markup Language to Support Delegation for Web Services and Grid Services. Submitted for publication, available from http://www.cs.virginia.edu/~humphrey/GCG.html (Webpage visited on 12/04/05), 2005.
[98] X. WANG, Y. L. YIN, and H. YU. Collision Search Attacks on SHA1. Available from: http://theory.csail.mit.edu/~yiqun/shanote.pdf (Webpage visited on 12/04/05), February 2005.
[99] V. WELCH, T. BARTON, K. KEAHEY, et al. Attributes, Anonymity, and Access: Shibboleth and Globus Integration to Facilitate Grid Collaboration. In Proceedings of the 4th Annual PKI R&D Workshop, Gaithersburg, MD, USA, April 2005.
[100] V. WELCH, I. FOSTER, C. KESSELMAN, et al. X.509 Proxy Certificates for Dynamic Delegation. In Proceedings of the 3rd Annual PKI R&D Workshop, Gaithersburg, MD, USA, April 2004.
[101] D. WIJESEKERA and S. JAJODIA. A Propositional Policy Algebra for Access Control. ACM Transactions on Information and System Security (TISSEC), 6(2):286–325, May 2003.
[102] Y. WANG, D. J. DEWITT, and J.-Y. CAI. X-Diff: An Effective Change Detection Algorithm for XML Documents. In Proceedings of the 19th International Conference on Data Engineering, pages 519–530, Bangalore, India, March 2003.
[103] E. ZADOK, I. BADULESCU, and A. SHENDER. Cryptfs: A Stackable Vnode Level Encryption File System. Technical Report CUCS-021-98, Computer Science Department, Columbia University, July 1998.