RCommands
Introduction to the RCommands
The RCommands framework provides a set of scriptable commands for associating metadata with files stored within a distributed file system such as the Storage Resource Broker (SRB), a set of FTP servers, or a set of files available over HTTP. They enable the creation of metadata to be semi-automated. The RCommands insert and modify metadata held within a central metadata server.
The RCommands have been developed by Rik Tyer of the CCLRC eScience Centre, Daresbury Laboratory UK.
Data organisation
The RCommands assume a three-layer hierarchy for the data:
- The study level. This is the over-arching level under which you will group all files concerned with one particular piece of work. An example might be a study of sea surface temperatures in the North Atlantic Ocean. If you use the pdf publications files as your data, together they might represent a single study called "escience".
- The dataset level. This grouping consists of a set of files associated with one aspect of the study. For example, in a study of sea surface temperatures, it might be one season or one region. If you use the pdf publications files above, we have already separated these into possible datasets ("grid computing", "data management", "collaborative tools" and "applications").
- The data object level. This consists of a single file or a natural collection of files (such as the complete set of files produced by a single computation). If you use the pdf publications files, each file will be a data object.
One important point should be noted: the study and dataset levels are completely abstract. In contrast, the data objects correspond to URIs that point to real objects, including (but not limited to) files or collections of files in the SRB.
Users should not feel constrained by this hierarchy. For example, you may feel that your whole life's work is one study, so that this level has little meaning. On the other hand, you may feel that any one study should only have data objects. This hierarchy has many interpretations and should be used in the way that best suits the investigator.
It is possible to add metadata to each of these levels. Within the framework of the RCommands, each level will have an ID number that is used in the scriptable RCommands.
The commands
There are only ten RCommands, with detailed descriptions provided below.
- Rinit: starts an RCommand session, and is needed in order to read information from configuration files.
- Rpasswd: changes your password that is associated with your access to the metadata server.
- Rcreate: creates a metadata object, i.e. a study, dataset or data object.
- Rannotate: adds a description to a study or dataset, or a parameter name/value pair to a dataset or data object.
- Rls: lists the different entities within the metadata database.
- Rget: displays the metadata associated with a particular entity.
- Rrm: removes entities from the metadata database.
- Rchmod: adds or removes investigators to or from a study.
- Rsearch: searches the metadata associated with studies and datasets for name/value pairs or keyword descriptions
- Rexit: ends an RCommand session and has the primary effect of cleaning away hidden files created during the session.
Usage
Username
You will need a username to provide you with access to the RCommands database: this will be provided by the database manager.
Create the configuration files
You need to create a file of the name ~/.rcommands/rcommand.config, which has the form
username = <your username in the RCommands database>
password = <your password>
cacertdir = /etc/grid-security/certificates
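Because this file holds a password in plain text, it is worth restricting who can read it. The following is a minimal sketch and not part of the RCommands themselves: the function name write_rcommand_config is hypothetical, and the restrictive permissions are an assumed precaution rather than a documented requirement.

```shell
# Hypothetical helper: writes ~/.rcommands/rcommand.config in the three-line form.
# Usage: write_rcommand_config <username> <password> [cacertdir]
write_rcommand_config() {
    rc_dir="$HOME/.rcommands"
    rc_file="$rc_dir/rcommand.config"
    # Default certificate directory taken from the example configuration.
    rc_cacertdir="${3:-/etc/grid-security/certificates}"

    mkdir -p "$rc_dir" || return 1
    # Assumption: only the owner needs access to the directory and the file.
    chmod 700 "$rc_dir" || return 1
    (
        umask 077
        printf 'username = %s\npassword = %s\ncacertdir = %s\n' \
            "$1" "$2" "$rc_cacertdir" > "$rc_file"
    )
}
```

It could be called as write_rcommand_config myname mypassword, after which Rinit should find the file in its usual location.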
Initiating an RCommand session
You initiate an RCommand session using the Rinit command. You can test that all is well by typing the Rls command: it will return a message telling you about any studies you have. To get information about other commands, you can simply type the command name with no arguments, you can use the unix man command, or you can look at the information below.
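In a script it can be convenient to bracket a command with Rinit and Rexit so that the session is always closed. The sketch below assumes only what is described on this page (Rinit and Rexit exist and return zero on success); the wrapper name with_rsession is hypothetical.

```shell
# Hypothetical wrapper: runs one command inside an RCommand session.
# Usage: with_rsession Rls -t
with_rsession() {
    # Rinit returns non zero on error, so stop before running anything else.
    Rinit || return 1
    "$@"
    rs_status=$?
    # Always end the session, whether or not the command succeeded.
    Rexit
    # Report the status of the wrapped command, not of Rexit.
    return "$rs_status"
}
```

Note that the wrapper ends the session for the current shell, so it is not suitable if you want to keep an interactive session open afterwards.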
Creating a study
First use the Rcreate command to create a study level. To use Rcreate you will need to give the study a name, add a description, and assign it to a topic, via:
Rcreate -n <name> -k <description> -t <topicID>
First you should think about the topic. You can list all topics by the command
Rls -t
Choose a topic and note the number; this will be the topicID. Run the Rcreate command to create a study. The name and description can contain more than one word if they are enclosed in quotes. For example, to create a study containing a set of workshop papers, we might use:
Rcreate -n "Workshop papers" -k "Papers for workshop" -t 4
We can check that this has worked by running the Rls command. This will return information like
-------------------------
StudyID: 1026
Name: Workshop papers
-------------------------
where the StudyID number will differ for different people. Now we can look at this in more detail using the Rget command:
Rget -s studyID
where you add your StudyID number. For the example above:
Rget -s 1026
gives
-------------------------
StudyID: 1026
Name: Workshop papers
Description: Papers for workshop
Created by: martin dove
Status: In Progress
Start_date: 07-01-2006
-------------------------
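When scripting, you will usually want the StudyID as a value rather than reading it off the screen. The filter below is a sketch that assumes Rls prints each field on its own line, with "StudyID:" preceding "Name:" in each record as in the sample output; the function name study_id_for_name is hypothetical.

```shell
# Hypothetical filter: prints the StudyID of every study whose name matches exactly.
# Usage: Rls | study_id_for_name "Workshop papers"
study_id_for_name() {
    awk -v want="$1" '
        # Remember the most recent StudyID seen.
        /^StudyID:/ { id = $2 }
        # On a Name line, strip the label and surrounding blanks, then compare.
        /^Name:/ {
            name = $0
            sub(/^Name:[ \t]*/, "", name)
            sub(/[ \t]+$/, "", name)
            if (name == want) print id
        }
    '
}
```

If several studies share the same name, one ID is printed per match, so a script should check how many lines come back.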
Adding datasets with metadata
Now we want to add some data sets to the study. Following the example of pdf publications, we could create some datasets by
Rcreate -s 1026 -n "Papers on grid computing"
Rcreate -s 1026 -n "Papers on data management"
Rcreate -s 1026 -n "Papers on collaborative tools"
Rcreate -s 1026 -n "Papers on escience applications"
Each invocation creates a DatasetID, which is echoed to the screen. Now check the results of these commands with
Rls -s 1026
This will show you the DatasetID for each dataset (again, different users will get different numbers). You can look at any one dataset by using the command
Rget -d DatasetID
where you use the appropriate number of each DatasetID.
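Creating several datasets in one study is a natural candidate for a loop. The following sketch uses only the Rcreate -s and -n options described on this page; the function name create_datasets is hypothetical.

```shell
# Hypothetical helper: creates one dataset per name within a study.
# Usage: create_datasets 1026 "Papers on grid computing" "Papers on data management"
create_datasets() {
    cd_study=$1
    shift
    for cd_name in "$@"; do
        # Rcreate returns non zero on error, so stop at the first failure.
        Rcreate -s "$cd_study" -n "$cd_name" || return 1
    done
}
```

Quoting each name keeps multi-word names together as a single argument to -n.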
Now we will add some metadata to each dataset. For this we use the Rannotate command. In my example, running Rls -s 1026 gives
-------------------------
Dataset ID: 26
Dataset Name: Papers on grid computing
Parent StudyID: 1026
-------------------------
Dataset ID: 27
Dataset Name: Papers on data management
Parent StudyID: 1026
-------------------------
Dataset ID: 28
Dataset Name: Papers on collaborative tools
Parent StudyID: 1026
-------------------------
Dataset ID: 29
Dataset Name: Papers on escience applications
Parent StudyID: 1026
-------------------------
We can use the Rannotate command in two ways. First we can add a description to the dataset. My example is
Rannotate -d 29 -k "Collection of papers on escience applications"
Second we can add some name/value pairs. My example is
Rannotate -d 29 -p topic=escience
Rannotate -d 29 -p topicarea=applications
Running the Rget -d 29 command to view the metadata gives
-------------------------
DatasetID: 29
Name: Papers on escience applications
Parent StudyID: 1026
Created by: martin dove
Creation_date: 07-01-2006
Description: Collection of papers on escience applications
-------------------------
Note that this shows the description but not the name/value pairs. To see them I need to use the command Rget -d 29 -p, which yields:
-------------------------
Parameter Name: topic
Parameter Value: escience
-------------------------
Parameter Name: topicarea
Parameter Value: applications
-------------------------
You can repeat this for other datasets, and you can add whatever name/value pairs you like.
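For further processing it can help to turn the parameter listing back into name=value lines. The filter below is a sketch that assumes the listing has one field per line, with each "Parameter Name:" line followed by its "Parameter Value:" line as in the sample output; the function name params_to_pairs is hypothetical.

```shell
# Hypothetical filter: converts a parameter listing into name=value lines.
# Usage: Rget -d 29 -p | params_to_pairs
params_to_pairs() {
    awk '
        # Keep the text after the "Parameter Name:" label.
        /^Parameter Name:/ {
            name = $0
            sub(/^Parameter Name:[ \t]*/, "", name)
        }
        # On the matching value line, emit the completed pair.
        /^Parameter Value:/ {
            value = $0
            sub(/^Parameter Value:[ \t]*/, "", value)
            print name "=" value
        }
    '
}
```

The output is in the same form that Rannotate -p accepts, which makes it straightforward to copy parameters from one dataset to another.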
Adding data objects with metadata
Finally we reach the point where we can add metadata to the data objects. You need to first have data somewhere, and in our case our data are in the SRB. The data object can either be a file or a collection of files within the SRB. The command for adding metadata to a data object is
Rcreate -u <url> -d <datasetID> -n <name>
The <url> specifies where the file is and has the form
srb://<zone>/<collection>/<object>
In general, <collection> has the form
/home/<username>.<domain>/<subcollection1>/.../<subcollectionN>
An example might be
srb://Test/home/nieessrb40.srbdom/test.dat
The <datasetID> gives the dataset that you want to associate the file with, and <name> is the name you want to give the data object.
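The URL form above can be assembled mechanically from its parts. The sketch below follows the pattern as written on this page and reproduces the example URL; the function name srb_url and its argument order are hypothetical, and no escaping of special characters is attempted.

```shell
# Hypothetical helper: assembles an srb:// URL from its parts.
# Usage: srb_url <zone> <username> <domain> <object> [subcollection...]
srb_url() {
    [ "$#" -ge 4 ] || return 2
    su_zone=$1
    su_user=$2
    su_domain=$3
    su_object=$4
    shift 4
    # The collection always starts at the user's home collection.
    su_path="/home/${su_user}.${su_domain}"
    # Append any subcollections in order.
    for su_sub in "$@"; do
        su_path="$su_path/$su_sub"
    done
    printf 'srb://%s%s/%s\n' "$su_zone" "$su_path" "$su_object"
}
```

For example, srb_url Test nieessrb40 srbdom test.dat prints srb://Test/home/nieessrb40.srbdom/test.dat, which can then be passed to Rcreate -u.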
You then add metadata with the Rannotate command in the same way that you added name/value pair metadata to the dataset:
Rannotate -o <dataObjectID> -p <name>=<value>
where you get the dataObjectID from the dataset using the command Rls -d <datasetID>. By now you should be getting more familiar with the various ID labels: studyID, datasetID and dataObjectID for the study, dataset and data object respectively.
As before, you can use the Rget command to get the metadata from a data object:
Rget -o <dataObjectID> -p
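When attaching many parameters to a data object from a script, it is easy to pass a malformed pair by mistake. The sketch below checks each argument before calling Rannotate -o; the function name annotate_object is hypothetical, and rejecting empty names or values is an assumption of this sketch, not documented RCommands behaviour.

```shell
# Hypothetical helper: attaches several name=value pairs to one data object.
# Usage: annotate_object <dataObjectID> <name>=<value> [<name>=<value> ...]
annotate_object() {
    ao_id=$1
    shift
    for ao_pair in "$@"; do
        case $ao_pair in
            # Assumption: a pair with an empty name or empty value is a mistake.
            =*|*=)
                echo "annotate_object: empty name or value in '$ao_pair'" >&2
                return 2 ;;
            # Well-formed pair: hand it to Rannotate and stop on failure.
            *=*)
                Rannotate -o "$ao_id" -p "$ao_pair" || return 1 ;;
            *)
                echo "annotate_object: expected <name>=<value>, got '$ao_pair'" >&2
                return 2 ;;
        esac
    done
}
```

Pairs that precede a malformed one will already have been attached, so a failed run may need checking with Rget -o <dataObjectID> -p.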
Searching on the metadata
The power of metadata lies in what you do with it. The RCommands provide for this with the Rsearch command. There are several ways to use this command:
Rsearch -s studyID -p <name>=<value>
Rsearch -d datasetID -p <name>=<value>
Rsearch -d datasetID -k <keyword>
Rsearch -o dataObjectID -k <keyword>
Once you have created enough metadata you can experiment with the Rsearch command.
Syntax of the RCommand line commands
Rinit ...
Starts an RCommands session.
Usage
Rinit [-v]
Description
Rinit reads the configuration information from ~/.rcommands/rcommand.config, authenticates with the RCommand server, and obtains a session key, which is stored in ~/.rcommands/rcommand.<shell pid>. This session key is valid for one hour and is specific to the shell instance within which Rinit was executed.
Options
- -v: Prints version string and exits
Exit status
Rinit returns zero on success or non zero if there is an error.
Files
- ~/.rcommands/rcommand.config: RCommand configuration information
- ~/.rcommands/rcommand.<pid>: Session key for shell with <pid>
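A script can use the session file to decide whether Rinit needs to be run. The check below is a sketch that assumes the <pid> in the file name is the process ID of the current shell, as the description above suggests; the function name rsession_file_exists is hypothetical, and the check cannot tell whether the key has already expired.

```shell
# Hypothetical check: is there a session file for the current shell?
# Assumption: <pid> in the file name is this shell's process ID ($$).
rsession_file_exists() {
    [ -f "$HOME/.rcommands/rcommand.$$" ]
}
```

A typical use would be rsession_file_exists || Rinit at the top of a script that is sourced into an interactive shell.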
Rpasswd ...
Changes the RCommand password
Usage
Rpasswd [-v]
Description
Rpasswd changes the user's RCommand password both on the RCommand server and within the user configuration file.
Options
- -v: Prints version string and exits
Exit status
Rpasswd returns zero on success or non zero if there is an error.
Files
- ~/.rcommands/rcommand.config: RCommand configuration information
Rcreate ...
Creates metadata objects
Usage
Rcreate -v
Rcreate -n <name> -k <description> -t <topicID>
Rcreate -s <studyID> -n <name>
Rcreate -d <datasetID> -n <name> -u <url>
Description
Rcreate creates a study, dataset or data object.
Options
-v : Prints version string and exits
-s studyID : StudyID to create dataset in
-d datasetID : DatasetID to create data object in
-n name : Name of study, dataset or data object
-k description : Description of study
-t topicID : Initial topic ID for study
-u url : URL of data object
Exit status
Rcreate returns zero on success or non zero if there is an error.
Rannotate ...
Attaches a parameter (name/value pair) to either a dataset or a data object
Usage
Rannotate -v
Rannotate -s studyID -t topicID
Rannotate [-s studyID | -d datasetID] -k <description>
Rannotate [-d datasetID | -o dataID] -p <name>=<value>
Description
Rannotate is used to annotate different entities within the metadata database. Topics can be assigned to studies, and parameters (name/value pairs) can be attached to either a dataset or a data object. The description field of a study or dataset can be updated using the -k flag.
Options
-v : Prints version string and exits
-s studyID : Specifies study to annotate
-d datasetID : Specifies dataset to annotate
-t topicID : Specifies topic to add to study
-k <description> : Description to add to either study or dataset
-p <name>=<value> : Name/value to add to dataset/data object
Exit status
Rannotate returns zero on success or non zero if there is an error.
Rls ...
Lists different entities within metadata database
Usage
Rls [-v | -c | -t]
Rls -s studyID
Rls -d datasetID
Description
Rls lists entities within the metadata database. With no arguments, it lists all studies where the user is either the originator or an investigator. With the -c or -t option, it lists the people or topics, respectively, within the database. The -s option lists the datasets within the specified study, while the -d option lists the data objects within the specified dataset.
Options
-v : Prints version string and exits
-s <study ID> : Lists datasets within a given study.
-d <dataset ID> : Lists data objects within a given dataset.
-o <dataObject ID> : Shows metadata corresponding to a given data object.
-t : Lists topics within database.
-c : Lists people (colleagues/collaborators) within database.
Exit status
Rls returns zero on success or non zero if there is an error.
Rget ...
Displays metadata associated with particular entity
Usage
Rget -v
Rget -s studyID [-c|-t]
Rget [-d datasetID | -o dataObjectID] [-p]
Description
Rget shows metadata associated with metadata objects or their parameters.
Options
-v : Prints version string and exits
-s studyID : Selects study to show metadata
-d datasetID : Selects dataset to show metadata
-o dataObjectID : Selects data object to show metadata
-c : If used with -s, will list investigators associated with study
-t : If used with -s, will list topics associated with study
-p : If used with -d or -o, will list parameters associated with either dataset or data object
Exit status
Rget returns zero on success or non zero if there is an error.
Rrm ...
Removes different entities from metadata database
Usage
Rrm [-v]
Rrm -s studyID -t topicID
Rrm [-d datasetID | -o dataObjID] [-p ParamName]
Description
Rrm removes entities or parameters from within the metadata database.
Options
-v : Prints version string and exits
-s <study ID> : Specifies study to remove topics from.
-d <dataset ID> : Specifies dataset to remove, or dataset parameter if used in conjunction with -p option.
-o <dataObject ID> : Specifies data object to remove, or data object parameter if used in conjunction with -p option.
-t <topicID> : Specifies topic to remove from topic list.
-p <paramName> : Used with -d or -o options in order to remove dataset or data object parameters.
Exit status
Rrm returns zero on success or non zero if there is an error.
Rchmod ...
Adds or removes investigators to/from a study
Usage
Rchmod -v
Rchmod -s studyID [+c|-c] personID
Description
Rchmod adds or removes investigators to/from a study.
Options
-v : Prints version string and exits
-s studyID : StudyID to modify investigator list
-c personID : Removes corresponding person from study list
+c personID : Adds corresponding person to study list
Exit status
Rchmod returns zero on success or non zero if there is an error.
Rsearch ...
Searches dataset and data objects for parameters
Usage
Rsearch -v
Rsearch -u url
Rsearch -t topicID
Rsearch [-s studyID | -d datasetID] -p <name>=<value>
Rsearch [ -d datasetID | -o dataObjectID ] -k <keyword>
Description
Rsearch searches for entities within the metadata database. It can search for studies by topic, for keywords within study and/or dataset metadata, for specified parameters attached to datasets and/or data objects, and for data objects with a specific url.
Options
-v : Prints version string and exits
-u url : Searches for data object with specified url
-s studyID : Specifies study to search
-d datasetID : Specifies dataset to search
-t topicID : Searches for studies with this topicID
-p <name>=<value> : Parameter to search for
-k keyword : Specifies keyword to search for
Exit status
Rsearch returns zero on success or non zero if there is an error.
Rexit ...
Finishes an RCommand session
Usage
Rexit -v
Description
Rexit removes the shell session file (~/.rcommands/rcommand.<pid>) and contacts the RCommand server to invalidate the session key. If Rexit is not used, the session key will expire one hour after it was created.
Options
-v : Prints version string and exits
Files
- ~/.rcommands/rcommand.config: RCommand configuration information
- ~/.rcommands/rcommand.<pid>: Session key for shell with <pid>
Exit status
Rexit returns zero on success or non zero if there is an error.

