Importing SQL Data Services (SDS) Data into Azure Table Storage (ATS)

Mike Amundsen

2009-03-30


1. Introduction
2. Mapping SDS Storage to ATS Storage
2.1. Mapping the SDS ACE Model to ATS
2.2. Mapping SDS Entities to ATS Entities
3. Getting your data from SDS
4. Transforming Entities from SDS to ATS
5. Importing your data to ATS
6. Summary

1. Introduction

Now that Microsoft has announced that the current SQL Data Services (SDS) will be discontinued in favor of the more traditional Tabular Data Services (TDS) model, there are those who might need to export their existing SDS data to a viable alternative service. There are a number of possible candidates for this import. The list of likely candidates includes (but is not limited to) : 1) Amazon's SimpleDB, 2) Microsoft's Azure Table Storage, 3) the new TDS version of SDS. This article will outline the steps to export your data from the current SDS storage system and import that same data into the Azure Table Storage system.

In summary, the process of importing your SDS data into ATS is as follows:

  1. Export your data from SDS to a local workstation

  2. Transform the local SDS entities into valid ATS entities

  3. Import the converted data into ATS

There are a couple advantages to adopting this SDS to ATS conversion strategy. First, SDS and ATS share a similar model - the Entity-Attribute-Value (EAV) storage interface. That makes moving the SDS entities to ATS rather straight-forward. Second, there are some simple tools already available for reading and writing SDS and ATS data. This article highlights sample utilities written by the author, but there are others available as well as free code libraries supplied by Microsoft that allow users to do the same thing.

There are also some disadvantages to importing your SDS data to ATS. If you are looking forward to using the relational features of the new TDS version of SQL Data Services, converting to ATS is not for you. Second, there are some key features of SDS that are not currently available in ATS; chief among them is support for secondary indexes (ORDER BY) and JOIN. It should be pointed out that Microsoft has publicly stated their intent to add secondary index support to ATS, but no such commitment has been announced for JOINs. If these are important to you, then it's probably a better idea to import your SDS data to another service that supports these features such as Amazon's SimpleDB.

Tip

The technique in this article depends on two command-line applications that allow basic interaction with SDS and ATS. You can download the SDS.EXE and the AZURE.EXE utilities from the author's Codeplex pages.

2. Mapping SDS Storage to ATS Storage

Before getting into the details of converting SDS data into valid ATS data, it is important to outline a general strategy for mapping the SDS storage model to the ATS model. The mapping described here was designed to take advantage of the simplest possible conversion pattern - one that is easily automated. It is possible that this strategy will not fit all cases and may not take best advantage of the features of Azure Table Storage or fully replace the functionality of the existing SQL Data Services storage model. However, the process outlined here can provide the reader guidance in developing an appropriate strategy for specific data conversion projects.

2.1. Mapping the SDS ACE Model to ATS

One of the challenges of converting SDS data into ATS data is to develop a 'map' for matching the Authority-Container-Entity (ACE) model of SDS to ATS data stores. At first glance, this might not seem an easy thing to do but, with a bit of adjustment, there is a straight-forward mapping between the two services.

SDS uses a data model that provides each user account with unique DNS names for each Authority. Each Authority can hold a collection of Containers. The Container is the boundary for queries (SDS does not support cross-container queries). Finally, each Container can hold a collection of Entities. SDS Entities have an implied "Kind" value which is the root element of an Entity object (<customer>...</customer>, <order>...</order>, etc.). In SDS, you can store a mix of Entities in the same Container. In this way, SDS makes is easy to request different Entity types (or Kinds) in the same query.

Figure 1. Figure 1.1 - The ACE Data Model

Figure 1.1 - The ACE Data Model

ATS, on the other hand, provides only one DNS name per user account (http://{user-account}.table.core.windows.net). Each user account can hold multiple Tables. In ATS, the Table is the query boundary (ATS does not support cross-table queries). Each Table can hold multiple Entities. ATS Entities do not have a "Kind" or Entity-type identifier as in SDS. Instead, ATS Entities use a two-part key structure. Each Entity has a PartitionKey and a RowKey.

Figure 2. Figure 2 - The Azure Storage Model

Figure 2 - The Azure Storage Model

The mapping model adopted by this author is as follows:

Table 1. SDS to ATS Data Model Mapping

SQL Data Services Data ObjectAzure Table Services Data Object
AuthorityNONE
ContainerTable
Entity (Kind, Id)Entity (PartitionKey,RowKey)

The above table indicates there is no direct mapping of an SDS Authority to an ATS object. Since SDS offers Authorities as DNS entries and ATS only supports a single DNS entry per account, there is no need to find a mapping for this SDS object. Also, SDS Containers are mapped to ATS Tables. This is done to match the query boundaries of both data models. Finally, the SDS Entity is mapped to the ATS entity, but with an additional notation. Since SDS provided support for an entity "Kind" and ATS does not, this mapping uses the PartitionKey of ATS to hold the "Kind" information of SDS. The SDS Id value maps to the ATS RowKey.

Warning

This mapping does not address the differences in physical data partitioning between SDS and ATS. In SDS, physical partitioning was done 'behind the scenes.' In ATS, the PartitionKey is used to indicate a possible physical partition boundary. Even though ATS supports queries across these 'partition boundaries' it is possible that performance could be adversely affected by these queries. These are assumptions on the author's part and need to be explored more completely before validating this conversion model for large data sets.

2.2. Mapping SDS Entities to ATS Entities

Another key task in creating a conversion strategy is developing a 'map' between SDS Entities and ATS Entities. Again, with a slight adjustment, achieving this mapping is relatively easy.

SDS uses a custom XML model that is very lean. There are only two required elements in the SDS Entity (s:Id and s:Version). The user can indicate the root element of the Entity (<todo ... /> in the example shown below) and SDS data typing is indicated by the xsi:type attributes for each element.

<!-- SDS sample entity object -->
<todo xmlns:s="http://schemas.microsoft.com/sitka/2008/03/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:x="http://www.w3.org/2001/XMLSchema">
  <s:Id>633692420545456176</s:Id>
  <s:Version>49166621</s:Version>
  <message xsi:type="x:string">new thing</message>
  <date-created xsi:type="x:dateTime">2009-02-03T07:14:14</date-created>
</todo>

The ATS Entity model uses the more detailed Atom XML format. Much of the Atom record is used as a 'wrapper' for the ATS data itself. The actual ATS entity data is stored in the <m:properties>...</m:properties> element in each Atom Entry document. Data typing for ATS is only needed for non-string elements and is indicated by the m:type attributes.

<!-- ATS sample entity object -->
<entry xml:base="http://mamund.table.core.windows.net/" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" m:etag="W/&quot;datetime'2009-03-29T18%3A14%3A20.262Z'&quot;" xmlns="http://www.w3.org/2005/Atom">
  <id>http://mamund.table.core.windows.net/sample(PartitionKey='todo',RowKey='633692420545456176')</id>
  <title type="text"></title>
  <updated>2009-03-29T18:50:23Z</updated>
  <author>
    <name />
  </author>
  <link rel="edit" title="sample" href="sample(PartitionKey='todo',RowKey='633692420545456176')" />
  <category term="mamund.sample" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
  <content type="application/xml">
    <m:properties>
      <d:PartitionKey>todo</d:PartitionKey>
      <d:RowKey>633692420545456176</d:RowKey>
      <d:Timestamp m:type="Edm.DateTime">2009-03-29T18:14:20.262Z</d:Timestamp>
      <d:message>new thing</d:message>
      <d:date-created m:type="Edm.DateTime">2009-02-03T07:14:14Z</d:date-created>
    </m:properties>
  </content>
</entry>

With the above information as a guide, the following SDS to ATS Entity mapping will be used for this article:

Table 2. SDS to ATS Entity Mapping

SQL Server Data Services EntityAzure Table Services Entity
s:Kindd:PartitionKey
s:Idd:RowKey
s:VersionNONE
additional elementsd:<name-of-the-element>

As Table 2 indicates, mapping of SDS entities is targeted at the <m:properties /> collection of ATS entities. There is a close mapping between the SDS s:Id and the d:RowKey element of ATS. There is an additional mapping between the SDS root element (<todo />) and the ATS d:PartitionKey element. There is no mapping for the SDS s:Version element (this is a read-only element used by SDS) and all other elements of the SDS record can be mapped directly to elements of the same name in ATS (using the d name space).

With these mappings defined, it is now possible to work through the details of implementing the conversion routine itself.

3. Getting your data from SDS

The first step in the conversion process is to export your data from the SDS storage system onto a local computer. Since the process outlined here depends on using XLST to convert the native XML from SDS into valid ATS entities, we'll use a tool that can read SDS data and write the resulting XML directly to disk. The tool used in this article is one written by the author - SDS.EXE. This is a simple command-line tool that affords direct access to SDS without the need for a browser or custom coding. Below is an example of the SDS.EXE tool in action.

C:\Projects\sds-to-azure>sds /samples/todo-list/ >todo-list.xml

<!-- todo-list.xml -->
<s:EntitySet xmlns:s="http://schemas.microsoft.com/sitka/2008/03/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:x="http://www.w3.org/2001/XMLSchema">
  <todo>
    <s:Id>633692420545456176</s:Id>
    <s:Version>49166621</s:Version>
    <message xsi:type="x:string">new thing</message>
    <date-created xsi:type="x:dateTime">2009-02-03T07:14:14</date-created>
  </todo>
  <todo>
    <s:Id>633692420985547635</s:Id>
    <s:Version>49166624</s:Version>
    <message xsi:type="x:string">old thing</message>
    <date-created xsi:type="x:dateTime">2009-02-03T07:14:58</date-created>
  </todo>
  <todo>
    <s:Id>633692421771056986</s:Id>
    <s:Version>49166627</s:Version>
    <message xsi:type="x:string">some fun</message>
    <date-created xsi:type="x:dateTime">2009-02-03T07:16:17</date-created>
  </todo>
  <todo>
    <s:Id>633692421841359141</s:Id>
    <s:Version>49166628</s:Version>
    <message xsi:type="x:string">some work</message>
    <date-created xsi:type="x:dateTime">2009-02-03T07:16:24</date-created>
  </todo>
</s:EntitySet>

The SDS.EXE utility allows users to make standard HTTP requests directly to the SDS data storage system. In the example shown here, the request was for all the entities in the todo-list table of the samples authority. The utility pipes the response to the todo-list.xml file on disk for later processing.

Tip

The SDS.EXE utility supports the full set of SDS query parameters. That means users can craft any valid SDS query in order to pull just the right collection of SDS Entities for conversion into ATS.

Once the SDS data has been saved to disk, the next step is to write an XSLT transform to convert the SDS data into valid ATS entities.

4. Transforming Entities from SDS to ATS

Since both SDS and ATS support XML, converting existing SDS entities into ATS entities is a straight-forward process. The utility used in this article to handle requests against the ATS data store (AZURE.EXE) uses the Atom media type for processing entities. Therefore the process of converting SDS entities into valid ATS entities is a matter of transforming SDS XML into valid Atom XML. For the example shown above (the todo list), that means converting the s:Id, message, and create-date elements in the SDS record into the proper attributes for the ATS record. Below is a small bit of XSLT that does just that:

  <xsl:template match="todo">

    <!-- start record -->
    <xsl:text><![CDATA[<m:properties>]]></xsl:text>

    <!-- partition key -->
    <xsl:text><![CDATA[<d:PartitionKey>todo</d:PartitionKey>]]></xsl:text>

    <!-- rowkey -->
    <xsl:text><![CDATA[<d:RowKey>]]></xsl:text>
    <xsl:value-of select="s:Id"/>
    <xsl:text><![CDATA[</d:RowKey>]]></xsl:text>

    <!-- message attribute -->
    <xsl:text><![CDATA[<d:message>]]></xsl:text>
    <xsl:value-of select="message"/>
    <xsl:text><![CDATA[</d:message>]]></xsl:text>

    <!-- date-created attribute -->
    <xsl:text><![CDATA[<d:date-created m:type=\"Edm.DateTime\">]]></xsl:text>
    <xsl:value-of select="date-created"/>
    <xsl:text><![CDATA[</d:date-created>]]></xsl:text>

    <!-- end record -->
    <xsl:text><![CDATA[</m:properties>]]></xsl:text>

  </xsl:template>

There are a few things to note in this example. First, ATS requires two fields for every entity: the PartitionKey and RowKey elements. SDS does not use these elements, but has similar requirements. For this conversion pattern, the root name of the SDS entity (todo) is placed in the ATS PartitionKey element and the SDS s:Id element is used as the ATS RowKey element. All other SDS elements are converted to their ATS equivalents directly. Note that the date-created element requires an added m:type setting for ATS (see Data Service Metadata for more information on use of the m:type attribute in ATS).

If you're already familiar with the Atom format, you'll also notice that the above transformation does not result in a valid Atom Entry. This transform takes advantage of a shortcut in the AZURE.EXE utility that allows users to supply only the <content> portion of the Atom entry. The AZURE.EXE utility supplies the rest of the Atom record itself. The example above shows the details for a specific SDS entity - the <todo> entity. However, the process of converting other SDS entities into valid ATS entities works the same way.

Warning

SDS supports storing BLOBs and ATS does not. If you have BLOB objects in your SDS storage, you'll need to store them somewhere other than the Azure Table Storage service. One option is to move them to Azure Blob Storage. The process would be different (and is beyond the scope of this article), but can be accomplished in a similar way.

5. Importing your data to ATS

Now that the details of converting an SDS entity into a valid ATS entity have been worked out, the final task of importing the converted data into the ATS data store is the only one that remains. An efficient way to accomplish this task is to build an XSLT transform that not only converts the SDS entities, but also outputs all the details needed to run the AZURE.EXE as a batch utility on the command-line itself. In other words, use the XSLT to produce a valid batch job that processes the SDS export file, converts it into ATS entities and executes the proper AZURE.EXE commands.

Below is a sample AZURE.EXE session that creates a new table and adds a record to that table (carriage returns added for clarity):

azure /sample post
azure /sample/todo,633692420545456176 post 
"<m:properties>
<d:PartitionKey>todo</d:PartitionKey>
<d:RowKey>633692420545456176</d:RowKey>
<d:message>new thing</d:message>
<d:date-created m:type=\"Edm.DateTime\">2009-02-03T07:14:14</d:date-created>
</m:properties>"

With the above sample AZURE.EXE session as a guide, we can complete the XSLT transform that reads the SDS data from disk and produces a valid AZURE.EXE 'job' that can import all the records in the SDS collection. Below is the final XSLT document.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

  xmlns:s="http://schemas.microsoft.com/sitka/2008/03/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:x="http://www.w3.org/2001/XMLSchema"
                
  xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
  xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">

  <xsl:output method="text" encoding="utf-8"/>

  <xsl:template match="/">

    <!-- create the target table -->
    <xsl:text>azure /sample p&#13;</xsl:text>

    <!-- create record for each entity -->
    <xsl:apply-templates select="//todo" />

  </xsl:template>

  <xsl:template match="todo">

    <!-- /table/PartitionKey,RowKey -->
    <xsl:text>azure /sample/todo,</xsl:text>
    <xsl:value-of select="s:Id"/>

    <!-- start record -->
    <xsl:text><![CDATA[ p "<m:properties>]]></xsl:text>

    <!-- partition key in record -->
    <xsl:text><![CDATA[<d:PartitionKey>todo</d:PartitionKey>]]></xsl:text>

    <!-- rowkey in record -->
    <xsl:text><![CDATA[<d:RowKey>]]></xsl:text>
    <xsl:value-of select="s:Id"/>
    <xsl:text><![CDATA[</d:RowKey>]]></xsl:text>

    <!-- other attributes from the source record -->
    <xsl:text><![CDATA[<d:message>]]></xsl:text>
    <xsl:value-of select="message"/>
    <xsl:text><![CDATA[</d:message>]]></xsl:text>
    <xsl:text><![CDATA[<d:date-created m:type=\"Edm.DateTime\">]]></xsl:text>
    <xsl:value-of select="date-created"/>
    <xsl:text><![CDATA[</d:date-created>]]></xsl:text>

    <!-- end record and comment line-->
    <xsl:text><![CDATA[</m:properties>"]]>&#13;</xsl:text>

  </xsl:template>
  
</xsl:stylesheet>

The output of the XSLT transform (see below) results in a single batch file with everything needed to write records to the ATS data store (details elided for display purposes).

azure /sample p
azure /sample/todo,633692420545456176 p "<m:properties><d:PartitionKey>todo</d:PartitionKey><d:RowKey>633692420545456176</d:RowKey><d:message>...</m:properties>"
azure /sample/todo,633692420985547635 p "<m:properties><d:PartitionKey>todo</d:PartitionKey><d:RowKey>633692420985547635</d:RowKey><d:message>...</m:properties>"
azure /sample/todo,633692421771056986 p "<m:properties><d:PartitionKey>todo</d:PartitionKey><d:RowKey>633692421771056986</d:RowKey><d:message>...</m:properties>"
azure /sample/todo,633692421841359141 p "<m:properties><d:PartitionKey>todo</d:PartitionKey><d:RowKey>633692421841359141</d:RowKey><d:message>...</m:properties>"

Once the batch file has been created, all that remains is to execute the batch file at a command prompt. Below is sample output from this step:

C:\Projects\sds-to-azure>run-todo-job.cmd
Table [http://mamund.table.core.windows.net/Tables('sample')] has been added.
Entity [http://mamund.table.core.windows.net/sample(PartitionKey='todo',RowKey='633692420545456176')] has been added.
Entity [http://mamund.table.core.windows.net/sample(PartitionKey='todo',RowKey='633692420985547635')] has been added.
Entity [http://mamund.table.core.windows.net/sample(PartitionKey='todo',RowKey='633692421771056986')] has been added.
Entity [http://mamund.table.core.windows.net/sample(PartitionKey='todo',RowKey='633692421841359141')] has been added.

6. Summary

This article outlined a simple process for exporting data from SDS, transforming it into valid ATS data and importing it into the ATS data store. First, a mapping of the SDS storage model (ACE) to ATS was reviewed along with a suggested mapping of SDS entities to ATS entities. The details of the described conversion process depend on two command-line utilities written by the author (SDS.EXE and AZURE.EXE), but the work can be accomplished using other means including using existing SDK code libraries supplied by Microsoft. The article also points out some differences between the SDS and ATS feature set (ATS currently lacks ORDER BY and JOIN support; SDS supports BLOBs, ATS does not) and identifies alternative services (SimpleDB supports ORDER BY and JOIN; storing SDS BLOBs in Azure Blob Storage).