Archive for the ‘cloud database’ Category

MS Azure Blob Storage outpaces Amazon S3

March 7th, 2013 Comments off

Nasuni, the cloud storage vendor, has a new report, White Paper: The State of Cloud Storage in 2013, that says, amongst other things, that Azure has passed Amazon S3 in price/performance value. Nasuni publishes the annual report to share the information that it gathers in order to properly evaluate CSPs for its own use. In much the same way that traditional enterprise storage vendors use commodity disk drives as components in their products, Nasuni uses public cloud storage from the major CSPs as a component in its Storage Infrastructure as a Service.

In last year’s report, tests demonstrated that Amazon S3 was the top performer due to its overall performance and consistent results. Although other offerings showed potential, they had not yet reached the level of performance that Amazon S3 demonstrated.

For the 2013 CSP Performance Test, Nasuni measured performance across three categories:

  • Write/Read/Delete Speed: This test measures the raw ability of each CSP to handle thousands of writes, reads and deletes (W/R/D) with files of varying sizes and levels of concurrency.
  • Availability: This test measures each CSP’s response time to a single W/R/D process at 60-second intervals over a 30-day period.
  • Scalability: This test measures each CSP’s performance consistency (or lack thereof) as the number of objects under management increases into the hundreds of millions.

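Nasuni hasn’t published its exact harness, but a minimal sketch of how one might time W/R/D operations against any storage client looks like this (here a plain dict stands in for a real CSP SDK; the harness itself is my own illustration):

```python
import time
from statistics import mean

def time_op(op, n):
    """Run op() n times and return the mean latency in seconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        op()
        samples.append(time.perf_counter() - start)
    return mean(samples)

# Stand-in backend: a dict instead of a real CSP client.
store = {}
payload = b"x" * 1024  # 1 KB object

write_lat = time_op(lambda: store.update({"key": payload}), 100)
read_lat = time_op(lambda: store.get("key"), 100)
delete_lat = time_op(lambda: store.pop("key", None), 100)
print(write_lat >= 0 and read_lat >= 0 and delete_lat >= 0)
```

Against a real CSP, the lambdas would wrap the vendor's put/get/delete calls, and you would sweep file sizes and concurrency levels as the report describes.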

Amazon Redshift – Datawarehouse in the Clouds

February 16th, 2013 Comments off

Amazon announced the general availability of Redshift this week; the service itself was first announced late last year.

Redshift is the new service that leverages the Amazon AWS infrastructure so that you can deploy a data warehouse. I’m not yet convinced that I would want my production data warehouse on AWS, but I can really see the use in a dev and test environment, especially for integration testing.

According to Amazon: Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools. It is optimized for datasets ranging from a few hundred gigabytes to a petabyte or more and costs less than $1,000 per terabyte per year, a tenth the cost of most traditional data warehousing solutions.

A terabyte warehouse for less than $1,000 per year. That is fantastic. For one financial services firm where I created a 16TB warehouse, the price for hardware and database licensing was several million dollars. Those were just the startup costs; renewing licenses ran into the tens of thousands of dollars per year.

Redshift offers optimized query and I/O performance for large workloads. It provides columnar storage, compression and parallelization to allow the service to scale to petabyte sizes.

I think one of the interesting specs is that it can use standard Postgres drivers. I don’t see anywhere, yet, where they say specifically that this was built on Postgres, but I am inferring that.

Pricing starts at $0.85 per hour, but with reserved pricing you can get that down to $0.228 per hour. That brings it down to sub-$1,000 per terabyte per year. You just can’t compete with this on price in your own data center.
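The sub-$1,000 figure is per terabyte, and the arithmetic works out, assuming a 2TB XL node running around the clock at the reserved rate:

```python
# Rates from the post; node size assumed to be the 2TB XL node.
hourly = 0.228
hours_per_year = 24 * 365          # 8,760 hours

annual = hourly * hours_per_year   # cost per node per year
per_tb = annual / 2.0              # spread over the node's 2 TB
print(round(annual, 2), round(per_tb, 2))
```

That comes to roughly $1,997 per node per year, or just under $999 per TB per year, consistent with Amazon's "less than $1,000 per terabyte per year" claim.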

If you want to scale to a petabyte, you need to have a petabyte of capacity in place. In your own data center, that is going to cost you a fortune. Once again, AWS takes the first step in moving an entire architecture into the cloud. Is anyone else offering anything close to this? I guess Oracle’s cloud offering is the closest, but, as far as I know, they are not promoting warehouse-sized instances yet.

Did I say it’s scalable?

Scalable – With a few clicks of the AWS Management Console or a simple API call, you can easily scale the number of nodes in your data warehouse up or down as your performance or capacity needs change. Amazon Redshift enables you to start with as little as a single 2TB XL node and scale up all the way to a hundred 16TB 8XL nodes for 1.6PB of compressed user data. Amazon Redshift will place your existing cluster into read-only mode, provision a new cluster of your chosen size, and then copy data from your old cluster to your new one in parallel. You can continue running queries against your old cluster while the new one is being provisioned. Once your data has been copied to your new cluster, Amazon Redshift will automatically redirect queries to your new cluster and remove the old cluster.
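The resize workflow Amazon describes can be sketched as a few explicit steps (a toy model of my own, not the actual service):

```python
class Cluster:
    def __init__(self, nodes, data=None):
        self.nodes = nodes
        self.data = dict(data or {})
        self.read_only = False

def resize(old, new_node_count):
    """Toy model of the resize steps: read-only -> provision -> copy -> swap."""
    old.read_only = True           # existing cluster keeps serving reads
    new = Cluster(new_node_count)  # provision a new cluster of the chosen size
    new.data = dict(old.data)      # copy data across in parallel
    return new                     # queries redirect here; old cluster is removed

old = Cluster(2, {"t1": [1, 2, 3]})
new = resize(old, 16)
print(new.nodes, old.read_only, new.data == old.data)
```

The point of the design is that writes pause but reads don't: the old cluster stays queryable until the copy finishes and the swap happens.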

Redshift is SQL based, so you can access it with your normal tools. It is fully managed, so backups and other admin concerns are automatic and automated. I’m not sure yet what tools you would use to design your database schemas, or which tools will build the tables, since the database uses a columnar data store. Your data is also replicated across multiple nodes, so your tools would need to be aware of that.
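Table creation itself appears to be plain DDL through the Postgres-compatible interface, with Redshift-specific hints for distribution and sort order. A hypothetical example (the table, columns and key choices are mine, not from Amazon's docs):

```python
# Hypothetical Redshift-style DDL. DISTKEY steers which node rows land on;
# SORTKEY controls row order within the columnar storage.
ddl = """
CREATE TABLE sales (
    sale_id   BIGINT,
    sale_date DATE,
    amount    DECIMAL(12,2)
)
DISTKEY (sale_id)
SORTKEY (sale_date);
"""
print("DISTKEY" in ddl and "SORTKEY" in ddl)
```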

You can also use Amazon RDS, Elastic MapReduce or DynamoDB to source data, or pull data directly from S3. All in all, I’m pretty excited to see this offering. I hope I get a client who wants to take a shot at this. I like working on AWS anyway, but I would love to work on a Redshift gig.
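Pulling from S3 in particular is a single SQL statement. A hypothetical sketch (the bucket, prefix and credentials are placeholders, not real values):

```python
# Hypothetical Redshift COPY statement for bulk-loading from S3;
# bucket path and credential placeholders are made up for illustration.
copy_sql = """
COPY sales
FROM 's3://my-bucket/sales/'
CREDENTIALS 'aws_access_key_id=<key>;aws_secret_access_key=<secret>'
DELIMITER ',';
"""
print(copy_sql.strip().startswith("COPY"))
```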




MySQL in Spaaaaaace – Amazon Relational Database Service (RDS)

October 27th, 2009 Comments off

Yep, looks like Amazon finally clued in to the fact that SimpleDB is pretty much useless for any mission-critical work. They’ve added a new web service, Relational Database Service, abbreviated RDS.

Amazon Relational Database Service (Amazon RDS) is a web service that makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while managing time-consuming database administration tasks, freeing you up to focus on your applications and business.

Amazon RDS gives you access to the full capabilities of a familiar MySQL database. This means the code, applications, and tools you already use today with your existing MySQL databases work seamlessly with Amazon RDS. Amazon RDS automatically patches the database software and backs up your database, storing the backups for a user-defined retention period. You also benefit from the flexibility of being able to scale the compute resources or storage capacity associated with your relational database instance via a single API call. As with all Amazon Web Services, there are no up-front investments required, and you pay only for the resources you use.

This is pretty slick. I haven’t played with it yet, as it was just announced, but it seems to be an API-driven MySQL instance. For slightly more than a base instance ($0.11/hour for RDS vs. $0.10/hour for a base EC2 small server, and that base price is dropping 15%, BTW), you get a complete server with MySQL installed. You can create and manage your database instances via procedural calls (the API), and you can scale to larger instances or additional storage fairly painlessly using those same APIs. You also pay extra for your storage, of course.

That’s about it from what I’ve read. I don’t see any automated replication (beyond the normal AWS safety features) and I don’t see any kind of clustering or sharding. This is not what most people would call a cloud database. It’s just a MySQL server that is easy to configure, maintain and grow. Not that that’s bad. For a small business with some technical savvy but not a lot of time, this is an awesome addition to AWS. I would be willing to bet that some kind of clustering will come, sooner or later.

Ooops, just stumbled across:

Coming Soon: High Availability Offering — For developers and business who want additional resilience beyond the automated backups provided by Amazon RDS at no additional charge. With the high availability offer, developers and business can easily and cost-effectively provision synchronously replicated DB Instances in multiple availability zones (AZ’s), to protect against failure within a single location.

One of the things I have always liked about AWS is that they really do make it simple. For the use cases where SimpleDB is appropriate, using it is a no-brainer, as are EC2 and S3. AWS even makes queuing simple. RDS keeps to that methodology.

Amazon RDS allows you to use a simple set of web services APIs to create, delete and modify relational database instances (DB Instances). You can also use the APIs to control access and security for your instance(s) and manage your database backups and snapshots. For a full list of the available Amazon RDS APIs, please see the Amazon RDS API Guide. Some of the most commonly used APIs and their functionality are listed below:

CreateDBInstance — Provision a new DB Instance, specifying DB Instance class, storage capacity and the backup retention policy you wish to use. This one API call is all that’s needed to give you access to a running MySQL database, with the software pre-installed and the available resource capacity you request.

ModifyDBInstance — Modify settings for a running DB Instance. This lets you use a single API call to scale the resources available to your DB Instance in response to the load on your database, or change how it is automatically backed up and maintained on your behalf.

DeleteDBInstance — Delete a running DB Instance. With Amazon RDS, you can terminate your DB Instance at any time and pay only for the resources you used.

CreateDBSnapshot — Generate a snapshot of your DB Instance. You can restore your DB Instance to these user-created snapshots at any point, even to reinstate a previously deleted DB Instance.

RestoreDBInstanceToPointInTime — Create a new DB Instance from a point-in-time backup. You can restore to any point within the retention period you specified, usually up to the last five minutes of your database’s usage.
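To give a feel for the "single API call" style, here is a sketch of what a CreateDBInstance request might carry (parameter names follow the API list above; the values are made up, and no request is actually built, signed or sent to AWS here):

```python
from urllib.parse import urlencode

# Hypothetical CreateDBInstance request parameters; values are examples only.
params = {
    "Action": "CreateDBInstance",
    "DBInstanceIdentifier": "mydb",
    "DBInstanceClass": "db.m1.small",
    "Engine": "MySQL5.1",
    "AllocatedStorage": "10",       # GB
    "BackupRetentionPeriod": "7",   # days
    "MasterUsername": "admin",
    "MasterUserPassword": "secret",
}
query = urlencode(params)  # in a real call this would be signed and POSTed
print("Action=CreateDBInstance" in query)
```

One call, and you get a running, pre-installed MySQL instance with the storage and backup retention you asked for, which is exactly the simplicity the post is praising.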

This is a very cool addition to AWS. I am looking forward to playing with it. It’s important to note that if you are capable of administering your own server and database, you can save money by running a base EC2 instance and DIY. If you want to run any database other than MySQL, you have to do that anyway.


Using and Managing Amazon Web Services (AWS) – Part 1

April 28th, 2009 Comments off

Using and Managing Amazon Web Services (AWS)

I personally believe that AWS is perfect for any development and testing environment. Regardless of how sensitive your data is, you can build your applications and test them in a cloud environment using bogus data.

For production environments, the choice is much harder. Do the countries you operate in have strict privacy or data on-shoring laws that would impact your applications? If you can easily offshore your applications, you can easily use cloud computing.

Does the area where you work have reliable infrastructure? It doesn’t matter if Amazon has 99.99% uptime if your provider is down 50% of the time. You can easily use something like replication to keep a copy of your application’s data within your own data center, but if you make that investment, do you really want to run anything in the cloud?

My suggestion to get started would be to use AWS to host a development effort first. Get comfortable with the quirks and gotchas of remote applications. Familiarize yourself with the additional security you will need when running in the cloud. Look at encrypting your data on disk. Amazon will encrypt the data as it travels over the wire.

The need for system administrators and DBAs does not go away by moving to the cloud. It really doesn’t change their jobs much at all. Most modern admins rarely touch the hardware directly anymore, anyway.

Once you’ve decided that it is for you and you have chosen your pilot project, you will need to take the actions described below.

A note to remember as you are working through this book. You only pay for what you use. When you run an instance, you pay for the CPU time that you use. When you use S3 or EBS, you pay for storage (and bandwidth in S3). You pay for Elastic IPs only if you allocate one and don’t attach it to a running instance.


Amazon Web Services – SimpleDB Overview

April 22nd, 2009 1 comment


SimpleDB was Amazon’s first available (in beta) web service. It is a neat feature, but it has its downsides. First, SimpleDB is not a relational database; it is a name/value key-pair store. For simple lookups, it is highly reliable and scalable. For anything more complicated, it falls apart.

SimpleDB is not ACID compliant and has no referential integrity. For that matter, it has no schemas, tables or relationships. Amazon says that it “eliminates the administrative burden of data modeling”. Some things make me say, “Hmmmmm.”

SimpleDB structures data somewhat like a spreadsheet. Think of columns across and values down. A particular column can have multiple values. I provide an example of SimpleDB data in Chapter 6.

Like everything else in AWS, SimpleDB is API based; there is no SQL access here. The APIs are very simple to use: CREATE creates a new domain (worksheet); you can GET, PUT and DELETE items (columns) and values (data); and you can QUERY data or QUERYWITHATTRIBUTES (metadata).

Amazon does have a query language but it is strictly string based. You enter a key value (a key being the name of one of your key/value pairs) and then list possible values. There are simple operators that you can use.
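To make the spreadsheet analogy concrete, here is a toy in-memory model of a SimpleDB domain: each item maps attribute names to sets of values (so one attribute can hold several values), and a query is a plain string match. This illustrates the data model only, not Amazon's actual API:

```python
# Toy model of a SimpleDB domain: item name -> {attribute -> set of values}.
domain = {
    "song1": {"artist": {"The Beatles"}, "genre": {"rock", "pop"}},
    "song2": {"artist": {"Miles Davis"}, "genre": {"jazz"}},
}

def query(domain, attr, value):
    """Return item names where `attr` contains `value` (string equality)."""
    return sorted(
        name for name, attrs in domain.items()
        if value in attrs.get(attr, set())
    )

print(query(domain, "genre", "rock"))   # ['song1']
```

Note that "song1" carries two genre values at once, which is exactly the multiple-values-per-column behavior described above and something a relational column can't do directly.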

SimpleDB is designed to store small volumes of data and is optimized for that. Amazon recommends that large files be stored in S3 and the pointer to those files stored in SimpleDB.


You pay for three things with SimpleDB: machine usage (executing queries), data transfer and persistent storage.

Machine usage is based on the requests made and the amount of time it takes to satisfy those requests. The CPU measure is based on the same criteria as an EC2 compute unit. It costs $0.14 per machine hour utilized; you start with 25 machine hours for free and start paying at the 26th hour.

Persistent storage was $1.50 per GB until December 2008, which was much more expensive than S3 or EBS. In late 2008, Amazon lowered the cost to a more reasonable $0.25 per GB. That is a significant change.

Data transfer is comparable to the other services: Data transfer in is $0.10 per GB, first 10TB out is $0.17, $0.13 for the next 40TB, $0.11 for the next 100TB and $0.10 for all data over 150TB.
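As a worked example of how the outbound tiers stack, a small helper of my own using the 2009 rates quoted above:

```python
def transfer_out_cost(gb):
    """Tiered outbound transfer cost in dollars, per the 2009 rates above."""
    tiers = [             # (tier size in GB, $ per GB)
        (10_000, 0.17),   # first 10 TB
        (40_000, 0.13),   # next 40 TB
        (100_000, 0.11),  # next 100 TB
    ]
    cost, remaining = 0.0, gb
    for size, rate in tiers:
        used = min(remaining, size)
        cost += used * rate
        remaining -= used
    return cost + remaining * 0.10  # everything over 150 TB

print(round(transfer_out_cost(50_000), 2))  # 50 TB out -> 6900.0
```

So 50 TB out costs 10,000 GB at $0.17 plus 40,000 GB at $0.13, or $6,900; each GB past the 150 TB mark adds only $0.10.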

For a limited time, at least until June 2009, the first 25 CPU hours and 1GB per month are free. This is designed to give people a chance to try out the service.

As a database guy, SimpleDB is a non-starter for me. It’s easy enough for me to install MySQL or Postgres (for free) or Oracle (if I want to pay for it) and scale those to almost ridiculous levels. SimpleDB does not provide the transactional consistency required for transaction processing (OLTP), nor does it provide the access paths or any of the key features (except maybe partitioning) required in OLAP processing.

These prices are accurate as of the time of writing them. As always, verify before making a decision.
