Thesis Kurt Vermeersch: October 2010

Sunday, October 31, 2010

OVMP Software

I've sent an e-mail to the writers (Sivadon Chaisiri, Bu-Sung Lee and Dusit Niyato) of the paper titled "Optimal Virtual Machine Placement across Multiple Cloud Providers" (abstract can be found here) with the following question: " I was wondering whether the implementation of the OVMP algorithm has been made available somewhere? I would be interested in trying to understand your research better, by having a look at the software that accompanies it." I will update this blogpost whenever I get a response.

I got a reply with a sample code of the OVMP algorithm as attachment, this can be downloaded here. It is the OVMP algorithm (written in GAMS script and be solvable in NEOS Server or CPLEX). It deals with the demand uncertainty described by a normal distribution. I tried to run it using this software and the CPLEX solver, but since I do not own a GAMS license I got 'THE MODEL EXCEEDS THE DEMO LIMITS' as a result.

Statistical Analysis EC2 Spot Pricing Evolution

First of all I'm sorry for the lack of updates the last week (my sickness had something to do with it). But now I'm fully focusing on my thesis again.

The different scenarios of dividing the spot prices I'm having a look at are:

Average per day for the last 2 months
Average per week
Average per day of the week
Average per hour of the day
Average for the day vs the night

After generating the required graphs (also blox pots cfr what is it) for these scenarios (all output by the way can be downloaded here), I started having a look at how this data (or statistics extracted from this data) could be used to perform intelligent capacity planning.

The statistical values I store for every scenario (and this for each os-region-instance type combination possible) are:

Kurtosis: "it is a measure of the 'peakedness' of the probability distribution of a real-valued random variable. Higher kurtosis means more of the variance is the result of infrequent extreme deviations, as opposed to frequent modestly sized deviations."
Skewness: "it is a measure of the asymmetry of the probability distribution of a real-valued random variable. The skewness value can be positive or negative, or even undefined. Qualitatively, a negative skew indicates that the tail on the left side of the probability density function is longer than the right side and the bulk of the values (including the median) lie to the right of the mean. A positive skew indicates that the tail on the right side is longer than the left side and the bulk of the values lie to the left of the mean. A zero value indicates that the values are relatively evenly distributed on both sides of the mean, typically but not necessarily implying a symmetric distribution."
Arithmetic mean: "it is often referred to as simply the mean or average when the context is clear, is a method to derive the central tendency of a sample space."
Geometric mean: "it indicates the central tendency or typical value of a set of numbers. It is similar to the arithmetic mean, which is what most people think of with the word "average", except that the numbers are multiplied and then the nth root (where n is the count of numbers in the set) of the resulting product is taken."
Number of Values: "it is an indication of the number of price changes."
Maximum: "it is the greatest value in the set."
Minimum: "it is the smallest value in the set."
Percentile: "it is the value of a variable below which a certain percent of observations fall. The 25th percentile is also known as the first quartile (Q1); the 50th percentile as the median or second quartile (Q2); the 75th percentile as the third quartile (Q3)."
Variance: "it is used as one of several descriptors of a distribution. It describes how far values lie from the mean."
Standard deviation: "it is a widely used measurement of variability or diversity used in statistics and probability theory. It shows how much variation or 'dispersion' there is from the 'average' (mean, or expected/budgeted value). A low standard deviation indicates that the data points tend to be very close to the mean, whereas high standard deviation indicates that the data is spread out over a large range of values."

I'll explain all my findings in the document I'm working on that summarizes the research I've done and tells how I think to extract information from it that can be helpful to make an intelligent capacity planner. The project that I used to generate the graphs and statistical ouput is called 'SpotPricing' and can be found on my SVN location or here.

Monday, October 25, 2010

Port Spotmodel to Java

This weekend I was feeling a bit sick, so I couldn't do that much. But I was in a programming mode and decided to port the Spotmodel software that accompanied the 'Decision Model for Cloud Computing under SLA constraints' paper from C to Java. The original source code can be found here and the newly created Java project here. The code is also available though SVN, for which I'll mail you the credentials. Note that the code should still undergo some design changes (for the moment it needs a lot of memory for example) and javadoc comments should be added.

Thursday, October 21, 2010

On-Demand vs Spot EC2 Instances

I searched the web to find some differences between spot and on-demand instances, and the issues that rise when making a choice between those two:

Found on this website: Wouldn’t it be the cheapest to choose only spot instances and set a price equal to the on-demand instance prices? In theory, our pricing should never be worse than what we pay currently and would most likely be much better. The problems with these techniques: all your instances may get terminated at the same time as the price rises above on-demand (which it already did).

Found in this article "An introduction to Spot Instances" by Amazon: it is a tutorial that tells you step-by-step how to use spot instances through the AWS Management Console. It states as well for what kind of applications (with concrete examples) spot instances are useful:
Spot Instances are well suited to a number of different kinds of applications. In general, they can save you money if your application doesn’t depend on instances being started immediately, can start and perform useful work without manual intervention, and can resume work after being interrupted.
Finally, there are a number of best practices that are good to keep in mind when running applications on Spot Instances in order to minimize the potentials impact from your instance being interrupted: save your work frequently (add checkpoints/split up your work), test your application, minimize group instance launches, track when spot instances start and stop and access large pools of compute capacity.
Found on this website: "I think this means that over time people will realize that spot instances are feasible for a wide variety of workloads. There will be a psychological barrier for a while, but it will be overcome eventually." Why not just bid the on-demand price? "The fact that Amazon controls the spot price, still doesn't change the fact that unless there is a capacity problem with on-demand, people don't have any reason for bidding more than the on-demand price." "If everyone adopted your idea and bid at the on-demand price, that would invariably drive the spot pricing up to equal the on-demand... and then what have we gained?" "AWS spot price *may* be determined by supply and demand, but doesn't have to be. This is because there is only one seller."

I also read the following paper: "Optimal Virtual Machine Placement across Multiple Cloud Providers" by Chaisiri S., Lee B. and Niyato D. (I have to admit I did not yet fully understand the mathematics behind the presented model):

ABSTRACT Cloud computing provides users an efficient way to dynamically allocate computing resources to meet demands. Cloud providers can offer users two payment plans, i.e., reservation and on-demand plans for resource provisioning. Price of resources in reservation plan is generally cheaper than that in on-demand plan. However, since the reservation plan has to be acquired in advance, it may not fully meet future demands in which the on-demand plan can be used to guarantee the availability to the user. In this paper, we propose an optimal virtual machine placement (OVMP) algorithm. This algorithm can minimize the cost spending in each plan for hosting virtual machines in a multiple cloud provider environment under future demand and price uncertainty. OVMP algorithm makes a decision based on the optimal solution of stochastic integer programming (SIP) to rent resources from cloud providers. The performance of OVMP algorithm is evaluated by numerical studies and simulation. The results clearly show that the proposed OVMP algorithm can minimize users’ budgets. This algorithm can be applied to provision resources in emerging cloud computing environments.

Wednesday, October 20, 2010

Decision Model for Cloud Computing under SLA Constraints

Today I had a look at the software that comes with the "Decision Model for Cloud Computing under SLA Constraints" paper by Andrzejak A., Kondo D. and Yi S. The implementation of the presented model can be found on the project website.

The paper describes its contribution as follows:

Our main contribution is a probabilistic model that can
be used to answer the question of how to bid given SLA
constraints. A broker can easily apply this model to present
automatically to the user a bid (or set of bids) that will meet
reliability and performance requirements. This model is particularly
suited for Cloud Computing as it is tailored for environments
where resource pricing and reliability vary significantly
and dynamically, and where the number of resources allocated
initially is flexible and is almost unbounded. We demonstrate
the utility of this model with simulation experiments driven
by real price traces of Amazon Spot Instances, and workloads
based on real applications.

So, I had a look at the source file, they implemented the simulation process in one C-file. It took me a while to completely understand the code, especially the implementation of the actual simulation in the methods simulateOptimalCkpt() and simulateHourCkpt() was pretty hard. But now I'm sure this code can be used to develop a broker application (and it can be easily ported to another language). Note that other checkpointing schemes can easily be added since the required methods can be found in the source code that comes with the "Reducing Costs of Spot Instances via Checkpointing in the Amazon Elastic Compute Cloud" paper on this website.

The following quote from the paper describes how the output of the program can be used to make an intelligent (task is done before the given deadline and within the foreseen budget) bid for the spot price:

To find the optimal instance type and bid price, we compute ET(cdead) and M(cB) and check the feasibility (as stated in section 3D) for all relevant combinations of both parameters. As these computations are basically “look-ups” in tables of previously computed distributions, the processing effort is negligible. Among the feasible cases, we select the one with the smallest M(cB); if no feasible cases exist, the job cannot be performed under the desired constraints.

Their simulations resulted in some interesting findings found in the summary of the results in the paper (section 4F). Note that constraints on other random variables than ET and M can be introduced as well.

How do I see the broker application at the moment? It takes as input a collection of workloads and some properties and constraints should be provided for each workload. For a workload should first be determined on what kind of instance it should be executed (maybe in a first iteration of the software with simple CPU, Memory, ... usage thresholds). Then a division between reserved and other instances should be made (based on the on-demand price, since the spot price fluctuates too much). Then finally the algorithms provided in the paper can be used to determine whether the constraints can be met by using spot instances. The bid price and checkpointing scheme that should be chosen for the spot instances is determined when this is the case. Otherwise on-demand instances can be used. I'm thinking the broker should be a Java webservice that periodically downloads the spot price history and then reruns the simulations to update its data.

Monday, October 18, 2010

What's next?

At the moment I'm making a comparison between spot and on-demand instances: looking for some interesting facts/sources on the web. Afterwards I'll have a look again at the "Decision Model for Cloud Computing under SLA Constraints" paper, to make a decision about how feasible it is to implement the presented model.

Spot Instances in Different Regions Comparison

After calculating the average spot instance price, I made an overview of the spot, on-demand and reserved instance prices and this for all different geographical regions. Also the prices for storage in these regions are included in the overview, which can be found here. We notice that for all instances except for a standard large linux instance the US East region is the cheapest one. For storage only the US West region is a bit more expensive than the other regions. For data transfer on the other hand, the Asia region is the most expensive one. For every region I made a graph (they can be found here) that plots all the different prices, an example is given here:

Afterwards I had a look at the optimal division between reserved and spot instances, but without taking into account more properties of using spot instances (such as the need for checkpointing, ...). Using spot instances seemed to be cheaper than using reserved instances, such that the optimal division is using a 100 percent spot instances.

During the thesis meeting last week another question was raised: whether it is cheaper to under- or overestimate the number of required Reserved instances to reach an optimal division? While trying some different workloads in the earlier created Excel sheet, I found out this (as one would expect) depends on the workload. I tried to find a formula that expressed whether for a certain workload it would be better to take one less or one more reserved instance. Of course this is a bit contradictious, since you could determine the optimal amount of reserved instances if the workload is known. But maybe the found tipping point could tell us something, my findings can be found here.

Saturday, October 16, 2010

Amazon Spot Instance Pricing [CloudExchange.org]

After getting some advice, I made some new graphs from the cloudexchange input data that:

plots the average spot price for each day of the week (for every instance type)
plots the average spot price during each hour of the day (for every instance type)

A first example of this new output for a us-east-1.linux.c1.xlarge spot instance:

And as a second example the output for a us-east-1.linux.m1.xlarge spot instance:

We notice that some spot instances have a rather fluctuating price during the day and over the days of the week, while others do not show this behavior. This updated version of the software can be found here. And all the generated graphs can be downloaded from this location.

Afterwards I made a little program that takes the CSV files found on cloudexchange.org as input and calculates what the average spot instance price has been between two given dates for each instance type. This program can be downloaded here. The output is written to a new CSV file, for the Linux instances a PDF version of this file can be found here.

Thursday, October 14, 2010

Region Comparison

I had a look at the pricing of EC2 instances for the different geographical regions. Of course not only the money cost should be considered when running an application in the cloud but also the latency the servers in these different locations cause (but you can think of applications that do not care). The prices of the different locations can be found on Amazons website and are presented by me here. A first remark is that only the US East region does offer cluster compute instances. I investigated two different things about the pricing in the different regions:

What is the cheapest location to run a certain instance? The results can be found here. We notice that for Reserved instances US - N. Virginia is always the cheapest solution. For on-demand instances it is as well, but for Windows instances the EU and Asia region offer the same price.

How does the price of reserved instances vs on-demand instances compare on these different locations? The results can be found here. We notice that except for Micro instances US East always offers the most advantageous reserved instances. For Linux instances we see no difference between the regions (both for the 1 and 3-year terms). For Windows instances on the other hand we notice that Reserved instances are advantageous after a shorter amount of time in the US regions. It is also remarkable that Micro instances are the only ones that are more quickly advantageous in all regions but the US East one.

Remark that all of this can be found here (an Excel sheet).

Monday, October 11, 2010

Reserved vs On-Demand Instances

On Amazon EC2 these terms have the following meaning:

Reserved instances:
They give you the option to make a low, one-time payment for each instance you want to reserve and in turn receive a significant discount on the hourly usage charge for that instance.
On-Demand instances:
They let you pay for compute capacity by the hour with no long-term commitments. This frees you from the costs and complexities of planning, purchasing, and maintaining hardware and transforms what are commonly large fixed costs into much smaller variable costs.

I had a look today at the problem of finding the optimal division between Reserved and On-Demand instances. With the optimal solution being the one with the smallest total cost. My findings (in an Excel file) can be found here. Or I put them in two pdf files as well:

General Pricing Information: download here
Practical Example: download here

Sunday, October 10, 2010

Amazon Spot Instances Pricing [CloudExchange.org]

On cloudexchange.org we find the history of the price of the different Amazon Spot instances, this data is exporteable as an CVS file.
I made a little Java program (download here) that takes this file as input and generates the following graphs:

one containing the average price for the spot instance during each day

one with a line representing the average price during the day and a line for the average price during the night (0:00-8:00AM)

On the following graph we notice that for the corresponding spot instance (us-east-1.linux.m2.4xlarge) the average day price does not have peaks during the weekends (2 units on the rights of every vertical grid line), which seems logical.

For the same instance I then divided this graph in the day and night parts, but it seems there is no remarkable difference in price.

Papers

I read these 3 papers during the weekend:

"Exploiting Non-Dedicated Resources for Cloud Computing" by Andrzejak A., Kondo D. and Anderson D.P.:
ABSTRACT Popular web services and applications such as Google Apps, DropBox, and Go.Pc introduce a wasteful imbalance of processing resources. Each host operated by a provider serves hundreds to thousands of users, treating their PCs as thin clients. Tapping the processing, storage and networking capacities of these non-dedicated resources promises to reduce the size of required hardware basis significantly. Consequently, it presents a noteworthy opportunity for service providers and operators of cloud computing infrastructures. We investigate how a mixture of dedicated (and so highly available) hosts and non-dedicated (and so highly volatile) hosts can be used to provision a processing tier of a large-scale web service. We discuss an operational model which guarantees long-term availability despite of host churn, and study multiple aspects necessary to implement it. These include: ranking of non-dedicated hosts according to their long-term availability behavior, short-term availability modeling of these hosts, and simulation of migration and group availability levels using real-world availability data from 10,000 non-dedicated hosts. We also study the tradeoff between a larger share of dedicated hosts vs. higher migration rate in terms of costs and SLA objectives. This yields an optimization approach where a service provider can find a suitable balance between costs and service quality. The experimental results show that it is possible to achieve a wide spectrum of such modes, ranging from 3.6 USD/hour to 5 USD/hour for a group of at least 50 hosts available with probability greater than 0.90.
"Reducing Costs of Spot Instances via Checkpointing in the Amazon Elastic Compute Cloud" by Andrzejak A., Kondo D. and Yi S.:
ABSTRACT Recently introduced spot instances in the Amazon Elastic Compute Cloud (EC2) offer lower resource costs in exchange for reduced reliability; these instances can be revoked abruptly due to price and demand fluctuations. Mechanisms and tools that deal with the cost-reliability trade-offs under this schema are of great value for users seeking to lessen their costs while maintaining high reliability. We study how one such a mechanism, namely check pointing, can be used to minimize the cost and volatility of resource provisioning. Based on the real price history of EC2 spot instances, we compare several adaptive check pointing schemes in terms of monetary costs and improvement of job completion times. Trace-based simulations show that our approach can reduce significantly both price and the task completion times.
"Decision Model for Cloud Computing under SLA Constraints" by Andrzejak A., Kondo D. and Yi S.:
ABSTRACT With the recent introduction of Spot Instances in the Amazon Elastic Compute Cloud (EC2), users can bid for resources and thus control the balance of reliability versus monetary costs. A critical challenge is to determine bid prices that minimize monetary costs for a user while meeting Service Level Agreement (SLA) constraints (for example, sufficient resource availability to complete a computation within a desired deadline). We propose a probabilistic model for the optimization of monetary costs, performance, and reliability, given user and application requirements and dynamic conditions. Using real instance price traces and workload models, we evaluate our model and demonstrate how users should bid optimally on Spot Instances to reach different objectives with desired levels of confidence.

It took a while (I read some parts more than 3 times) to understand everything the papers were talking about, but now I think I get what they are saying. And the presented models sure will be a good extension to the model I'm creating.

Wednesday, October 6, 2010

Cloud Computing General [B]

During the first week I tried to get familiar with the cloud computing principles again, by doing the following:

Reading "The Big Switch" by Nicholas Carr. I think the following quote (from Booklist) describes the book rather well:
"Carr examines the future of the Internet, which he says may one day completely replace the desktop PC as all computing services are delivered over the Net as a utility, the Internet morphing into one giant 'World Wide Computer.' ... Carr warns that the downside of the World Wide Computer may mean further concentration of wealth for the few, and the loss of jobs, privacy, and the depth of our culture." The book is easy to read and gives some good insights and background information about the shift to utility/cloud computing.

Following the one hour webinar provided by VMware: "The New Era of IT: the VMWare Cloud Application Platform" (http://goo.gl/WBsT) which was not very technical but talked about VMwares cloud computing strategy. They want their developers to keep on using the tools they are used to (to build Java applications): the Spring framework (by the way I got a chance to get familiar with this framework during my summer job at Technicolor). These apps make use of the VMware vFabric platform. The talk also provided a sample hotel room booking application, which was used to illustrate some of the cloud computing principles (eg scaling up/down). More information about the platform can be found here

PS: I literally was in the clouds this week: I made a helicopter flight above Antwerp (http://twitpic.com/2u3x9z).

Thesis Kurt Vermeersch