Wednesday, December 8, 2010

OVMP Analysis

Today I had another look at the "Optimal Virtual Machine Placement across Multiple Cloud Providers" paper and its accompanying GAMS software. This approach to finding an optimal (lowest-cost) solution for scheduling tasks in a cloud environment has some advantages:
  • Finding a solution using a solver is rather quick.
  • This approach gives an optimal solution, while my heuristic gives a suboptimal one (for workload model 2). But the question is whether the model used here can be called realistic.
  • In this paper they manage to determine a solution under uncertainty of demand and prices: the workload demand at a certain moment is given probabilistically, and the on-demand price changes during the simulation with a certain probability.
Because of these advantages it seemed like a good idea to have another look at it, to determine whether their description of the situation is realistic and whether the model can be made more realistic by extending certain variables.

So, I read the paper again and tried to fully understand the GAMS source code. An overview of this analysis follows here. The source code of the file we'll be discussing can be downloaded here. I also found a webpage here from which this script can be submitted as a job to the Neos server. The first remarkable thing is that only 3 types of tasks/VM classes were used in the simulation, with the following resource demands.
To make it more realistic there should be more possible tasks, with a probability distribution specifying which tasks occur more or less often (to get a real-life load model).

And 4 instance types were offered (from different cloud providers), with their characteristics specified as in the following table. Since we would use this to schedule tasks and determine an optimal division between reserved and on-demand instances, we probably only need one instance type here (the one on which we'll schedule the given tasks).

The resource provisioning happens in three phases:
  • Reserving a number of resources.
  • Determining the actual usage of these resources.
  • Determining the on-demand resources (all resources that are still needed after provisioning the reserved ones).
These three phases each have an associated cost, shown in the next table. Note that the reservation and usage costs together are generally still smaller than the on-demand cost for the resource.
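As a minimal sketch of this cost structure (in Java, with my own hypothetical names rather than the paper's GAMS symbols), the cost for one resource type over one period could be computed as:

    // Sketch of the three-phase cost model; names are my own, not the paper's.
    static double provisioningCost(int reserved, int demand,
            double reservationFee, double usagePrice, double onDemandPrice) {
        double cost = reserved * reservationFee;                 // phase 1: reservation
        cost += Math.min(demand, reserved) * usagePrice;         // phase 2: actual usage of reserved resources
        cost += Math.max(demand - reserved, 0) * onDemandPrice;  // phase 3: remaining demand on-demand
        return cost;
    }

This also makes the note above visible in code form: as long as the amortized reservation fee plus the usage price per used unit stays below the on-demand price, reserving pays off for units that are actually used.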

To make the model simpler, we could eliminate some of these costs (for example, only keep the CPU-time cost, since that is the one that really matters in my research). This would make finding a solution quicker and possibly allow the model to be extended to many different tasks over a longer period of time. Note that the period over which the VMs are currently scheduled is at most 1200 hours, which equals 50 days. It would become interesting if this could be extended to a year (which for Amazon equals 8736 hours). It may seem strange that different physical machines with the same characteristics have different prices (for example the reserved price of machine 1 vs. machine 2), but this is possible because in this model they represent instances from different cloud providers.

The uncertainty about the price can be found in the parameters that depend on the M set: with a probability of 70 percent the price stays the same, and there is a 30 percent chance that the on-demand price increases. It does not increase a little; it is multiplied by 3. The following image illustrates this. This does not seem realistic: as shown in my previous blogpost, for EC2 the chance of a price change is a lot smaller and prices actually decrease. Here it is used as a penalty, to represent a change in the price difference between on-demand and reserved resources. We noticed, however, that both prices change by about the same amount and that changes take effect immediately for both kinds of resources. So modelling it this way is probably not realistic; it would be better to take a price decrease for both on-demand and reserved instances into account.
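As a minimal sketch of this uncertainty model (my own simplification, not the paper's scenario-set parameters), sampling the price in a simulation would look like:

    // Sketch: sample the next on-demand price under the paper's uncertainty model
    // (70% chance the price stays the same, 30% chance it is tripled).
    static double sampleOnDemandPrice(double currentPrice, java.util.Random rng) {
        return rng.nextDouble() < 0.7 ? currentPrice : 3 * currentPrice;
    }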

In this document the different elements that make up the formulas for the objective (total cost) and the constraints will be explained.

Finally I tried to adapt the GAMS source file so that it optimizes (finds the minimal-cost solution for) the problem of scheduling tasks, distributed by the given function, over a number of instances with the same characteristics, such that an instance only becomes reserved when enough of its hours are used during a period of one year. Prices can change in this model as well (with a different probability for on-demand than for reserved resources to decrease their price), but only the price of CPU-time is taken into account. However, I still have trouble getting the constraint in there that an instance is only reserved when it is loaded enough. The current GAMS file can be downloaded here.
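For reference, the condition I am trying to encode can be written out. In my own notation (not the GAMS identifiers): let r_j ∈ {0,1} indicate whether instance j is reserved, u_j the CPU hours it runs during the year, F the one-time reservation fee, p_r the reserved hourly rate and p_o the on-demand rate. Minimizing

    cost = Σ_j [ r_j · (F + p_r · u_j) + (1 − r_j) · p_o · u_j ]

should already make the solver set r_j = 1 exactly when u_j > F / (p_o − p_r), so the "loaded enough" condition would follow from the objective rather than needing a separate constraint; the bilinear terms r_j · u_j do, however, have to be linearized or handled by a nonlinear solver.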

A good guide to using GAMS can be found here by the way.

Sunday, December 5, 2010

EC2 On-Demand/Reserved Pricing History

Amazon EC2 instance prices are not fixed: Amazon can change them at any given moment. It should not come as a surprise that prices have decreased over the last couple of years. This is a rather natural phenomenon, since the instances that existed back then still have the same specifications while their hardware costs have decreased dramatically; by lowering its prices, Amazon stays competitive. When making a division between the different instance types, this possibility should be taken into account. If one keeps paying the reserved instance rate for the remainder of the period the instance was rented, the predicted price change could be taken into account as a discount on the on-demand prices. To be able to draw such conclusions one should look at the price decreases that happened in the past, so I'll give a little history overview of the EC2 instances and their pricing.

The history of EC2 and its instances is given in a chronological order, when prices did change they will be presented as well:
  • [24/08/2006] Amazon Elastic Compute Cloud beta is announced. There was only one instance type (1.7GHz Xeon processor/1.74 GB of RAM) available for $0.10 per hour.

  • [22/10/2007] EC2 in unlimited beta and new instance types are announced.

  • [29/05/2008] High-CPU instances announced.

  • [23/10/2008] EC2 exits beta and offers SLA with commitment of 99.95% availability within a region.
  • [23/10/2008] EC2 instances running Windows Server available.

  • [10/12/2008] the European Region is added.
  • [08/01/2009] the AWS management console is launched.
  • [03/03/2009] Windows instances announced for the EU region.
  • [12/03/2009] EC2 introduces Reserved instances.

  • [15/04/2009] Reserved instances now available in Europe.
  • [20/08/2009] prices of Reserved instances are decreased.

  • [30/09/2009] distinction between instances running Windows and instances running Windows with Authentication Services is removed (the price of the general Windows instances is put in place).
  • [27/10/2009] new High-Memory instances announced.

  • [27/10/2009] all other on-demand prices are lowered by up to 15%.

  • [03/12/2009] Northern California region launched.
  • [14/12/2009] Amazon EC2 Spot instances announced.
  • [23/02/2010] Extra Large High Memory instances get introduced.

  • [23/02/2010] EC2 instances with Windows now available.

  • [29/04/2010] Asia Pacific (Singapore) Region announced.
  • [13/07/2010] Cluster Compute instances announced.

  • [01/09/2010] Lower prices for High Memory Double and Quadruple XL instances.

  • [09/09/2010] Micro instances are announced.

  • [21/10/2010] AWS Free Usage Tier introduced.
  • [15/11/2010] Cluster GPU instances announced.

  • [03/12/2010] Free monitoring of CPU, disk and network performance metrics through CloudWatch is introduced.
Your hourly rate for existing reserved instances changes as well when prices change, as can be seen in the official press release of the price reduction that happened in September:
Effective September 1, 2010, we've reduced the On-Demand and Reserved Instance prices on the m2.2xlarge (High-Memory Double Extra Large) and the m2.4xlarge (High-Memory Quadruple Extra Large) by up to 19% and the Reserved Instances by up to 17%. If you have existing Reserved Instances your hourly usage rate will automatically be lowered to the new usage rate and your estimated bill will reflect these changes later this month.
You did, however, still pay the full one-time fee, so these possible changes could still be taken into account. We also notice that not that many price reductions have happened in EC2's history; most new prices were introduced together with a new instance type. The only real general price reduction happened in October 2009, when all on-demand prices were lowered by up to 15%. But prices will probably change more often in the future as market pressure rises.

Saturday, December 4, 2010

White Paper: Analytics for Internal Cloud Management

This paper describes, using a high-level approach, the resource provisioning model that will be needed in a cloud computing environment. Some snippets from the paper are quoted below.
For many organizations, cloud is more of a business change than a technical change, as it alters the way departments interact and the way costs are allocated. Cloud computing enables IT departments to disintermediate themselves from the day-to-day process of providing access to applications, software platforms and IT infrastructure. Instead it allows them to focus on aligning supply and demand, and efficiently provisioning infrastructure in a way that bridges the gap between capex-oriented procurement and opex-oriented consumption. From an IT consumer perspective, it puts control back in their hands by allowing them to respond to their business needs more quickly, while isolating them from the arcane business of buying, installing and managing IT infrastructure.
The addition of specific automation, analytics and cost allocation models is key to making a cloud a cloud, as it enables the self-service model while at the same time assuring infrastructure is managed as efficiently as possible.
Rather than using old-school capacity management approaches that focus on trend and threshold models, what is required is a new focus on workload placement and resource allocations. In this new model, workload placements dictate the optimal use of capacity, much like in the game of Tetris™.
In this cloud computing environment it is important to do strategic optimization: proactive, long-term placement of resources based on supply and demand. Getting this form of resource allocation wrong has two possible consequences:
  • Over-provisioning, which leads to resources with a low level of utilization, meaning money is wasted.
  • Under-provisioning, which leads to operational risks, because capacity cannot meet demand on some occasions.
This is illustrated by the following graph:


The corresponding white paper can be downloaded here and from the original site.

Scheduling of Tasks

A distinction is made in the EC2 Broker application between two workload models:
  • In the first workload model a task has a deadline and a certain amount of VM hours needed to execute it; these hours can be divided over time in whatever way best fits the scheduler (more freedom to determine the number of VMs used in any one hour).

  • The second model has tasks specified with a deadline and a certain length, but for every hour it is also specified how many VMs are needed to execute the task. When a task is generated using the task generator of the application, a file specifying the workload of the model is generated as well. The number of EC2 compute units associated with a certain instance type is used to determine the number of VMs needed: the workload generator takes an upper and a lower boundary and lets the workload fluctuate between the two, scaled by the compute units of the corresponding instance type. A snippet from a CSV file containing such a load description is given here.
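Purely as an illustration of the idea (the actual column layout of the generated file may differ), such a load description could look like:

    hour,VMs
    0,4
    1,6
    2,5
    3,6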


Workload Model 1 (total VM hours specified)
To determine the optimal division between reserved and on-demand instances it is important to schedule the tasks/loads in such a way that the VMs are in use as much of the time as possible. Scheduling the tasks was rather easy for this type of workload, where a task consists of a certain amount of VM hours that need to be executed before a deadline, so I'll have a look at this workload model first. When no scheduling is performed, the following is an example of a possible result, with a corresponding total price of 619.92 US dollars.




When scheduling is performed we obtain a result with a total price of 516 US dollars; here all 4 instances used are reserved instances, whereas without the intelligent scheduler there were 3 on-demand instances as well. For this example the scheduling algorithm resulted in a decrease of the total cost by 16.76 percent.




To perform the scheduling for this workload model, the tasks are first sorted in an earliest-deadline-first manner. Then, for every task, we start filling in VM hours on every instance, from the earliest available VM hour up to the current task's deadline hour. If the existing instances are filled up to the deadline and there are still VM hours left to distribute, the needed extra instances are added. This actually yields an optimal scheduling solution, so that the right amount of reserved instances will be provisioned. How to determine the boundary between a reserved and an on-demand instance (from what amount of VM-hour utilization it is cheaper to rent a reserved instance) was explained earlier on this blog.
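As a sketch of this algorithm (my own minimal Java version, not the actual EC2Broker classes):

    import java.util.*;

    // Sketch of the earliest-deadline-first fill scheduler for this workload model.
    // A task is {deadlineHour, vmHours}; an instance is one boolean per hour (busy or free).
    class EdfFillScheduler {
        static List<boolean[]> schedule(List<int[]> tasks, int horizon) {
            Collections.sort(tasks, new Comparator<int[]>() {   // earliest deadline first
                public int compare(int[] a, int[] b) { return a[0] - b[0]; }
            });
            List<boolean[]> instances = new ArrayList<boolean[]>();
            for (int[] task : tasks) {
                int deadline = task[0];
                int remaining = task[1];
                // fill existing instances from the earliest free hour up to the deadline
                for (boolean[] inst : instances) {
                    for (int t = 0; t < deadline && remaining > 0; t++) {
                        if (!inst[t]) { inst[t] = true; remaining--; }
                    }
                }
                // still VM hours left to distribute: add extra instances
                while (remaining > 0) {
                    boolean[] inst = new boolean[horizon];
                    for (int t = 0; t < deadline && remaining > 0; t++) {
                        inst[t] = true;
                        remaining--;
                    }
                    instances.add(inst);
                }
            }
            return instances;
        }
    }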

Workload Model 2 (every hour #VMs specified)
Getting the scheduling in order for this second workload model was rather difficult. I'll first present the results for this model as well, and then describe the scheduling algorithm itself. Using the basic scheduler (meaning that all tasks simply start at their start date) the following result is obtained, with a total cost of 37509.08 US dollars.




When scheduling is performed we obtain a result with a total price of 34531.56 US dollars; we also need only 27 instances instead of the 39 we needed before, and 13 of them are now reserved instances, where before only 11 were. For this example the scheduling algorithm resulted in a decrease of the total cost by 7.94 percent.




I first developed a way to schedule these tasks optimally as well, but afterwards I came to the conclusion that the time complexity was too high, so a suboptimal algorithm had to be put in place.

The optimal approach I tried first consisted of calculating all different divisions of the workload hours over the time available for that task; this equals the number of combinations of x out of y, with y the total number of time slots available and x the number of time slots used by the task. Because not all these combinations can be stored in memory, a combination was only generated at the moment it was used. This was made possible by a combination generator class that was initialized with x and y and could be used to iterate over all possible combinations. By then determining the best combination of task schedules for the different tasks, we could determine the optimal solution. Optimal is interpreted here as the solution that minimizes the number of instances needed. This already makes the solution suboptimal, since this requirement yields an even distribution in most cases, which is not necessarily optimal. I also had a look at optimizing the number of reserved instances needed, which would yield the truly optimal solution; I'll explain later why this wasn't workable either.

To make the algorithm feasible again I decided to apply the above algorithm to smaller intervals, which means dividing the time into intervals and dividing the workload evenly between them. The size of such an interval is determined by looking at how many combinations would be possible within it and comparing that number with a threshold value that has to be set; the deadlines of the tasks also serve as interval borders. In each interval the algorithm described above is used to determine a suboptimal solution. But then it is impossible to determine how many reserved instances would be needed in total: I tried a scheduler that extrapolated the number of hours an instance was needed in the interval to the bigger time frame, but this didn't seem to give better results than the scheduler that minimized the number of instances needed in each interval. Next I tried to minimize the standard deviation of the number of instances needed each hour in the interval, which seemed to yield a better result for the example.
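For reference, such a combination generator only needs to keep the current combination in memory; a minimal sketch (my own version, not the actual class from the project):

    // Sketch: iterate over all combinations of x indices out of {0, ..., y-1}
    // in lexicographic order, holding only the current combination in memory.
    class CombinationGenerator {
        private final int x, y;
        private int[] current;      // strictly increasing indices; null before the first call

        CombinationGenerator(int x, int y) { this.x = x; this.y = y; }

        // Returns the next combination, or null when exhausted.
        // Note: the returned array is reused between calls.
        int[] next() {
            if (current == null) {  // first combination: {0, 1, ..., x-1}
                current = new int[x];
                for (int i = 0; i < x; i++) current[i] = i;
                return current;
            }
            int i = x - 1;          // rightmost position that can still move right
            while (i >= 0 && current[i] == y - x + i) i--;
            if (i < 0) return null; // all combinations have been generated
            current[i]++;
            for (int j = i + 1; j < x; j++) current[j] = current[j - 1] + 1;
            return current;
        }
    }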



There are a lot of things that could still be tried to make this scheduler better; maybe some updates will be posted in the coming weeks/months. By the way, a new version of the EC2 Broker can be found on the SVN repository.

Monday, November 22, 2010

Decision Model Java (Memory Fix + Link EC2 API)

I handled two problems that still existed with the Java port of the spot model software attached to the "Decision Model for Cloud Computing under SLA Constraints" paper.
  • The first problem I attacked is the fact that the software needed a lot of memory. This was the case because it first read a few hundred thousand records from the input CSV file into memory, then used this data to do a lot of simulations, and finally wrote all the results from memory back to file. I fixed this issue by adjusting the source code such that only the spot price history of one instance type is read at a time. Since there are no correlations between the data records of different instance types, it is possible to do the simulation (for the different task lengths and different checkpointing schemes) for every instance type/category separately.

  • A second thing I changed is the input file for this application. Previously it took one data.csv file as input, containing the spot price history data for every instance type (in different columns). This file did not contain any date information; every record contained the price for one minute of time. Now the application takes the CSV files from cloudexchange.org as input, which means there is a separate file for every region-os-instance combination, with two columns: the first contains the date, the second the corresponding spot price for that instance at that time (a minimal sketch of reading such a file follows below). Also, to use up-to-date spot price history information and to ensure that all my applications would remain usable should cloudexchange ever cease operations, I made a link with the EC2 API. So, in a separate project (called AmazonSDK) I created an application that connects to the different Amazon EC2 endpoints and requests the spot price history of the corresponding region. This data is then processed by my application, and the output files are organised and formatted the same way as the ones that can be found on cloudexchange.org.
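A minimal sketch of such a per-file reader (the two-column layout is taken from the description above; the exact date format is an assumption):

    import java.io.*;
    import java.text.*;
    import java.util.*;

    // Sketch: read one cloudexchange-style CSV (one region-os-instance combination)
    // with rows of the form "date,price"; the date format below is an assumption.
    class SpotHistoryReader {
        static List<Object[]> read(File csv) throws IOException, ParseException {
            DateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            List<Object[]> records = new ArrayList<Object[]>();
            BufferedReader in = new BufferedReader(new FileReader(csv));
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    String[] cols = line.split(",");
                    records.add(new Object[] { fmt.parse(cols[0]), Double.parseDouble(cols[1]) });
                }
            } finally {
                in.close();
            }
            return records;
        }
    }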
Both projects can be checked out from the SVN repository, but backups can be found here:
  • The Decision Model project here.
  • The AmazonSDK project here.

Wednesday, November 17, 2010

OVMP Software on Neos Solver

I succeeded in running the OVMP software I received (namely the GAMS model StoIP_norm1.gms) on a free solver, by downloading the Neos submission tool from this location (command to run it: java -cp NeosClien.jar:apache/xmlrpc-1.2-b1.jar NeosClient) and using the following parameters:
  • Server host: neos.mcs.anl.gov
  • Server port: 3332
  • Solver: Mixed integer nonlinearly constrained optimization => AlphaECP:GAMS
The results the solver returned can be found in this pdf.

GAMS versus AMPL: it looks like a free AMPL-like implementation is available as well, called MathProg. The syntax of AMPL is more natural and easier to read; it more closely resembles the algebraic formulations. This is my impression after reading the first couple of chapters of the "AMPL: A Modeling Language for Mathematical Programming" book by Robert Fourer, David M. Gay and Brian W. Kernighan. Since the GAMS model seems to be pretty easy to understand, I see no problem in translating it to AMPL. I have to admit that before stumbling upon these modeling systems during my thesis research I had no idea they existed, but now I see their power for optimization problems.

Tuesday, November 16, 2010

Decision Model Java Implementation

I made some small changes to the Java port of the 'Decision Model ...' paper software and added Javadoc comments to the source code, which can be checked out from the SVN repository. This software needs a lot of memory because it first reads a few hundred thousand records from the input CSV file into memory, then uses them to do a lot of simulations, and finally writes all the results from memory back to file (and this process starts all over for every task length). I think that, to boost the performance, it would not be a bad idea to take only the spot price history of one instance type as input and do the simulation for one category/instance type at a time. This would make almost all arrays used in the program one dimension smaller, and it's not hard to then write something around it that runs the program a couple of times with different input for the different instance types. Also, the program still takes its own input file, but this can easily be changed by implementing a data input reader that takes the cloudexchange CSV files as input. A backup of the source code of this program can be downloaded here.

Amazon EC2 News (ctd)

On November 15th Amazon once again had an EC2-related announcement to make: a new instance type, called "Cluster GPU Instance". The incentive to launch this instance type according to Amazon: "GPUs are increasingly being used to accelerate the performance of many general purpose computing problems. However, for many organizations, GPU processing has been out of reach due to the unique infrastructural challenges and high cost of the technology". For the moment this instance type is only available in the US East (N. Virginia) region (with Unix/Linux) as on-demand and reserved instances. The on-demand version is priced at 2.10 dollars per hour, while the reserved instance has a 1-year one-time fee of 5630 dollars and an hourly rate of 0.74 dollars (a quick break-even calculation follows after the reading list below). Some further reading can be done at the following locations:
  • This article on Werner Vogels blog gives a good overview of the incredible power of these instances (over a TeraFlop per instance).
  • Amazon makes it possible for anyone to use a supercomputer by offering it on-demand, but is its performance the same as with in-house hardware? According to these benchmarks it seems to be.
  • Nvidia explains its architecture and what makes GPU computing so attractive here.
  • In the official press release some applications that could benefit from this instance type were already mentioned: medical imaging visualization software, financial data analysis and simulation, rendering of sophisticated CGI,...
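Using the prices above, a quick break-even check: a reserved Cluster GPU instance is cheaper than the on-demand one as soon as 5630 + 0.74·h < 2.10·h, i.e. h > 5630 / 1.36 ≈ 4140 hours, so roughly from 47 percent utilization of the year onwards.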

Monday, November 15, 2010

Prototype Broker for Resource Allocation

I posted earlier what I was doing during the long weekend:
"I'm working on a prototype (A Broker for Cost-efficient QoS aware resource allocation in EC2) by combining the tools and knowledge I already acquired during the last couple of weeks."

Except for the fact that I keep feeling sick (for a couple of weeks already), everything went okay. The only open issue is the actual scheduling of the different tasks in time; I wonder if I can use some existing deadline scheduler for this. The prototype (and it literally is one: undocumented, overlapping model and view, ...) now handles everything for reserved and on-demand instances correctly. The next step (after refactoring and implementing a scheduling algorithm) will be to start adding spot instances. I created two programs:
  1. A TaskGenerator that takes some program arguments specifying the task as input and creates a CSV file that describes the task and a corresponding CSV file describing the randomly generated workload of the task.
  2. The actual EC2 broker prototype, which takes the output of the previous program and some CSV files with the EC2 pricing details as input. I chose to create a Swing GUI that draws Gantt charts containing the scheduled tasks. I also created an overview of the corresponding costs. Some screenshots can be found below.

Overview of the scheduled tasks

Every task has a detailed tooltip

Cost overview

The source code of this prototype can be checked out from the SVN repository (EC2Broker project) or an archive can be downloaded right here.

Thursday, November 11, 2010

What's next?

I'm working on a prototype (A Broker for Cost-efficient QoS aware resource allocation in EC2) by combining the tools and knowledge I already acquired during the last couple of weeks. An update will follow during the weekend.
I also started reading 'AMPL: A Modeling Language for Mathematical Programming', because we decided it would be a good idea for me to learn a bit about solvers.

Thursday, November 4, 2010

Analysis EC2 Spot Pricing

I made a little document that gives an overview of my findings about the output graphs and statistical values of my spot price history analysis tool. The document also states how these findings show up in the output values and can thus be used to make the EC2 capacity planner more intelligent. Download the PDF file here.

Tuesday, November 2, 2010

Amazon EC2 News Links

The Amazon Web Services are always evolving, with new instance types being introduced in EC2, pricing models being changed, ... Over the last couple of weeks some announcements were made:
  • A Free Usage Tier for AWS is introduced, which means that since 1 November 2010 new customers get a certain amount of free services, e.g. 750 hours of Micro Linux instances on EC2. More information can be found on the official website.
  • The Amazon S3 prices got an update: in general they decreased, a 1TB tier was added, and the 50-100TB tier was removed. The new prices can be consulted on the official website. To conclude this little news post I quote TechCrunch: "Amazon is continuing to drive down the developer costs for storing existing data, emphasizing the advantages of using cloud computing versus a fixed hard drive for storage."
An updated version of my Excel worksheet can be downloaded here.

Monday, November 1, 2010

Comparison Instance Pricing in Regions

I updated the Excel worksheet I created to analyse the price differences of the EC2 instances across the different regions. An updated version can be downloaded here; in this spreadsheet some corrections are made and all previously done Excel research is combined into this one file.

I also created a normalised version of the instance price comparison; for the US East region this resulted in the following graph:

For the EU region we got the following graph:

What can be concluded from these graphs is:
  • The Cluster Compute Instance is not available in the EU Region. (CloudExchange.org does not provide spot price history for Micro Instances either)
  • In US East the reserved prices (assuming they are bought for a year) are about 65 percent of the on-demand prices, while in the EU region this is about 70 percent.
  • Spot prices lie just below 40 percent of on-demand prices in the US East region; in the EU region this is just above 40 percent.
  • There is one remarkable price: Standard Large Spot Instances are relatively more expensive in the US East region.

Sunday, October 31, 2010

OVMP Software

I've sent an e-mail to the authors (Sivadon Chaisiri, Bu-Sung Lee and Dusit Niyato) of the paper titled "Optimal Virtual Machine Placement across Multiple Cloud Providers" (the abstract can be found here) with the following question: "I was wondering whether the implementation of the OVMP algorithm has been made available somewhere? I would be interested in trying to understand your research better, by having a look at the software that accompanies it." I will update this blogpost whenever I get a response.

I got a reply with sample code of the OVMP algorithm as attachment, which can be downloaded here. It is the OVMP algorithm (written as a GAMS script, solvable on the NEOS Server or with CPLEX) and it deals with demand uncertainty described by a normal distribution. I tried to run it using the CPLEX solver, but since I do not own a GAMS license I got 'THE MODEL EXCEEDS THE DEMO LIMITS' as a result.

Statistical Analysis EC2 Spot Pricing Evolution

First of all I'm sorry for the lack of updates the last week (my sickness had something to do with it). But now I'm fully focusing on my thesis again.

The different ways of dividing up the spot price data that I'm looking at are:
  • Average per day for the last 2 months
  • Average per week
  • Average per day of the week
  • Average per hour of the day
  • Average for the day vs the night
After generating the required graphs (including box plots) for these scenarios (all output, by the way, can be downloaded here), I started having a look at how this data (or statistics extracted from it) could be used to perform intelligent capacity planning.

The statistical values I store for every scenario (and this for each os-region-instance type combination possible) are:
  • Kurtosis: "it is a measure of the 'peakedness' of the probability distribution of a real-valued random variable. Higher kurtosis means more of the variance is the result of infrequent extreme deviations, as opposed to frequent modestly sized deviations."
  • Skewness: "it is a measure of the asymmetry of the probability distribution of a real-valued random variable. The skewness value can be positive or negative, or even undefined. Qualitatively, a negative skew indicates that the tail on the left side of the probability density function is longer than the right side and the bulk of the values (including the median) lie to the right of the mean. A positive skew indicates that the tail on the right side is longer than the left side and the bulk of the values lie to the left of the mean. A zero value indicates that the values are relatively evenly distributed on both sides of the mean, typically but not necessarily implying a symmetric distribution."
  • Arithmetic mean: "it is often referred to as simply the mean or average when the context is clear, is a method to derive the central tendency of a sample space."
  • Geometric mean: "it indicates the central tendency or typical value of a set of numbers. It is similar to the arithmetic mean, which is what most people think of with the word "average", except that the numbers are multiplied and then the nth root (where n is the count of numbers in the set) of the resulting product is taken."
  • Number of Values: "it is an indication of the number of price changes."
  • Maximum: "it is the greatest value in the set."
  • Minimum: "it is the smallest value in the set."
  • Percentile: "it is the value of a variable below which a certain percent of observations fall. The 25th percentile is also known as the first quartile (Q1); the 50th percentile as the median or second quartile (Q2); the 75th percentile as the third quartile (Q3)."
  • Variance: "it is used as one of several descriptors of a distribution. It describes how far values lie from the mean."
  • Standard deviation: "it is a widely used measurement of variability or diversity used in statistics and probability theory. It shows how much variation or 'dispersion' there is from the 'average' (mean, or expected/budgeted value). A low standard deviation indicates that the data points tend to be very close to the mean, whereas high standard deviation indicates that the data is spread out over a large range of values."
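All of these values can be computed in one pass over the prices of a scenario. A sketch in Java, assuming Apache Commons Math is used (that library choice is my assumption, not necessarily what SpotPricing does):

    import org.apache.commons.math.stat.descriptive.DescriptiveStatistics;

    // Sketch: compute the listed statistics for one os-region-instance scenario.
    static void printStats(double[] prices) {
        DescriptiveStatistics stats = new DescriptiveStatistics();
        for (double p : prices) stats.addValue(p);
        System.out.printf("kurtosis=%f skewness=%f mean=%f geo-mean=%f n=%d%n",
                stats.getKurtosis(), stats.getSkewness(), stats.getMean(),
                stats.getGeometricMean(), stats.getN());
        System.out.printf("max=%f min=%f Q1=%f median=%f Q3=%f var=%f stddev=%f%n",
                stats.getMax(), stats.getMin(), stats.getPercentile(25),
                stats.getPercentile(50), stats.getPercentile(75),
                stats.getVariance(), stats.getStandardDeviation());
    }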
I'll explain all my findings in the document I'm working on, which summarizes the research I've done and explains how I intend to extract information from it that can help to build an intelligent capacity planner. The project that I used to generate the graphs and statistical output is called 'SpotPricing' and can be found at my SVN location or here.

Monday, October 25, 2010

Port Spotmodel to Java

This weekend I was feeling a bit sick, so I couldn't do that much. But I was in a programming mood and decided to port the Spotmodel software that accompanied the 'Decision Model for Cloud Computing under SLA Constraints' paper from C to Java. The original source code can be found here and the newly created Java project here. The code is also available through SVN, for which I'll mail you the credentials. Note that the code should still undergo some design changes (for the moment it needs a lot of memory, for example) and Javadoc comments should be added.

Thursday, October 21, 2010

On-Demand vs Spot EC2 Instances

I searched the web to find some differences between spot and on-demand instances, and the issues that arise when making a choice between the two:
  • Found on this website: wouldn't it be cheapest to choose only spot instances and set a bid price equal to the on-demand price? In theory, the pricing should then never be worse than what is paid currently and would most likely be much better. The problem with this technique: all your instances may get terminated at the same time when the price rises above the on-demand price (which has already happened).

  • Found in this article "An introduction to Spot Instances" by Amazon: it is a tutorial that tells you step-by-step how to use spot instances through the AWS Management Console. It states as well for what kind of applications (with concrete examples) spot instances are useful:
    Spot Instances are well suited to a number of different kinds of applications. In general, they can save you money if your application doesn’t depend on instances being started immediately, can start and perform useful work without manual intervention, and can resume work after being interrupted.
    Finally, there are a number of best practices that are good to keep in mind when running applications on Spot Instances in order to minimize the potential impact of your instance being interrupted: save your work frequently (add checkpoints / split up your work), test your application, minimize group instance launches, track when spot instances start and stop, and access large pools of compute capacity.
  • Found on this website: "I think this means that over time people will realize that spot instances are feasible for a wide variety of workloads. There will be a psychological barrier for a while, but it will be overcome eventually." Why not just bid the on-demand price? "The fact that Amazon controls the spot price, still doesn't change the fact that unless there is a capacity problem with on-demand, people don't have any reason for bidding more than the on-demand price." "If everyone adopted your idea and bid at the on-demand price, that would invariably drive the spot pricing up to equal the on-demand... and then what have we gained?" "AWS spot price *may* be determined by supply and demand, but doesn't have to be. This is because there is only one seller."
I also read the following paper: "Optimal Virtual Machine Placement across Multiple Cloud Providers" by Chaisiri S., Lee B. and Niyato D. (I have to admit I did not yet fully understand the mathematics behind the presented model):
ABSTRACT Cloud computing provides users an efficient way to dynamically allocate computing resources to meet demands. Cloud providers can offer users two payment plans, i.e., reservation and on-demand plans for resource provisioning. Price of resources in reservation plan is generally cheaper than that in on-demand plan. However, since the reservation plan has to be acquired in advance, it may not fully meet future demands in which the on-demand plan can be used to guarantee the availability to the user. In this paper, we propose an optimal virtual machine placement (OVMP) algorithm. This algorithm can minimize the cost spending in each plan for hosting virtual machines in a multiple cloud provider environment under future demand and price uncertainty. OVMP algorithm makes a decision based on the optimal solution of stochastic integer programming (SIP) to rent resources from cloud providers. The performance of OVMP algorithm is evaluated by numerical studies and simulation. The results clearly show that the proposed OVMP algorithm can minimize users’ budgets. This algorithm can be applied to provision resources in emerging cloud computing environments.

Wednesday, October 20, 2010

Decision Model for Cloud Computing under SLA Constraints

Today I had a look at the software that comes with the "Decision Model for Cloud Computing under SLA Constraints" paper by Andrzejak A., Kondo D. and Yi S. The implementation of the presented model can be found on the project website.

The paper describes its contribution as follows:
Our main contribution is a probabilistic model that can be used to answer the question of how to bid given SLA constraints. A broker can easily apply this model to present automatically to the user a bid (or set of bids) that will meet reliability and performance requirements. This model is particularly suited for Cloud Computing as it is tailored for environments where resource pricing and reliability vary significantly and dynamically, and where the number of resources allocated initially is flexible and is almost unbounded. We demonstrate the utility of this model with simulation experiments driven by real price traces of Amazon Spot Instances, and workloads based on real applications.
So I had a look at the source file: they implemented the simulation process in a single C file. It took me a while to completely understand the code; especially the implementation of the actual simulation in the functions simulateOptimalCkpt() and simulateHourCkpt() was pretty hard. But now I'm sure this code can be used to develop a broker application (and it can easily be ported to another language). Note that other checkpointing schemes can easily be added, since the required functions can be found in the source code that comes with the "Reducing Costs of Spot Instances via Checkpointing in the Amazon Elastic Compute Cloud" paper on this website.

The following quote from the paper describes how the output of the program can be used to make an intelligent (task is done before the given deadline and within the foreseen budget) bid for the spot price:
To find the optimal instance type and bid price, we compute ET(cdead) and M(cB) and check the feasibility (as stated in section 3D) for all relevant combinations of both parameters. As these computations are basically “look-ups” in tables of previously computed distributions, the processing effort is negligible. Among the feasible cases, we select the one with the smallest M(cB); if no feasible cases exist, the job cannot be performed under the desired constraints.
Their simulations resulted in some interesting findings, summarized in the results section of the paper (section 4F). Note that constraints on other random variables than ET and M can be introduced as well.

How do I see the broker application at the moment? It takes a collection of workloads as input, and some properties and constraints should be provided for each workload. For each workload it should first be determined on what kind of instance it should be executed (in a first iteration of the software perhaps with simple CPU, memory, ... usage thresholds). Then a division between reserved and other instances should be made (based on the on-demand price, since the spot price fluctuates too much). Finally, the algorithms provided in the paper can be used to determine whether the constraints can be met by using spot instances; if so, the bid price and checkpointing scheme for the spot instances are determined, and otherwise on-demand instances can be used. I'm thinking the broker should be a Java webservice that periodically downloads the spot price history and then reruns the simulations to update its data.

Monday, October 18, 2010

What's next?

At the moment I'm making a comparison between spot and on-demand instances: looking for some interesting facts/sources on the web. Afterwards I'll have a look again at the "Decision Model for Cloud Computing under SLA Constraints" paper, to make a decision about how feasible it is to implement the presented model.

Spot Instances in Different Regions Comparison

After calculating the average spot instance price, I made an overview of the spot, on-demand and reserved instance prices for all the different geographical regions. The prices for storage in these regions are also included in the overview, which can be found here. We notice that for all instances except the standard large Linux instance, the US East region is the cheapest one. For storage, only the US West region is a bit more expensive than the other regions. For data transfer, on the other hand, the Asia region is the most expensive one. For every region I made a graph (they can be found here) that plots all the different prices; an example is given here:


Afterwards I had a look at the optimal division between reserved and spot instances, but without taking into account the further implications of using spot instances (such as the need for checkpointing, ...). Using spot instances seemed to be cheaper than using reserved instances, so that the optimal division is 100 percent spot instances.

During the thesis meeting last week another question was raised: is it cheaper to under- or to overestimate the number of required reserved instances when aiming for an optimal division? While trying some different workloads in the earlier created Excel sheet, I found that this (as one would expect) depends on the workload. I tried to find a formula that expresses whether for a certain workload it would be better to take one reserved instance fewer or one more. Of course this is a bit contradictory, since you could determine the optimal amount of reserved instances if the workload were known. But maybe the tipping point found this way could tell us something; my findings can be found here.

Saturday, October 16, 2010

Amazon Spot Instance Pricing [CloudExchange.org]

After getting some advice, I made some new graphs from the cloudexchange input data that:
  • plot the average spot price for each day of the week (for every instance type)
  • plot the average spot price during each hour of the day (for every instance type)
A first example of this new output for a us-east-1.linux.c1.xlarge spot instance:





And as a second example the output for a us-east-1.linux.m1.xlarge spot instance:





We notice that some spot instances have a rather fluctuating price during the day and over the days of the week, while others do not show this behavior. The updated version of the software can be found here, and all the generated graphs can be downloaded from this location.

Afterwards I made a little program that takes the CSV files found on cloudexchange.org as input and calculates the average spot instance price between two given dates for each instance type. This program can be downloaded here. The output is written to a new CSV file; for the Linux instances a PDF version of this file can be found here.
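One subtlety in such a calculation is that the cloudexchange records are price changes, not regular samples, so each price should be weighted by how long it was in effect. A minimal sketch of that computation (my own version, not necessarily how the linked program works):

    // Sketch: time-weighted average spot price between 'from' and 'to' (millis).
    // times/prices hold the sorted price-change records; the record in effect
    // at 'from' is assumed to be included so the whole interval is covered.
    static double averagePrice(long[] times, double[] prices, long from, long to) {
        double weighted = 0;
        for (int i = 0; i < times.length; i++) {
            long start = Math.max(times[i], from);
            long end = Math.min(i + 1 < times.length ? times[i + 1] : to, to);
            if (end > start) weighted += prices[i] * (end - start);
        }
        return weighted / (to - from);
    }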

Thursday, October 14, 2010

Region Comparison

I had a look at the pricing of EC2 instances for the different geographical regions. Of course not only the monetary cost should be considered when running an application in the cloud, but also the latency caused by the servers in these different locations (though you can think of applications that do not care). The prices for the different locations can be found on Amazon's website and are presented by me here. A first remark is that only the US East region offers cluster compute instances. I investigated two different aspects of the pricing in the different regions:
  • What is the cheapest location to run a certain instance? The results can be found here. We notice that for Reserved instances US - N. Virginia is always the cheapest solution. For on-demand instances it is as well, but for Windows instances the EU and Asia region offer the same price.

  • How does the price of reserved instances vs. on-demand instances compare across these different locations? The results can be found here. We notice that, except for Micro instances, US East always offers the most advantageous reserved instances. For Linux instances we see no difference between the regions (both for the 1-year and 3-year terms). For Windows instances, on the other hand, we notice that reserved instances become advantageous after a shorter amount of time in the US regions. It is also remarkable that Micro instances are the only ones that become advantageous more quickly in all regions but US East.
Note that all of this can be found here (in an Excel sheet).

Monday, October 11, 2010

Reserved vs On-Demand Instances

On Amazon EC2 these terms have the following meaning:
  • Reserved instances:
    They give you the option to make a low, one-time payment for each instance you want to reserve and in turn receive a significant discount on the hourly usage charge for that instance.
  • On-Demand instances:
    They let you pay for compute capacity by the hour with no long-term commitments. This frees you from the costs and complexities of planning, purchasing, and maintaining hardware and transforms what are commonly large fixed costs into much smaller variable costs.
I had a look today at the problem of finding the optimal division between Reserved and On-Demand instances, the optimal solution being the one with the smallest total cost. My findings (in an Excel file) can be found here, and I put them in two PDF files as well (a small worked example follows after the links):
  • General Pricing Information: download here
  • Practical Example: download here
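As a small worked example (with illustrative prices, not necessarily the exact ones from the sheet): suppose an instance costs $0.085 per hour on-demand, or a one-time fee of $227.50 plus $0.03 per hour when reserved for one year. Reserving is then cheaper as soon as 227.50 + 0.03·h < 0.085·h, i.e. h > 227.50 / 0.055 ≈ 4136 hours, roughly 47 percent of the 8760 hours in a year.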

Sunday, October 10, 2010

Amazon Spot Instances Pricing [CloudExchange.org]

On cloudexchange.org we find the price history of the different Amazon spot instances; this data is exportable as a CSV file.
I made a little Java program (download here) that takes this file as input and generates the following graphs:
  • one containing the average price for the spot instance during each day

  • one with a line representing the average price during the day and a line for the average price during the night (0:00-8:00AM)
On the following graph we notice that for the corresponding spot instance (us-east-1.linux.m2.4xlarge) the average day price does not have peaks during the weekends (2 units to the right of every vertical grid line), which seems logical.


For the same instance I then divided this graph into day and night parts, but it seems there is no remarkable difference in price.



Papers

I read these 3 papers during the weekend:
  1. "Exploiting Non-Dedicated Resources for Cloud Computing" by Andrzejak A., Kondo D. and Anderson D.P.:
    ABSTRACT Popular web services and applications such as Google Apps, DropBox, and Go.Pc introduce a wasteful imbalance of processing resources. Each host operated by a provider serves hundreds to thousands of users, treating their PCs as thin clients. Tapping the processing, storage and networking capacities of these non-dedicated resources promises to reduce the size of required hardware basis significantly. Consequently, it presents a noteworthy opportunity for service providers and operators of cloud computing infrastructures. We investigate how a mixture of dedicated (and so highly available) hosts and non-dedicated (and so highly volatile) hosts can be used to provision a processing tier of a large-scale web service. We discuss an operational model which guarantees long-term availability despite of host churn, and study multiple aspects necessary to implement it. These include: ranking of non-dedicated hosts according to their long-term availability behavior, short-term availability modeling of these hosts, and simulation of migration and group availability levels using real-world availability data from 10,000 non-dedicated hosts. We also study the tradeoff between a larger share of dedicated hosts vs. higher migration rate in terms of costs and SLA objectives. This yields an optimization approach where a service provider can find a suitable balance between costs and service quality. The experimental results show that it is possible to achieve a wide spectrum of such modes, ranging from 3.6 USD/hour to 5 USD/hour for a group of at least 50 hosts available with probability greater than 0.90.
  2. "Reducing Costs of Spot Instances via Checkpointing in the Amazon Elastic Compute Cloud" by Andrzejak A., Kondo D. and Yi S.:
    ABSTRACT Recently introduced spot instances in the Amazon Elastic Compute Cloud (EC2) offer lower resource costs in exchange for reduced reliability; these instances can be revoked abruptly due to price and demand fluctuations. Mechanisms and tools that deal with the cost-reliability trade-offs under this schema are of great value for users seeking to lessen their costs while maintaining high reliability. We study how one such a mechanism, namely check pointing, can be used to minimize the cost and volatility of resource provisioning. Based on the real price history of EC2 spot instances, we compare several adaptive check pointing schemes in terms of monetary costs and improvement of job completion times. Trace-based simulations show that our approach can reduce significantly both price and the task completion times.
  3. "Decision Model for Cloud Computing under SLA Constraints" by Andrzejak A., Kondo D. and Yi S.:
    ABSTRACT With the recent introduction of Spot Instances in the Amazon Elastic Compute Cloud (EC2), users can bid for resources and thus control the balance of reliability versus monetary costs. A critical challenge is to determine bid prices that minimize monetary costs for a user while meeting Service Level Agreement (SLA) constraints (for example, sufficient resource availability to complete a computation within a desired deadline). We propose a probabilistic model for the optimization of monetary costs, performance, and reliability, given user and application requirements and dynamic conditions. Using real instance price traces and workload models, we evaluate our model and demonstrate how users should bid optimally on Spot Instances to reach different objectives with desired levels of confidence.
It took a while (I read some parts more than 3 times) to understand everything the papers were talking about, but now I think I get what they are saying. And the presented models will surely be a good extension to the model I'm creating.

Wednesday, October 6, 2010

Cloud Computing General [B]

During the first week I tried to get familiar with the cloud computing principles again, by doing the following:
  • Reading "The Big Switch" by Nicholas Carr. I think the following quote (from Booklist) describes the book rather well:
    "Carr examines the future of the Internet, which he says may one day completely replace the desktop PC as all computing services are delivered over the Net as a utility, the Internet morphing into one giant 'World Wide Computer.' ... Carr warns that the downside of the World Wide Computer may mean further concentration of wealth for the few, and the loss of jobs, privacy, and the depth of our culture." The book is easy to read and gives some good insights and background information about the shift to utility/cloud computing.

  • Following the one-hour webinar provided by VMware, "The New Era of IT: the VMware Cloud Application Platform" (http://goo.gl/WBsT), which was not very technical but talked about VMware's cloud computing strategy. They want their developers to keep using the tools they are used to for building Java applications: the Spring framework (by the way, I got a chance to get familiar with this framework during my summer job at Technicolor). These apps make use of the VMware vFabric platform. The talk also featured a sample hotel room booking application, which was used to illustrate some of the cloud computing principles (e.g. scaling up/down). More information about the platform can be found here.

PS: I literally was in the clouds this week: I made a helicopter flight above Antwerp (http://twitpic.com/2u3x9z).

Thursday, September 30, 2010

Cloud Computing General [A]

This week I did a few general things to get in a cloud computing mood again:
  1. On Monday September 27th I attended the Microsoft conference 'A Journey to the Cloud' (http://msdn.microsoft.com/nl-be/ff955851.aspx), where I went to the following talks:

    • The Microsoft Cloud Continuum (Bart Vande Ghinste, Enterprise Architect for MS Belgium & Luxembourg): this was a good summary of what cloud computing stands for and why MS believes it will be big.

    • The Road to a Private Cloud Infrastructure (Eduardo Kassner, Enterprise Technical Architect for MS Corporation): this was less interesting, because the talk was very Microsoft oriented and the recorded demo videos were hard to follow for someone who has no Windows Azure experience.

    • An Architecture Lap around the Windows Azure Platform (Kurt Claeys, Technical Solution Specialist for MS EMEA): fun introduction to the Azure platform.

    • IaaS: Principles and Patterns for the Private Cloud (Alberto Boczar, Technical Architect for MS Corporation): the speaker was very good, which made his talk interesting. He presented a general overview of the principles and patterns of cloud computing.

  2. Started reading "The Big Switch: Rewiring the World from Edison to Google" by Nicholas Carr on my Kindle. (http://goo.gl/GBBu)

  3. I experimented a bit with Amazon EC2. I became a bit familiar with the AWS Management Console interface, ran an instance for the first time, connected to it through SSH, ...

Wednesday, September 29, 2010

Introduction

My final year at the University of Antwerp started, which means that I'll have to write a thesis this year.
My interest in cloud computing led to the choice of the ESP assignment titled "A Broker for Cost-efficient QoS aware resource allocation in EC2".
A description in Dutch can be found here.
I look forward to doing a lot of research on the topic this year.
I'll try to post on this blog what is going on from time to time.