Thesis Kurt Vermeersch: November 2010

Monday, November 22, 2010

Decision Model Java (Memory Fix + Link EC2 API)

I handled two problems that still existed with the Java port of the spot model software attached to the "Decision Model for Cloud Computing under SLA Constraints" paper.

The first problem I tackled was the software's high memory use. It first read several hundred thousand records from the input CSV file into memory, then ran a large number of simulations on that data, and finally wrote all the results from memory back to file. I fixed this by adjusting the source code so that only the spot price history of one instance type is read at a time. Since there are no correlations between the data records of different instance types, the simulation (for the different task lengths and different checkpointing schemes) can be run for every instance type/category separately.

The second thing I changed is the input file for this application. Previously it took a single data.csv file as input, containing the spot price history for every instance type in different columns. This file did not contain any date information; every record held the price for one minute of time. Now the application takes the CSV files from cloudexchange.org as input, which means there is a separate file for every region-os-instance combination, each with two columns: the first contains the date, the second the corresponding spot price for that instance at that time. Also, to have up-to-date spot price history and to ensure that all my applications remain usable should cloudexchange ever cease operations, I made a link with the EC2 API. In a separate project (called AmazonSDK) I created an application that connects to the different Amazon EC2 endpoints and requests the spot price history of the corresponding region. This data is then processed by my application, and the output files are organised and formatted the same way as the ones that can be found on cloudexchange.org.
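A minimal sketch of the streaming idea, not the actual Decision Model code: it reads one two-column file (date, price) line by line and keeps only running statistics in memory. The class name SpotPriceSummary, the comma separator and the sample date format are assumptions for illustration; the date column is treated as an opaque string because the exact cloudexchange.org format is not shown here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper, not part of the Decision Model project.
class SpotPriceSummary {
    long count;
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    double sum;

    double mean() {
        return count == 0 ? Double.NaN : sum / count;
    }

    // Streams "date,price" records one line at a time, so only the running
    // statistics are held in memory, never the complete price history.
    static SpotPriceSummary summarise(Reader source) throws IOException {
        SpotPriceSummary s = new SpotPriceSummary();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // ignore blank lines
                }
                int comma = line.lastIndexOf(',');
                if (comma < 0) {
                    continue; // not a two-column record
                }
                double price;
                try {
                    price = Double.parseDouble(line.substring(comma + 1).trim());
                } catch (NumberFormatException e) {
                    continue; // header row or malformed price
                }
                s.count++;
                s.sum += price;
                s.min = Math.min(s.min, price);
                s.max = Math.max(s.max, price);
            }
        }
        return s;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample records; a real run would pass a FileReader for
        // one region-os-instance file instead.
        String sample = "2010-11-20 10:00:00,0.031\n"
                + "2010-11-20 11:00:00,0.029\n"
                + "2010-11-20 12:00:00,0.033\n";
        SpotPriceSummary s = summarise(new StringReader(sample));
        System.out.println(s.count + " records, min " + s.min + ", max " + s.max);
    }
}
```

Calling summarise once per file keeps the memory footprint independent of the number of instance types, which is the point of processing one instance type at a time.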

Both projects can be checked out from the SVN repository, but backups can be found here:

The Decision Model project here.
The AmazonSDK project here.

Wednesday, November 17, 2010

OVMP Software on Neos Solver

I succeeded in running the OVMP software I received (a GAMS model named StoIP_norm1.gms) on a free solver. I downloaded the Neos software from this location (command to run this submission tool: java -cp NeosClien.jar:apache/xmlrpc-1.2-b1.jar NeosClient) and used the following parameters:

Server host: neos.mcs.anl.gov
Server port: 3332
Solver: Mixed integer nonlinearly constrained optimization => AlphaECP:GAMS

The results the solver returned can be found in this pdf.

GAMS versus AMPL: it looks like a free solver is available as well, called MathProg. After reading the first couple of chapters of the book "AMPL: A Modeling Language for Mathematical Programming" by Robert Fourer, David M. Gay and Brian W. Kernighan, I find the syntax of AMPL more natural and easier to read, since it more closely resembles the algebraic formulation. Since the GAMS model seems to be pretty easy to understand, I see no problem in translating it to AMPL. I have to admit that before stumbling upon these solvers during my thesis research I had no idea they existed, but now I see how powerful they are for optimization problems.

Tuesday, November 16, 2010

Decision Model Java Implementation

I made some small changes to the Java port of the 'Decision Model ...' paper software and added Javadoc comments to the source code, which can be checked out from the SVN repository. This software needs a lot of memory because it first reads several hundred thousand records from the input CSV file into memory, then uses them to run a large number of simulations, and finally writes all the results from memory back to file. (Actually, this process starts all over again for every task length.) To boost performance, I think it would be a good idea to take only the spot price history of one instance type as input and run the simulation for one category/instance type at a time. This would make almost all arrays used in the program one dimension smaller. It would then not be hard to write a wrapper that runs the program several times, with different input for the different instance types. The program also still takes its own input file, but this can easily be changed by implementing an input reader that takes the cloudexchange CSV files as input. A backup of the source code of this program can be downloaded here.

Amazon EC2 News (ctd)

On November 15th Amazon once again had an EC2-related announcement to make: a new instance type, called "Cluster GPU Instance". The incentive to launch this instance type, according to Amazon: "GPUs are increasingly being used to accelerate the performance of many general purpose computing problems. However, for many organizations, GPU processing has been out of reach due to the unique infrastructural challenges and high cost of the technology". For the moment this instance type is only available in the US East (N. Virginia) region (with Unix/Linux) as on-demand and reserved instances. The on-demand version is priced at 2.10 dollars per hour, while the reserved instance has a 1-year fixed price of 5630 dollars and an hourly rate of 0.74 dollars. Some further reading can be found at the following locations:

This article on Werner Vogels' blog gives a good overview of the incredible power of these instances (over a TeraFlop per instance).
Amazon makes it possible for anyone to use a supercomputer by offering it on-demand, but is its performance the same as that of in-house hardware? It seems to be, according to these benchmarks.
Nvidia explains its architecture and what makes GPU computing so attractive here.
In the official press release some applications that could benefit from this instance type were already mentioned: medical imaging visualization software, financial data analysis and simulation, rendering of sophisticated CGI,...
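Using only the Cluster GPU prices quoted above, the point at which a 1-year reserved instance becomes cheaper than on-demand can be worked out with a small sketch. The class and method names are hypothetical, and the model is deliberately simple: it assumes the fixed fee is paid once and that usage is billed per hour at the quoted rates.

```java
// Hypothetical illustration based on the prices quoted in this post.
class GpuInstanceBreakEven {
    static final double ON_DEMAND_PER_HOUR = 2.10;        // dollars per hour
    static final double RESERVED_FIXED_ONE_YEAR = 5630.0; // one-time fee, dollars
    static final double RESERVED_PER_HOUR = 0.74;         // dollars per hour

    static double onDemandCost(double hours) {
        return ON_DEMAND_PER_HOUR * hours;
    }

    static double reservedCost(double hours) {
        return RESERVED_FIXED_ONE_YEAR + RESERVED_PER_HOUR * hours;
    }

    // Hours of use at which both options cost the same:
    // fixed + reservedRate * h = onDemandRate * h
    static double breakEvenHours() {
        return RESERVED_FIXED_ONE_YEAR / (ON_DEMAND_PER_HOUR - RESERVED_PER_HOUR);
    }

    public static void main(String[] args) {
        double h = breakEvenHours();
        System.out.printf("Break-even after about %.0f hours of use%n", h);
        System.out.printf("Full year: on-demand %.2f vs reserved %.2f dollars%n",
                onDemandCost(8760), reservedCost(8760));
    }
}
```

The break-even lies at roughly 4140 hours, a bit under half of the 8760 hours in a year, so under these assumptions the reserved instance only pays off for workloads that keep the instance busy for about half the year or more.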

Monday, November 15, 2010

Prototype Broker for Resource Allocation

I posted earlier what I was doing during the long weekend:

"I'm working on a prototype (A Broker for Cost-efficient QoS aware resource allocation in EC2) by combining the tools and knowledge I already acquired during the last couple of weeks"

Except for the fact that I keep feeling sick (for a couple of weeks already), everything went okay. The only open issue is the actual scheduling of the different tasks in time; I wonder if I can use an existing deadline scheduler for this. The prototype (it literally is one: undocumented, overlapping model and view, ...) now handles everything for reserved and on-demand instances correctly. The next step (after refactoring and implementing a scheduling algorithm) will be to start adding spot instances. I created two programs:

A TaskGenerator that takes some program arguments specifying the task as input and creates a CSV file that describes the task and a corresponding CSV file describing the randomly generated workload of the task.
The actual EC2 broker prototype which takes the output of the previous program and some CSV files with the EC2 pricing details as input. I chose to create a Swing GUI that creates Gantt charts containing the scheduled tasks. I also created an overview of the corresponding costs. Some screenshots can be found below.
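On the open scheduling question mentioned above: one well-known candidate is earliest deadline first (EDF), which simply runs tasks in order of increasing deadline. The sketch below is not what the EC2Broker prototype does; it is a minimal single-resource illustration, and the Task fields (name, lengthHours, deadlineHour) are assumptions rather than the columns of the real task CSV files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of earliest-deadline-first ordering on one resource.
class EdfSketch {
    static final class Task {
        final String name;
        final int lengthHours;   // assumed run time of the task
        final int deadlineHour;  // hour by which the task must be finished
        int startHour = -1;      // filled in by schedule()

        Task(String name, int lengthHours, int deadlineHour) {
            this.name = name;
            this.lengthHours = lengthHours;
            this.deadlineHour = deadlineHour;
        }

        int endHour() {
            return startHour + lengthHours;
        }

        boolean meetsDeadline() {
            return endHour() <= deadlineHour;
        }
    }

    // Sorts a copy of the tasks by deadline and runs them back to back
    // starting at hour 0. The input list itself is not reordered.
    static List<Task> schedule(List<Task> tasks) {
        List<Task> ordered = new ArrayList<>(tasks);
        ordered.sort(Comparator.comparingInt((Task t) -> t.deadlineHour));
        int clock = 0;
        for (Task t : ordered) {
            t.startHour = clock;
            clock += t.lengthHours;
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<Task> plan = schedule(Arrays.asList(
                new Task("A", 4, 10),
                new Task("B", 2, 3),
                new Task("C", 3, 12)));
        for (Task t : plan) {
            System.out.println(t.name + ": hours " + t.startHour + "-" + t.endHour()
                    + (t.meetsDeadline() ? "" : " (deadline missed)"));
        }
    }
}
```

A real broker would have to schedule across several instances and price models at once, so this only shows the ordering rule, not a complete allocation strategy.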

Overview of the scheduled tasks

Every task has a detailed tooltip

Cost overview

The source code of this prototype can be checked out from the SVN repository (EC2Broker project) or an archive can be downloaded right here.

Thursday, November 11, 2010

What's next?

I'm working on a prototype (A Broker for Cost-efficient QoS aware resource allocation in EC2) by combining the tools and knowledge I already acquired during the last couple of weeks. An update will follow during the weekend.
I also started reading 'AMPL: A Modeling Language for Mathematical Programming', because we decided that it would be a good idea for me personally to learn a bit about solvers.

Thursday, November 4, 2010

Analysis EC2 Spot Pricing

I made a short document that gives an overview of my findings about the output graphs and statistical values of my spot price history analysis tool. The document also describes how these findings show up in the output values, and thus how they can be used to make the EC2 capacity planner more intelligent. Download the pdf file here.

Tuesday, November 2, 2010

Amazon EC2 News Links

Amazon Web Services are always evolving, with new instance types being introduced in EC2, pricing models being changed, ... In the last couple of weeks some announcements were made:

A Free Usage Tier for AWS was introduced, which means that since 1 November 2010 new customers get a certain amount of free services, e.g. 750 hours of Micro Linux instances on EC2. More information can be found on the official website.
The Amazon S3 prices got an update: in general they decreased, a 1TB tier was added and the 50-100TB tier was removed. The new prices can be consulted on the official website. To conclude this little news post I quote TechCrunch: "Amazon is continuing to drive down the developer costs for storing existing data, emphasizing the advantages of using cloud computing versus a fixed hard drive for storage."

An updated version of my Excel worksheet can be downloaded here.

Monday, November 1, 2010

Comparison Instance Pricing in Regions

I updated the Excel worksheet I created to analyse the price differences of the EC2 instances across the different regions. An updated version can be downloaded here. In this spreadsheet some corrections were made, and all previously done Excel research is combined in this one file.

I also created a normalised version of the instance price comparison, for the US East region this resulted in this graph:

For the EU region we got the following graph:

What can be concluded from these graphs is:

The Cluster Compute Instance is not available in the EU Region. (CloudExchange.org does not provide spot price history for Micro Instances either)
In the US East region the reserved prices (assuming they are bought for one year) are about 65 percent of the on-demand prices, while in the EU region this is about 70 percent.
Spot prices lie just below 40 percent of on-demand prices in the US East region; in the EU region this is just above 40 percent.
There is one remarkable price: Standard Large Spot Instances are relatively more expensive in the US East region.
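The kind of normalisation behind these percentages can be sketched as follows. This is an assumption about the method, not a copy of the spreadsheet formulas: the one-year fixed fee of a reserved instance is spread over a full year of continuous use (8760 hours) and the result is divided by the on-demand price. The class and method names are hypothetical, and the example uses the Cluster GPU prices quoted in the 16 November post rather than the data behind the graphs.

```java
// Hypothetical sketch of normalising prices against the on-demand price.
class NormalisedPrice {
    static final int HOURS_PER_YEAR = 8760;

    // Share of the on-demand price paid for some other hourly price
    // (for example a spot price).
    static double normalise(double hourlyPrice, double onDemandPerHour) {
        return hourlyPrice / onDemandPerHour;
    }

    // Effective reserved price per hour when the fixed fee is amortised
    // over the hours actually used, relative to the on-demand price.
    static double reservedShare(double fixedFee, double reservedPerHour,
                                double onDemandPerHour, int hoursUsed) {
        double effectivePerHour = fixedFee / hoursUsed + reservedPerHour;
        return normalise(effectivePerHour, onDemandPerHour);
    }

    public static void main(String[] args) {
        // Cluster GPU Instance: 5630 dollars fixed, 0.74 per hour reserved,
        // 2.10 per hour on-demand, used for a full year.
        double share = reservedShare(5630.0, 0.74, 2.10, HOURS_PER_YEAR);
        System.out.printf("Reserved price is %.1f%% of on-demand%n", 100 * share);
    }
}
```

With full-year use the Cluster GPU reserved price comes out at about 66 percent of the on-demand price, in line with the roughly 65 percent seen for the other US East instance types. With lower utilisation the fixed fee weighs more heavily and the share rises.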
