<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Über Software &#187; Terracotta</title>
	<atom:link href="http://www.uebersoftware.com/category/terracotta/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.uebersoftware.com</link>
	<description>Opinions and thoughts on Software and Technology.</description>
	<lastBuildDate>Thu, 29 Apr 2010 06:04:30 +0000</lastBuildDate>
	
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Scaling Enterprise Applications and all that jazz: Terracotta, GigaSpaces, and Azul</title>
		<link>http://www.uebersoftware.com/2009/08/scaling-enterprise-applications-and-all-that-jazz-terracotta-gigaspaces-and-azul/</link>
		<comments>http://www.uebersoftware.com/2009/08/scaling-enterprise-applications-and-all-that-jazz-terracotta-gigaspaces-and-azul/#comments</comments>
		<pubDate>Sat, 22 Aug 2009 17:53:36 +0000</pubDate>
		<dc:creator>Ben</dc:creator>
				<category><![CDATA[Scalability]]></category>
		<category><![CDATA[Terracotta]]></category>
		<category><![CDATA[Azul]]></category>
		<category><![CDATA[enterprise applications]]></category>
		<category><![CDATA[GigaSpaces]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[scaling]]></category>

		<guid isPermaLink="false">http://www.uebersoftware.com/?p=69</guid>
		<description><![CDATA[I like scaling and the architectures that attempt to solve its challenges. Below I have summarized, in bullet points, three prominent players in this area, each solving scaling problems with a different architecture at a different level.

Terracotta 3.1

Clustering JVM using Network attached Memory
Only the field-level changes are sent over the network
Uses TCP to communicate within cluster
Open Source and recently [...]]]></description>
			<content:encoded><![CDATA[<p>I like scaling and the architectures that attempt to solve those issues. Below I tried to bullet point 3 prominent players in this area, all solving scaling problems with different architectures at different levels.</p>
<ul>
<li>Terracotta 3.1
<ul>
<li>Clusters JVMs using Network Attached Memory</li>
<li>Only the field-level changes are sent over the network</li>
<li>Uses TCP to communicate within cluster</li>
<li>Open Source and <a href="http://www.terracotta.org/web/display/orgsite/Terracotta+Acquires+Ehcache">recently acquired EHCache</a></li>
</ul>
</li>
<li>GigaSpaces
<ul>
<li>Cloud enabled Middleware Platform (PaaS)</li>
<li>&#8220;<a href="http://en.wikipedia.org/wiki/Space_based_architecture">Space Based Architecture</a>&#8221; &#8211; inspired by JavaSpaces</li>
<li>Partitioning &amp; co-location at its core; ultimate goal: a &#8220;share-nothing architecture&#8221; that eliminates the cost of copying</li>
</ul>
</li>
<li>Azul Systems
<ul>
<li>Proxy JVM with transparent redeploy to Azul Hardware</li>
<li>Integrated hardware, kernel and JVM Design</li>
<li>Builds its own multicore systems running its own chips</li>
<li>Systems have a high number of cores</li>
<li>Optimistic Thread Concurrency &amp; Pauseless Garbage Collection Technology</li>
</ul>
</li>
</ul>
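<p>The partitioning and co-location idea behind GigaSpaces can be sketched in a few lines of Java. This is not the GigaSpaces API: the <code>PartitionRouter</code> class, the <code>Customer</code>/<code>Order</code> types and the choice of <code>customerId</code> as the routing key are hypothetical. The point is only that objects sharing a routing key land in the same partition, so work on one customer never needs data copied from another partition.</p>

```java
import java.util.Objects;

// Hypothetical sketch of hash-based partition routing with co-location.
// Not GigaSpaces code: class names and the routing scheme are illustrative assumptions.
class PartitionRouter {
    private final int partitions;

    PartitionRouter(int partitions) {
        if (partitions <= 0) throw new IllegalArgumentException("partitions must be > 0");
        this.partitions = partitions;
    }

    // Map a routing key to a partition index in [0, partitions).
    // floorMod keeps the result non-negative even for negative hash codes.
    int partitionFor(Object routingKey) {
        return Math.floorMod(Objects.hashCode(routingKey), partitions);
    }

    // Two kinds of objects that share the same routing key (customerId),
    // so a customer and all of its orders are co-located in one partition.
    static final class Customer {
        final long customerId;
        Customer(long customerId) { this.customerId = customerId; }
    }

    static final class Order {
        final long orderId;
        final long customerId; // routing key: same value as the owning customer
        Order(long orderId, long customerId) { this.orderId = orderId; this.customerId = customerId; }
    }

    int partitionOf(Customer c) { return partitionFor(c.customerId); }
    int partitionOf(Order o) { return partitionFor(o.customerId); }

    public static void main(String[] args) {
        PartitionRouter router = new PartitionRouter(4);
        Customer alice = new Customer(42L);
        Order order = new Order(1001L, 42L);
        // Same routing key, same partition: no cross-partition copy needed.
        System.out.println(router.partitionOf(alice) == router.partitionOf(order)); // prints true
    }
}
```

<p>Real products add replication, rebalancing and smarter key-to-partition mappings; plain modulo hashing is just the simplest way to show the principle.</p>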
<p><strong>Terracotta @ JavaOne</strong><br />
<a href="http://www.uebersoftware.com/2009/08/scaling-enterprise-applications-and-all-that-jazz-terracotta-gigaspaces-and-azul/"><em>Click here to view the embedded video.</em></a></p>
<p><strong>GigaSpaces high-level overview:</strong><br />
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="480" height="347" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="src" value="http://blip.tv/play/AYGVwB8C" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="480" height="347" src="http://blip.tv/play/AYGVwB8C" allowfullscreen="true"></embed></object></p>
<p><strong>Azul &#8211; very technical Google TechTalk</strong><br />
<a href="http://www.uebersoftware.com/2009/08/scaling-enterprise-applications-and-all-that-jazz-terracotta-gigaspaces-and-azul/"><em>Click here to view the embedded video.</em></a></p>
<p>There are also many new developments in the cloud space. What is your favorite scaling technology?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.uebersoftware.com/2009/08/scaling-enterprise-applications-and-all-that-jazz-terracotta-gigaspaces-and-azul/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Internet-scale Java Web Applications</title>
		<link>http://www.uebersoftware.com/2009/07/internet-scale-java-web-applications/</link>
		<comments>http://www.uebersoftware.com/2009/07/internet-scale-java-web-applications/#comments</comments>
		<pubDate>Wed, 22 Jul 2009 15:30:00 +0000</pubDate>
		<dc:creator>ben</dc:creator>
				<category><![CDATA[EC2]]></category>
		<category><![CDATA[Scalability]]></category>
		<category><![CDATA[Terracotta]]></category>
		<category><![CDATA[Web2.0]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[scaling]]></category>
		<category><![CDATA[webapps]]></category>

		<guid isPermaLink="false">http://www.uebersoftware.com/2009/07/internet-scale-java-web-applications/</guid>
		<description><![CDATA[I am currently working on two application architectures. One is a PHP Facebook app (IFrame) with PostgreSQL as the backend; the other is a Glassfish/Jersey/Toplink/PostgreSQL stack.
When reading the glowing Web 2.0 tech stories in the news and on sites like highscalability, it seems like just about everyone requiring an &#8220;internet-scale&#8221; architecture is using MySQL, many are [...]]]></description>
			<content:encoded><![CDATA[<p>I am currently working on 2 application architectures. One is a PHP Facebook app (IFrame) with Postgresql in the backend, the other is a Glassfish/Jersey/Toplink/PostgreSql stack.</p>
<p>When reading the glowing Web 2.0 tech stories in the news and on sites like <a href="http://www.highscalability.com/">highscalability</a>, it seems like just about everyone requiring an &#8220;internet-scale&#8221; architecture is using MySQL. Many are using stacks along the lines of {Python, Django | PHP, Zend}{memcached/MySQL} and take advantage of the new offerings from <a href="http://aws.amazon.com/ec2/">Amazon</a> or <a href="http://code.google.com/appengine/">Google</a> to push their infrastructure to the cloud (<a href="http://www.microsoft.com/azure">Microsoft Azure</a> is another big one, <a href="http://www.sun.com/solutions/cloudcomputing">Sun</a> has something cooking, and there are many smaller cloud service providers).</p>
<p>I had also been planning to move my apps to EC2 in the near future, thinking of EC2 simply as a vServer with more or less power on demand.</p>
<p>However, the more I thought about it, the less sure I was whether the architecture I am using is even ready for the cloud, and ready to scale.</p>
<p>According to the advocates of BigTable, traditional RDBMSs are not suited for such endeavours. Nowadays all the hype seems to be about simple data structures, like hash tables, and doing the joins in the application layer. Another approach is sharding: dividing the database into shards that share the same schema but each hold a different subset of the data, then directing, e.g., user group x to shard z, so that its queries mostly need data from that shard only.</p>
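<p>&#8220;Doing the joins in the application layer&#8221; usually means a hash join in your own code: index one side in a hash table, then probe it with the other. A minimal Java sketch, assuming a hypothetical <code>User</code>/<code>Post</code> data model fetched from a store without SQL joins:</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of an application-layer join using a hash table, as you would need
// when the data store offers only key-based lookups. The data model is hypothetical.
class AppLayerJoin {
    static final class User {
        final long id;
        final String name;
        User(long id, String name) { this.id = id; this.name = name; }
    }

    static final class Post {
        final long authorId;
        final String title;
        Post(long authorId, String title) { this.authorId = authorId; this.title = title; }
    }

    // Build phase: index users by id. Probe phase: look up each post's author.
    // Posts whose author is missing are skipped (inner-join semantics).
    static List<String> joinPostsWithAuthors(List<User> users, List<Post> posts) {
        Map<Long, User> usersById = new HashMap<>();
        for (User u : users) usersById.put(u.id, u);

        List<String> rows = new ArrayList<>();
        for (Post p : posts) {
            User author = usersById.get(p.authorId);
            if (author != null) rows.add(author.name + ": " + p.title);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<User> users = List.of(new User(1, "ben"), new User(2, "ann"));
        List<Post> posts = List.of(new Post(1, "Scaling"), new Post(3, "Orphan"), new Post(2, "Caching"));
        System.out.println(joinPostsWithAuthors(users, posts)); // [ben: Scaling, ann: Caching]
    }
}
```

<p>The trade-off is that the application now owns work the database used to do, including keeping the indexed side small enough to fit in memory.</p>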
<p>Where do JEE technologies fit into these high-scalability scenarios, and why not Postgres? Is the transactional database a scalability killer?</p>
<p>Let&#8217;s examine the concrete questions for my two use cases:</p>
<p><span style="font-weight: bold;">a) MySQL vs. Postgres</span><br /><a name="pooling"></a> <br />A traditional PHP application runs inside Apache with mod_php using process forking: each request is handled by one of several pre-forked Apache worker processes, and the PHP script starts from a clean slate every time. This is very different from the concept of a container, and it implies that, as far as data caching goes, PHP offers nothing out of the box. It is then not so astonishing that PHP does not support connection pooling: it simply would not make sense. Quoting Rasmus Lerdorf, the creator of PHP, from a 2002 <a href="http://www.sitepoint.com/article/phps-creator-rasmus-lerdorf/4/">interview</a>:</p>
<p><span style="font-style: italic;">A pool of connections has to be owned by a single process. Since most people use the Apache Web server, which is a multi-process pre-forking server, there is simply no way that PHP can do this connection pooling …</span><br /><span style="font-style: italic;"> If/when the common architecture for PHP is a single-process multithreaded Web server, we might consider putting this functionality into PHP itself, but until that day it really doesn’t make much sense. Even Apache 2 is still going to be a multi-process server with each process being able to carry multiple threads. So we could potentially have a pool of connections in each process</span>.</p>
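<p>In a single JVM process serving many requests on threads, the pool Lerdorf describes is straightforward. The sketch below is illustrative only: the generic <code>T</code> stands in for a real <code>java.sql.Connection</code>, and <code>SimplePool</code> is a hypothetical class without the validation, timeouts and leak detection a production pool needs.</p>

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Minimal sketch of a pool owned by ONE process and shared by its threads.
// T stands in for a real database connection (assumption: illustration only).
class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) idle.add(factory.get()); // open all connections up front
    }

    // Blocks until a resource is free, so at most `size` are ever in use.
    T borrow() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting for a connection", e);
        }
    }

    // Hands the resource back for reuse by the next thread instead of closing it.
    void release(T resource) { idle.offer(resource); }

    int available() { return idle.size(); }

    public static void main(String[] args) {
        int[] opened = {0};
        SimplePool<String> pool = new SimplePool<>(2, () -> "conn-" + (++opened[0]));
        String c = pool.borrow();
        pool.release(c);
        System.out.println(opened[0] + " connections opened, " + pool.available() + " idle");
    }
}
```

<p>With pre-forked Apache, every worker process would need its own copy of such a pool, which is exactly the point of the quote.</p>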
<p>Connections to MySQL (with the <em>MyISAM</em> storage engine) are reportedly only about 4 KB each and quite cheap. Oracle connections, on the other hand, are far more expensive:</p>
<p><a style="font-style: italic;" href="http://vsbabu.org/mt/archives/2003/02/12/php_oracle_performance.html">every single connection takes up 5MB in NT4 for Oracle 8i 816</a></p>
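<p>To put the two reported figures side by side, assume a hypothetical 100 concurrent connections:</p>

```latex
\begin{aligned}
\text{MySQL (MyISAM):}\quad & 100 \times 4\ \text{KB} = 400\ \text{KB} \\
\text{Oracle 8i on NT4:}\quad & 100 \times 5\ \text{MB} = 500\ \text{MB}
\end{aligned}
```

<p>That is more than a thousand times as much memory per connection, which is why pooling matters so much more for Oracle than for MySQL.</p>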
<p>The truth is that most of the MySQL vs. PostgreSQL comparisons found on Google are badly outdated. Postgres made large performance gains in its 8.x releases, and MySQL has significantly improved its transactional InnoDB engine. So in terms of performance, optimal configuration and design matter more than the choice between MySQL and Postgres. Both are good databases, and after going through an excellent <a href="http://www.slideshare.net/xzilla/scaling-with-postgres">presentation, &#8220;Scaling with Postgres&#8221;</a>, given by Robert Treat at the Percona Performance Conference 2009, I feel in good hands using Postgres.</p>
<p><span style="font-weight: bold;">b) Data caching</span></p>
<p>1) Facebook app with PHP/Postgres</p>
<p>How would I add caching to improve performance once I see that direct database access is taking too long? <a href="http://www.danga.com/memcached/">memcached</a> is not tied to MySQL; it can be used with many database systems. It just happens that a lot of people use it with MySQL, but it also <a href="http://pgfoundry.org/projects/pgmemcache/">integrates with Postgres</a> and many more.</p>
<p>Besides memcached, there are surely other distributed caches usable with PHP.</p>
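<p>The usual way memcached sits in front of a database is the cache-aside pattern: read from the cache, fall back to the database on a miss, then populate the cache. A minimal sketch in Java, assuming a <code>ConcurrentHashMap</code> as a stand-in for a memcached client and a hypothetical loader function in place of the real query:</p>

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-aside sketch. The map stands in for a memcached client (assumption);
// a real setup would also set expiry times on entries.
class CacheAside<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> database; // hypothetical loader, e.g. SELECT ... WHERE id = ?

    CacheAside(Function<K, V> database) { this.database = database; }

    V get(K key) {
        V cached = cache.get(key);
        if (cached != null) return cached;  // hit: no database round trip
        V fromDb = database.apply(key);     // miss: query the database
        if (fromDb != null) cache.put(key, fromDb);
        return fromDb;                      // two concurrent misses may both load; harmless here
    }

    // After an UPDATE, drop the stale entry so the next read reloads it.
    void invalidate(K key) { cache.remove(key); }

    public static void main(String[] args) {
        int[] dbQueries = {0};
        CacheAside<Integer, String> users = new CacheAside<>(id -> { dbQueries[0]++; return "user-" + id; });
        users.get(7);
        users.get(7);
        System.out.println("database queries: " + dbQueries[0]); // database queries: 1
    }
}
```

<p>The same pattern applies from PHP with a memcached client; nothing about it depends on which database is behind the loader.</p>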
<p>2) Glassfish/Jersey/Toplink/Postgres</p>
<p>Here I am using JEE JPA with Toplink Essentials. The latter has no clustering support, or at least none of production quality. EclipseLink 1.0, the open-source Toplink code base, was still a bit unstable the last time I looked at it (ca. January 2009).</p>
<p>So I would have to look at other distributed caches. Fortunately there are plenty of choices here: Hibernate integrates with EHCache and OSCache, to name a few. So I do not have to worry too much about distributed caching for my JEE app right now.</p>
<p><span style="font-weight: bold;">c) Physical Infrastructure</span></p>
<p>My current vServer provider (which I absolutely cannot recommend, but that is another story) charges about 100 CHF a month for 1 GB of RAM (3 GB burstable), a 60 GB hard disk and a 2 GHz Xeon processor. I am already a bit short of RAM at times, so the next bigger package is a dedicated server, which starts at 200 CHF/month, or more realistically 350 CHF/month for a dual-core Xeon with 4 GB of RAM.</p>
<p>From the <a href="http://aws.amazon.com/ebs/">Amazon Website</a>:</p>
<p><span style="font-style: italic;">&#8220;As an example, a medium sized website database might be 100 GB in size and expect to average 100 I/Os per second over the course of a month. This would translate to $10 per month in storage costs (100 GB x $0.10/month), and approximately $26 per month in request costs (~2.6 million seconds/month x 100 I/O per second * $0.10 per million I/O).&#8221;</span></p>
<p>Since my app is not a video/media-sharing site, the scenario would be one small instance that is always on, plus moderate Elastic Block Store (EBS) requirements for data storage. Amazon&#8217;s handy <a href="http://calculator.s3.amazonaws.com/calc5.html">calculator</a> gives me this rough estimate:</p>
<ul>
<li>Small (1.7 GB RAM, ...) Linux instance (always on for 1 month, $36 EBS costs): $118</li>
<li>Large (7.5 GB RAM, ...) Linux instance (always on for 1 month, $36 EBS costs): $363</li>
</ul>
<p>The <a href="http://aws.amazon.com/ec2/instance-types/">instance types</a> of EC2 also include high-CPU instances and different operating systems. For me, right now, something between a small and a large instance would be ideal, at least in terms of RAM (the recommended memory allocation for a single Glassfish instance alone is 1 GB). Maybe two small instances would be the best solution in my case.</p>
<p>So overall, I will probably go with EC2. There are plenty of articles, <a href="http://www.mahalo.com/answers/web-development/godaddy-virtual-dedicated-server-vs-amazon-ec2-vs-rackspace-which-is-better-for-a-startup-and-why">questions</a> and <a href="http://www.simplyhaddad.com/pigeon-box/design-tutorials/88-which-one-is-for-me-amazon-s3ec2-vs-dedicated-server-vs-shared-hosting-vs-vps.html">comparisons</a> out there that list the pros and cons of dedicated servers, cloud providers, and vServers. The fact is that Amazon has been a leader in the cloud space and has constantly improved its services. The usability of the Management Console has also improved significantly.</p>
<p><span style="font-weight: bold;">d) Impacts of EC2 on Application Architecture / Clustering</span></p>
<p>On the web server/DNS tier, EC2 offers Elastic IP addresses and <a href="http://aws.amazon.com/ec2/#functionality">Elastic Load Balancing</a>. An Elastic IP is a static public IP address that belongs to your AWS account rather than to a particular instance, so it can be remapped to a replacement instance. The addresses of the instances themselves change when an instance is relaunched, but as long as clients only see the Elastic IP or the load balancer, you don&#8217;t have to worry about this. Elastic Load Balancing, a separate feature, then distributes the incoming load across your instances.</p>
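<p>The idea of one stable entry point spreading requests over interchangeable instances can be illustrated with a round-robin picker. This is only a sketch of the concept, not how Elastic Load Balancing is implemented; the class name and the instance addresses are hypothetical.</p>

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative round-robin balancer over a fixed set of instance addresses.
// Not ELB's algorithm: it only shows requests being spread across instances.
class RoundRobinBalancer {
    private final List<String> instances;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> instances) {
        if (instances.isEmpty()) throw new IllegalArgumentException("need at least one instance");
        this.instances = List.copyOf(instances);
    }

    // Thread-safe: getAndIncrement gives each caller a distinct counter value;
    // floorMod keeps the index valid even after the int counter overflows.
    String pick() {
        return instances.get(Math.floorMod(next.getAndIncrement(), instances.size()));
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb = new RoundRobinBalancer(List.of("10.0.0.11", "10.0.0.12"));
        for (int i = 0; i < 4; i++) System.out.println(lb.pick()); // .11, .12, .11, .12
    }
}
```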
<p>One problem with EC2, though, is in the application tier: there is no multicast, which makes sense when you consider the network flood it could potentially generate. This is a problem because many applications, frameworks and application servers rely on multicast in their clustering solutions, in order to discover other service instances.</p>
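<p>Without multicast, nodes cannot announce themselves on the network, so a common workaround is a static list of well-known peers that every node contacts over TCP. The sketch below parses such a list; the <code>host:port,host:port</code> format, the host names and the port are hypothetical examples, not any particular framework&#8217;s configuration syntax.</p>

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Static (unicast) peer discovery sketch: members are configured up front
// instead of being found via multicast. Format and names are hypothetical.
class StaticPeerList {
    static List<InetSocketAddress> parse(String spec) {
        List<InetSocketAddress> peers = new ArrayList<>();
        for (String entry : spec.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) continue;
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1)
                throw new IllegalArgumentException("expected host:port, got " + trimmed);
            String host = trimmed.substring(0, colon);
            int port = Integer.parseInt(trimmed.substring(colon + 1));
            // createUnresolved: no DNS lookup here; resolve when actually connecting.
            peers.add(InetSocketAddress.createUnresolved(host, port));
        }
        return peers;
    }

    public static void main(String[] args) {
        // e.g. private DNS names or Elastic IPs of the other application servers
        for (InetSocketAddress peer : parse("app1.internal:9510, app2.internal:9510"))
            System.out.println("connect via TCP to " + peer.getHostString() + ":" + peer.getPort());
    }
}
```

<p>The downside is operational: when instances are replaced, the list has to be updated, which is one reason a fixed, central server address (as in Terracotta&#8217;s model) is attractive on EC2.</p>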
<p>I found a nice <a href="http://blog.decaresystems.ie/index.php/2007/02/12/amazon-web-services-the-future-of-datacenter-computing-part-2/">article</a> on a <a href="http://www.terracotta.org/">Terracotta</a> architecture that solves this problem. Terracotta provides clustering and caching for Java objects by instrumenting Java bytecode and doing things like (pre)fetching content or updating copies. It does this over TCP/IP and therefore enables clustering and distributed caches that do not rely on multicast. What&#8217;s really cool is that Terracotta recently went open source, so you can download the software for free!</p>
<p>How does Terracotta work?</p>
<p>A few interesting quotes from their forum:</p>
<p><span style="font-style: italic;">Every application node is connected to the Terracotta Server Cluster via a TCP connection. There is no multicast. Terracotta is very efficient over the network. Because it intercepts field-level changes, only the changes to your objects are sent across the wire. In addition, objects do not live everywhere, so Terracotta only sends changes where objects are resident. In the case where you have a well partitioned application, this means that on average, your changes will only be copied to the Terracotta Server Cluster, and not to all of the application nodes (because they don&#8217;t need a copy of objects they do not have a reference to in Heap)</span></p>
<p><span style="font-style: italic;">Just because one has 1000 clients running the same application doesn&#8217;t mean all data is everywhere. One of the features of Terracotta is that it has a virtual heap. Objects are only where they need to be when they need to be there. Some users do have large numbers of clients and it works quite well. Scale is more of a question of concurrency and rate of change than number of clients.</span></p>
<p><span style="font-style: italic;">The Terracotta server uses an efficient mechanism to send changes using Java NIO under the covers to achieve high scalability. </span></p>
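<p>To make &#8220;only the field-level changes are sent&#8221; concrete, here is a hand-written sketch of field-level change tracking. Terracotta does this transparently via bytecode instrumentation; the explicit setters and the hypothetical <code>flushChanges()</code> method below merely stand in for that machinery to show what a delta contains.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration of field-level change tracking: only fields modified since the
// last flush end up in the delta that would be sent over the wire.
// Explicit setters are a stand-in for Terracotta's bytecode instrumentation.
class TrackedAccount {
    private String owner;
    private long balance;
    private final Map<String, Object> dirty = new LinkedHashMap<>();

    TrackedAccount(String owner, long balance) {
        this.owner = owner;
        this.balance = balance;
    }

    String getOwner() { return owner; }
    long getBalance() { return balance; }

    void setOwner(String owner) { this.owner = owner; dirty.put("owner", owner); }
    void setBalance(long balance) { this.balance = balance; dirty.put("balance", balance); }

    // Returns the changed fields (name -> new value) and resets tracking.
    Map<String, Object> flushChanges() {
        Map<String, Object> delta = new LinkedHashMap<>(dirty);
        dirty.clear();
        return delta;
    }

    public static void main(String[] args) {
        TrackedAccount acct = new TrackedAccount("ben", 100);
        acct.setBalance(150);
        acct.setBalance(175);                    // repeated writes collapse to the latest value
        System.out.println(acct.flushChanges()); // {balance=175}: owner is never sent
        System.out.println(acct.flushChanges()); // {}: nothing changed since the last flush
    }
}
```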
<p>There are integrations with several App Servers, among them <a href="http://www.terracotta.org/web/display/orgsite/GlassFish+Integration">Glassfish</a>. Yes!</p>
<p><span style="font-size:130%;"><span style="font-weight: bold;">Summary</span></span></p>
<p>Without further ado, my takeaways from this rather long post are:</p>
<ul>
<li>Postgres does not <span style="font-style: italic;">per se</span> underperform MySQL.</li>
<li>memcached can be used with Postgres.</li>
<li>Do not ever use persistent DB connections in PHP.</li>
<li>EC2 will fit the bill for my infrastructure/hosting.</li>
<li>Terracotta is a good candidate for clustering in an EC2 environment without multicast.</li>
<li>Hibernate with EHCache, JBoss Cache or OSCache can replace Toplink Essentials where distributed caching is needed.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.uebersoftware.com/2009/07/internet-scale-java-web-applications/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
