At the recently held annual EMC World conference in Boston, the vendor
introduced its Data Domain Boost software. Jackson elaborated on the advantages
of the software and why it is imperative for customers to adopt it.
What are some current trends in deduplication solutions?
By getting rid of redundant data, deduplication frees up memory, disk space and
physical space, and provides for a smarter backup system. These benefits prompt
enterprise users to redesign their backup systems overall. Deduplication is
hence catalyzing a change in the way backup systems are designed.
There is a considerable amount of interest in all implementations of data
deduplication. Backup deduplication can occur in two main places: at the data
source or at the backup target. With source-based deduplication, data is
deduplicated as the backup process begins and before the data is sent over the
network to be stored. We are seeing customer interest in source-based
deduplication for remote backups, virtual environments, large file servers, and
other environments where the backup process is hampered by network or other
resource bottlenecks.
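As a rough illustration of the source-based approach described above, here is a
minimal Python sketch. It assumes fixed-size chunks and SHA-256 fingerprints,
and the names `split_chunks`, `source_side_backup` and `known_hashes` are
hypothetical. It is not a description of how any particular product works.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, chosen only for illustration


def split_chunks(data, size=CHUNK_SIZE):
    """Cut the data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def source_side_backup(data, known_hashes):
    """Decide on the client which chunks must cross the network.

    known_hashes is the set of fingerprints the backup target already holds.
    Returns (recipe, to_send): the ordered fingerprints that describe the
    data, and only those chunks the target has not seen before.
    """
    recipe = []
    to_send = {}
    for chunk in split_chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        recipe.append(digest)
        if digest not in known_hashes and digest not in to_send:
            to_send[digest] = chunk
    return recipe, to_send


# First backup: three chunks, two of them identical, so only two are sent.
data = b"A" * 4096 + b"B" * 4096 + b"A" * 4096
recipe, to_send = source_side_backup(data, known_hashes=set())

# Second backup of unchanged data: nothing needs to be sent.
_, resend = source_side_backup(data, known_hashes=set(to_send))
```

In this sketch only the fingerprints and the previously unseen chunks would
travel over the network, which is why the approach suits bandwidth-constrained
sites.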
Target deduplication offerings mainly address the growth of back-end storage,
tape reduction or elimination, and faster replication to a disaster recovery
site. The backup application sends data to the target storage device and the
data is deduplicated at the device, either immediately or at a scheduled time.
Target deduplication is found in VTLs and LAN backup-to-disk appliances or
platforms, and provides the benefit of plug and play with existing backup
applications.
Most of the interest comes from the need to provide single-step recovery.
With traditional backup software solutions, customers had to restore the last
full backup and all the incrementals taken after it. This is very
time-consuming and inefficient.
Some customers ask for solutions that can efficiently back up and restore
remote, distributed environments without the need to deploy appliances or tapes
across all the remote offices. Deduplication allows users to centrally manage
the backup of multiple remote sites.
Deduplication at source can shrink the amount of time required for backup,
improve the performance of production storage and substantially shrink network
utilization. It can also stop the avalanche of full and incremental backup data
before it forms, reducing backup times, client resource consumption and
network utilization.
Most companies still rely on traditional backup software to back up to tape.
The problem with traditional backup, for example to tape, is that it is
extremely inefficient and slow, especially for remote sites and virtual systems.
When it is time to recover data, the traditional process is tedious and
unreliable, usually involving the layering of incremental backups onto the last
full backup to reach the desired recovery point. And in many cases, the
tape-based backup data can't be recovered, but this is not discovered until it
is too late.
And finally, for all the effort of conducting traditional backup, the result
is little more than disaster insurance. Data is locked on tape in chunks that
often can't be immediately leveraged. These reasons are compelling customers to
adopt data deduplication technologies instead of backup to tape.
What significance does the integration of NetWorker with Data Domain hold?
How does it change the scenario for customers?
The tight integration of enterprise deduplication storage and backup
software is the foundation for effective backup redesign to minimize tape and
support the needs of virtual data centers. EMC's approach allows IT
organizations to better manage backup processes, maximize the benefits of
disk-based data protection systems and use efficient network replication to
minimize reliance on slower tape-based backup methods. Unlike other backup
vendors, who typically do not offer deduplication storage systems or an
integrated method for managing them, EMC is utilizing its broad portfolio of
products to provide both, enabling a much simpler, more predictable and more
supportable solution.
The new EMC Data Domain Boost software comes with next-generation
disk-based data protection capabilities. It is the first solution to optimize
and accelerate traditional backup software interaction with deduplication
storage. By distributing parts of the deduplication process to the backup
server, DD Boost speeds up aggregate backup throughput on EMC Data Domain
deduplication storage systems by an average of 50 percent. It significantly
reduces load on backup LANs and backup servers. DD Boost, an advanced transport
and management interface to Data Domain systems, is already supported by
third-party backup software such as Symantec NetBackup and Backup Exec.
For non-NetWorker customers, with the introduction of DD Boost software for
Data Domain deduplication storage systems, EMC is enabling faster and more
efficient backup all the way from the backup systems to the Data Domain
systems. DD Boost enables backup applications to control the network-efficient
EMC Data Domain Replicator software, so multiple copies of data can be managed
from the backup application's console, giving administrators a global view of
their backup and disaster recovery (DR) environment.
Leveraging the Replicator software, DD Boost can automate WAN vaulting for
use in disaster recovery, remote office backup, or multi-site tape consolidation
initiatives.
For NetWorker customers, the EMC NetWorker integration with EMC Data Domain,
combined with its existing integration with EMC Avamar, makes it the only backup
software application to provide seamless integration with the industry's two
leading deduplication solutions. The new capabilities of EMC NetWorker backup
and recovery software will dramatically improve backup speed and management
through the new integration with DD Boost software for EMC Data Domain
deduplication storage systems.
What are the factors driving the demand for data deduplication solutions?
Data deduplication is the process of comparing incoming data with existing
data and identifying whether the two are identical. If the data is found to be
identical, instead of saving a second copy of the same data, the technology
links the file to the original data, thus eliminating the need for another
copy. This saves disk space, frees up system memory, improves system
performance and enables a faster backup process.
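The compare-and-link step described above can be sketched in a few lines of
Python. This is a toy, whole-file illustration that assumes a SHA-256
fingerprint is used to decide whether two pieces of data are identical. The
`DedupStore` class and its method names are hypothetical and do not come from
any product.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical content is kept only once."""

    def __init__(self):
        self.blobs = {}  # content fingerprint -> the single stored copy
        self.links = {}  # file name -> fingerprint of its content

    def save(self, name, data):
        """Store data under name and report whether a new copy was written."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        # Whether or not the content is new, the name only links to it.
        self.links[name] = digest
        return is_new

    def load(self, name):
        """Follow the link from a file name back to the stored content."""
        return self.blobs[self.links[name]]

    def stored_bytes(self):
        """Bytes actually held, counting each distinct content once."""
        return sum(len(blob) for blob in self.blobs.values())


store = DedupStore()
first = store.save("report.doc", b"quarterly report")
second = store.save("report_copy.doc", b"quarterly report")
```

Here the second save writes no new data. Both names resolve to the same stored
copy, which is where the disk-space saving comes from.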
Compared with the backup-to-tape option, deduplication addresses the
customer's need for faster backup and, importantly, its retention problems.
Data deduplication technologies also allow for much more reliable disaster
recovery than tape backup, as the Data Domain Replicator copies the data to a
separate, remote backup site.
Deduplication can reduce network bandwidth and backup storage by a factor of
up to 300. Savings of this kind in storage and network bandwidth are the biggest
driver for data deduplication.
What are the challenges that customers (CIOs) should be aware of while
adopting data deduplication?
While adopting data deduplication or running deduplication tests, CIOs must
ensure that the test data represents the type that will be prevalent in their
organization's environment. Because storage needs vary across industries, no
single data set will accurately represent the results and efficiencies of a
deduplication exercise. CIOs should also ensure that replication is supported.
Deduplication is a win-win proposition, and the RoI improvement is amplified if
it is implemented correctly.
Challenges customers should be aware of:
- Know the mix of data types, because the dedup ratio varies with it and will
impact the RoI. User-created data such as MS Office files dedups very well,
unlike natural data such as seismic data, or encrypted data.
- Understand the data deduplication process well enough to decide whether to
use a source-based or a target-based approach.
- Understand that the backup policy, that is, how often full, incremental or
differential backups are taken, highly impacts dedup ratios. The higher the
number of full backups, the higher the dedup ratio.
- As each customer's data is different, always carry out a dedup assessment to
avoid any confusion later.
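To make the point about backup policy concrete, a dedup assessment can be
approximated with a short Python sketch. It assumes the dedup ratio is defined
as logical bytes backed up divided by unique bytes stored, with fixed 4 KB
chunks and SHA-256 fingerprints. The `dedup_ratio` function and the sample
data are hypothetical, and real assessments should use the organization's own
data.

```python
import hashlib


def dedup_ratio(backups, chunk_size=4096):
    """Logical bytes backed up divided by unique bytes actually stored."""
    logical = 0
    unique = {}  # chunk fingerprint -> chunk length
    for data in backups:
        logical += len(data)
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            unique.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
    physical = sum(unique.values())
    return logical / physical if physical else 0.0


def make_chunk(n):
    """Build a distinct 4096-byte chunk for each n (illustrative data only)."""
    return n.to_bytes(4, "big") * 1024


full = b"".join(make_chunk(n) for n in range(10))

# Policy A: five full backups of unchanged data.
ratio_fulls = dedup_ratio([full] * 5)

# Policy B: one full backup, then four incrementals of new data only.
incrementals = [make_chunk(n) for n in range(10, 14)]
ratio_incr = dedup_ratio([full] + incrementals)
```

Under these assumptions the all-fulls policy yields the higher ratio, because
every repeated full backup resends data the store already holds, while
incrementals contain mostly new data.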
Komal Langar
komall@cybermedia.co.in
The author was hosted in Boston by EMC.