Deduplication can reduce network bandwidth by a factor of 300

DQC Bureau

At the recently held annual EMC World conference in Boston, the vendor
introduced its Data Domain Boost software. Jackson elaborated on the advantages
of the software and why it is imperative for customers to adopt it.

What are some current trends in deduplication solutions?



By getting rid of redundant data, deduplication frees up memory, disk space
and physical space, and provides for a smarter backup system. These benefits
prompt enterprise users to redesign their backup systems as a whole.
Deduplication is hence catalyzing a change in how backup systems are designed.

There is a considerable amount of interest in all implementations of data
deduplication. Backup deduplication can occur in two main places: at the data
source or at the backup target. With source-based deduplication, data is
deduplicated as the backup process begins, before it is sent over the network
to be stored. We are seeing customer interest in source-based deduplication for
remote backups, virtual environments, large file servers, and other
environments where the backup process is hampered by network or other resource
bottlenecks.
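
To make the source-based idea concrete, here is a minimal sketch in Python. It is not how any EMC product works internally; the fixed 4 KB chunk size, the SHA-256 fingerprints, and the names `chunk_digests` and `source_side_backup` are all assumptions made for illustration. A plain dictionary stands in for the remote backup target.

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunks; an assumption for this sketch


def chunk_digests(data: bytes, size: int = CHUNK_SIZE):
    """Yield (digest, chunk) pairs for fixed-size chunks of data."""
    for i in range(0, len(data), size):
        chunk = data[i:i + size]
        yield hashlib.sha256(chunk).hexdigest(), chunk


def source_side_backup(data: bytes, target_store: dict) -> int:
    """Send only the chunks the target does not already hold.

    target_store is a stand-in for the remote backup target
    (digest -> chunk). Returns the payload bytes that would have
    crossed the network.
    """
    sent = 0
    for digest, chunk in chunk_digests(data):
        if digest not in target_store:    # hypothetical "do you have this?" check
            target_store[digest] = chunk  # only new chunks are transmitted
            sent += len(chunk)
    return sent


target = {}
monday = b"A" * 8192 + b"B" * 4096
tuesday = b"A" * 8192 + b"C" * 4096  # one chunk changed since Monday

first = source_side_backup(monday, target)
second = source_side_backup(tuesday, target)
print(first, second)
```

In this toy run the second backup transmits only the one changed chunk, which is the effect that makes source-based deduplication attractive on constrained links.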

For target deduplication offerings, the main challenges being addressed are the
growth of back-end storage, tape reduction or elimination, and faster
replication to a disaster recovery site. The backup application sends data to
the target storage device, and the data is deduplicated at the device, either
immediately or at a scheduled time. Target deduplication is found in virtual
tape libraries (VTLs) and LAN backup-to-disk appliances or platforms, and it
offers plug-and-play compatibility with existing backup applications.

Most of the interest comes from the need for single-step recovery. With
traditional backup software, customers had to restore the last full backup and
then all the incrementals taken after it. This is very time-consuming and
inefficient.
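
The multi-step restore described here can be sketched as follows. This is a simplified model, not any vendor's restore logic: each backup is a hypothetical dictionary of file path to content, and `None` marks a file deleted since the previous backup.

```python
def restore(full: dict, incrementals: list) -> dict:
    """Rebuild the latest state by layering incrementals on a full backup.

    Each incremental maps path -> new content, or None for a deleted file.
    Incrementals must be applied oldest first.
    """
    state = dict(full)
    for inc in incrementals:
        for path, content in inc.items():
            if content is None:
                state.pop(path, None)  # file was deleted in this incremental
            else:
                state[path] = content  # file was added or changed
    return state


full_sunday = {"a.txt": "v1", "b.txt": "v1"}
incrementals = [
    {"a.txt": "v2"},   # Monday: a.txt changed
    {"c.txt": "v1"},   # Tuesday: c.txt added
    {"b.txt": None},   # Wednesday: b.txt deleted
]

restored = restore(full_sunday, incrementals)
print(restored)
```

Every incremental in the chain has to be read, in order, before the recovery point is reached, which is why the process slows down as the chain grows.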

Some customers ask for solutions that can efficiently back up and restore
remote, distributed environments without the need to deploy appliances or tapes
across all the remote offices. Deduplication allows users to centrally manage
the backup of multiple remote sites.

Deduplication at source can shorten the time required for backup, improve the
performance of production storage and cut network utilization substantially.
It can also stop the avalanche of full and incremental backup data before it
forms, reducing backup times, client resource consumption, and network
utilization.

Most companies still rely on traditional backup software to back up to tape.
The problem with traditional backup, for example to tape, is that it is
extremely inefficient and slow, especially for remote sites and virtual
systems. When it is time to recover data, the traditional process is tedious
and unreliable, usually involving the layering of incremental backups onto the
last full backup to reach the desired recovery point. In many cases the
tape-based backup data cannot be recovered, and this is not discovered until it
is too late.

Finally, for all the effort of conducting traditional backup, the result is
little more than disaster insurance. Data is locked on tape in chunks that
often cannot be put to immediate use. These reasons are compelling customers to
adopt data deduplication technologies in preference to backup to tape.

What significance does the integration of NetWorker with Data Domain hold?
How does it change the scenario for customers?



The tight integration of enterprise deduplication storage and backup software
is the foundation for an effective backup redesign that minimizes tape and
supports the needs of virtual data centers. EMC's approach allows IT
organizations to better manage backup processes, maximize the benefits of
disk-based data protection systems and use efficient network replication to
minimize reliance on slower tape-based backup methods. Unlike other backup
vendors, who typically do not offer deduplication storage systems or an
integrated method for managing them, EMC is utilizing its broad portfolio of
products to provide both, enabling a much simpler, more predictable and more
supportable solution.

The new EMC Data Domain Boost software brings next-generation disk-based data
protection capabilities. It is the first solution to optimize and accelerate
the interaction of traditional backup software with deduplication storage. By
distributing parts of the deduplication process to the backup server, DD Boost
speeds up aggregate backup throughput on EMC Data Domain deduplication storage
systems by an average of 50 percent. It also significantly reduces the load on
backup LANs and backup servers. DD Boost, an advanced transport and management
interface to Data Domain systems, is already supported by third-party backup
software such as Symantec NetBackup and Backup Exec.

For non-NetWorker customers, with the introduction of DD Boost software for
Data Domain deduplication storage systems, EMC is enabling faster and more
efficient backup, right from the backup systems through to the Data Domain
systems. DD Boost enables backup applications to control the network-efficient
EMC Data Domain Replicator software, so multiple copies of data can be managed
from the backup application's console, giving administrators a global view of
their backup and disaster recovery (DR) environment.

Leveraging the Replicator software, DD Boost can automate WAN vaulting for use
in disaster recovery, remote office backup, or multi-site tape consolidation
initiatives.

For NetWorker customers, the EMC NetWorker integration with EMC Data Domain,
combined with its existing integration with EMC Avamar, makes it the only
backup software application to provide seamless integration with the
industry's two leading deduplication solutions. The new capabilities of EMC
NetWorker backup and recovery software will dramatically increase backup speed
and improve management through the new integration with DD Boost software for
EMC Data Domain deduplication storage systems.

What are the factors pushing the demand for data deduplication solutions?



Data deduplication is the process of comparing incoming data with existing data
and identifying whether the two are identical. If they are, instead of saving a
second copy, the technology links the file to the original data, eliminating
the need for another copy of the same data. This saves disk space, frees up
system memory, improves system performance, and enables a faster backup
process.
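
The compare-and-link step can be illustrated with a toy whole-file store. This is a sketch under stated assumptions rather than a description of any product: content is compared by SHA-256 fingerprint (the interview does not say how the comparison is done), and the class name `DedupStore` and its methods are hypothetical.

```python
import hashlib


class DedupStore:
    """Toy whole-file deduplicating store (illustrative only)."""

    def __init__(self):
        self.blobs = {}  # digest -> bytes; each unique content is stored once
        self.files = {}  # file name -> digest (the "link" to the original data)

    def put(self, name: str, data: bytes) -> bool:
        """Store a file; return True only if new data was actually written."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        self.files[name] = digest  # duplicates just point at the existing blob
        return is_new

    def get(self, name: str) -> bytes:
        """Follow the link from a file name back to its stored content."""
        return self.blobs[self.files[name]]

    def stored_bytes(self) -> int:
        """Physical bytes held, counting each unique blob once."""
        return sum(len(blob) for blob in self.blobs.values())


store = DedupStore()
report = b"quarterly report" * 1000

wrote_first = store.put("alice/report.doc", report)
wrote_second = store.put("bob/report-copy.doc", report)  # identical content
print(wrote_first, wrote_second, store.stored_bytes())
```

Both file names remain readable, but the second `put` writes no new data, so the store holds one physical copy for two logical files.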

Compared with backup to tape, deduplication addresses the customer's need for
faster backup and, importantly, solves retention problems. Data deduplication
technologies also allow for much more reliable disaster recovery than tape
backup, as the Data Domain Replicator copies the data to a separate backup
location at a remote site.

Deduplication can reduce network bandwidth and backup storage requirements by a
factor of 300. Savings of this kind in storage and network bandwidth are the
biggest driver for data deduplication.

What are the challenges that customers (CIOs) should be aware of while
adopting data deduplication?



While adopting data deduplication or running deduplication tests, CIOs must
ensure that the test data represents the type of data that will be prevalent in
their organization's environment. Because storage needs vary across industries,
no single data set can accurately represent the results or efficiencies of a
deduplication exercise. CIOs should also ensure that replication is supported.
Deduplication is a win-win proposition, and the RoI improvement is amplified if
it is implemented correctly.

Challenges customers should be aware of:

  • Know the mix of data types, because the dedup ratio varies with it and
    will impact the RoI. User-created data such as MS Office files dedups very
    well, unlike natural data such as seismic data, or encrypted data.
  • Understand the data deduplication process well, including whether to use
    source-based or target-based deduplication.
  • Understand that the backup policy strongly affects dedup ratios: how often
    full, incremental or differential backups are run. The higher the number of
    full backups, the higher the dedup ratio.
  • As each customer's data is different, always carry out a dedup assessment
    to avoid confusion later.
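
The point about full backups and dedup ratios can be shown with some back-of-the-envelope arithmetic. The model below is a deliberate simplification with hypothetical numbers: it assumes every retained full backup has the same logical size, that a fixed fraction of the data changes between successive fulls, and that unchanged data deduplicates perfectly. The function name `dedup_ratio` is illustrative.

```python
def dedup_ratio(full_size_gb: float, change_rate: float, num_fulls: int) -> float:
    """Logical data protected divided by physical data stored.

    Assumes the first full is stored whole and each later full adds only
    the changed fraction (change_rate) of new unique data.
    """
    logical = full_size_gb * num_fulls
    physical = full_size_gb + (num_fulls - 1) * change_rate * full_size_gb
    return logical / physical


# Hypothetical 1,000 GB data set with 2% change between full backups.
for fulls in (1, 4, 12):
    print(fulls, round(dedup_ratio(1000, 0.02, fulls), 2))
```

Under these assumptions the ratio climbs as more fulls are retained, because each extra full adds a whole data set to the logical total but only the changed fraction to what is physically stored. Real ratios depend on the actual data mix, which is why an assessment on the customer's own data matters.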

Komal Langar



komall@cybermedia.co.in



The author was hosted in Boston by EMC
