Category: Research

  • Whisper Report: Cloud Advantages Found on-Premise and at the Edge

    Online Research Summary

    ABSTRACT

Clouds are known for being purchased as an operational expense (OpEx) rather than a capital expense (CapEx). In addition, clouds provide software, infrastructure and even the platform itself ‘as-a-service,’ enabling auto-scaling, auto-updates and ease of administration. This research examines vendor products that bring cloud-like features to the edge and on-premise, including capabilities that enable a seamless edge/cloud hybrid experience.

  • Conference Whispers: PACK EXPO Las Vegas 2019

    Conference Whispers: PACK EXPO Las Vegas 2019 Video Playlist TBW1010

    ABSTRACT

PACK EXPO Las Vegas 2019 allowed 30,000+ attendees to see over 2,000 machine providers across 900,000 net square feet of exhibits. There were vendors ready for Industry 4.0 that could collect any and all of the data you would like – and those that weren’t. Some data requires a separate license, some is stored in historians, and some is stored in the cloud. There were machines that are reconfigurable and machines that can teach themselves. There was even a robot with a neural network embedded on a chip that needs only to see examples to learn.

  • Whisper Studies: Migrating an Enterprise Data Warehouse to the Cloud

    Online Research Summary

    ABSTRACT

This Whisper Study examines two “lift and shift” migrations as organizations move their enterprise data warehouses to the cloud. The first is a “lift and shift” from Teradata Active Enterprise Data Warehouse on premise to Teradata Vantage deployed in Amazon Web Services (AWS), leveraging the Teradata Everywhere license. The second is an update to near real-time product data on the web, followed by a “lift and shift” from ETL jobs feeding Exadata to IPaaS-based integration storing data in Amazon Redshift.

    The Studies

    Bayer Crop Science EDW Migration to the Cloud

    • Scenario: Enterprise data warehouse migration to the cloud
    • Start State: Teradata Active Enterprise Data Warehouse on premise
    • End State: Teradata Vantage on AWS
    • Data Size: 60-80 terabytes
• Transition Time: 6 months of discovery, a one-week move, weekend cutover
    • Interview(s): July 2019

    Retailer Moving to Near Real-Time Data and Cloud Data Warehouse

    • Scenario: Desired near real-time e-commerce and data warehouse migration to the cloud
    • Start State: Manual e-commerce site and batch Exadata-based data warehouse on premise
    • End State:  Near real-time e-commerce and Amazon Redshift (cloud data warehouse)
    • Data Size: 2 terabytes
    • Near Real-Time E-Commerce Transition Time: 1 month planning, 2 months implementing
    • Cloud EDW Transition Time: 45-day pilot, 5 months lift and shift
    • Interview(s): August 2019

      Key Takeaways
• When transferring large amounts of data, such as during a lift and shift, be sure to check your network packet size. Using the default packet size wastes a significant share of each packet on overhead, creating network latency.
• Leverage IPaaS to bridge the differences between an on-premise data warehouse that enforces integrity checks and a cloud data warehouse that does not. IPaaS can also accommodate data format differences.
• In the new environment, IPaaS templates with dynamic binding can replace years of accumulated ETL jobs that burden older data architectures.

    Migrating to the Cloud

Organizations are regularly considering migrating to the cloud, which raises the question: how difficult is it to migrate an organization’s enterprise data warehouse (EDW) to the cloud? This Whisper Study examines how two organizations succeeded at a task that often seems insurmountable. The first organization stayed with the same data vendor after the cloud migration. The second switched to a different vendor in the cloud than it used on premise, and its migration followed a near real-time update of product information to the web application, which is also discussed.

    Bayer Crop Science EDW Cloud Migration

Bayer Crop Science was already migrating its data warehouse to another location when it was approached about using Teradata Everywhere1 to migrate to the cloud instead. Teradata Everywhere is a licensing approach that gives users the flexibility to apply the license on premise or in a cloud. At that point, Bayer Crop Science decided to “lift and shift” its enterprise data from an on-premise Teradata appliance to Teradata Vantage on AWS, as depicted in Figure 1.

Bayer Crop Science EDW Background

When the Teradata appliance was first stood up at Bayer Crop Science, it was directly connected to their ERP system. It operated in near real time and included all major enterprise data, from order processing to finance. Regions had their own data warehouses and data marts. Global functions leveraged the Teradata appliance since all data was already present in it.

    How to Move the Data

    Bayer Crop Science planned to move the data in three phases. The first phase involved testing how the other two phases were going to be executed. The test had to be successful before they attempted to move development and testing environments’ data. Only after a successful test move and development move did they attempt to migrate production data.

The first hurdle was to execute a successful test data move to the cloud. The following approaches were attempted:

Amazon Import/Export Service2: This is a service provided by Amazon to assist clients in shipping their data to the cloud. Once the data is received, the turn-around time to install it is over 24 hours in addition to the transport time, and being down for several days was not an option. In addition, the data import did not check for errors, leaving the data unusable without extensive work.

    Teradata Backup and Recovery (BAR)3: Backup and recovery is a common capability. It enables a system to quickly change from one copy of the data to another. As it is used for recovery from disasters, the second or back-up copy of the data is frequently at a different location in case the disaster was location based. Most organizations’ business continuity requires this capability to be somewhat seamless and regularly tested. This solution appeared to be straightforward until network latency issues were found while moving the data.

    Hunting down the Latency Issue

    When transferring data over the network, it is easy to monitor the amount of data being transferred. Unfortunately, in the early test moves, data did not seem to be flowing. Instead, Bayer Crop Science experienced significant latency in the actual data coming across the network. This problem stumped the entire team for a couple of months. The team searched high and low to try to understand why the desired data wasn’t flowing to the cloud!

The problem causing the network latency turned out to be the default TCP/IP packet size. Each packet carries header information for routing and other control overhead that ensures the packet reaches its destination. The payload is the portion of each packet that holds the data you are actually trying to transfer. With the default packet size, too high a percentage of the bytes transferred was overhead rather than payload. Figure 2 depicts an example of the variable payload size within a packet. The payload can be increased up to the maximum segment size4.
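As a rough illustration of the arithmetic involved, the following Python sketch compares the share of each transferred segment consumed by TCP/IP headers at a few payload sizes. The header and payload sizes are typical textbook values, not the specific settings used in this migration; actual overhead depends on the options and network path in use.

```python
# Illustrative only: how much of each TCP segment is header overhead vs. payload.
IP_HEADER = 20    # bytes, minimal IPv4 header (no options)
TCP_HEADER = 20   # bytes, minimal TCP header (no options)

def overhead_share(payload_bytes: int) -> float:
    """Fraction of the bytes on the wire spent on headers rather than payload."""
    total = payload_bytes + IP_HEADER + TCP_HEADER
    return (IP_HEADER + TCP_HEADER) / total

# 536 bytes is the conservative TCP default MSS; 1460 is a typical Ethernet MSS;
# 8960 corresponds to a 9000-byte jumbo-frame MTU.
for payload in (536, 1460, 8960):
    print(f"payload {payload:>5} bytes -> {overhead_share(payload):.1%} header overhead")
```

The percentages are illustrative; in practice, larger segments also reduce the number of packets and acknowledgements needed to move the same volume of data.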

    The Actual Move

With the latency issue solved, the team verified the test move. With that success behind them, the team moved on to Phase 2: migrating the development environment. Success again. For the third phase, they used BAR to back up their production Teradata appliance to AWS. The team then tracked every change made after the backups were taken. These final changes were the only remaining updates required to have a complete data set on AWS, and they were migrated over the weekend. It is worth noting that a backup copy of the logged changes was made before the logs were used to update the AWS data. Once the final changes were applied, the switch was flipped and the backups on AWS became the new primary home. The cloud migration was complete.
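The cutover pattern described above (restore from backups, log every change made after the backup point, keep a safety copy of that log, replay it against the new copy, then switch) is general enough to sketch. The Python below is a simplified illustration of that sequence with hypothetical target and change-record interfaces; it is not a description of Teradata BAR itself.

```python
import json
from datetime import datetime

def cutover(target, change_log: list, backup_point: datetime) -> None:
    """Replay changes logged since the backup was taken, then promote the new copy.

    `target` and the change-record fields are hypothetical placeholders; only the
    ordering of the steps mirrors the process described in this study.
    """
    # 1. Changes made after the backup point are the only remaining delta.
    pending = [c for c in change_log if c["timestamp"] > backup_point]

    # 2. Keep a safety copy of the logged changes before replaying them.
    with open("changes.pre-cutover.json", "w") as fh:
        json.dump(pending, fh, default=str)

    # 3. Apply the delta to the restored copy in the new environment.
    for change in pending:
        target.apply_change(change)      # hypothetical apply step

    # 4. Flip the switch: the cloud copy becomes the primary system.
    target.promote_to_primary()          # hypothetical promotion step
```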

Since the migration, the corporation was acquired and now operates under the name used in this document. The acquirer was 100 percent on premise, so the combined environment is now technically hybrid. The team plans to leverage the success of its cloud migration to evolve the new enterprise.

    Retailer Moving to Near Real-Time Data and Cloud Data Warehouse

The second study involves a large retailer outside of North America with a general business problem: it could not keep its full product catalog updated online for e-commerce. If a sweater is available in only two colors for a given size, the e-commerce site should never show every color that was once available, only the current colors and sizes. The existing solution was not real time because the website could not connect directly to the product information and provide complete, accurate, near real-time data. Instead, a highly manual process involving spreadsheets was used to get product information to the web application platform. As you might imagine, not only did the web application lack near real-time data, the data warehouse was not real time either; overall, it was an old-fashioned data architecture. And of course, once their systems were updated to real time, they would want near real-time analytics to match. The desire was to modernize, make the e-commerce platform near real time, and then transition the warehouse to near real time as well.

    Phase 1: Near Real-Time Web Application

In the beginning, the web application could not reflect current inventory levels, and the data was neither fresh nor accurate. In addition, the e-commerce site represented only a very small portion of the product portfolio – well below 25% of all potential products. The retailer strongly suspected that representing the entire product line with near real-time data on the e-commerce site would increase revenue. Before the project, the process for loading information to the web application was fundamentally manual and involved a lot of Excel, as represented in Figure 3.

Of course, if it were easy to connect the ERP system to the web application directly, the two systems would likely already have been integrated. The first stop was Informatica’s Product Information Management solution (PIM5), which is tailor-made to clean up the view of product data by establishing master data management (MDM) for it. As with most migrations and upgrades, there was a surprise or two: once MDM was in place, the team realized that not all data sets used the same time standard, in addition to other data quality issues. By leveraging Informatica Data Quality (IDQ6), this problem was quickly and permanently solved. More importantly, within a quarter the retailer was able to represent accurate inventory for 60% of its product portfolio, and it reached 100% online within one year of the project. The new solution is represented in Figure 4.
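One of the surprises mentioned above, data sets that did not share a common time standard, is easy to illustrate in code. The Python sketch below normalizes local timestamps from different source systems to UTC so they can be compared; the field format and time zones are hypothetical examples, and in the study itself this class of problem was handled with Informatica Data Quality rather than hand-written code.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

UTC = ZoneInfo("UTC")

def normalize_timestamp(raw: str, source_zone: str,
                        fmt: str = "%Y-%m-%d %H:%M:%S") -> datetime:
    """Parse a local-time string from a given source system and convert it to UTC."""
    local = datetime.strptime(raw, fmt).replace(tzinfo=ZoneInfo(source_zone))
    return local.astimezone(UTC)

# The same instant as recorded by two systems keeping different local clocks.
print(normalize_timestamp("2019-07-01 09:30:00", "Europe/Berlin"))    # 07:30 UTC
print(normalize_timestamp("2019-07-01 02:30:00", "America/Chicago"))  # 07:30 UTC
```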

    Phase 2: Near Real-Time Cloud Data Warehouse

Now that the e-commerce site represented the entire product portfolio with accurate, near real-time information, the business wanted a near real-time enterprise data warehouse to provide matching analytic insights. The largest difficulty was estimating how much work the migration would actually take. The existing data warehouse was an on-premise Exadata warehouse the retailer had started building 10 years earlier. Like most data warehouses, its ETL developers were constantly adding new data combinations to the warehouse for the business.

    The challenge was how to capture and reflect all these ETL jobs that took 10 years to create.

To address these differences and accomplish the migration with all ETL jobs represented, without spending another 10 years on the task, the team decided to leverage Informatica’s IPaaS (Integration Platform as a Service) solution, known as Informatica Intelligent Cloud Services (IICS7). The team then used IPaaS templates to represent the ETL jobs. These templates enable dynamic binding, so a specific job is bound to a template at run time: each template is simply called with the data set that needs that template’s transformation. A single template could therefore represent a thousand different ETL jobs from the old data warehouse. In addition, IICS allowed the IT team to confirm that each transfer succeeded, since Amazon Redshift does not provide the integrity checks the on-premise warehouse had.
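The template-with-dynamic-binding idea can be sketched independently of any particular product. In the Python below (a simplified stand-in, not the IICS API), one generic transformation template is bound at run time to different source and target data sets, so a single template stands in for many near-identical ETL jobs.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class Binding:
    """Run-time parameters that bind the generic template to one data set."""
    source_table: str
    target_table: str
    key_columns: tuple

def standard_load_template(rows: Iterable[dict], binding: Binding) -> list:
    """One generic transformation: deduplicate on the bound key columns and tag
    each row with its destination. A real template would also map types, apply
    lookups, and load the warehouse."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[c] for c in binding.key_columns)
        if key not in seen:
            seen.add(key)
            out.append({**row, "_target": binding.target_table})
    return out

# The same template bound to two different data sets at run time.
bindings = [
    Binding("erp.orders",   "dw.orders",   ("order_id",)),
    Binding("erp.invoices", "dw.invoices", ("invoice_id",)),
]
sample = {
    "erp.orders":   [{"order_id": 1}, {"order_id": 1}],
    "erp.invoices": [{"invoice_id": 7}],
}
for b in bindings:
    print(b.target_table, standard_load_template(sample[b.source_table], b))
```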

To accomplish the lift and shift, each data source was first connected to the IPaaS platform. Next, dynamic binding of the templates was used to build the new data warehouse for those data sets. The team began by prototyping and testing a single topic over 45 days. The new solution was in the cloud and provided a continuous flow of data, enabling near real-time analytics, and not a single data model had to change. The new EDW being tested in the cloud ran concurrently with the on-premise, soon-to-be-retired EDW, giving the business and IT time to verify that the plan was working and that the new EDW behaved as expected.
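Running the retiring and the new warehouse side by side makes simple reconciliation checks possible. A minimal sketch of such a check is shown below; the connection objects and table names are placeholders (any DB-API connection would do), and this is not the verification tooling the retailer actually used.

```python
def reconcile_row_counts(old_conn, new_conn, tables: list) -> dict:
    """Compare per-table row counts between the legacy and the cloud warehouse.

    A fuller check would also compare checksums or column-level aggregates;
    row counts are just the cheapest first signal that a load ran to completion.
    """
    results = {}
    for table in tables:
        counts = []
        for conn in (old_conn, new_conn):
            cur = conn.cursor()
            cur.execute(f"SELECT COUNT(*) FROM {table}")
            counts.append(cur.fetchone()[0])
        results[table] = counts[0] == counts[1]
    return results
```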

    After the first successful topic was in the new data warehouse, it took the team another five months to finish the “lift and shift.” No redesign of data models was required. In order to efficiently accomplish the “lift and shift,” multiple IPaaS accounts were leveraged, providing additional bandwidth for the project.       

    TBW Advisors Recommended Reading

“Conference Whispers: Informatica World 2019”

    “Whisper Report: Digital Transformation Requires Modern Data Engineering”

    “Whisper Report: Six Data Engineering Capabilities Provided by Modern Data Virtualization Platforms”

    Citations

    1. https://www.teradata.com/Resources/Executive-Briefs/Teradata-Everywhere
    2. https://aws.amazon.com/blogs/aws/send-us-that-data/
    3. https://www.teradata.com/Products/Hardware/Backup-And-Restore
    4. https://en.wikipedia.org/wiki/Transmission_Control_Protocol#Maximum_segment_size
    5. https://www.informatica.com/products/master-data-management/product-information-management.html
    6. https://www.informatica.com/products/data-quality/informatica-data-quality.html#fbid=ez3JEKts8Pe
    7. https://www.informatica.com/products/cloud-integration.html

©2019-2024 TBW Advisors LLC. All rights reserved. TBW, Technical Business Whispers, Fact-based research and Advisory, Conference Whispers, Industry Whispers, Email Whispers, The Answer is always in the Whispers, Whisper Reports, Whisper Studies, Whisper Ranking are trademarks or registered trademarks of TBW Advisors LLC. This publication may not be reproduced or distributed in any form without TBW’s prior written permission. It consists of the opinions of TBW’s research organization which should not be construed as statements of fact. While the information contained in this publication has been obtained from sources believed to be reliable, TBW disclaims all warranties as to the accuracy, completeness or adequacy of such information. TBW does not provide legal or investment advice and its research should not be construed or used as such. Your access and use of this publication are governed by the TBW Usage Policy. TBW research is produced independently by its research organization without influence or input from a third party. For further information, see Fact-based research publications on our website for more details.

  • Whisper Report: Seven Security and Governance Data Space Issues CxOs Don’t Know About

    Online Research Summary

    ABSTRACT

    CCPA and GDPR expect CxOs to be able to answer the question, “who shared what customer data with whom.” Unfortunately, due to configuration errors, missing driver updates, missing log files or lack of understanding of vulnerabilities, many CxOs are not aware of which data copies exist, let alone how they are being shared. This research shares seven security and governance issues in the data space that compromise security and governance, yet, unfortunately, remain generally unknown by CxOs. Remedies are also discussed.

  • Conference Whispers: Black Hat USA 2019

    Conference Whispers: Black Hat USA 2019 Video Playlist TBW1006

    ABSTRACT

    Black Hat USA 2019 provided 20,200+ attendees with an opportunity to dive into training, try their hand at white hat activities, attend briefings, listen to the keynote and attend a large variety of sponsored parties. Artificial intelligence-based security tools have arrived and may foreshadow the future state of cyber warfare. Solutions for data management and related actions focused on security operations centers were displayed. White hat tools to self-test Windows 10 against process injection and self-protecting data software development kits were announced, as were vulnerabilities in common consumer products. Finally, the gaping privacy loopholes created by GDPR were also presented.

  • Whisper Report: Six Data Engineering Capabilities Provided by Modern Data Virtualization Platforms

    Online Research Summary

    ABSTRACT

    When creating your data architecture, it is valuable to understand the extent of data engineering capabilities provided by modern data virtualization platforms. Many data virtualization platforms provide data as a service, data preparation, data catalog, logically centralized governance, the ability to join disparate data sets and an extensive list of query performance optimizations.

  • Whisper Report: Digital Transformation Requires Modern Data Engineering

    Online Research Summary

    ABSTRACT

    Digital transformation is a highly sought-after end-game for corporations. Scalable digital transformation cannot occur without the modernization of data management to enable business-led analytics. The most common, successful, and scalable data management modernizations involve data virtualization, IPaaS, and data hub technologies to provide a data layer.

  • Conference Whispers: Predictive Analytics World 2019

    ABSTRACT

    Mega-PAW 2019 clearly demonstrated that predictive analytics is ready for primetime and can lead to fantastic results, but success is far from guaranteed. The 720 attendees had seven tracks across five co-located conferences to choose from. With so much desired content, many found themselves missing a presentation if they hadn’t brought colleagues to cover concurrent sessions—a positive sign of a good conference. While there were some food misses, attendees found the conference interesting and had a takeaway from each speaker.

  • Conference Whispers: Agile + DevOps West 2019

    ABSTRACT

    Agile + DevOps West 2019 provided attendees with an opportunity to dive into training, attend breakout sessions and listen to keynotes. Artificial Intelligence-based testing tools have arrived to provide additional options to quality assurance teams. The almost 600 attendees had a healthy mix of males and females. While there were some food challenges, attendees all reported a handful of actionable takeaways. TBW takeaways include opportunities to leverage DevOps practices in data pipelines and increase standardization for enterprise quality.

  • Conference Whispers: Informatica World 2019

    Conference Whispers: Informatica World 2019 Video TBW1000

    ABSTRACT

Informatica World 2019 was great, and seeing how the technology keeps charging ahead was exciting. AI is definitely more mature than it was last year. Copying and moving data is easier than ever, as is maintaining data lineage across all of these copies. However, just because you can doesn’t always mean you should; before you let any technology loose, particularly for end users, make sure you have thought through what it looks like at scale. I look forward to CLAIRE becoming more prevalent in suggesting when data should be copied and where it should be stored for optimal performance and cost.

    The Conference

    • Informatica hosts their customers, partners and analysts once a year at a conference called Informatica World. This was their 20th annual conference.
• Informatica is a software company founded in 1993 and headquartered in Redwood City, California. Informatica considers its core products to be Cloud Data Management and Data Integration. Since the company’s beginning, it has assisted over 9,000 customers.
• Informatica World 2019 had 2,600 attendees, 1,200 customers, 460 partners, 44 countries represented and 1,200+ sessions.

    Highlights

    • Well-organized conference with an exciting vibe filled with possibilities. The attendees were engaged, and even the food was better than at standard conference gatherings.
• CLAIRE is metadata-driven artificial intelligence software embedded in Informatica’s platform. CLAIRE was everywhere; Informatica is leveraging AI for data management engineering across all of its products.
    • Data quality, data movement and the Enterprise Data Catalog (EDC) are areas where Informatica continues to shine and were on display throughout the conference.
    • Informatica IPaaS (integration platform-as-a-service) native on the Google Cloud Platform is big news for enterprises with a multi-cloud environment.

    Cautions

• Just because you can doesn’t mean you should. Informatica provides powerful capabilities to move and copy data anywhere technical professionals desire, but don’t turn that power loose across the organization at large unless you want the chaos of endless, near-identical copies of data.
    • There is a missed opportunity to leverage AI to help determine where best to store data. These capabilities would fit nicely in the Informatica Data Hub, IPaaS architecture or their data virtualization.
• AI should never be used to solve a problem that already has a known solution. Most organizations have at least one application whose database data model is known and used successfully. AI should not be used to infer an already-defined data model, only to infer models for data sets without a known or defined one.

    Conference Vibe

Informatica World 2019 was a fun and pleasant conference. Walking around, it was obvious that the memo to wear Informatica orange had been read by employees: whether socks, ties, shirts or pocket squares, the Informatica team came in orange to show their team pride. The keynote address opened with a local youth orchestra playing as everyone filed into their seats. Before the speakers stepped onstage, Brian King Joseph, the sensational electric violinist from America’s Got Talent, charged onto the stage, dancing around while playing his violin without missing a note. During this introduction the screens displayed notes about data, some of which suggested that the harmony of the music represents the harmony you can achieve while using Informatica’s various products.

    Everything starts with data quality

    Data technical professionals understand that garbage-in, garbage-out environments do not yield good answers, and Informatica’s data quality products are one way to start the journey towards better solutions. Various breakout sessions provided examples of business analysts who have cleaned their enterprise data to standardize it for analytics using Informatica. Data quality can be hindered by simple global cultural differences, such as how names are formed, how addresses are created and whether government IDs are public or private. These differences add additional strain on analysts seeking data quality. Audience members seemed pleasantly surprised that solutions for many complex global data quality issues are available off-the-shelf. Once standardized, data can then be combined to provide quality analytics and insights, and can even be used to create trustworthy machine learning models.

    Cloud Migration

Whether your enterprise simply wants to move one data set to the cloud or to move all of its data, there was an announcement at Informatica World 2019 for you. Currently, Informatica is involved in 8.0 trillion cloud transactions, 1 billion jobs and processes, and 45 million continuous cloud security checks, and it adds 17,500 new endpoints per month. Informatica states that 58% of all companies have a hybrid strategy with data footprints both in-cloud and on-premise (Image 2). In fact, Informatica and Amazon Web Services (AWS), along with Cognizant, announced a free self-service assessment of your data for migration to AWS. The tool specifically assists in creating an ROI business case that a technical professional can use to get management approval and funding for a cloud migration.

    EDC

    Data Lineage

Preserving data lineage was an exciting underlying theme of the conference. Using Informatica’s products to move and manage your data does not cost you visibility into your data lineage; quite the opposite. Throughout the major functions in the major products, data lineage is carefully preserved, a fact demonstrated regularly on both the main and supporting stages. Business users and analysts want to be sure they are using the correct data source, and the ability to verify a data set’s origin builds confidence. Readily preserved lineage is therefore great news.

    Informatica IPaaS on GCP

Universal cloud access at scale is here! Informatica has the first IPaaS solution that is officially native to AWS, Azure and Google Cloud Platform (GCP). Informatica Intelligent Cloud Services (IICS) is Informatica’s IPaaS powered by CLAIRE™. IICS can support microservice architectures across a great diversity of hybrid data locations. Enterprise clients, particularly those in the retail, consumer packaging and logistics spaces, have been searching for public cloud alternatives. This announcement puts GCP within reach of all Informatica IPaaS clients and makes Informatica IPaaS a must-have for anyone who desires native connectivity to all three major public clouds.

    Next Year’s Conference

    Next year’s conference will once again be at the Venetian in Las Vegas, Nevada. The event will occur Monday May 18 – Thursday May 21, 2020.

    Corporate Headquarters

    2884 Grand Helios Way

    Henderson, NV 89052

     ©2019 TBW Advisors LLC. All rights reserved. TBW, Conference Whispers, Technical Business Whispers, Fact-based research and advisory are a registered trademark of TBW Advisors LLC. This publication may not be reproduced or distributed in any form without TBW’s prior written permission. It consists of the opinions of TBW’s research organization which should not be construed as statements of fact. While the information contained in this publication has been obtained from sources believed to be reliable, TBW disclaims all warranties as to the accuracy, completeness or adequacy of such information. TBW does not provide legal or investment advice and its research should not be construed or used as such. Your access and use of this publication are governed by the TBW Usage Policy. TBW research is produced independently by its research organization without influence or input from a third party. For further information, see Fact-based research publications on our website for more details.