Friday, February 12, 2016

Big Data Disruptions, best practices

In my previous two blog posts I touched on what data architecture is, why it is important, and how it relates to the information architecture layer.
In this post I want to touch on best practices for data stewardship and on big data disruptions.

Let's start with what data stewardship is. According to http://searchdatamanagement.techtarget.com/definition/data-stewardship, data stewardship is the management and oversight of an organization's data assets to help provide business users with high-quality data that is easily accessible in a consistent manner. Data steward roles are common in organizations that are attempting to exchange data precisely and consistently between computer systems and to make that data reusable in the future (ref: https://en.wikipedia.org/wiki/Data_steward).

Some of the benefits of data stewardship that I see, with reference to those listed on Wikipedia (https://en.wikipedia.org/wiki/Data_steward):
  • consistent use of data management resources
  • easier mapping of data between various computer systems
  • lower costs during migration
  • avoiding redundancy and overlapping of information across the layers
  • compliance with the organization's information layer
  • reusability
Now let's focus on some of the viewpoints shared in a Gartner article (https://www.gartner.com/doc/554646/best-practices-data-stewardship):

- data quality needs to be considered within a business process
- an effective governance strategy for data quality is needed across the entire organization

With big data comes the effort of managing a large amount of complex data within an organization. If data quality is not maintained, as noted in the Gartner article, many strategic business initiatives can fail. Programs such as CRM (customer relationship management) and BI (business intelligence) will not be able to generate enough business if the quality of the data is not improved. The article talks about various ways in which this problem can be alleviated. One of the proposals that I don't completely agree with is "stewards residing in business and not in IT organization". In my opinion, even though the business is responsible for the quality of data, stewards should reside in a middle layer that connects business and IT. As much as data stewards need to understand the business, they also need to understand the IT infrastructure, so that they can make sound decisions about data quality and judge whether the organization's current data architecture can even sustain a given proposal. Data stewards will only be respected within the organization if they combine knowledge of both the business and the IT infrastructure. Having only one of the two will not give them a holistic perspective of the entire data setup within the enterprise architecture.
Data steward in the middle of Business and IT layer


 
Let's quickly touch on the various roles and responsibilities of a data steward (ref: https://www.gartner.com/doc/554646/best-practices-data-stewardship):
  • ensuring the consistency and accuracy of data
  • implementing governance tasks and achieving data quality metrics
  • taking responsibility for master data management objectives
  • identifying issues with source systems
  • updating and maintaining documentation and taxonomies
  • proactively finding errors, bugs, or issues within data
  • ensuring data is compliant with industry standards
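To make "data quality metrics" a little more concrete, here is a minimal Python sketch of two checks a steward might track: how complete a field is, and whether a supposedly unique key is duplicated. The records, field names, and function names are all hypothetical and are not taken from the Gartner article; real metrics would be defined by the organization's own governance process.

```python
from collections import Counter

# Hypothetical customer records; the field names are illustrative only.
records = [
    {"id": 1, "email": "ann@example.com", "country": "US"},
    {"id": 2, "email": "", "country": "US"},
    {"id": 3, "email": "bob@example.com", "country": None},
    {"id": 3, "email": "bob@example.com", "country": "CA"},
]


def completeness(rows, field):
    """Share of rows where the field is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)


def duplicate_keys(rows, key):
    """Key values that appear more than once, in sorted order."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


report = {
    "email_completeness": completeness(records, "email"),
    "country_completeness": completeness(records, "country"),
    "duplicate_ids": duplicate_keys(records, "id"),
}
print(report)
```

A report like this could be run against each source system on a schedule, so that problems are found proactively rather than after a downstream program has already used the data.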
Let's shift gears and talk a little about big data disruptions, with reference to another Gartner article (https://www.gartner.com/doc/1964716/big-data-disruptions-tamed-enterprise).

Big data is an eye opener to the challenges that organizations face on a daily basis with their data. It provides insight into the deepest levels of an organization and its relationships inside and outside the company. If not handled properly by the business and enterprise architects, the big data challenge can have a negative impact. Some of these impacts, as also discussed in the Gartner article referenced above, are:
  • although big data reveals patterns across data types, there is no use in spending money on all of these tools if this information is not turned into a competitive advantage
  • big data gives business and IT leaders a visual representation of cultural issues; this can be a positive impact, since the leaders can identify and work on solving these issues more proactively and effectively
I do want to highlight one point that is mentioned in the Gartner article (ref URL above): "big data creates business value by enabling organizations to uncover previously unseen patterns, and to develop sharper insights about business environments". I strongly agree with this statement. I believe big data makes an organization very powerful, because it gives the organization a better understanding of the business processes within its environment and of the steps it can proactively take to either enhance those processes or work on the gaps and issues that this analysis identifies.

Big Data - Handle with Care
 

To summarize, as long as we as an organization understand the criticality and importance of big data and convert the findings from data mining into effective and impactful strategies, we can be more efficient and have stronger relationships with our employees, vendors, and other organizations.
 

Relationship between Information architecture and Data architectural layer

In my opinion, the data and information layers are interconnected. According to Platt (2002), as mentioned in one of my course readings, "the information layer describes data that the organization needs to run its business processes and applications". In other words, the information layer cannot exist without data, and vice versa.
While streamlining the data architecture within the information layer, a data architect should keep the following in mind:
  • data recovery
  • data modeling and analytics related to the organization
  • data handling
  • big data usage and how it can be used within the enterprise
  • data security when the data is accessed from outside the organization
  • data storage, whether it is on physical servers or in the cloud
  • easy data accessibility and availability
From my experience in the Information Systems domain, we have to be very careful in designing the data layer. This layer not only interacts with the inner architectural layers within an enterprise but also interacts with external applications, for both the inflow and outflow of information. To ensure that this is well detailed, data modeling/UML diagrams and analytics software come into use and allow us as architects to get a visual of the entire process. This also helps in cleaning out redundancy and overlapping information assets that could arise over the course of the data flow/information flow within the enterprise.

A great article that I ran across on IBM lays out the interaction of data in the BI (business intelligence) world, data repositories, and data warehouse architecture: http://www.ibm.com/developerworks/data/library/techarticle/dm-0505cullen/
One of the diagrams that really caught my eye shows the analytics layer, which provides the business analytic applications and their underlying capabilities and services, adding value to all areas in the enterprise.
Referencing http://www.ibm.com/developerworks/data/library/techarticle/dm-0505cullen/part4image3.gif
 

Data Architecture - quick overview

Over the past couple of weeks, I have given an overview of Enterprise Architecture, covered the various layers within it, and talked about application architecture. In this week's post, I will discuss data architecture. As the name suggests, data architecture is the way data is organized. The application architecture layer is important, but for that layer to work we need data. That is why I feel it is essential to have a well-organized and consistent data flow throughout the architectural layers. Data is essentially a lifeline to the entire enterprise.

Data - lifeline to all the architectural layers


This brings me to another topic: data is either produced in house or fed in by dependent applications. To ensure that we get a constant supply of data from outside systems, it is very important to have a consistent and systematic infrastructure in place. We can achieve this through proper planning, using data models and the other tools that are available in the marketplace. Here is a quick wiki link to the various data modeling tools and their comparisons.

Since the value of the data architecture layer lies in its planning and predictability, the data architect divides the information into three architectural processes, as noted in my classroom readings:
  • Conceptual
  • Logical
  • Physical
Each of the three processes represents a different component: the conceptual process covers the business entities; the logical process covers the logic behind how these entities interact; and the physical process covers where and how we share and store the data (servers, history, integration, analytics, etc.).
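As a rough illustration of how the three levels differ, here is a small Python sketch that models the same idea three times. The Customer and Order entities, their attributes, and the table names are hypothetical examples of my own, not from the course readings, and the in-memory SQLite database simply stands in for whatever storage an enterprise actually uses.

```python
import sqlite3
from dataclasses import dataclass

# Conceptual level: business entities and how they relate, no attributes yet.
CONCEPTUAL = {"Customer": ["places Order"], "Order": []}


# Logical level: attributes and relationships, independent of any database.
@dataclass
class Customer:
    customer_id: int
    name: str


@dataclass
class Order:
    order_id: int
    customer_id: int  # refers to Customer.customer_id
    total: float


# Physical level: how and where the data is actually stored.
PHYSICAL_DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    total REAL NOT NULL
);
"""

# Build the physical store and load one logical Customer and one Order.
conn = sqlite3.connect(":memory:")
conn.executescript(PHYSICAL_DDL)

ann = Customer(1, "Ann")
order = Order(10, ann.customer_id, 25.0)
conn.execute("INSERT INTO customer VALUES (?, ?)", (ann.customer_id, ann.name))
conn.execute(
    "INSERT INTO customer_order VALUES (?, ?, ?)",
    (order.order_id, order.customer_id, order.total),
)

# Query across both tables to confirm the relationship survived storage.
rows = conn.execute(
    "SELECT c.name, o.total FROM customer c "
    "JOIN customer_order o ON o.customer_id = c.customer_id"
).fetchall()
print(rows)
```

The point of keeping the levels separate is that the physical storage could change (a different database, a cloud service) without the conceptual and logical descriptions having to change with it.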