Week2: Data, Information, & Knowledge–R&R Jinxuan Ma Read & React Posts (original posts due on 6/2; comments due on 6/4)

General Guidelines for Writing Read, Reflect & React Posts:

(1) READ & REFLECT: Read the assigned number of articles on the reading list for this week. Post your reflections and critical comments on the course topic/s. For your post: (a) identify TWO main takeaways from this week’s readings/news report and (b) briefly explain why these resonated with you. These should be critical reflections and NOT summaries.

Discussion board postings must be substantive. Original (Read & Reflect) posts should be at least 300 words.

(2) REACT: Post a substantive comment on another student’s posting for this week. These comment (React) posts must be substantive (cite authoritative sources and provide reasons for your position) and should be at least 100 words.

You may post more than one reply and continue the discussion if you wish. At the end of each post, list the articles/resources you chose to read for that week. Use APA 6th edition format for your citations.


Week2 Reading List and Discussion Tasks:

1. Week2 Slide

2. Data Scientist: The Sexiest Job of the 21st Century

3. The Data Science Venn Diagram — Drew Conway

4. Search and review the online sources listed in the Week1: Course Learning Resources regarding trends of data science research and application.

I began this week by reading A Nation Transformed by Information: How Information Has Shaped the United States from Colonial Times to the Present, edited by Alfred D. Chandler and James W. Cortada, a collection of nine essays written by historians and technology professionals. After researching the origins of informatics for last week’s class, I knew I wanted to put the ideas of Big Data and the data scientist into historical perspective. Having previously read Chandler’s Pulitzer Prize-winning 1977 work The Visible Hand: The Managerial Revolution in American Business, I knew that business analytics really began in earnest with the rise of the American railroad network between the 1840s and the 1870s. These enterprises were the high-tech startups of the 19th century, and they required new methods of control so that traffic, passengers, and services could be routed and processed efficiently. New methods of accounting were created that relied on the collection of data. This, in turn, created the need for information processing technology (called office appliances in the parlance of the day) and for scientists and engineers to design and manage the systems of data collection. Massive punch card readers and sorters, adding machines, cash registers, file cabinets, telegraphs, stock tickers, telephones, mimeograph machines, and typewriters were just some of the information technologies developed and deployed to manage these new, complex, networked enterprises. Life insurance companies, an almost purely information-based business, had their origins in this period as well, creating the need for trained statisticians to draw up actuarial tables that guided managers to the most profitable segments of their markets.

I could continue with even more examples, but I think the point has been demonstrated: data scientist is not a new profession. What is new is the type of information technology available to guide decision making and how it is used in the 21st-century context. As Conway illustrates with his Venn diagram, the data scientists responsible for taming Big Data into forms that can guide the decisions of managers of institutions, businesses, and governments continue to be drawn from the engineering and scientific disciplines, as were the scientific managers and systems analysts of the 19th century. Davenport makes clear that there has been an amplification of the types of data available, the tools to overlay varying data sets, and the types of questions that can be addressed. Because of the high knowledge barriers to the field, it is highly paid work and inadequately supplied with qualified personnel. This situation is analogous to the 19th-century need for trained personnel, and the response has, not surprisingly, been similar: create professional journals to share information, develop educational training programs to supply the need, and establish professional credentialing organizations. Robb, in his article Big Data Certifications, lists 15 certifications from companies like Microsoft, Oracle, and IBM, as well as education companies promoting Hadoop certifications.

Let’s bring things into the library context.

The library has always been a major part of the Big Data infrastructure, being one of the largest repositories of data sets to draw upon and the creator of finding tools like the card catalog, classification systems, metadata, and self-improvement training. But as the tools of data analysis have evolved, the library as an institution hasn’t kept up. Academic research technology is still mostly limited to older methods of finding and retrieving. Use any of the standard academic search databases available, and you will be struck by how much better the search tools could be. When you click on the name of an article’s author, you aren’t directed to an analytics page for their body of research showing citation analysis, influence, chronology, a complete bibliography, a biography, or a summary of their work. Instead, you are given a scroll-through list of the author’s books and articles available in the database. Assuming no prior knowledge on the part of the person using the discovery database, there is no guidance on which article or book to choose. Wouldn’t it be better to know if a particular work is foundational to the topic you are researching? Or that the researcher is under a cloud because of ethics lapses? Big Data can be fruitfully applied to academic research, and librarians trained in informatics should be at the forefront of that mission. As an example of what’s possible, I direct you to https://www.worldcat.org/identities/lccn-n81054764/ which is provided by OCLC and is the scholar identity page for Carol Kuhlthau. There is a wealth of information there that can more efficiently guide a student or scholar into the work of Kuhlthau. I especially like the timeline, which is helpful in situating her in historical context as well as providing a developmental path of her research.
As Chen notes in her article, “When creating graphs, charts, maps, or other graphics, you want to make certain that the data depicts your message with clarity and precision so your target audience can gain useful insights and discern relevant trends.”

How much more value could be added to existing academic search tools by increasing the visual display of information?

Takeaways from this week's reading: 1) Data science is not a new profession, and there is value in looking at how past organizations responded to challenges that required new methods of information gathering to guide decision making. 2) It strikes me that the data scientist role is part technical maestro and part artist. Knowing what data sets to mash up, and how best to get value from them, as in the case at LinkedIn, demands a high degree of creativity. And though it is not explicitly mentioned in the Davenport article, there is a good case to be made that creativity practices like Prince and Gordon's Synectics, Altshuller's TRIZ, or Edward de Bono's Six Thinking Hats would be excellent parts of the data scientist curriculum.

These two points resonated strongly with me because I think informatics, while being a highly technical discipline, is also highly creative. Much like engineering and science in general, informatics has the potential to lead to a better future by creating new insights and offering better tools that can guide managers to better decisions. Of course, like engineering and science, it can also be used to better target marketing messages and influence consumer behavior in ways that many will find unsavory and even malign. And by studying the challenges posed in a previous century, we can think about some of the issues that will inevitably arise in the 21st-century context. What legislation will be enacted to protect consumers? Will a new type of anti-trust law come into being? How will liability laws be affected by the actions of artificial intelligence algorithms? Will the use of copyrighted works for machine learning affect copyright laws? And given the ability to manage larger enterprises using Big Data, can we expect to see Hyper-Corporations as the next evolution of Big Business?

Chandler, A. D., Jr., & Cortada, J. W. (Eds.). (2000). A nation transformed by information: How information has shaped the United States from colonial times to the present. Oxford, England: Oxford University Press.

Chen, H. M. (2017, May 1). Design and conquer: Create compelling graphics. American Libraries. Retrieved from https://americanlibrariesmagazine.org/2017/05/01/information-design-and-conquer/

Conway, D. (2010). The data science Venn diagram [blog post]. Retrieved from http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

Davenport, T. H., & Patil, D. J. (2012). Data scientist: The sexiest job of the 21st century. Harvard Business Review, 90(10), 70-76.

Robb, D. (2017, May 31). Big data certifications. Datamation. Retrieved from http://www.datamation.com/big-data/big-data-certifications.html