Readings are up for next week :) Following up on class discussion yesterday, here is the paper that I thought of assigning but was concerned about overloading people: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/05/Bishop-MBML-2012.pdf

This is not assigned, but it's an excellent paper and feedback is welcome.

Also, someone asked about Gaussian processes yesterday. In thinking about the whole context of that discussion, I thought it might be helpful to recap some terminology.

A random process is something that creates a non-deterministic (uncertain) outcome.
Examples:
    Rolling dice (fair dice or not)
    Flipping a coin (fair coin or not)
    Random sampling
Counterexamples:
    Flipping a coin with two heads (because there is only 1 outcome, it is deterministic/certain)

A random variable maps the outcome of a random process to a number.
Example: Flipping a coin
    Random process: flipping a coin
    Random variable X where 1 is heads and 0 is tail...
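As a rough illustration of the coin example above (a minimal sketch, not part of the original post), the snippet below treats each call to `flip_coin` as the random process and the function `X` as the random variable that maps the outcome to a number. The names `flip_coin`, `X`, and `p_heads`, and the choice of Python, are assumptions made only for this sketch.

```python
import random


def flip_coin(rng, p_heads=0.5):
    """Random process: produce the outcome 'heads' or 'tails'."""
    return "heads" if rng.random() < p_heads else "tails"


def X(outcome):
    """Random variable: map an outcome to a number (1 = heads, 0 = tails)."""
    return 1 if outcome == "heads" else 0


rng = random.Random(0)  # fixed seed so repeated runs give the same flips

# Ten flips of a fair coin, then the random variable applied to each outcome.
outcomes = [flip_coin(rng) for _ in range(10)]
values = [X(o) for o in outcomes]
print(outcomes)
print(values)

# Counterexample: a two-headed coin (p_heads=1.0) has only one possible
# outcome, so the result is deterministic.
two_headed = [X(flip_coin(rng, p_heads=1.0)) for _ in range(10)]
print(two_headed)
```

The separation is the point of the sketch: the process produces outcomes ("heads", "tails"), and the random variable is just the rule that turns those outcomes into numbers.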
Helbing et al. (2017)
First I would like to begin with a quote from Helbing et al. (2017) on page 39: “We must realize that big data, like any other tool, can be used for good and bad purposes.” I believe this quote is an excellent summary of this paper as well as this semester. Yes, there are many legal and ethical concerns with AI and Big Data; however, there are also ways that both AI and Big Data can improve our world. The key will be to enact processes and laws that facilitate the use of AI and Big Data in an ethical manner. The article starts by discussing many different concerns about AI and Big Data. The ones I take particular interest in are the “Citizen Score” and whether Big Brother is now becoming a reality. Another particularly interesting topic to me is the idea that nudging can allow us to have fewer choices without us really even noticing. I think back to the idea of the oppressed group of people we discussed a few weeks ago. I think this nudging could limit thought and diversity, and this is very concerning to me, especially in the context of “social polarization” (p. 7). In a time of political instability, we do not need the gap in understanding to become even larger, and the thought that the suggestions my phone gives me may make me more likely to be stagnant in my opinions worries me. Opinions and beliefs should evolve with experiences and exposure, and if our current beliefs continue to be reflected back to us through the tracking and suggestions of our technology, how will we ever grow? Another concern that I have voiced in my personal life is the Terms of Use policies most companies employ. For the most part, if you do not agree with their terms of use, you can’t use their product, so it is taking our choice away. Recently there was a 60 Minutes episode that addressed many of these concerns as well as the push for legislation that will protect consumers and distinguish who owns the data that is collected about the consumer.
These are issues that need to be addressed to protect consumer rights and privacy and have been largely overlooked. The principles outlined on pages 14 and 15 are an excellent starting point to address many of these issues. Although what I have listed is looking at the negative side of AI and Big Data, if used and controlled properly, the positive impacts that AI and Big Data can have on society are promising. The one that I am probably the most interested in is the impact on health.
Monroe (2013)
The five Vs outlined in this article are volume, velocity, variety, vinculation, and validity. Although this paper is written from a political science perspective and with regard to other political science papers in the issue, I feel that it is a good summary of much of what we have been discussing this semester and has many parallels to the different topics each week. For instance, Monroe (2013) discusses the idea of crowdsourcing and the reCAPTCHA system and Amazon’s Mechanical Turk in particular. Crowdsourcing seems to be a topic that is gaining in popularity because of its ability to scale up at a lower cost. Crowdsourcing is not something I was very familiar with but is probably one of the most interesting topics to me this semester. Amazon’s Mechanical Turk is something I do not know much about but plan to use in my research. Monroe (2013) also refers to the topic from the previous week about text data in the context of data volume. The validity section also discusses many topics we have addressed, including Kaggle and Netflix’s competition. Altogether, although this paper is in the context of political science, the themes across disciplines and the positive uses for big data were described in the context of the 5 Vs.
Monroe (2013):
I found this article to be very informative and relatable. I’m not sure if it is due to the fact that we have mentioned so many of the models incorporated into this article in previous classes, but I seemed to comprehend so much of what the author was getting at and thought he brought up some really valid points regarding Big Data from the perspective of the social and political sciences. Part of me kind of wishes I had read this at the beginning of the semester to lay a solid foundation of what Big Data refers to, but then again, I appreciate the recap of so much we have discussed. The familiar examples (e.g., Netflix, Twitter, MTurk, crowdsourcing, etc.) helped give relevant context to the five points he was aiming to get across. The section on MTurk did stick out to me. Although I have not personally used MTurk for a subject pool, I’m very familiar with it, and I thought some of the considerations/precautions that were mentioned raised important concerns. If MTurk, being a very popular form of crowdsourcing and collecting samples, has this many setbacks, what might be done to improve it? Or, in comparison, what are better options? This is obviously a very important aspect of experimentation.
Helbing et al (2017):
Looking at the overarching big picture in each of these articles, I felt like Helbing et al. (2017) were almost contrasting with (or at least taking a very different perspective from) the point Monroe was trying to make. While he was arguing for the benefits/advantages AI gives to political and social scientists, Helbing et al. (2017) give this impression of “hey AI is good and powerful and helpful, but let’s make an effort not to let it consume the way we live and how we are controlled by it.” I think one of the quotes that stuck out the most to me was “The digital revolution provides an impressive array of possibilities: thousands of apps, the Internet of Things, and almost permanent connectivity to the world. But in the excitement, one thing is easily forgotten: innovative technology needs competent users who can control it rather than be controlled by it.” AI and Big Data are rapidly evolving and advancing, and I agree that it is very important to recognize to what extent we as users control the use of it.
Another subject that caught my eye was the Citizen Score implementation in China. I was not aware of this, and it made me curious about the pros and cons of running a group of people in such a way. Overall, I thought this was super thought-provoking, and it gave me a wider perspective on how AI and big data are contributing to our society as a whole as well as to individuals.
Monroe (2013):
Monroe proposes five Vs of data science: volume, variety, velocity, vinculation, and validity. I’m interested in his section on validity, where he mentions a war between empiricists and theorists in political science in the 90s. Data science is empirical and inductive. I come from a discipline, philosophy, that is almost exclusively what this paper would call “theory.” Even if some of us, especially those who do philosophy of science, are in dialog with the sciences and with what Monroe calls “empirical” work, our methods are still “theory.” Throughout this course, I have been thinking about what data science and machine learning mean for me as a philosopher (or a future one at any rate). Studying texts like Heidegger’s Being and Time or Deleuze’s Difference and Repetition is something that takes years of humanistic, hermeneutic study to master. The data science revolution is already starting to come to the humanities under the rubric of “digital humanities.” They’re using things like R and they’re starting to play with the five Vs. But philosophy seems to be particularly recalcitrant. Sifting through tweets or mining texts for stylistics could be really useful for some humanities, but that’s hardly close to helpful for philosophy. For example, text mining can reveal hidden syntactic relations (“vinculation”). It doesn’t show semantic relations, however. We could show all the usages of the word “transcendental,” but what beyond understanding the texts they show up in will show us how radically different that word is in meaning between Kant, Husserl, and Deleuze? It seems that is still something only a conscious thinker can do. I do not think there is anything wrong with “theory” methods in this domain (not at all), but I do wonder, what could a philosopher do with data science? We’ll never have a “theory” vs. “empiricism” war like Monroe talks about, but I still think there is something really interesting that is still untapped and unthought of.
Helbing et al. (2017):
This article critically discusses the erosion of individual autonomy by big data and AI, and then proposes a set of principles by which to integrate them into society towards the ends of increasing autonomy and flourishing. That is, they paint a dystopian picture of current trends and sketch a utopian future escaping from these trends. They’re definitely right that we’re undergoing a massive societal transformation. What I’d like to think about is escaping big nudges and other forms of social engineering. Today it can mean dropping off of social networks, which are used by governments and police for surveillance and big nudges and which also have unintentional polarizing and echo-chamber effects, or using newer anonymizing and cryptographic technologies like bitcoin, P2P anonymity networks, and hidden services. These are ways of recovering autonomy, but of course they also come with their own worries (like the dark side of the darknet, or bitcoin fostering cybercrime transactions). I wonder what you all think of these techniques and technologies of anonymization?
Helbing et al. (2017) expose the rapid growth of data collected from citizens. They urge caution in how the collected data is used and raise ethical considerations to protect privacy and rights. The fact that Singapore and China use individuals’ data to police and control people’s liberties is dumbfounding.
Monroe (2013) reviewed 10 Political Analysis articles to describe volume, velocity, variety, vinculation, and validity, with the goal of establishing the benefit of Big Data.
Helbing 2017
I enjoyed this article, but it lacked in-text citations and so some of it read more like an editorial rooted in pessimistic opinions than facts. Subliminal advertising is not illegal in the United States, and I’m not sure that it works. If AI controlled our lives but made us happier on average and improved our quality of life, would it be worth it? When the article talks about competitions to incentivize innovation I thought of Kaggle. The case made against Big Nudging seems very weak to me. Global warming and recessions will continue to happen around the world despite almost anything, even in the absence of big nudging. I agree that people should have a right to all data collected about them digitally, but I doubt we will ever get close to that.
The article briefly talks about the misuse of algorithms. A student can use software to write the thesis for him. Andrew, you said you’ve caught a lot of Memphis students plagiarizing. When do you think that algorithms will get so good they can write a paper for a student with a good chance of not being caught, even by you?
Monroe 2013
I like that this paper defined Big Data with the five Vs, including volume which is more digital observations. The paragraph about Mechanical Turk and how it compares to wild, representative population samples and to undergraduates was good to read about since I’ve done work using Amazon Mechanical Turk. The battles in political science over validity in the 90s were fun to read about. I had no idea there was a theory vs. models war going on.
Helbing 2017
This article was interesting, and like other articles we have read this semester, somewhat frightening. At the same time, however, I feel like we have become so desensitized to the technology surrounding us that most people just think "oh, of course data about everything I do is being collected. What else is new?". I also wonder if we would be able to comply with the principles outlined in this article. Though they seem simple enough, even basic concepts like "supporting social and economic diversity" can face backlash in some areas of the country.
Monroe 2013
I appreciate the breakdown of "Big Data" into the five V's here, as it is helpful in wrapping up overall what really defines what big data is. I also like the addition of vinculation and validity to the original three V's, as they expand on aspects of big data that are not as obvious as the enormous volume or wide variety of information you would have. I am also curious to know more about Amazon's Mechanical Turk. I know basics about it, and that it is used both for research and by companies like MoviePass to sort through their data, but I am not sure about what the user base is actually like. Is it available to anyone to use, and is there a way to control for people falsifying answers to complete the work, etc?
Monroe provides an overview of both the defining characteristics and outstanding challenges of Big Data in a relatively current state. Hearing the political science perspective is interesting, because it hits on some important issues other than the “causal vs. descriptive explanations” topic that we tend to focus on as primarily experimentalists. One of the most interesting components of this paper was the section about MTurk. I looked up the paper that is cited, and Monroe gives an accurate description of the main takeaways of that study. It isn’t just bashing MTurk; the researchers were able to more or less replicate certain experiments, though effects were typically weaker. I’m sure we’ll discuss why that might be (and why MTurk may or may not be a more generalizable sample group than undergrad students).
There are a lot of really fascinating topics within the Helbing et al. article. Some of the implications laced in here have been the source material for sociopolitical thought experiments for ages, and as Monroe would have us know, the velocity of these implications is always increasing. I think the thing I’d like to bring discussion towards most is the Data for Humanity Initiative’s ethical code outlined on page 39. I don’t think many people would disagree with these tenets, but I wonder how these will take shape. What would each of these maxims look like in the present day? What about in 20 years? Can these things really be policed, and to what degree?
Monroe (2013)
This article raises a lot of very interesting ideas, but one that I really found important was validity. With all the movements in academia toward more data being better, are we losing our focus? For example, in the “replication crisis” in psychology, the common response is that we need larger samples or more sophisticated statistics. However, does this really get at the heart of the matter? When we are looking at relationships between “wellness” and “resilience” through “flourishing,” is our problem really the small sample, or the fuzzy constructs we are purporting to investigate? Are we better off diving deeper into a smaller sample, making broader generalizations from larger, shallower data, or do we need all of these parts?
Similarly, in “big data” there are very fundamental issues with social desirability and other forms of “idealized displays of the self.” For example, many people on Facebook may like or share posts (e.g., GoFundMe), but how well does this align with actual observable behavior? By linking larger data and including longer observation windows, we may be able to move past some of these limitations, but can repeated prediction and learning truly understand the idiosyncratic nature of human behavior (see YouTube or Pandora music recommendations for an example)?
Helbing (2017)
There are some interesting points to consider in this article, and I really enjoyed some of the subtle challenging of this alarmism. For example, they mention that “We are experiencing the largest transformation since the end of the Second World War.” This raises an interesting point that technology, itself, has always advanced. From the advent of the original unifacial stone tools (essentially rocks with a piece broken off for an edge) to the complex projectile points (“arrowheads”) of recent times, from the cotton gin to the supercomputer, technology has been a central part of culture and society. However, while I am not a fan of the alarmism expressed by some, this new wave of technological advancement poses a unique new challenge, as it is not technology so much as it is intelligence that we are creating.
They mention the idea of “nudging,” which raises further questions: how do these systematically timed/placed probes shape our society? Is it alright to attempt to “nudge” what we see as ideal behavior (e.g., “going green”)? Who gets to dictate what these ideals are, and does this necessarily lead to polarization (e.g., the backfire effect)? I know I, personally, believe people should be exposed to contradictory positions, as many have asserted that without realizing the possibility that one could be mistaken, you have not really THOUGHT about anything. However, does this mean we should be exposed to harmful and inflammatory ideas? It seems to me, anecdotally, that the more we try to intervene, the more polarized our society seems to be getting.
(Sorry for the verbose responses, these are very fascinating topics)
Monroe (2013)
The author summarized five Vs of big data: volume, velocity, variety, vinculation, and validity. The simplest measure of volume is file size. Every day people generate lots of data. How will people deal with past data? Maybe some data is time-sensitive and it can become junk. Will people reserve all the data or refresh the data?
Helbing 2017
This is an interesting paper about the worry that super-intelligence might harm democracy, and about how to address it. I think it is true that artificial intelligence could make mistakes and be manipulated. However, people won't be controlled by super-intelligence. These systems are just tools we develop and use in our lives. We use data and algorithms to build models to predict future behaviors or trends, but we can't make sure a model is 100% accurate. If something can harm democracy, it is we humans ourselves. Someone might change the data to influence others' decisions. We should care more about how to use and protect the data. I am curious about the data-controlled society. What would a data-controlled society look like?
Monroe 2013
This paper is a really good, concise overview of different challenges in big data in the field of political science. I am curious, though, how some of the sub-challenges in each of the five Vs are actually tackled. In volume, for example, it mentions high dimensionality, which I know can be tackled by SVD and PCA, but what are the specific techniques to deal with stuff like linkage? Also, how does the Bayesian metric that was mentioned work to tackle high dimensionality? I am also curious about the section on M-Turk. They seem to mention that M-Turk does pretty well by a lot of metrics, but I have certainly heard some stark criticisms of it, so I’m wondering if the negatives only lie on certain dimensions (i.e., ones not measured by the study mentioned)? Obviously with certain types of studies it is very difficult to get around the college freshman/WASP problem, but I particularly liked that in the variety section, at least for some social science survey studies, spatial sampling using GPS devices can be used to get better samplings of a population. I understand that velocity can be a problem with analyzing data, but how closely linked are velocity and volume? For example, if a massive dataset needs to be analyzed, maybe it needs to be partitioned or tackled in many subsets. Of course that is a volume issue, but it seems that volume and velocity are linked in many situations. As for validity, he speaks of face validity as a necessary condition, which makes sense to me, so long as good visualization techniques can be applied to the data. I know that SVD and PCA can be used to reduce dimensions, making visualization of high-dimensional data possible, but what are other forms of Big Data that are much more difficult to visualize?
Helbing 2017
In this paper, Helbing discusses the automation and/or pre-programming of everything including the population. One important issue he touches on is unintended side effects and the fact that it is often not clear what would have been best until after a measure has been taken. While negative side effects are of course cause for concern, I think this problem is not quite as serious as many would think. When faced with a certain issue, if it’s serious or if we care about it enough, we attempt to tackle it. Hopefully, and usually, we do that, if not in the absolute objectively best way, in a relatively efficient and practical way. The way in which we solve said problem may have unintended negative consequences, but then that simply brings more information for us to learn from and new problems to solve. That new information can then be integrated and the new problem that arose can be added to the list of problems to be tackled. I fail to see how this is a huge flaw considering perhaps the only other option would be to do nothing about the thing originally deemed a problem for fear of consequences. In that case though, aren’t we just sitting around not solving the societal problems that need to be solved and thus halting progress? This entire process, to me, applies to human-driven problem solving just as much as it does to automated problem solving. We only need to ensure that the consequences are not so undesirable as to be our undoing, which I think likely just involves taking baby steps in the solving of each problem instead of trying to tackle one large problem with some sweeping, grand solution. Additionally, simply by pointing out many of these problems of manipulation and control, he has already contributed to bringing these issues to light, which is the first step in solving the problems he lays out. I do, however, completely agree with prioritizing diversity and reducing the echo chamber effect.
Recommendations tailored to an individual can certainly be useful, but I think we must encourage people to try to consume new information that is quite different from that which they usually consume. Should this be done on an individual level though or through the recommendation system? If it is done through the recommendation system, how is it any different than the benevolent totalitarianism the author is afraid of? Finally, what are the best ways to guarantee that citizens retain self-determination? Surely through both laws and societal norms (especially those at the corporate level), but it seems like a very difficult problem to solve. How specifically do we regulate these kinds of automation and how do we implement laws regulating them that are also ethical?