Readings for 10/31/18

Readings are up for next week 😀

Please also take a look at the data description for the latest Kaggle lab (link distributed yesterday).

The lab for next week will step through the Zuur protocol to build a predictive model for this dataset.


Comments

  1. Silberzahn et al. (2018):

    The authors’ conclusion focused on the positive effects of crowdsourcing data analysis on science, and on techniques individual researchers can use in lieu of crowdsourcing. I’m interested in the implications this has for statistical findings in scientific research more generally. The authors do not explore the skeptical consequences of what they’re proposing. There are hidden, idiosyncratic decision-making processes occurring that are, by their admission, not being corrected for during peer review. Of course, this was only one exploratory paper. Nevertheless, there is a nice philosophy-of-science skeptical question to be raised about the meaning of statistical findings in science given that the same data resulted in different conclusions for different researchers. Just how biased and idiosyncratic are scientific teams? I’m not sure there is even a good way to quantify things like this. The skeptical question is even more pertinent if you look to the paper they cite on multiverse analysis (Steegen et al., 2016), which only further complicates the story. The Silberzahn paper discusses the many possible analytic approaches and the different results that obtain from them. The Steegen paper complicates what happens prior to that analysis: “raw data do not uniquely give rise to a single data set for analysis but rather to multiple alternatively processed data sets, depending on the specific combination of choices—a many worlds or multiverse of data sets” (Steegen et al., 2016, p. 702). Silberzahn: multiple possible analyses on one data set; Steegen: multiple possible data sets from one raw data pool. Add those together, and you get some interesting questions about the meaning of scientific findings. At the very least, I don’t think any philosopher of science has addressed these complexities.

    Fröhlich et al. (2018):

    It’s interesting that this paper focuses on bringing the hype of personalized medicine and AI down to earth. AI in the social imaginary is a completely different beast from real AI and has more affinities with science fiction than with any real science. Does it matter, and is there any good reason for researchers to try to communicate their work more to the public?

  2. Silberzahn 2018

    Silberzahn et al. (2018) address how different analytic choices can produce different results from the same set of data. The concept of this article was very interesting to me. Their process for crowdsourcing seemed reliable and their evaluations seemed thorough. It makes me wonder what the larger implications of this idea are, as techniques may differ from researcher to researcher, school to school, and subject to subject. As a graduate student taking my first graduate course in statistics, I also wonder what the implications are for me and my future research. Conclusions are likely to differ based on the approach used, but I question, like Silberzahn, “how much variability in results is too much” (p. 352). Silberzahn et al. (2018) also recommend using a specification curve or a multiverse analysis. Is this something that is already commonly done, and when it is done, is it often reported in academic journals?

    Frohlich 2018

    Frohlich et al. seek to address the role of machine learning and big data in achieving the goals of personalized medicine. Although few machine learning approaches have reached the point of clinical significance, the authors are optimistic about the future. They outline both the current bottlenecks and the steps that could bring machine learning closer to providing better, useful, ethical, and legal personalized medicine. Two highlights of this article that I found particularly interesting were the legal implications of the black box and the future potentials. In particular, I had never thought about the implications of the “black box” for clinical explanations and legal actions. It makes sense as a patient and advocate to want to know where and why we have been led to a certain conclusion, but sometimes I forget that doctors are humans (and researchers) too and that I should not blindly trust “science.” Unfortunately, I think that a lot of people do. I hope the legal issues discussed in this article will deter clinicians from using the unexplainable and will protect patient rights. Playing devil’s advocate, however, what if a “black box” finds a cure for cancer and we know it works but we can’t explain it? Although I think this is far-fetched, what are the implications in this case? The second point that I found very interesting was the future possibilities. As always, I wonder how far away we are from the possibilities listed. I think some of them are very feasible (e.g., automatically collecting information from various databases, composing reports, etc.) or at least seem to be; however, I again wonder about the “black box.” The idea of an “interactive disease landscape” also seems particularly interesting and has very useful implications for medicine.

  3. I found the crowdsourcing of analysis article to be really alarming. It seems unsettling that so many people who essentially base their entire careers on being able to conduct the most appropriate analyses for their research disagree so much about how to approach the same question with the same data. Maybe that’s a closed-minded thought from me, though. In stats classes, I’ve been given so many printouts of flowcharts that say things like “if you have >3 groups, use this analysis, if not, use that,” and while I’ve always known that it isn’t that cut-and-dried in a lot of ways, I would have guessed that there would have been more similarities in the approaches of the various teams than what was observed. Even further, the researchers’ ratings of the quality of the other approaches didn’t yield any significant results, so it appears that not only were they doing different things, they also weren’t able to distinguish between successful and unsuccessful models. The authors of the article suggested crowdsourcing analyses where possible in order to observe the directions that the PIs can take and report, and while I agree that open science is an important and mutually beneficial movement, I don’t know how we’re going to deal with disagreements between approaches and the search for the most appropriate approach in any given scenario. On a very broad note, we talk sometimes about what role small research designs play as information science continues to expand. I wonder if small causal designs can, at the very least, be more robust to the uncertainty that various (but relatively equally valid) analytical approaches can engender in big data questions.

    Of course, that’s not really a bold statement, given that it’s much more eloquently posited by the figure on page 7 of the personalized medicine article. Hopefully, though, biomedical mechanistic causality can be linked to these developing models to help them be both more accurate and at least as interpretable as current disease models. One thing I found interesting in this article is the idea that machine learning approaches “neither require a detailed prior understanding of cause-effect relationships nor of detailed mechanisms.” While on one hand we obviously want our medical professionals to completely understand what’s happening in our bodies and why they’re doing what they’re doing to approach any problems, could this particular limitation create a new kind of nurse, one that understands the output of a disease model enough to apply treatment, but would refer to other doctors for more detailed information if necessary? I’m not sure if anyone in this class has any kind of medical experience, but I was wondering if this kind of approach was starting to make headway into medical training, as well as at which level it tends to appear (residency, med school, nursing school, etc.).

  4. Silberzahn (2018)
    I like soccer, and I played it when I was younger, so I was more interested in this paper. It would have been interesting to try and test this in women’s or youth soccer. Also, did they take into account the skin color of the referee? I think that crowdsourcing is an approach that saves the individual researcher lots of time, but as the article said, it’s less efficient. The researchers crowdsourced the analysis by posting it online, and this seems a very uncommon method, especially having the teams share and review each other’s work. I totally agree, though, that the best defense against accusations of bias is to be transparent in your methods.

    Frohlich (2018)
    I have heard a lot about personalized medicine and how artificial intelligence and biomarkers are going to revolutionize medicine, but it hasn’t affected me yet. Hypothesis-free testing is a method I’ve heard about too, but I’ve been even more skeptical of it. Even if cross-validation methods are used to test large datasets, any discovery would still need to be validated by a clinical experiment with an external cohort. It would also take time, and the results could still be hard to interpret or have too much noise. I think this should be used as just another tool by a doctor, but it shouldn’t replace other practices. Are patients going to be comfortable that their data and information are private when they hand them over to big, new companies trying to analyze them? A minimum understanding of complex diseases like Alzheimer’s also seems necessary to me before AI can start testing better treatments.

  5. Silberzahn (2018)
    I enjoyed this article because soccer is something I am familiar with and can engage with and wrap my head around, and also because I thought the crowdsourcing approach was a very interesting way to analyze the data. Like some other commenters have mentioned, I was a bit concerned by how much variability there was among the different tactics used. While not terribly surprising, it is still worrisome that results could be so dependent on the personal choices made by the researcher, especially considering the groups in this study could not even agree on which data points to include in their analyses.

    Frohlich (2018)
    The use of AI and big data in medicine is very interesting, but I wonder if people, especially those in older generations, would be willing to have even more of their info up "in the cloud," to quote my mother's common phrasing. While I do think concepts like the disease maps could be helpful, especially with less common diseases where the research is harder to find, I would worry about "black box" cures for diseases. Since we wouldn't know exactly how the cure is found, there would be no immediate way to know if this "cure" for one disease could lead to possible ramifications down the road for that patient.

  6. Fröhlich et al. (2018):
    On top of not having extensive prior knowledge of AI, I definitely have not had in-depth exposure to machine learning within the domain of medicine, but I am not surprised that it is a vigorously growing field of research. On page 3, within the discussion of data increasingly impacting personalized medicine, the authors state, “Some of these algorithms have been reported to achieve above-human diagnostic performance in certain cases.” The potential that these models have is profound to me. Because I am not well versed in the field of AI, I wonder, if and when these types of algorithms are created, how much control we as humans (limited to a certain depth of knowledge) will have over them (especially taking into consideration this black box effect) if they have the capability to produce results beyond our own understanding. While this has benefits for advancement, I feel like it also has weighty consequences and limitations. I also wonder whether, at the rate research is moving now (given that there are so many issues, setbacks, and challenges still being taken into consideration), the simultaneous, steady emergence of new diseases complicates things even more. That is, new diseases are being discovered all the time and machine learning is advancing slowly, so does the continuous rise of these new diseases slow progress even more?

    Silberzahn et al. (2018):
    The crowdsourcing article was interesting, though I am not all that sure what my opinion of it is. I do get that this was a study with several subjective setbacks, but there were a lot of different analyses going on, with a lot of different results and a lot of disagreement and re-evaluation. That is an issue I guess I didn’t realize was so prevalent. I’m also not all that familiar with this concept of crowdsourcing. How often is this done, if at all? And what is the main goal? This particular study seemed to be strictly explanatory, but if this were done often, I wonder what the overarching implications would be. I wonder if there are alternative resources and ways to come to the same results. Because if it was overall beneficial and causal, I would think it would be a more well-known practice?

  7. Frohlich 2018 focuses on the application of AI in healthcare. It makes clear that the number of clinically validated models is limited and that validation is time-consuming and costly. Much is still needed to find a balanced marriage between personalized healthcare and machine-driven data prediction to advance disease treatment and cure.
    Silberzahn et al. 2018 crowdsourced the analysis of one soccer data set to 29 teams comprising 61 analysts, asking whether referees are more likely to give red cards to dark-skinned players. My initial response was very visceral, but reading about the number of statistical methods used helped me understand why. It is fascinating that the responses of the evaluating teams changed over time and more teams agreed that skin tone affected the number of red cards received.

  8. Frohlich 2018
    I think it is promising to use data science in personalized medicine. However, there may be more challenges than they think. In reality, one patient can get different diagnoses from different hospitals. If he/she has a complex disease, different doctors may recommend different treatments. Sometimes the best choice for a patient is unknown, so how can we build a predictive model? Personalized medicine is kind of an ideal expectation which may work for some diseases but not all.
    Silberzahn (2018)
    It is striking to see different results when using different analytical methods. I am curious whether the results would still differ if the same team used different methods. In other words, how does subjective behavior influence the analysis? I have heard that using different software to fit a structural equation model might give different results. So maybe people should use the same standard when designing software or unify different analytical methods.

  9. Silberzahn (2018)
    In this paper, 29 teams independently analyzed the same data, showing how different analytic techniques and decisions can influence the results. All these teams could be hired by researchers to complete their analyses, with the assumption that they are experts and will do everything correctly and choose the right techniques to use. This is really interesting, because there are so many different ways that this particular dataset was analyzed by these teams, showing that there are many subjective decisions that must be made when computing analyses, which makes me wonder if I will use the "wrong" analysis for my own research in the future - if we can distinguish a "right" and "wrong" method. While reading, this article also reminded me about the p-hacking that we talked about earlier this semester, and it ended up addressing p-hacking a bit too. Similar to how researchers may adjust the significance level to suit their hypotheses, do researchers change their data analysis methods to get significant results?

    Fröhlich (2018)
    I think that personalized medicine is an awesome thing that would be extremely beneficial to such a large number of people. The current problem with personalizing medicine to an individual's needs is that it is expensive and time consuming - I'm currently experiencing that first hand with one of my cats and our vet's office, where we might have to do tests to see which antibiotic will properly fight against an infection. Delegating these types of problems to machines would be helpful in all sorts of medical situations. Right now, we all just seem to trust what doctors say, like how Courtney mentioned that we often blindly trust doctors and science, even though they are able to make mistakes. Since I am reading this paper after reading the previous crowdsourcing one, it makes me wonder how we can even train models or program machines to have a "best" analytical solution to get valid results, since so many "experts" in that paper used different analyses and got different results. I feel like this concept in theory is awesome and I am 100% for it, but I am not sure that it would work in a way that would protect patient information, produce correct/helpful results, AND be found/explained in a way that the medical community would be able to understand (to avoid the black box problem). I still believe any personalized medicine could be helpful, but I wouldn't want it to entirely replace all doctors' visits and pharmacists. Furthermore, I'm just curious - would people care if the solution was a black box, and a doctor couldn't explain it, as long as it helped your medical condition? What if the black box solution didn't help?
