Readings for 9/19/18
I've put up the readings for next week (see tab above). You will need your UoM email id/password to log in (e.g. foobar@memphis.edu would log in as foobar).
One of the readings is what's called a "scrolly." As you scroll down, it will run animations that accompany the text. You can also scroll up to reverse the animations. Scrolling up and down might be particularly helpful (like instant replay).
Make sure you respond with your comments by 9/18 at noon. Make your comments by replying/commenting to this post (the one you are looking at now).
A visual introduction to machine learning website & An Introduction to Statistical Learning
I found the website extremely interesting and entertaining. I have actually never been to a website that gave the visualizations in this manner. The example of the homes in New York versus San Francisco was a simple, easy to understand visual. This website helped me understand a little more about machine learning. When coupled with the Tree-Based Methods chapter in the book, I finally feel like I am beginning to understand something about machine learning. I looked at the website first, read the chapter, and then went back to the website. In the chapter it makes the statements on page 315 that decision trees are “very easy to explain to people” and “more closely mirror human decision-making.” I would agree with both of these statements. It just makes sense. The comparison to a tree and pruning the tree makes sense. However, I felt a little lost when the discussion began about bagging, random forests, and bootstrapping. I guess my question would be, what method is most commonly used and why when using decision trees? How often are decision trees used in research methods? What are some examples in which decision trees are more beneficial than other regression and classification approaches? Is there a certain field where decision trees are seen more than other fields?
Pavlik (2013)
The Review of Student Models Used in Intelligent Tutoring Systems paper was a lot of information that I did not know about. I guess I have never thought about the “behind the scenes” of a tutoring model. There are so many different types of models and technologies that go with each model. Something I found particularly surprising was that models can be traced back to as early as the 1950s. I found the figures on pages 59 & 60 particularly useful to sum up the outlined information. I can understand why there is a “significant challenge in developing a standard, or framework, to support an existing field” (p. 61). I would like to know how GIFT has changed/developed over the last 5 years since this article was published. Were the authors (aka Dr. Olney) able to create GIFT in all the ways they wanted to based off of the conclusion section? I can’t wait to hear about this in class on Wednesday.
Reading through Pavlik and colleagues (2013), I’m wondering how ITSs could help with language acquisition. The range of estimates varies widely, but one figure is 600 hours for a European language (assuming an Anglophone learner) and 2200 hours for a level 4 difficulty language like Chinese (Jackson & Kaplan, 1999). Whatever the true numbers, it takes a hell of a lot of time to learn a language. These figures don’t include transit time to the actual courses, which may double (or more) that time. Besides courses and self-study books, there are programs like Rosetta Stone. These programs are highly linear in structure (I don’t know if it’s anything like Skinner’s linear programming referred to in the reading) and are rarely as effective as their marketing claims they are. There are also apps like Duolingo that take a gamified approach to language learning. Let’s call them “dumb” in contrast to Intelligent TSs. I wonder if ITSs can be implemented towards language learning, something that is very time-costly and traditionally requires years of courses. And, would they be any more effective (and profitable) than gamified but dumb systems like Duolingo such that the cost of development would be worth it?
The website: The use of visuals to complement the text really facilitated my learning. I wonder how the humanities, which tend to be less tech-savvy, could learn and benefit from presentation styles like this. Those of us in the humanities don’t typically learn skills like programming or graphic design, although the so-called “digital humanities” is the next big thing coming our way.
ISLR: Philosophy is one of the core disciplines on the wheel of the cognitive sciences (see the cover page image from the Sloan Foundation, 1978). This was a state of the art report on cogsci from 1978. This wheel of the cognitive sciences would need to be updated today (esp. for data science). However, what still holds true is that the different spokes work together, yet aren’t redundantly experts in each other’s fields (as Andrew said in the 1st class, we can’t do 2 or 3 PhDs). This is all a convoluted way of saying: as someone on the philosophy spoke, I’ll leave the bagging and boosting to the baggers and boosters :). I'm glad I'm being exposed to it, though.
Reading the material from ISLR, I continue to be appreciative of the exposure to new design models that I have never previously seen or heard of, such as boosting/bagging/forests, but I am not sure that there has been adequate clarification on when to perform these analyses and techniques. The explanations have been super thorough and the examples are useful to connect all the dots together, but I repeatedly wonder how I know when to use each method.
The tree scrolly website was really fascinating. Being a heavy visual learner, I valued the breakdown of the explanation throughout the entirety of the demonstration. I think the method of building upon the model step-by-step was beneficial to understanding the reasoning of what was actually going on. I wonder why more models like this have not been built to explain statistical thinking. Browsing around the website, it seemed as though it was fairly new, but I couldn’t find a specific date. It makes me wonder what the motivation behind creating it was, and, if it becomes more popular in the field of education in general, what the effects on learning would be.
Overlapping with the content of the ITS article, it makes me wonder if there would be a significant difference in learning using a website/module like the interactive design implementing ITS components/ideas compared to simple text comprehension learning. I think it would make for an interesting observation. The article itself covered a lot of bases, and I think it did a thorough job of laying out the combinations of all the different components. Considering that it’s been 5 or so years since the initial review, I would be interested to learn of any updated analyses/followup work done on GIFT.
Tree scrolly: I tend to understand material better when it is laid out in front of me, and I wish I could learn everything through a scrolly like this one. I found the animations to be extremely helpful in understanding how boundaries and trees work to increase the accuracy of machine learning as each new layer is added. I have never learned anything about modeling before now, and the visuals showing how training data is used vs. testing data were also nice to see. I thought it was interesting how the testing started with the intuition about the elevation difference between the two cities, which leads me to ask if human prior knowledge/intuition about data is routinely used as a starting point of machine learning, or do machines just figure that point out on their own without the aid of human intuition? I assume that the aid of the intuition would speed up the process of machine learning a bit because a starting point would already be given - as long as the intuition is correct.
ISLR: Because trees and other models we have already looked at can become so large and complex, I also wonder, like Sam, when each of these methods should actually be used. Similarly, while I think the process of bagging to reduce variance is practical, I wouldn't know if or when I would use bagging. All of these newly introduced techniques seem to be helpful to improve model predictions and prevent overfitting, but I don't have a good grasp on when they would be used.
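For anyone who wants to see the resampling step behind bagging, here is a minimal sketch in Python. It only shows how one bootstrap sample is drawn and which observations end up "out of bag"; it does not fit any trees. The function names, the seed, and the sample size are assumptions made up for this illustration, not anything from the ISLR chapter.

```python
import random


def bootstrap_indices(n, rng):
    """Draw n row indices with replacement from range(n): one bootstrap sample."""
    return [rng.randrange(n) for _ in range(n)]


def out_of_bag(n, sample):
    """Return the indices that were never drawn into the bootstrap sample."""
    drawn = set(sample)
    return [i for i in range(n) if i not in drawn]


rng = random.Random(0)  # fixed seed (arbitrary choice) so the run is repeatable
n = 1000                # hypothetical number of training observations

sample = bootstrap_indices(n, rng)
oob = out_of_bag(n, sample)
oob_fraction = len(oob) / n

# For large n the expected out-of-bag share is about 1/e, i.e. roughly 0.37,
# so each bagged tree would be fit on roughly two-thirds of the distinct rows.
print(f"out-of-bag fraction: {oob_fraction:.3f}")
```

In bagging, this draw would be repeated many times, one tree fit per bootstrap sample, and the trees' predictions averaged (or put to a majority vote for classification).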
Pavlik (2013)
I'm somewhat familiar with the ITS that have been used here at UofM, but I didn't know that so many models have been developed and tested for ITS. In hindsight, it makes sense that several methods have been created, and I also wonder what methods have been created, or how existing methods have evolved and improved, since this was published. Technology advances so quickly, and I wonder if the increase in computer sciences has impacted the cost of creation, flexibility, ease of understanding, or any other key parts of ITS.
One thing that the tree scrolly website helped me understand more fully is the concept of overfitting. I thought I had a grasp of it, and I knew that it was bad for some reason, but I never knew exactly why. The website helped me nail down that it's due to the intricacies of the training data and how they don't apply to the hypothetical testing data. The fact that models are built to be adapted to new, unseen data is the key point here, so of course having a highly specified model that fits your training data like a glove is going to have nooks and crannies that don't fit new data as well. I also loved the knob visualization that tweaked the node size in the trees and displayed their accuracies; you really got to see the tradeoff in action there.
Speaking of demonstrations, the ISLR reading did a good job of outlining some more advanced techniques, but I would like to further some of the questions that other people have asked. There are some clear differences in how these techniques are structured, but I'm not clear on what the pros and cons of each are and what situations may favor one over the other. The book also mentioned that they can be done with other analyses besides decision trees, but I'm not sure how to visualize what this might look like with logistic and linear regression (or any other analysis that I don't know that I don't know about).
The ITS article provided a lot of information about the different paths that a design team can lead a program towards, each with different strengths and weaknesses. Comparing the various learner models is a challenge, but I was particularly interested in an aspect that received a smaller amount of attention in the article. Specifically, it seems that many learner models assume a certain level of self-efficacy, motivation, and goal-understanding. The article mentioned a few potential measures and system responses to differences in self-efficacy, but I was wondering if any of the systems collect and utilize other aspects of baseline cognitive and emotional information from learners in order to individualize and optimize task and feedback structures. For example, do participants ever complete an assessment battery of working memory, attention, or psychosocial factors like grit? Of course, there are some state/trait issues there, which the article notes, but it seems that factors such as these are important things to consider when thinking about learning outcomes and optimization.
A visual introduction to machine learning
The writer said that you can see if a model overfits by having test data flow through the model. If less data flows through certain branches, does it mean that these branches are useless? Also, the title, "A visual introduction to machine learning," is too broad. I think it may be better to add decision trees or classification trees to the title.
An Introduction to Statistical Learning
After reading this chapter, I have a better understanding of regression trees and classification trees. Compared to linear regression models, do decision trees need a larger data set?
For bagging, random forests, and boosting, what are the advantages and disadvantages of using them?
Pavlik (2013)
I think the discussion of the domain model and the student model is confusing. They overlap, so what are the boundaries between these two models? For example, I thought psychological states are a part of student models; when would they be included in the domain model? And, for the knowledge space model, I think it should be a domain model, while knowledge space student modeling is the student model, which assesses where the students are in that domain. Another question is what are the inner loop and outer loop? Finally, how can we use machine learning techniques in student modeling?
The Review of Student Models Used in Intelligent Tutoring Systems chapter provided lots of models totally new to me. In essence, some models interact with the same information and execute a new variation of the same. To my surprise, according to Aleven et al. (2006), 1 hour of instructional content requires 200-300 hours of general ITS design and development time. In 2009, Heffernan et al. reduced the time cost of program development to 30:1. An interesting finding is that graph-based systems are more rigid and restrict the user input.
I really enjoyed the scrolly demonstration, and found it incredibly helpful in trying to understand more about tree-based methods. I have a very basic understanding of decision trees and their use in statistics, so it was nice to see it laid out step by step, really illustrating how it is similar to normal human decision making. It was also useful in that it helped me better understand the ISLR reading. In the ISLR reading, the part I found most confusing was the last section on bagging, random forests, and boosting. While I vaguely understand why each technique is helpful, I am not sure I understand when it would be best to use each method. I am also curious as to when decision trees get used in general, as I feel like I have never seen them talked about as often as other methods, even though they seem like they could be, at least in their basic form, one of the more easily interpreted methods.
For the Pavlik (2013) article, I had known that Intelligent Tutoring Systems could potentially be very helpful in facilitating student learning, but I had never considered the background work of what would go into creating such a system. I am curious as to what advancements have been made in adapting to students' self-efficacy. I agree with the article's statement that human tutors are able to focus on not only the student's understanding of the material, but also their motivation to learn, which I believe is an important aspect of learning anything. It would be interesting to see how technology could mirror this seemingly uniquely human ability without it seeming forced or disingenuous.
Intelligent Tutoring Systems are complex and multi-faceted, and this article helped me to understand the rules and theories by which they are made. It surprised me at the start how models are evaluated more for usefulness and less for correctness. This seems to have a lot to do with practicality and making sure a model is low cost and can be understood. The article was written to describe the GIFT system but didn't have studies done using it. Are there recent results using GIFT that test different models?
The Visual Intro to Machine Learning gave a great introduction to building models that's easy to understand. Why aren't decision trees used more often instead of other machine learning techniques?
Tree-based methods provided a much more thorough and detailed explanation of modelling. The distinction between linear and tree models cleared things up for me, as I didn't know that trees are better for non-linear and complex relationships. Are random forests always better to use than Bagging? When you average observations together with bagging, what is done about outliers?
Pavlik et al. (2013):
Individual characteristics (e.g., personality traits like need for cognition, or predispositions like topic-interest), though mentioned briefly, do not seem to be given much consideration in these systems, nor do tangentially related domains of knowledge/skill. For example, being interested in a topic may raise the bar for how challenging example problems should be, as the appropriate level of difficulty is shown to benefit learning (e.g., “desirable difficulties”), or at least make it more efficient. Similarly, someone with higher reading proficiency may experience more global benefits in learning from all written sources of information. Although I do tend to view many processes as somewhat universal (e.g., while an ITS catered perfectly to the individual would be great, there is enough similarity between people to create systems great for most), I wonder if there may be benefit in considering some of these characteristics in student models, as well. That is, highly interested students, with a higher predisposition to engage deeply, may benefit more from varied examples as they are more likely to reflect on their answer to infer the underlying knowledge structure, while an unmotivated and disinterested student may require a vastly different method of learning (e.g., step-by-step instruction). Would adding something like Need for Cognition, Topic-Interest measures, or general knowledge measures improve the selection of the optimal learning conditions?
James et al. (2013):
Decision tree analyses seem very interesting to me for certain applications, such as genetic contributions to behaviors or disorders. The more nuanced applications of it, though, I am not as sure about. For example, I have a dataset of sarcastic and matched literal responses to some questions, which I had previously fit a logistic regression model to. I tried using a Decision Tree (in SAS Enterprise), and of all the variables, it only kept one, which was the largest predictor. It is tricky as these predictors/features tend to vary by how many are present. When there is only one feature, it tends to be trait A (based on a simple frequency distribution); when there are 2, it tends to be B & C; at 3 it is A, B, C; etc… Analogously, returning to disease diagnosis, there are several genes which almost inevitably lead to the development of certain psychopathologies (e.g., ADHD), but which are only present in a small portion of individuals who have these conditions (e.g., < 5%). There are many other genes which slightly increase the odds of these conditions. To me, it seems like this would be a perfect Decision Tree problem, but I’m not too sure based on the performance of the comparable sarcasm decision problem (to be fair, there is almost CERTAINLY some user error). How might a decision tree fare compared to more traditional analyses in these peculiar situations?
Scrolly:
I was really impressed by the graphical interface for this information. It also did a great job illustrating the decision tree classification process, as well as the overfitting problem. I don’t really have any questions about this that aren’t covered by my questions about Decision Trees, generally. Though, I also wonder (like others have mentioned) why decision trees aren’t more popular?
I agree with John Hollander about the visual intro to machine learning. That was perhaps the best/easiest to grasp demonstration of what actually happens during overfitting and how something seemingly intuitive like an exhaustively trained model being a good thing can actually not be. My biggest question though is regarding how it would be proper to train a decision tree like this then. Would it be more advantageous for this specific model to stop at a certain point, like, say 85% accuracy? How is that determined? Do we just have to continually train and test until we find the optimal balance of getting the least error in testing?
For the ISLR textbook reading, I found it interesting that they were using continuous data when first explaining decision trees. I know that decision trees can be used for continuous data, but I was under the impression that finding where to segment each feature in branching the tree is not a trivial problem, and I was pleased by the detail on how to actually achieve an optimal cutpoint by making it a simple minimization problem. Using decision trees for continuous features like this adds a dimension of complexity and utility to them that I was not exposed to when I previously learned about them. It is useful to know how to determine which techniques are best for certain problems, so I appreciated the explanation that these kinds of decision trees are better when working with data whose features and response have a complex, non-linear relationship. I was also not exposed to bagging in very much detail previously. Using OOB observations seems like a similar kind of technique to k-fold cross validation. It’s pretty nice that we can get lower error simply by using a large number of child models to create a parent model. Random forests made a lot of sense related to last week’s causality paper in that introducing randomness can make a more powerful model. Boosting is still rather confusing. Are we still using multiple trees? Or is it just one tree that has grown slowly using the stated algorithm?
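To make the cutpoint search concrete, here is a minimal Python sketch for a single continuous feature. It assumes the squared-error criterion (residual sum of squares, RSS) used for regression trees and simply tries every midpoint between neighboring feature values. The function names and the toy data are invented for this illustration, and a real tree would repeat the search over every feature and every resulting region.

```python
def rss(values):
    """Residual sum of squares of `values` around their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_cutpoint(x, y):
    """Return (cutpoint, total_rss) for the split of one feature that minimizes RSS."""
    pairs = sorted(zip(x, y))
    best_cut, best_total = None, float("inf")
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cutpoint fits between two identical feature values
        cut = (lo + hi) / 2
        left = [yy for xx, yy in pairs if xx < cut]
        right = [yy for xx, yy in pairs if xx >= cut]
        total = rss(left) + rss(right)
        if total < best_total:
            best_cut, best_total = cut, total
    return best_cut, best_total


# Toy data (made up): the response jumps between x = 3 and x = 10.
x = [1, 2, 3, 10, 11, 12]
y = [5.0, 6.0, 5.5, 20.0, 21.0, 19.5]

cut, total = best_cutpoint(x, y)
print(cut, round(total, 4))
```

On this toy data the search settles on the midpoint between 3 and 10, because splitting there leaves each side with responses close to its own mean.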
The Pavlik et al. paper proved difficult to grasp in parts, in that each model discussed seemed to require background knowledge of several theories or techniques; however, several concepts were easy to tie to previous knowledge regarding AI construction. The Dialogue Student Models seem to be an essential component for a tutoring system to be what I think most people would want in a tutoring system. Two of several central problems here seem to be that of language and the matching of near infinite possible responses as topics increase in complexity, and the related ever pervasive problem of generality. Since no one model is a perfect fit for every circumstance, what approach is needed to develop systems that are more and more general?