Sister blog of Physicists of the Caribbean in which I babble about non-astronomy stuff, because everyone needs a hobby

Friday 21 September 2018

Talent-luck : subtle statistical anomalies

More investigations into talent versus luck. I'm still trying to figure out why my faster method gives different results.

The standard, original method treats events and agents as objects. The events move around the world randomly, while the agents stay fixed. If an event overlaps with the position of an agent, then that agent experiences the event. The event's type (good or bad) is a fixed property of the event object itself, though tests showed that it made no difference if the event type was reassigned randomly at each timestep.
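To make the random walk method concrete, here is a minimal sketch in Python. It is not the code from the project linked at the end of the post: the function names, the uniform step size, the square world and the reflecting edges are all assumptions made for illustration.

```python
import random

def reflect(value, size):
    """Bounce a coordinate back inside [0, size] (assumes the overshoot is < size)."""
    if value < 0:
        return -value
    if value > size:
        return 2 * size - value
    return value

def step_events(events, step, size, rng=random):
    """Move every event by a random offset of up to `step` in x and y (assumed walk)."""
    return [
        (reflect(x + rng.uniform(-step, step), size),
         reflect(y + rng.uniform(-step, step), size))
        for x, y in events
    ]

def events_per_agent(agents, events, radius):
    """Count, for each fixed agent, how many events lie within `radius` of it."""
    counts = [0] * len(agents)
    r2 = radius * radius
    for ex, ey in events:
        for i, (ax, ay) in enumerate(agents):
            if (ex - ax) ** 2 + (ey - ay) ** 2 <= r2:
                counts[i] += 1
    return counts
```

Calling `step_events` and then `events_per_agent` once per timestep gives the number of events each agent experiences at that timestep.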

For the fast method, I measured how many events of all types typically occur during each timestep (running 10 simulations by the random walk method to get a large number of values). This appears to be a good approximation to a Gaussian so I measured the mean and standard deviation. Then at each timestep, a random number (call it X) is drawn from this distribution. Agent numbers are then picked at random X times, so that the same number of events should happen and there's a chance of individual agents experiencing multiple events at the same timestep, just as in the original model.
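The fast method described above can be sketched roughly as follows. The function name and parameters are hypothetical, and rounding the Gaussian draw to an integer and clipping it at zero are assumptions, since the post does not say how non-integer or negative draws are handled.

```python
import random

def fast_events_per_agent(n_agents, mean_events, sd_events, rng=random):
    """Draw X from a Gaussian, then pick X agents at random (with replacement)."""
    # Assumption: round to the nearest integer and never go below zero events.
    x = max(0, round(rng.gauss(mean_events, sd_events)))
    counts = [0] * n_agents
    for _ in range(x):
        # The same agent can be picked more than once in a timestep.
        counts[rng.randrange(n_agents)] += 1
    return counts
```

Because agents are picked with replacement, an individual agent can still receive several events in one timestep, but successive timesteps are completely independent of each other.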

In practice this seems to only nearly work. You can see below that the overall wealth distribution is in rough agreement, but the wealth fraction of the top 20% is significantly less than with the random walk method (the variation in the top 20% most talented can be ignored as that does tend to vary by chance anyway). This difference is large and consistent across multiple simulations. Also, here I'm using the original parameters where the only effect talent has is to determine the chance that an agent will be able to exploit a lucky event. If I have it affect other things (such as whether the event will be lucky at all), then the overall shape of the wealth distribution becomes much more visibly different using the fast method compared to the standard method.

I wondered if maybe this method was just slightly wrong. Maybe the random walk doesn't quite give a true Gaussian event frequency distribution to the agents. Altering the event movement speed certainly makes a big difference, so that seems plausible. Perhaps just slightly more agents experience more events each timestep by the random walk method than by chance alone.

And indeed, my first check seemed to indicate this was plausible. I measured the fraction of timesteps in which the maximum number of events (to individual agents) occurring was 1, 2 or 3 for both methods. The mean fraction of timesteps with a maximum of only 1 event per agent was 33 for the slow method and 40 for the fast method (average of 4 simulations for each method). It seems that the fast method gives slightly fewer cases of multiple events per agent than the random walk approach.
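One way this check could be written, as a sketch only: `timesteps` is an assumed data layout (one list of per-agent event counts for each timestep), and the function name is made up for illustration.

```python
from collections import Counter

def max_event_fractions(timesteps):
    """Fraction of timesteps whose largest per-agent event count is 0, 1, 2, ..."""
    maxima = Counter(max(counts) for counts in timesteps)
    total = sum(maxima.values())
    return {k: maxima[k] / total for k in sorted(maxima)}
```

Running this on the output of both methods and comparing the entries for 1, 2 and 3 gives the comparison described above.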

So what I tried for the histograms was to measure the number of agents experiencing different numbers of events (0, 1, 2 or 3) at different timesteps. Tests showed that there were never more than 3 events occurring to any agent, and the number of agents experiencing 3 events is already so small (about 5) that there's not enough data to plot. These distributions (red for the standard method, blue for the fast method) were both obtained from sets of 5 simulations. And they look near as dammit identical to me.
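The per-timestep tally behind those histograms might look something like this. It is a hedged sketch: the function name and the `counts` layout (one event count per agent for a single timestep) are assumptions, not taken from the project code.

```python
from collections import Counter

def agents_by_event_count(counts, max_k=3):
    """Return how many agents experienced 0, 1, ..., max_k events this timestep."""
    tally = Counter(counts)
    return [tally[k] for k in range(max_k + 1)]
```

Collecting these lists over every timestep of several runs gives one distribution per value of k, which can then be histogrammed separately for each method.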

Which means I'm somewhat stumped. There clearly is a difference between the two, but it's damn subtle. Next I'll try measuring the number of lucky and unlucky events rather than events of all kinds.

One thing that did go according to plan was to increase the speed of the random walk check : instead of evaluating the precise distance to each agent, it now only does so if the linear offset in both x and y is less than a specified threshold. This increases the speed by about a factor of 3, from 6 minutes to run the simulation to just under 2. Great, but that's still nowhere near the factor of 100 difference that the fast method gives...
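The speed-up is a cheap bounding-box test before the exact distance test. A minimal sketch, assuming the threshold equals the interaction radius (the post does not say what threshold is used) and using a made-up function name:

```python
def within_range(ex, ey, ax, ay, radius):
    """True if the event at (ex, ey) is within `radius` of the agent at (ax, ay)."""
    dx = abs(ex - ax)
    if dx > radius:
        return False  # too far in x, skip the exact distance calculation
    dy = abs(ey - ay)
    if dy > radius:
        return False  # too far in y, skip it as well
    # Only pairs inside the bounding box reach the exact (squared) distance test.
    return dx * dx + dy * dy <= radius * radius
```

With a threshold equal to the radius the result is identical to the plain distance test, since anything outside the box is necessarily outside the circle. Most event-agent pairs fail the first comparison, so the exact calculation is rarely needed.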

Code, as always, is here : https://repl.it/@RhysTaylor1/TalentVersusLuck



19 comments:

  1. Having only skimmed this because I've got to get on to the next thing soon, could it be that the walk method produces more "luck hurricanes" (with the event circling and hitting an agent again and again) whereas the random assignment of luck spreads out the impacts more evenly? That's kind of what you were reaching at with the number of events per agent, but perhaps it's a combination of number and their distribution among the population? First thoughts. It does seem odd, though...

  2. Yeah, that's my guess. Could be that the movement speed of the event objects is not quite high enough to be truly random, so there's a higher than random chance for an agent to get hit by the same event twice. Hence a few may get more lucky events than expected, even though the total number of events is basically the same as the random method. Should be reasonably easy to measure this.

    Another thing that may contribute to this is that if the movement of events takes them outside the boundaries of the world, they're forcibly moved back in. I don't remember how exactly that's done, maybe there are a few more events persisting near the boundaries than there should be.

  3. Rhys Taylor Oh, that's a good idea, too. If the event smoothly moves from the right side to the left side as though the agents are on a flattened globe, that'd give you different results than if the boundaries acted like a wall, rebounding events into agents on the edge.

  4. Michael J. Coffey Actually I hadn't thought of that at all ! Currently I have them rebounding. That gives such good agreement with the original I presume that's how they did it too, but I don't think they state it explicitly. Should be easy to implement a toroidal universe... :)

  5. Rhys Taylor -- I blame Minecraft and Eco for my small amount of useful idea.

  6. Rhys Taylor -- Haven't played that one very recently. Played both of the mentioned games in the last week. Minecraft has a (very difficult to reach) edge, but Eco wraps the map around a tiny planet (tiny as in it's a 10 minute walk around the equator...)

  7. It always comes down to boundary conditions. :-)

  8. Well, the distribution of the number of lucky events occurring per timestep is identical for both methods. But that's overall total, not per agent, and luck events aren't necessarily successfully exploited. That can be measured tomorrow.

    If the boundary conditions do make a significant difference, that could have interesting consequences for the original claims.

  9. My comment was just a tongue-in-cheek reference to some of your (and Ethan's and Sabine's) foundations-of-science postings, i.e., that even the most awesomely well-tested theory needs its boundary conditions (initial and/or otherwise) specified in order to make useful predictions.

    On a more serious note, and without having thought about it nearly as deeply as you have, it would make sense to me that the original probability distribution has some second-order features that your approximation misses, and the iterative nature of the original might be amplifying those differences.

    Probably a stupid question (remember, I haven't thought about this too much), but is there a hybrid approach that merges your approach and the original in any substantive sense? If so, do the results look more like your version, the original, or something in between? It would be ideal if it were a parameterized hybrid; then you could tune it in both directions and see if the results vary smoothly or if there's a "phase transition" somewhere.

    Just random thoughts, don't mind me...

  10. I'm rapidly running out of possible solutions. The distribution of the number of agents experiencing different numbers of lucky events is near-as-dammit identical for both methods. There's a slightly higher rate of lucky events for the random walk, but it's so marginal I can't believe it's meaningful (mean of 18.8 agents have 1 lucky event per timestep c.f. 18.3 for the fast method). Visually the distributions look like identical, randomised Gaussians.

    I suppose I can try a few other things : the number of successfully-exploited lucky events (though both methods use the same module for this so there shouldn't be a difference), maybe look at the numbers of lucky events for the wealthiest individuals, compare the total money generated. See if there's some larger problem that I've missed, because the details are virtually identical however they're measured. So far the only difference seems to be in the maximum number of events occurring to individual agents per timestep, which is higher in the random walk method.

    Altering the boundary conditions was a good idea, but it made no difference at all.

    I'm not sure about the hybrid approach; I kind of think of the fast method as the hybrid itself. It still uses agents with individual properties, it just replaces the event objects with probabilistic events. I'm not sure a purely statistical approach could be done - I don't think the agents could be replaced, since their history is crucial to determining their individual wealth at any point. I suppose one could approximate that distribution as an evolving power law and randomly draw from that at any time... hmmm....

    Probably also worth mentioning the earlier test runs, which showed that the random walk method is highly robust to changes in everything except movement speed :
    Though one thing I didn't check was the distance range within which events are said to affect agents.
    https://plus.google.com/u/0/+RhysTaylorRhysy/posts/aRi6gkaVmqd

    Random thoughts are always welcome !

  11. OK, the total wealth is more interesting : the slow method produces a total wealth which is consistently 2-3 times the fast method. Now I would have thought that impossible if the event frequency distribution per agent is identical... the plot thickens.

  12. OK, so it's definitely due to an event distribution difference of some kind. Of 5 runs, the mean total number of events for the 20% wealthiest agents is 509 for the slow method but just 401 for the fast method. There's some very subtle difference that produces a large cumulative effect.

    I thought about maybe measuring something from the perspective of the event objects rather than agent objects, but the problem is that would be hard to compare with the fast method where the events aren't objects.

    My guess is still that certain events tend to repeatedly occur to agents more often than chance would suggest. I might try visualising this in some way...

  13. Suppose you were to divide both methods' runs into 10 intervals (or 4 or whatever), capture the state of all the agents at the end of each interval, and transfer that state into a clone of the opposite run style? That is, you would start with four runs, two fast (A, B) and two slow (C, D). (I'm assuming you can use the same random seeds so each pair truly is identical.) A and C would be the "pure" runs; B and D would be the hybrids, i.e., the ones that receive "state transplants" from C and A, respectively.

    Hmmm, maybe you can do so only once per run...I was imagining that there would be some low-level, "native" state that couldn't be transferred (and therefore would carry forward some amount of history in each of B and D's runs despite the transplants), but if the entire state consists of accumulated wealth, then transferring it at all the intervals would effectively be a complete mind-wipe, and there would be no point in it. (Is "talent" a constant, or does it include an "experience" component? That's the kind of thing I was thinking might allow transplant-hybridization to make sense. From the comments at the end, it sounds like that's a direction you've considered but haven't yet implemented, so this probably isn't a useful idea.)

  14. I'm at an ALMA workshop for the rest of the week but my current findings are as follows (extensive ramblings start on line 94) :

    - The maximum number of events experienced by any agent at any timestep tends to be higher with the random walk method
    - The number of lucky events experienced by the wealthiest 20% of agents is higher for the random walk method
    - The overall number of events is lower with the random walk method
    - The event frequency per agent per timestep distribution is identical for both methods
    - Both methods give an almost exactly equal fraction of good and bad events
    - The total wealth generated by the random walk method is much higher
    - The wealth of the wealthiest individual is comparable by both methods (maybe a bit higher sometimes with the random walk)
    - If we completely randomise the event positions each timestep, instead of doing a walk, we get an excellent agreement (total wealth, wealth fraction of richest 20%) with the fast method

    I suspect there must be more repeat encounters via the random walk method than from chance alone. Not many, as it only shows up collectively, but enough to make a difference. I'll try monitoring the event numbers the agents experience and looking for repeats.

    In principle the entire state of any run can be saved and transferred to the initial conditions of another run using a different method. The agent objects store a bunch of information about what happened to them. Interesting idea to try...

  15. I'm having trouble reconciling these two statements:

    - The overall number of events is lower with the random walk method
    - The event frequency per agent per timestep distribution is identical for both methods

    If the number of agents and timesteps is identical, then the latter implies the total number of events is the same, too, does it not?

    Or is there a "per ..." missing in the first statement?

  16. Finally have time to return to this.

    The mean total number of events is about 7% lower for the slow method (2867) than the fast method (3067). The lucky/unlucky ratio is almost exactly 1:1 in both cases. However, for the wealthiest 20%, the mean number of lucky events is about 25% higher in the slow method (509) compared to the fast method (401).

    It's the second statement that needs rephrasing. The distribution of the number of agents experiencing a given number of events (0, 1, or 2) per timestep is the same (i.e. similar shape, mean, and range) for both methods. They are of course not literally identical. I think this just means that the difference is too subtle to detect in a single timestep on an agent-by-agent basis.

    Whatever the result turns out to be, it's an interesting example of very subtle statistical differences having a big impact on the outcome.

  17. Rhys Taylor -- Have you tried additional visualizations of the luck distribution? When you said the shape, mean, and range are the same, I thought of the Datasaurus Dozen (http://blog.revolutionanalytics.com/2017/05/the-datasaurus-dozen.html)

  18. Michael J. Coffey That's essentially what I'm trying to do : find the parameter that gives the crucial difference. I started adding in some test files to print the positions of the events as a function of time, but it's not working correctly yet. I'll try and fix it later, have some more urgent things to deal with first.

    The Datasaurus is one of my all-time favourites. I've used it in lectures. :)

