Showing posts with label Big Data. Show all posts

Wednesday, September 28, 2022

Space Rendezvous.

One of the most technically difficult and impressive feats humanity has ever achieved is the rendezvous of spacecraft in orbit. Interestingly, the American and Soviet approaches to the problem differed: while American astronauts until quite recently controlled the process manually (Buzz Aldrin's Ph.D. thesis concerned how to do so), Soyuz spacecraft have been able to dock autonomously since 1967. The Igla system, followed by Kurs, paved the way for autonomous resupply of the Salyut and Mir space stations at a fraction of the cost of manned spaceflight, a feat matched by the Dragon capsules of SpaceX only as recently as 2012.

From the Economist.
This American tendency to rely upon a skilled human operator also found its way into the distinction between earlier Airbus and Boeing aircraft, with the European consortium leaning towards fly-by-wire systems, and the then-Seattle-based manufacturer only following later.

I mention this point today in light of the rapid perceived advances in AI (artificial intelligence) and ML (machine learning). Much ink has been spilled regarding recent advances in language processing and image creation (this article is a nice example from the Economist). 

But such technologies have long existed in some of the most challenging engineering spaces faced by humankind. Is the current surprise regarding AI/ML more due to the fact that it is now able to address the routine activities formerly used to sideline (if not belittle) the technology, when the practical application of the same has long since progressed past triviality to indispensability? 

Previous posts on Economist issues:

  1. Nordic Success.
  2.  @TheEconomist (Ann Wroe?) on Dr. Robert McClelland and #JFK.
  3. Further Reading.
  4. Where Newspapers Are Headed ...
  5. @TheEconomist on a hybrid #VirtualParliament.
  6. @TheEconomist on #Homelessness in @SFGov.
  7. The Life Pressed Out.
  8. Why Travel Matters.
  9. @econbartleby and @billswindell at @TheEconomist and @NorthBayNews, respectively.
  10. @AmExperiencePBS @RobertKenner-- the 1918 Pandemic.
  11. The Return of #Cash.
  12. California, where Malala Yousafzai becomes Janet Yellen.
  13. The Plutonium Standard.
  14. Beikoku and Eikoku.
  15. Secession is a bad idea, full stop.
  16. QE4.
  17. Brown, Budgets, Prisons, and Contempt.
  18. Executive Orders.
  19. #rebeccapurple.
  20. The Streets Should Fit the Trees.
  21. @TheEconomist on Alcohol and Health.
  22. What Do Bubbles Look Like, Pt. 2.
  23. "Bringing Up Baby Bilingual"
  24. Freshman Teams, Student Performance, and the Case For SVUSD's Master Plan.
  25. Dual Immersion Enhances Attention.
  26. Trust Levels of News Sources.
  27. Slouching Towards Utopia.

Friday, April 5, 2013

... While the ATF Moves on Big Data.

R--OPTION - Investigative System
Solicitation Number: DJA-13-AOSI-PR-0238-1
Agency: Department of Justice
Office: Bureau of Alcohol, Tobacco and Firearms (ATF)
Location: Administrative Programs Division (APD)
available at
From Wired: "[ATF] is looking to buy a 'massive online data repository system' ... to process automated searches of individuals, and 'find connection points between two or more individuals' ... instead of requiring an analyst to manually search around for your personal information, the database should 'obtain exact matches from partial source data searches' such as social security numbers (or even just a fragment of one), vehicle serial codes, age range, 'phonetic name spelling,' or a general area where your address is located. Input that data, and out comes your identity, while the computer automatically establishes connections you have with others."

"... the ATF is widely perceived as a weak, stagnant and underfunded agency. Even if it has a database that can track you down and find out who your friends are, it won’t necessarily be able to apply that to tracing gun transactions due to Congressional restrictions. If the agency finds a gun linked to a crime, and then traces the gun to someone who bought it from someone else, all of that work figuring out the who’s-who will still likely have to be done manually."

It takes a few months for the Federal Government to get moving after something like Newtown.  They're making progress.  However, as the article notes, Federal Law bars the ATF from creating a centralized database tracing gun transactions.  It doesn't bar anyone else from doing it, just ATF.  Which is a policy with certain downsides.

Wednesday, December 26, 2012

New York Times: Law Blocks ATF's Use of Big Data.

"Legal Curbs Said to Hamper A.T.F."
Goode, Stolberg et al., New York Times, Dec. 26, 2012
available at
The New York Times, like other news sources, is picking up on the fact that the Obama administration is preparing to overhaul how the Federal government learns about the potential for gun violence.

The description of the ability of the Bureau of Alcohol, Tobacco, Firearms and Explosives (ATF) to track gun purchases is positively ridiculous when one thinks about how easy these things are for department stores like Target.  If they're really still using printouts and paper, that's a travesty.

The worst part, of course, is that the law is written so that the government agency is less effective.  After Newtown, the administration is (presumably) getting ready to propose extensively changing the law, which makes sense.

While I can imagine that there is a series of careful provisions that may be put in place, I think the government should make clear that not just ATF, but also state-level social service organizations, should always be given access, at the very least, to the same kind of information that is already possessed by the likes of Wal-Mart and Amazon.

Thursday, December 20, 2012

Unintended Consequences of Makerbot.

Slashdot drew my attention to this Forbes story today, which made me shake my head in sadness.  Makerbot has always seemed to have the promise to transform the world -- as  Chris Anderson of Wired has described it, "[w]hat desktop fabrication represents is a laboratory for the future, not just of manufacturing but of stuff itself."

"3D-Printing Firm Makerbot Cracks Down ..."
Andy Greenberg, Forbes, December 21, 2012

However, I think the Maker movement has now found its equivalent of the Internet's Rule 34. In short, if you give people a way to make things, someone will use it to create a gun, and using Makerbot to create automatic weapons is just about the worst idea possible.

I respect the effort to prevent electronic distribution of the designs, but that's not going to cut it, because darknets can move those around.  DRM on a Makerbot, which might prevent these designs from being used in the first place, is something that very bright people have been thinking about for a while. But the somewhat dismal history of DRM suggests DRM can, at best, mitigate, rather than prevent such problems.

Ultimately, we need to know more about the people prone to doing these things. If someone's trying to  print parts to create an automatic weapon, stopping them is important.  But so is letting the right person know they're trying to do it in the first place. 

DRM helps -- but the solution is probably going to involve Big Data ...

Monday, December 17, 2012

Obama: It's Time To Use Big Data To Protect Our Children.

President Obama interrupted the awesome 49ers-Patriots game last night for a speech.  I didn't have time to go over it until this morning, when I started putting the pieces together to figure out what the heck he was talking about.  Plus, I needed to read it, rather than hear it, to process the substance.  

The criticism I heard this morning was that the word "gun" was never mentioned in the speech.  How can the President be talking about protecting children and not mention gun control?  After thinking about it, I don't think it was an accident.  While control of guns may be his goal, I think he's planning on getting there through something on an entirely different level from mere background checks. 


"Government seeks to shut down NSA wiretapping lawsuit"
Joe Mullin, Ars Technica, Dec. 14, 2012
available at
President Obama has amazing tools at his disposal to protect America from threats around the world.  He gets daily reports on what terrorists around the world are up to, with fairly accurate predictions concerning what they'll do next.  He can do so because of the power of the NSA's computers, and because of the careful use of statistics.

Anyone who wants to get really specific on the cutting edge of the technology can read about the Electronic Frontier Foundation (EFF) litigation over Room 641A at AT&T's building at 611 Folsom Street, San Francisco. While the lawsuit talks about a lot of computer hardware, like the Narus STA 6400, what the program is really about is the NSA collecting essentially everyone's electronic communications, and analyzing them probabilistically, to anticipate and prevent attacks. 

I strongly suspect those tools are not used by domestic law enforcement, and as near as I can be certain about anything, I believe those tools are never made available to mental health professionals and social workers. 

However, these kinds of tools are no longer just available to the Federal Government. The last ten years have seen these tools proliferate throughout American industry and academia. Charles Duhigg of the New York Times wrote a superb article on the subject this past February, detailing specifically how Target, for example, uses such information -- entirely legally -- to market to pregnant mothers in their first trimester:
"About a year after Pole created his pregnancy-prediction model, a man walked into a Target outside Minneapolis and demanded to see the manager. He was clutching coupons that had been sent to his daughter, and he was angry, according to an employee who participated in the conversation."

“'My daughter got this in the mail!' he said. 'She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?'”
"The manager didn’t have any idea what the man was talking about. He looked at the mailer. Sure enough, it was addressed to the man’s daughter and contained advertisements for maternity clothing, nursery furniture and pictures of smiling infants. The manager apologized and then called a few days later to apologize again."
"On the phone, though, the father was somewhat abashed. 'I had a talk with my daughter,' he said. 'It turns out there’s been some activities in my house I haven’t been completely aware of. She’s due in August. I owe you an apology.'"
Corporate America knows some of our deepest secrets, without us explicitly telling anyone. The importance of that can easily be overlooked -- the computer system at your local Target has access to incredibly personal information about you that we would never have dreamed of providing to your average social worker or school psychologist.


To understand the President's speech, though, I think you have to understand how difficult it was to explain the situation in Newtown, Conn. to my daughter.

When she asked me why the flags were at half-staff on Saturday, I started by explaining to her that, just like her stomach might feel bad, or her leg hurt, sometimes our heads get sick, too.

She wanted to know why the person wouldn't just go to the doctor if that happened.

I told her that sometimes part of the sickness is that the person thinks they can't ask for help. And that we all depend on one another to ask each other for help when we can't handle something, to keep us all safe.

I then told her that we are all sad because someone got sick like that on Friday.  And that the person decided that the only way they could get better was by hurting themselves, and a whole lot of other people, people who are just like her Mom, her Dad, and her, in a small town just like ours. And that we lowered the flags because we are all so sad.

It was still on my mind that evening, when I sat down with a good friend who's in education.  We were talking about when we can take action based on information we receive -- I explained the oath lawyers take in California, and he talked to me about what it means to be a mandated reporter.  We both reflected on how our options are limited if someone won't ask for (let alone accept) help.

But of course, with modern technology, we don't have to sit around waiting for someone to ask for help.  Your local Target has all the information needed to predict when you're pregnant, and the same types of databases probably light up like a Christmas tree when someone's thinking about taking the kind of action Adam Lanza took.  I suspect it's routinely possible to use the same computer systems to alert local school and social workers when something like Newtown's about to happen -- perhaps far earlier than any of us suspect.


If you think of government using such databases in as ominous terms as Room 641A is described in the EFF litigation, the whole situation probably freaks you out. It's not hard to see why.

Think for a moment -- if the government can just go buy the same information about you as your local Target can -- about who's pregnant -- imagine the use of such information in, oh, say, the abortion context.  Now, there's no invasion of a woman's right to private communications with her doctor -- the government can know a woman's thinking about getting an abortion long before the event occurs, without invading that relationship at all.

Legislators on both sides of the aisle are aware of how politically explosive that technology's use by government could prove. The consequences of tearing down the anonymity veil are, at the very least, unpredictable.  No-one has wanted to walk down that road.

Until now.
"[E]very parent knows there is nothing we will not do to shield our children from harm."

"This is our first task -- caring for our children. It’s our first job. If we don’t get that right, we don’t get anything right. That’s how, as a society, we will be judged."
"And by that measure, can we truly say, as a nation, that we are meeting our obligations? Can we honestly say that we’re doing enough to keep our children -- all of them -- safe from harm? Can we claim, as a nation, that we’re all together there, letting them know that they are loved, and teaching them to love in return? Can we say that we’re truly doing enough to give all the children of this country the chance they deserve to live out their lives in happiness and with purpose?"

"I’ve been reflecting on this the last few days, and if we’re honest with ourselves, the answer is no. We’re not doing enough. And we will have to change."

"Since I’ve been President, this is the fourth time we have come together to comfort a grieving community torn apart by a mass shooting. The fourth time we’ve hugged survivors. The fourth time we’ve consoled the families of victims. And in between, there have been an endless series of deadly shootings across the country, almost daily reports of victims, many of them children, in small towns and big cities all across America -- victims whose -- much of the time, their only fault was being in the wrong place at the wrong time."
"We can’t tolerate this anymore. These tragedies must end. And to end them, we must change. We will be told that the causes of such violence are complex, and that is true. No single law -- no set of laws can eliminate evil from the world, or prevent every senseless act of violence in our society."

"But that can’t be an excuse for inaction. Surely, we can do better than this. If there is even one step we can take to save another child, or another parent, or another town, from the grief that has visited Tucson, and Aurora, and Oak Creek, and Newtown, and communities from Columbine to Blacksburg before that -- then surely we have an obligation to try."
"In the coming weeks, I will use whatever power this office holds to engage my fellow citizens -- from law enforcement to mental health professionals to parents and educators -- in an effort aimed at preventing more tragedies like this. Because what choice do we have? We can’t accept events like this as routine. Are we really prepared to say that we’re powerless in the face of such carnage, that the politics are too hard? Are we prepared to say that such violence visited on our children year after year after year is somehow the price of our freedom?"
The President knows we are not powerless -- we are anything but.  He knows the computer system at the Target located at 7 Stony Hill Road, Bethel, Connecticut, 8.5 miles from Sandy Hook Elementary, had a better idea of what was about to happen than any social worker or educator in the State of Connecticut. And he's had enough of tying the government's hands. 


This could make anyone afraid -- how will such information be used?  Is this Orwell's 1984? The concern must be about how such information will be used, and our fear is that it will be misused.  

I submit to you, though, that we as Americans can create a solution to this problem -- for managing such incredibly difficult problems is what we do.  

I was struck this morning, after watching Disney's Prep & Landing over the weekend (if you haven't seen it, you should, it's hilarious), by how our artists reimagine Santa Claus in the 21st Century.

Rather than magic, Santa's operation is run by a charming, elven-staffed combination of NASA, the CIA, FedEx, and SOCOM, all rolled into one. And why?  For the kids, of course. These elves coming down the chimney with night vision goggles and sparkle ornaments are hardly fear inducing -- after all, we lay out cookies and milk for them, and carrots for the reindeer. 

So that brings this back to the subject matter of this post. What's behind the door to Room 641A? It's the real world's version of the technology and organization from Prep & Landing. The power of that technology may be misused.  But we as Americans specialize in organizing ourselves to wield that kind of power.  When we think of the power of Santa, our artists imagine the bureaucracy it would take to get something like that done the right way -- because that's what we as Americans happen to be pretty good at. If there isn't a set of agencies, experts, lawyers and officials designed to oversee the application of this technology, of this information, and the use of the probabilities it calculates, there soon will be.  And it will probably work pretty darn well.

It's not that Adam Lanza couldn't get help from anyone. It is that we as a nation refused to examine the data. America will, no doubt, be a much different place when your town advertises, via postcard, about free counseling clinics, and the postcard is sent to only one home. Our President's point, however, is that our freedom does not depend on our government ignoring the very information driving the core of American business.  Innocent lives depend upon us paying attention, and protecting the vulnerable must be our starting point, no matter how serious the consequences will be politically. 

The President, not being specific?  Hardly.

Wednesday, December 5, 2012

Sonoma County vs. Odessa, Texas.

I was thinking this morning about a Press Democrat article from February of this year, which discussed Sonoma County high school athletes who received college scholarships.  The striking element of the story was the number of female student-athletes receiving scholarships to play college soccer.  Fifteen (15) seniors signed letters of intent to play women's soccer at the college level.  By way of contrast, only one Division I football scholarship was offered, and that was to attend an FCS school, UC Davis.

"High School Girls Soccer Reigns on National Signing Day"
 Howard Senzell, Santa Rosa Press-Democrat, February 1, 2012
Available at 
In Sonoma County, the importance of women's soccer is almost a given for the County's 488,116 residents, so this outcome isn't considered too unusual.  However, if football and soccer scholarships were evenly distributed across the country, you'd expect Sonoma County to generate seven (7) D1 football scholarship players every year, but only 3-4 women's soccer scholarship players.

Sonoma County's lack of D1 football players isn't necessarily that surprising -- the county is relatively remote, and Northern California's football culture is more concentrated in Sacramento and the East Bay.  While there are good athletes here, this isn't a place like San Diego, with great athletes across the board.  Scouts thus come to the area infrequently, so a D1 player on the bubble is less likely to be noticed here, even when that caliber of athlete exists.

The reason the football data is interesting, though, is because it disproves the "San Diego" model.  Sonoma County isn't like Contra Costa County, where the success of Danville and San Ramon in soccer is complemented by the football prowess of De La Salle. If that rule were true, you'd expect almost no women's soccer scholarship athletes to come out of Sonoma County.

Expected County Population: 1 D1 Football Scholarship.
Instead, of course, the truth is the opposite.  There is something quite unusual about Sonoma County and women's soccer, statistically speaking.  I suspect this is the most uneven distribution in this direction between the two sports in the United States.  To put the distribution in perspective: 
  • A normal county that produces only a single D1 football player is about 117,124 people, or somewhere between the size of Humboldt County (134,623) and Mendocino County (87,553), the 35th and 38th largest counties in California.
  • However, a normal city that produces 15 women's soccer scholarships is about 2,017,830 people, or somewhere between the size of Houston (2,145,146) and Philadelphia (1,536,471), the fourth and fifth largest cities in the U.S.
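The arithmetic behind the two bullets can be checked with a minimal Python sketch. It uses only the population figures quoted in this post; the variable names are illustrative, and treating scholarships as proportional to population is the same simplifying assumption the post makes. Note that the 117,124 figure implies roughly four expected football players for a county this size rather than the seven quoted earlier, so the two football estimates presumably rest on different scholarship counts.

```python
# Figures quoted in the post; variable names are illustrative.
sonoma_population = 488_116
people_per_football_player = 117_124        # "normal" county producing one D1 football player
people_per_15_soccer_players = 2_017_830    # "normal" city producing 15 women's soccer scholarships

# Population needed, on average, to produce one women's soccer scholarship.
people_per_soccer_player = people_per_15_soccer_players / 15

# Expected counts for Sonoma County if scholarships simply tracked population.
expected_football = sonoma_population / people_per_football_player
expected_soccer = sonoma_population / people_per_soccer_player

observed_football, observed_soccer = 1, 15

print(f"Football: expected {expected_football:.1f}, observed {observed_football}")
print(f"Women's soccer: expected {expected_soccer:.1f}, observed {observed_soccer}")
print(f"Soccer over-representation: {observed_soccer / expected_soccer:.1f}x")
```

On these figures, women's soccer comes out at roughly four times the expected count, while football comes out well under it.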
Of course, you'd kind of expect there to be a women's soccer magnet somewhere.  You know, the equivalent of, say, Monongahela, Pennsylvania. Or, perhaps, Odessa, Texas.

If there is, it looks like it might very well be Sonoma County, California.

Friday, November 16, 2012

25,982 Reasons Why Pedestrian Deaths On 5th St W Are "Statistically Significant."

"Searching for Answers on Fifth Street," Sonoma Index-Tribune
November 16, 2012
available at
Two people have been killed at the same intersection near my home in the last seven years. My city government believes that two deaths in that time period at the same intersection are not statistically significant.  My local paper instinctively senses there's something amiss despite the city's assertion. Guess what, Sonoma Index-Tribune? I think you're right, and pro bono publico, here's what I think the problem is with the city's argument.
"Busy" Intersections in Sonoma.

To set the scene for non-local readers, I live in a relatively small town, Sonoma, California, with about 10,000 people (10,741, according to Google). Per the National Highway Traffic Safety Administration, there are approximately 1.73 pedestrian deaths per 100,000 population in the United States per year.  21.2% of these deaths happen at intersections.  I've looked over a map of Sonoma, and there are a lot of intersections; but I've tried to count only the substantial ones -- I think there are 29 (the list is on the right).  Please note that if I were more conservative, and counted each intersection, it would only make the chances of a second fatality at the same intersection less likely.

Thus, I think the chance of a pedestrian fatality at any given intersection in any given year, if the intersections are roughly equally dangerous, is a relatively straightforward application of the multiplication rule -- it's (1.73/100000) * 10,741 * 212/1000 * 1/29, or 1 in 736.  Long odds - you'd have a better chance of drawing a full house in a single draw of the cards at poker.
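The multiplication above can be reproduced with a short Python sketch. The inputs are the figures quoted in this post; the variable names are illustrative, and the even split across 29 intersections is the post's own simplifying assumption.

```python
# Figures quoted in the post; variable names are illustrative.
deaths_per_person_per_year = 1.73 / 100_000  # NHTSA pedestrian death rate
population = 10_741                          # Sonoma, California
share_at_intersections = 0.212               # fraction of pedestrian deaths at intersections
intersections = 29                           # "substantial" intersections counted by hand

# Multiplication rule: expected pedestrian deaths per year in town,
# restricted to intersections, spread evenly across the 29 intersections.
p_year = (deaths_per_person_per_year * population
          * share_at_intersections / intersections)

print(f"p = {p_year:.6f}, or about 1 in {1 / p_year:.0f}")
```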

OK, but what are the chances of getting another pedestrian fatality at the same intersection within seven years? My old statistics book from Berkeley came in handy here -- it's an application of the binomial formula.  The formula is on the right; the binomial function from Excel made calculation pretty straightforward.  The chance of another pedestrian accident happening at the same intersection within seven years, if the intersections are equally dangerous, is 1 in 25,982.  It's not quite as hard as drawing a straight flush, but it's pretty close.
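For readers without Excel, here is a minimal Python sketch of the binomial step. It assumes the model is "exactly two fatal years out of seven at one given intersection," with each year independent; that reading is an assumption, chosen because it reproduces the 1 in 25,982 figure. The helper name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Chance of exactly k successes in n independent trials, each with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Per-intersection, per-year probability from the figures quoted in the post.
p_year = (1.73 / 100_000) * 10_741 * 0.212 / 29

# Assumed model: exactly 2 fatal years out of 7 at one intersection.
p_two_in_seven = binomial_pmf(2, 7, p_year)

print(f"about 1 in {1 / p_two_in_seven:,.0f}")
```

Swapping 0.212 for another intersection share, or 29 for another intersection count, shows how sensitive the result is to those inputs.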
Freedman, Pisani, Purves & Adhikari
"Statistics, Second Edition," p.241.

It's unlikely that Sonoma was so unlucky. Instead, it's more probable that the intersection in question is vastly more dangerous than normal. Indeed, 1 in 25,982 is somewhere between a 4σ and 5σ event; mere "statistical significance" usually requires only 2σ (95%), and anything beyond 3σ is typically "highly significant."
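The conversion from a probability to a number of standard deviations can be sketched with Python's standard library. This assumes a two-sided convention on a standard normal distribution, which is an assumption; the post does not say which convention it has in mind.

```python
from statistics import NormalDist

p_value = 1 / 25_982  # probability quoted in the post

# Two-sided convention (an assumption): find z such that P(|Z| > z) = p_value.
z_two_sided = NormalDist().inv_cdf(1 - p_value / 2)

# One-sided convention, for comparison: find z such that P(Z > z) = p_value.
z_one_sided = NormalDist().inv_cdf(1 - p_value)

print(f"two-sided: {z_two_sided:.2f} sigma, one-sided: {z_one_sided:.2f} sigma")
```

Under the two-sided reading the figure falls between 4σ and 5σ; under the one-sided reading it lands just under 4σ, so the exact label depends on the convention. Either way it is far past the usual 2σ threshold.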

But of course, I am no statistician, and this is all the work of an amateur. The problem is that the City staff aren't either, and I suspect they're even worse at it than me. The City shouldn't be saying something is statistically insignificant without talking to someone who has the education and experience necessary to determine that fact. This isn't a $30,000 study, it's something a grad student from UCB can handle in an afternoon. The City needs to do the work to prove this is merely bad luck, and judging by the staff report, they simply haven't.

Spreadsheet with formulas.
The I-T knows there's an issue here--for instance, they have been raising hue and cry about installing sidewalks in the Boyes Hot Springs area, based on the argument that pedestrians aren't safe (and they're right).  The hard question, though, is whether the I-T, given the economic vise the newspaper industry has been placed in, still has the resources to challenge arguments like those advanced by the City that, in incidents of these types, "the pedestrian or bicyclist was the party most at fault."  Personally, I think the I-T is on the right track, and I say, please keep pushing, because the voters are depending on you to do so, to keep us informed.  And public safety depends upon you making sure our government isn't just hand-waving in response to citizen concerns -- our officials need to do the math to prove their points, and need to show us the results.

Updated 4:55 PM Saturday, November 17:  The odds of two deaths at the same intersection in 7 years were updated to reflect the 21.2% NHTSA figure, rather than 25%.  Further, John Capone, the writer for the Index-Tribune, pointed out in his article that Beatriz Villanueva was killed at the same intersection in 1996.  The chance of three pedestrian fatalities in 17 years occurring at random at the same intersection, under the assumptions detailed above, is 1 in 597,956. By way of comparison, the chance of drawing a royal flush in a single hand of poker is 1 in 649,740.
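The three-in-17 figure follows from the same binomial reasoning. The sketch below assumes "exactly three fatal years out of 17 at one given intersection," which is an assumption about the model, and compares the result with a royal flush, counted as 4 of the possible five-card hands.

```python
from math import comb

# Per-intersection, per-year probability from the figures quoted in the post.
p_year = (1.73 / 100_000) * 10_741 * 0.212 / 29

# Assumed model: exactly 3 fatal years out of 17 at one intersection.
p_three_in_17 = comb(17, 3) * p_year**3 * (1 - p_year) ** 14

# Royal flush: 4 winning hands (one per suit) among all five-card hands.
p_royal_flush = 4 / comb(52, 5)

print(f"three in 17 years: about 1 in {1 / p_three_in_17:,.0f}")
print(f"royal flush: 4 in {comb(52, 5):,} hands")
```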