These are my notes from today’s Data IO conference
Next Generation Search with Lucene and Solr 4
- near real time indexes (used by Twitter for 500 million new tweets/day)
- can plug in your own scoring model
- flexible index formats
- much improved memory use, faster regexes, etc.
- new autocomplete suggester
Solr (Lucene server – managed by the same team as Lucene)
- if someone chooses the red shirt, do we have large in stock (pivot faceting – anticipating the next question)
- improved geo-spatial (all Mexican restaurants within 5 blocks, plus function queries to rank them)
- distributed/sharded indexing and search
- solr as nosql data store
- recommendation engine (LinkedIn uses Lucene for people recommendations). Recommend content to people who exhibit certain behaviors
- avoid flight delays – one facet – flights out of airports, pivot to destination airports (O’Hare to Newark) – origin, destination, carrier, flight, delay times – look at trends over time. Solr has a stats package – you can get averages, max, min, etc
- for local search, how to show only shops that are open? (Yelp also uses Lucene).
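To make the flight delay example concrete, here is a minimal sketch of the request parameters for a pivot facet plus stats query. The parameter names (facet.pivot, stats.field, fq) are standard Solr ones, but the field names (origin, destination, carrier, delay_minutes) and the function name are my own assumptions for illustration, not something shown in the talk.

```python
from urllib.parse import parse_qs, urlencode


def flight_delay_query(carrier=None):
    """Build a Solr query string for an origin -> destination pivot facet."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),                          # we only want facet and stats output
        ("facet", "true"),
        ("facet.pivot", "origin,destination"),  # facet on origin, then pivot to destination
        ("stats", "true"),
        ("stats.field", "delay_minutes"),       # hypothetical numeric field for min/max/mean
    ]
    if carrier:
        params.append(("fq", f"carrier:{carrier}"))  # hypothetical carrier field
    return urlencode(params)


# Decode the query string again to see what would be sent to Solr's select handler.
print(parse_qs(flight_delay_query("UA")))
```

The query string would be appended to a select URL on your own Solr core; I've left the host and core name out because they depend on your setup.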
You added Zookeeper to your stack, now what?
Old way of system management: active and backup servers, frantically switch to backup when active fails
Common challenges with big distributed system
- Operational complexity
A common deficiency: sequential consistency (handling everything in the “right” order, when data is coming from multiple places)
- Zookeeper is a distributed, consistent data store – strictly ordered access
- Can keep running as long as only a minority of member nodes are lost (usually want to run 3 or 5 nodes)
- all data stored in memory (50,000 ops/sec)
- optimized for read performance (not write); also not optimized for giant pieces of data
- it’s a coordination service
- A leader node is elected by all the members. Leader manages writes (proposes it to followers, they acknowledge it, then it is assigned and written)
- nodes can have data, and can have child nodes
- has “ephemeral nodes” – created when a client connects, destroyed when client disconnects (these do not ever have child nodes)
- watches: clients can be kept informed about data state changes (just lets you know it has changed, but not what it’s changed to – you need to request it again if you want to know the current value)
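The watch behavior is easier to see in code. This is a toy in-memory stand-in I wrote to illustrate the semantics described above; it is not the ZooKeeper client API, and the class and path names are made up. It assumes watches fire once and only tell you which path changed, so the client has to read again to get the new value.

```python
class ToyZNodeStore:
    """In-memory stand-in for a znode store (NOT the real ZooKeeper API)."""

    def __init__(self):
        self._data = {}
        self._watches = {}  # path -> callbacks, each fired at most once

    def get(self, path, watch=None):
        # Reading a node can optionally register a one-shot watch on it.
        if watch is not None:
            self._watches.setdefault(path, []).append(watch)
        return self._data.get(path)

    def set(self, path, value):
        self._data[path] = value
        # Fire and clear the watches: the callback learns the path, not the value.
        for callback in self._watches.pop(path, []):
            callback(path)


store = ToyZNodeStore()
store.set("/config/leader", "node-1")

events = []
store.get("/config/leader", watch=events.append)
store.set("/config/leader", "node-2")  # the watch fires once
store.set("/config/leader", "node-3")  # no watch is registered any more

print(events)                       # one notification, carrying only the path
print(store.get("/config/leader"))  # the client re-reads to learn the current value
```

A client that wants to keep following a node would register a new watch each time it re-reads the value.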
Zookeeper is an open-source equivalent of Chubby
- good for discovery services (like DNS)
- Use cases: Storm and HBase, Redis – http://www.slideshare.net/ryanlecompte/handling-redis-failover-with-zookeeper
- Distributed locking
Beware – Zookeeper can be your single point of failure if you don’t have appropriate monitoring and fallbacks in place
Graph Database Use Cases
- nodes connected by relationships
- no tables or rows
- nodes are property containers
- cypher is neo4j’s query language
- also used for content management, access control, insurance risk analysis, geo routing, asset management, bioinformatics
- “what drug will bind to protein X and not interact with drug Y?”
- performance factors: graph size (doesn’t matter), query degree (this is what matters – how many hops), graph density. RDBMS doesn’t scale well with data size, neo4j does
- the more connected the data, the better it fits a graph db
- NoSQL – 4 categories – key value, column family, document db, graph db
- popular combo is, e.g. mongo for data, neo4j for searching it (then hydrate the search results from mongo)
- optimized for graph traversal, not, e.g., aggregate analysis of all the nodes
- top reasons to use it: problems with RDBMS join performance, evolving data set, domain shape is naturally a graph, open-ended business requirements
- Gartner’s 5 graphs: interest, intent, mobile, payment
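Here is a small sketch of why the number of hops matters more than graph size. It is plain Python over an adjacency dict, not neo4j or Cypher, and the names and data are invented. The traversal only touches nodes within the hop limit, so adding unrelated nodes elsewhere in the graph does not change the work done.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Return {node: hops} for every node reachable from start in <= max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return seen


# Made-up "knows" relationships
friends = {
    "ann": ["bob", "cy"],
    "bob": ["dee"],
    "cy": ["dee"],
    "dee": ["eve"],
    "eve": [],
}

print(within_hops(friends, "ann", 2))  # eve is 3 hops away, so she is not included
```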
I didn’t take notes during that one (a drop of water from the bottom of my glass got under my Mac trackpad, and my mouse was going crazy for a while)
All the data and still not enough?
- No matter how much data you have, it’s never enough or never seems like the right type
- Predictive modeling – will someone default on a loan? Look at data for people who’ve had loans, and who defaulted and didn’t. Use data to make a predictive risk model
- IID = independent and identically distributed
Example IBM sales force optimization
- Can we predict where the opportunities are – which companies have growing IT budgets?
- Couldn’t know what was most important – where were these target companies spending their IT budget (not disclosed)
- Companies who are spending with us are probably representative of similar sized companies in the same market – use the “nearest neighbor” technique
- The model’s predictions matched expert salesmen’s opinions, except for 15% of them, where the experts put the chances at zero. Why the difference? The model mis-identified some of the companies (no good way to cross-reference millions of internal customer records with independent sources)
Siemens – computer aided detection of breast cancer
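As a rough illustration of the “nearest neighbor” idea in the IBM example, this sketch estimates an unknown company’s IT spend by averaging the known spend of the most similar existing customers. The features, the numbers, and the function name are all made up; the talk didn’t go into how IBM actually measured similarity.

```python
import math


def knn_estimate(target, known, k=3):
    """Average the label of the k known points closest to target."""
    by_distance = sorted(known, key=lambda item: math.dist(item[0], target))
    nearest = by_distance[:k]
    return sum(label for _, label in nearest) / len(nearest)


# (employees in thousands, revenue in $100M) -> IT spend in $M; invented numbers
customers = [
    ((1.0, 2.0), 10.0),
    ((1.2, 2.1), 12.0),
    ((0.9, 1.8), 8.0),
    ((9.0, 20.0), 90.0),
]

# A prospect that looks like the three small customers, not the large one
print(knn_estimate((1.1, 2.0), customers, k=3))
```

In practice the features would need to be scaled so that one of them doesn’t dominate the distance.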
- patient IDs ended up predicting odds for cancer. It turns out the ID was a proxy for location (whether they were at a treatment facility or a screening facility)
Display ad auctions – how do we decide who to target?
- multi-armed bandit – exploration vs exploitation
- what do we know? urls you’ve visited
- for something like advertising luxury cars, very few positive examples (people don’t buy them online)
- There is no correlation between ad clicks and purchases
- Better to look at – did the person eventually end up at the company’s home page at some point after seeing the ad?
- target people who the ad can actually influence (i.e. not people who already bought the product, or never will)
- but there’s no way to get data for that
- Counterfactuals – you can’t both show and not show someone an ad, and observe subsequent behavior. You have to either show it or not show it
- Ideally, build a predictive model for those who see the ad, and another model for those who don’t
- But the industry doesn’t do that – it’s all about conversion rate
- Malware on sites generating http requests
- Very difficult for ad auctions systems to detect
- Detect by looking at traffic between sites. For example, malware site womenshealthbase generates massive traffic to lots of other sites, not about women’s health
- they make money by visiting a site with a real ad auction system. Then bid prices go up because of your traffic, which drives up ad revenue traffic on womenshealthbase
- Auction systems now put visitors from these sites in a penalty box, until they start displaying normal behavior again
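Going back to the point about targeting people the ad can actually influence: one way to sketch the “two models” idea is to compare outcomes for a group that was shown the ad against a similar group that wasn’t. This assumes you can hold out a comparable group per audience segment, which is my assumption and not something the speaker said the industry does. The segments and outcomes below are invented.

```python
def conversion_rate(outcomes):
    """Fraction of people in a group who converted (each outcome is 0 or 1)."""
    return sum(outcomes) / len(outcomes)


def uplift(shown, not_shown):
    """Estimated effect of the ad: the difference between the two groups' rates."""
    return conversion_rate(shown) - conversion_rate(not_shown)


# Invented outcomes per segment: (people shown the ad, people not shown the ad)
segments = {
    "already customers": ([1, 1, 1, 0], [1, 1, 1, 0]),
    "persuadable": ([1, 1, 0, 0], [0, 0, 0, 1]),
}

for name, (shown, not_shown) in segments.items():
    print(name, conversion_rate(shown), uplift(shown, not_shown))
```

The first segment has the higher conversion rate but zero uplift, which is why optimizing for conversion rate alone can end up targeting people who would have bought anyway.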
What’s new with Apache Mahout?
- Amazon: customers who bought this item also bought this item – best known Mahout example
- Mahout implemented most of the algorithms in the Netflix recommendation contest
- In general, finds similarities between different groupings of things (clustering)
- Key features: classification, clustering, collaborative filtering
- Automatically tags questions on stack overflow
- recommend friends, products, etc
- classify content into groups
- find similar content
- find patterns in behavior
- etc – general solution to machine learning problems
[I’m leaving out most of the details about performance improvements and the roadmap for upcoming refinements – below are other interesting points]
- Often used with Hadoop, but it’s not necessary. Typically included in Hadoop distributions
- Streaming K-means – given centroids (points at the center of a cluster) determine which clusters other points belong in
- References: Mahout in Action (but a bit out of date), Taming Text, http://mahout.apache.org
- Topic modeling: he wasn’t sure what the full feature set is – he’s pretty sure it doesn’t generate topic labels for you
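The cluster assignment step mentioned in the k-means note can be sketched in a few lines. This is plain Python, not Mahout’s API, and it only shows assigning points to the nearest of a given set of centroids, not the streaming algorithm itself. The points and centroids are invented.

```python
import math


def assign_to_clusters(points, centroids):
    """Map each point to the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
        for point in points
    ]


centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(1.0, 1.0), (9.0, 8.0), (4.0, 4.0)]

print(assign_to_clusters(points, centroids))  # one cluster index per point
```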
Shashin 3.4 is now available for download at wordpress.org. It has great new features for sharing your photos, and for responsive design (so your pictures will look good on any size display). Check out the Shashin 3 development progress post to see it in action.
Important upgrade notes:
- This version of Shashin comes with a customized version of prettyPhoto. On the Shashin settings page, you will want to pick prettyPhoto as your viewer to take full advantage of the new responsive design and social sharing features.
- If you customized your Shashin stylesheet (shashin.css) in a previous version and placed it in your theme folder, you will need to update it to incorporate the latest changes in the new version.
- Sharing: you can now share a link that will take you directly to any Shashin photo on your site, and automatically open it in prettyPhoto. The sharing links appear below the caption in prettyPhoto.
- Mobile display of slideshows: I’ve customized the version of prettyPhoto that comes with Shashin to improve its display on mobile devices (if you are using Fancybox, I’ve disabled it on mobile displays, as it simply doesn’t work very well on them).
- Two thumbnail design options to choose from: one is almost exactly the same as the design Shashin has always used – showing the captions underneath the thumbnails and putting a border around each thumbnail. The other design gives the thumbnails rounded corners, a slight box shadow, no borders, and puts the captions in an overlay along the bottom of the thumbnails. Long captions are truncated, to prevent them from covering the entire thumbnail. You can specify which design you want on the Shashin settings page.
- Responsive design for thumbnail layouts: Shashin will resize and rearrange thumbnails to best match the available space on the page. The final result for a given layout depends on several factors. Let’s say you want to display 3 thumbnails in a single row. If you enter “3” for the columns in your Shashin shortcode, Shashin will try to give you 3 columns. How many you actually get depends on how big the thumbnails are, and how wide the content area is. If the content area isn’t wide enough to accommodate the thumbnails at their full size, Shashin will scale down the thumbnails to about 90% of their actual size to maintain the 3 column layout. After that, it will let the columns start to “wrap,” so the thumbnails don’t shrink too much (that is, the number of columns will go down). If you gradually narrow and then widen your browser window you can see Shashin re-arranging and scaling its thumbnails on the fly.
Implementing responsive design for Shashin was a real challenge. This is because the traditional tools for responsive design – media queries – are not helpful with Shashin. Since Shashin is a WordPress plugin that needs to work in any theme, I couldn’t make any assumptions about the page layout. So basing layout decisions on the display width of the viewing device or the browser window is not helpful. What I need to know is the current width of the HTML element Shashin happens to appear in, which could be anything. So I implemented Shashin’s responsive design with a mix of CSS and jQuery.
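To illustrate the layout rule, here is a simplified sketch in Python (Shashin itself does this with CSS and jQuery, so this is not the plugin’s code). It takes the width of the containing element, shrinks the thumbnails down to about 90% of their size to keep the requested columns, and drops a column when that isn’t enough. It ignores margins and borders, and the function and parameter names are made up.

```python
def layout(container_width, thumb_width, requested_columns, min_scale=0.9):
    """Return (columns, scale) for thumbnails in a container of the given width."""
    columns = requested_columns
    while columns > 1:
        # How much the thumbnails must shrink to fit this many columns
        scale = min(1.0, container_width / (columns * thumb_width))
        if scale >= min_scale:
            return columns, scale
        columns -= 1  # too much shrinking needed, so let a column wrap
    # A single column shrinks as far as it has to
    return 1, min(1.0, container_width / thumb_width)


# 200px thumbnails, 3 columns requested, containers of decreasing width
for width in (600, 560, 500, 150):
    print(width, layout(width, 200, 3))
```

The key input is the container width rather than the window width, which is the part media queries can’t give you.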
Please post any support questions in the wordpress.org support forum for Shashin, not here.
I’ve been with ElectNext for a little over a year, and this past week was only the third time since I started that everyone in the company was in the same place, and the first time that it was for more than a day. There are currently 7 of us, roughly equally divided between New York, Philadelphia, and San Francisco. So a typical workday entails a good amount of time in Google Hangouts, which is a great tool for keeping a distributed team on the same page. But there are a couple things for which there is no substitute for spending time in person: one is building team relationships (here’s a great article on building distributed Agile teams), and the other is brainstorming around challenging problems. As good as Hangout is, and tools like RealtimeBoard, there’s still no substitute for a team putting their heads together in person around a whiteboard or big easel pad.
We rented a 4 bedroom/7 bed house on the north side of Lake Tahoe, right across the street from the lake. This was a workation, which means we put in at least as much work time as usual. But we also enjoyed our evenings and our surroundings. We each had a turn preparing dinner, and sat down most nights around 8:30 to eat, staying at the table until late into the night. And we took the day off on Friday for a hike up to one of Maggie’s Peaks.
Click the album cover below to see more great pictures!
If you use a mouse, hyperlinks, video conferencing, WYSIWYG word processor, multi-window user interface, shared documents, shared database, documents with images & text, keyword search, instant messaging, synchronous collaboration, asynchronous collaboration — thank Doug Engelbart
That quote is from one of Engelbart’s peers. It’s worth taking a few minutes to read the rest of his post, to learn about Doug Engelbart. Personal computing and the internet would not be what they are if it weren’t for his contributions.
About 14 years ago, when Maria and I worked at Stanford, we had dinner with him and his girlfriend, and another couple. He couldn’t have been more pleasant and down to earth. At the time I knew a bit about his history, but not the full extent of his contributions. And I left that dinner still not knowing – he was a modest man. Dave Crocker is someone who worked with him, and he wrote the following last night, after Engelbart’s daughter shared the news of his passing: “Besides the considerable technical contributions of Doug’s project at SRI, theirs was a group that did much to create the open and collaborative tone of the Internet that we’ve come to consider as automatic and natural, but were unusual in those days.”
But the mild-mannered computer scientist who created the computer mouse, windows-style personal computing, hyperlinking–the clickable links used in the World Wide Web–even e-mail and video conferencing, was ridiculed and shunted aside. For much of his career he was treated as a heretic by the industry titans who ultimately made billions off his inventions…
Engelbart is perhaps the most dramatic example of the valley’s habit of forgetting engineers whose brilliance helped build companies–and entire industries. CEOs fail to mention them in corporate press releases; they never become household names. Yet we use their products, or the fruits of their ideas, every day…
“We were doing this for humanity. It would never occur to us to try and cash in on it. That’s still where Doug’s mind is,” explains Rulifson, director of Sun’s Networking and Security Center…
Engelbart’s unwillingness to bend was in evidence when he met Steve Jobs for the first time in the early 1980s. It was 15 years since Engelbart had invented the computer mouse and other critical components for the personal computer, and Jobs was busy integrating them into his Macintosh.
Apple Computer Inc.’s hot-shot founder touted the Macintosh’s capabilities to Engelbart. But instead of applauding Jobs, who was delivering to the masses Engelbart’s new way to work, the father of personal computing was annoyed. In his opinion, Jobs had missed the most important piece of his vision: networking. Engelbart’s 1968 system introduced the idea of networking personal computer workstations so people could solve problems collaboratively. This was the whole point of the revolution.
“I said, ‘It [the Macintosh] is terribly limited. It has no access to anyone else’s documents, to e-mail, to common repositories of information, “‘ recalls Engelbart. “Steve said, ‘All the computing power you need will be on your desk top.”‘
“I told him, ‘But that’s like having an exotic office without a telephone or door.”‘ Jobs ignored Engelbart. And Engelbart was baffled.
“We’d been using electronic mail since 1970 [over the government-backed ARPA network, predecessor to the Internet]. But both Apple and Microsoft Corp. ignored the network. You have to ask ‘Why?”‘ He shrugs his shoulders, a practiced gesture after 30 frustrating years…
Here is a set of highlights from his famous 1968 demo of the systems his team developed, showing early versions of computer software and hardware we now consider commonplace. In the 8th video, he shows their online, collaborative document editing system, which looks like an early version of Google Docs. In the 3rd video, he describes the empirical and evolutionary approach they took to their development process. This was another of his ideas that the industry discarded, only to finally re-discover its value, more than 30 years later, as what’s now called Agile development.
The beta version of Shashin 3.4 is ready. It has a lot of front-end design changes, which means it needs testing in a variety of browsers. So if you’re comfortable installing WordPress plugins manually, please download the beta version from GitHub and give it a try (important note: rename the folder after you unzip it to “shashin”). It especially needs testing in everyone’s favorite browser, Internet Explorer. Note that since this isn’t a normal upgrade through the wordpress.org repository, you will need to deactivate and re-activate Shashin to update its settings.
The biggest new feature is responsive design. This was quite a challenge: since Shashin is a plugin that should work with any theme, the thumbnails it displays need to be responsive to their containing element (the <div> containing a post). This means I can’t just rely on media queries, as they require knowledge of the entire theme layout. So the first thing to note is that if your theme isn’t responsive, Shashin won’t be either. If your theme is responsive, Shashin thumbnails will shrink as the available width decreases. I also tried to find a happy medium for honoring the number of columns you specify for displaying your thumbnails. The rule I’m applying is this: if the thumbnails shrink to less than 80% of their intended size, then the columns will “float”, meaning that the number of columns will go down as the page gets narrower. Also, Shashin detects browser resizing, so you can expand and contract the width of your browser to see how Shashin responds.
Captions now overlay the bottom of the thumbnails, instead of appearing below them. A rule I’ve applied to displaying captions is that they will not appear if they would cover more than 30% of the image.
I’ve improved the browsing experience when you are paging through albums that contain a large number of photos. The “previous” and “next” links will scroll you to the top of the next thumbnail set as you page through them. I’ve added the navigation controls to the bottom as well, which several people have requested.
There are various other updates as well. The complete list is in the Change Log in the readme file.
Please use the comments section on this post for any feedback.
After my WordCamp Nashville presentation, I transitioned from talking about how to write clean code, to talking about how the web is transforming the world of journalism, and what it means for civic engagement. This was the topic of the BarCamp NewsInnovation talk two weeks ago in Philadelphia given by Dave Zega and me (we work together at ElectNext). I also presented a longer, more in-depth version at TransparencyCamp in Washington, DC last week, with our CEO, Keya Dannenbaum.
Both conferences were “unconferences,” which means there’s an emphasis on discussion rather than long presentations, and the schedule is determined by the conference participants themselves, on the morning of the conference. However, both had some pre-scheduled talks, including ours.
The TransparencyCamp talk was titled “Civic engagement, local journalism, and open data.” Here’s the summary:
A fundamental purpose of journalism in the United States is to inform citizens, so that they can effectively engage in democratic self-governance. The ongoing disappearance of local newspapers in the digital era is well known, resulting in the decline of traditional watchdog journalism at the local and state levels. There are discussions of “news deserts” and unchecked malfeasance by elected officials. At the same time, we’re seeing the rise of citizen journalists, the growth of organizations that harvest, enhance, and distribute an ever-expanding range of data on government activities, and the creation of new opportunities to share, discuss, and analyze information vital to civic engagement.
For the goals of achieving government transparency and effective self-governance, what has been lost and what has been gained in all these transformations? Is the net effect positive or negative, and what lies ahead? In this talk we’ll lay out the different arguments in this debate, and we’ll engage the audience in the conversation.
I was really impressed by the quality of the audience questions at both conferences, and their engagement with Twitter. Our talk generated over 40 tweets at Transparency Camp. Here are samples from both talks:
@MobileTrevor Result of losing local news is fewer voters, lower civic participation, increased corruption, etc says @mtoppa #TCamp13
@zpez how can you maintain local engagement after an acute issue is resolved? build stronger networks; tap into the ppl w/ the data #TCamp13
@_anna_shaw The ‘digital political baseball cards’ from @ElectNext are pretty darn cool… Gonna be playing around with these later. #TCamp13
@ianfroude Local papers dying, so ‘ppl have gained access to the world (intl/natl papers) but lost access to their backyard’ #TCamp13
@jmikelyons: Politicians know everything about us, we know little about them. The Big Data Divide. Big civic problem #bcni13
@emmacarew #bcni13 impressive: folks at @electnext are working directly with the mayor’s office to makes data not just available but accessible
Transparency Camp was the larger of the two – over 600 people attended. Some traveled quite a distance to be there. In our talk we had questions from people involved in the media from as far away as Poland and Uganda.
Both conferences had a great sense of community. Many of the conversations I heard around me were similar to conversations we have at ElectNext, about how to bring greater transparency to government activities, and making open government data accessible and useful. I also had an unexpected but very welcome encounter: while passing through a crowd I heard a nearby voice say “hey Mike Toppa,” and turned to see a face I hadn’t seen in over 10 years. It was a former co-worker from my time at HighWire Press. He works at the Sunlight Foundation now. It was great to catch up and compare notes on our work. After the conference, I also got to catch up with my old friends Pat and Emma, from my days at Georgetown.
Here are the videos for both talks. If you only have time for one, I recommend the TransparencyCamp talk (the first one below). Below the videos are my summaries of the sessions I attended at Transparency Camp.
Transparency Camp Notes
These are my own brief summaries of the talks I attended. Most sessions had note takers, and their notes are at the TransparencyCamp site.
- Electoral districts API talk: this was an overview of different initiatives out there, and pros and cons of different approaches. If you use maps to determine districts, you can do things like determine a district from a geo-location. But you can’t disambiguate things like apartment buildings that are split between districts, which is actually fairly common (often by odd/even apt numbers or by floor). This is called “packing” or “cracking”, depending on the goals of the gerrymandering (to either dilute or concentrate the voting power of a group of voters, and/or aid or hinder turnout efforts). District boundaries can also vary for state rep vs state senator, etc. At a technical level, using maps is easier. Addresses are harder because of the volume of data involved and you can’t rely on geo-location. Google is building up data based on addresses; most others are using maps.
- A new project for city and state level engagement from opengovernment.org: they’re releasing a platform soon for facilitating citizen engagement with city councils, state reps, etc. It includes a petitioning system and lets elected officials register their own accounts, for direct online interaction with constituents. It also allows for entering info on legislation, etc, but isn’t a legislation management system.
- “Municipal Open Gov efforts don’t scale down” – this was a discussion of the challenges of providing open gov in smaller cities, which don’t have the resources of big cities like Philly, Boston, etc. Short version: the only way to make this happen is to provide systems that help solve real city management problems (i.e. transparency for transparency’s sake isn’t going to happen if it means creating more work for already overworked staff) and give those systems an open api, so openness requires no additional effort.
- Tracking shadow campaign money: this was led by Robert Maguire from OpenSecrets. It was fascinating but depressing: after the Citizens United decision, it’s become almost impossible to track hundreds of millions of dollars in campaign money. He described a complex set of schemes involving phony non-profits and other front organizations where money is moved around repeatedly so it’s hard to track. The FEC and IRS requirements are so minimal now, it’s hard to tell where the money is coming from or how it is spent. At OpenSecrets they are able to give at least some top-level figures through IRS records, but often only a year after the fact. So they can get a rough sense of how much is being spent in total through this new shadow system, but they can’t get many specifics.
Update: here is the wordpress.tv recording of my talk. It spent several months featured on the wordpress.tv homepage:
Spring is conference season, and I’ve given four presentations in the past four weeks: two in Philadelphia, one in Nashville, and one in Washington DC. Each presentation was different, and I did most of the preparation outside of my regular work hours, so I’m looking forward to not doing any more presentations for a while 😉
I already wrote about the first presentation – Knowledge Slam, and a few days after that I headed to Nashville for their 2nd annual WordCamp. I also presented at the first one last year, which was my first time in Nashville. For both trips I was there for only a couple days, but I was able to get out and see some of the city each time, and I have to say it’s a great place. It’s a small, clean city, with very friendly people, and has culture and arts you’d normally find only in a bigger city… as long as you like country music.
My friend Caryn from grad school lives there now, and after I arrived Friday evening, I headed to the Station Inn to meet her, and see a show by Eric Brace and Peter Cooper. I’d never heard of them before, but Caryn was a fan, and after hearing the first song, so was I. Here’s a version of that song – “Ancient History” – that they recorded for Couch by Couchwest:
…If you liked that, I recommend the album.
The WordCamp was great. It had 3 tracks scheduled – one for beginners, one for users, and one for developers (a 4th was actually added on the fly, to accommodate the variety of skill levels in the beginner track). I spent the day in the developers’ track. Something I was excited to see in several of the presentations was a wider focus, showing WordPress as part of a broader ecosystem of development tools, as opposed to being the only tool in a developer’s toolkit. This came across especially in the talk about using WordPress in an enterprise software environment (unfortunately there is no information about this talk online), and Nathaniel Schweinberg’s talk on debugging strategies (many of which apply beyond WordPress).
My Clean Code talk was scheduled between those two, which was perfect, as the 10 techniques I presented are ones which you can apply to any software development project, not just WordPress. My talk went really well, with lots of good questions at the end. We even went over our scheduled time (normally that’s not allowed, but I was right before lunch, so it didn’t take away from anyone else’s speaking time). Here are some of the tweets people made during my talk:
Here are my slides, as well as the recording of my talk I made with my Flip camera (a professionally recorded version is now on wordpress.tv)
I presented at the Philadelphia Knowledge Slam tonight on job satisfaction and Agile. It was a lot of fun! The hardest part was putting together a coherent presentation that fit within the strict 5 minute limit, with no slides allowed. There were 10 great presentations on a wide variety of topics: the songs of robins, the latest innovations in genetic treatments for sickle cell disease, screenwriting, cultural myths and personal myths, baking, tips for networking, the mis-measuring of educational achievement, and more.
This was my first time going – Knowledge Slam is held the 3rd Wednesday of every month. Check out the Facebook page for more info.
Short clips of each presenter were recorded. Here’s mine, followed by my complete script.
About 4 years ago I read a book by Malcolm Gladwell called “Outliers: the Story of Success.” Buried in the middle of that book he wrote a few paragraphs that, for me, were the most important part of the story. He described the 3 things that make a job rewarding. The things that make you look forward to a day at work when you get up in the morning.
First is reward for effort – this means money of course, but it also means recognition. We want our boss and our co-workers to let us know we’re doing a good job.
Second is having challenging work – work that isn’t routine and boring, but isn’t so hard that it becomes frustrating. Work that’s in that sweet spot in between, where the work engages your skills and makes you feel that you are learning and growing.
So those first two are pretty straightforward. The third one is the most interesting to me: a rewarding job is one that gives you autonomy. You have a feeling of control over your work, and you feel that your actions and decisions are meaningful. You can make things happen without someone second-guessing you all the time. It’s the opposite of feeling like a cog in a machine.
This struck a chord with me because at the time I wasn’t really happy in my job. I create web sites and web applications for a living. I’ve been doing it since ancient times – the early 1990s – when the first web pages were painted on cave walls in bison blood. And I wasn’t alone in feeling this way. Job satisfaction surveys of Americans show that between half and three quarters of Americans are unhappy in their jobs. If you consider that we spend about half of our waking lives at work, that’s a depressing statistic.
So I decided it was time for a change, and I made a terrible, terrible decision – I went into management. I joined the ranks of the people who are ultimately responsible for all those unhappy workers. I figured, there must be a better way to do this. So I did my homework, and I started learning about this thing called Agile, with a capital A. It’s a way of managing work that originated in the software industry and has been spreading to other types of work. And it’s got a great name, who doesn’t want to be agile?
But I learned it’s more than just a buzzword. Learning and following Agile practices made me fall in love with my work all over again. I would need to talk for at least an hour to explain how it all works, but since I just have a few minutes, I’ll focus on the part that relates to this idea of autonomy. In a lot of workplaces, you have responsibility, and your boss has authority. You don’t have autonomy. Managers talk about being results-oriented, but most are really more focused on control. Since you don’t have autonomy, you may not be motivated to do great work, so you’re given more policies and procedures to follow. The end result is management gets work that meets a consistent but minimal level of quality, and you don’t get a whole lot of job satisfaction. The undercurrent here is a lack of trust.
So how does Agile fix this? First, it gets management’s focus where it should be: on results, not control. And it provides some new ways of measuring progress and results that don’t depend on micro-management. And second, it adjusts people’s roles, so you actually have authority over the things you are responsible for. It gives you autonomy. It’s really about training management to get out of the way for the day-to-day work, to foster a learning environment, and to step in only when help is needed. It means treating people like adults, and creating an environment of trust.
And when you have trust, great things can happen. People start working together and pooling their skills to solve problems. This happened recently at General Electric. They had a water heater that was made in China. Here in the US a team of engineers, factory line workers, even sales and marketing people, all got together and completely redesigned it. By pooling their skills and experience they came up with a new design that was so much less expensive to manufacture, GE moved the manufacturing for the water heater back to the US, creating jobs here, and lowered its retail price by $300.
At the end of the day, it’s not policies and procedures that get the credit for good work and great products, it’s enthusiastic and empowered people.
In his recent post The Dire State of WordPress, James Shakespeare predicted doom for WordPress if it doesn’t undergo radical architectural change. Henri Bergius followed up with a similar argument in Why WordPress needs to get Decoupled. I appreciate where they are coming from: I can say without hyperbole that Bob Martin’s book Clean Code changed my life (I even had him sign it for me), but as a WordPress developer, I can also say the approach they’re taking is counter-productive.
Bergius advocates rewriting WordPress into multiple, separate software components, and Shakespeare calls for “…a fundamental rethink of the entire platform. We’re talking thousands of man-hours for no direct financial reward… But this doesn’t mean it couldn’t or shouldn’t be attempted.”
There are three big problems with thinking about it this way:
- Chasing the “grand rewrite in the sky” is typically a recipe for disaster. Being aware of this is a tenet of the Agile approach, where “the boy scout rule” is preferred (leave the code better than you found it, by making continuous incremental improvements). Here’s Bob Martin describing what happens when you have a team still maintaining the old code on a project (since you can’t just abruptly abandon it), and a “tiger team” working at the same time on the grand rewrite:
Now the two teams are in a race. The tiger team must build a new system that does everything that the old system does. Not only that, they have to keep up with the changes that are continuously being made to the old system. Management will not replace the old system until the new system can do everything that the old system does. This race can go on for a very long time. I’ve seen it take 10 years. And by the time it’s done, the original members of the tiger team are long gone, and the current members are demanding that the new system be redesigned because it’s such a mess.
- Check out Yoast’s infographic on WordPress’ popularity: 72.4 million sites running WordPress, and 20-25% of all new websites are using WordPress. There are over 19,000 plugins in the wordpress.org repository and over 1,700 themes (with probably thousands more not in the repository). With that kind of user base, backwards compatibility and stability take precedence over everything else. Therefore, architectural change must be gradual, low-risk, and backwards compatible.
- When you’re not actively involved with WordPress, and you’re telling people who live and breathe WordPress development that what they’re doing is awful, it’s hard for that to not come across as antagonistic, and therefore be unproductive. There is a huge community of developers and designers who make their living coding for WordPress, and really enjoy it. While feedback and guidance from outside the community can and should be welcome, one-off prognostications of doom at some vague point in the future are likely to be met with a shrug.
Having said all that, there are some things they’re right about, and there are a couple practical steps forward I’d like to recommend:
Bergius observes that he doesn’t see WordPress developers at non-WordPress conferences, and relates a similar observation from Symfony activist Lukas Smith. This fits my experience as well. I see a very real gap between the WordPress development community and the mainstream object-oriented coding community. The WordPress community is big enough that it can get away with being insular – probably every major city in the US (not to mention the rest of the world) has a WordPress meetup group and an annual WordCamp. I’ve spoken at several WordCamps, and a common reaction to my talks is that people are intrigued, and sometimes mystified, that my talks are about various aspects of how to shoe-horn object-oriented design into WordPress plugins. I also attend the Philadelphia ETE conference every year, which is the annual tech event in Philadelphia, and I’ve attended various Agile conferences in other parts of the country. Developers in Java, Ruby, C++, JavaScript, Scala – you name it – they’re all there, attending each others’ sessions and learning from each other. I’m a gregarious guy, and I have yet to meet another WordPress developer at any of them.
One of the reasons WordPress is so successful is that it’s such a great platform for hacking. You don’t need a degree in computer science or years of experience to get started with it, and once you learn some basic techniques and WordPress best practices, you can be productive and successful with it. So even if a huge rewrite were a practical idea, it would be a really bad idea to introduce an architecture with many layers and abstractions, as it would make the barriers to entry too high (and if someone wants to use a full-fledged web development framework, then they can use a framework – WordPress doesn’t need to become one).
However, I strongly feel that the WordPress architecture needs to become more open to modern object-oriented design practices. In my previous position at the University of Pennsylvania, I hired many developers, and interviewed some who primarily had WordPress experience, and it did not prepare them well for working outside of WordPress. They had acquired habits they needed to un-learn. I’m talking about routinely using global variables; using classes only as giant buckets of vaguely related functions; using constructors for more than just initialization; etc. As far as I know, these are design practices you really don’t see anymore in well-established development environments, except WordPress.
The WordPress architecture forces awkward design choices for developers trying to take an object oriented approach. It drives them to use Singletons to deal with all the global state (a choice I can understand but disagree with). It requires a laborious setup to do automated testing (and even after you set it all up, you still can’t do unit testing – you’re doing integration testing). In general, it forces a style of development that is very particular to WordPress, as opposed to an architecture independent style that can be commonly applied to other modern development environments.
The complaint that learning object oriented design is too hard or not worth the trouble doesn’t make sense to me. WordPress developers learn all kinds of arcane and complex practices to be successful with it. Why not instead make WordPress more compatible with mainstream development practices? Then learning those practices will allow you to be successful with more than just WordPress.
I understand how and why WordPress evolved the way it did, but I haven’t yet heard a compelling reason why the approach taken to its architecture should forever remain apart from the practices of the mainstream development community. Backwards compatibility can be provided while gradually shifting the organization of the core code.
The key step that needs to be taken is providing better encapsulation. Here’s a simple example in PHP for moving away from globals but still providing backwards compatibility (more than just that would be needed to get a handle on all of WordPress’ globals, but with some planning it should be feasible). Hook and filter functions could become thin wrappers for instance methods (through the use of an injection container you can avoid unnecessary re-instantiation of objects). Doing these two things would provide backwards compatibility (so nothing breaks, and hackers can still hack), and at the same time would open the door to decoupling, real unit testing, and object-oriented design (without needing to argue about Singletons).
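A minimal sketch of both ideas, under stated assumptions: every name here (`SiteSettings`, `Container`, `ExcerptTrimmer`, `app_container()`, `trim_excerpt_filter()`, and the `legacy_settings` global) is hypothetical and chosen for illustration. None of them are WordPress APIs, and this is not how WordPress core is organized today.

```php
<?php
// Hypothetical sketch: none of these names exist in WordPress.

// A legacy global that existing plugins and themes may still read directly.
$GLOBALS['legacy_settings'] = array('excerpt_length' => 3);

// Encapsulated replacement for the global: new code asks this object instead.
class SiteSettings
{
    private $values;

    public function __construct(array $values = array())
    {
        $this->values = $values;
    }

    public function get($key, $default = null)
    {
        return array_key_exists($key, $this->values) ? $this->values[$key] : $default;
    }
}

// A tiny injection container: each service is built once, then reused.
class Container
{
    private $factories = array();
    private $instances = array();

    public function register($name, callable $factory)
    {
        $this->factories[$name] = $factory;
    }

    public function get($name)
    {
        if (!isset($this->instances[$name])) {
            if (!isset($this->factories[$name])) {
                throw new InvalidArgumentException("Unknown service: $name");
            }
            $this->instances[$name] = call_user_func($this->factories[$name], $this);
        }
        return $this->instances[$name];
    }
}

// The real logic lives in an instance method with its dependency injected.
class ExcerptTrimmer
{
    private $settings;

    public function __construct(SiteSettings $settings)
    {
        $this->settings = $settings;
    }

    public function trim($text)
    {
        $length = $this->settings->get('excerpt_length', 55);
        $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
        return implode(' ', array_slice($words, 0, $length));
    }
}

// Wiring happens in one place; the legacy global only seeds the settings object.
function app_container()
{
    static $container = null;
    if ($container === null) {
        $container = new Container();
        $container->register('settings', function () {
            return new SiteSettings($GLOBALS['legacy_settings']);
        });
        $container->register('excerpt_trimmer', function (Container $c) {
            return new ExcerptTrimmer($c->get('settings'));
        });
    }
    return $container;
}

// The old procedural filter callback survives as a thin wrapper.
function trim_excerpt_filter($text)
{
    return app_container()->get('excerpt_trimmer')->trim($text);
}
```

Existing code can keep calling `trim_excerpt_filter()` or reading the global, so nothing breaks. A unit test, on the other hand, can build `new ExcerptTrimmer(new SiteSettings(array('excerpt_length' => 2)))` directly, with no WordPress bootstrap and no global state. The static container in `app_container()` is still a single shared instance, but it is confined to the procedural wrapper layer rather than spread through the classes themselves.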
The WordPress community is big and vibrant. Philly WordCamp 2011 was the first WordCamp I attended, and I was thrilled by all the energy and enthusiasm. But from what I can see, WordPress has isolated itself, both as a community and as a web development environment. I don’t see a benefit for anyone in perpetuating this situation, and I see lots of potential benefits for WordPress joining the mainstream development community.
I’m speaking at WordCamp Nashville in a few weeks, where I’ll touch on some of these topics, and I’m happy to discuss this further with anyone interested.
I’ve added handling for unpublished posts to Post to Post Links II – see the new “Handling Unpublished Posts” section of the plugin page.
I’ve updated Extensible HTML Editor Buttons with a couple bug fixes, and I’ve removed its dependency on my Toppa Plugin Libraries for WordPress plugin. I will be removing this dependency for all my plugins. Having a plugin for re-usable code seemed like a good idea, but supporting it has proven difficult. I’ve learned the hard way that managing dependencies between plugins is a fragile process in WordPress, and has caused frustration for users. So going forward my plugins will be self-contained.
Next up is a Shashin update. PrettyPhoto is a fantastic photo viewer, and now that it’s GPL compliant, I’m integrating it with Shashin.
To reiterate what I’ve said before, I’m using the wordpress.org support forums now to provide support. Please post any questions there: