- Moving up the stack, Sears is consolidating its databases to MySQL, InfoBright, and Teradata--EMC Greenplum, Microsoft SQL Server, and Oracle (including four Exadata boxes) are on their way out, Shelley says.
- "The Holy Grail in data warehousing has always been to have all your data in one place so you can do big models on large data sets, but that hasn't been feasible either economically or in terms of technical capabilities," Shelley says, noting that Sears previously kept data anywhere from 90 days to two years. "With Hadoop we can keep everything, which is crucial because we don't want to archive or delete meaningful data."
- "ETL is an antiquated technique, and for large companies it's inefficient and wasteful because you create multiple copies of data," he says. "Everybody used ETL because they couldn't put everything in one place, but that has changed with Hadoop, and now we copy data, as a matter of principle, only when we absolutely have to copy."
- Shelley sees Hadoop as part of a larger IT ecosystem, too, and says systems such as Teradata will continue to have an important, focused role at Sears. But he's on the far end of the spectrum in terms of how much of the legacy environment Hadoop might replace. Countering Shelley's sometimes sweeping predictions of legacy system replacement, Mike Olson, CEO of Cloudera, says: "It's unlikely that a brand-new entrant to the market [like Hadoop] is going to displace tools for established workloads."
- MetaScale also offers data architecture, modeling, and management services and consulting. The big idea behind Hadoop is to bring in as much data as possible while keeping data structures simple. "People want to overcomplicate things by representing data and dividing things up into separate files," says Scott LaCosse, director of data management at Sears and MetaScale. "The object is not to save space, it's to eliminate joins, denormalize the data, and put it all in one big file where you can analyze it." It's an approach that's counterintuitive for a SQL veteran, so a big part of MetaScale's work is to help customers change their thinking: You apply schema as you pull data out to use it, rather than take the relational database approach of imposing a schema on data before it's loaded onto the platform. Hadoop holds data in its raw form, giving users the flexibility to combine and examine the data in many ways over time.
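The schema-on-read idea LaCosse describes can be sketched in a few lines of Python. This is only an illustration, not how Sears or MetaScale implement it: the sample records, the field names, and the helpers (`read_with_schema`, `revenue_by_store`, `units_by_sku`) are all hypothetical, and a plain in-memory list stands in for raw files held on Hadoop.

```python
import csv

# Hypothetical raw records, kept exactly as they arrived.
# Nothing is validated, typed, or split into tables at load time.
RAW_LINES = [
    "2012-11-01,store-17,SKU-1001,2,19.99",
    "2012-11-01,store-17,SKU-2040,1,5.49",
    "2012-11-02,store-03,SKU-1001,1,19.99",
]

# Assumed schema: (field name, converter) pairs, applied only when reading.
SALES_SCHEMA = [
    ("date", str),
    ("store", str),
    ("sku", str),
    ("qty", int),
    ("price", float),
]


def read_with_schema(lines, schema):
    """Parse raw lines and apply the schema at read time."""
    for row in csv.reader(lines):
        yield {name: convert(value) for (name, convert), value in zip(schema, row)}


def revenue_by_store(lines):
    """One question asked of the raw data: revenue per store."""
    totals = {}
    for rec in read_with_schema(lines, SALES_SCHEMA):
        totals[rec["store"]] = totals.get(rec["store"], 0.0) + rec["qty"] * rec["price"]
    return totals


def units_by_sku(lines):
    """A different question asked of the same raw data: units per SKU."""
    totals = {}
    for rec in read_with_schema(lines, SALES_SCHEMA):
        totals[rec["sku"]] = totals.get(rec["sku"], 0) + rec["qty"]
    return totals


if __name__ == "__main__":
    print(revenue_by_store(RAW_LINES))
    print(units_by_sku(RAW_LINES))
```

Both functions read the same untouched records, and each record already carries everything needed, so neither question requires a join. A new question later on only needs a new reader, not a reload of the data.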
Thursday, November 15, 2012
Why Sears Is Going All-In On Hadoop is an interesting, if ‘rose coloured’, view of Hadoop from Phil Shelley, CTO at Sears. Note that he also leads a Sears subsidiary called MetaScale – which is offering Big Data architecture, consulting and services to companies outside the retail space.
A few choice quotes:
I was having a coffee with a friend last week and the conversation turned to the latest trends in technology – as it often does. His view was that ‘Big Data’ was just another in a long line of over-hyped technologies, aimed more at selling the shiniest new product than solving some real-world problem.
I think that the Big Data term is really a shorthand way of describing the escalating amount of data being generated by the actions of people and their devices as they interact with each other and the world at large. Every time we use a web-site, smartphone or other electronic service, data is created and collected – to understand our behaviour, predict what we’d like to buy or where we’ll go, or perhaps show a relevant advertisement.
An even larger amount of data is beginning to be created by the ‘internet of things’ – a term used to describe the invisible devices and sensors all around us in our vehicles and transport systems, communications and power grids, which collect and report on the health of these environments. For example, engines in the latest commercial aircraft capture a large volume of performance data and report any abnormal operation in real-time via satellite links. Current car models can already report back if they are involved in a crash or require roadside assistance; collecting engine and performance data can’t be too far in the future.
As the cost of collecting and storing this data continues to drop, it doesn’t take too much imagination to see the value in being able to analyse more fine-grained data on power consumption, real-time traffic, when and where we buy products, or whatever we can imagine being sensed and measured. Having this newly available data can lead to discovery of previously unknown patterns of behaviour or relationships – telling us about a new artist, restaurant or author, a nearby bargain or a group who share our passion.
So, even if you think Big Data is just an over-hyped buzzword, a tremendous and ever-growing variety and volume of data is being created by our use of web-sites, devices and sensors. I don’t think this trend is likely to slow down in the foreseeable future, as ever more of our interactions move into the digital realm.
The era of Big Data is with us, no matter what we call it.
Monday, July 11, 2011
Nicholas Carr's Blog: Semidelinkification, Shirky-style:
But I did manage to read a sizable chunk of it before clicking the Instapaper 'Read Later' button (a terrific way to avoid reading long stuff without having to feel guilty about it). It was a solid piece, as you'd expect from Shirky, if marred a bit by an unappealing new-media elitism (apparently the great unwashed never made it past the sports pages). But what interests me at the moment is not the content of Shirky's post but its form, particularly the form of its linkage.
Saturday, September 04, 2010
Sunday, May 16, 2010
Friday, May 14, 2010
Saturday, May 01, 2010
Wednesday, April 28, 2010
Monday, April 26, 2010
Sunday, April 25, 2010
Sunday, April 11, 2010
Saturday, February 13, 2010
Dave Kellogg, CEO of Mark Logic, says the real question for FAST customers is: “what next?”
Microsoft / Fast Drops Linux and Unix Support. Should You Turn to MarkLogic As A Replacement? | Kellblog
Saturday, February 06, 2010
FAST ESP on non-Windows platforms was doomed from the moment Microsoft acquired FAST. The blog entry Microsoft Enterprise Search Blog : Innovation on Linux and UNIX confirms it:
Five years 'mainstream' support for FAST ESP 5.3 and then five more years of 'extended' support.
With our 2010 products scheduled for release in a few months, we’ve just started to plan for our next wave of products. As a part of that planning process, we have decided that in order to deliver more innovation per release in the future, the 2010 products will be the last to include a search core that runs on Linux and UNIX.
Sunday, January 31, 2010
The new UK Government ICT Strategy:
"The need to transform public services and to fully exploit ICT to achieve this is accelerating. To meet increasing demand within this complex technology arena, the UK public sector has built an ICT infrastructure that in many instances duplicates solutions across different areas of Government. The ICT strategy will ensure that the infrastructure will go through a process of standardisation and simplification based on the premise of a common infrastructure designed to enable local delivery suited to local needs. Delivery will increasingly be through partnerships between the public, private and third sectors and the strategy enables greater interoperability to underpin this model. The strategy applies to all of the UK Public Sector, whether Central Government, Local Government, Wider Public Sector or Devolved Administrations. It provides a common approach to ICT that maintains local accountability and control over implementation to meet unique delivery and business requirements."
There are fourteen strands to the strategy:
- The Public Sector Network
- The Government Cloud (G-Cloud)
- Data Centres
- Government Applications Store (G-AS)
- Shared Services
- Desktop Services
- Architecture and Standards
- Open Source, Open Standards, Reuse
- Greening Government ICT
- Information Security & Assurance
- Professionalising IT enabled change
- Reliable Project Delivery
- Supply Management
- International Alignment