The future belongs to the companies and people that turn data into products

Are you looking at your hidden assets, how usage of your data creates new data? Are you mining it and creating newer products? This post speaks to all of us. It also talks about

The future belongs to the companies who figure out how to collect and use data successfully. Google, Amazon, Facebook, and LinkedIn have all tapped into their datastreams and made that the core of their success. They were the vanguard, but newer companies like bit.ly are following their path. Whether it’s mining your personal biology, building maps from the shared experience of millions of travellers, or studying the URLs that people pass to others, the next generation of successful businesses will be built around data. The part of Hal Varian’s quote that nobody remembers says it all: The ability to take data, to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it, that’s going to be a hugely important skill in the next decades. Data is indeed the new Intel Inside.

via What is data science? – O’Reilly Radar.

Data science democratized

Interesting post. I always wondered how people might use all the scatter diagrams and other data visualization graphs that are available in various scientific fields. Here’s a thought from Mac Slocum at O’Reilly Radar.

Significant implications emerge when you can bounce a question, even an innocuous one, against a huge storehouse of data. If someone like me can plug questions into a system and have it do the same kind of processing once reserved for a skilled minority, that will inspire me to ask a lot more questions. It’ll inspire a lot of other people to ask questions, too. And some of those questions might even be important.

via Data science democratized – O’Reilly Radar.

My Data, Your Data, Our Data – WSJ.com

In the age of Facebook, Twitter, and Wikipedia, it is hard to believe there is still one group that prefers to be more circumspect about sharing: scientists.

Scientists worry that if they share data before publishing their findings, someone else might claim credit for a discovery they made. And even after they mine information for themselves, they frequently cling to the notion that more may be discovered, and so continue to hoard the data.

“Data is what scientists use to establish their reputation,” says Thomas A. Finholt, a research professor and associate dean for research and innovation at University of Michigan. “There is no incentive for opening up access.”

Now an ambitious project has been launched to try to change this traditional approach. Sage Bionetworks, a nonprofit with offices at the Fred Hutchinson Cancer Research Center in Seattle, is driving an effort to build an open-source collaborative effort it calls Sage Commons, a place where data and disease models can be shared in the hopes of deepening scientists’ understanding of disease biology. To succeed, its founders acknowledge, will require not just data, but a huge cultural shift.

via Sharing Databases for Disease Research – WSJ.com.

It’s All Semantics: Open Data, Linked Data & The Semantic Web

This has been on my mind for a while: connecting the dots between all three of these concepts. I’m working on a more detailed post. I am glad to see that Richard MacManus from RWW just posted a related article (It’s All Semantics: Open Data, Linked Data & The Semantic Web), and I think it’s very useful.

Open data:

data on the site is available to the public, but it doesn’t link to other data sources on the Web. It could be data that has been uploaded in CSV format.

How is it different from Linked data:

Open Data is simply ‘data on the web,’ whereas Linked Data is a ‘web of data.’

So presumably Linked Data is data that links to other data.

Good. So how is linked data different from semantic data? Or are they the same? This, from the RWW post, is pretty good:

Campbell quotes from a number of other articles, in trying to come to a conclusion about how Linked Data and the Semantic Web relate. Perhaps the best definition she found was this one by Paul Walk:

  1. data can be open, while not being linked
  2. data can be linked, while not being open
  3. data which is both open and linked is increasingly viable
  4. the Semantic Web can only function with data which is both open and linked

So point #4 explains the main difference. Bottom line: data wants to be open and linked. When that happens, it will enable the development of the Semantic Web.
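
To make Walk's four points concrete for myself, here is a rough Python sketch. Everything in it is my own assumption for illustration: the Dataset class, its field names, and the example.org URIs are hypothetical and come from neither the RWW post nor Walk. It treats "open" and "linked" as two independent properties and only calls a dataset ready for the Semantic Web when both hold.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Toy model of a published dataset (hypothetical, for illustration only)."""
    name: str
    is_open: bool                               # can the public access it?
    links: list = field(default_factory=list)   # URIs of other data it points to

    @property
    def is_linked(self) -> bool:
        # "Linked" here simply means it references at least one other data source.
        return len(self.links) > 0

    @property
    def semantic_web_ready(self) -> bool:
        # Walk's point 4: the data must be both open and linked.
        return self.is_open and self.is_linked


# Point 1: open but not linked (for example, a CSV file uploaded to a site).
csv_dump = Dataset("city-budget.csv", is_open=True)
# Point 2: linked but not open (for example, data behind a firewall).
intranet_graph = Dataset("internal-graph", is_open=False,
                         links=["http://example.org/staff"])
# Point 3: both open and linked.
public_graph = Dataset("public-graph", is_open=True,
                       links=["http://example.org/places"])

for ds in (csv_dump, intranet_graph, public_graph):
    print(ds.name, ds.is_open, ds.is_linked, ds.semantic_web_ready)
```

Only the last dataset passes the final check, which is all point #4 is really saying.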

Good stuff!

Open government data Top 10 visualisations and applications

As the government opens up its data, here are some of the best things that have been done with it by developers so far

via The top 10 government data visualisations and applications | News | guardian.co.uk.


Open access to government data

I found this on Abhishek’s blog (titled Default Openness)

Here are his thoughts on it. I really like the concept so long as the government takes care of privacy concerns.

During Wired’s recent Disruptive By Design conference, Vivek Kundra, chief information officer (CIO) of the US government, suggested that the default data setting of the United States government should be open, not secret. With this vision in mind, Data.gov will democratize the data that is generated and kept by the US government, which has several implications. First of all, it will increase public access to high-value data, making the governance system more transparent, efficient, effective and accountable. Further, it will not only encourage the creative reuse of data outside government offices, but also make way for new ideas, applications and opportunities. Efforts like Sunlight Labs, which is trying to build open source technology and tools to facilitate transparent and accountable governance, are proof of concept for this initiative. Sunlight Labs’s Apps for America 2 challenge is attracting a big pool of creative developers to come up with compelling designs and solutions that can provide easy access to and deep insight into Data.gov data. Data openness is pushing hard towards a big cultural change; compelling evidence is the healthy competition between different departments of the US government to make more and more data freely available online. For more information about what Data.gov is all about and what the immediate benefits are, check out this video. Maybe someone should push for this kind of default openness in science as well.

from – http://www.abhishek-tiwari.com/2009/08/default-in-science.html

Here is the video from Wired.com

http://link.brightcove.com/services/player/bcpid1813626064?bctid=26648045001

All about data

In an earlier post (Researchers need data) I referred to Cameron Neylon’s call to build infrastructure and services that could capture the output of any research, i.e., the research data.

Add to that the arrival of Google Wave, and the playing field appears to have changed for research collaboration, the publishing process and the management of research data. Again, Cameron does a wonderful job of thinking through the usage scenarios: the first is the process of publishing a paper, and the second is the process of adding an interface to the lab record. It is a very nice read, and in many ways it helps me understand the research workflow at a high level.

So is this the beginning of the end for traditional publishing houses?

I don’t think so, but it does mean that publishers need to adapt to this new reality. If they ignore it, they will go the way of the newspapers. The only option is to acknowledge and accept the changing workflow: people searching Google for information, people collaborating on Twitter, FriendFeed, Facebook, etc., and soon people relying on workflow tools like Wave for additional collaboration and publishing (and I’m sure competing products will soon arrive).

The options for publishing houses are to either build their own competing solution or integrate with existing online tools. My bets are on the former, especially for large publishing houses. Regardless, the question is how much you are going to open up your interface to the rest of the world, and what tools you are going to provide to make your users more productive. The traditional model of offering a search engine on top of a repository of documents is not going to cut it. It is really important to understand how your users use the information you have and how they apply it to their workspace.

Recent technology trends bode well for customers. I have a feeling this is only the beginning of lots more exciting new things to come.