I've decided to start a new side project. Data mining is a subject that has intrigued me for some time now. A while back I wrote an automated eBay parser that alerted me to new auctions via IM every hour or so. That project was fun, but it violated so many of eBay's terms of service that I no longer use it.
I would like to build some kind of automated system for data retrieval, and I'm now in the initial phases of design and prototyping. The design needs to be flexible enough to adapt to changing conditions encountered during a data-gathering session, and it must also be controllable dynamically as my criteria change over time. Perhaps I'll integrate JBoss Rules (formerly known as Drools) as a way of applying rules to the data-gathering process. I'll also integrate it with our XMPP service, so I can get updates on my phone and control the process from anywhere.
I'm not quite sure exactly what I want the end product to do, but I know what I don't want it to be: a Google replacement. I'm not looking to index the web. I want something more like a "smart search" that runs continuously and updates me in real time (or close to it) as it discovers new data.
Anyway, I hope to have something useful in the end. If you have any comments or suggestions, drop me an email.