Press release

Findwise unveils their content processing framework “Hydra” as Open Source

Today, Sweden-based company Findwise unveils their content processing framework, Hydra, for search-driven solutions. Hydra is released to give clients, the market and the industry the best possible opportunity to meet the needs of today’s growing information landscape. Hydra is now open source, free for all to use, develop and benefit from.

Helge Legernes, founder and CTO at Findwise, says: "I am pleased and proud that a Swedish software company is providing such a leading technology in document processing, that even Silicon Valley is looking towards Sweden. Our goal is that Findwise technology will be the world leader in this field. This goal is the reason why we put out this framework as free open source. It is well suited for enterprise search solutions, and Big Data Applications."

What is Hydra?

When working with free text search, the quality of the data in the index is a key factor in the quality of the results delivered and has a major impact on the information consumption experience. Hydra is designed to give the search solution the tools it needs to modify the data to be indexed in an efficient and flexible way. It does this by providing a scalable and efficient pipeline that documents pass through before being indexed into the search engine.

The pipeline is an essential part of any search architecture, tasked with performing invaluable data enrichment. At Findwise we use the pipeline to perform all manner of tasks to power and enhance the search solutions we create. This might be a simple task like extracting headlines from documents, generating thumbnails of a document, or detecting the language it is written in. But it may also be more complex, such as Natural Language Processing tasks that add metadata about which entities are mentioned in a text (entity extraction) or determine whether a news article is positive or negative in tone (sentiment analysis). Having a good framework that is able to handle this properly is one of the key factors for success.
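As a rough illustration of the idea of a pipeline of enrichment stages, the sketch below chains two simple stages over a document. It is not Hydra's API: the names `extract_headline`, `detect_language`, `run_pipeline` and the tiny stopword lists are assumptions made up for this example, and the language check is deliberately naive.

```python
from typing import Callable, Dict, List

Document = Dict[str, object]
Stage = Callable[[Document], Document]

# Tiny illustrative stopword lists (assumption); a real detector would use a proper model.
STOPWORDS = {
    "en": {"the", "and", "is", "of"},
    "sv": {"och", "det", "är", "en"},
}


def extract_headline(doc: Document) -> Document:
    """Use the first non-empty line of the text as the headline."""
    lines = [line.strip() for line in str(doc.get("text", "")).splitlines()]
    headline = next((line for line in lines if line), "")
    return {**doc, "headline": headline}


def detect_language(doc: Document) -> Document:
    """Pick the language whose stopwords occur most often in the text."""
    words = str(doc.get("text", "")).lower().split()
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=lambda lang: scores[lang])
    language = best if scores[best] > 0 else "unknown"
    return {**doc, "language": language}


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    """Pass the document through each stage in order, collecting added fields."""
    for stage in stages:
        doc = stage(doc)
    return doc
```

Each stage only adds fields to the document, so stages can be reordered or extended without touching the others; the enriched document is what would finally be handed to the search engine.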

Hydra is designed to work with any data. Whether you want to search a folder of unstructured PDF documents, plain text web pages, or highly structured XML data generated by your CMS, Hydra will handle it all. We've also designed Hydra to be able to feed documents to any search engine or database, whether proprietary or open source.

Hydra Technical Details

Hydra revolves around a central data repository where all documents are kept while being processed, implemented with an instance of MongoDB, a document-oriented database. Connected to this database are the processing nodes, each a standalone Java application that runs the processing stages the user has configured. The configuration is also stored in and read from MongoDB, allowing you to create more processing nodes by simply starting up the framework on a new machine. Hydra is designed to be:

  • Scalable: the central repository as well as the number of worker nodes can scale horizontally with little to no performance loss.
  • Distributed: any processing node can work on any document, and a single document may be processed on any number of physical machines.
  • Fail-safe: if a processing node goes down, this will not affect the documents in the pipeline, which are persisted centrally, and any other node can simply and automatically pick up where the other left off.
  • Robust: all stages run in separate JVMs, so that, for instance, Tika can crash in its own JVM, which is automatically restarted, without stopping the processing pipeline for less problematic documents.
  • Easy to use/configure: stages can be run from your IDE during development, allowing testing against the actual data in the repository.
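The fail-safe property described above can be sketched with a small in-memory stand-in for the central repository. This is only an illustration under stated assumptions, not Hydra's data model or API: the class `InMemoryRepository` and its methods `add`, `claim`, `complete` and `release_claims` are hypothetical names, and a plain Python dictionary takes the place of MongoDB.

```python
from typing import Dict, Optional, Set


class InMemoryRepository:
    """Hypothetical stand-in for the central document store."""

    def __init__(self) -> None:
        self.docs: Dict[int, dict] = {}
        self.done: Dict[int, Set[str]] = {}  # stages finished per document
        self.claims: Dict[int, str] = {}     # document id -> node working on it
        self._next_id = 0

    def add(self, fields: dict) -> int:
        """Store a new document and return its id."""
        doc_id = self._next_id
        self._next_id += 1
        self.docs[doc_id] = dict(fields)
        self.done[doc_id] = set()
        return doc_id

    def claim(self, node: str, stage: str) -> Optional[int]:
        """Hand the node an unclaimed document this stage has not processed yet."""
        for doc_id in self.docs:
            if stage not in self.done[doc_id] and doc_id not in self.claims:
                self.claims[doc_id] = node
                return doc_id
        return None

    def complete(self, doc_id: int, stage: str, updates: dict) -> None:
        """Persist the stage's output and free the document for other stages."""
        self.docs[doc_id].update(updates)
        self.done[doc_id].add(stage)
        del self.claims[doc_id]

    def release_claims(self, node: str) -> None:
        """Free every document held by a node that went down."""
        for doc_id in [d for d, n in self.claims.items() if n == node]:
            del self.claims[doc_id]
```

Because the document and its progress live in the repository rather than in the node, a claim held by a crashed node can be released and the same document claimed by another node, which then continues with exactly the state that was last persisted. In this simplified sketch a claimed document is locked for all stages at once.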

Findwise is a growing consultancy founded in 2005 by a team of experts from the enterprise search industry. We currently have offices in Sweden, Denmark, Norway, Poland and the Middle East. Findwise is on “Gasellföretag”, the list of Swedish companies to watch that is published annually by the leading Nordic business daily Dagens Industri.

We create and implement search-driven Findability solutions for intranets, web, e-commerce, social networking and applications. Our track record includes implementing hundreds of solutions that have provided easy access to all desired information from one or several points of access, aiding productivity, business processes, knowledge sharing and collaboration. Findability by Findwise is a holistic approach to meeting your various information access needs and maximising business value from search technology investments. Findwise is a vendor-independent solution expert with extensive knowledge and experience of leading search technology platforms: Autonomy IDOL, Microsoft (SharePoint and FAST Search products), Google GSA, IBM ICA/OmniFind, LucidWorks Enterprise and Apache Lucene/Solr (Open Source), to mention a few.

Our customers include: Ericsson, SKF, Armed Forces Sweden, Tine, Wolters Kluwer, Swedish Parliament, Johnston Press Plc, Vestas, Telenor, Telia, SEB, Securitas, DT Group, Husqvarna. 
