Thursday, 3 August 2017

(Elastic) Searching?

No matter how much effort you put into bringing your information together in a repository, it remains dead and worthless unless your users can find what they are looking for.

By default, Content Server comes with a number of options for searching content. Three, to be exact: metadata only, which basically means plain SQL queries; Database Fulltext, the older full-text indexing in the Oracle database; and OracleText.
Most people use OracleText when fulltext searching is needed.

There are a number of other search engines around and I have started to take an interest in the NoSQL engines that are freely available on the web.

The first one is Apache Lucene. You can get it for free and it does what it says it does: index content.

The second one, and the one with the most buzz at the moment, is ElasticSearch. ES is largely based on Lucene, but it offers a lot of additional features. ES is free to download and use; it's the add-ons that will cost you, like X-Pack, which lets you implement security on your index.

Now, I know that there are a number of integrations out there that already offer ES for WCC, but as far as I know most of them are based on a crawler principle.

I started a project of my own to build such an integration. I wanted to hook ES into WCC the same way OracleText is hooked in. So indexing a document is part of the release cycle, and OTS (Oracle Text Search) is not used at all. In the config.cfg file, SearchIndexerName is set to ELASTIC. Searching is done from within Content Server, and it is Content Server that remains in control of the security on the documents.
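
To make the security point concrete, here is a minimal sketch of how a search could be wrapped so that Content Server, not the user, decides which documents are eligible. It is an illustration under assumptions, not the integration's actual code: it assumes the index stores the document text in a field named dDocFullText and the security group in dSecurityGroup (both names are assumptions about the index mapping), and that Content Server supplies the list of groups the current user may read. It only builds the JSON body of a standard Elasticsearch bool query; sending it to the cluster is left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

/** Sketch: wrap a user's full-text query in a security filter chosen by Content Server. */
final class SecureQuerySketch {

    /** Escapes the characters that would break a JSON string literal. */
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Builds a bool query: the user's text must match, and the document's
     * security group must be one of the groups the server allows.
     * Field names are assumptions about the index mapping.
     */
    static String buildQuery(String userText, List<String> allowedGroups) {
        StringJoiner groups = new StringJoiner(",", "[", "]");
        for (String g : allowedGroups) {
            groups.add("\"" + jsonEscape(g) + "\"");
        }
        return "{\"query\":{\"bool\":{"
            + "\"must\":{\"match\":{\"dDocFullText\":\"" + jsonEscape(userText) + "\"}},"
            + "\"filter\":{\"terms\":{\"dSecurityGroup\":" + groups + "}}"
            + "}}}";
    }

    public static void main(String[] args) {
        // The group list would come from Content Server's own security check.
        System.out.println(buildQuery("annual report", Arrays.asList("Public", "Secure")));
    }
}
```

Because the group list is built on the server side and placed in the filter clause, nothing the user types can widen it. An empty group list produces an empty terms filter, which matches no documents, so the sketch fails closed.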

The second thing I wanted to achieve was to take advantage of the clustering capabilities of ElasticSearch. So I set out to create an integration that uses the Elastic TransportClient API.

There are two sides to this approach:
- it is very flexible, while being completely invisible to the users
but
- queries are executed on other servers over network connections, so there may be some overhead.
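
As an illustration of the clustering side, the sketch below turns a comma-separated node list into socket addresses, which is the kind of input a transport-style client needs before it can spread requests over a cluster. It is a sketch under assumptions: the node-list format and the class and method names are hypothetical, and only the default transport port 9300 comes from Elasticsearch itself. The actual client wiring (for example, handing each address to the TransportClient's addTransportAddress call) is left out because it needs the Elasticsearch client library.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

/** Sketch: parse a "host:port, host" node list into addresses for a cluster client. */
final class ClusterNodesSketch {

    /** Default Elasticsearch transport port, used when an entry has no port. */
    static final int DEFAULT_TRANSPORT_PORT = 9300;

    /**
     * Parses entries such as "es1.example.com:9300, es2.example.com".
     * Blank entries are skipped. IPv6 literals are not handled in this sketch.
     */
    static List<InetSocketAddress> parseNodes(String nodeList) {
        List<InetSocketAddress> nodes = new ArrayList<>();
        for (String entry : nodeList.split(",")) {
            String node = entry.trim();
            if (node.isEmpty()) {
                continue;
            }
            int colon = node.lastIndexOf(':');
            String host = colon < 0 ? node : node.substring(0, colon);
            int port = colon < 0
                ? DEFAULT_TRANSPORT_PORT
                : Integer.parseInt(node.substring(colon + 1));
            // Unresolved on purpose: DNS lookup is deferred until a connection is made.
            nodes.add(InetSocketAddress.createUnresolved(host, port));
        }
        return nodes;
    }

    public static void main(String[] args) {
        // Host names are placeholders.
        for (InetSocketAddress a : parseNodes("es1.example.com:9300, es2.example.com")) {
            System.out.println(a.getHostString() + ":" + a.getPort());
        }
    }
}
```

Listing more than one node is what gives the flexibility: the client can keep working when a single node is down. The price is the one noted above, since every query crosses the network to one of those nodes.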

I am interested to hear feedback from the community.
  • What is your take on this? Does it sound like a good approach?
  • Would it be worthwhile to be able to combine fulltext with special fields, like the DocTags fields for the Siebel integration?
  • Would you be interested in using this?
Let me know.
