Tips and Tricks in a world of Mix

So, after the last post about Elasticsearch, which explained a bit of the technology's terminology, I'm getting to real-life problems.

So after entering the data into Elasticsearch in the last post, I now have to delete it all! Oh my, how did that happen? What shall I do now?

 

Well, if you are only starting out it's not that bad: just delete the index, which will remove the data and the existing mapping as well.

curl -XDELETE "http://localhost:9200/test"

WHY?

One of the requirements was to make the data searchable by only a few characters, not just by the whole word.

So .. ?

Well, actually that means the default analysis applied when the index was created is not good enough; we should have defined the index settings manually, specifying from the beginning what kind of analysis should be performed on the index.

 

The nGram filter allows us to break the data that we enter into small tokens which we can search later. So if you have

"Jerusalem"

and define an nGram with min_gram 7 and max_gram 20, you'll get [Jerusal, Jerusale, Jerusalem] indexed.

Of course, it is more logical to start with a min_gram of two characters and go up from there.

I tried to put the index settings and the mapping into one file, but it failed with the error:

Analyzer [your_analyzer_name] not found for field [_all]

When I split them into two files, it worked.

On top of the last post, I added a manual index definition in CreateIndex.js:

{
    "settings": {
        "number_of_shards": 1,
        "analysis": {
            "filter": {
                "your_name_for_nGram_filter": {
                    "type": "nGram",
                    "min_gram": 2,
                    "max_gram": 20,
                    "token_chars": ["letter", "digit", "punctuation", "symbol"]
                }
            },
            "analyzer": {
                "your_name_for_index_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["lowercase", "asciifolding", "your_name_for_nGram_filter"]
                },
                "your_name_for_search_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["lowercase", "asciifolding"]
                }
            }
        }
    }
}

 

Then you run curl to create it:

curl -XPUT "http://localhost:9200/test" -d @c:\pathto\CreateIndex.js

{“acknowledged”:true}

Now we have the index settings right, with autocomplete suggestions starting at 2 letters.
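A quick way to sanity-check the analyzer is the _analyze API; this is just a sketch, assuming the analyzer name defined above:

curl -XGET "http://localhost:9200/test/_analyze?analyzer=your_name_for_index_analyzer" -d "Jerusalem"

The response should list the nGram tokens (je, jer, jeru, and so on) that the autocomplete will match against.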

 

Now we re-enter the mapping and data from the last post, just adding some features to the mapping.

CreateMappings.js:

{
    "name_of_your_object": {
        "_all": {
            "search_analyzer": "your_name_for_search_analyzer",
            "index_analyzer": "your_name_for_index_analyzer"
        },
        "properties": {
            "field_you_dont_want_to_break_into_small_tokens": {
                "type": "string",
                "index": "not_analyzed"
            },
            "always_in_query_field": {
                "type": "string",
                "include_in_all": true
            }
        }
    }
}

Then you run curl:

curl -XPUT "http://localhost:9200/test/name_of_your_object/_mapping" -d @C:\pathto\createMappings.js

{“acknowledged”:true}

 

Now we'll enter the actual data, as in the last post:

curl -XPOST "http://localhost:9200/test/name_of_your_object/_bulk" --data-binary @c:\pathto\formatizedToIndex.json

 

Now you have data with analyzers inside Elasticsearch, and autocomplete works. Happy searching!
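For example, assuming "Jerusalem" appears in one of the fields included in _all, a query of just three characters should now find it:

curl -XGET "http://localhost:9200/test/name_of_your_object/_search?q=jer"

The query string is analyzed with the search analyzer (no nGram), so "jer" becomes a single term that matches the "jer" token the index analyzer produced.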

Yield & readonly

yield interacts with the foreach loop. It is a contextual keyword: yield is a keyword only in certain statements. It allows each element consumed by a foreach loop to be generated only when it is needed, which can improve performance.

The nice thing about using yield return is that it's a very quick way of implementing the iterator pattern, so things are evaluated lazily.

The iterator pattern provides a way to traverse (iterate) over a collection of items without exposing the underlying structure of the collection.
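A minimal sketch of what that looks like in practice (the Squares method is a made-up example):

using System;
using System.Collections.Generic;

class Program
{
    // Each value is produced lazily, only when the foreach asks for the next element.
    static IEnumerable<int> Squares(int count)
    {
        for (int i = 1; i <= count; i++)
        {
            Console.WriteLine("generating " + i);
            yield return i * i; // execution pauses here until the next iteration
        }
    }

    static void Main()
    {
        // Prints "generating" and "got" lines interleaved: each square is
        // computed only when the loop requests it.
        foreach (int square in Squares(3))
            Console.WriteLine("got " + square);
    }
}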


readonly: the readonly keyword is a modifier that you can use on fields. When a field declaration includes a readonly modifier, assignments to the fields introduced by the declaration can only occur as part of the declaration or in a constructor in the same class.

If you use a const in dll A and dll B references that const, the value of that const will be compiled into dll B. If you redeploy dll A with a new value for that const, dll B will still be using the original value.

If you use a readonly in dll A and dll B references that readonly, that readonly will always be looked up at runtime. This means if you redeploy dll A with a new value for that readonly, dll B will use that new value.
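A small sketch of the difference (the class and member names are made up):

using System;

class Config
{
    // const: the value is baked into every referencing assembly at compile time.
    public const int MaxRetries = 3;

    // readonly: the value is looked up at runtime and may only be assigned
    // in the declaration or in a constructor of this class.
    public readonly string Endpoint;

    public Config(string endpoint)
    {
        Endpoint = endpoint; // allowed: we are in a constructor
    }
}

class Program
{
    static void Main()
    {
        var config = new Config("http://localhost:9200");
        Console.WriteLine(Config.MaxRetries + " " + config.Endpoint);
        // config.Endpoint = "other"; // compile-time error: readonly field
    }
}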

http://stackoverflow.com/questions/277010/what-are-the-benefits-to-marking-a-field-as-readonly-in-c/312840#312840

http://msdn.microsoft.com/en-us/library/acdd6hb7.aspx

Question:

Is it possible to allow more than one internal IP to have the same port forwarded to it? For example, I have two devices within my LAN that run Ubuntu and I'd like to be able to access them both using SSH (port 22). Is it possible to set up the forward to go to two different LAN IPs? It was giving me conflict errors.

Answer:

I like to use the advanced settings and still use DHCP. Under Advanced Settings > IP Address Distribution > Connection List, you edit the connection for the device that is assigned an IP address, and there you select assign static. The only issue I find with that configuration is that it does not allow you to change the IP you wish to assign, like you would in a real DHCP server. But it does allow you to set the IP as static based on the MAC address.

In regards to your SSH servers. An incoming port is an incoming port and can only go to or be forwarded to one device. 

Say SSH server 1 is running on 192.168.1.201; you would forward Any to that IP on port 22.

Say SSH server 2 is running on 192.168.1.202; you would forward Any to that IP on port 2022.

If you are running, say, WinSCP (or whatever program) to access your SSH boxes, you would just use public IP:22 for SSH1, or public IP:2022 for SSH2. This lets you use an alternate port for the second server, thus preventing any conflict. You will need to set your second Ubuntu box's SSH server to listen on port 2022.

I never used the standard port on my internet-connected SSH servers anyway; too much of a security issue, since it would indicate exactly what server you are running. Pretty much pick whatever alternate port you wish. I used to run SSH on the Windows RDP port, 3389, just to confuse people trying to hack the port. Plus, that RDP port was one outgoing port my employer was not blocking at the time. ;-) Thus I could make an SSH connection over the RDP port and they would have no clue, with the 2048-bit encryption set. Now everything is blocked and filtered, so they closed all the holes.

To change the SSH port:

Once you have root access, open the file /etc/ssh/sshd_config and search for Port; it should show 22 as the default value. Change 22 to any port you want that is not already in use on the system.
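For example (a sketch for a stock Ubuntu install; the daemon must be restarted for the change to take effect):

# in /etc/ssh/sshd_config
Port 2022

# then restart the SSH daemon
sudo service ssh restart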

If you wish to use an SSH tunnel for VNC remote desktop, you may not want to use the RDP port.

Example for router to forward port 2022 to your second server.

http://forums.verizon.com/t5/FiOS-Internet/static-IP-and-port-forwarding-questions/td-p/500165

Elasticsearch

Important comments

Before indexing the data, create and insert the mapping.

Date format for Elasticsearch: "dd/MM/yyyy HH:mm:ss" (HH means 24-hour presentation).
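For instance, a date field mapped with that format might look like this (the field name created_at is made up):

"created_at": {
    "type": "date",
    "format": "dd/MM/yyyy HH:mm:ss"
}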

 

So, after creating the "test" index:

$ curl -XPUT 'http://localhost:9200/test/'

Mapping

1) $ curl -XPUT "http://localhost:9200/test/targets/_mapping" -d @path_to/mapping.js

 

* Without putting it in a file, it didn't work.

* Without the quotes around the URL, it didn't work either.

* Without the @ before the path, it gives:

{"error":"NullPointerException[null]","status":500}

* The file includes:

{
    "targets": {           – name of your object
        "properties": {    – an Elasticsearch reserved term
            **fields and their mappings**
        }
    }
}

Bulk Create

2) $ curl -XPOST "http://localhost:9200/test/targets/_bulk" --data-binary @path_to/data.json

* Put this line before each object serialized to JSON:

{ "create" : { "_index" : "test", "_type" : "type1", "_id" : @some_unique_param } }


The terminology

An index is a keyword summary of a larger piece of content. An index allows you to search for the content you need much faster than you could without it.

Document parsing, a.k.a. text processing, text analysis, text mining, and content analysis: when data is added, it is processed by the search engine to be made searchable. This scanning and processing of the data is called document parsing. In this process we create terms (a list of the data/words) with a mapping (a reference to the terms), save it all to disk, and keep parts in memory for faster performance.

Lucene, which both Elasticsearch and Solr are built on, is a full-text search engine: it goes through all of the text as part of the indexing process.

Computers have to be programmed to break text up into its distinct elements, such as words and sentences. This process is called tokenization, and the different chunks, usually words, that constitute the text are called tokens.

There are many specialized tokenizers, for example CamelCase tokenizer, URL tokenizer, path tokenizer and N-gram tokenizer.

Stop words: sometimes we want to keep certain words from being indexed. For instance, in many cases it would make no sense to store the words on, for, a, the, us, who, etc. in the index.

Relevancy: with this kind of handling there is a fair number of irrelevant results. There are ways to minimize and partially eliminate them.

 

A Token is the name of a unit that we derive from the tokenizer, and the token therefore depends on the tokenizer. A token is not necessarily a word, but a word is normally a token when dealing with text. When we store the token in the index, it is usually called a term.

Forward index: store a list of all terms for each document that we are indexing. Indexing is fast, but it is not really efficient for querying, because a query requires the search engine to look through all entries in the index for a specific term in order to return all documents containing that term.

Document – Terms
Grandma’s tomato soup – peeled, tomatoes, carrot, basil, leaves, water, salt, stir, and, boil, …
African tomato soup – 15, large, tomatoes, baobab, leaves, water, store, in, a, cool, place, …
Good ol’ tomato soup – tomato, garlic, water, salt, 400, gram, chicken, fillet, cook, for, 15, minutes, …


Inverted index: an approach where you index by the terms to get a list of the relevant documents. Conventional textbook indexing is based on an inverted index.

Term – Documents
baobab – African tomato soup
basil – Grandma’s tomato soup
leaves – African tomato soup, Grandma’s tomato soup
salt – African tomato soup, Good ol’ tomato soup
tomato – African tomato soup, Good ol’ tomato soup, Grandma’s tomato soup
water – African tomato soup, Good ol’ tomato soup, Grandma’s tomato soup

Often both the forward and inverted index are used in search engines, where the inverted index is built by sorting the forward index by its terms.
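As a toy sketch of that relationship, here is the regrouping in C# (the soup data is abbreviated from the tables above):

using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        // Forward index: document -> terms
        var forward = new Dictionary<string, string[]>
        {
            { "Grandma's tomato soup", new[] { "tomatoes", "basil", "water", "salt" } },
            { "African tomato soup",   new[] { "tomatoes", "baobab", "water" } },
            { "Good ol' tomato soup",  new[] { "tomato", "water", "salt" } }
        };

        // Inverted index: term -> documents, built by regrouping (sorting)
        // the forward index by its terms.
        var inverted = forward
            .SelectMany(doc => doc.Value.Select(term => new { Term = term, Doc = doc.Key }))
            .GroupBy(p => p.Term)
            .OrderBy(g => g.Key)
            .ToDictionary(g => g.Key, g => g.Select(p => p.Doc).ToList());

        foreach (var entry in inverted)
            Console.WriteLine(entry.Key + ": " + string.Join(", ", entry.Value));
    }
}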

In some search engines the index includes additional information such as frequency of the terms, e.g. how often a term occurs in each document, or the position of the term in each document. The frequency of a term is often used to calculate the relevance of a search result, whereas the position is often used to facilitate searching for phrases in a document.

 

Mapping

A schema is a description of one or more fields that describes the document type and how to handle the different fields of a document.

Indexes, types and documents

Accessing AngularJS scope from the console

// Access whole scope
angular.element(myDomElement).scope();

// Access and change variable in scope
angular.element(myDomElement).scope().myVar = 5;
angular.element(myDomElement).scope().myArray.push(newItem);

// Update page to reflect changed variables
angular.element(myDomElement).scope().$apply();

Or if you’re using jQuery, this does the same thing…

$('#elementId').scope();
$('#elementId').scope().$apply();

Another easy way to access a DOM element from the console (as jm mentioned) is to click on it in the ‘elements’ tab, and it automatically gets stored as $0.

angular.element($0).scope();

New Vs Override

http://msdn2.microsoft.com/en-us/library/435f1dw2.aspx

The new modifier instructs the compiler to use your implementation instead of the base class implementation. Any code that references the base class rather than your class will use the base class implementation.

http://msdn2.microsoft.com/en-us/library/ebca9ah3.aspx

The override modifier may be used on virtual methods and must be used on abstract methods. It tells the compiler to use the most derived implementation of a method. Even if the method is called through a reference to the base class, the overriding implementation will be used.
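A small sketch of the difference (the class names are made up):

using System;

class Base
{
    public virtual void Speak() { Console.WriteLine("Base.Speak"); }
    public void Greet() { Console.WriteLine("Base.Greet"); }
}

class Derived : Base
{
    // override: replaces the base implementation, even through a Base reference.
    public override void Speak() { Console.WriteLine("Derived.Speak"); }

    // new: hides Base.Greet, but only for references typed as Derived.
    public new void Greet() { Console.WriteLine("Derived.Greet"); }
}

class Program
{
    static void Main()
    {
        Base b = new Derived();
        b.Speak(); // prints "Derived.Speak": override wins through a base reference
        b.Greet(); // prints "Base.Greet": new only hides, it does not override
    }
}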

http://social.msdn.microsoft.com/Forums/en-US/65e02299-300f-4b74-8f0a-679f490605f5/new-vs-override-?forum=Vsexpressvcs

http://www.mcdonaldland.info/2007/11/28/40/

http://www.cafepress.com/codergear/5033878
