Difference between revisions of "Develop dataspectsSystem"

From FindAndLearn::Cookbook
(Documents (non-repository))
(PULL: mediawiki-workbench)
(4 intermediate revisions by the same user not shown)
Line 19: Line 19:
 
parsoid
 
parsoid
 
dsrepositorycli["<b>dsrepository-cli</b>"]
 
dsrepositorycli["<b>dsrepository-cli</b>"]
mediawikiworkbench["<b>mediawiki-workbench</b>"]
+
dataspecter["<b>dataspecter</b>"]
 
dsdocumentcli["<b>dsdocument-cli</b>"]
 
dsdocumentcli["<b>dsdocument-cli</b>"]
  
Line 43: Line 43:
 
dsrepositorycli-->|write|datastore
 
dsrepositorycli-->|write|datastore
  
mediawikiworkbench-->|read/write|apache
+
dataspecter-->|read/write|apache
mediawikiworkbench-->|write|datastore
+
dataspecter-->|write|datastore
 +
dataspecter-->|write|elasticsearch
  
 
dsdocumentcli-->|write|datastore
 
dsdocumentcli-->|write|datastore
Line 50: Line 51:
  
 
classDef gui fill:lightblue
 
classDef gui fill:lightblue
class kibana,ui,dsrepositorycli,dsdocumentcli,mediawikiworkbench,dataspectsmediawikifeeder gui
+
class kibana,ui,dsrepositorycli,dsdocumentcli,dataspecter,dataspectsmediawikifeeder gui
 
}}
 
}}
  
Line 113: Line 114:
 
# config.yml
 
# config.yml
 
elastic-search:
 
elastic-search:
   host: <nowiki>http://localhost</nowiki>
+
   host: http://localhost
 
   port: 9200
 
   port: 9200
 
   username: elastic
 
   username: elastic
 
   password:  
 
   password:  
 
nats:
 
nats:
   host: <nowiki>http://localhost</nowiki>
+
   host: http://localhost
 
   port: 4222
 
   port: 4222
 
tika:
 
tika:
   host: <nowiki>http://localhost</nowiki>
+
   host: http://localhost
 
   port: 9998
 
   port: 9998
 
   username:
 
   username:
Line 129: Line 130:
 
|-
 
|-
 
|
 
|
 +
 
=== Mariadb ===
 
=== Mariadb ===
 
|
 
|
Line 138: Line 140:
  
 
== Feeding data from ResourceSilos to indices ==
 
== Feeding data from ResourceSilos to indices ==
 +
 +
{{#mermaid:
 +
graph LR
 +
DSR[<b>dataspecter</b><br>QA]
 +
RS[<b>ResourceSilo</b>]
 +
subgraph RELENG domain-specific
 +
DS[<b>datastore</b><br>SIGINT<br>resource-silo-specific<br>domain-agnostic]
 +
IN[<b>indexer</b><br>SIGINT<br>domain-specific]
 +
ES[Elasticsearch]
 +
end
 +
RS-->DS
 +
IN-->ES
 +
DS-->|reindex|IN
 +
DSR-->DS
 +
DSR-->ES
 +
classDef gui fill:lightblue
 +
class RS gui
 +
}}
  
 
=== MediaWiki ===
 
=== MediaWiki ===
Line 146: Line 166:
 
* [[C1317875508]]
 
* [[C1317875508]]
  
==== PULL: mediawiki-workbench ====
+
==== PULL: dataspecter ====
 +
 
 +
https://github.com/dataspects/dataspecter
  
https://github.com/dataspects/mediawiki-workbench
 
 
=== Code Repository (non-git) ===
 
=== Code Repository (non-git) ===
  
Line 181: Line 202:
 
}}
 
}}
  
== Relevance engineering (RELENG) ==
+
== Signals Intelligence (SIGINT) ==
 +
 
 +
== Relevance Engineering (RELENG) ==
  
 
  ui/relevanceEngineering
 
  ui/relevanceEngineering
 +
 +
== Quality Assurance (QA) ==
  
 
== UI design (results display and interaction design (faceting, drilldown)) ==
 
== UI design (results display and interaction design (faceting, drilldown)) ==

Revision as of 11:07, 10 November 2019

Run the stack

Standard Services dataspects Services

Mongo

Elasticsearch

Kibana

ELASTICSEARCH_HOSTS=http://127.0.0.1:9200

Tika

Redis

Nats

Ui

PORT=3000
REDIS_HOST=<HOSTNAME>
REDIS_PORT=6379
SESSION_SECRET=
MONGODB_URI=mongodb://<HOSTNAME>:27017/<DB_NAME>
SMTP_SERVER=
SMTP_USERNAME=
SMTP_PASSWORD=
FROM_EMAIL=
FROM_NAME=
SITE_URL=https://ui.dataspects.com
OTP_DOMAIN=ui.dataspects.com
API_KEY=
ES_NODE=http://<HOSTNAME>:9200
DOCUMENT_DATA_STORE_API_URL=http://<HOSTNAME>:3003
DOCUMENT_DATA_STORE_API_MASTER_KEY=
APM_SERVER_URL=http://<HOSTNAME>:8200
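Several of the variables above are left blank in the listing. A minimal sketch of a pre-start check that reports which variables are unset or empty follows. The helper `missingVars` and the choice of which names count as required are assumptions made for illustration; they are not part of the UI code.

```go
package main

import (
	"fmt"
	"os"
)

// requiredVars is a subset of the variable names from the listing above.
// Which of them are truly mandatory is an assumption of this sketch.
var requiredVars = []string{
	"PORT", "REDIS_HOST", "REDIS_PORT", "SESSION_SECRET",
	"MONGODB_URI", "ES_NODE", "DOCUMENT_DATA_STORE_API_URL",
}

// missingVars returns the names in required whose value is unset or empty.
// lookup reads a single variable; pass os.LookupEnv for the real environment.
func missingVars(required []string, lookup func(string) (string, bool)) []string {
	var missing []string
	for _, name := range required {
		if v, ok := lookup(name); !ok || v == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	if m := missingVars(requiredVars, os.LookupEnv); len(m) > 0 {
		fmt.Println("unset or empty:", m)
		return
	}
	fmt.Println("all listed variables are set")
}
```

Passing the lookup function as a parameter keeps the check testable without touching the real environment.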

Datastore

https://github.com/dataspects/datastore

go/src/github.com/dataspects/datastore$ go run main.go --c config.yml --p 3001 --d datastore.db
# config.yml
apikey: ""
nats:
  host: localhost
  port: 4222
tika:
  host: localhost
  port: 9998

Indexer

https://github.com/dataspects/indexer

go/src/github.com/dataspects/indexer$ go run main.go --c config.yml
# config.yml
elastic-search:
  host: http://localhost
  port: 9200
  username: elastic
  password: 
nats:
  host: http://localhost
  port: 4222
tika:
  host: http://localhost
  port: 9998
  username:
  password:
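The two config.yml listings on this page write host values both without a scheme (localhost) and with one (http://localhost). As a minimal sketch, assuming a client needs a plain host:port address, the following helper accepts either form. The function name `endpointAddr` is hypothetical and is not taken from the datastore or indexer source.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
	"strings"
)

// endpointAddr combines a host value as written in config.yml (with or
// without a scheme such as "http://") and a port into "host:port".
func endpointAddr(host string, port int) (string, error) {
	if strings.Contains(host, "://") {
		u, err := url.Parse(host)
		if err != nil {
			return "", fmt.Errorf("parse host %q: %w", host, err)
		}
		host = u.Hostname() // drop the scheme and any path
	}
	if host == "" {
		return "", fmt.Errorf("empty host")
	}
	return net.JoinHostPort(host, strconv.Itoa(port)), nil
}

func main() {
	// Host and port values as they appear in the listings on this page.
	endpoints := []struct {
		name, host string
		port       int
	}{
		{"elastic-search", "http://localhost", 9200},
		{"nats", "localhost", 4222},
		{"tika", "http://localhost", 9998},
	}
	for _, e := range endpoints {
		addr, err := endpointAddr(e.host, e.port)
		if err != nil {
			fmt.Println(e.name, "error:", err)
			continue
		}
		fmt.Println(e.name, addr)
	}
}
```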

Mariadb

Apache

Parsoid

Ideally, run all services except the UI, datastore, and indexer via https://github.com/dataspects/dataspectsSystems.

Feeding data from ResourceSilos to indices

MediaWiki

PUSH: DataspectsMediaWikiFeeder

PULL: dataspecter

https://github.com/dataspects/dataspecter

Code Repository (non-git)

https://github.com/dataspects/dsrepository-cli

From Repository to Datastore by https://github.com/dataspects/dsrepository-cli

Data fed?
Configuration

Configure the datastore:

<ID>      automatic
<Label>   Shown in information sources list on https://ui.dataspects.com/search
<API Key> automatic
<Regex>   Only file names matching this regex will be fed to the datastore (Regex Tester - Golang)

Configure the feeder:

# <ID> and <API Key> are shown at https://ui.dataspects.com/datastores/repositories/code
user@workstation:/yourrepo$ ./dsrepository-cli \
                              --id  <ID> \
                              --url https://datastore.dataspects.com/repositories/code \
                              --key <API Key>
Subsequent pipeline

Items fed to the datastore are passed on to an indexer.

Currently, the CodeIndexer does not index the file contents. Instead, it scans each file for the pattern (?i)error *(\d{4}:\d{2})(.*) and, for each match, creates an ErrorCode entity named after the captured \d{4}:\d{2} code, with an "OccursInFile" annotation.
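The scan can be sketched in Go as follows. Only the regular expression is taken from the text; the `ErrorCode` struct, its field names, and `scanErrorCodes` are hypothetical stand-ins for illustration and do not reflect the actual CodeIndexer implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// errorCodePattern is the expression quoted in the text.
var errorCodePattern = regexp.MustCompile(`(?i)error *(\d{4}:\d{2})(.*)`)

// ErrorCode is a hypothetical stand-in for the entity the CodeIndexer creates.
type ErrorCode struct {
	Name         string // first capture group, four digits, a colon, two digits
	OccursInFile string // hypothetical field modelling the "OccursInFile" annotation
	Context      string // second capture group: the rest of the matched line
}

// scanErrorCodes returns one ErrorCode per match of errorCodePattern in content.
func scanErrorCodes(fileName, content string) []ErrorCode {
	var found []ErrorCode
	for _, m := range errorCodePattern.FindAllStringSubmatch(content, -1) {
		found = append(found, ErrorCode{
			Name:         m[1],
			OccursInFile: fileName,
			Context:      strings.TrimSpace(m[2]),
		})
	}
	return found
}

func main() {
	// Made-up sample input, not taken from a real repository.
	src := "log(\"Error 1234:56 disk full\")\n// ERROR0042:07\nerror 12:34 is ignored\n"
	for _, ec := range scanErrorCodes("example.c", src) {
		fmt.Printf("%s occurs in %s (%q)\n", ec.Name, ec.OccursInFile, ec.Context)
	}
}
```

Because `.` does not match a newline by default in Go's regexp package, the second group stops at the end of the line on which the code was found.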

Tools


Git Repository

Documents (non-repository)

https://github.com/dataspects/dsdocument-cli

From File system to Datastore by https://github.com/dataspects/dsdocument-cli

Data fed?
Configuration

Configure the datastore:

<Datastore ID>        automatic (e.g. 12)
<Datastore Label>     Shown in information sources list on https://ui.dataspects.com/search
<Datastore API Key>   automatic (e.g. c8b89bc3-0139-11wa-8ef3-8c164563716b)
<Datastore Doc Regex> Only file names matching this regex will be fed to the datastore (Regex Tester - Golang)

Configure and run the feeder to index matching files in and below the current folder:

# <Datastore ID> and <Datastore API Key> are shown at https://ui.dataspects.com/datastores/files
user@workstation:/yourfolder$ ./dsdocument-cli \
                              --id  <Datastore ID> \
                              --url https://datastore.dataspects.com \
                              --key <Datastore API Key>
Subsequent pipeline
Tools


Signals Intelligence (SIGINT)

Relevance Engineering (RELENG)

ui/relevanceEngineering

Quality Assurance (QA)

UI design (results display and interaction design (faceting, drilldown))

  • Helper: uncomment link(rel='stylesheet', href='/css/dataspects-meta.css') in dataspects-ui/views/layout.pug
  • RequestTypes: mainRequest, predicateRequest, entityTypeRequest, actionRequest